September 5, 2016
A More Natural Text to Speech
I've been an eLearning designer and developer since 2005. In 2015 I started my own eLearning design company, and I began creating Adobe Captivate video tutorials to help promote my business through my YouTube channel at https://youtube.com/captivateteacher. My intention with these videos was to attract attention from organizations looking for a skilled Captivate developer. The strategy proved successful: I've worked with clients worldwide, helping them build highly engaging eLearning solutions. The channel had another benefit as well, attracting aspiring Captivate developers who sought me out as a teacher. I now offer online and onsite training on Adobe Captivate, teaching users the skills to build engaging and interactive learning.

One of my previous employers decided early on not to use voice talent for the narration in their eLearning. I understand the decision: while a human voice has clear advantages, text to speech is less costly and easier to edit and update in the future. The challenge for me was that they still expected high-quality narration. Anyone who has worked with the speech agents in Adobe Captivate soon realizes that simply pasting the script into the text to speech window and clicking Generate Audio doesn't always produce desirable results. In this article I'll share some of the things I do when working with text to speech.

PAUSES

One of the differences between a human voice and text to speech is that humans breathe. A computer-generated voice doesn't have to breathe and can read the narration without interruption, and even with the appropriate number of commas, this still sounds artificial to our ears. So I add pauses in several spots in my script.

The first place I put a pause is at the beginning of a slide. Imagine what a classroom facilitator does when they advance to a new slide: they take half a second or so to make sure they are on the correct slide, and then they begin the lesson. As humans we expect that beat, so I put a half-second pause at the beginning of each slide that includes narration. You can do this with the following command, written right into your narration text.

<vtml_pause time="500"/>

NeoSpeech, the company that provides Adobe with the speech agents for Adobe Captivate, has created a markup language called VTML for working with text to speech narration. VTML stands for Voice Text Markup Language, and you can use it to add commands like the one above to alter how the text to speech sounds. I generally put this code at the end of each block of narration as well. You can adjust the number in quotation marks to make the pause longer or shorter; the value is in milliseconds, so 500 is half a second. You can also add extra pauses throughout the narration by inserting additional commas. Read your narration out loud, note where you would naturally pause to take a breath or pause for emphasis, and insert pauses in those locations.
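
As an example, here is a sketch of how one block of narration might look with the pauses added. The sentence itself is just an invented sample script, not from any real course; only the vtml_pause tag comes from the VTML documentation.

<vtml_pause time="500"/> Welcome to this lesson. In this lesson you will learn how to record a software simulation in Adobe Captivate. <vtml_pause time="500"/>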

ME, MYSELF, AND I

When you write the storyboard and script for your eLearning course, don't write the narrator as a person. When my stakeholders or audience listen to the voice, they are obviously not fooled by it, but a small amount of suspension of disbelief occurs when the voice speaks in the third person. The speech agent isn't an employee of your company, nor is it a voice actor hired for the job. I find that when the speech agent refers to itself in the first person, using words like “we” or “us”, the audience is instantly reminded that they are listening to a computer speak.

PRONUNCIATION

There are certain words in the English language that are pronounced differently depending on how they are used. The example I always think of is the word “record.” We keep “records” in a database or a file cabinet, but I use my computer to “record” my voice. Say those two words in those contexts and you pronounce them differently: one is a noun and one is a verb. But how does Adobe Captivate know the difference?

So here is what you do. VTML includes a tag for specifying the part of speech, and you can use it to get the speech agent to pronounce the two different versions of the word record:

<vtml_partofsp part="verb">record</vtml_partofsp> or

<vtml_partofsp part="noun">record</vtml_partofsp>

Like before, you just type all of this into the text to speech window, generate it along with the rest of your narration, and you're good to go.
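
For instance, if a single sentence needed both forms of the word, a sketch of the markup might look like this (the sentence is just an invented example):

Click <vtml_partofsp part="verb">record</vtml_partofsp> to save the new <vtml_partofsp part="noun">record</vtml_partofsp> to the database.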

DOCUMENTATION

In addition to the examples I have given you, you can find more in NeoSpeech's documentation for their VoiceText product by downloading the user's guide here:

VTML Tag Set User’s Guide

6 Comments
2020-01-08 19:14:23

As someone who strongly dislikes TTS, thank you for the tips to make it more natural. You are appreciated!

2020-01-13 09:54:17 (reply to Mahogani’s comment)

Thank you and you’re welcome.

2017-12-18 22:48:48

When I need natural sounding voices, I just use my Neospeech software for my TTS needs.

2018-01-04 19:13:51 (reply to Anonymous’s comment)

Yes, the Neospeech voices are not bad. I used the VTML coding in places to add pauses and emphasis. It’s pretty easy! Thanks

2017-01-06 22:17:18

Thanks for this post — I am really struggling to make these voices sound natural.

2017-01-07 18:49:58 (reply to aejefferson’s comment)

My recommendation to my clients is to only use text to speech if you absolutely must. Human voices are always preferred. Watch this video to get further details: https://www.youtube.com/watch?v=zORtAKQllTY
