September 5, 2016
A More Natural Text to Speech
I've been an eLearning designer and developer since 2005. In 2015 I started my own eLearning design company, and I began creating Adobe Captivate video tutorials to help promote my business through my YouTube channel at https://youtube.com/captivateteacher. My intention with these videos was to attract attention from organizations looking for a skilled Captivate developer. The strategy proved successful: I've worked with clients worldwide, helping them build highly engaging eLearning solutions. The channel had another benefit as well, attracting aspiring Captivate developers who sought me out as a teacher. I now offer online and onsite training on Adobe Captivate, teaching users the skills to build engaging and interactive learning.

One of my previous employers decided early on not to use voice talent for the narration in their eLearning. I understand the decision: while a human voice has clear advantages, text to speech is less costly and easier to edit and update in the future. The challenge for me was that they still expected high-quality narration. Anyone who has worked with the speech agents in Adobe Captivate soon realizes that simply pasting the script into the text to speech window and clicking Generate Audio doesn't always produce desirable results. In this article I'll share some of the things I do when working with text to speech.

PAUSES

One of the differences between a human voice and text to speech is that humans breathe. A computer-generated voice doesn't have to breathe and can read the narration without interruption, and even with the appropriate number of commas, this still sounds artificial to our ears. So I add pauses in several spots in my script.

The first place I put a pause is at the beginning of a slide. Imagine what a classroom facilitator does when they advance to a new slide: they take half a second or so to make sure they are on the correct slide, and then they begin the lesson. As humans we expect that beat, so I put a half-second pause at the beginning of each slide that includes narration. You can do this with the following command, written right into your narration text.

<vtml_pause time="500"/>

NeoSpeech, the company that provides Adobe with the speech agents for Adobe Captivate, has created a markup language called VTML for working with text to speech narration. VTML stands for Voice Text Markup Language, and you can use it to add commands like the one above to alter how the text to speech sounds. I generally put this code at the end of each block of narration as well. You can adjust the number in quotation marks to make the pause longer or shorter; the value is in milliseconds, so 500 is half a second. You can also add extra pauses throughout the narration by inserting additional commas. Read your narration out loud, note where you would naturally pause to take a breath or pause for emphasis, and insert pauses in those locations.
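
As an example, here is a sketch of how one block of narration might look with the pauses added. The sentence itself is just an invented sample script, not from any real course; only the vtml_pause tag comes from the VTML documentation.

<vtml_pause time="500"/> Welcome to this lesson. In this lesson you will learn how to record a software simulation in Adobe Captivate. <vtml_pause time="500"/>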

ME, MYSELF, AND I

When you write the storyboard and script for your eLearning course, don't write the narrator as a person. When my stakeholders or audience listen to the voice, they are obviously not fooled by it, but a small amount of suspension of disbelief occurs when the voice speaks in the third person. The speech agent isn't an employee of your company, nor is it a voice actor hired for the job. I find that when the speech agent refers to itself in the first person, using words like “we” or “us”, the audience is instantly reminded that they are listening to a computer speak.

PRONUNCIATION

There are certain words in the English language that are pronounced differently depending on how they are used. The example I always think of is the word “record.” We keep “records” in a database or a file cabinet, but I use my computer to “record” my voice. Say those two words in those contexts and you pronounce them differently: one is a noun and one is a verb. But how does Adobe Captivate know the difference?

So here is what you do. VTML includes a tag for specifying the part of speech, and you can use it to get the speech agent to pronounce the two different versions of the word record:

<vtml_partofsp part="verb">record</vtml_partofsp> or

<vtml_partofsp part="noun">record</vtml_partofsp>

Like before, you just type all of this into the text to speech window, generate it along with the rest of your narration, and you're good to go.
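
For instance, if a single sentence needed both forms of the word, a sketch of the markup might look like this (the sentence is just an invented example):

Click <vtml_partofsp part="verb">record</vtml_partofsp> to save the new <vtml_partofsp part="noun">record</vtml_partofsp> to the database.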

DOCUMENTATION

In addition to the examples I have given you, you can find more in NeoSpeech's documentation for their VoiceText product by downloading the user's guide here:

VTML Tag Set User’s Guide

6 Comments
2020-01-08 19:14:23

As someone who strongly dislikes TTS, thank you for the tips to make it more natural. You are appreciated!

2020-01-13 09:54:17 (reply to Mahogani’s comment)

Thank you and you’re welcome.

2017-12-18 22:48:48

When I need natural sounding voices, I just use my Neospeech software for my TTS needs.

2018-01-04 19:13:51 (reply to Anonymous’s comment)

Yes, the Neospeech voices are not bad. I used the VTML coding in places to add pauses and emphasis. It’s pretty easy! Thanks

2017-01-06 22:17:18

Thanks for this post — I am really struggling to make these voices sound natural.

2017-01-07 18:49:58 (reply to aejefferson’s comment)

My recommendation to my clients is to only use text to speech if you absolutely must. Human voices are always preferred. Watch this video to get further details: https://www.youtube.com/watch?v=zORtAKQllTY
