Transforming the narration of text using Loquendo Tags.

August 6, 2010

Ashwin Bharghav

TABLE OF CONTENTS

Global Controls

Prosodic Control

Sound Effects

Text Interpretation

Special Events

Audio Mixer features

This post follows my previous blog series on text-to-speech. Here, let us learn how to tweak the speech generated using Loquendo voices.

Loquendo lets you control how the voices read the text: the language in which the text is read, the voice to be used, the speaking rate, the loudness, the interpretation of numbers, the stress placed on a word, and its pronunciation.

You can specify these aspects by:

Setting parameters in the system configuration files; or
Inserting commands directly into the input text (in slide notes)

We will explore the second approach, which is to tweak the speech by inserting commands directly into slide notes. Each command begins with a backslash (\). The commands are grouped as follows:

Global controls
Prosodic controls
Sound effects
Text interpretation
Special events
Audio mixer features

I have used these controls in an Adobe Captivate project (.cptx and SWF files) and attached it here for your reference. To see how the tags are used, open the .cptx file and select Audio > Speech Management. To hear how the tags modify the narration, open the SWF file, plug in your headphones and listen.

Do try out these tags and let us know your experience.

Stay tuned for my next blog post on “How to change the pronunciations of words used to generate speech”

Global Controls

The family of Global Controls includes commands that change the values of some of the Reading Parameters of Loquendo TTS, which affect the quality of the output speech:

Voice and Language
Prosodic aspects of the voice (speaking rate, volume, voice pitch and timbre)
Sound effects
Text interpretation.

Voice Control: forces a switch to another voice.

\voice=<mnemonic>

Example:

\voice=Simon hello. \voice=Stefan hi.

(“hello” is read by the voice “Simon”, then “hi” is read by the voice “Stefan”.)

Language Control: forces a switch to another language. The mnemonic must be the name of an installed language.

\language=<mnemonic>

Example:

\language=English Paris \language=French Paris.

(The first occurrence of the word Paris will be pronounced p”}rIs, and the second paR”i.)
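
The voice and language switches above can also be assembled programmatically before the result is pasted into a slide note. The following Python sketch is only an illustration: `tag` and `switch_each` are hypothetical helper names, and the leading backslash on each command is assumed from Loquendo's control-tag syntax.

```python
def tag(name, value=None):
    """Format one command such as a voice or language switch."""
    command = "\\" + name
    return command if value is None else f"{command}={value}"


def switch_each(kind, segments):
    """Prefix every (value, text) segment with a command of the given kind."""
    parts = []
    for value, text in segments:
        parts.append(tag(kind, value))
        parts.append(text)
    return " ".join(parts)


print(switch_each("voice", [("Simon", "hello."), ("Stefan", "hi.")]))
# prints: \voice=Simon hello. \voice=Stefan hi.
print(switch_each("language", [("English", "Paris"), ("French", "Paris.")]))
# prints: \language=English Paris \language=French Paris.
```

The mnemonics are passed through unchecked, so the voice or language still has to be installed for the switch to take effect.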

Prosodic Control

The following commands control the quality of the output voice by modifying its rhythm, intonation, volume and timbre. The output speech is modified from the word following the command until the end of the prompt.

Speed Control: modifies the speaking rate, expressed on an abstract scale of 0-100.

\speed=<num>

Example:

\speed=60 (Scale 0-100)

\speed=60 This text is read at a faster speed.

Pitch Control: modifies the fundamental frequency (tone or pitch), expressed on an abstract scale of 0-100.

\pitch=<num>

Example:

\pitch=60 (Scale 0-100)

\pitch=60 This text is read at a pitch value of 60.

Volume Control: modifies the volume (loudness), expressed on an abstract scale of 0-100 or in decibels (dB).

\volume=<num>

Example:

\volume=60 (Scale 0-100)

\volume=60 This text is read at a volume value of 60.

Timbre Control: modifies the voice timbre by a shift in frequency, expressed on an abstract scale of 0-100.

\timbre=<num>

Example:

\timbre=60 (Scale 0-100)

\timbre=60 This text is read at a timbre value of 60.
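
As a rough illustration of the four prosodic controls above, the Python sketch below prefixes a piece of text with the commands and rejects values outside the abstract 0-100 scale. The function name `with_prosody` is hypothetical, the leading backslash on each command is assumed from Loquendo's control-tag syntax, and the decibel form of the volume control is not covered.

```python
PROSODIC_CONTROLS = ("speed", "pitch", "volume", "timbre")


def with_prosody(text, **settings):
    """Prefix text with prosodic commands after checking the 0-100 scale."""
    commands = []
    for name, value in settings.items():
        if name not in PROSODIC_CONTROLS:
            raise ValueError(f"unknown prosodic control: {name}")
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be on the 0-100 scale, got {value}")
        commands.append(f"\\{name}={value}")
    return " ".join(commands + [text])


print(with_prosody("This text is read at a faster speed.", speed=60))
# prints: \speed=60 This text is read at a faster speed.
```

Because the commands stay in effect until the end of the prompt, one call per slide note is usually enough.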

Sound Effects

The following commands create certain sound effects by acting on acoustic parameters of the speech output signal. For example, reverb gives the impression of a large hall or a church, while delay (or echo) repeats the audio signal at ever-diminishing volume.

Reverb Effect: creates reverberations with an intensity of <gain> and a delay of <delay> milliseconds.

\reverb=<gain>,<delay>

Example:

\reverb=80,500 (0<gain<100, 0<delay<2000)

\reverb=0,0 (removes the reverb effect)
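
The ranges given for the reverb command can be checked before the text reaches the voice. The Python sketch below is a minimal example under stated assumptions: `reverb` and `in_a_hall` are hypothetical names, the leading backslash is assumed from Loquendo's control-tag syntax, and the pair 0,0 is treated as the only valid way to switch the effect off.

```python
def reverb(gain, delay_ms):
    """Return a reverb command; reverb(0, 0) removes the effect."""
    if (gain, delay_ms) == (0, 0):
        return "\\reverb=0,0"
    if not 0 < gain < 100:
        raise ValueError("gain must satisfy 0 < gain < 100")
    if not 0 < delay_ms < 2000:
        raise ValueError("delay must satisfy 0 < delay < 2000 milliseconds")
    return f"\\reverb={gain},{delay_ms}"


def in_a_hall(text, gain=80, delay_ms=500):
    """Apply reverb to one passage, then remove it again."""
    return f"{reverb(gain, delay_ms)} {text} {reverb(0, 0)}"


print(in_a_hall("Welcome to the cathedral."))
# prints: \reverb=80,500 Welcome to the cathedral. \reverb=0,0
```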

Robot Control: applies the ‘robotization’ effect to the voice currently active in the system. There are 9 robots available: Robby, Gort, Twiki, Torg, Tobor, Ash, Hector, Max and Lynjx.

\robot=<robotName>

Example:

\robot=Max

\robot (removes the robotization effect)

Whisper Effect: applies the whisper effect to the voice currently active in the system. The possible values are on and off.

Example:

\whisper=on

(the effect is active)

\whisper=off

(the effect is not active)
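
Both the robot and whisper effects are switched on for a passage and then switched off again. The Python sketch below wraps a passage accordingly. The names `robotized` and `whispered` are hypothetical, the robot names are copied from the list above, and the leading backslash is assumed from Loquendo's control-tag syntax.

```python
ROBOTS = {"Robby", "Gort", "Twiki", "Torg", "Tobor",
          "Ash", "Hector", "Max", "Lynjx"}


def robotized(text, robot_name):
    """Read one passage with a robot effect, then remove the effect."""
    if robot_name not in ROBOTS:
        raise ValueError(f"unknown robot: {robot_name}")
    return f"\\robot={robot_name} {text} \\robot"


def whispered(text):
    """Read one passage as a whisper, then switch the effect off."""
    return f"\\whisper=on {text} \\whisper=off"


print(robotized("I am a machine.", "Max"))
# prints: \robot=Max I am a machine. \robot
print(whispered("This is a secret."))
# prints: \whisper=on This is a secret. \whisper=off
```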

Text Interpretation

The following commands control certain general aspects of text interpretation. Here, I describe how to adjust them synchronously with the text by means of a User Control embedded in the text.

The general syntax for these User Controls is the following:

\@<key>=<value>

where <key> is the name of the Reading Parameter to be changed and <value> is its chosen value.

Example:

\@TextEncoding=utf8 (interprets all the text as UTF-8. Characters like Ä/ä, Ö/ö, and Ü/ü will be read properly)

Pause insertion
Inserts a pause (silence) in the absence of punctuation marks. The effect is not applied if punctuation is already present in the text.

\pause	inserts a medium-length pause (120 ms), preceded by a ‘comma intonation’
\pause,	inserts a medium-length pause (120 ms), preceded by a ‘comma intonation’
\pause.	inserts a long pause (500 ms), preceded by a ‘conclusive intonation’
\pause?	inserts a long pause (500 ms), preceded by a ‘question intonation’

Example:

Here \pause is a comma pause. (inserts a 120 ms ‘comma intonation’ pause between “Here” and “is”)

Here \pause, is a comma pause. (leaves unaltered the ‘comma pause’ between “Here” and “is”)

Here \pause. is a conclusive pause. (inserts a 500 ms ‘conclusion’ pause between “Here” and “is”)

Here \pause? is a question pause. (inserts a 500 ms ‘question’ pause between “Here” and “is”)
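
The four pause forms can be summarized as a lookup table. The Python sketch below lists the duration and intonation of every pause command found in a piece of text. The name `explain_pauses` is hypothetical, the durations are the ones quoted above, and the leading backslash that separates the command from the ordinary word “pause” is assumed from Loquendo's control-tag syntax.

```python
# Each pause command mapped to (duration in milliseconds, intonation).
PAUSES = {
    "\\pause": (120, "comma"),
    "\\pause,": (120, "comma"),
    "\\pause.": (500, "conclusive"),
    "\\pause?": (500, "question"),
}


def explain_pauses(text):
    """Return (milliseconds, intonation) for each pause command in text."""
    return [PAUSES[token] for token in text.split() if token in PAUSES]


print(explain_pauses("Here \\pause. is a conclusive pause."))
# prints: [(500, 'conclusive')]
```

The final word “pause.” in the sample sentence is not counted, because it has no leading backslash.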

Pause duration
When followed by a punctuation mark, forces the duration of the corresponding pause to <num> milliseconds. In the absence of punctuation, inserts a ‘comma intonation’ pause of <num> milliseconds.

\pause=<num>	sets the duration of the following pause to <num> milliseconds

Example:

This \pause=10 , is a comma pause. (reduces the following ‘comma intonation’ pause to 10 ms)

This \pause=10, is a comma pause. (reduces the following ‘comma intonation’ pause to 10 ms)

This \pause=10 is a comma pause. (inserts a 10 ms ‘comma intonation’ pause)

No final pause \pause=0. (reduces the final silence to a minimum duration, while keeping the conclusive intonation)
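
To give every punctuation pause in a slide note the same length, the duration command can be inserted in front of each mark automatically. The Python sketch below does this with a regular expression. The name `force_pauses` is hypothetical, the leading backslash is assumed from Loquendo's control-tag syntax, and the pattern is deliberately simple: it would also split decimal numbers such as 3.5, so treat it as a starting point only.

```python
import re


def force_pauses(text, milliseconds):
    """Insert a pause-duration command before each comma, period or question mark."""
    if milliseconds < 0:
        raise ValueError("the pause duration cannot be negative")
    return re.sub(
        r"\s*([,.?])",
        lambda match: f" \\pause={milliseconds}{match.group(1)}",
        text,
    )


print(force_pauses("First point. Second point.", 300))
# prints: First point \pause=300. Second point \pause=300.
```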

Special Events

These commands trigger particular actions at the moment when the synthesis output reaches the exact point in the text where they have been inserted.

Play sound
Plays one of the paralinguistic sounds recorded for the voice in use. For most voices, at least the following sounds are available: Cough, Cry, Eh, Kiss, Laugh, Mmm, Oh, Sniff, Swallow, Throat, Whistle, and Yawn.

\item=<sound name>

Example:

\item=Laugh
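
A small check against the sound names listed above catches typing mistakes before audio is generated. In the Python sketch below, `sound` is a hypothetical helper, the leading backslash is assumed from Loquendo's control-tag syntax, and the list covers only the names given in this post, so a voice may offer more.

```python
SOUNDS = ("Cough", "Cry", "Eh", "Kiss", "Laugh", "Mmm",
          "Oh", "Sniff", "Swallow", "Throat", "Whistle", "Yawn")

# Look names up without regard to case, but emit them as listed above.
_BY_LOWERCASE = {name.lower(): name for name in SOUNDS}


def sound(name):
    """Return the play-sound command for one of the listed sounds."""
    try:
        return "\\item=" + _BY_LOWERCASE[name.lower()]
    except KeyError:
        raise ValueError(f"sound not in the documented list: {name}") from None


print("That was funny. " + sound("laugh"))
# prints: That was funny. \item=Laugh
```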

Audio Mixer features

The audio mixer allows synthetic speech to be mixed with sound files.

Only .wav files are supported.

Example 1:
This is \audio(play=<audioPath>/music.wav) a test.

Result:
“This is” will be read, then music.wav will be played, then “a test” will be read.

Example 2:
This is \audio(mix=music.wav) a test.

Result:
Speech and music.wav will be mixed and heard together.

Example 3:
This is \audio(mix=music.wav) \audio(volume=50) a test.

Result:
The volume of the audio file is set to 50% (from the start).
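
The three mixer examples can be generated with two small helpers. The Python sketch below is an illustration under stated assumptions: `play` and `mix` are hypothetical names, the leading backslash is assumed from Loquendo's control-tag syntax, and the volume is assumed to be a percentage from 0 to 100 because the example above describes 50 as 50%.

```python
def _check_wav(path):
    """Reject anything that is not a .wav file."""
    if not path.lower().endswith(".wav"):
        raise ValueError("only .wav files are supported")
    return path


def play(path):
    """Pause the speech, play the file, then resume the speech."""
    return f"\\audio(play={_check_wav(path)})"


def mix(path, volume=None):
    """Play the file together with the speech, optionally at a set volume."""
    command = f"\\audio(mix={_check_wav(path)})"
    if volume is not None:
        if not 0 <= volume <= 100:
            raise ValueError("volume is assumed to be a percentage from 0 to 100")
        command += f" \\audio(volume={volume})"
    return command


print(f"This is {mix('music.wav', volume=50)} a test.")
# prints: This is \audio(mix=music.wav) \audio(volume=50) a test.
```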

Attachments:

Link 1: CPTX file

Link 2: Published SWF file containing all the tags mentioned above

Accessibility

Captivate 5

loquendo

T2S

text to speech

TTS

Comments

wimd93656457

Aug 16, 2017

Hi Ashwan,

The links in this (old) article don’t work any more… (the 3 in the 1st paragraph)

wkr
Wim

Anonymous

Sep 24, 2013

Thanks for your help… helped me!

Anonymous

Oct 5, 2012

Hi anybody know how to get the French voice to pronounce the accents correctly? the language= doesn’t work for me.

Anonymous

May 18, 2012

The voice just reads the text of tags…anyone know why?

Anonymous

Oct 31, 2012

In reply to Anonymous's comment:

In Captivate V5, these tags only worked with the voice Simon. In Captivate V6, Simon doesn’t seem to exist and these tags don’t seem to work at all.

Anonymous

Feb 3, 2012

Thank you for sharing these. I can get the laugh and cry and some others to work but pause won’t work. I’m working with the Loquendo voice Simon and Cap 5.5. I used pause=50. Anything I’m doing wrong?

Anonymous

Sep 22, 2011

Adobe Captivate is a Trojan. If you install the Loquendo TTS engine for Captivate, guess what? You cannot remove it anymore. This shows how badly Adobe codes their software. Try to remove the TTS engine: they give you manuals, videos and all the fancy stuff for installing but not for removing. If you happen to uninstall Captivate, it will leave the Loquendo TTS engine on the system with no way at all to remove it, short of reinstalling your whole OS. There is a specific reason why I need to remove the TTS that came with Adobe: I need to install my own version and I can’t, because it says the TTS engine is already installed. Adobe, you are terrible. How in the world do you infect a computer system this way? You should allow people to remove every single file they installed using your software if they want to.

Anonymous

Dec 2, 2015

In reply to Anonymous's comment:

Did you ever get a reply?

Anonymous

Jul 13, 2011

Hi Ashwin
I’m using Captivate 5 with some other voices from Loquendo but I can’t make these tags work! Please help me!
Thanks,
Diana

Anonymous

Apr 7, 2011

Hi Ashwin
I’m trying to open the attached SWF file in the latest Adobe Media Player but it is not opening. Could you please help me?

Ashwin Bharghav

Apr 7, 2011

In reply to Anonymous's comment:

Hi booncoon,
The SWF plays fine for me in Media Player. Can you try playing it in a browser or the standalone Flash Player?
Thanks,
Ashwin

Anonymous

Mar 31, 2011

Thanks for the wonderful guide and demo file. I have a question, though. Is there a way to hide the tags from the closed captions? I can edit them out after generating audio, but that seems like a less than ideal way to handle it.

Ashwin Bharghav

Mar 31, 2011

In reply to Anonymous's comment:

Hi James,
Thanks for reading the blog. There is no restriction on the number of slide notes you can use in a slide. So you can have one slide note with tags, marked for TTS and another slide note without tags, marked for Closed captions. Try it out and let me know.
Thanks,
Ashwin

Anonymous

Apr 5, 2011

In reply to Ashwin Bharghav's comment:

Hi Ashwin,
Thank you very much for the support. Your advice was just what I needed. I’ve gone back through my presentation and created separate TTS and CC notes whenever I need to use markup or phonetic spellings. It works beautifully. I really appreciate your help.
Sincerely,
James

Anonymous

Mar 4, 2011

This is a list of paralinguistic sounds for Simon that I confirmed work in Captivate 5. Please respond if you find any more. I listed the original item syntax and indicated how many extra versions there are (example: “item=laugh (0-5)” means Simon can do: item=laugh, item=laugh_01, item=laugh_02, item=laugh_03, item=laugh_04, item=laugh_05.

item=cry (0-1)
item=laugh (0-5)
item=Sneeze (0-2)
item=Yawn (0-1)
item=Kiss (0-3)
item=Cough (0-2)
item=Ouch (0-0)
item=Sigh (0-1)
item=Snore (0-1)
item=Whistle (0-5)
item=Aha (0-0)
item=throat (0-3)
item=Mmm (0-1)
item=Eh (0-0)
item=Sniff (0-2)
item=Swallow (0-0)
item=Oh (0-2)

Anonymous

Jan 24, 2011

Thanks for a great blog, Ashwin. I would really like to hear it at work, but the SWF file acts funny when I try to run it. It flashes and has a mechanical sound that keeps playing in a loop, and the buttons slightly overlap each other and don’t work. I have never had trouble playing SWF files before. Do you have any idea what could be causing this? I can’t use the .cptx file because I have Captivate 4. I was wondering if you have the option to save it as a Captivate 4 file and send that one to me so I could publish it to SWF and see if that would run?

Thanks so much and I really appreciate your thorough writing.
Lody

Allen_Partridge

Jan 25, 2011

In reply to Anonymous's comment:

Lody, you need the latest version of Flash Player to see Captivate 5 content. Those artifacts at your work station suggest the machine needs a Flash update. I recommend you contact your IT as in general running older Flash players is not recommended.

Anonymous

Oct 14, 2010

After more exploring, I noticed that if I use Stefan, Simon, or Juliett it works as advertised. If I use Paul or Kate, they pronounce the codes instead of applying them. Something that might be an issue is that I had a demo copy of Captivate V4 installed. I just bought Captivate V5. I know those two voices came with Captivate V4.

Allen_Partridge

Oct 15, 2010

In reply to Anonymous's comment:

Anthony,

This is because the codes are for the Loquendo voices, not the Neo-Speech voices. I don’t believe that the NeoSpeech voices use these codes. You might search forums & the blog to see if there are similar triggers for Kate & Paul.

Anonymous

Oct 13, 2010

Looks like I have to use the tags on every line. I want to change the whole system. Also, I don’t know how to get this to work. Your example works fine, but when I type in my own tags, the tags get read aloud instead of being applied. The project was imported from PowerPoint, if that matters.

Anonymous

Sep 16, 2010

[…] emphasis, choice of voice and a good deal more can be influenced. More information in the article “Transforming the narration of text using Loquendo Tags” in […]

Anonymous

Aug 11, 2010

How can I change the speed of Loquendo TTS without using tags, by modifying the system files instead?

Anonymous

Aug 10, 2010

Great article! I am looking forward to your next one–specifically would like to know how to edit the emphasis on a particular word. Thanks for posting this excellent information.
