Through my previous blog series on text-to-speech, you have learnt about:
- The different text-to-speech voices in Adobe Captivate 5 (NeoSpeech and Loquendo)
- Tweaking the pronunciation of text generated using NeoSpeech voices
- Using VTML tags to change speed, pitch, and volume of the NeoSpeech voices.
In this blog post, let us learn how to tweak the speech generated using Loquendo voices.
Loquendo allows you to control how the text will be read out by the voices, such as the language in which the text will be read, the voice to be used, speaking rate, loudness, the interpretation of numbers, the stress prominence of a word and its pronunciation.
You can specify these aspects by:
- Setting parameters in the system configuration files; or by
- Inserting commands directly into the input text (in slide notes)
We will explore the second approach, which is to tweak the speech by inserting commands directly into slide notes. The commands are grouped as follows:
- Global controls
- Prosodic controls
- Sound effects
- Text interpretation
- Special events
- Audio mixer features
Each group is described later in this post.
I have used these controls in an Adobe Captivate project (.cptx and SWF file) and attached it here for your reference. To see how the tags are used, open the .cptx file and select Audio > Speech Management. To hear how the tags change the narration, open the SWF file, plug in your headphones, and listen.
Do try out these tags and let us know your experience.
Stay tuned for my next blog post on “How to change the pronunciations of words used to generate speech”.
The family of Global Controls includes commands that change the value of some of the Reading Parameters of Loquendo TTS, which affect the quality of the output speech:
- Voice and Language
- Prosodic aspects of the voice (speaking rate, volume, voice pitch and timbre)
- Sound effects
- Text interpretation.
Voice Control: forces a voice switch between voices.
voice=Simon hello. voice=Stefan hi.
(“hello” is read by the voice “Simon”, then “hi” is read by the voice “Stefan”).
Language Control: forces a language switch between languages. The mnemonic must be the name of an installed language.
language=English Paris language=French Paris.
(the first occurrence of the word Paris will be pronounced: p”}rIs , and the second: paR”i).
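If you assemble slide notes from a script rather than typing them by hand, a small helper can keep the voice switches consistent. The following is only a sketch: the function name `voiced_script` is hypothetical, and the command spelling simply follows the examples as they appear in this post, so check the Loquendo documentation for the exact form your installation expects.

```python
def voiced_script(turns):
    """Join (voice, text) pairs into one slide-note string.

    Each turn is prefixed with a voice=<name> command, spelled the way
    the examples in this post spell it (an assumption, not a verified API).
    """
    parts = []
    for voice, text in turns:
        # A voice name containing spaces would be split into separate words.
        if not voice or " " in voice:
            raise ValueError(f"voice name must be a single word: {voice!r}")
        parts.append(f"voice={voice} {text.strip()}")
    return " ".join(parts)


note = voiced_script([("Simon", "hello."), ("Stefan", "hi.")])
print(note)  # voice=Simon hello. voice=Stefan hi.
```

The resulting string can be pasted into the slide notes as usual.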
The following commands allow the quality of the output voice to be controlled by modifying its rhythm, intonation, volume and timbre. The output speech is modified from the word following the command, up until the end of the prompt.
Speed Control: Allows the speaking rate to be modified, expressed in an abstract scale 0-100.
speed=60 (Scale 0-100)
speed=60 This text is read at a faster speed.
Pitch Control: allows the fundamental frequency (tone or pitch) to be modified, expressed in an abstract scale 0-100.
pitch=60 (Scale 0-100)
pitch=60 This text is read at a pitch value of 60.
Volume Control: allows the volume (loudness) to be modified, expressed in an abstract scale 0-100 or in decibels (dB).
volume=60 (Scale 0-100)
volume=60 This text is read at a volume value of 60.
Timbre Control: allows the voice timbre to be modified by a shift in frequency, expressed in an abstract scale 0-100.
timbre=60 (Scale 0-100)
timbre=60 This text is read at a timbre value of 60.
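Because all four prosodic controls share the same abstract 0-100 scale, it is easy to type a value that is out of range. Here is a minimal sketch of a helper that checks the values before building the text. The name `prosody` is hypothetical, and the command spelling follows the examples shown in this post.

```python
PROSODIC_CONTROLS = ("speed", "pitch", "volume", "timbre")


def prosody(text, **settings):
    """Prefix text with prosodic commands, checking the 0-100 scale."""
    unknown = set(settings) - set(PROSODIC_CONTROLS)
    if unknown:
        raise ValueError(f"unknown controls: {sorted(unknown)}")
    commands = []
    # Emit commands in a fixed order so the output is predictable.
    for name in PROSODIC_CONTROLS:
        if name in settings:
            value = settings[name]
            if not isinstance(value, int) or not 0 <= value <= 100:
                raise ValueError(f"{name} must be an integer from 0 to 100")
            commands.append(f"{name}={value}")
    return " ".join(commands + [text])


print(prosody("This text is read faster and higher.", speed=60, pitch=70))
# speed=60 pitch=70 This text is read faster and higher.
```

Remember that a setting stays in effect until the end of the prompt, so a later sentence needs its own command if it should sound different.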
The following commands create certain sound effects by acting on acoustic parameters of the speech output signal. For example, reverb gives the impression of a large hall or a church, while delay (or echo) repeats the audio signal at an ever-diminishing volume.
Reverb Effect: Creates reverberations with an intensity of <gain> and a delay of <delay> milliseconds.
reverb=80,500 (0<gain<100, 0<delay<2000)
reverb=0,0 (removes the reverb effect)
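To apply reverb to a single passage, you can switch it on before the passage and remove it afterwards with reverb=0,0. The sketch below wraps a piece of text that way and checks the ranges given above. The function name `with_reverb` is hypothetical, and the command spelling follows the examples in this post.

```python
def with_reverb(text, gain, delay_ms):
    """Wrap text in a reverb command and remove the effect afterwards."""
    # Ranges taken from the note above: 0<gain<100, 0<delay<2000.
    if not 0 < gain < 100:
        raise ValueError("gain must be between 0 and 100 (exclusive)")
    if not 0 < delay_ms < 2000:
        raise ValueError("delay must be between 0 and 2000 ms (exclusive)")
    return f"reverb={gain},{delay_ms} {text} reverb=0,0"


print(with_reverb("Welcome to the cathedral.", 80, 500))
# reverb=80,500 Welcome to the cathedral. reverb=0,0
```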
Robot Control: Applies the ‘robotization’ effect to the voice currently active in the system. There are 9 robots available: Robby, Gort, Twiki, Torg, Tobor, Ash, Hector, Max and Lynjx.
robot (removes the robotization effect)
Whisper effect: Applies the whisper effect to the voice currently active in the system. The possible values are: on, off.
whisper=on (the effect is active)
whisper=off (the effect is not active)
The following commands control certain general aspects of text interpretation. Here, I describe how to adjust them synchronously with the text by means of a User Control embedded in the text.
The general syntax for these User Controls is the following:
@<key>=<value>
where <key> is the name of the Reading Parameter to be changed and <value> is its chosen value.
@TextEncoding=utf8 (interprets all the text as UTF-8 encoded text, so characters like Ä/ä, Ö/ö, and Ü/ü are read properly)
Inserts a pause (silence) in the absence of punctuation marks. The effect is not applied if punctuation is already present in the text.
| Command | Effect |
|---|---|
| pause | inserts a medium-length pause (120 ms), preceded by a ‘comma intonation’ |
| pause, | inserts a medium-length pause (120 ms), preceded by a ‘comma intonation’ |
| pause. | inserts a long pause (500 ms), preceded by a ‘conclusive intonation’ |
| pause? | inserts a long pause (500 ms), preceded by a ‘question intonation’ |
Here pause is a comma pause. (inserts a 120ms ‘comma intonation’ pause between “Here” and “is”)
Here pause, is a comma pause. (leaves unaltered the ‘comma pause’ between “Here” and “is”)
Here pause. is a conclusive pause. (inserts a 500ms ‘conclusion’ pause between “Here” and “is”)
Here pause? is a question pause. (inserts a 500ms ‘question’ pause between “Here” and “is”)
When followed by a punctuation mark, forces the duration of the corresponding pause to <num> milliseconds. In the absence of punctuation, inserts a ‘comma intonation’ pause of <num> milliseconds.
pause=<num> sets the duration of the following pause to <num> milliseconds
This pause=10 , is a comma pause. (reduces to 10ms the following ‘comma intonation’ pause)
This pause=10, is a comma pause. (reduces to 10ms the following ‘comma intonation’ pause)
This pause=10 is a comma pause. (inserts a 10ms ‘comma intonation’ pause)
No final pause pause=0. (reduces the final silence to a minimum duration, while keeping the conclusive intonation)
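The pause variants above can be summarized in a few lines of code. This is a sketch only: the names `pause` and `pause_duration` are hypothetical, the default durations are the ones quoted in this post, and the command spelling follows the examples shown here.

```python
# Default pause lengths in milliseconds, as listed in this post.
DEFAULT_PAUSE_MS = {"": 120, ",": 120, ".": 500, "?": 500}


def pause(punctuation="", ms=None):
    """Build a pause command, optionally with an explicit duration."""
    if punctuation not in DEFAULT_PAUSE_MS:
        raise ValueError("punctuation must be '', ',', '.' or '?'")
    if ms is None:
        return f"pause{punctuation}"
    if ms < 0:
        raise ValueError("duration cannot be negative")
    return f"pause={ms}{punctuation}"


def pause_duration(punctuation="", ms=None):
    """Return the pause length in ms that the command asks for."""
    return DEFAULT_PAUSE_MS[punctuation] if ms is None else ms


print(f"Here {pause('.')} is a conclusive pause.")
# Here pause. is a conclusive pause.
print(f"No final pause {pause('.', 0)}")
# No final pause pause=0.
```

Note that pause=0 asks for a pause of zero milliseconds, but as described above, the engine still keeps a minimum final silence.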
These commands trigger particular actions at the moment when the synthesis output reaches the exact point in the text where they have been inserted.
Plays one of the paralinguistic sounds recorded for the voice in use. For most voices, at least the following sounds are available: Cough, Cry, Eh, Kiss, Laugh, Mmm, Oh, Sniff, Swallow, Throat, Whistle, and Yawn.
The audio mixer allows synthetic speech to be mixed with sound files.
Only ".wav" files are supported.
This is audio(play=<audioPath>/music.wav) a test.
“This is” will be read, then the music.wav will be played, then “a test” will be read.
This is audio(mix=music.wav) a test.
Speech and music.wav will be mixed and heard together.
This is audio(mix=music.wav) audio(volume=50) a test.
The volume of the audio file is set to 50% (from the start).
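Since only .wav files are supported, it can help to check the file name before building the audio commands. The following sketch does that. The function names `play` and `mix` are hypothetical, and the command spelling follows the examples in this post.

```python
def _require_wav(path):
    """Reject anything that is not a .wav file name."""
    if not path.lower().endswith(".wav"):
        raise ValueError(f"only .wav files are supported: {path!r}")


def play(path):
    """Command that plays a sound file between two pieces of speech."""
    _require_wav(path)
    return f"audio(play={path})"


def mix(path, volume=None):
    """Command that mixes a sound file with the speech that follows."""
    _require_wav(path)
    commands = [f"audio(mix={path})"]
    if volume is not None:
        # The volume value is passed through as given (50 means 50%).
        commands.append(f"audio(volume={volume})")
    return " ".join(commands)


print(f"This is {mix('music.wav', 50)} a test.")
# This is audio(mix=music.wav) audio(volume=50) a test.
```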
Link1 : CPTX File
Link2 : Published SWF file containing all the tags mentioned above