
12.3. Text-to-Speech Interface

The Text-to-Speech Interface (TTSI) is used to generate synthetic speech from textual input, supplied either as plain text or as phonemes. Within this framework, speech can be transmitted at very low bit rates (200 bit/s to 1.2 kbit/s). Because speech synthesis is useful in many kinds of multimedia applications, MPEG-4 defines flexible means of using it in different situations. In addition to the plain text, MPEG-4 TTSI allows the following information to be supplied:

  • Speaker-related information (speech rate, age, and gender of the speaker);

  • Prosody (e.g., time-dependent variation of pitch);

  • Language code, plus lip-shape information when used for video dubbing; and

  • Face animation-related parameters when used in synchronization with an animated face.
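To make the list above concrete, the following is a minimal sketch of how a TTSI-style input message might be modeled as a data structure: mandatory text (or phoneme) content plus the optional extras. All class, field, and function names here are hypothetical and chosen for illustration only; the actual MPEG-4 TTSI bitstream syntax (ISO/IEC 14496-3) defines its own fields and binary encoding. The helper `transmission_seconds` simply applies time = bits / bit rate to show what the quoted 200 bit/s to 1.2 kbit/s range implies for a given payload size.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical model of a TTSI-style message; NOT the normative MPEG-4 syntax.
@dataclass
class SpeakerInfo:
    speech_rate: Optional[float] = None  # assumed unit: words per minute
    age: Optional[int] = None
    gender: Optional[str] = None


@dataclass
class TTSInput:
    text: str                         # plain text, or phoneme string if is_phoneme
    is_phoneme: bool = False
    speaker: Optional[SpeakerInfo] = None
    # Prosody as (time in seconds, pitch in Hz) pairs: a time-varying pitch contour
    pitch_contour: List[Tuple[float, float]] = field(default_factory=list)
    language_code: Optional[str] = None
    lip_shapes: List[str] = field(default_factory=list)        # for video dubbing
    face_animation_params: List[int] = field(default_factory=list)  # for a synced face

    def present_extras(self) -> List[str]:
        """Return the names of the optional extras supplied alongside the text."""
        extras = []
        if self.speaker is not None:
            extras.append("speaker")
        if self.pitch_contour:
            extras.append("prosody")
        if self.language_code is not None:
            extras.append("language_code")
        if self.lip_shapes:
            extras.append("lip_shapes")
        if self.face_animation_params:
            extras.append("face_animation")
        return extras


def transmission_seconds(payload_bytes: int, bitrate_bps: float) -> float:
    """Time needed to send a payload at a constant bit rate (time = bits / rate)."""
    return payload_bytes * 8 / bitrate_bps


if __name__ == "__main__":
    msg = TTSInput("Hallo Welt", language_code="de", speaker=SpeakerInfo(age=30))
    print(msg.present_extras())
    # A 100-byte message at the low and high ends of the quoted range
    print(transmission_seconds(100, 200), transmission_seconds(100, 1200))
```

Keeping every extra optional mirrors the text: a plain-text message stays minimal, and richer information (prosody, lip shapes, face parameters) is added only when an application needs it.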

