Note: This feature is only available on the following devices: Roku Streaming Stick (3600X), Roku Express (3700X) and Express+ (3710X), Roku Premiere (4620X) and Premiere+ (4630X), Roku Ultra (4640X), and any Roku TV running Roku OS version 7.2 and later.


Text to Speech Components

Components available since firmware version 7.2

Text to speech (TTS) lets you provide an audible, spoken version of the strings shown to the user in your application. On platforms that must comply with the Twenty-First Century Communications and Video Accessibility Act of 2010 (CVAA), which the FCC enforces, this capability can be used as part of CVAA compliance. The current text to speech library, flite_tts, is built into the firmware image. Roku text to speech supports different languages, voices, speech rates, speech volumes, and other aspects of speech output. Roku provides text to speech support in the following components, interfaces, and events:

Components available since firmware version 7.5

Audio Guide Behavior for SceneGraph Nodes

Audio Guide behavior for built-in SceneGraph panels and scenes:

 MEDIA speech is spoken in the following order:

 There is no additional speech for the following nodes (they will behave the same as RenderableNode):

Audio Guide Support for BrightScript Components

Implementation Tips

TTS Interruptions

Many channel UI elements have default TTS behavior. Speech triggered by these built-in implementations can sometimes interrupt your own TTS output. Keep track of the IDs of your TTS utterances, as returned by say() and silence(), and handle interruptions accordingly.
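The bookkeeping described above can be sketched in a language-neutral way. The following Python sketch is illustrative only and does not use the Roku API: the `UtteranceTracker` class and its method names are hypothetical, and it assumes your channel can tell from a TTS event whether an utterance finished normally or was cut off. It records the IDs returned by say() and silence(), and reports which of your own texts were lost when one of them is interrupted.

```python
class UtteranceTracker:
    """Hypothetical bookkeeping for utterance IDs issued by this channel."""

    def __init__(self):
        # Maps utterance ID -> spoken text (None for a silence), in issue order.
        self._pending = {}

    def record(self, utterance_id, text=None):
        """Remember an ID returned by say() (with text) or silence() (no text)."""
        self._pending[utterance_id] = text

    def owns(self, utterance_id):
        """True if the ID belongs to an utterance this channel still expects."""
        return utterance_id in self._pending

    def finished(self, utterance_id):
        """Forget an utterance that completed normally."""
        self._pending.pop(utterance_id, None)

    def interrupted(self, utterance_id):
        """Return the texts lost when one of our utterances is cut off.

        Foreign IDs (speech started by someone else) yield an empty list.
        The interrupted utterance and everything queued after it are
        dropped from the tracker so the caller can decide what to re-speak.
        """
        if utterance_id not in self._pending:
            return []
        ids = list(self._pending)
        lost_ids = ids[ids.index(utterance_id):]
        texts = [self._pending.pop(i) for i in lost_ids]
        return [t for t in texts if t is not None]
```

A channel could call `record()` right after each say() or silence() call, `finished()` when an utterance completes, and `interrupted()` when it is cut off, then decide whether re-speaking the returned texts still makes sense for the current screen.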

Other TTS Implementation Changes

Other TTS implementations may change the current voice, the current language, the current volume, the current pitch, and/or the current speech rate. You should keep track of how these parameters might change.

Long Text Delays

A long text string may incur a noticeable delay before TTS starts speaking, at least for the first utterance of the string. For long strings, break up the text so that the first utterance is a reasonably short sentence, followed by longer sentences as needed. Do not break the string into individual words: doing so degrades phrasing without noticeably reducing the perceived delay.
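
One way to apply this advice is to split the text at sentence boundaries before handing it to TTS. The Python sketch below is a minimal illustration, not part of the Roku API: the function name `split_for_tts` and the 80-character limit are assumptions chosen for the example. It keeps the first utterance short and falls back to a clause boundary (comma, semicolon, or colon) only when the first sentence is too long. It never splits into individual words.

```python
import re

# Hypothetical limit for the first utterance; tune it for your own content.
FIRST_UTTERANCE_MAX_CHARS = 80


def split_for_tts(text, first_max=FIRST_UTTERANCE_MAX_CHARS):
    """Split text into utterances, keeping the first one short.

    Splits only at sentence boundaries (and, for an overlong first
    sentence, at one clause boundary), never into individual words.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []

    first, rest = sentences[0], sentences[1:]
    if len(first) > first_max:
        # Find the last clause break (',', ';' or ':') within the limit.
        cut = max(first.rfind(mark, 0, first_max) for mark in ",;:")
        if cut > 0:
            # Keep the punctuation with the first part; queue the remainder.
            rest.insert(0, first[cut + 1:].strip())
            first = first[:cut + 1]
    return [first] + rest
```

Each returned string would then be spoken in order as its own utterance, so speech can start after the short first sentence while the longer ones follow.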