Android’s Expressive Captions uses AI to bring emotion to captions
Captions were first popularized in the 1970s to help the millions of people who are d/Deaf and hard of hearing consume TV content. Now, 70% of Gen Z uses captions most of the time — whether they're watching videos on the subway or in a loud public space, or simply want to better understand what's being said. In some cases, the lack of pre-loaded captions for livestreams, social content or videos from friends or family can make that content inaccessible. But in general, the way captions are presented hasn’t changed much in the past 50 years, which means nuances of language and sound — including emphasis, tone and personality — are often lost.
Today, we want to change that with the introduction of Expressive Captions on Android, a new feature within Live Caption that tells you not only what someone says, but how they say it. It’s a meaningful addition to our suite of captioning products, which includes Live Transcribe, Sound Notifications and more. Because even if you can’t hear it, you should still be able to feel it.
Bringing feelings to captions
Expressive Captions bring more intensity and emotion to your captions
Expressive Captions uses AI on your Android device to convey things like tone, volume, environmental cues and human sounds. These small things make a huge difference in communicating what goes beyond words, especially for live and social content that doesn’t have pre-loaded or high-quality captions.
- ALL CAPS: Captions will now reflect the intensity of speech with capitalization, so you’ll know when a friend excitedly wishes you a “HAPPY BIRTHDAY!”
- Vocal bursts: You'll see even more sounds identified, like sighing, grunting and gasping, giving you essential expressions of tone.
- Ambient sound: We’ll label additional noises in the foreground and background, like applause and cheers, to give you a fuller picture of what’s happening in the environment.
Expressive Captions uses three features to give you the context often missing from captions
Expressive Captions are part of Live Caption, so they’re built into the operating system and available across apps on your phone. This means you can use Expressive Captions with most things you watch, like livestreams on social platforms, memories in your Google Photos reel and video messages from friends and family. When enabled, captions are generated in real time and on device, so you can use them even while your phone is in airplane mode.
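For developers wondering how system-level captioning surfaces to apps: Google hasn’t published a public API for Live Caption or Expressive Captions, but apps can read the user’s system-wide caption preferences through Android’s CaptioningManager. The Kotlin sketch below is purely illustrative of that settings API, not of Expressive Captions itself.

```kotlin
import android.content.Context
import android.view.accessibility.CaptioningManager

// Minimal sketch: reading the system-wide caption preferences.
// Live Caption / Expressive Captions are system features with no public
// developer API, so this only shows how an app can respect the user's
// caption settings rather than control the feature itself.
fun logCaptionPreferences(context: Context) {
    val captioningManager =
        context.getSystemService(Context.CAPTIONING_SERVICE) as CaptioningManager

    // Whether the user has turned on captioning preferences in system settings.
    val enabled = captioningManager.isEnabled

    // The user's preferred caption locale and text scale, if set.
    val locale = captioningManager.locale
    val fontScale = captioningManager.fontScale

    println("Captions enabled: $enabled, locale: $locale, font scale: $fontScale")
}
```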
Bringing Expressive Captions to life
To build Expressive Captions, our Android and Google DeepMind teams worked to understand how we engage with content on our devices without sound. Using multiple AI models, Expressive Captions not only captures spoken words but also translates them into stylized captions, and labels an even wider range of background sounds — making captions feel much closer to the experience of hearing the audio itself. It’s just one way we’re building for the real lived experiences of people with disabilities and using AI to build for everyone.
Starting today, Expressive Captions is available in the U.S. in English on any Android device running Android 14 or above that has Live Caption. This is part of our ongoing work to bring more emotional expression and context to captions.