How To Drive Adoption of Voice Experiences

How can designers let users know all that is possible in a voice experience without overwhelming the user?

Bryan Sebesta
UX/UI Designer

In Hooked: How to Build Habit-Forming Products, Nir Eyal describes the practices brands use to build products that people return to again and again. First, a trigger, like a notification, is sent; then an action is taken (e.g., tapping the notification); next, a reward is provided, like learning about a breaking news story, seeing new “likes,” or getting a message; and then, often, an investment is made. For example, after uploading so many photos to Google Photos or iCloud, or buying so many Kindle books or videos from Amazon, you’re less likely to leave that service.

This is a helpful paradigm for thinking about the challenges that voice apps face. Without screens and icons, how do voice apps cue the user on smart speakers? And how do businesses and brands leverage situations and contexts that might provide a cue to try voice, such as hands-free situations?

When it comes to actions, voice interfaces are often billed as “intuitive.” In theory, yes; but voice apps are not Jarvis or C-3PO, intelligent assistants who can understand any command; they’re more limited than that. And yet, the more human an interaction feels (and conversation is among the most human), the more people’s expectations inflate. How do designers let users know all that is possible without overwhelming them? And how do they prepare for everything a user might say? How do companies like Amazon and Google make discovering and enabling their voice apps easier?

When it comes to rewards, what kind of incentives will bring users back? Is the experience immersive, with high production values? What kind of voice products make the most sense on an (often) audio-only platform? And what kind of investments can users make on a voice-only platform? What can “lock users in,” in a positive sense?

So, how can we design voice experiences to combat these challenges?

  • Cooperate with the User and their Memory. The main principle for voice design is cooperate with the user. For our purposes, this means: be brief, don’t overwhelm people, and work with the constraints of people’s working memory. When a user first opens a skill, give a few suggestions. As they continue to use the action or skill, you can taper, meaning you provide fewer cues (or different cues that surface different features). Other strategies, like providing overviews and keeping any list of options to fewer than four, can also be helpful. When you’re writing out utterances, the phrases that allow your voice app to capture what users are saying, be robust: include as many phrasings as you can. And of course, when all this is done, test, test, test with real people. All of these suggestions are ways to cooperate with memory and make the experience great.
  • Leverage Cues. Take any opportunity to remind the user what features your voice app has. If an individual is using a smart speaker with a screen or a phone app, show suggestion chips that clue users in on what they can say. If a voice app exists within a larger product ecosystem, include example phrases people can say within the other products, or on your website, newsletter, or packaging (if that applies).
  • Create Real Value. Make something that really creates value for customers. This doesn’t mean it has to be complex or involved; the ability to ask “When is my next appointment?” and hear a short response can be exactly what is needed. Voice apps do not need a massive range of use cases and abilities; often, that only makes every feature harder to remember. Additionally, if the voice experience is tied into a larger ecosystem, you can leverage opportunities there for reminders and onboarding to the voice experience’s value. For example, if a budgeting app has a voice companion, then when a user logs on to check their balance, the app can offer a tooltip or popup suggesting an utterance to try in the voice experience.
  • Avoid Mistakes by Talking Out Loud with Other People. This helps you catch the rhythm of conversation at the earliest stages and keeps you from “writing for screens,” which is what most of us are trained to do. It also keeps you from saying too much at once. Talk out loud several times before committing anything to type.
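The tapering strategy described above can start as very simple session-count logic. The sketch below is illustrative only, not any platform's API: the session counter, suggestion lists, and thresholds are all hypothetical choices a team would tune with testing.

```python
def opening_prompt(session_count: int) -> str:
    """Pick the suggestions to read aloud, tapering as the user returns.

    New users hear a three-item overview; familiar users hear fewer cues;
    long-time users hear a single rotating cue that surfaces a
    less-discovered feature. All content names here are hypothetical.
    """
    core = ["do today's meditation", "try a sleep exercise", "unlock more meditations"]
    discovery = ["ask for soothing music", "start a focus pack"]

    if session_count < 3:
        options = core            # new user: full overview, still under four items
    elif session_count < 10:
        options = core[:2]        # familiar user: fewer cues
    else:
        # veteran user: one rotating cue suggesting a different feature each time
        options = [discovery[session_count % len(discovery)]]

    if len(options) == 1:
        return f"Welcome back. You can {options[0]}. What would you like to do?"
    listed = ", ".join(options[:-1]) + f", or {options[-1]}"
    return f"Welcome back. You can {listed}. What would you like to do?"
```

Note that even the "new user" branch keeps the list under four options, in line with the working-memory guidance above.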

Let’s review some examples that illustrate these concepts.

Example 1. Amazon advertises effectively on its website with small blue bubbles containing italic text, which Amazon’s customers have come to associate with Alexa speech bubbles.

Example 2. Both the Google Assistant app and “Hub” devices allow you to provide suggestions in the form of “suggestion chips.” (Alexa allows for something similar.) These provide cues for how the conversation might proceed and clue the user into relevant features at each turn.
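A turn that pairs a spoken prompt with suggestion chips might be assembled roughly as follows. This is a plain-dict sketch of the general shape only; the keys are made up for illustration, and real platforms (Dialogflow, the Alexa Skills Kit, etc.) each define their own response schemas.

```python
def turn_response(prompt: str, chips: list[str]) -> dict:
    """Build one conversational turn: spoken prompt plus on-screen chips.

    The dict shape is hypothetical, not any real SDK's schema. Chips are
    capped at three because they are cues, not a full menu.
    """
    return {
        "speech": prompt,
        "suggestions": [{"title": c} for c in chips[:3]],  # cap at three cues
    }

# One turn from the Headspace-style dialogue below, with a fourth chip dropped:
response = turn_response(
    "You can meditate or get ready for bed. What would you like to do?",
    ["Meditate", "Sleep", "Soothing music", "Focus pack"],
)
```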

Example 3. Headspace provides a page explaining Alexa on their website. Along with pages like this, example utterances can be occasionally woven into the main app, website, or other product. For example, while listening to a meditation, a few utterances might be shown indicating how to access that same meditation from the Alexa skill.

Example 4. Headspace has hundreds of meditations. Deciding what to suggest or offer within their voice experience at any one point is an exercise in restraint. Which suggestions are most relevant? How much needs to be said?

New User

In this example, a new user with access to only a few meditations logs in. We suggest three possible ways forward.

User: Talk to Headspace
HS: Welcome back to Headspace. You can do today’s meditation, a sleep exercise, or unlock more meditations. What would you like to do?
User: Today’s meditation
HS: <Plays meditation>

Paid Subscriber (Version 1)

In this example, a long-time paid subscriber, who has access to the entire Headspace library and is familiar with certain terms, opens the action. They navigate a two-tiered menu.

User: Talk to Headspace
HS: Welcome back to Headspace. You can meditate or get ready for bed. What would you like to do?
User: Sleep
HS: You can try a sleepcast, do a sleep exercise, or listen to soothing music. What would you like to do?
User: A sleep exercise.
HS: <Plays meditation>

Paid Subscriber (Version 2)

In this example, the decision is made to present just one option, perhaps based on the user’s regular behavior and the time of day. Machine learning and personalization allow us to learn the best suggestion over time.

User: Talk to Headspace
HS: Welcome back. Would you like to continue with the “Focus” pack today?
User: Yes
HS: <Plays meditation>
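Before reaching for machine learning, the single-suggestion approach in Version 2 could start as a simple heuristic over past behavior. The sketch below is hypothetical: the usage log format, the hour-weighting scheme, and the content names are all assumptions for illustration, not Headspace's actual method.

```python
from collections import Counter
from datetime import datetime

def best_suggestion(history: list[tuple[int, str]], now: datetime) -> str:
    """Pick the one opening suggestion from a (hypothetical) usage log.

    `history` holds (hour_of_day, content_name) pairs from past sessions.
    Sessions that happened near the current hour are weighted more heavily,
    and the highest-scoring content wins.
    """
    scores: Counter = Counter()
    for hour, content in history:
        # Circular distance between hours, so 23:00 is close to 01:00.
        distance = min(abs(hour - now.hour), 24 - abs(hour - now.hour))
        scores[content] += 1.0 / (1 + distance)  # closer hours count more
    if not scores:
        return "today's meditation"              # cold-start fallback
    return scores.most_common(1)[0][0]
```

A real system would fold in many more signals, but even this kind of heuristic lets the opening prompt shrink to a single relevant question.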

Portions of this content originally appeared as part of SoundHound’s Finding Your Brand Voice: 6 Ways to Build a Better VUI Guide
