Last year, the not-so-surprising top gift of the holiday season was the smart speaker. This huge burst in device purchases was thanks in large part to Amazon's and Google's aggressive discounting of their entry-level devices (the Echo Dot and Home Mini, respectively).
This year, tech companies are gearing up to do it again, but this time there's a new type of device in town: the multimodal voice assistant. What is a multimodal device? It's a smart device that includes not only a speaker but a screen as well, allowing for multiple modalities of user experience.
Studies suggest that 79% of smart speaker sales occur during the holiday season, and that by the end of the 2018 season, almost half of U.S. consumers will own a smart speaker.
And not only will more homes have a speaker: 45% of existing smart speaker owners intend to purchase a second device.
If holiday deals so far are any indication, the big players are planning to use their discounting power again this holiday season for these new screen-equipped devices, including Amazon's Echo Show and Echo Spot, Google's Home Hub, and Facebook's Portal. As consumers buy up these new devices, they will push the marketing landscape into Voice Assistance Phase 2 or, as we call it, the Multimodal Revolution. Is your organization ready?
To help you prepare for the new multimodal reality, RAIN has compiled five areas you will need to consider to succeed with multimodality.
Keep a Voice-First Mindset
While these new devices have screens, they should not be treated like tablets or smartphones. Organizations and developers would be wise to keep a voice-first posture when creating experiences for them. Because these devices are intended to be stationary items placed in key rooms of the home, visuals are meant to convey information far more than to drive tactile interaction. That doesn't mean developers can ignore the added benefit of visuals, however. For example, developers should take advantage of built-in intents such as 'ordinal' slot values to let users navigate visual lists through verbal commands.
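To make the ordinal-navigation idea concrete, here is a minimal sketch of mapping a spoken ordinal ("first", "second", ...) to an item in an on-screen list. The function and payload shape are hypothetical simplifications, not any platform's actual SDK; real skills would resolve the ordinal from the platform's slot value instead.

```python
# Hypothetical helper: resolve a spoken ordinal word to a list item.
ORDINAL_TO_INDEX = {
    "first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4,
}

def select_from_list(items, ordinal_word):
    """Return the on-screen list item matching a spoken ordinal, or None."""
    index = ORDINAL_TO_INDEX.get(ordinal_word.strip().lower())
    if index is None or index >= len(items):
        return None  # fall back to a reprompt in a real skill
    return items[index]

recipes = ["Pasta Primavera", "Chicken Tikka", "Veggie Stir-Fry"]
print(select_from_list(recipes, "second"))  # -> Chicken Tikka
```

A user looking at a list on the screen can simply say "the second one," and the skill selects the matching item without any touch interaction.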
Know The New Rules
As you begin building multimodal experiences, you must be aware of the new design languages published by the leaders in the space. Amazon recently created the Alexa Presentation Language (APL, for short) to provide developers with guidelines and templates for communicating visually. Similarly, Google has rolled out a visual canvas and "rich responses" for its Assistant platform. Get familiar with these standards to ensure your experiences take full advantage of the new visual capabilities.
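For a sense of what an APL document looks like, here is a minimal sketch expressed as a Python dict (APL documents are JSON). The top-level fields follow APL's published document structure; the text content and styling values are purely illustrative.

```python
import json

# Minimal sketch of an Alexa Presentation Language (APL) document.
# Field names follow the APL document schema; content is illustrative.
apl_document = {
    "type": "APL",
    "version": "1.0",
    "mainTemplate": {
        "items": [
            {
                "type": "Text",
                "text": "Step 2: Add 250 ml of milk",
                "fontSize": "50dp",
                "textAlign": "center",
            }
        ]
    },
}

print(json.dumps(apl_document, indent=2))
```

The document is sent alongside the spoken response, so the screen reinforces, rather than replaces, the voice interaction.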
Embrace Your Use Cases
When considering what to create for these new multimodal devices, it's critical to remember that they will be used as background assistance, not as the focal point of interactions. Experiences for these devices must be context-driven. Some screens are smaller than you might expect, and users will likely be multitasking and looking at the screen from a distance. Accordingly, visuals need to be used and sized appropriately for the task they support. For example, a recipe experience might focus on zoomed-in video content and large text showing measurements that a home cook could read from across a kitchen counter.
Take Inventory of Your Owned Assets
One of the exciting things about smart speakers evolving to include visual displays is the opportunity for organizations to leverage existing creative assets for the new medium. Sephora, for instance, has given their YouTube video catalogue a whole new life on these devices via a skincare advice experience built in partnership with Google. Taking stock of visuals and video content you have and leveraging them in voice-first experiences can give you a head-start toward publishing a rich multimodal experience without major asset creation costs.
Update Your Measurement Strategy
With every new device comes a new set of interactions to track. With the addition of visuals, new analytics are now available to help organizations learn how users are engaging with their experiences. Whether you are tracking natively through the platform provider (e.g. Google, Amazon) or through a vendor like Dashbot (a leader in the voice-first analytics space, which has already rolled out a measurement tool for multimodal), it is key to have your measurement strategy in place from the start. In a new medium, analytics signals and the ability to track user behavior are essential for understanding what's working and where the user experience breaks down and needs changes or optimization.
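As a rough illustration of what modality-aware measurement means, here is a hypothetical, minimal event tracker that counts interactions by modality so you can compare voice versus touch engagement. In production, a platform-native or vendor tool (such as Dashbot) would replace this sketch.

```python
from collections import Counter

class InteractionTracker:
    """Toy tracker: counts interaction events per modality."""

    def __init__(self):
        self.counts = Counter()

    def track(self, event_name, modality):
        # e.g. track("list_select", "voice") or track("list_select", "touch")
        self.counts[(event_name, modality)] += 1

    def report(self):
        return dict(self.counts)

tracker = InteractionTracker()
tracker.track("list_select", "voice")
tracker.track("list_select", "touch")
tracker.track("list_select", "voice")
print(tracker.report())  # -> {('list_select', 'voice'): 2, ('list_select', 'touch'): 1}
```

Even this simple breakdown answers a question unique to multimodal devices: are users actually talking to the screen, or tapping it?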
Want to learn more about multimodal possibilities? Whether you have an existing voice experience that needs an upgrade or want to create a new one, we're here to help. Email us at firstname.lastname@example.org to start the discussion.