The State of Custom Assistants in 2022

In 2020, we predicted custom “owned” virtual assistants (OVAs) were the future of voice. In the roughly 18 months since, a number of developments have lent credence to our bullishness on custom assistants.

Eric Turkington
VP, Growth
Predictions are nothing if you don’t keep score. 

In November 2020, we at RAIN predicted that custom “owned” virtual assistants (OVAs) were the future of voice. We asserted that these assistants, built in a specific brand’s image and deployed inside its digital properties and devices, were going to unlock disproportionate value because they put control back in brands’ hands.

In the roughly 18 months since, a number of developments have lent credence to our bullishness on custom assistants. There are, admittedly, fewer examples of robust OVAs in the market today than we expected (and some good possible reasons for that), but our viewpoint on the importance and momentum of this trend has been largely validated. Here’s a summary of what we’ve seen and why it matters.

Brands Buy Voice Assistant Capabilities & Talent 

A number of acquisitions point to growing brand belief in the value of offering their customers a custom assistant. While many companies have invested in home-grown teams to build and support their voice assistants, others are snapping up voice tech companies to accelerate their product-centric voice agendas. 

Peloton’s early 2021 acquisition of Aiqudo, a mobile app-focused voice tech company, shows that the market for voice-enabled home fitness equipment is heating up. Peloton is also hiring for voice-centric engineering roles, doubling down on accessibility features, and adding subtitles to live workouts. The competitively priced Peloton Guide product announced in late 2021 also makes use of voice control for basic operations. While fitness enthusiasts are heading back to gyms and buying less at-home equipment, voice controls - especially those that deliver a personalized experience a gym bike doesn’t - might be a draw to revive some growth in the home fitness equipment market.

Consumer electronics companies previously content to leverage big tech voice offerings are developing their own assistants. Based on some late 2021 code leaks, Sonos’ $37M purchase of the privacy-centric voice assistant company Snips has seemingly progressed toward the imminent release of a dedicated assistant. The Sonos Voice Control system, once launched, is expected to be able to cohabitate on Sonos devices with Alexa, which could be used for more general queries (more on the theme of Alexa + custom assistants later). Per the leak, Google Assistant, which has been an option for Sonos device users, does not appear to be accessible alongside Sonos’ assistant. Perhaps this is to be expected, given the patent litigation between the two companies and Google’s conspicuous absence from the Amazon-initiated Voice Interoperability Initiative (of which RAIN is also a member). 

Walmart, which has launched an employee-facing voice assistant delivering product details on command (“AskSam”), is aiming to supercharge its conversational AI efforts (both voice and text) through its November 2021 acquisition of conversation design tool Botmock. A Walmart exec underscored the complexities inherent in handling something as seemingly simple as an “add milk to my cart” utterance: “...the right action and the response to the customer depends on…if the customer has bought milk in the past, what their preferred type of milk is…do they already have some type of milk in their cart, and if so, should we ask whether they want to change the quantity or let them know they already have it in their cart.” Walmart hopes that by bringing Botmock’s technology assets in-house, it can speed development times significantly on the consumer side.
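To make that branching concrete, here is a minimal Python sketch of the decision the exec describes. It is an illustration only, not Walmart’s implementation: the `Shopper` fields, the `handle_add_milk` function, and the response wording are all hypothetical, and “preferred type” is assumed to mean the milk the customer has bought most often.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shopper:
    # Hypothetical customer state; field names are illustrative assumptions.
    purchase_history: Dict[str, int] = field(default_factory=dict)  # product -> times bought
    cart: Dict[str, int] = field(default_factory=dict)              # product -> quantity


def handle_add_milk(shopper: Shopper, milk_products: List[str]) -> str:
    """Pick a response to "add milk to my cart" from cart and purchase history."""
    # 1. Already in the cart? Ask before changing anything.
    in_cart = [p for p in milk_products if shopper.cart.get(p, 0) > 0]
    if in_cart:
        return f"You already have {in_cart[0]} in your cart. Want to change the quantity?"

    # 2. Bought milk before? Assume the most frequently bought type is preferred.
    bought = [p for p in milk_products if shopper.purchase_history.get(p, 0) > 0]
    if bought:
        preferred = max(bought, key=lambda p: shopper.purchase_history[p])
        shopper.cart[preferred] = 1
        return f"Added {preferred} to your cart."

    # 3. No history to lean on: ask a clarifying question.
    return "Which kind of milk would you like?"
```

Even this toy version needs three branches and two data sources for a four-word request, which hints at why faster conversation design tooling matters.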

Social media companies are also getting into the custom assistant game. After building its initial AR lens-selection “Voice Scan” feature with Houndify, Snapchat went on to spend a reported $70M on an Israeli voice company of its own, only to then work with Houndify again to deliver automated video captioning in February 2022. The jury’s still out on how the acquired company’s conversational “virtual agent” capabilities will be leveraged by Snap, when commands and automatic speech recognition have been Snap’s focus to date.

Notable In-Market Assistants

Financial services has remained one of the hottest verticals over the last few years. Bank of America may have the most mature and well-adopted custom assistant in Erica. As of Q1 2021, Erica’s total user base grew to nearly 20 million clients, an increase of 7.3 million compared to the same quarter in 2020. Engagement shot up too: 106 million total interactions in 2021 eclipsed the 28 million the year prior.

Competitor US Bank continues to invest in its smart assistant, with substantial in-house conversation design and voice architecture teams, enabling sophisticated one-shot queries (“How much did I spend at Target last month?”) and commands (“Send Sarah twenty-five dollars to cover last night’s dinner”) that save users time and deliver a personalized experience.

The complexity of banking apps is perhaps the best evidence of how a voice assistant can add value to the otherwise menu-laden experience, allowing users to execute buried functions in seconds.

A US Bank executive noted that there are over 300 functions a user may want to access at any given time, and no easy way to make them all touch-accessible, but voice changes all of that. 

If finance underscores the Swiss Army knife attributes of a voice assistant, entertainment is the industry where one single tool - voice search - arguably has been tuned close to perfection. While Spotify’s voice search was initially confined to the search bar, in 2021 Spotify made search and other control functions possible from anywhere in the app via the “Hey Spotify” wake word. Spotify and Pandora both offer in-app native voice assistant features and have prominent presences on mainstream voice platforms like Alexa, Google Assistant, and Bixby, showing that for some services, an owned assistant inside an app can complement a big-tech-assistant-led distribution strategy.

In QSR, McDonald’s was comparatively early to test voice ordering in 2019, and is now the target of a class action suit over how voice biometrics were collected and leveraged to feed personalized ordering. While McDonald’s may end up bearing the scars of being an early innovator, the pending legal challenge has not deterred others from investing in similar technology. In late 2021, Wendy’s inked a deal with Google Cloud to power its restaurant operations, including automated voice ordering at the drive-thru, with an eye to more innovative use cases such as real-time chef assistance around things like when to flip burgers. White Castle, Lee’s Chicken, and Good Times Burgers have also embraced the voice-enabled drive-thru and the labor savings and efficiency it promises.

While we’ve been a bit surprised that there aren’t more examples of custom assistants in the market, the ones that exist tend to come from prominent, category-leading companies. And given some recent consumer data, those companies stand poised to reap the benefits of being early innovators.

Consumer Demand & Executive Buy-In

Voicebot’s February 2022 mobile assistant report contained some eye-opening findings with regard to voice inside apps. In short, it appears mobile app voice assistant demand is indexing far above supply. Among consumers who have used voice assistants inside mobile apps, ~69% are favorable toward them, compared to only ~4% who are negative. At the same time, consumer demand for voice interactive features in apps has held relatively steady at 45% over the last two years. While many major app publishers have not yet implemented in-app voice assistants, the fact that consumers are overwhelmingly happy with the experience provided by those who have leads us to expect a cycle of further development and competitive pressure to reach parity. This will drive consumers to grow more expectant of these new interaction paradigms.

A voice-friendly, multi-modal app will become the new standard bearer for UI excellence.  

On the business value side of the equation, Opus & Houndify polled 320 executives in February 2021 about the perceived value of custom assistants. 83% of executives reported that voice-enabled products or services resulted in an improved overall customer experience. North of 70% of execs also cited custom voice assistants as increasing loyalty and delivering operational efficiencies. 

These findings, paired with the big tech smart speaker sales plateau, indicate some shifting away from voice assistants as a channel and towards voice assistance as a product or service differentiator.

Advances in Tooling & Distribution

Since 2020, the tooling ecosystem for custom assistants has matured, and there are more ways to bring these products to life than ever before, which should help close the gap between consumer openness and the number of custom assistants available to use.

Voicify and Voiceflow, two of the leading voice experience design and development suites, have expanded their capabilities to support the development of a broad array of custom assistants.

Houndify’s full-stack tooling was taken public via SPAC in late 2021 at a $2B valuation. This comes as Houndify has become available in 22 languages and processes 100M queries per month across its white-label customer base. 

The startup tooling landscape, already populated by companies like Alan AI, Spokestack, and Cobalt Speech, got more crowded with Finnish speech tech firm Speechly coming out of stealth and recently being accepted into Y Combinator. Speechly’s focus, similar to Alan AI’s, is on voice-enabling existing apps and web experiences, leaning into a belief in the power of voice-as-input with highly responsive visual output.

This focus on command and control inputs raises an interesting question about the role of voice inside existing properties.

Is a voice assistant always synonymous with a conversation-having personality, or can it equally be a killer UI feature that listens and reacts really well, and eliminates the need for both conversation and tapping? 

In voice UI, there’s a continuum from conversations to commands, and Speechly has placed its bet on the command end of that spectrum, which can be great for things like casually browsing products, but less of a fit for more ambiguous tasks, like resolving customer service issues.

On the distribution side of things, Native Voice received $14M in a seed round to build new ways for custom assistants to scale beyond brand-made devices or apps. The company is working with headphone and speaker makers to make their devices compatible with invoking and interacting with an array of assistants, allowing brands to scale their cloud-based assistants on other hardware while maintaining their wake word, rather than being intermediated by Alexa or Google Assistant. 

Big Tech Enters the Fray

While there’s much startup innovation in voice tooling, might some familiar big tech tools drive a profusion of new custom assistants? 

Taking a page out of the AWS playbook of once-proprietary systems being sold to others, in early 2021 Amazon announced the Alexa Custom Assistant (ACA) program. Based on Alexa’s technological backbone and Alexa Skills Kit tools, ACA promises to deliver brands the best of both worlds - Alexa built into a brand’s device, plus the ability to use Alexa Skills Kit technology to build custom assistant functionality complete with branded wake words. This multi-assistant setup, already expressed in products from Verizon and Disney, makes heavy use of hand-offs, enabling one assistant to automatically pass a query to the other, based on whether it fits into the other’s wheelhouse. The prospect is compelling for brands that see value in having Alexa and her generalist capabilities embedded in their products but also want their customers to enjoy the convenience of talking directly to their brand to control their product or access their services. As the VII’s broad membership would indicate, interoperability is widely believed to be an important design and technical principle for the future of voice. But will some brands bristle at the notion of having Amazon (and Alexa) ride shotgun in their product in exchange for a turnkey approach to building a custom assistant?
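The hand-off idea can be sketched in a few lines of Python. This is a generic illustration of two assistants sharing one device, not the ACA or Alexa Skills Kit API: the `Assistant` class, the `dispatch` function, and the intent names are hypothetical, and the sketch assumes an upstream NLU step has already mapped the utterance to an intent.

```python
from typing import Callable, Dict, List, Tuple


class Assistant:
    """A named assistant with a fixed set of intents it knows how to handle."""

    def __init__(self, name: str, skills: Dict[str, Callable[[str], str]]):
        self.name = name
        self.skills = skills

    def can_handle(self, intent: str) -> bool:
        return intent in self.skills

    def handle(self, intent: str, utterance: str) -> str:
        return self.skills[intent](utterance)


def dispatch(invoked: Assistant, others: List[Assistant],
             intent: str, utterance: str) -> Tuple[str, str]:
    """Try the assistant the user woke first, then hand off to the others."""
    for assistant in [invoked, *others]:
        if assistant.can_handle(intent):
            return assistant.name, assistant.handle(intent, utterance)
    # Nobody claims the intent: the invoked assistant owns the fallback.
    return invoked.name, "Sorry, I can't help with that yet."


# Hypothetical brand assistant for a connected device, plus a generalist.
brand = Assistant("BrandAssistant", {"set_temperature": lambda u: "Setting the temperature."})
general = Assistant("Generalist", {"weather": lambda u: "Here is today's forecast."})
```

With this setup, `dispatch(brand, [general], "weather", "what's the weather")` is answered by the generalist even though the user woke the brand assistant, which is the behavior the hand-off model promises.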

It’s too early to say how brands will embrace ACA relative to other paths to building custom assistants, but it’s already proven a highly attractive proposition in automotive, one of the critical battlegrounds for voice tech. Amazon recently announced a big new deal with OEM juggernaut Stellantis, securing Alexa - and Alexa-powered custom branded assistants - seats in vehicles from 14 major brands under the Stellantis corporate umbrella.  

Insights from the Frontlines of RAIN’s Work

In addition to watching industry developments, we’ve been busy working on a half-dozen custom assistant initiatives over the last few years, and are in discussions with about a dozen more companies about building theirs. Even from this modest sample size, we are beginning to see some interesting patterns. 

First, custom assistants truly come in all shapes and sizes, from where they do their work to how they’ve been built. Some live in apps. Others live in custom speakers. Some extend across both digital and physical touch points, and manifest slightly differently depending on the surface. Some orchestrate data visualizations. Others retrieve single data points for workers much faster than conventional software. Some leverage one or two tech tooling providers across the whole stack, while others stitch together many providers across Wake Word, ASR, NLU/NLP, and TTS. 

As the name implies, there is no standard way to build a custom assistant. The best are purpose-built for their use cases and users. 

Which leads to the second point: custom assistants can be significant undertakings, laden with design and technical complexity. They are products of their own, but often nested within other products. They need to work in harmony with a product’s functionality and existing modalities, be they touch screen controls in an app or physical buttons on a device. The “Hippocratic Oath” of custom assistants is to do no harm and only add value to the customer experience. That means striking a balance of being accessible without being obtrusive. This is as much about nuanced visual design - say, the size and placement of a push-to-talk button - as it is about properly chosen and trained wake words that aren’t over-sensitive to unintended invocations. 

A third observation is that custom assistants can be viable as businesses of their own.

A number of companies have emerged where a custom assistant is not simply an extension of services for an existing business, but is the product of the business.

This is especially prevalent in the employee productivity space (see Suki in healthcare, Merlyn in education, or Athena in manufacturing) and it’s the core thesis behind our recent investment to build an automotive repair assistant as a SaaS business.

What Comes Next?

The brands we support are only a sliver of what the industry has underway, but when we look at our client collaborations, our SaaS assistant-in-development, and the signals from the market, a few developments seem likely for the year ahead. 

Early-to-market custom assistants will improve and see greater adoption. Voice has reached a maturity level where it’s less about experimentation and more about commitment. Companies tend to be either in or out. And those who are “in” have increasingly large bodies of data to use to optimize their assistants’ comprehension, utility, and integration with their products and UI. Similar to how Erica’s engagement has skyrocketed, we expect other early movers to see success as consumers get increasingly comfortable with a new way of interacting with branded products and services. 

A crop of new brands (and organizations) will launch custom assistants. To keep up with trends and competitive pressures set by leaders in their categories, be those BofA, Peloton, or Snap, we’ll see a number of high profile companies launch first versions of their custom assistants in 2022. We anticipate this will be highest in the area of operations and productivity for employees, followed by customer self-service functions in apps and devices. 

Assistants will not require voice to be meaningfully adopted. Custom assistants will still embrace voice as a common modality, but become readily accessible in chat/text-led interfaces and through learned routines and programmed task automation. That is not to say that chatbots as we conventionally know them (reactive, issue resolution-focused) will simply become more prevalent. Rather, these bots will start to assume a far more personalized and proactive posture toward their users, and voice will be an option to engage, but not a requirement. 
