Why Voice Commerce Won’t Take Off Until Our Eyes Get Involved, Too

The current paradigm of voice shopping hasn’t yet fully leveraged what voice does well, or explored how to reinforce these behaviors. Enter multimodality.

Kristen Wong
Strategy Analyst

Gargantuan projections have long been hinting at the potential of voice commerce. By 2023, voice commerce is predicted to reach $80 billion. Smart home devices will carry out $164 billion in transactions in 2025.

But a newer narrative has emerged — one that challenges the view that voice commerce is on the brink of transforming how we shop. A December 2020 study by eMarketer found that only 9% of adults in the US have ever shopped via voice. Looking around at the services and apps available at our fingertips, only a handful of companies actually enable shopping through voice, whether it be through our smart speakers or mobile phones.

While these two perspectives contradict each other, that doesn’t mean either of them is inaccurate. Voice commerce harbors rich opportunities to alter how customers shop, both for items delivered to their door and for services they receive in person. But the current paradigm of voice shopping hasn’t yet fully leveraged what voice does well, or explored how to reinforce these behaviors and facilitate the purchasing of more sophisticated goods.

State of Voice Commerce

Customers are already primed to use their voice assistants to perform more advanced tasks like buying an item. The pandemic has spurred not only an increase in smart speaker usage, but also a shift in how consumers approach retail. These new changes have signaled that customers chase convenience and ease of use, which are voice’s strengths. When it comes to restocking your house with low-stakes necessities like soap and paper towels, voice tech is the ideal channel.

Amazon and Walmart are encouraging their customers to turn to voice to fill their shopping carts. Google partnered with Walmart and French retailer Carrefour to bring grocery shopping to Google Assistant devices. Amazon, already a pioneer in the space, created the ultimate, speedy shopping experience by pairing Alexa with Amazon Prime.


There’s a reason why voice commerce’s first use case is for groceries and other household necessities: it shines when it is tied to the effortless purchase of everyday basics. If you’re already regularly putting an item on a digital shopping list with voice commands, why not just purchase it at that moment? If you’re already using the Amazon or Walmart app to buy groceries, why not just use the smart speaker in your house when your hands are busy? There’s no need to second-guess buying products that you know you need.

However, outside of those major retailers, not many other businesses have tried their hand at voice shopping. It hasn’t gained significant traction…but why?

How Should Brands Approach Voice Commerce?

Multimodality’s Promise

Multimodality has the potential to both fortify the shopping of low-cost necessities and aid brands in adopting voice commerce for more expensive goods. But this key attribute has largely been left out of the conversation on voice commerce. When we shop on our computers or our phones, we’re relying on the screens to show us exactly what we’re purchasing — providing a tangible grasp on the task we’re executing.

But when we shop on a speaker, we don’t have the gratification of a visual confirmation, nor the security or confidence that comes with seeing that the correct order has been placed. As a respondent to a PwC survey said about voice commerce, “I would shop for simple things like dog food, toilet paper, pizza…but ‘can you order me a sweater?’ That’s too risky.”


Multimodal devices that deploy a systems-led approach hold the key to winning over the trust of consumers in voice commerce. 20.8% of consumers cited ‘no visual browsing’ as a drawback of voice tech. Additionally, 44% of those who use voice commands for shopping use voice for browsing new products, and 34% for performing a product search. Accounting for these attitudes, there is a desire for searches to be facilitated with voice-led, visual interfaces. In just one year from January 2020 to January 2021, US smart display adoption rose around 9%, indicating the rising popularity of smart screen devices.

With voice, we’re moving to a new process: see it, say it, then secure it. Positioning voice as a driver of commerce in front of the TV and in the kitchen can tap into existing behaviors while also unlocking opportunities for shopping.

Instant Shopping

Although advertising is migrating to social and digital, the TV has long been the primary vehicle for a brand to connect with its customers, especially during flashy live events in sports and entertainment. Brands must rethink how they can leverage this traditional medium to captivate interested audiences and fulfill their desires for immediate gratification. Imagine that you see an ad for a new vacuum you want. You’ve put it on your shopping list for the future. But what if you could just purchase it instantly when you view the advertisement?

Amazon is currently testing “actionable video ads solutions” that insert a call to action (CTA) into the ad creative, allowing customers who are watching a TV ad to add the product to their Amazon shopping carts through Alexa voice commands. Not only does this make buying commonplace household goods effortless, but it also paves the way for customers to purchase more sophisticated items like clothing and electronics, because they are seeing exactly what they will receive.


Similarly, with a new wave of voice-enabled TV remotes and TV screens built with Alexa or Google Assistant, there is an opportunity to purchase directly from the screen itself. If your favorite brand runs an ad for that new sneaker you’ve been eyeing, there is a future where you can bypass the speaker and speak directly to your remote or TV. If products can be purchased via the TV interface, this opens a new path to generate revenue for the device manufacturer, internet provider, or streaming service.

As brands start exploring voice-enabled commerce on TVs through major assistants, they may consider how their own apps or websites can play a direct role in shopping, rather than having their customers purchase through third-party services.

The Rise of the Voice-Enabled Kitchen

The kitchen also holds deep potential for the future of the voice-driven smart home. As new products embed screen surfaces, customers can shop through multiple touchpoints. From smart refrigerators to overhead displays, these appliances are building voice capabilities (including controlling other devices, managing your calendar, and adding to your shopping list) to establish the kitchen as the hub of the home.

With the integration of popular third-party apps, these appliances set the stage for voice to undertake more consistent tasks. Voice is most likely to be used for grocery shopping, with 34% of users buying groceries. What if we could ask our refrigerator to add items directly to our online shopping cart at our favorite supermarket? Imagine saying a simple command to order coffee pods via the fridge when you run out, or ordering food from a delivery service for dinner. Voice shortcuts that can access third-party apps’ core services can expand the existing ecosystem of the smart home outside of Amazon Prime.

Additionally, major voice assistants already integrate with smart kitchen appliances like refrigerators. Smart fridges now can remind you when items are getting close to their expiration dates, and they now even sport cameras inside that provide updates on stock or give a real-time view on what’s inside your fridge. The arrival of these advanced capabilities, paired with voice technology, can be valuable for the busy resident. What if your fridge notifies you that you’ve run out of eggs, and then suggests that you purchase a new carton via voice? Talk about effortless grocery runs.

Can Voice Shopping Be Revived?

It’s not difficult to imagine the vast impact voice commerce can have by transforming the way we shop or how businesses can drive purchases. Ordering products instantly right as consumers discover a need is valuable. But voice commerce’s weakness is rooted in its limited availability on screen devices. Multimodal interfaces can facilitate voice-enabled purchasing behaviors by adding a layer of security and confidence to the purchase of a product customers want.

Every technology company should be experimenting with voice-plus-visual interfaces for shopping interactions, moving from low-stakes, everyday purchases up to higher-ticket items, and assessing where and how consumers convert or drop off. And every brand should be considering the ways in which its products show up, and speak up, in these new environments, because mixed-modality shopping is almost certain to be a dominant paradigm in the 2020s.
