Why the Privacy Controversy Over Voice Assistants Misses the Point

Here is what actually happens with voice assistant recordings – and why discussions about privacy in voice today are missing the bigger picture.

Shanna Walia
Strategy Analyst

Imagine you have a cute little pup named Bobo. One day, you come home after a long day of work and Bobo meets you excitedly at the door, happy as always to see you. As you set down your work bag, Bobo playfully starts nudging at your legs – his way of signaling he wants a treat. You *sigh* with a mixture of awe and resignation, and as you reach for the treat jar, you let out a simple “Ok, Bobo.”

Unexpectedly, you are met with a synthesized, “Hello, what can I do for you?”

No, your precious Bobo did not get taken over by robots. The smart speaker sitting on your kitchen counter misinterpreted your “Ok, Bobo” as “Ok, Google.” Google Assistant also interprets “booboo,” “goo,” “doodle,” and a list of other similar-sounding words as indicators to come to the rescue.

Google products aren’t the only ones; reported instances of false invocations happen across platforms and countries. “Avec sa” (“with his/her” in French) sounds a lot like “Alexa,” while “hecho” (“made” or “done” in Spanish) sounds a lot like “Echo.” It’s all kind of like the modern-day equivalent of a “butt dial.” If you own a smart speaker – or even an iPhone – you’ve no doubt experienced this phenomenon.

Instances of false invocations and phantom touches prompting devices to go into “record” mode have dominated press attention, amplifying privacy concerns among consumers, many of whom feel they are being actively spied on – or are at risk of being spied on – by their smart speakers. While Amazon, Google, and Apple have all admitted that small samples of voice recordings have in the past been analyzed by their teams and independent contractors, here is what actually happens – and why mainstream discussions about privacy in voice today are overblown, lacking in context, and missing the bigger picture:

Privacy is an illusion

In The Age of Surveillance Capitalism, author Shoshana Zuboff cites a study by two Carnegie Mellon professors who calculated that reading every privacy policy a single person encounters over the course of a year would require 76 full workdays, amounting to a national opportunity cost of $781 billion. Yep, it would cost the US economy $781 billion if all of us actually started reading the privacy policy sections of “Terms & Conditions” documents…and that was 11 years ago.

Her book makes evident that there simply isn’t an option to fully participate in society today without having your privacy intruded upon in some way. Sure, there are things you could do…stay off social media, use incognito browsers, never use Google Maps, only make calls using two burner phones placed in plastic baggies in different locations, you know…the usual.

Sorry to burst your bubble, but regardless of smart speakers’ hearing capabilities, in today’s digital age, true privacy is an illusion. That’s because nearly every form of digital communication (phone calls, emails, texts, photos, etc.) leaves a trace – even when deleted – which gets tracked, sold, and used to target us. We’ve all heard the cautionary phrase, “once it’s on the internet, it’s there forever.” This is the nature of how the web was built and how it has evolved, and we as consumers have almost unanimously embraced the tradeoff thus far.

There’s a huge difference between “hearing” and “listening”

How many times have you sat in a presentation and heard someone speaking but not really retained a word of what they said? You see a mouth moving and hear a string of sounds coming out, but no associated meaning gets mapped in your brain. You were hearing, but not really listening. Your smart speaker (regardless of brand) is not so dissimilar. It’s focused on both the recognition of speech – the sounds your mouth strings together to make words – and matching those sounds to something meaningful in its knowledge bank. While humans can easily pick up on the difference between “Bobo” and “Google,” smart speakers need to be trained to decipher the phonetic nuances, and they need to be trained to understand our utterances, which is an even bigger leap.

Only when speech recognition and natural language understanding combine forces are our voice assistants capable of truly listening to us, and that’s the only thing that makes voice assistants useful.
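The two stages described above can be sketched in miniature. This is an illustrative toy, not any vendor's actual pipeline: the function names, intent labels, and phrase table are all hypothetical, and a real assistant maps raw audio (not text) through neural acoustic and language models.

```python
# Toy sketch of "hearing" (speech recognition) vs. "listening"
# (natural language understanding). All names here are hypothetical.

def recognize_speech(audio_transcript: str) -> str:
    """Stand-in for ASR: a real system maps raw audio to text.
    Here we simply normalize the text we're handed."""
    return audio_transcript.lower().strip()

def understand(text: str) -> str:
    """Stand-in for NLU: map recognized text to a meaningful intent.
    A real assistant uses trained models, not a lookup table."""
    intents = {
        "what's the weather": "GetWeather",
        "set a timer": "SetTimer",
    }
    for phrase, intent in intents.items():
        if phrase in text:
            return intent
    return "Unknown"

def assistant(audio_transcript: str) -> str:
    # "Hearing" is recognition alone; "listening" requires both stages.
    return understand(recognize_speech(audio_transcript))
```

Note that recognition without understanding yields only a transcript – the "Unknown" fallback is the machine equivalent of hearing sounds with no meaning attached.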

It makes them smarter (so they can better serve you)

Smart speakers need to hear more to improve their understanding of phonetics, pronunciation nuances, and conversational cues. However, they can’t do that by themselves. Machine learning can only take these systems so far – most experts in voice technology believe that training AI is a process that must be aided by humans. Smart speakers must be fed large data sets of people saying the exact same thing, because that’s what trains them to grasp the subtleties between different pitches, inflections, accents, etc. Machines alone may be able to use data sets of thousands of iterations of “Bobo” vs. “Google” to disambiguate those sounds and improve wake word response accuracy, but for more complex utterances – also known as basically anything a human would naturally say to a voice assistant – machines need a human hand to improve.
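One way to see why those big labeled data sets matter: a naive text-based comparison can't capture why “Bobo” gets confused with “Google.” The sketch below (a toy, not how production wake-word engines work – those use neural acoustic models trained on audio) measures edit distance between the written words and shows they are far apart as strings, even though they are acoustically confusable; the threshold and function names are hypothetical.

```python
# Classic Levenshtein edit distance via dynamic programming.
# Illustrates that "bobo" and "google" are distant as *text* even
# though they sound alike - which is why wake-word models must be
# trained on labeled audio, not string matching.

def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def might_wake(heard: str, wake_word: str = "google", threshold: int = 3) -> bool:
    """Hypothetical text-only trigger: wake if the heard word is
    within `threshold` edits of the wake word."""
    return edit_distance(heard.lower(), wake_word) <= threshold
```

Here `edit_distance("bobo", "google")` comes out to 5, so a text-only trigger would never fire on “Bobo” – yet real devices do, because the confusion happens in the acoustic domain that only audio training data can teach.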

They’re the only ones that don’t have ads – and they come with a mute button!

Have you ever unlocked your phone, opened your Instagram app, and scrolled through some pretty pictures of food and friends until *bam* – you’re hit with an ad for a product you were just talking about?! This is the illusion of eavesdropping: in reality, massive data lakes inform when and how we are served advertisements that can appear to have come only from a direct line into our most intimate conversations and activities. Ever notice how that doesn’t happen on smart speakers? Well, for one thing, they don’t serve ads (yet).

And, unlike all the other smart devices with listening and recording capabilities (mobile phones, laptops, tablets, etc.), smart speakers come with a clearly labeled, impossible-to-miss mute button, which, given how the hardware is designed, cannot be easily tampered with remotely. So you can be confident about when their mics are hot – and when they’re not – with the touch of a button.

You can still Marie Kondo your Alexa

Deleting your smart speaker search history is possible and not significantly different from deleting your web browser history.

It seems everyone has grown accustomed to the fact that the embarrassing Facebook photos we’ve since deleted still live on somewhere in the deep, dark ethers of the internet. With voice, instead of living in the deep corners of the World Wide Web, voice search queries and commands live on Amazon, Google, and Apple cloud networks – which it’s safe to assume are pretty difficult to hack into, and which we trust daily with every other aspect of our digital lives.

Everyone else is doing it

Quite frankly, everything else is already listening. Take finance. Credit cards, bank accounts, and Venmo all track digital data on our spending habits, but how many people do you know who exclusively carry cash to avoid being tracked?

People seem to be more concerned with smart speakers because they are disconcerted by the devices’ placement in intimate spaces – living rooms, bedrooms, bathrooms – without realizing that these are all the places they carry their phones (and other devices) into on a regular basis. What makes them uncomfortable is already happening.

Consumers should care about privacy, and should be informed about how their data is collected and managed. Just because we’ve essentially given up privacy does not mean we can’t get some of it back in the future. But it does no good to get selectively outraged and go into panic mode about voice, just because it is a new modality that happens to involve what we say, not what we type.  

Consumers that demand better performance from voice assistants would be wise to understand the hybrid approaches (human and machine) that most experts believe are required to improve them.

Popular fear and outrage would be better applied to the greater system of surveillance capitalism that has been assembled across every touchpoint of our lives. Until systematic changes are made to the way privacy is handled, there frankly is no point in being up in arms when tech companies selectively employ humans to crack conversational computing, one of the most difficult problems in AI, to ultimately create better user-centric products and experiences.

Until then, go ahead and point out the hypocrisy of your paranoid friend who swears they’ll never let a smart speaker into their home but has no qualms about using their smartphone 8 hours a day. For now, it’s okay to take off the tin foil hat around your Echo or Google Home.

Portions of this article originally appeared in DZone.
