A life saved by a carbon monoxide detector. A silent alarm system alerting law enforcement to protect a family from burglary. Lane-assist corrections that avert a potential car crash. It has always been easy to spot the tangible advances of technology around us and the value they add in detecting problems and keeping us safe. In the ‘detection’ technology space, and in the products and services we use generally, it is just as easy to lose sight of the profound role that sound, or simply the human voice, plays in our interconnected lives.
While ongoing advances in today’s digital world are arguably adding value to people, places, and things at a different rate of change, they are less visible and tangible, more “sonic-forward,” and therefore not as obvious. One of these trends with tremendous potential to change people’s lives is voice biometric technology. Like the examples above, this tech is building momentum in our consumer products, in our most important personal services, in how we track our health, and more.
While voice biometric technology has different applications, its premise is the synthesis of individual paralinguistic biomarkers (pitch, tempo, articulation, and volume, to name a few) into specific personalized outcomes, ranging from mood analysis, to detecting the likelihood of certain illnesses, to adding a security measure in identity authentication.
Organizations have found value in general biometric use cases for years now, like Apple’s fingerprint and facial recognition services on the very device you’re likely reading this on. Voice biometric technology has also been around for some years, mostly in clinical or Interactive Voice Response (IVR) settings. But with voice-first interfaces and intelligent assistants becoming firmly entrenched in our lives, there are more relevant ways than ever for voice biomarkers to elevate the services around us, from specialized healthcare treatment to personalized consumer entertainment. This evolution, however, is not without its risks or costs, both real and perceived.
When Voice Saves Lives
It’s likely that voice biometrics have the chance for the most radical impact in clinical healthcare settings. In the past few years, trials with voice biometric technology have shown that the voice carries significant information about the mental health of the speaker. In one instance, by analyzing patients’ speech and paralinguistic elements, clinicians were able to detect early depression signals in patients diagnosed with Parkinson’s, and to associate levels of depression with the severity of the Parkinson’s with high accuracy. With depression being one of the comorbidities most often associated with Parkinson’s, screening patient audio at scale for signs of depression would be life-saving, literally. In another, more recent clinical instance, startup Vocalis and the Mayo Clinic identified frequency-level voice biomarkers associated with high blood pressure in patients with pulmonary hypertension. With voice biometric technologies, what is typically a difficult-to-treat, late-stage diagnosis for many may have the potential to be caught and treated earlier; again, life-saving.
In healthcare provider (HCP) settings, where ambient listening is already helping physicians offload the burden of some administrative tasks, screenings of voice recordings during a doctor’s visit will inevitably be used to identify and preventatively treat life-threatening conditions. And as telehealth rises in popularity, patients may never need to leave their homes to be afforded such conveniences.
A Unique Voice
As a personalization measure, voiceprints (a unique voice recording, similar to a fingerprint or retina scan) are not new, but with the rise in voice cloning and deepfakes, the sentiment around personalized voice recognition applications is not unanimously positive. In actuality, voiceprints have proven utility and governance, even having legislation backing the validity of a voiceprint when used in e-signatures. So while a scroll through social media can make it feel like the Wild West of synthetic voices, when voice profiles and voice signatures are used in-bounds, there are real security and personalization benefits.
At a personalization level, look at Alexa and Google Assistant, the leaders in voice application enablement. After a necessary set-up, these assistants can detect the active speaker based on their voice and serve up the personalized responses and experiences that user is likely to ask for. While especially helpful in a home where several users share a single voice device, the idea of personalizing the handling of voice queries in “social voice” environments has the potential to explode as voice interfaces become more commonplace in employee-facing capacities, in shared workspaces, on job sites, and more, where knowing who is speaking is as important as recognizing what they’re saying.
Security matters to us whenever we’re users of anything; it’s why we accept and even find comfort in multi-factor authentication, though it delays us logging in to a service we’ve used countless times. We want things to be personal to us and to be able to trust them as secure, and voice is just scratching the surface of what it can add as a layer of secure personalization.
But personalization is another divisive topic between the major enablers and us, their subjects. The average U.S. adult has access to 11 discrete connected devices. Clearly we need these devices, and cross-device personalization does make our experiences better, yet we decry the revelations of super-surveillance in our lives: the passing conversation that becomes a string of ads on social media, the unintended triggering of our voice assistants by TV commercials, or the whistleblowers exposing big tech’s poor data handling. With too many examples to count, it’s natural to question whether, benefits aside, giving ambient access to tech that “listens” is the secure, personalized experience we actually want.
Keeping Surveillance in Check
Recently, for instance, Spotify announced that through its speech recognition technology it may soon recommend music to you based on your mood, or on what it can perceive about your environment (i.e., alone or in a social setting). It could be argued that an entertainment service taking that much control of your curated experience is, in fact, too much. Detection of that kind is simply more than many signed up for.
And sure, Spotify is just one service. It could be argued that by opting in you are welcoming the personalization, and that it won’t affect those who abstain. But you’d be hard pressed to find a person unfamiliar with the phrase “this conversation is being recorded for training and quality purposes” from a customer service call. As Joseph Turow covers masterfully in a recent Fast Company piece, the caller could be monitored too. Along with the rise in ambient listening due to smart speakers and voice-responsive phones, these recordings present the early stages of a massive voice profiling revolution that could change the way we’re marketed to and served as customers. Did anybody ask for that? Will it actually help? We’re going beyond the simple tracking of speech for context, and moving into voice in a way that truly exceeds what humans can detect from speech alone.
If humans can’t hear disease markers in other humans’ voices, but computers can, what else can computers hear in our voices that we do not know we’re conveying? Emotional vulnerability? Lack of confidence? The fact that we relapsed into a bad smoking habit a few days ago? This is all information that marketers could use to hyper-target us.
The Voice First Future
It would be pretty easy to go for the Skynet narrative when considering the rise of voice biometric technology and the potential intentions of those who will control the data, but the truth is that the very same risk of giving away too much access has an equal but opposite benefit: achieving what we as people, alone, cannot.
One thing that is certain is that voice experiences and custom assistants will help to scale the application of voice biomarkers. With obvious applications in healthcare, in customer service, in leisure pursuits, in shared spaces, and as a personalization element in entertainment, delivering on the promise of voice biometrics will require continued governance, deep and proactive ethical consideration, and an openness to experimentation.