We recently sat down to chat with RAIN’s Senior Technical Director Mark Tucker about how he got into voice technology, and his views on what makes voice tech uniquely exciting for both developers and users. An Alexa Champion and Bixby Premier Developer, Mark has been an early and frequent innovator in voice and an outspoken member of the global conversational AI community. Mark was recently designated as one of the top technologists in voice by Voicebot.ai, and is co-host of the podcast Two Voice Devs.
Mark, what first inspired you to start experimenting with voice technology?
What brought you to RAIN and what sorts of projects have you been working on?
Soon after learning how to program voice assistants, I wanted to turn my night and weekend passion into a daily one. I looked for companies that were focusing on and succeeding in voice. I found RAIN listed on the Alexa agencies page. I followed employees from these companies on social media and when they talked about specific voice apps, I tried them out. RAIN consistently produced quality experiences. When I interviewed at RAIN, I found great people who were also passionate about voice. They thought about providing value, solving challenges, and doing quality work as much as I did. Since joining, I have worked on projects in healthcare and entertainment as well as on opportunities to form and strengthen partnerships. I have been able to use my skills and experience on the engineering team as well as learn from others who focus on design and strategy.
Over the last 5 years, I have really enjoyed designing and creating voice experiences for Alexa, Google Assistant, Bixby, and other scenarios. I’ve really found my passion.
You’ve been a big believer in open source, and have tinkered and developed some tools for the broader voice community. Can you tell us about those?
In 2016, while I was still learning the basics of voice development, I worked on a starter template that would make it easier for new developers to learn to code for Alexa. I released this on GitHub because I believe in giving back and in the power of Open Source software. Since then, I have created open source projects for those using the ASK SDK and Jovo. I’ve spent many hours creating cross-platform voice experiences and Jovo has been my framework of choice. A big reason is its ability to be extended with plugins. I noticed that different platforms support different flavors of Speech Synthesis Markup Language (SSML), which made cross-platform development a challenge. I created Speech Markdown as a simplified way to format speech output. You can find my projects here.
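To illustrate the SSML point above, here is a minimal sketch of why one simplified source format helps. It uses a made-up toy markup (a bracketed pause and a whisper marker) and a hypothetical `to_ssml` function; it is not the Speech Markdown syntax or API. The Alexa whisper tag is real, while the soft-prosody fallback for other platforms is an assumption chosen for illustration.

```python
import re

# Hypothetical toy markup, NOT the real Speech Markdown syntax or API:
#   [500ms]          -> a pause
#   (text)[whisper]  -> whispered text
PAUSE = re.compile(r"\[(\d+)ms\]")
WHISPER = re.compile(r"\(([^)]*)\)\[whisper\]")


def to_ssml(text: str, platform: str) -> str:
    """Render the toy markup as SSML for the given platform name."""
    # <break> is part of standard SSML, so it is emitted for every platform.
    out = PAUSE.sub(r'<break time="\1ms"/>', text)
    if platform == "alexa":
        # Alexa-specific extension tag for whispering.
        out = WHISPER.sub(
            r'<amazon:effect name="whispered">\1</amazon:effect>', out
        )
    else:
        # Assumed fallback for platforms without a whisper tag: soft prosody.
        out = WHISPER.sub(r'<prosody volume="x-soft">\1</prosody>', out)
    # Note: this sketch does not XML-escape the input text.
    return f"<speak>{out}</speak>"


if __name__ == "__main__":
    line = "Hello [500ms] (secret)[whisper]"
    print(to_ssml(line, "alexa"))
    print(to_ssml(line, "google"))
```

The same source line yields different SSML per platform, which is the kind of divergence a shared authoring format is meant to hide from the developer.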
What’s the best and the worst thing about being a voice tech engineer?
I really enjoy learning. I always have. When architecting and developing for voice, there are always new things to learn. There are plenty of opportunities to do new and innovative things in this space. I coded an award-winning Alexa skill to release a musician’s new single and was one of the first developers to write multimodal experiences for the Echo Show. My game, Snatch Word, is an MMO that allows you to play in real time against players from around the world and on multiple voice platforms.
Sometimes it can be frustrating when feedback is slow to show up in voice platforms or developer tools. There are features available to first-party voice apps that are not released for third-party app developers to use. It often seems like innovation is not moving as fast as I would like it to.
Overall, I love being a voice tech engineer.
You’ve been a prolific voice game developer. What’s the key to making a good voice-first game? How do you foresee voice-first gaming changing in the future?
Game development is itself challenging and voice adds a unique facet. I have not had a breakout hit game yet, but that is part of experimenting.
A voice game should have a name that is easy to remember and say. A companion website is important as a channel for discoverability and a place for fans to learn more. It is important to get players into the action as quickly as possible with a tutorial or walkthrough. Immediately reward the player with a badge and hint at ways to earn more. Provide value in the game and a chance to buy digital goods. Use reminders, notifications, and a daily challenge to bring the player back to the game. Use quick links and routines.
Voice games would benefit from a voice-specific game engine to help create player profiles, design levels, add content, support leaderboards, and more.
As someone with a deep understanding of the tech stacks that power voice experiences today, what do you see as some of the biggest technological hurdles voice tech needs to overcome in the years ahead to step-change its utility?
Developers of voice apps need better libraries and frameworks as well as reusable components. Each major voice platform (Alexa, Google Assistant, and Bixby) provides its own tools or frameworks. Much of what I have done over the years is cross-platform, so a framework like Jovo is valuable. Part of the power is the ability to create Open Source plugins. This type of extensibility needs to continue. Instead of coding common voice flows from scratch, I should be able to select a pre-built component that includes the language model and code as well as customization hooks.
Voice platforms themselves need to be more extensible. If I want to buy a celebrity text-to-speech (TTS) voice from a vendor, then I should be able to plug it into whatever platform I am using.
Developers need fine-grained control over what input is recognized in certain situations to improve the experience.
Who are some of the people and companies you admire most in the voice tech industry?
Overall, I find the voice tech community very supportive of each other and that level of positivity is very encouraging. There are too many people to list, but I will include a few.
Allen Firstenberg is my co-host for the Two Voice Devs podcast. I have found many friendships at VoiceLunch thanks to Michal Stanislawek and Karol Stryja. Being selected as an Alexa Champion and Bixby Premier Developer has created opportunities and over the years many in those programs have provided support. Heidi Culbertson is a gem. Teri Fisher has so much enthusiasm. The Jovo team is impressive as are those at Voiceflow and other partners. Thanks to Bret Kinsella, Bradley Metrock, Pete Erickson, and their teams. A hat tip to any developer that releases Open Source software or creates tools.
I respect so many voice designers that take these experiences to the next level. There are so many more.