Earplay: What Chatbots Can Learn from Interactive Voice Games

(Part of the Bot Master Builders Series)

Arun Rao
Chatbots Magazine

--

Many people consider Earplay the best voice-first experience available on voice platforms like Amazon Alexa and Google Home. Earplay has launched a series of voice-driven games, including Codename Cygnus and The Orpheus Device, for conversational platforms.

The Orpheus Device is one of the cool games Earplay offers.

I recently interviewed Jon Myers, CEO and co-founder of Earplay. He sees chatbots as “creatively messing around with text worlds that accelerated with NLP and the Loebner Prize.”

Jon views the Earplay voice experience as a completely different path from text and chatbots. That is because of the difference between text-to-speech (TTS) synthesis and real human speech: a voice spoken aloud has stronger interactive capability and emotional connection, and the nuanced patterns of a voice make it feel more human.

Jon and his team also bring in the perspective of writers and playwrights to create a single, immersive experience. Character craft is one area where voice games and chatbots overlap: he notes that Bruce Wilcox of ChatScript has made great characters, a craft rooted in prose and text, while dramatic dialogue and playwriting are written for speech and so are different.

What was the original design vision for your games? Do they have one or two clear functions?

My background is as a playwright, with an MFA from Boston University. I moved into games, but what I loved most about games wasn’t the mechanics, it was the story-driven play; the old ludology-versus-narratology debate over whether gameplay features or story context and details matter most. We want to make interactive stories. We start with dialogue trees, so the dialogue is actually spoken by the player. This is radio drama, a play in audio format. Codename Cygnus was a spy thriller based on old tropes, and an exploratory process we turned into a series. We started building our own technology to build out more titles and put out more episodes. We were mobile-first with an iOS app, aiming for an app as a library of stories, and then shifted when Alexa came along. Our content needs to match up with the system.
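To make the dialogue-tree idea concrete, here is a minimal sketch in Scala (the language Earplay’s back end happens to use, per the tech-stack answer below). The node shape, names, and two-beat exchange are illustrative assumptions, not Earplay’s actual data model.

```scala
// A minimal sketch of a branching dialogue tree: each node is a line of
// audio drama plus the phrases the player can speak in reply.
final case class Choice(spokenPhrase: String, next: DialogueNode)

final case class DialogueNode(
  prompt: String,       // the line the player hears
  choices: List[Choice] // what the player can say next
)

object DialogueTreeDemo {
  // A hypothetical two-beat exchange in the Codename Cygnus spy-thriller style.
  val debrief: DialogueNode = DialogueNode(
    prompt = "Agent, do you accept the mission?",
    choices = List(
      Choice("yes", DialogueNode("Good. Your contact is waiting.", Nil)),
      Choice("no",  DialogueNode("Then this conversation never happened.", Nil))
    )
  )

  // Advance the story by matching what the player said against the choices.
  def advance(node: DialogueNode, heard: String): Option[DialogueNode] =
    node.choices
      .find(_.spokenPhrase.equalsIgnoreCase(heard.trim))
      .map(_.next)
}
```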

How did your team come together, and what are the roles?

We have three people on the team: Jon as CEO, Dave as chief creative, and Bruno as CTO on the back end. We also work with lots of contractors, and we hope to hire more people after our seed round.

How do you measure success? What are your metrics?

We are still trying to figure this out: how to measure engagement and retention. We look at the percentage of successful sessions, defined differently for each voice app; for a demo, it can come down to a “yes” or “no” at the end. We also measure completions of each game or demo, time spent, the number of decisions over time, average session play time, and average number of intents. There is no optimal path. Widespread use of metrics to shape game design wasn’t popular until Zynga, where the common ones were daily/monthly active users and D1, D3, D7, D14, and D30 retention. For us, it’s much more difficult to catalogue.
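For readers unfamiliar with the DN retention metrics Jon mentions, here is a hedged sketch of how D1/D7/D30 retention is typically computed from a session log. The `Session` shape and `dnRetention` helper are hypothetical, not Earplay’s analytics code.

```scala
import java.time.LocalDate

object RetentionSketch {
  final case class Session(userId: String, date: LocalDate)

  // Of the users whose first-ever session fell on `cohortDate`, what
  // fraction played again exactly `n` days later? (D1, D7, D30, ...)
  def dnRetention(log: Seq[Session], cohortDate: LocalDate, n: Int): Double = {
    // Each user's earliest session date.
    val firstPlay: Map[String, LocalDate] = log
      .groupBy(_.userId)
      .map { case (user, ss) =>
        user -> ss.map(_.date).reduce((a, b) => if (a.isBefore(b)) a else b)
      }
    // The cohort: users who started on `cohortDate`.
    val cohort = firstPlay.collect { case (u, d) if d == cohortDate => u }.toSet
    if (cohort.isEmpty) 0.0
    else {
      val target = cohortDate.plusDays(n.toLong)
      // Count distinct cohort users seen again on the target day.
      val returned = log
        .filter(s => cohort(s.userId) && s.date == target)
        .map(_.userId).distinct.size
      returned.toDouble / cohort.size
    }
  }
}
```

Called as, say, `dnRetention(log, LocalDate.parse("2017-06-01"), 7)`, it yields D7 retention for that day’s cohort.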

What are successful interactions? What are failed interactions?

Most of the interactions center on the prompt moment: people need to know what to say at each point in the game. We cannot have the AI capture everything a user could say, so we have to limit the possibility space and guide the user. It’s always going to be a closed system, no matter how open it feels. There is also a sort of uncanny valley for voice: if something is missing in a complex, super-powered AI, it feels really off, and the littlest things show the cracks in the voice game and turn people off. We spend more time on user flows and guidance than on technical flexibility. Simplicity and clarity are still important, and we avoid unnecessary complexity.
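The prompt-moment pattern Jon describes, a small closed set of expected phrases with a guiding reprompt as the fallback, might look something like this. All names here are assumptions for illustration, not Earplay’s engine.

```scala
object PromptMoment {
  sealed trait Outcome
  final case class Advance(nextScene: String) extends Outcome
  final case class Reprompt(hint: String)     extends Outcome

  // The closed possibility space for one beat of the story.
  private val expected: Map[String, String] = Map(
    "open the door"   -> "hallway",
    "check the radio" -> "transmission"
  )

  // Match the utterance against the expected phrases; if nothing matches,
  // guide the player instead of failing silently.
  def handle(utterance: String): Outcome =
    expected.get(utterance.trim.toLowerCase) match {
      case Some(scene) => Advance(scene)
      case None        => Reprompt("You can open the door, or check the radio.")
    }
}
```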

In terms of editorial and scripting, what have you learned from flows so far? Is all your content evergreen, or do you also do limited-time or seasonal content?

In the earliest days, we used focus groups and asked them questions as they played. I read dialogue aloud to see what a group responded to; if you have to help them, the voice prompt wasn’t good. Don’t use rhetorical questions, or the user will interrupt! As for prompts: text can be parsed objectively, but speech has to be transcribed first, and many things can go wrong. In 2013, when we started, ASR accuracy wasn’t great. It has improved over time, and it’s quite good now with Alexa, Google, and api.ai, among others.

How do people hear about your bot and start using it?

We were early adopters on Alexa and have high-quality content. People want to try new things, and skills that get in early often pick up reviews. We were also featured in the Amazon store.

You can play an entire game just by listening; it’s pretty addictive and great for a lazy, on-the-couch evening.

How do you get initial users to engage?

We haven’t really focused on this yet. We are trying to launch a new storytelling medium: an interactive audio system with opportunities for engagement outside the core voice experience. It’s something we are working on.

What other bots have you looked to for inspiration? Are there other use cases you’ve thought were simply brilliant?

We only really look at entertainment skills and games. The Magic Door is getting really good at the text-adventure style, and RuneScape by Jagex is doing well. It’s just so early, it’s hard to know.

What are your thoughts on the three different platforms (Alexa, Google, Siri/iOS app)?

We started on iOS with earbuds as a solitary experience. Now we have an Alexa skill and a Google Home action. With Alexa, it’s out in the open, so you can multitask a little and there are some social opportunities. Some users love playing together, like listening to the radio as a group.

Many games are great as a free service, but how have you tested monetization?

We plan to release some paid, premium content and are nearing completion on a payment system; we are not waiting around for the platforms to develop their own. We’re using account linking and registrations, though people will find it annoying. It’s different for us because we’re building a library: we need to attract fans who love the medium.

What can you tell us about your tech stack? Do you do NLP in-house? What external services do you like?

We are a creation and distribution platform. We have a voice-UI engine that handles the user experience, and we use the native speech recognition on each platform. We use AWS for cloud infrastructure. The back end of the Earplay platform is Scala with the Play 2 framework, with Angular 2 on the front end.
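As a rough illustration of how such a stack might receive a recognized utterance from a voice platform, here is a sketch of a Play webhook action in the injected-controller style of Play 2.6+. The controller name, route, and JSON shape are assumptions, not Earplay’s actual API.

```scala
import javax.inject.Inject
import play.api.libs.json._
import play.api.mvc._

// Hypothetical entry point: each platform POSTs the recognized utterance,
// and the server replies with the next line of the story to speak.
class VoiceWebhookController @Inject()(cc: ControllerComponents)
    extends AbstractController(cc) {

  def handle: Action[JsValue] = Action(parse.json) { request =>
    // Pull the transcribed phrase out of the request body.
    val utterance = (request.body \ "utterance").asOpt[String].getOrElse("")
    // In a real engine this would consult the story state; here we echo.
    Ok(Json.obj("speech" -> s"Heard: $utterance"))
  }
}
```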

Any lessons to share with other interactive game builders and bot builders on useful tools?

We spend as much time as possible prototyping and building: once you launch broadly, the reviews are tattooed on your product, so get it right the first time. It’s not too hard to prototype in voice, because we can go out with a script and see how people respond. I suggest you do what you can to get testers and run them through a session. When you create a skill, it’s tempting to think you’re done, but it’s a live service that’s never complete. You have to iterate and keep it going.
