Last month, I wrote a piece outlining why Apple, Google, Microsoft, automakers, home electronics manufacturers and appliance makers – virtually every consumer device developer – should be rethinking their user interfaces in light of the Echo's success, asking themselves What Would Alexa Do? I've continued to think about the impact of speech user interfaces, and it's become clear to me that Alexa challenges the very foundations of today's mobile operating systems.
As I illustrated in that previous piece (do read it if you haven't already) with a comparison of conversations with Alexa on the Echo and with Google's voice recognition on my Nexus 6P Android phone, the fundamental interaction paradigm of the phone isn't well suited to the conversational era. On the phone, the voice agent, like the touchscreen, simply serves as a launcher. Control is passed to whichever app you launch, and once that app is up and running, the voice agent is out of the picture. I'm back in the touchscreen-oriented paradigm of last generation's apps. By contrast, with the Amazon Echo, I can "stack" multiple apps (music, weather, timers, calls out to independent apps like Uber) while Alexa remains on call, dealing with ongoing questions or commands and passing them along to whichever app seems most appropriate.
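To make the contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `App`, `LauncherAgent`, and `ResidentAgent` classes and the naive keyword-matching heuristic are illustrative inventions, not a description of how Alexa, Siri, or Google are actually built. It only shows the difference between an agent that hands off control and one that stays on call.

```python
class App:
    """A hypothetical voice-capable app that knows a few keywords."""

    def __init__(self, name, keywords):
        self.name = name
        self.keywords = set(keywords)

    def score(self, utterance):
        # Naive relevance: how many of the app's keywords appear in the utterance.
        return len(set(utterance.lower().split()) & self.keywords)

    def handle(self, utterance):
        return f"{self.name} handled: {utterance}"


def best_match(apps, utterance):
    """Return the most relevant app, or None if nothing matches at all."""
    best = max(apps, key=lambda app: app.score(utterance))
    return best if best.score(utterance) > 0 else None


class LauncherAgent:
    """Phone-style agent: launches one app, then drops out of the picture."""

    def __init__(self, apps):
        self.apps = apps
        self.foreground = None

    def hear(self, utterance):
        if self.foreground is not None:
            # Control already belongs to the launched app; voice is no longer routed.
            return f"(agent inactive; use the {self.foreground.name} touchscreen)"
        app = best_match(self.apps, utterance)
        if app is None:
            return "Sorry, I can't help with that."
        self.foreground = app
        return app.handle(utterance)


class ResidentAgent:
    """Echo-style agent: stays on call and routes every utterance."""

    def __init__(self, apps):
        self.apps = apps
        self.active = []  # apps "stacked" so far, in the order they were started

    def hear(self, utterance):
        app = best_match(self.apps, utterance)
        if app is None:
            return "Sorry, I can't help with that."
        if app not in self.active:
            self.active.append(app)
        return app.handle(utterance)
```

Given a music app and a timer app, a second request such as "set a timer for ten minutes" reaches the timer under `ResidentAgent`, while `LauncherAgent` leaves you in the music app it launched first.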
The more I thought about it, the more I realized that Alexa on the Echo seems so surprising not because its speech recognition is better (it isn't), nor because it lets you ask for things that neither Siri nor Google can do (it doesn't), but because its fundamental human interface is superior. The agent remains continuously, courteously present, doing its best to help.
On the phone, the easiest thing for developers to do is to simply use voice to summon the app, and then let the app's old touchscreen metaphor take over. That's what Apple and Google do, and that's why the interactions seem so flawed whenever they involve a function that Siri or Google can't complete on their own.
In short, Apple and Google will need to completely rethink the iOS and Android operating systems for the age of voice. Not only that, every app will have to be refactored to accept the new interaction paradigm.
I'd already been thinking about the further implications of Alexa. In my first piece, I made the case that every device maker would need to redesign for voice control, but I hadn't taken the thought to its logical conclusion: that there's an opportunity for a completely new mobile OS.
The question is whether Apple, Google, Amazon, or some as-yet unknown player will seize this advantage. Given Jeff Bezos' penchant for bold bets, I wouldn't put it past Amazon to be the first to create a phone OS for the conversational era. I doubt that the first Alexa-enabled phone will do this, but the limitations of the handoff to Android or iOS will make clear the opportunity.
P.S. A few nights ago, when I interviewed Mike George and Toni Reid of the Amazon Alexa team on stage at the Churchill Awards, Mike said something really important when I asked him for the secret of the Echo's success. "We didn't have a screen," he said. And that's exactly it. In the age of voice, you have to design as if there is no screen, even for devices that have one. Even when an app relies on the screen, it still has to provide affordances for the controlling voice agent to interrupt the app or modify its operation.
Founder of O'Reilly Media