Thursday, April 14, 2016


Back in 2005-2006 my friend Liesl told me about the coming age of chat bots.  I had a hard time imagining how people would embrace products that simulated human voice communication but were less “intelligent”.  She ended up building a company that allowed people to have polite automated service agents that you could program with a certain specific area of intelligence.  Upon launch she found that people spent a lot more time conversing with the bots than they did with the average human service agent.  I wondered if this was because it was harder to get questions answered, or if people just enjoyed the experience of conversing with the bots more than they enjoyed talking to people.  Perhaps when we know the customer service agent is paid hourly, we don't gab in excess.  But if it's chat bot you're talking to, we don't feel the need to be hasty?

Fast forwarding over a decade later, IBM has acquired her company into the Watson group.  During a dinner party we talked about Amazon’s Echo sitting on her porch.  She and her husband would occasionally make DJ requests to “Alexa” (the name for Echo’s internal chat bot) as if it was a person attending the party.  It was definitely seeming that the age of more intelligent bots is upon us.  Most folk who have experimented with speech-input products of the last decade have become accustomed to talking to bots in a robotic monotone devoid of accent because of the somewhat random speech capture mistakes that early technology was burdened with.  If the bots don't adapt to us, we go to them it seems, mimicking the 50's and 60's movies of how we've heard robotic voices depicted to us in science fiction films.

This month both Microsoft and Facebook have announced open bot APIs for their respective platforms.  Microsoft’s platform for integration is an open source "Bot Framework" that allows any web developer to re-purpose the code to inject new actions or content tools in the active discussion flow of their conversational chat bot called Cortana, which is built into the search box of every Windows 10 operating system they license.  They also demonstrated how the new bot framework allows their Skype messenger to respond to queries intelligently if they have the right libraries loaded. Amazon refers to the app-sockets for the Echo platform as "talents", whereby you load a specific field of intelligence into the speech engine to allow Alexa to query the external sources you wish.  I noticed that both Alexa team and Cortana team seem to be focusing on pizza ordering in both their product demos.  But one day we'll be able to query beyond the basic necessities.  In my early demonstration back in 2005 of the technology Liesl and Dr. Zakos (her cofounder) built, they had their chat bot ingest all my blog writings about folk percussion, then answer questions about certain topics that were in my personal blog.  If a bot narrows a question to a subject matter, its answers can be uncannily accurate to the field!

Facebook’s plan is to inject bot-intelligence into the main Facebook Messenger app.)  Their announcements actually seem to follow quite closely the concept Microsoft announced of developers being able to port in new capabilities for the chatting engines of each platform vendor.  It may be that both Microsoft and Facebook are planning for the social capabilities of their joint collaborations on the launch of Oculus, Facebook's immersive virtual environment of head-set based virtual world environments which run on Windows 10 machines.

The outliers in this era of chat bot openness are the Apple Siri and Ok Google speech tools that are like a centrally managed brain.  (Siri may query the web using specific sources like Wolfram Alpha, but most of the answers you get from either will be consistent with the answers others receive for similar questions.)  The thing that I think is very elegant about the approaches Amazon, Microsoft and Facebook are taking is that they make the knowledge engine of the core platform extensible in ways that a single company could not.  Also, the approach allows customers to personalize their experience of the platform by specifically adding new ported service to the tools.  My interest here is that the speech platforms will become much more like the Internet of today where we are used to having very diverse “content” experiences based on our personal preferences and proclivities.

It is very exciting to see that speech is becoming a very real and useful interface for interacting with computers.  While the content of the web is already one of the knowledge ports of these speech tools, the open-APIs of Cortana, Alexa and Facebook Messenger will usher in a very exciting new means to create compelling internet experiences.  My hope is that there is a bit of standardization so that a merchant like Domino's doesn't have to keep rebuilding their chat bot tools for each platform.

I remember my first experience having a typed conversation with the Dr. Know computer at the Oregon Museum of Science and Industry when I was a teenager.  It was a simulated Turing test program designed to give a reasonably acceptable experience of interacting with a computer in a human way.  While Dr. Know was able to artfully dodge or re-frame questions when it detected input that wasn’t in its knowledge database, I can see that the next generations of teenagers will be able to have exactly the same kind of experience I had in the 1980’s.  But their discussions will go in the direction of exploring knowledge and exploring logic structures of a mysterious mind instead of ending up in rhetorical Cul-de-sacs of the Dr. Know program.

While we may not chat with machines with quite the same intimacy of Spike Jonze’s character in “Her”, the days where we talk in robotic tones to operate with the last decade’s speech input systems is soon to end.  Each of these innovative companies is dealing with the hard questions of how to get us out of our stereotypes of robot behavior and get us back to acting like people again, returning to the main interface that humans have used for eons to interact with each other.  Ideally the technology will fade into the background and we'll start acting normally again instead of staring at screens and tapping fingers.




P.S  Of course Mozilla has several initiatives on speech in process.  We'll talk about those very soon.  But this post is just about how the other innovators in the industry are doing an admirable job making our machines more human-friendly.