TrulyHandsfree™ - The Important First Step in a Voice User Interface October 10th, 2011
An interesting blog post (from PC World) came out following Apple’s iPhone 4s intro with Siri. I think everyone knows what Siri is…it’s the Apple acquisition that has turned into a big part of the Apple user experience. Siri technology allows a user to not only search but control various aspects of a smartphone by voice in a “natural language” manner.
The blog post depicts a looming showdown between Sensory and Apple’s Siri. It is quite kind to Sensory, pointing out our near-flawless performance in noise and how TrulyHandsfree™ does not require button presses. While those points are true, Sensory is certainly NOT a competitor to Siri. We do partner with companies like Vlingo that might be considered a Siri competitor, but Sensory’s TrulyHandsfree is just the first part of a multi-stage process for creating a true Voice User Interface.
Here is the basic process:
It’s just that first step that Sensory does better than anyone else. However, it’s an important step that requires a few critical characteristics:
- Extremely fast response time. Since it basically competes with a button press, it has to have a similar or faster response time. Because TrulyHandsfree uses a probabilistic approach, it can respond without having to wait for the recognizer to determine if the word is even finished! Slow response times lead users to speak before the Step 2 recognizer is ready to listen, which is a major cause of failure.
- Low power consumption. If it’s always on and always listening, it can’t be a power hog. Sensory can perform wake-up triggers with as little as 15 MIPS, and has the ability to operate in the 1-10mA range on today’s smartphones.
- Highly accurate with poor S/N ratios. This means several things:
- Works in high noise. TrulyHandsfree Voice Control performs flawlessly in extremely loud environments, including music playing in the background or even outdoors in downtown Portland!
- Works without a microphone in close proximity. TrulyHandsfree is responsive even at distances of 20 feet (in a relatively quiet environment) and at arms length in noise. This is critical because many VUI based applications of the future will become commonplace in a wide variety of consumer electronics devices, and users won’t want to get up and walk over to their devices to control them.
Companies like Nuance, Vlingo, Google and Microsoft are pretty good at the second step, which is a more powerful (often cloud-based) recognition system.
The third step “Understanding Meaning” is what the original Siri was all about. This was an AI component developed under DARPA funding at SRI and later spun off and acquired by Apple. Apple is rumored to be using Nuance as the “Step 2” in Siri.
Vlingo does a really nice job of implementing Steps 1-3 (using Sensory as its partner for Step 1.) I’m sure Google, Microsoft, Apple and Nuance all have efforts underway in the area of AI and natural language understanding. It’s really not that different than what they have needed for text-based “meaning” recognition during traditional searches.
The SEARCH in Step 4 is done via typical search engines (Google, Microsoft, Apple) and I’d guess Vlingo and other independent players (are there any still around???) have developed partnerships in these areas.
Step 5 is basically a good quality TTS engine. Providers like Nuance, Ivona, ATT, NeoSpeech, and Acapella all have nice TTS engines, and I believe Apple, Microsoft and Google all have in-house solutions as well!
The important point in comparing Sensory’s technology is that we provide the logical entryway to a successful Voice User Interface experience–with a lightning-fast voice trigger that replaces tactile button presses. It is a given that noise immunity and extremely high accuracy are also required, and Trulyhandsfree accomplishes this without requiring a prohibitive amount of power to function reliably and consistently.
AND…while we appreciate the comparison to the most profitable company on the planet, we’d like to focus on what we do better…making Truly Hands-Free really mean Trulyhandsfree™.