TrulyHandsfree™ - The Important First Step in a Voice User Interface
October 10th, 2011
An interesting blog post (from PC World) came out following Apple’s introduction of the iPhone 4S with Siri. I think everyone knows what Siri is…it’s the Apple acquisition that has turned into a big part of the Apple user experience. Siri lets a user not only search but also control various aspects of a smartphone by voice in a “natural language” manner.
The blog post depicts a looming showdown between Sensory and Apple’s Siri. It is quite kind to Sensory, pointing out our near-flawless performance in noise and how TrulyHandsfree™ does not require button presses. While those points are true, Sensory is certainly NOT a competitor to Siri. We do partner with companies like Vlingo that might be considered a Siri competitor, but Sensory’s TrulyHandsfree is just the first part of a multi-stage process for creating a true Voice User Interface.
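Here is the basic process:
- Step 1: Voice trigger that wakes the device (TrulyHandsfree)
- Step 2: Speech recognition, often cloud-based
- Step 3: Understanding meaning
- Step 4: Search
- Step 5: Spoken response via text-to-speech (TTS)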
It’s just that first step that Sensory does better than anyone else. However, it’s an important step that requires a few critical characteristics:
- Extremely fast response time. Since it basically competes with a button press, it has to have a similar or faster response time. Because TrulyHandsfree uses a probabilistic approach, it can respond without having to wait for the recognizer to determine if the word is even finished! Slow response times lead users to speak before the Step 2 recognizer is ready to listen, which is a major cause of failure.
- Low power consumption. If it’s always on and always listening, it can’t be a power hog. Sensory can perform wake-up triggers with as little as 15 MIPS, and can operate in the 1-10 mA range on today’s smartphones.
- Highly accurate with poor S/N ratios. This means several things:
  - Works in high noise. TrulyHandsfree Voice Control performs flawlessly in extremely loud environments, including music playing in the background or even outdoors in downtown Portland!
  - Works without a microphone in close proximity. TrulyHandsfree is responsive even at distances of 20 feet (in a relatively quiet environment) and at arm’s length in noise. This is critical because many VUI-based applications of the future will become commonplace in a wide variety of consumer electronics devices, and users won’t want to get up and walk over to their devices to control them.
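To make the fast-response point concrete, here is a minimal sketch of a detector that fires as soon as a smoothed per-frame confidence crosses a threshold, rather than waiting for the word to finish. It is not Sensory's actual algorithm. The function name, the threshold, the smoothing factor, and the score values are all assumptions made up for illustration.

```python
from typing import Iterable, Optional


def first_trigger_frame(
    frame_scores: Iterable[float],
    threshold: float = 0.8,   # assumed decision threshold
    smoothing: float = 0.5,   # assumed smoothing factor
) -> Optional[int]:
    """Return the index of the first frame where the smoothed score
    reaches the threshold, or None if it never does."""
    running = 0.0
    for index, score in enumerate(frame_scores):
        # Exponential moving average of the per-frame confidence.
        running = smoothing * running + (1.0 - smoothing) * score
        if running >= threshold:
            # Fire immediately; the remaining frames are never examined.
            return index
    return None


# Hypothetical per-frame confidences that the trigger phrase is being spoken.
trigger_like = [0.1, 0.3, 0.7, 0.95, 0.99, 0.99, 0.99, 0.9]
background = [0.1, 0.2, 0.1, 0.4, 0.3, 0.2, 0.1, 0.1]

print(first_trigger_frame(trigger_like))
print(first_trigger_frame(background))
```

With these made-up scores the detector fires at frame index 4 of the 8-frame trigger-like input, before the input ends, and never fires on the background input.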
Companies like Nuance, Vlingo, Google and Microsoft are pretty good at the second step, which is a more powerful (often cloud-based) recognition system.
The third step, “Understanding Meaning,” is what the original Siri was all about. This was an AI component developed under DARPA funding at SRI and later spun off and acquired by Apple. Apple is rumored to be using Nuance as the “Step 2” in Siri.
Vlingo does a really nice job of implementing Steps 1-3 (using Sensory as its partner for Step 1). I’m sure Google, Microsoft, Apple and Nuance all have efforts underway in the area of AI and natural language understanding. It’s really not that different from what they have needed for text-based “meaning” recognition during traditional searches.
The SEARCH in Step 4 is done via typical search engines (Google, Microsoft, Apple) and I’d guess Vlingo and other independent players (are there any still around???) have developed partnerships in these areas.
Step 5 is basically a good quality TTS engine. Providers like Nuance, Ivona, AT&T, NeoSpeech, and Acapela all have nice TTS engines, and I believe Apple, Microsoft and Google all have in-house solutions as well!
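The five steps described in this post can be sketched as a simple chain of stages. This is an illustration only: every function is a toy stand-in, the "audio" is already text, and the trigger phrase "hello device", the "find" command, and the fake search index are all hypothetical. No real vendor's API is shown.

```python
from typing import Dict, Optional

TRIGGER = "hello device"  # hypothetical wake-up phrase


def wake_word_detected(audio: str) -> bool:
    # Step 1: stand-in for the always-on, on-device voice trigger.
    return audio.lower().startswith(TRIGGER)


def recognize(audio: str) -> str:
    # Step 2: stand-in for the larger (often cloud-based) recognizer.
    return audio[len(TRIGGER):].strip(" ,")


def understand(text: str) -> Dict[str, str]:
    # Step 3: stand-in for "understanding meaning".
    if text.lower().startswith("find "):
        return {"intent": "search", "query": text[5:]}
    return {"intent": "unknown", "query": text}


def search(query: str) -> str:
    # Step 4: stand-in for a search engine, backed by a fake index.
    fake_index = {"coffee near me": "Three coffee shops found nearby."}
    return fake_index.get(query.lower(), "No results.")


def speak(answer: str) -> str:
    # Step 5: stand-in for a TTS engine.
    return f"[spoken] {answer}"


def handle(audio: str) -> Optional[str]:
    if not wake_word_detected(audio):
        return None  # nothing after Step 1 runs without the trigger
    meaning = understand(recognize(audio))
    if meaning["intent"] != "search":
        return speak("Sorry, I did not understand.")
    return speak(search(meaning["query"]))


print(handle("Hello device, find coffee near me"))
print(handle("find coffee near me"))
```

The point of the sketch is the ordering: without the Step 1 trigger, none of the later and more expensive stages is ever invoked.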
The important point in comparing Sensory’s technology is that we provide the logical entryway to a successful Voice User Interface experience, with a lightning-fast voice trigger that replaces tactile button presses. It is a given that noise immunity and extremely high accuracy are also required, and TrulyHandsfree accomplishes this without requiring a prohibitive amount of power to function reliably and consistently.
AND…while we appreciate the comparison to the most profitable company on the planet, we’d like to focus on what we do better…making Truly Hands-Free really mean TrulyHandsfree™.
A Tale of Two Awards
August 5th, 2011
I recently learned about two awards that Sensory has won over the past year. The contrast is in how we learned about them, and in the different nature of these awards. It’s really amusing, so I thought I’d share my take.
Both awards were for our TrulyHandsfree™ Voice Control. One was for the significance of Sensory’s truly hands-free trigger in implementing speech recognition without using buttons, and the other was for Sensory’s chip-based implementation of a truly hands-free interface.
The first award came from Speech Technology Magazine. Sensory won their Star Performer award for 2011, and I didn’t even know we had been nominated. In fact, nobody ever told me that we had won; I found out really by chance (thanks, Bernie!). They only gave out four of these awards this year, and I’m honored and thrilled that Sensory won one of them. It’s really a testament to our team behind TrulyHandsfree… IT’S THE MOST AMAZING TECHNOLOGY. I sent kudos to Speech Tech for having the insight to understand the significance of this technology! Speech Technology Magazine has gotten so independent and non-self-serving in their awards process that they didn’t even take the opportunity to call us and let us know! Now we know, so thanks again, Speech Tech!
In contrast…The second award came from a market research firm I’ll call the Cold Irishman. Why don’t I use their real name? Well, I can’t, or they might sue me. I received a call from their “Manager of IP and Copyrights” to congratulate me, and to let me know about their thoroughly independent and fair process that looked at the entire speech market and decided that Sensory stood out… blah blah blah…
I knew there was something funny going on by the guy’s title. Yeah, you guessed it. To be able to tell people we won their award costs a certain price; the more you want to use it, the more you pay, and you can even pay more to go to an awards banquet. He offered me programs for as little as $10K, which went up in price to WAY more than that. One of the more expensive programs was that they’d make a video of us receiving the award, with lots of praise from their esteemed analysts. So, I decided to go onto YouTube and see for myself how many hits last year’s award winners were getting…my memory said low double digits, but that didn’t seem possible (Sensory’s little home-made videos often get thousands of hits). Just for fun I looked just now at this year’s award winners – one of them had only 10 (yes TEN) hits. Most of those must have come from employees… Pretty hefty price to stroke your own ego and get almost nothing in return! I’ve always wondered who pays to be in Whoever’s Whatever? It’s probably the same CEOs that pay to go to award dinners!
So…Many Thanks to Leonard Klie and Speech Technology Magazine…and Cold Irishman…thanks, but no thanks! Sensory deserves recognition for innovation in speech technologies based on our hard work, not on how much we pay to market it.
Truly Handsfree™ Trigger Technology Taking Over Sensory!
February 24th, 2011
I haven’t had much time to blog lately, and you may have noticed that when I do, I often write about our revolutionary new Truly Handsfree™ Trigger speech technology. Technically it’s a phrase-spotting technology, but Sensory is using a revolutionary new multi-patent-pending approach that’s changing the way we do speech recognition. The Truly Handsfree™ Trigger doesn’t use typical techniques like background noise modeling or speech detection (i.e., finding where speech starts and ends). In operation, it ends up being MUCH more noise robust, yet still very efficient, as it consumes less current than it would if we also included all the traditional approaches. The basic idea is that it’s on and listening all the time, and able to reject all of the wrong words and correctly identify the right words! This eliminates the need for activation via button pressing.
A lot of companies are using our technology now as a voice trigger for other speech recognition applications. At the recent Mobile World Congress, Samsung introduced the first Truly Handsfree Smartphone, the Galaxy S II, which uses a Truly Handsfree™ Trigger followed by the Vlingo experience. You say “Hey Galaxy” and it wakes up, no touching necessary! I tried this on the noisy showroom floor at Mobile World Congress, and it nailed my “Hey Galaxy” every time, even from 5 feet away!
Chris Schreiner over at Strategy Analytics recently tried out an early beta demo for Android, and in a blog late last year he said, “In a demo experience on my Android phone, the hands-free trigger worked remarkably well with varying types of background noise.”
With Truly Handsfree™ Trigger’s noise-robust nature and its ability to be always on and listening, we are able to do more natural language-like schemes. A couple of great examples are in the toy space (and we do love toys at Sensory!)
- I mentioned Hallmark in my last blog…now they are rolling out a whole new product line built with Sensory chips because of the huge success of Jingle, the Husky Pup.
- Mattel has pushed us to deploy this phrase-spotting technology even in our lowest-cost, entry-level processor. They have a new product line coming out this year called Fijit that’s sure to be a BIG HIT. The Fijits are these cute wiggly characters with amazing skin, and they do the TOUGHEST speech recognition feats ever. They listen for a bunch (30??) of short key words like “hungry” so you can say a variety of things to them (Like…Hungry?…I’m Hungry…Are you Hungry?) and they can intelligently respond and interact. (Actually I don’t know if “Hungry” is one of their actual words; that’s for example only.) SpeechTech just did a nice summary on Fijit Friends in their blog, and Mattel has some nice YouTube videos and websites where you can learn all about Fijits.
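The keyword idea in the Fijit example can be sketched in a few lines. This is a text-only illustration of spotting one short keyword inside differently phrased utterances; the real product works on audio, and its method is not described here. The keyword table, the responses, and the function names are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword-to-response table; the toy's real vocabulary is not known here.
RESPONSES = {
    "hungry": "Let's eat!",
    "dance": "Watch me wiggle!",
}


def spot_keyword(utterance: str) -> Optional[str]:
    """Return the first known keyword found in the utterance, if any."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for word in words:
        if word in RESPONSES:
            return word
    return None


def respond(utterance: str) -> Optional[str]:
    keyword = spot_keyword(utterance)
    return RESPONSES[keyword] if keyword is not None else None


for phrase in ["Hungry?", "I'm hungry", "Are you hungry?", "What a nice day"]:
    print(phrase, "->", respond(phrase))
```

All three "hungry" phrasings map to the same response, while the unrelated sentence is rejected, which is what lets a small keyword list feel like a more natural conversation.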
So what’s happening here at Sensory is that this technology initially invented as a trigger is migrating into being an amazingly noise-robust speech solution for any command and control application! It’s nominated for awards by MobileTrax in both the Speech Processing and Software Technology innovation categories!
Sensory has developed a whole product roadmap around our new approach, and this includes speaker-adaptive recognition, larger vocabulary solutions, improvements in accuracy, and consumer-created triggers. A funny thing about consumer-created triggers…Our initial release was NOT INTENDED for this, but one of our customers, Adelavoice, did a few tricks and allowed end users to create their own triggers. Know what’s the most common trigger phrase?? “Yo Bitch”…I guess that says something about the demographic of the user base!
OK…I could go on and on about this new phrase spotting technology, but I gotta get some real work done!