Voice Search, M&A, and the Economy
May 3rd, 2010
Haven’t blogged in a long time…I have plenty to say but have just been too busy. That’s good news. Sensory is signing up new deals at a very rapid rate, so 2011 should be an excellent year for us. I declare the economic recovery in full swing (although I do have some trepidation it could be short lived). Right now my biggest issue is chip SUPPLY! We’ve actually had some trouble getting enough chips (this is endemic to the entire chip market right now!). Luckily, our software business is exploding and a growing percentage of overall revenues is not dependent on buying silicon!
The cool thing is that for the first time in Sensory’s 15-year history we are putting text-to-speech into products. We’ve done a handful of deals in just the last couple of months, and I expect that within 2 years over 10 million TTS devices will have hit the market (we’re at around 60 million speech recognition products right now).
I went to Voice Search last week. This is the show that Bill Meisel and AVIOS co-host every year. It’s my favorite speech industry show and pretty much the only one I attend. At the show I spoke on a consumer speech panel and demonstrated Sensory’s Truly Hands-Free Voice Trigger. Nobody thinks that wordspotting can be always on and always listening without false firing - and still catch the trigger word when it’s spoken. Sensory’s spotting technology WORKS! It’s my pet technology right now and I think it will change the world, by making speech recognition TRULY HANDSFREE (that was the title of my presentation)…anyways…I demoed it live.
Nobody is supposed to do live speech recognition demos because they always fail (Microsoft has had the misfortune of proving that more than once!), so most people at the conferences show video clips. I know Sensory’s stuff works well, but I got a little nervous when I started talking and I could hear the echo of the microphone, and as I spoke I was hoping it wouldn’t false trigger and totally embarrass me. It didn’t false trigger…then it just had to recognize my trigger words. It got the first and the second one right. Then on the 3rd time the small device started sliding down the podium and the mic got covered up and for a brief moment my heart froze and I thought I was going to need to repeat my trigger word…then all of a sudden I felt my heart exploding as I waited microseconds for the response…then it spoke and it got it!
No false fires and 3/3 triggers accurately recognized. Oh the trials and tribulations of a speech industry veteran! The technology is great and in a car it’s nearly flawless; it was this new acoustical environment that made me nervous. It came through!!!
So…Apple acquired SIRI, Inc., an iPhone developer that supplies a personal assistant application featuring speech recognition. Cool. That means Apple is in the game - the speech game, with apparently a slightly different twist than Microsoft or Google. All 3 companies are investing in speech recognition. But Apple is doing very light investing while Google and Microsoft are HEAVILY invested. Apple apparently isn’t using any of its homegrown technologies, as they keep licensing Nuance…and SIRI uses a Nuance engine as well. SIRI is a voice concierge type service that uses the Nuance recognizer then throws a layer of “meaning” interpretation or “intelligence” into the process. Anyways, I’m glad Apple is taking voice control seriously…they’re gonna have a tough time catching up with Google. My take is the Google stuff works best right now. I was playing with a Nexus One phone and the recognition on it is really amazing. BING is pretty good too and has wrapped better apps around their technology in BING411.
I remember a Keynote talk 15 years ago at a speech conference titled something like “the Ever-Imminent-Never-Arriving Speech Bonanza”…well it’s finally here, and I have to thank Google and Microsoft (and Vlingo too!) for clearly taking us over the hurdle and making speech recognition accessible and usable by the masses. Now it’s time for Apple to kick in and do its part…and now that HP has acquired Palm, it will need to get in the game too. I don’t even know if HP has a speech recognition team, but if they don’t they will soon. So will Cisco. So will all the major consumer electronics and automotive companies! Our time has come!!! Speech Recognition has arrived and is working for the masses! It will just get better!
Posted in ICs, Industry News, bluetooth
“I talk to myself but I don’t listen” – Elvis Costello
November 18th, 2009
The new Android OS doesn’t have this problem! I read about one of these devices with TTS (Text-To-Speech) built in and voice commands too, so of course I had to try one out. I put it into TTS mode where it speaks everything, hit the recognition button and it prompted “SPEAK NOW.” I said something like “Starbucks in Sunnyvale, California”…and guess what it recognized??? “SPEAK NOW.” I guess the recognizer started listening too early and heard the TTS itself saying “SPEAK NOW.”
Listening at the right time is always a challenge for speech recognizers, but in Speech Recognition 101, programmers learn to make the recognizer listen AFTER the prompt is spoken. In Speech Recognition 201, students are taught to trim the silence after the end of the speech prompt. There’s usually a silent tail on the prompt that users don’t hear, so they start speaking while the prompt is technically still playing. If the tail isn’t trimmed, those who studied only Speech Reco 101 will have the recognizer start listening too late, and the first few hundred milliseconds of the user’s speech will be clipped off.
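For the programmers in the audience, here is a rough sketch of that 101/201 lesson in Python. Everything in it is made up for illustration: the function name `audible_duration_ms`, the amplitude threshold of 500, and the synthetic "prompt" are my assumptions, not anyone's shipping code. The idea is simply to find where the audible part of the prompt ends and open the recognizer there, instead of waiting for the end of the audio file.

```python
from typing import Sequence


def audible_duration_ms(samples: Sequence[int], sample_rate: int,
                        threshold: int = 500) -> float:
    """Length of the prompt in milliseconds with its silent tail trimmed off.

    `threshold` is a hypothetical amplitude cutoff for 16-bit samples;
    anything at or below it is treated as silence.
    """
    last_audible = -1
    # Walk backwards from the end until we hit a sample louder than the cutoff.
    for i in range(len(samples) - 1, -1, -1):
        if abs(samples[i]) > threshold:
            last_audible = i
            break
    return (last_audible + 1) * 1000.0 / sample_rate


# Synthetic prompt: 1 second of "speech" followed by a 300 ms silent tail.
rate = 16000
prompt = [1000] * rate + [0] * (rate * 3 // 10)

full_ms = len(prompt) * 1000.0 / rate          # where a 101-only design starts listening
start_ms = audible_duration_ms(prompt, rate)   # where a 201 design starts listening

print(f"file ends at {full_ms:.0f} ms, audible prompt ends at {start_ms:.0f} ms")
```

With this synthetic prompt, the difference between the two numbers is the window in which a user who starts talking as soon as the prompt sounds finished would get clipped. A real system would want a smarter silence detector than a fixed amplitude cutoff, but the timing logic is the same.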
That same Android device was a Verizon product. Guess how its TTS pronounces Verizon? Well, not the way I’ve ever heard it pronounced. TTS isn’t easy, but this should be an easy fix. Someone at Google or Verizon will figure it out soon, and Nuance will probably get a call.
I heard a great NPR report the other day about the Amazon Kindle. The product is being boycotted by groups as diverse as Syracuse University, the National Federation of the Blind, and the Burton Blatt Institute for Disability Studies. The complaint is that while the Kindle offers Text-To-Speech as an option, it only reads from the books and does not provide a friendly user interface for the visually impaired. In fact, one spokesperson said that the Text-To-Speech function is just about impossible for a blind person to use. Basically, Amazon needed to offer a mode where the TTS announces any button that is pressed, which shouldn’t have added any real cost to the bottom line. Better yet, they could have added a little speech recognition so the buttons weren’t even necessary!
Posted in Industry News