Haven’t blogged in a long time…I have plenty to say but have just been too busy. That’s good news. Sensory is signing up new deals at a very rapid rate, so 2011 should be an excellent year for us. I declare the economic recovery in full swing (although I do have some trepidation it could be short lived). Right now my biggest issue is chip SUPPLY! We’ve actually had some trouble getting enough chips (this is endemic to the entire chip market right now!). Luckily, our software business is exploding and a growing percentage of overall revenues is not dependent on buying silicon!

The cool thing is that, for the first time in Sensory’s 15-year history, we are putting text-to-speech into products. We’ve done a handful of deals in just the last couple of months, and I expect that within 2 years over 10 million TTS devices will have hit the market (we’re at around 60 million speech recognition products right now).

I went to Voice Search last week. This is the show that Bill Meisel and AVIOS co-host every year. It’s my favorite speech industry show and pretty much the only one I attend. At the show I spoke on a consumer speech panel and demonstrated Sensory’s Truly Hands-Free Voice Trigger. Nobody thinks that wordspotting can be always on and always listening without false firing - and still catch the trigger word when it’s spoken. Sensory’s spotting technology WORKS! It’s my pet technology right now and I think it will change the world, by making speech recognition TRULY HANDSFREE (that was the title of my presentation)…anyways…I demoed it live. Nobody is supposed to do live speech recognition demos because they always fail (Microsoft has had the misfortune of proving that more than once!), so most people at the conferences show video clips. I know Sensory’s stuff works well, but I got a little nervous when I started talking and I could hear the echo of the microphone, and as I spoke I was hoping it wouldn’t false trigger and totally embarrass me. It didn’t false trigger…then it just had to recognize my trigger words. It got the first and the second one right. Then on the 3rd time the small device started sliding down the podium and the mic got covered up and for a brief moment my heart froze and I thought I was going to need to repeat my trigger word…then all of a sudden I felt my heart exploding as I waited microseconds for the response…then it spoke and it got it! No false fires and 3/3 triggers accurately recognized. Oh the trials and tribulations of a speech industry veteran! The technology is great and in a car it’s nearly flawless; it was this new acoustical environment that made me nervous. It came through!!!

So…Apple acquired SIRI, Inc., an iPhone developer that supplies a personal assistant application featuring speech recognition. Cool. That means Apple is in the game - the speech game, with apparently a slightly different twist than Microsoft or Google. All 3 companies are investing in speech recognition. But Apple is doing very light investing while Google and Microsoft are HEAVILY invested. Apple apparently isn’t using any of its homegrown technologies, as it keeps licensing Nuance…and SIRI uses a Nuance engine as well. SIRI is a voice concierge type service that uses the Nuance recognizer, then throws a layer of “meaning” interpretation or “intelligence” into the process. Anyways, I’m glad Apple is taking voice control seriously…they’re gonna have a tough time catching up with Google. My take is the Google stuff works best right now. I was playing with a Nexus One phone and the recognition on it is really amazing. BING is pretty good too, and Microsoft has wrapped better apps around its technology in BING411.

I remember a Keynote talk 15 years ago at a speech conference titled something like “the Ever-Imminent-Never-Arriving Speech Bonanza”…well it’s finally here, and I have to thank Google and Microsoft (and Vlingo too!) for clearly taking us over the hurdle and making speech recognition accessible and usable by the masses. Now it’s time for Apple to kick in and do its part…and now that HP has acquired Palm, it will need to get in the game too. I don’t even know if HP has a speech recognition team, but if they don’t they will soon. So will Cisco. So will all the major consumer electronics and automotive companies! Our time has come!!! Speech Recognition has arrived and is working for the masses! It will just get better!

Todd
sensoryblog@sensoryinc.com

I was in Barcelona last month at the Mobile World Congress. Here are some of my speech-centric observations:

I went by the Microsoft booth on the first day of the show and asked when WinMobile7 would be announced. The guy on the floor acted like he had no clue what I was talking about. He wouldn’t even confirm it hadn’t been announced yet. The really ironic thing is that EVERYWHERE I went I saw Windows 7 advertisements…subways, stairs, hotel lobbies, etc. My friend Dan had a couple of corporate suites at the hotel across from the show, and asked about putting up a flier to say what floor they were on. He found out the entire hotel advertising space was taken by Microsoft! They had gotten an exclusive from the hotel.

Speaking of Dan…we’re old friends from school and decided to meet up for dinner. He said “Are you OK with a Tapas Bar?” and I said “Actually, I’m kinda hungry, if you really want to go, let’s do it after we eat.” I had made a speech recognition error…think about it.

Anyways…WinMobile 7 was announced on Day 2, and I saw some of the demos. I must say that Microsoft is taking a brave approach by completely redesigning the interface to be more focused on data (people, places) than on functions (applications, etc.). However, even with the new look and feel I didn’t hear any mention of any new speech recognition features, like um, a voice interface. I asked a guy on the floor, and he said the voice search was much improved. I like BING search, Google search and Vlingo search too, as they are all getting more useful and robust. A couple of years ago, I was trying one of these search engines to find my hotel in downtown Boston, and after 3 or 4 failed attempts on a street corner, a woman pointed down the street and said “Your hotel is just down there”. A memory flashback…a cabbie on that trip asked me what I did and I said “speech recognition.” He said “oh I’ve been trying that for years…my wife talks to me and sometimes I respond properly.” But I digress…

Back to Barcelona. I saw a nice demo of MOTONAV at the Motorola booth. With a new independent consumer-product company spun out and Sanjay Jha in charge, they really seem to have turned things around. The people on the show floor seemed very upbeat and excited about where Motorola is right now. In addition to the 23 phones they currently offer, they have new ones coming out, including the new Devour and Cliq XT, both of which are based on the Android OS. I didn’t see much new stuff in the Bluetooth space, however. They are doing PNDs (portable navigation devices) and cell phones with MOTONAV. It’s a nice voice-controlled driving application, and the speech recognition in the demo I saw worked quite well on the hard stuff (addresses, etc.), but messed up on the easy things (it was a simple 2-word set that it got wrong). Then again, small sets aren’t always easier than big ones. The Yes/No response is one of the hardest sets to get right. I heard that there are more than 50 ways to say No and almost as many ways to say Yes…like unh-unh and unh-huh (I can’t even get that right spelling it!).

The big thing missing from MOTONAV is a Truly Hands-Free Trigger. In fact, that’s what is missing from the entire cell phone industry. All these products have built-in speech recognition, but the only way to activate it is with button presses. Here’s an article I found about “The First Truly Hands-Free Phone.” HOWEVER, when you read through it you find it really requires 2 button presses…one to turn it on and a second to activate the voice recognition. Well, Sensory can get rid of one of those button presses, which is a HUGE savings for products that can be turned on and are always listening. As battery technology improves and more “smart” listening windows are deployed, Truly Hands-Free triggers will become increasingly important for all products with speech technologies.

Todd
sensoryblog@sensoryinc.com

Yeah, everyone’s writing about the new Google phone. I’ve heard various reports about it being underwhelming and in need of the marketing hype that Apple is so good at. Everybody loves to compare the iPhone with the Nexus One and talk about screen size, weight, camera capabilities, software, etc.

Here’s my 2 cents on speech recognition and Bluetooth for these devices:

Apple’s initial iPhone release had speech recognition phobia, with no factory options for implementing voice recognition commands. It was such a shocking omission that many of the mainstream reviewers even pointed it out. In various industry conversations I heard “Steve doesn’t like speech recognition”. As a result, 50 speech recognition applications quickly appeared in the App Store, and by necessity Apple soon implemented Voice Control for music and voice dialing. I assume Apple implemented Nuance technology, most likely in a local version that runs on the iPhone.

What Google’s done with the Nexus is WAY different. They are embracing speech recognition from the start, and not just implementing “me too” features. Google is pushing the boundaries by including speech recognition for dictation (text messaging, email, social networking, etc.) and mapping/GPS type functions. I remember the original Android announcements mentioned that Nuance was their speech partner, but it seems like all the big guys like to start with Nuance then switch away. My guess is that the Nexus One uses homegrown (Mike Cohen and Co.) speech recognition, and since it is server based, it should adapt and improve and just get better with the data they are collecting. I give kudos to Google for this!

On the Bluetooth side of things, we were shocked and hurt that we couldn’t use our BlueGenie Voice Interface Bluetooth headsets to easily call up recognizers on the iPhone for name dialing. Although Bluetooth defines a clear protocol for this, it wasn’t implemented on the initial iPhone. New iPhone versions do support this, but Apple never clearly thought through the importance of a cohesive user interface and functionality with Bluetooth connected to its phones, especially when speech recognition is involved.

If Google is smart, they won’t only introduce a Nexus One phone, but they’ll come out with a really cool Nexus One headset that TAKES ADVANTAGE of all the great speech recognition software on the handset, with one seamless voice user interface! The Nexus One has been blasted as nothing really new, but this type of integration with a hands-free headset or car kit could make it TOTALLY REVOLUTIONARY.

Hey Google – make a BLUEGENIE VOICE INTERFACE HEADSET!

Todd
sensoryblog@sensoryinc.com