Top 10 Consumer Electronic Products with Speech Recognition October 7th, 2013
- Radio Rex. There’s always something special about the first one - this was from almost 100 years ago! Rex was a toy dog that lived in a doghouse, and the waveform from calling his name would vibrate a spring at a certain frequency that would make Rex exit the doghouse. Basically, a mechanical speech recognition device!
- Radar the Robot. Sure, this list will be highly biased with products that used Sensory technology. Fisher Price released Radar the Robot back in 1995! Radar would talk to kids, sing songs with them, do math games, word games, and much, much more. I remember one of my kids walking into my room and speaking in a robotic voice to imitate Radar, “I’m sorry, I can’t hear you. Would you like to play word games? Please say yes or no.”
- Password Journal. Not only is this the bestselling girls’ electronic product of all time, but it uses voice biometrics as a key feature (to lock a diary). I once heard that half of all 11-year-old girls in the US have a diary and their top concern is that someone unintended will open it and read it. This product was so successful that Girltech, the company Sensory worked with, was acquired by Radica, who was then acquired by Mattel. Most new toy introductions have a 1-2 year life. This product, and its many revisions, has been on the market for over 15 years!
- Voice Signal and VOS light switches. Voice Signal Technologies was a company started around 1995 to build voice controlled light switches. They got so excited about speech technology that they successfully transitioned into a leader in embedded speech (they went from Sensory’s customer to competitor!), and were eventually sold to Nuance for just under $300M! Sensory’s customer VOS also made light switches. VOS even introduced a Star-Trek branded light switch and licensed Majel Roddenberry’s voice. Computer Lights On!
- Uniden Voice Dial. I’ll never forget the thrill of landing in Las Vegas for CES, and going down the escalator into the baggage claim area and seeing a HUGE sign saying “Uniden Introduces VoiceDial.” The phones worked great. They even ran a TV commercial featuring the famous sumo wrestler Konishiki saying “Pizza-man.”
- Moshi Clock. What a great clock! You could set the alarm or time just by speaking to it. The clock would even tell you the weather. And this was pre-SIRI!!
- BlueAnt V1. BlueAnt moved two steps ahead of its competitors with the V1. It had a completely voice-driven user interface that replaced the buttons and flashing lights on a Bluetooth headset. This was probably the first consumer electronic device that enabled a full and complex VUI-based experience. And the reviews were some of the best I have ever seen.
- Apple SIRI/iPhone 4s. SIRI was an amazing breakthrough for voice recognition - not so much in the capabilities it presented, but in the marketing and brand support behind it. When Apple said the time was right for speech recognition, the world listened and consumer electronic OEMs suddenly changed!
- Google Glass. OK, it’s not shipping yet, but they have taken a VERY novel approach to speech by using what they refer to in the press as “hotword” models. We in the industry call this keyword spotting. I handed my Glass to my wife and she put it on and said “You mean I just say OK Glass? Oh now I see all these other things so I can say Get Directions to Chef Chu’s restaurant? Whoa! It’s showing me directions to Chef Chu’s!” The device throws out all the wrong words and captures the key words it wants to hear, then seamlessly switches to a cloud-based recognizer.
- Motorola MotoX. 15M plus views for a TV commercial featuring voice control!!! And the users LOVE it! Touchless Control is one of the best reviewed apps in the GooglePlay store!
CES 2013 January 15th, 2013
I’ve been going to CES for about 30 years now. More than half of that has been with Sensory selling speech recognition. This year I reminisced with Jeff Rogers (Sensory’s VP Sales who has been at Sensory almost as long as me) about Sensory’s first CES back in 1995 where we walked around with briefcases that said “Ask Me About Speech Recognition for Consumer Electronics”. A lot of people did ask! There’s always been a lot of interest in speech recognition for consumer electronics, but today it goes beyond interest…it’s in everything from TVs to cars to Bluetooth devices…and a lot of that is with Sensory technology. Often we are paired with Nuance, Google and increasingly AT&T as the cloud speech solution, while Sensory is the client.
In 2013, Sensory counted about 20 companies showing its technology on the floor or in private meeting rooms. An increasing percentage of our products are now connected to the cloud and using client/cloud speech schemes. Here’s just a short summary of some of the new things here at the show:
BlueAnt, Bluetrek, Drive and Talk, Monster Cable, Motorola, Plantronics, all showed products using Sensory’s BlueGenie speech technologies for Bluetooth devices. I noticed Plantronics won a show award for one of their new devices with Sensory technology. This market seems to have flattened and stopped growing, and Sensory is lucky to be working with the leaders who appear to be gaining in marketshare against their competition…correlation or causation?? Our customers in this segment introduced a dozen or more new products ranging from carkits to headsets to Bluetooth speaker systems.
Conexant announced their new DSP CX20865 running Sensory’s TrulyHandsfree and gave demos in their suite at the LVH.
Tensilica announced their new HIFI Mini and gave some of the best demos on the showroom floor of speech recognition (Sensory’s of course!) working in adverse noise conditions at ultra low power.
QNX showed off their beautiful Bentley concept car with built in graphics and speech recognition including Sensory’s TrulyHandsfree Voice Control paired with AT&T’s cloud based Watson ASR engine.
Visteon – Did some pretty neat demos that we can’t discuss other than to say they featured Sensory’s TrulyHandsfree Voice Control! The car companies love us because WE WORK in noise!
Samsung had a huge booth showing Galaxy products (Note, S3, etc.) using Sensory’s TrulyHandsfree triggers as a part of the S-Voice system.
VTech showed a variety of phone products using Sensory technologies including our micro-TTS solutions for caller ID.
IVEE paired a Sensory IC for local command and operation with the AT&T cloud recognizer to create a very impressive demo that got nice coverage on NPR! (scroll down to “heard on the air”)
Behind closed doors – around half a dozen other companies showed cool new things in private suites. Unfortunately I can’t discuss these, but I will say that 2013 will see some major product releases with interesting user experiences and Sensory will be very proud to be a part of these!
My favorite non Sensory things – Yeah the 4K/8K TVs were pretty amazing. Crisper than real life, which doesn’t seem possible but it’s true. The new 3D printers and services to make hardware prototypes are amazing (why isn’t HP dominating this market???). But…my favorite stuff is robotics. There was a robot glass cleaner that climbs vertically around windows and cleans them off without falling. Kinda like a Roomba for windows. I met some hacker guys who, as a hobby, make giant servo/mechanical/electro robot snakes and creatures they can ride in. Think MadMax/Burning Man kinds of artistic technology. I have some neat videos of this I’ll send anyone who wants them.
Follow the Leader in Mobile October 2nd, 2012
I really enjoyed reading this article interviewing Vlad Sejnoha, Nuance’s CTO. Most people would consider Nuance the leader in speech recognition today, and Vlad is certainly a very smart, thoughtful, and articulate man.
I enjoyed it for a few different reasons. The first and main reason I liked the article is it helps to push the idea Sensory has been championing for the past several years that devices don’t have to be touched to enable voice commands, and that you should be able to just start talking to things like we talk to each other. That’s what Sensory calls TrulyHandsfree, and it’s the technology that showed up in the first Bluetooth carkit that requires no touching (by BlueAnt) AND the first mobile phones that responded to voice without touch (Samsung’s Galaxy SII and SIII and Note – check out this video from Samsung and this one, also from Samsung). Even hit toys like Mattel’s award winning Fijit Friends and Hallmark’s Interactive Books use this unique technology that just works when you talk to it. In fact, it really was the TrulyHandsfree feature that made Vlingo so popular, as this Vlingo video nicely states in its comparison between Vlingo and Siri. (Nuance bought Vlingo earlier this year, but the Sensory TrulyHandsfree didn’t come with it!).
The article says “Sejnoha believes that within a year or two you’ll be able to talk to your smartphone even as it lies idle on a desk, asking it questions such as, “When’s my next appointment?” The phone will be able to detect that you are speaking, wake itself up, and accomplish the task at hand.” Check out this Sensory video…this is definitely what Vlad is talking about! Yeah, we can do it today, and it’s REALLY FAST and really accurate.
But is it low power? Well that’s ABSOLUTELY KEY. That’s why Sensory partnered with Tensilica. Tensilica is a leader in low power audio DSPs for mobile phones. Sensory already has its TrulyHandsfree running on chips that run under 5 mW for a COMPLETE audio system. And that’s without having to wake up to understand the task at hand. We can drop by another 1-2 mW by not being always on, but turning the recognizer off doesn’t do much. That’s because even if the full recognizer is shut down, you still need to run a mic and preamp, which drives a lot of the current consumption when you have a low power recognizer like TrulyHandsfree (it can run on as little as 7 MIPS!). This means it’s REALLY critical to have a low power recognizer as well, and that’s Sensory’s forte. We are expecting that by next year we will have systems running at 1-3 mW!
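To put those milliwatt figures in perspective, here is a rough back-of-the-envelope sketch. The battery size (1500 mAh at 3.7 V) is purely an assumption for illustration, and the function name is made up; the calculation ignores every other load on the phone and only shows how much of such a battery an always-listening audio system would use per day.

```python
def daily_drain_percent(draw_mw, battery_mah, battery_volts):
    """Percent of a battery used in 24 hours by a constant draw.

    Idealized: assumes a constant draw and ignores all other loads,
    conversion losses, and battery aging.
    """
    battery_mwh = battery_mah * battery_volts   # total energy in mWh
    used_mwh = draw_mw * 24.0                   # energy used in one day
    return 100.0 * used_mwh / battery_mwh


# Hypothetical battery: 1500 mAh at 3.7 V (an assumption, not a spec).
for draw in (5.0, 3.0, 1.0):
    pct = daily_drain_percent(draw, 1500.0, 3.7)
    print(f"{draw:.0f} mW always on -> {pct:.2f}% of the battery per day")
```

Under those assumptions, a 5 mW system uses roughly 2% of the battery per day, and a 1 mW system well under 1%.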
The article mentions “persistent” listening, but even though I’ve always preached this “always on” concept, I think what will really explode is “intelligent automatic listening”. That is, the device figures out when it needs to listen for what and turns on to listen for it. So it doesn’t always have to be on…it will just seem that way because the devices are so intelligent. For example a certain traveling speed could make a phone listen for car commands or car wake up words. An incoming call could cause the recognizer to wake up and listen for Answer/Ignore. For these to work, the device needs to run not only at very low power but also with VERY high accuracy. You don’t want to have a background conversation triggering the phone call to hang up! Accuracy is another Sensory forte! The combination of accuracy with low power consumption is a difficult mix to conquer! Sensory’s accuracy is not only in noise but also from a distance…that is when a recognizer works well with a poor S/N ratio, that means the signal can be lower (like from distance) and/or the noise can be higher.
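The “intelligent automatic listening” idea can be sketched in a few lines. This is only an illustration of the logic described above, not Sensory’s implementation: the speed threshold, the phrase lists, and all of the names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold and phrase lists, chosen only for illustration.
DRIVING_SPEED_KMH = 20.0
CAR_PHRASES = frozenset({"hello car", "navigate", "call home"})
CALL_PHRASES = frozenset({"answer", "ignore"})


@dataclass(frozen=True)
class DeviceContext:
    """What the device knows about its situation right now."""
    speed_kmh: float = 0.0
    incoming_call: bool = False


def phrases_to_listen_for(ctx):
    """Return the phrases worth listening for in this context.

    An empty set means the recognizer can stay off.
    """
    active = set()
    if ctx.incoming_call:
        active |= CALL_PHRASES      # listen for Answer/Ignore while ringing
    if ctx.speed_kmh >= DRIVING_SPEED_KMH:
        active |= CAR_PHRASES       # moving fast enough to be in a car
    return frozenset(active)
```

For example, `phrases_to_listen_for(DeviceContext(incoming_call=True))` returns only the answer/ignore phrases, while a phone sitting idle on a desk gets an empty set and the recognizer can stay off.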
So it’s really cool that Nuance is getting on the bandwagon behind Sensory’s innovations like TrulyHandsfree at low power. In fact after Samsung’s release of the Galaxy SII with Sensory, Nuance did come out with an “always on and listening” mobile device; for fun we quickly ported our technology onto the same phone to compare…check out this video.
Something interesting we noticed was that after Sensory announced its speaker verification and speaker ID for mobile devices at CTIA this year, Nuance shortly thereafter came out with their own announcement, but there were no demos available so we couldn’t do a comparison video.
I’ve been in the speech technology field since the beginning and I have to say, there has never been a more exciting time for this space. Recently some of the biggest names in technology have announced the integration of voice capabilities into their products. At this year’s E3 conference, Microsoft stated that the next version of its Xbox Live will include voice commands. Also, it appears Apple will integrate speech-to-text input in iOS 5. Android 2.1 already has speech-to-text built into its mobile platform. And just this week, Google announced that voice search capability is coming to the Google.com search box (how cool?!).
All of these developments will be exposing more and more mainstream users to the benefits of the voice user interface on a daily basis. Consumers demand so much from personal devices and if they expect to control them via voice, they’ll want to do so from beginning to end (no button pressing, ever). This is where Sensory comes in. Our Truly Hands-Free technology is better than anything out there and lets manufacturers add a hands-free trigger to the interface so the user can give the device a call to action without ever lifting a finger. No need to take eyes off the road to make a call from a hands-free car kit, no need to dirty up your tablet or computer by using messy (cooking) hands to call up a recipe, no need to disturb your comfortable state of rest to set an alarm clock, etc.
I can say from where I sit, many manufacturers see the value of a voice user interface that includes a hands-free trigger phrase. Expect to see the makers of automotive products, smartphones, home entertainment products and more using Sensory’s technologies in the coming year. And be sure to stay tuned for exciting enhancements and innovations in store for our Truly Hands-Free technology, as well.
The Holy Grail in Speech is Almost Here! May 6th, 2011
For far too long, speech recognition just hasn’t worked well enough to be usable for everyday purposes. Even simple command and control by voice had been barely functional and unreliable…but times, they are a changing! Today speech recognition works quite well and is widely used in computer and smart phone applications…and I believe we are rapidly converging on the Holy Grail of Speech - making a recognition and response system that can be virtually indistinguishable from a human (a really smart human with immaculate spelling skills and fluency in many languages!)
I think there are 4 important components to what I’d call the Holy Grail in Speech:
- No Buttons Necessary. OK here I’m tooting my own horn, but Sensory has really done something amazing in this area. For the first time in history there is a technology that can be always-on and always-listening, and it consistently works when you call out to it and VERY rarely false-fires in noise and conversation! This just didn’t exist before Sensory introduced the Truly Handsfree™ Voice Control, and it is a critical part of a human-like system. Users don’t want to have to learn how to use a device, open apps, and hold talk buttons! People just want to talk naturally, like we do to each other! This technology is HERE NOW and gaining traction VERY rapidly.
- Natural Language Interactions. This is a bit tricky, because it goes way beyond just speech recognition; there has to be “meaning recognition”. Today, many of the applications running on smart phones allow you to just say what you want. I use SIRI (Nuance), Google and Vlingo pretty regularly, and they are all very good. But what’s impressive to me isn’t just how good they are, it’s the rate at which they seem to be improving. Both the recognition accuracy and the understanding of intent seem to be gaining ground very rapidly.
I just did a fun test…I asked each engine (in my nice quiet office) “How many legs does an insect have?”…and all three interpreted my request perfectly. Google and Vlingo called up the right website with the question and answer…and SIRI came back with the answer – six! Pretty nice! My guess is the speech recognition is still a bit ahead of the “meaning recognition”…
Just tried another experiment. I asked “Where can I celebrate Cinco de Mayo?” SIRI was smart enough to know I wanted a location, but tried to send me off to Sacramento (sorry - too far away for a margarita!) Vlingo and Google both rely on Google search, and did a general search which didn’t seem to associate my location… (one of them mis-recognized, but not so badly that they didn’t spit out identical results!) Anyways, I’d say we are close in this category, but this is where the biggest challenge lies.
- Accurate Translation and Transcription. I suppose this is ultimately important in achieving the Holy Grail. I don’t do much of this myself, but it’s an important component to Item 2 above, and also necessary for dictating emails and text messages. When I last tested Nuance’s Dragon Dictate I was blown away by how well it performed. It’s probably the Nuance engine used in Apple’s Siri (you know, Nuance has a lot of engines to choose from!), and it’s really quite good. I think Nuance is a step ahead in this area.
- Human Sounding TTS. The TTS (text-to-speech) technology in use today is quite remarkable. There are really good sounding engines from AT&T, Nuance, Acapela, Neospeech, SVOX, Ivona, Loquendo and probably others! They are not quite “human”, but come very close. As more data gets thrown at unit selection (yes, size will not matter in the future!), they will essentially become intelligently spliced-together recordings that are indistinguishable from live performance.
Anyways, reputable companies are starting to combine and market these kinds of functions today, and I’d guess it’s just a matter of five to ten years until you can have a conversation with a computer or smartphone that’s so good, it is difficult to tell whether it’s a live person or not!
Truly Handsfree™ Trigger Technology Taking Over Sensory! February 24th, 2011
I haven’t had much time to blog lately, and you may have noticed that when I do, I often write about our revolutionary new Truly Handsfree™ Trigger speech technology. Technically it’s a phrase-spotting technology, but Sensory is using a revolutionary new multi-patent pending approach that’s changing the way we do speech recognition. The Truly Handsfree™ Trigger doesn’t use typical techniques like background noise modeling or speech detection (i.e., finding where speech starts and ends). In operation, it ends up being MUCH more noise robust, yet still very efficient as it consumes less current than it would if we also included all the traditional approaches. The basic idea is that it’s on and listening all the time, and able to reject all of the wrong words and correctly identify the right words! This eliminates the need for activation via button pressing.
A lot of companies are using our technology now as a voice trigger for other speech recognition applications. At the recent Mobile World Congress, Samsung introduced the first Truly Handsfree Smartphone, the Galaxy SII, which uses a Truly Handsfree™ Trigger followed by the Vlingo experience. You say “Hey Galaxy” and it wakes up, no touching necessary! I tried this on the noisy showroom floor at Mobile World Congress, and it nailed my “Hey Galaxy” every time, even from a distance of 5 feet away!
Chris Schreiner over at Strategy Analytics recently tried out an early beta demo for Android, and in a blog late last year he said, “In a demo experience on my Android phone, the hands-free trigger worked remarkably well with varying types of background noise.”
With Truly Handsfree™ Trigger’s noise-robust nature and the ability to always be on listening, we are able to do more natural language-like schemes. A couple of great examples are in the toy space (and we do love toys at Sensory!)
- I mentioned Hallmark in my last blog…now they are rolling out a whole new product line built with Sensory chips because of the huge success of Jingle, the Husky Pup.
- Mattel has pushed us to deploy this phrase spotting technology even in our lowest cost, entry level processor. They have a new product line coming out this year that’s sure to be a BIG HIT called Fijit. The Fijits are these cute wiggly characters with amazing skin, and they do the TOUGHEST speech recognition feats ever. They listen for a bunch (30??) of short key words like “hungry” so you can say a variety of things to them (Like…Hungry?…I’m Hungry…Are you Hungry?) and they can intelligently respond and interact. (Actually I don’t know if “Hungry” is one of their actual words, that’s for example only.) SpeechTech just did a nice summary on Fijit Friends in their blog, and Mattel has some nice YouTube videos and websites where you can learn all about Fijits.
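Here is a toy sketch of why spotting key words lets you say things many different ways. It works on text rather than audio, so it is nothing like the real recognizer; it only shows the idea of catching the key word and throwing away everything around it. The keyword list and function name are hypothetical.

```python
import re

# Hypothetical key words; the real product's vocabulary is not known here.
KEYWORDS = {"hungry", "dance", "joke"}


def spot_keywords(utterance):
    """Return the key words found in an utterance, in spoken order.

    Every word that is not a key word is simply ignored.
    """
    words = re.findall(r"[a-z']+", utterance.lower())
    return [word for word in words if word in KEYWORDS]


# Three different phrasings, one spotted key word each time.
for phrase in ("Hungry?", "I'm hungry", "Are you hungry?"):
    print(phrase, "->", spot_keywords(phrase))
```

A phrase with no key word in it, like "What time is it?", returns an empty list, which is the text-level equivalent of not false-firing.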
So what’s happening here at Sensory is that this technology initially invented as a trigger is migrating into being an amazingly noise-robust speech solution for any command and control application! It’s nominated for awards by MobileTrax in both the Speech Processing and Software Technology innovation categories!
Sensory has developed a whole product roadmap around our new approach, and this includes speaker adaptive recognition, larger vocabulary solutions, improvements in accuracy, and consumer created triggers. A funny thing about consumer created triggers…Our initial release was NOT INTENDED for this, but one of our customers, Adelavoice, did a few tricks and allowed end users to create their own triggers. Know what’s the most common trigger phrase?? “Yo Bitch”…I guess that says something about the demographic of the user base!
OK…I could go on and on about this new phrase spotting technology, but I gotta get some real work done!
Lots of Great Stuff at Sensory! December 7th, 2010
I’ve been so busy, I haven’t had much of a chance to blog, but here are some of the exciting new products I can talk about:
Last CTIA seems like yesterday, but it was 2 months ago. At the show Motorola introduced 3 new Bluetooth accessories, all of them using Sensory TTS and speech recognition. Moto has a very clever design using cloud based TTS for email reading, cloud based VR for dictation, then Sensory on the client for the “light lifting” tasks of command and control (like answering phones) and reading caller ID.
Sensory’s Truly Handsfree Trigger continues to get rave reviews and fans. Vlingo’s WONDERFUL In Car solution is using the Sensory Trigger “Hey Vlingo”; Enustech’s “Drive N Talk” solution just added a Truly Handsfree Trigger, and AdelaVoice started shipping their “StartTalking” WITHOUT Sensory, but quickly switched over to us when they tested out our Truly Handsfree Trigger. We consider it a real KUDO to win over companies like Vlingo that have some of the best speech technologists in the universe!
We know EVERY speech company on the planet is working on a Trigger WordSpot solution to compete with Sensory’s Truly Handsfree Triggers, so we challenge them all to a shootout! We’re happy to send you our stuff if you send us yours!!!!
OH HERE’S A GREAT TOY PRODUCT….Hallmark has released Jingle the Husky Pup Interactive Storybook and Story Buddy. The basic idea is a book that comes with a plush dog that interacts while you read the story. It’s an interesting product for several reasons:
- It’s a big speech recognition hit and just in time for the holidays. It has already won several awards and is selling out in many retail outlets.
- It’s from Hallmark, which is an interesting move. Hallmark is a multi-billion dollar privately held giant, of course best known for greeting cards. This successful move into high tech speech recognition toys brings them into a new market that, given the success here, will experience rapid growth.
- The speech recognition is Sensory’s new phrase spotting technology (yep, our Truly Handsfree Triggers applied in a new way.) The Jingle product marks a new use of Sensory’s technology to do MULTI-WORD phrase spotting rather than single trigger words. As the person reads the book, Jingle listens for a half dozen or more key phrases, and when those phrases are spoken, Jingle chimes in with various barks and songs.
- It’s only $24.95 at retail…pretty breakthrough pricing for an advanced speech technology product!
“Beam me to Hong Kong Scotty” October 29th, 2010
My name is Jeff Rogers, and I am Sensory’s VP of Sales and a guest writer for the blog this month. I’m a big fan of the Star Trek movies, and other than the Holodeck (which I can’t figure out why anyone would ever leave once inside), the idea of being able to just beam somewhere would be GREAT! When I started at Sensory in 1994, I would come into the office in the morning and first check the fax machine for faxes from Asia and then I would check my voice mail. The internet and email changed all this quickly, and now it’s amazing to me how much more I can accomplish and for virtually free. The iPhone came along and now I can even check and respond to emails while sitting at a red light – yeah, I know, not while driving!
The one aspect of doing business that hasn’t changed in the 16 years I have been selling with Sensory is face to face meetings; in fact, I think it’s moved in the wrong direction. With the long lines at security and reduced flights from airlines there are fewer options, planes are nearly always full, and everything just takes more time. Sure would be nice just to beam to NYC.
While some aspects of Star Trek have become reality (personal communicators for example, i.e. the cell phone) I think we’re still many years away from beaming places. So the issue is how do I travel and show all these companies that we do business with the great new technologies that Sensory engineering continues to bring out? SALES VIDEOS!
I started doing these little 1-3 minute videos showing off Sensory chips, software and technologies some time ago. I quickly found that they were a big hit with companies. I can demonstrate Sensory without leaving the comforts of my office. Customers can view these videos at a convenient time without having to schedule time for a meeting and even better, they can then forward these to others around the company. I have had several companies tell me that they post these videos on their internal sites so that product designers, marketing and others can reference them and literally see how the technology works.
Here is a video of our BlueGenie™ software running in an actual hands free kit. I had my sales manager film as I was driving. The result is a video in a car while driving with road and radio noise! I recently made a video of our NLP-5x chip and how it could be applied in a microwave oven. I once used my son to help make a video of our sound sourcing technology running on our RSC-4x chip – what you can’t see in the video is me making a motion to my son to stop talking so that I could finish the video - he was having too much fun with the sound sourcing demo. You can view all of these videos on our technologies web page.
Of course face to face meetings are still required and much can be accomplished in them, but when there’s not time and you want to show many people and companies something new, these sales videos are great! So while I can’t beam myself to Hong Kong, I can send sales videos!
Improving Signal to Noise Ratio July 14th, 2010
Dealing with a poor signal to noise ratio is one of the toughest issues in automating speech recognition. At Sensory, we develop lots of techniques so our customers’ products can sit at one end of a noisy room and still recognize a speaker at the other end of the room. Our technologists typically don’t like to implement active noise cancellation techniques because of the belief that active noise cancellation’s signal processing will strip useful information out of the speech data. Nevertheless we have a whole host of other techniques to make performance in noise work really well.
In Bluetooth® headsets we use a dual mic beamforming technology, and we’ve found that this approach improves our ability to recognize by about 7 or 8 dB. In the Bluetooth® space there are lots of noise cancellation providers, and there are many well proven techniques for removing noise.
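For anyone who doesn’t think in decibels, a quick sketch of what a 7 or 8 dB improvement means. This is just the standard decibel definition for power ratios; the function names are made up for the example.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB, given signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


def db_to_power_ratio(db):
    """Convert a gain in dB to the equivalent power ratio."""
    return 10.0 ** (db / 10.0)


for gain_db in (7.0, 8.0):
    ratio = db_to_power_ratio(gain_db)
    print(f"{gain_db:.0f} dB is a power ratio of about {ratio:.1f}x")
```

So a 7 to 8 dB gain is roughly equivalent to the recognizer coping with five to six times more noise power for the same speech level.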
What I’ve been wondering for the last few months is why those vuvuzelas are so dang loud during the World Cup broadcasts. Seems like a relatively easy task to just filter them out, or have the broadcasters’ microphones be in a silent booth.
I guess I’m not the only one that wondered about this: If you Google Vuvuzela, “filter” is one of the most common words following it, and clicking on it showed over 1.3 million listings, from hackers’ guides to products for sale.
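As a rough idea of what “just filter them out” could look like, here is a minimal sketch of a single notch filter in plain Python, using the widely published biquad notch formulas. The 233 Hz center frequency is an assumption (a vuvuzela drones at roughly that pitch), as are the Q value and all the function names. A real vuvuzela also has strong harmonics, so one notch alone would not be enough, and this is not a description of any shipping product.

```python
import math


def notch_coefficients(f0_hz, sample_rate_hz, q):
    """Biquad notch coefficients (b0, b1, b2, a1, a2), normalized so a0 = 1."""
    w0 = 2.0 * math.pi * f0_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    return (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0,
            -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)


def apply_biquad(samples, coeffs):
    """Run samples through the filter (direct form I)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def sine(freq_hz, sample_rate_hz, n):
    """n samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


fs = 8000                                        # assumed sample rate in Hz
coeffs = notch_coefficients(233.0, fs, 10.0)     # assumed drone pitch and Q
drone = apply_biquad(sine(233.0, fs, fs), coeffs)
tone = apply_biquad(sine(1000.0, fs, fs), coeffs)
# Compare levels after the filter has settled (second half of each signal).
print("drone level:", rms(drone[fs // 2:]))
print("1 kHz level:", rms(tone[fs // 2:]))
```

The tone at the notch frequency is almost entirely removed once the filter settles, while a 1 kHz tone passes through nearly untouched. The hard part in practice is the harmonics that overlap with the speech you want to keep.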
Voice Search, M&A, and the Economy May 3rd, 2010
Haven’t blogged in a long time…I have plenty to say but have just been too busy. That’s good news. Sensory is signing up new deals at a very rapid rate, so 2011 should be an excellent year for us. I declare the economic recovery in full swing (although I do have some trepidation it could be short lived). Right now my biggest issue is chip SUPPLY! We’ve actually had some trouble getting enough chips (this is endemic to the entire chip market right now!). Luckily, our software business is exploding and a growing percentage of overall revenues is not dependent on buying silicon!
The cool thing is for the first time in Sensory’s 15 year history we are putting text-to-speech into products. We’ve done a handful of deals in just the last couple of months, and I expect that within 2 years we’ll have over 10 million TTS devices that will have hit the market (we’re at around 60 million speech recognition products right now).
I went to Voice Search last week. This is the show that Bill Meisel and AVIOS co-host every year. It’s my favorite speech industry show and pretty much the only one I attend. At the show I spoke on a consumer speech panel and demonstrated Sensory’s Truly Hands-Free Voice Trigger. Nobody thinks that wordspotting can be always on and always listening without false firing - and still catch the trigger word when it’s spoken. Sensory’s spotting technology WORKS! It’s my pet technology right now and I think it will change the world, by making speech recognition TRULY HANDSFREE (that was the title of my presentation)…anyways…I demoed it live. Nobody is supposed to do live speech recognition demos because they always fail (Microsoft has had the misfortune of proving that more than once!), so most people at the conferences show video clips. I know Sensory’s stuff works well, but I got a little nervous when I started talking and I could hear the echo of the microphone, and as I spoke I was hoping it wouldn’t false trigger and totally embarrass me. It didn’t false trigger…then it just had to recognize my trigger words. It got the first and the second one right. Then on the 3rd time the small device started sliding down the podium and the mic got covered up and for a brief moment my heart froze and I thought I was going to need to repeat my trigger word…then all of a sudden I felt my heart exploding as I waited microseconds for the response…then it spoke and it got it! No false fires and 3/3 triggers accurately recognized. Oh the trials and tribulations of a speech industry veteran! The technology is great and in a car it’s nearly flawless; it was this new acoustical environment that made me nervous. It came through!!!
So…Apple acquired SIRI, Inc., an iPhone developer that supplies a personal assistant application featuring speech recognition. Cool. That means Apple is in the game - the speech game, with apparently a slightly different twist than Microsoft or Google. All 3 companies are investing in speech recognition. But Apple is doing very light investing while Google and Microsoft are HEAVILY invested. Apple apparently isn’t using any of its home grown technologies as they keep licensing Nuance…and SIRI uses a Nuance engine as well. SIRI is a voice concierge type service that uses the Nuance recognizer then throws a layer of “meaning” interpretation or “intelligence” into the process. Anyways, I’m glad Apple is taking voice control seriously…they’re gonna have a tough time catching up with Google. My take is the Google stuff works best right now. I was playing with a Nexus One phone and the recognition on it is really amazing. BING is pretty good too and has wrapped better apps around their technology in BING411.
I remember a Keynote talk 15 years ago at a speech conference titled something like “the Ever-Imminent-Never-Arriving Speech Bonanza”…well it’s finally here, and I have to thank Google and Microsoft (and Vlingo too!) for clearly taking us over the hurdle and making speech recognition accessible and usable by the masses. Now it’s time for Apple to kick in and do its part…and now that HP has acquired Palm, it will need to get in the game too. I don’t even know if HP has a speech recognition team, but if they don’t they will soon. So will Cisco. So will all the major consumer electronics and automotive companies! Our time has come!!! Speech Recognition has arrived and is working for the masses! It will just get better!