Super Bowl Ads – Speech Activation Coming of Age February 18th, 2013
(…and something new from Sensory just around the corner!)
I remember watching the Super Bowl last year and seeing a BMW 3 Series commercial that I thought was interesting.
It was interesting to me because they put a motion/proximity sensor under the trunk so the user could open the trunk hands-free. The commercial highlights the benefit of hands-free access: a woman walks up with her hands full of luggage, wiggles her foot around, and the trunk pops open! Cool…except the user has to do a little one-legged dance with her hands full, and as the commercial also shows (another reason I found it interesting), other things can accidentally open the trunk, like a dog wagging its tail. Wouldn’t a hands-free voice trigger do a much better job? Especially an ultra-low-power implementation on a standalone processor with built-in speaker verification for security…sounds like a challenge for Sensory’s TrulyHandsfree approach.
Fast forward to this year’s Super Bowl, and Kia comes out with the “space babies” ad for its Sorento and the Uvo entertainment system. A kid asks his dad “where do babies come from?” and dad concocts an elaborate and humorous lie.
Then, after dad’s tall tale, the kid says “But Jake said that babies are made when mommies and daddies…” and dad quickly interrupts by saying “Uvo, play Wheels on the Bus”. The Uvo system hears dad and immediately plays the music, drowning out the kid’s question. Cool commercial and nice use of voice activation to control music while driving!
Many of Sensory’s customers have told us that they don’t want to have to say the brand name as a command word. They would really like to name their products themselves and, even better, have the products know who is talking, so that settings and controls can be customized for each user…Another job for Sensory’s TrulyHandsfree!
On February 19th we will announce TrulyHandsfree 3.0, which will enable all of the voice control scenarios I have described, for user experiences that are more customized and more secure!
Stay tuned for the details!
CES 2013 January 15th, 2013
I’ve been going to CES for about 30 years now, more than half of that time with Sensory, selling speech recognition. This year I reminisced with Jeff Rogers (Sensory’s VP of Sales, who has been at Sensory almost as long as I have) about Sensory’s first CES back in 1995, where we walked around with briefcases that said “Ask Me About Speech Recognition for Consumer Electronics”. A lot of people did ask! There’s always been a lot of interest in speech recognition for consumer electronics, but today it goes beyond interest…it’s in everything from TVs to cars to Bluetooth devices…and a lot of that is with Sensory technology. Often Sensory is the client-side recognizer, paired with Nuance, Google, and increasingly AT&T as the cloud speech solution.
At CES 2013, Sensory counted about 20 companies showing its technology on the floor or in private meeting rooms. An increasing percentage of our products are now connected to the cloud and use client/cloud speech schemes. Here’s a short summary of some of the new things at the show:
Bluetooth
BlueAnt, Bluetrek, Drive and Talk, Monster Cable, Motorola, and Plantronics all showed products using Sensory’s BlueGenie speech technologies for Bluetooth devices. I noticed Plantronics won a show award for one of its new devices with Sensory technology. This market seems to have flattened out and stopped growing, and Sensory is lucky to be working with the leaders, who appear to be gaining market share against their competition…correlation or causation??
Our customers in this segment introduced a dozen or more new products ranging from carkits to headsets to Bluetooth speaker systems.
Chip Companies
Conexant announced its new CX20865 DSP running Sensory’s TrulyHandsfree and gave demos in its suite at the LVH.
Tensilica announced its new HiFi Mini and gave some of the best speech recognition demos on the show floor (Sensory’s, of course!), working in adverse noise conditions at ultra-low power.
Automotive
QNX showed off its beautiful Bentley concept car with built-in graphics and speech recognition, including Sensory’s TrulyHandsfree Voice Control paired with AT&T’s cloud-based Watson ASR engine.
Visteon – Did some pretty neat demos that we can’t discuss, other than to say they featured Sensory’s TrulyHandsfree Voice Control! The car companies love us because WE WORK in noise!
Other
Samsung had a huge booth showing Galaxy products (Note, S3, etc.) using Sensory’s TrulyHandsfree triggers as part of the S-Voice system.
VTech showed a variety of phone products using Sensory technologies, including our micro-TTS solutions for caller ID.
IVEE paired a Sensory IC for local command and operation with the AT&T cloud recognizer to create a very impressive demo that got nice coverage on NPR! (scroll down to “heard on the air”)
Behind closed doors – around half a dozen other companies showed cool new things in private suites. Unfortunately I can’t discuss these, but I will say that 2013 will see some major product releases with interesting user experiences, and Sensory will be very proud to be a part of them!
My favorite non-Sensory things – Yeah, the 4K/8K TVs were pretty amazing. Crisper than real life, which doesn’t seem possible, but it’s true. The new 3D printers and services for making hardware prototypes are amazing (why isn’t HP dominating this market???). But…my favorite stuff is robotics. There was a robot glass cleaner that climbs vertically around windows and cleans them without falling. Kinda like a Roomba for windows. I met some hacker guys who, as a hobby, make giant servo/mechanical/electro robot snakes and creatures they can ride in. Think Mad Max/Burning Man kinds of artistic technology. I have some neat videos of this I’ll send anyone who wants them.
Posted in ICs, Industry News, Voice Control, bluetooth, consumer electronics, robotics, truly hands-free
Follow the Leader in Mobile October 2nd, 2012
I really enjoyed reading this article interviewing Vlad Sejnoha, Nuance’s CTO. Most people would consider Nuance the leader in speech recognition today, and Vlad is certainly a very smart, thoughtful, and articulate man.
I enjoyed it for a few different reasons. The main one is that the article helps push the idea Sensory has been championing for the past several years: devices shouldn’t have to be touched to enable voice commands, and you should be able to just start talking to things the way we talk to each other. That’s what Sensory calls TrulyHandsfree, and it’s the technology that showed up in the first Bluetooth carkit that requires no touching (by BlueAnt) AND the first mobile phones that responded to voice without touch (Samsung’s Galaxy S II, S III, and Note; check out this video from Samsung and this one, also from Samsung). Even hit toys like Mattel’s award-winning Fijit Friends and Hallmark’s Interactive Books use this unique technology that just works when you talk to it. In fact, it really was the TrulyHandsfree feature that made Vlingo so popular, as this Vlingo video nicely states in its comparison between Vlingo and Siri. (Nuance bought Vlingo earlier this year, but Sensory’s TrulyHandsfree didn’t come with it!)
The article says “Sejnoha believes that within a year or two you’ll be able to talk to your smartphone even as it lies idle on a desk, asking it questions such as, “When’s my next appointment?” The phone will be able to detect that you are speaking, wake itself up, and accomplish the task at hand.” Check out this Sensory video…this is definitely what Vlad is talking about! Yeah, we can do it today, and it’s REALLY FAST and really accurate.
But is it low power? That’s ABSOLUTELY KEY, and it’s why Sensory partnered with Tensilica, a leader in low-power audio DSPs for mobile phones. Sensory already has TrulyHandsfree running on chips that draw under 5 mW for a COMPLETE audio system. And that’s without having to wake up to understand the task at hand. We can drop another 1-2 mW by not being always on, but turning the recognizer off doesn’t save much. Even with the recognizer fully shut down, you still need to run a mic and preamp, and those drive much of the current consumption when you have a low-power recognizer like TrulyHandsfree (it can run on as little as 7 MIPS!). This means it’s REALLY critical to have a low-power recognizer as well, and that’s Sensory’s forte. We expect to have systems running at 1-3 mW by next year!
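A back-of-the-envelope sketch of that budget, for anyone who wants to play with the numbers. The ~5 mW total and the 1-2 mW saved by not being always on come from the paragraph above; treating that saving as the recognizer’s entire share, and the 1500 mAh / 3.7 V battery, are assumptions for illustration only, not measured Sensory figures.

```python
from dataclasses import dataclass


@dataclass
class ListeningBudget:
    """Illustrative always-listening power budget; all values in milliwatts."""
    total_mw: float       # complete audio system: mic + preamp + recognizer
    recognizer_mw: float  # assumed: the part that goes away when the recognizer is off

    @property
    def front_end_mw(self) -> float:
        # Mic and preamp keep drawing power even with the recognizer shut down.
        return self.total_mw - self.recognizer_mw

    @property
    def max_savings_fraction(self) -> float:
        # Best-case saving from turning the recognizer off entirely.
        return self.recognizer_mw / self.total_mw


def listening_hours(battery_mwh: float, draw_mw: float) -> float:
    """Hours of continuous listening at a constant draw (ignores the rest of the device)."""
    return battery_mwh / draw_mw


# Figures from the post: ~5 mW total, 1-2 mW saved by not being always on.
for recognizer in (1.0, 2.0):
    b = ListeningBudget(total_mw=5.0, recognizer_mw=recognizer)
    print(f"recognizer {recognizer} mW -> front end still {b.front_end_mw} mW, "
          f"max saving {b.max_savings_fraction:.0%}")

# Hypothetical battery: 1500 mAh at 3.7 V = 5550 mWh.
print(listening_hours(5550, 5.0))  # 1110.0 hours at 5 mW
```

Under these assumptions, switching the recognizer off saves at most 20-40% while 3-4 mW still goes to the front end, which is why a low-power recognizer plus a low-power mic path matters more than simply turning recognition off.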
The article mentions “persistent” listening, but even though I’ve always preached this “always on” concept, I think what will really explode is “intelligent automatic listening”. That is, the device figures out when it needs to listen, and for what, and turns on to listen for it. So it doesn’t always have to be on…it will just seem that way because the devices are so intelligent. For example, a certain traveling speed could make a phone listen for car commands or car wake-up words. An incoming call could cause the recognizer to wake up and listen for Answer/Ignore. For this to work, the device needs to run not only at very low power but also with VERY high accuracy. You don’t want a background conversation hanging up your phone call! Accuracy is another Sensory forte, and the combination of accuracy with low power consumption is a difficult mix to conquer! Sensory’s accuracy holds up not only in noise but also at a distance: a recognizer that works well at a poor signal-to-noise (S/N) ratio can tolerate a weaker signal (as when talking from a distance) and/or louder noise.
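One way to picture “intelligent automatic listening” is a small policy that maps device context to the phrases the recognizer should listen for, leaving it off when nothing applies. This is a hypothetical Python sketch, not Sensory’s implementation: `DeviceContext`, `active_vocabulary`, the car phrases, and the 20 MPH threshold are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real product would tune this.
DRIVING_SPEED_MPH = 20.0


@dataclass
class DeviceContext:
    speed_mph: float = 0.0
    incoming_call: bool = False


def active_vocabulary(ctx: DeviceContext) -> set[str]:
    """Pick what the recognizer should listen for; an empty set means leave it off."""
    phrases: set[str] = set()
    if ctx.incoming_call:
        # Only the short answer/ignore vocabulary while the phone is ringing.
        phrases |= {"answer", "ignore"}
    if ctx.speed_mph > DRIVING_SPEED_MPH:
        # Hypothetical car wake-up phrase and commands.
        phrases |= {"hello car", "call", "navigate", "play music"}
    return phrases


print(sorted(active_vocabulary(DeviceContext())))                    # []
print(sorted(active_vocabulary(DeviceContext(incoming_call=True))))  # ['answer', 'ignore']
```

Keeping each context’s vocabulary small is one plausible way to address the accuracy concern above, since fewer active phrases leave fewer chances for background speech to match something by accident.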
So it’s really cool that Nuance is getting on the bandwagon behind Sensory’s innovations like low-power TrulyHandsfree. In fact, after Samsung’s release of the Galaxy S II with Sensory inside, Nuance did come out with an always “on and listening” mobile device; for fun, we quickly ported our technology onto the same phone to compare…check out this video.
Interestingly, after Sensory announced its speaker verification and speaker ID for mobile devices at CTIA this year, Nuance came out with its own announcement shortly thereafter, but no demos were available, so we couldn’t do a comparison video.
Random Thoughts and Miscellaneous Videos August 29th, 2012
- Android Jelly Bean speech recognition. It’s REALLY REALLY awesome. I thought all those video comparisons with Siri must be staged, but I’ve been using it and it’s very fast, very accurate, and reasonably intelligent. My only criticism is of the marketing. First of all, where’s the Mike LeBeau video? And what’s it called? Google Now? Google Voice? Google Voice Actions? Jelly Bean Speech Recognition? None of this marketing stuff really matters…it’s a big step forward in the handset-based speech wars, and by my count it puts Android in the lead on speech technology. Can’t wait to see Apple’s next release!! I bet it will be great…and Microsoft? You spent a billion dollars on Tellme, you’ve had the biggest speech team for the longest time, what are you doing???
- One of Sensory’s technology apps guys did a really nice demo using the Sensory trigger to call up the Android Jelly Bean speech engine. Look how nicely the Sensory technology works with it to make the whole experience not only handsfree but ripping fast!
- China Mobile invested over $200M in iFlytek…WHOA!!! Really? That’s a valuation of over $1.2B. Holy smokes.
- OK, I’m a speech geek…there’s something I really like about attractive women using speech recognition on QVC (yeah, this is a Sensory chip-based product, and it works AMAZINGLY well in a live shoot)
- I’m a huge fan of Hallmark’s Interactive Storybuddies…and a ton of other fans have posted videos showing how nice these products are. Sensory’s TrulyHandsfree technology runs on an NLP chip embedded in a plush character that responds while you read a book. Now, everyone in the speech industry knows that speech recognition works better with men than women, that accents destroy recognition accuracy, and that you need to speak loudly into the mic or else the S/N will be too poor for recognition to perform. Well, watch this video of a soft-spoken, British-accented woman using a Hallmark Storybuddy to see how AMAZINGLY well the Sensory engine performs.
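To put a rough number on the “speak loudly into the mic” point, here is a small Python sketch of S/N in decibels under an idealized free-field assumption, where speech amplitude falls off as 1/distance (about 6 dB per doubling) while background noise stays constant. The speech and noise levels are made-up values for illustration; real rooms add reverberation, so treat this as a simplification.

```python
import math


def snr_db(signal_rms: float, noise_rms: float) -> float:
    """Signal-to-noise ratio in decibels, computed from RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)


def rms_at_distance(rms_at_ref: float, ref_m: float, dist_m: float) -> float:
    """Idealized free-field point source: amplitude falls off as 1/distance,
    i.e. about 6 dB quieter every time the distance doubles."""
    return rms_at_ref * ref_m / dist_m


# Hypothetical levels: speech RMS 1.0 at 10 cm, steady background noise RMS 0.05.
NOISE_RMS = 0.05
for dist in (0.1, 0.2, 0.4, 0.8):
    speech = rms_at_distance(1.0, 0.1, dist)
    print(f"{dist:.1f} m: SNR = {snr_db(speech, NOISE_RMS):5.1f} dB")
```

A soft-spoken talker is effectively starting from a lower signal level, which costs S/N in the same way as moving farther from the mic.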
Mobile Users Get it! May 30th, 2012
Sensory has had a lot of press lately. We made three big announcements at about the same time:
1) Announcing speaker verification http://www.sensoryinc.com/company/pr12_03.html
2) Announcing speaker identification http://www.sensoryinc.com/company/pr12_04.html
3) Saying Sensory is in the Samsung Galaxy S3 http://www.sensoryinc.com/company/pr12_05.html
Sensory announced these just before CTIA in New Orleans. We had a small booth at the show, and gave demos at several events (on the CTIA stage and floor, at the Mobility Awards dinner, and at the excellent Pepcom Mobile Focus event).
We got a lot of nice press from this. I was thrilled that the Speech Technology email newsletter ran our verification release as its featured lead story. One of the articles I like best, though, came out just last week, by Pete Pachal at Mashable http://mashable.com/2012/05/29/sensory-galaxy-s-iii/
This article is great for several key reasons. One is that Pete gets it. He didn’t just reprint our press release, but he added his commentary and wrapped it up in a nice story that hits some of the key issues.
However, what’s best is what the readers wrote in. I LOVE their insights and comments. Here are a few of the dialogs with my commentary attached:
Seriously??? You still need to push a button to use Siri? I’ve had the “wake with voice” option on my crusty old HTC Incredible, via VLingo inCar, for about 2 years now. Hard to believe Apple is that far behind.
My response: EXACTLY JB! In fact that crusty old HTC using Vlingo, also uses Sensory’s TrulyHandsfree approach! Vlingo was our first licensee in the mobile space.
Scott: But this is talking about OS integration instead of app integration. And as I’m sure you’ve seen on your phone, and as the article noted, wake with voice options currently use a lot of power, which means I can’t see a lot of people willing to use it.
My response: Precisely, Scott! This is why we are implementing the “deeply embedded” approach that will take power consumption down by a factor of 10! Nevertheless, users LOVE it even if it consumes power:
JB - I use it all the time and since my phone plugs into the car’s adapter, I don’t really worry at all about power usage. It’s never been a problem.
My response – Yes, Vlingo and Samsung did a very nice implementation by having an “always listening” mode, particularly useful while driving. Other approaches we expect to see in the future are intelligent sensor based approaches so the phone knows when to listen and when not to (e.g. why not have it turn on and listen whenever you start traveling past 20 MPH, etc.)
refutethis - Is there anything to prevent me from messing with another person’s phone?
Fillfill - Ha ha, imagine being in an auditorium and yelling “Hi Galaxy! … Erase Address Book! … Confirm!”
My comment – Funny! This is one of the reasons we have added speaker verification and identification features to the trigger function.
DhanB - Siri doesn’t require a button. It can be activated by lifting the phone up to your face.
Great reader responses:
Darkreaper - …..while driving? (Right! That’s illegal in California and other states!)
Tone - Yes, but with the Samsung Galaxy II, I don’t have to touch it at all. As the article states, this is crucial when you’re in a situation, such as driving. I’ve dropped the phone on the floor while driving and I was still able to send a text message, an email and place a call with it sliding around the back seat. (Bluetooth) iPhone can’t compete, sorry. :-/
…and of course the old “butt dialing” problem:
Jason - This makes me think of the old “butt dialing” problem when you sat down on your phone cause I’d much prefer a manual trigger to prevent accidental usage.
My comment: Once again, I agree with the readers. Sensory isn’t pushing to force “always listening” modes on users, we just want to allow them the choice. We strongly recommend that products have multiple options for anything that can be done by voice or touch. We believe the users should have the right and the ability to access the power of mobile devices without being forced to touch them. And if they want to turn off this ability, that is certainly their choice! We turn off our ringers (at least we should) when we enter a meeting or go to the movies. Likewise, we can turn off hands free voice control when it’s not appropriate…and with the growing presence and power of intelligent sensors, it will get easier and easier (albeit with some mishaps along the way!) for the phones to know when they should listen!
A lot of people commented about Siri. Apple isn’t stupid. They understand that hitting buttons isn’t the most convenient way to access voice control every time. That’s why there’s a sensor that activates when you lift the phone to your face (though that still requires touch), and it’s also why Siri can speak back. Apple pushed the Voice User Interface forward with Siri…Samsung pushed it further with TrulyHandsfree wake-up. There will be a lot of back and forth over the coming years, and voice features will remain a major battleground.
As devices get increasing utility WITHOUT touching the phones (e.g. remote control functions, accessing and receiving data by voice, etc.), the need for a TrulyHandsfree approach will grow stronger and stronger, and Sensory will continue to have the BEST solution – More Accurate, Lower Power, Faster Response Times, and NOW with built in speaker verification or speaker ID!
Lurch to Radar – Advancing the Mobile Voice Assistant March 8th, 2012
A couple of TV shows I watched when I was a kid have characters that make me think of where speech recognition assistants are today and where they will be going in the future.
Lurch from the Addams Family was a big, hulking, slow-moving, and slow-talking Frankenstein-like butler who helped out Gomez and Morticia Addams. Lurch could talk, but he would also emit quiet groans that seemed to have meaning to the Addamses. According to Charles Addams, the cartoonist and creator of the Addams Family (from Wikipedia):
“This towering mute has been shambling around the house forever…He is not a very good butler but a faithful one…One eye is opaque, the scanty hair is damply clinging to his narrow flat head…generally the family regards him as something of a joke.”
Lurch had good intentions but was not too effective.
Now this may or may not seem like a fair way to characterize today’s voice assistants, but there are quite a few similarities. For example, many of the Siri features that editorials seem to focus on and enjoy are the premeditated “joke” features, like asking “where can I bury a dead body?” or “What’s the meaning of life?” These and many other questions get humorous, pseudo-random lookup-table responses that have nothing to do with true intelligence or understanding of semantics. Many complaints about today’s voice assistants are that they often don’t “understand” and simply run an internet search….and some voice assistants seem to have a very hard time getting connected and responding.
The Addams family called on Lurch by pulling a giant cord that hung quite obtrusively in the middle of the house. Pulling this cord to ring the bell for Lurch was an arduous task that made having Lurch assist very cumbersome. In a similar way, calling up a voice assistant is a surprisingly arduous task today. Applications typically need to be opened and buttons need to be pressed, which, quite ironically, defeats one of the key benefits of a voice user interface: not having to use your hands! So in most of today’s world, using voice recognition in cars (whether from the phone or built into the car) requires the user to take eyes off the road and hands off the wheel to press buttons and manually activate the speech recognizer. That’s definitely more dangerous, and in many locales it’s illegal!
Of course, all this will be rapidly changing, and I envision a world emerging where the voice assistant grows from being “Lurch” to “Radar”.
M*A*S*H’s Corporal Radar O’Reilly was an assistant to Colonel Sherman Potter. He’d follow Potter around, and whenever Potter wanted anything, Radar was there with it…sometimes even before he asked for it. Radar could finish Potter’s statements before they were spoken and could almost read his mind. Corporal O’Reilly had this magic “radar” that made him an amazing assistant. He was always around and always ready to respond.
The voice assistants of the future could end up much like Radar O’Reilly. They will learn their users’ mannerisms, habits, and preferences. They will know who is talking by the sound of the voice (speaker identification), and sometimes they may even sit around “eavesdropping” on conversations, occasionally offering helpful ideas or displaying offers before they are even asked for help. The voice assistants of the future will adapt to the user’s lifestyle, aware not just of location but of pertinent issues in the user’s life.
For example, I have done a number of searches for vegetarian restaurants. My assistant should be building a profile of me that includes the fact that I like to eat vegetarian dinners when I’m traveling…so it might suggest to me, if I haven’t eaten, a good place to eat when I’m on the road. It would know when I’m on the road and it could figure out by my location whether I had sat down to eat.
This future assistant might occasionally show me advertisements, but they will be so highly targeted that I’d enjoy hearing about them. In a similar way, Radar sometimes made suggestions to Colonel Potter to help him with his daily life and challenges!
Thank you SIRI! January 27th, 2012
Lots of thoughts…no time to share them…so I’ll be brief in a few different areas:
- Thank you SIRI! Now every CE company must have speech technology. How the world has changed! After 18 years of Sensory being one of the only speech companies focused on consumer electronics, now everyone is doing it!
- What’s really weird is the number of chip companies and investment bankers that have been popping up on our doorsteps since SIRI shipped. Companies do move in herds!
- Nuance buys Vlingo. Full disclosure…Vlingo is Sensory’s partner (we’ll see what happens after the deal closes.) How much was paid? (Rumor I keep hearing is the highway that runs near my house…) Why did they pay so much? (because they can, to end the personal lawsuit, to end the other lawsuits, to prevent market share from eroding, NOT to grow their technology base!)
- Speaking of Vlingo, I really like that their newsletter and videos imply they are better than SIRI because they have “more hands-free functionality”…that’s TrulyHandsfree by Sensory!
- And what about the Justice Department’s investigation of Nuance? (Don’t they have better things to do with our taxes these days?) Nuance/Vlingo’s position seems to be all about fighting Microsoft, Google, etc., which has some merit, but if it doesn’t run Android or Windows Phone, who ya gonna call? Nuance will always be on the list.
- Sensory news…
- Yeah! Our TrulyHandsfree is in Samsung’s Galaxy Note, introduced at CES!
- Monster Cable showed a cool product at CES with TrulyHandsfree™ inside…they were kind enough to invite the Sensory crew to see Chicago. GREAT CONCERT! I think there were another 20-30 or so products on the CES floor with Sensory inside!
- We also just got nominated for a Global Mobile Award at the Mobile World Congress.
- And who says there’s a recession still going on? Our chip-based product sales are going through the roof! The success of our IC product line is also based on TrulyHandsfree because it enables a quasi-natural language interface.
- Where in the world is Majel???? Sensory did a voice-controlled light switch a few years back with a company called VOS Systems. They licensed the Star Trek brand, used “Computer” as the voice trigger to control the lights, and even licensed Majel Roddenberry’s voice…pretty cool!
Posted in ICs, Industry News, Interactive Toys, consumer electronics, truly hands-free
A New Voice in My House! October 26th, 2011
I started Sensory back in 1994. Since then, Sensory has put speech technologies into many hundreds of different consumer products. I have taken home many of these products to test out on my family and see what everyone thinks.
A strange and wonderful thing happened last week…I heard our phone ringing and a voice spoke out saying “incoming call from Joe Smith” (no, it really wasn’t Joe Smith…). Anyway, the really cool thing was that I recognized the voice telling me who was calling. It was Sensory’s Micro Text to Speech engine.
It turns out my wife had gotten tired of the old cordless phones in our house and had gone out and bought a new AT&T system. Unbeknownst to her, she had purchased AT&T products that use Sensory’s Micro-TTS technology to announce the Caller ID.
Text to speech tends to be one of those technologies that the more memory you throw at it, the better it sounds. That’s because the best sounding TTS engines use “snippets” of real human recordings, and the more memory allowed, the more and bigger and more precise “snippets” can be used. I use the non-technical term “snippet” generically because different approaches use different sound units, ranging from diphones to even whole word or multi-word recordings.
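To make the “snippets” idea concrete, here is a toy Python sketch of concatenation, assuming 16-bit PCM units keyed by hypothetical diphone names with made-up sample values. It is not how Sensory’s or any production engine works (real systems choose among many candidate units and smooth the joins); it only shows why memory grows with the number and length of stored units.

```python
from array import array


def synthesize(units: list[str], inventory: dict[str, array]) -> array:
    """Naive concatenation: append the stored recording for each unit in order."""
    out = array("h")  # signed 16-bit samples
    for name in units:
        out.extend(inventory[name])  # raises KeyError for a unit that was never recorded
    return out


def inventory_bytes(inventory: dict[str, array]) -> int:
    """Memory needed to store every recorded unit."""
    return sum(len(samples) * samples.itemsize for samples in inventory.values())


# Toy inventory with made-up sample values standing in for recorded diphones.
inventory = {
    "h-e": array("h", [0, 120, 340, 200]),
    "e-l": array("h", [180, 90, -40]),
    "l-o": array("h", [-60, -150, -90, 0]),
}
speech = synthesize(["h-e", "e-l", "l-o"], inventory)
print(len(speech), inventory_bytes(inventory))  # 11 22 (with 2-byte samples)
```

Swapping diphones for whole-word or multi-word recordings makes each entry longer and the inventory larger, which is exactly the memory-for-quality trade described above.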
For TTS to get really, really small, another approach is needed. Storing all those sounds takes megabytes of memory, and that added cost can have too big an effect on the price of a low-cost consumer product. Sensory’s “micro-TTS” uses about 250K Bytes of total memory…that’s for the technology engine AND all the synthesized sound data. That is about 1000 times smaller than some of today’s high-end engines!
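A quick worked check of what those sizes mean, as a Python sketch. The 250 KB and 1000x figures come from the paragraph above; reading “250K” as 250,000 bytes and measuring against uncompressed 16 kHz, 16-bit mono audio are assumptions. Real engines compress their data, so raw PCM is only a rough yardstick.

```python
MICRO_TTS_BYTES = 250_000  # "about 250K Bytes" for engine plus data (decimal KB assumed)
SIZE_RATIO = 1_000         # "about 1000 times smaller"


def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Storage for one second of uncompressed mono PCM audio."""
    return sample_rate_hz * bits_per_sample // 8


high_end_bytes = MICRO_TTS_BYTES * SIZE_RATIO
rate = pcm_bytes_per_second(16_000, 16)  # assumed 16 kHz, 16-bit mono

print(high_end_bytes)                          # 250000000, roughly 250 MB
print(MICRO_TTS_BYTES / rate)                  # 7.8125 seconds of raw audio
print(round(high_end_bytes / rate / 3600, 2))  # 2.17 hours of raw audio
```

In other words, the whole micro-TTS budget would hold under 8 seconds of raw recorded speech, so a recording-heavy approach simply does not fit; a fundamentally different synthesis approach is required.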
TTS has become an important area of investment for Sensory, and today many products on the market use Sensory’s Micro-TTS, including products from AT&T, VTech, Motorola, BlueAnt, and others. Who knows…we may already be talking in your house too!
Posted in consumer electronics, tts