Top 10 Consumer Electronic Products with Speech Recognition October 7th, 2013
- Radio Rex. There’s always something special about the first one - this was from almost 100 years ago! Rex was a toy dog that lived in a doghouse, and the waveform from calling his name would vibrate a spring at a certain frequency that would make Rex exit the doghouse. Basically, a mechanical speech recognition device!
- Radar the Robot. Sure, this list will be highly biased toward products that used Sensory technology. Fisher-Price released Radar the Robot back in 1995! Radar would talk to kids, sing songs with them, play math games and word games, and much, much more. I remember one of my kids walking into my room and speaking in a robotic voice to imitate Radar, “I’m sorry, I can’t hear you. Would you like to play word games? Please say yes or no.”
- Password Journal. Not only is this the bestselling girls’ electronic product of all time, but it uses voice biometrics as a key feature (to lock a diary). I once heard that half of all 11-year-old girls in the US have a diary, and that their top concern is that someone else will open it and read it. This product was so successful that Girltech, the company Sensory worked with, was acquired by Radica, which was then acquired by Mattel. Most new toy introductions have a 1-2 year life. This product, and its many revisions, has been on the market for over 15 years!
- Voice Signal and VOS light switches. Voice Signal Technologies was a company started around 1995 to build voice-controlled light switches. They got so excited about speech technology that they successfully transformed themselves into a leader in embedded speech (they went from Sensory’s customer to competitor!), and were eventually sold to Nuance for just under $300M! Sensory’s customer VOS also made light switches. VOS even introduced a Star Trek-branded light switch and licensed Majel Roddenberry’s voice. Computer, Lights On!
- Uniden Voice Dial. I’ll never forget the thrill of landing in Las Vegas for CES, and going down the escalator into the baggage claim area and seeing a HUGE sign saying “Uniden Introduces VoiceDial.” The phones worked great. They even ran a TV commercial featuring the famous sumo wrestler Konishiki saying “Pizza-man.”
- Moshi Clock. What a great clock! You could set the alarm or time just by speaking to it. The clock would even tell you the weather. And this was pre-SIRI!!
- BlueAnt V1. BlueAnt moved two steps ahead of its competitors with the V1. It had a completely voice-driven user interface that replaced the buttons and flashing lights on a Bluetooth headset. This was probably the first consumer electronic device that enabled a full and complex VUI-based experience. And the reviews were some of the best reviews I have ever seen.
- Apple SIRI/iPhone 4s. SIRI was an amazing breakthrough for voice recognition - not so much in the capabilities it presented, but in the marketing and brand support behind it. When Apple said the time was right for speech recognition, the world listened, and consumer electronics OEMs suddenly changed their tune on speech!
- Google Glass. OK, it’s not shipping yet, but they have taken a VERY novel approach to speech by using what they refer to in the press as “hotword” models. We in the industry call this keyword spotting. I handed my Glass to my wife, and she put it on and said, “You mean I just say OK Glass? Oh, now I see all these other things, so I can say Get Directions to Chef Chu’s restaurant? Whoa! It’s showing me directions to Chef Chu’s!” The device throws out all the wrong words and captures the keywords it wants to hear, then seamlessly switches to a cloud-based recognizer.
- Motorola MotoX. Over 15M views for a TV commercial featuring voice control!!! And the users LOVE it! Touchless Control is one of the best-reviewed apps in the Google Play store!
Secretive Customers and Partners October 3rd, 2013
It’s a very exciting time here at Sensory. Voice recognition is gaining steam with consumers, and we’re seeing Sensory’s technology in more devices and products than ever before. I absolutely love turning on the TV and seeing a commercial for a Sensory-enabled product! Unfortunately, we can’t always divulge when Sensory technology is helping to power the speech technology in an electronic device.
So it’s always extra special when one of our partners wants to tell the world that we are working together. Our IP partner CEVA did just that when they surprised us the other day with this AWESOME video that features Sensory’s TrulyHandsfree technology. It was such a great surprise to see this video, and we’re thrilled that CEVA is as excited about our technology as we are!
Have I told you how much I love those Moto X ads? September 27th, 2013
I think everybody in the speech industry must know about Motorola’s touchless control feature. Their ad campaign using comedian/actor TJ Miller has been a smashing success. Although their ads started off a bit racy (“touch each other not phones”), the switch to Miller introduced the “lazy phone guy” (which appears to be a knock on Apple) and better showcases key features and advantages of Moto X. The big advantage is in the low power speech activation technology that calls up Google Now without touching the phone!
The lazy phone campaign has ads for each of the device’s key features – Touchless Control, Quick Capture, Active Notifications, and the “Design It Yourself” concept. They are all entertaining, but it’s the touchless control that brings the most laughs. The first video went viral with over 15 million views, making it one of the most popular mobile phone ads ever.
Here’s the new touchless control ad. It’s pretty funny, with hundreds of thousands of views and growing!
Hello MOTO! September 24th, 2013
Motorola, which just happens to be a Sensory customer, launched a suite of new phones, including Moto X and three Droids (Maxx, Ultra, and Mini), all with this awesome feature called “touchless control.” “Touchless control” uses technology that wakes up the phone by voice from a low-power state, so the phone is always on and listening. Sorta like TrulyHandsfree! It links into Google Now, so you can control pretty much anything and access information without touching the phone.
- Moto launched an advertising campaign around the Lazy Phone Guy. These are my favorite ads ever, and the best of them all is the “no touching” Moto X ad. It’s already hit about 15M views!
- Just saw this AdAge article about the Lazy Phone ad going viral and beating out the iPhone at its new launch. It says the “no touching” ad has hit about 20M views!
- Even more impressive are the customer reviews for the “touchless control” technology. It’s one of the highest-rated apps in the Google Play store.
Galaxy Gear, Galaxy Note 3, Toq and more… September 4th, 2013
Samsung was kind enough to invite me to their roll-out of Galaxy Gear and Galaxy Note 3, but I had no plans to be at IFA Berlin, and I couldn’t justify the time to get out to New York. I did catch some of the roll-out live on my computer…a few misc. thoughts:
- Who was that guy with the weird glasses? Was that a European thing, or a jab at Google Glass?
- I remember when the first Note was introduced a few years back. Everyone thought it was crazy big. Samsung was right! Samsung won, and foresaw the direction of the mobile phone.
- Does anybody think it’s a coincidence that Google’s acquisition of WIMM (smart Android watch) and Qualcomm’s move into the smartwatch space with Toq all happened in the same week that Samsung introduced its Galaxy Gear watch?
- S Voice is in the Note 3 and Galaxy Gear! Great move for Samsung! Wearables, with their smaller displays and almost non-existent keyboards, definitely need speech recognition as part of a multi-modal interface.
- Seems like Steve Jobs had it right about the close integration of consumer hardware and software. Everyone seems to be following in Apple’s footsteps. Google/Moto, Microsoft/Nokia, and now Qualcomm, with Toq, are getting into consumer hardware. Although maybe Toq is just an attempt to promote their Mirasol display tech.
- Qualcomm is expanding its business models these days. Along with their move into smart watches, they also recently announced they are licensing chip IP. They even have their own in-house speech recognizer. I wonder what Samsung thinks of Qualcomm’s announcement of Toq?
What’s New in Galaxy Note 3 August 21st, 2013
Saw an article about game changers in the Galaxy Note 3.
It has a few interesting insights. They refer to Samsung’s S-Voice now as “Always on S Voice” and mention that the new Note 3 will be designed to be always on, listening for your wake up command.
The Galaxy Note 3 also uses the Qualcomm Snapdragon 800. This is the chip from Qualcomm that has an always-listening wake-up command built in. Sorry, Qualcomm, but I don’t think Samsung will be using your technology!
The best performing “always listening” processors combine Sensory’s TrulyHandsfree with an ultra-low power chip, like IP from Tensilica and CEVA. Chip companies like Cirrus Logic, DSPG, Realtek, and Wolfson seem well positioned to lead in mobile chips with “always on” listening features.
Always looking August 19th, 2013
Enough about always listening (for a moment)…what about Always Looking?
Google’s Glass seems to get a lot of flak for being able to watch, but cameras are already everywhere. Whether it’s a concert, the zoo, a kid’s soccer game, or just walking down the street, everyone seems to have their cameras or phones out snapping photos and taking videos. Back in February the world got to see videos from all the dashboard cams across Russia when the meteor exploded. I had no idea so many cars were outfitted to be watching everything.
Stores around the world commonly deploy cameras for security. A recent NPR story even discussed the use of facial recognition software to identify VIPs and celebrities. Thank goodness we never have to miss an opportunity to see what the Kardashians are up to! Hey, this technology could have prevented Oprah from being told a handbag was too expensive for her by a (clueless?) clerk in Switzerland!
So it seems like we are living in a world where cameras and microphones are going to be on, watching and listening. The cool thing is how much better Sensory functions can be when we COMBINE vision and hearing!
“Always Listening” doesn’t have to be always listening August 16th, 2013
I saw a post recently in the Android Central forum that talked about Sensory’s technology as used by Samsung:
What makes it different from any other voice app is its part of the OS. e.g. get a call, you can say ‘Answer’ or ‘Ignore’. alarm rings, just say ‘Snooze’. You don’t have to launch an app or press buttons to do this, the phone is always active and listening. No one else does this!
It’s an astute comment but not 100% accurate. When people talk about “always listening,” what they really mean is that it appears to be “always listening.” At Sensory we call it TrulyHandsfree, and the idea is that there can be certain “modes” or “windows” where it listens for specific words. For example, when the alarm goes off, it listens for “snooze,” etc. If you say “snooze” when the alarm isn’t going off, you’ll find it’s not really “always listening.”
Glass has a similar usage model. It’s “always listening,” but for different things at different times, and only for short periods of time. I put my Glass on and timed it. The OK Glass trigger window seems to last 3-4 seconds, then the next set of commands (like Get Directions to) stays active for 10-11 seconds.
What’s really cool about Glass is that during those listening windows you can say other things and it doesn’t “false fire” on them. I let my wife try out my Glass, and she said, “You mean I just say OK Glass and then I can say any of these things like get directions to Chef Chu’s and…whoa, it works!” It ignores everything it’s not listening for and picks out the things it is listening for. That’s why the technology is known as “keyword spotting.”
To save power, Hallmark’s use of Sensory’s technology kicks into gear when the product is turned on. If it doesn’t hear one of the words it’s listening for within a certain time frame, it automatically powers down and stops “always listening” until it’s turned back on with a button press.
Sensory recently introduced a low-power sound detection technology that further cuts power consumption by keeping the device “always listening” in a low-power mode where it doesn’t perform speech recognition. When it hears something, it quickly powers up the recognizer for further analysis. By “always listening” but not always recognizing, this can cut power consumption down to 1mA or so.
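The idea of “modes” or “windows” can be sketched as a small state machine. This is a minimal illustration, not Sensory’s TrulyHandsfree implementation, and it makes several assumptions:

- The `Window` and `ModalListener` names are hypothetical.
- The recognizer is assumed to hand over already-recognized phrases.
- The command list is only an illustration.
- The alarm window length is made up.
- The Glass window lengths come from the upper end of the rough timings above: 4 s for OK Glass and 11 s for the follow-up commands.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One listening mode (hypothetical): which phrases are active, and for how long."""
    phrases: frozenset[str]         # the only phrases accepted in this mode
    duration_s: float               # how long the mode stays open
    next_window: str | None = None  # mode to open after an accepted phrase


class ModalListener:
    """Appears 'always listening' but only accepts phrases inside an open window."""

    def __init__(self, windows: dict[str, Window]) -> None:
        self.windows = windows
        self.active: str | None = None
        self.opened_at = 0.0

    def open(self, name: str, now: float) -> None:
        self.active, self.opened_at = name, now

    def hear(self, phrase: str, now: float) -> str | None:
        """Return the phrase if it is accepted in the current window, else None."""
        if self.active is None:
            return None                 # no window open: nothing is listened for
        window = self.windows[self.active]
        if now - self.opened_at > window.duration_s:
            self.active = None          # window expired
            return None
        if phrase not in window.phrases:
            return None                 # not listened for: ignored, window stays open
        if window.next_window is None:
            self.active = None          # command handled: close the window
        else:
            self.open(window.next_window, now)  # e.g. OK Glass -> command window
        return phrase


windows = {
    "alarm": Window(frozenset({"snooze"}), duration_s=60.0),  # hypothetical length
    "trigger": Window(frozenset({"ok glass"}), duration_s=4.0, next_window="command"),
    "command": Window(frozenset({"get directions to", "take a picture"}), duration_s=11.0),
}

listener = ModalListener(windows)
print(listener.hear("snooze", now=0.0))               # None: the alarm isn't ringing
listener.open("alarm", now=0.0)                       # the alarm goes off
print(listener.hear("snooze", now=5.0))               # snooze
listener.open("trigger", now=100.0)
print(listener.hear("ok glass", now=102.0))           # ok glass (opens the command window)
print(listener.hear("get directions to", now=110.0))  # get directions to
```

Each mode carries its own small vocabulary and its own timeout. That is why “snooze” works only while the alarm is ringing, and why the Glass command list is available only for a few seconds after the trigger.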
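Here is a toy version of the keyword-spotting behavior. It scans a stream of words, picks out the phrases being listened for, and treats everything else as filler. A real spotter works on the audio itself, typically scoring keyword models against a “filler” or garbage model. This sketch, built around a hypothetical `spot_keywords` function, assumes the words have already been decoded and only illustrates the ignore-everything-else logic.

```python
from __future__ import annotations


def spot_keywords(tokens: list[str], keywords: set[str]) -> list[tuple[int, str]]:
    """Return (position, keyword) for each keyword phrase found in the word stream.

    Anything that is not one of the keyword phrases is treated as filler and
    skipped, so ordinary conversation around the keywords causes no detections.
    """
    # Try longer phrases first so multi-word commands win over shorter ones.
    phrases = sorted((kw.split() for kw in keywords), key=len, reverse=True)
    hits = []
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            if tokens[i:i + len(phrase)] == phrase:
                hits.append((i, " ".join(phrase)))
                i += len(phrase)  # jump past the matched phrase
                break
        else:
            i += 1  # no keyword starts here: treat this word as filler
    return hits


words = "so i just say ok glass and then get directions to chef chus".split()
print(spot_keywords(words, {"ok glass", "get directions to"}))
# [(4, 'ok glass'), (8, 'get directions to')]
```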
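The power-saving behavior reads like a two-state machine:

- A button press turns the recognizer on.
- Each recognized word restarts a timer.
- Silence, or chatter that isn’t in the vocabulary, eventually turns it off.

The `AutoOffListener` class, its vocabulary, and the 30-second timeout below are hypothetical. The post doesn’t say what time frame Hallmark’s products actually use.

```python
from __future__ import annotations


class AutoOffListener:
    """Recognizer that powers down after timeout_s seconds without a keyword."""

    def __init__(self, vocabulary: set[str], timeout_s: float) -> None:
        self.vocabulary = vocabulary
        self.timeout_s = timeout_s
        self.listening = False        # starts powered down
        self.last_keyword_at = 0.0

    def press_button(self, now: float) -> None:
        """A button press powers the recognizer up and starts the timer."""
        self.listening = True
        self.last_keyword_at = now

    def tick(self, now: float) -> None:
        """Called periodically; powers down once the timeout has elapsed."""
        if self.listening and now - self.last_keyword_at >= self.timeout_s:
            self.listening = False

    def hear(self, word: str, now: float) -> bool:
        """Return True if the word is recognized; only keywords restart the timer."""
        self.tick(now)
        if not self.listening or word not in self.vocabulary:
            return False
        self.last_keyword_at = now
        return True


toy = AutoOffListener({"hello", "sing"}, timeout_s=30.0)
print(toy.hear("hello", now=1.0))   # False: still powered down
toy.press_button(now=2.0)
print(toy.hear("hello", now=10.0))  # True, and the timer restarts at 10 s
print(toy.hear("sing", now=45.0))   # False: 35 s without a keyword, so it powered down
print(toy.listening)                # False until the next button press
```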
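Here is a rough sketch of the “always listening, not always recognizing” idea, plus the arithmetic behind why it saves power. The post doesn’t describe how Sensory’s sound detector works, so this swaps in the simplest possible trigger: a per-frame RMS energy threshold. A short “hangover” keeps the recognizer on for a few frames after the sound stops. The function names, the threshold, and the current-draw model are all hypothetical.

```python
from __future__ import annotations

import math
from collections.abc import Iterable, Iterator, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_frames(frames: Iterable[Sequence[float]], threshold: float,
                hangover: int) -> Iterator[bool]:
    """Yield True for each frame where the (expensive) recognizer should run.

    The cheap detector looks at every frame. A frame at or above threshold
    wakes the recognizer, which stays on until hangover frames have passed
    since the last loud frame (counting that frame).
    """
    remaining = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            remaining = hangover
        if remaining > 0:
            remaining -= 1
            yield True
        else:
            yield False


def average_current_ma(detector_ma: float, recognizer_ma: float,
                       recognizer_duty: float) -> float:
    """Simple model: the detector is always on, the recognizer only a fraction of the time."""
    return detector_ma + recognizer_ma * recognizer_duty


quiet = [0.0] * 160
loud = [0.5, -0.5] * 80
print(list(gate_frames([quiet, loud, quiet, quiet, quiet], threshold=0.1, hangover=2)))
# [False, True, True, False, False]
```

The average-current model is the whole argument in one line. If the recognizer only runs a few percent of the time, its draw is scaled down by that duty cycle. What remains is mostly the cost of the cheap detector.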
Can end users create their own “trigger phrases”? August 14th, 2013
The technology does exist for end users to create their own unique wake-up words and/or speaker verification pass-phrases. If the phrase is known and prepared for in advance, we can typically achieve higher accuracy. Some care needs to be put into training new or unexpected words to ensure the phrases have sufficiently distinctive content that doesn’t frequently occur in real-world conversations. There also needs to be excellent application design to ensure the recorded templates are of good quality. A bad training recording can really mess things up, and adaptive averaging approaches and good application design can prevent this. We usually recommend training in a quiet environment; the phrase can then be used anywhere.
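One way to keep a bad training recording from messing things up is to let each enrollment take “vote” on the others and average only the consistent ones. This sketch makes two assumptions:

- Each recording has already been reduced to a fixed-length feature vector. Real systems usually have to align variable-length utterances first, for example with dynamic time warping.
- `build_template` and `max_spread` are hypothetical names, not Sensory’s API.

```python
from __future__ import annotations

import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_template(recordings: list[list[float]], max_spread: float) -> list[float]:
    """Average enrollment recordings into one template, dropping bad takes.

    A recording is kept only if it lies within max_spread (Euclidean distance)
    of at least half of the other recordings. A cough, a clipped take, or a
    noisy recording usually fails that test.
    """
    if len(recordings) < 2:
        raise ValueError("need at least two enrollment recordings")
    good = []
    for i, rec in enumerate(recordings):
        agreeing = sum(
            1 for j, other in enumerate(recordings)
            if j != i and math.dist(rec, other) <= max_spread
        )
        if 2 * agreeing >= len(recordings) - 1:
            good.append(rec)
    if len(good) < 2:
        raise ValueError("recordings are inconsistent; ask the user to re-record")
    return mean_vector(good)


takes = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [5.0, 5.0]]  # the last take is a bad one
print(build_template(takes, max_spread=1.0))  # [1.0, 1.0]
```

Raising an error instead of silently averaging garbage is the application-design half of the problem. It gives the app a clear moment to ask the user to record again.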
Let’s talk security August 12th, 2013
Here’s another question I hear: If the device is listening for a specific wake up phrase, how do I stop others from using it?
Some users and analysts have noted the amazing sensitivity of Glass. In my own experiments I’ve noticed that it even responds to whispers in a quiet room or to people speaking from across the room, so it is possible for someone not wearing it to activate it in quiet conditions.
Speaker verification could be added to wake-up words without hurting power consumption. The settings can be very light, to reduce false firing and keep out some percentage of unintended users, or tighter for more security. A tighter, higher-security setting means a higher likelihood that the right user won’t always get in. That’s why we use a “light” setting, so wrong users are USUALLY kept out and right users virtually always get in. Speaker verification requires training, but this could happen in an “adaptive” fashion with use, so that the training is invisible to the user. The longer the training word or phrase, the better the accuracy!
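The light-versus-tight tradeoff is easy to see with a few numbers. Suppose we have verification scores for the right user (genuine trials) and for other people (impostor trials). Raising the threshold lowers the false accept rate and raises the false reject rate.

The scores below are invented for illustration. `maybe_adapt` is a hypothetical take on invisible adaptive training. It only nudges the stored template toward a new utterance when the match is comfortably above threshold, so an impostor who squeaks through doesn’t drag the template toward their voice.

```python
from __future__ import annotations


def error_rates(genuine: list[float], impostor: list[float],
                threshold: float) -> tuple[float, float]:
    """Return (false accept rate, false reject rate) at a score threshold."""
    false_rejects = sum(score < threshold for score in genuine)
    false_accepts = sum(score >= threshold for score in impostor)
    return false_accepts / len(impostor), false_rejects / len(genuine)


def maybe_adapt(template: list[float], new_vector: list[float], score: float,
                adapt_threshold: float, rate: float = 0.1) -> list[float]:
    """Blend a confidently verified utterance into the template (moving average)."""
    if score < adapt_threshold:
        return template  # not confident enough: leave the template alone
    return [(1 - rate) * t + rate * x for t, x in zip(template, new_vector)]


genuine = [0.62, 0.70, 0.75, 0.81, 0.90]   # invented scores for the right user
impostor = [0.20, 0.30, 0.35, 0.48, 0.55]  # invented scores for other people

print(error_rates(genuine, impostor, threshold=0.5))  # light: (0.2, 0.0)
print(error_rates(genuine, impostor, threshold=0.7))  # tight: (0.0, 0.2)
```

With the light threshold, four of five wrong users are kept out and every right-user attempt gets in. The tight threshold keeps everyone else out, but now turns the right user away one time in five.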