Improving Signal to Noise Ratio   July 14th, 2010

Dealing with a poor signal-to-noise ratio is one of the toughest issues in automatic speech recognition. At Sensory, we develop lots of techniques so our customers’ products can sit at one end of a noisy room and still recognize a speaker at the other end. Our technologists typically don’t like to use active noise cancellation, because they believe its signal processing can strip useful information out of the speech data. Nevertheless, we have a whole host of other techniques that make performance in noise work really well.

In Bluetooth® headsets we use dual-mic beamforming technology, and we’ve found that this approach improves our recognition performance by about 7 or 8 dB. The Bluetooth® space has lots of noise cancellation providers, and there are many well-proven techniques for removing noise.
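To give a feel for what a second microphone buys, here is a minimal delay-and-sum sketch. It is not Sensory’s beamformer: it assumes the talker’s direction (and so the inter-mic delay, in whole samples) is already known, that the noise at each mic is uncorrelated, and it uses a hypothetical 8 kHz sample rate, 3-sample delay, and a 300 Hz tone standing in for speech. Aligning and averaging the two channels keeps the speech at full strength while uncorrelated noise power halves, which is about 10·log10(2) ≈ 3 dB. Practical dual-mic systems presumably add adaptive processing on top of this textbook baseline.

```python
import math
import random

def snr_db(signal, noise):
    """Ratio of mean signal power to mean noise power, in decibels."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

def delay_and_sum(mic_a, mic_b, delay):
    """Advance mic_b by `delay` samples to line it up with mic_a, then average."""
    n = len(mic_a) - delay
    return [0.5 * (mic_a[i] + mic_b[i + delay]) for i in range(n)]

rng = random.Random(0)
fs, n, delay = 8000, 8000, 3  # hypothetical sample rate, length, inter-mic delay
length = n + delay

# Speech reaches mic A first and mic B `delay` samples later.
speech = [math.sin(2 * math.pi * 300 * t / fs) for t in range(length)]
speech_b = [0.0] * delay + speech[:n]
# Independent (uncorrelated) noise at each microphone.
noise_a = [rng.gauss(0.0, 1.0) for _ in range(length)]
noise_b = [rng.gauss(0.0, 1.0) for _ in range(length)]

# The beamformer is linear, so speech and noise can be pushed through separately.
out_speech = delay_and_sum(speech, speech_b, delay)
out_noise = delay_and_sum(noise_a, noise_b, delay)

before = snr_db(speech[:n], noise_a[:n])
after = snr_db(out_speech, out_noise)
print(f"single mic: {before:.1f} dB, beamformed: {after:.1f} dB, gain: {after - before:.1f} dB")
```

With more microphones the same idea scales to roughly 10·log10(M) dB for M mics under these idealized assumptions.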

What I’ve been wondering for the last few months is why those vuvuzelas are so dang loud during the World Cup broadcasts. It seems like a relatively easy task to just filter them out, or to put the broadcasters’ microphones in a silent booth.

I guess I’m not the only one who wondered about this: if you Google “vuvuzela,” “filter” is one of the most common words suggested after it, and clicking on it showed over 1.3 million listings, from hackers’ guides to products for sale.
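As a rough sketch of what “just filter them out” might mean: vuvuzelas are commonly reported to drone at around 233 Hz (roughly a B-flat) with strong harmonics, so one naive approach is a cascade of narrow notch filters at that pitch and its multiples. The 233 Hz fundamental, the Q of 30, the four harmonics, and the 1500 Hz “voice” test tone are all assumptions for illustration; the coefficients follow the widely used RBJ Audio EQ Cookbook notch. A real broadcast is harder, since individual horns vary in pitch and the notches also cut the commentators’ voices at those frequencies.

```python
import math

def notch_coeffs(f0, fs, q):
    """RBJ-cookbook biquad notch centered at f0 Hz; returns (b, a) normalized so a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(x, b, a):
    """Filter the sample list x with one biquad section (Direct Form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def remove_vuvuzela(x, fs, f0=233.0, harmonics=4, q=30.0):
    """Cascade notches at f0, 2*f0, ..., harmonics*f0 (f0 is an assumed horn pitch)."""
    for k in range(1, harmonics + 1):
        b, a = notch_coeffs(k * f0, fs, q)
        x = biquad(x, b, a)
    return x

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 8000
horn = [math.sin(2 * math.pi * 233.0 * t / fs) for t in range(2 * fs)]
voice = [math.sin(2 * math.pi * 1500.0 * t / fs) for t in range(2 * fs)]

# Compare only the last half second, after the filters' start-up transients have died out.
tail = fs // 2
horn_kept = rms(remove_vuvuzela(horn, fs)[-tail:]) / rms(horn[-tail:])
voice_kept = rms(remove_vuvuzela(voice, fs)[-tail:]) / rms(voice[-tail:])
print(f"233 Hz tone kept: {horn_kept:.4f}, 1500 Hz tone kept: {voice_kept:.4f}")
```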

Todd
sensoryblog@sensoryinc.com

Yeah, everyone’s writing about the new Google phone. I’ve heard various reports about it being underwhelming and in need of the marketing hype that Apple is so good at. Everybody loves to compare the iPhone with the Nexus One and talk about screen size, weight, camera capabilities, software, etc.

Here’s my 2 cents on speech recognition and Bluetooth for these devices:

Apple’s initial iPhone release had a case of speech recognition phobia, with no factory-installed voice commands at all. It was such a shocking omission that many mainstream reviewers pointed it out. In various industry conversations I heard “Steve doesn’t like speech recognition.” As a result, 50 speech recognition applications quickly appeared in the App Store, and by necessity Apple soon added Voice Control for music and voice dialing. I assume Apple used Nuance technology, most likely in a local version that runs on the iPhone.

What Google’s done with the Nexus is WAY different. They are embracing speech recognition from the start, not just implementing “me too” features. Google is pushing the boundaries by including speech recognition for dictation (text messaging, email, social networking, etc.) and for mapping/GPS functions. I remember the original Android announcements mentioning Nuance as their speech partner, but it seems like all the big guys like to start with Nuance and then switch away. My guess is that the Nexus One uses homegrown (Mike Cohen and Co.) speech recognition, and since it’s server-based, it should keep adapting and improving with the data they collect. Kudos to Google for this!

On the Bluetooth side of things, we were shocked and hurt that we couldn’t easily use our BlueGenie Voice Interface Bluetooth headsets to call up the iPhone’s recognizer for name dialing. Although Bluetooth defines a clear protocol for this, it wasn’t implemented on the initial iPhone. Newer iPhone versions do support it, but Apple never clearly thought through the importance of a cohesive user interface and functionality for Bluetooth devices connected to its phones, especially when speech recognition is involved.
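For the curious, the protocol in question is presumably the Bluetooth Hands-Free Profile, which defines an AT command, AT+BVRA, that lets a headset ask the phone (the “audio gateway”) to start or stop its voice recognizer; the phone answers OK or ERROR and can later report the recognizer’s state with an unsolicited +BVRA line. Below is a minimal sketch of the headset side of that exchange. The helper names are hypothetical, and a real stack would also check the phone’s advertised feature bits and manage the RFCOMM link.

```python
def bvra_command(enable):
    """AT command a headset sends to start (1) or stop (0) voice recognition on the phone."""
    return f"AT+BVRA={1 if enable else 0}\r"

def parse_ag_line(line):
    """Classify one result line sent back by the phone (audio gateway)."""
    line = line.strip()
    if line in ("OK", "ERROR"):
        return (line.lower(), None)
    if line.startswith("+BVRA:"):
        # Unsolicited status report: 1 means the recognizer is active, 0 inactive.
        return ("bvra", int(line.split(":", 1)[1]))
    return ("other", line)

# Example exchange: the headset button is pressed, the phone accepts,
# and later the phone reports that its recognizer has stopped.
print(repr(bvra_command(True)))
for reply in ("\r\nOK\r\n", "\r\n+BVRA: 0\r\n"):
    print(parse_ag_line(reply))
```

A phone that lacks the feature, as the initial iPhone did, would be expected to answer ERROR, leaving the headset with no way to reach its recognizer.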

If Google is smart, they won’t just introduce a Nexus One phone; they’ll also come out with a really cool Nexus One headset that TAKES ADVANTAGE of all the great speech recognition software on the handset, with one seamless voice user interface! The Nexus One has been blasted as nothing really new, but this type of integration with a hands-free headset or car kit could make it TOTALLY REVOLUTIONARY.

Hey Google – make a BLUEGENIE VOICE INTERFACE HEADSET!

Todd
sensoryblog@sensoryinc.com

We have had a lot of requests over the years for products that are always on and listening for a key “trigger” word. The challenge of this approach is making a trigger that doesn’t fire when the word is not spoken, but also doesn’t fail to fire when it IS spoken. The trade-off between these two types of errors, false accepts and false rejects, is not simple, since improving one usually makes the other worse. On top of that, background noise, especially other people talking, typically makes voice interfaces perform poorly. And this doesn’t even take into account the constant energy drain of devices that are always on and listening.
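To make the trade-off concrete, here is a small sketch that counts false accepts and false rejects at a few detection thresholds. The confidence scores are made-up illustrative numbers, not measurements from any Sensory product. With these numbers, a threshold of 0.5 gives 25% false accepts and no false rejects, 0.6 gives 12.5% of each, and 0.7 flips the original result to no false accepts and 25% false rejects.

```python
def error_rates(trigger_scores, other_scores, threshold):
    """Return (false_accept_rate, false_reject_rate) at a detection threshold."""
    # False reject: the trigger word WAS spoken but scored below the threshold.
    false_rejects = sum(s < threshold for s in trigger_scores) / len(trigger_scores)
    # False accept: something else was heard but scored at or above the threshold.
    false_accepts = sum(s >= threshold for s in other_scores) / len(other_scores)
    return false_accepts, false_rejects

# Hypothetical recognizer confidence scores, for illustration only.
trigger_scores = [0.91, 0.84, 0.77, 0.62, 0.95, 0.55, 0.88, 0.70]
other_scores = [0.12, 0.35, 0.58, 0.22, 0.66, 0.41, 0.08, 0.30]

for threshold in (0.5, 0.6, 0.7):
    fa, fr = error_rates(trigger_scores, other_scores, threshold)
    print(f"threshold {threshold}: false accepts {fa:.1%}, false rejects {fr:.1%}")
```

Noise tends to pull the two score distributions closer together, which is why no single threshold keeps both error rates low once the room gets loud.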

Nevertheless, we have gotten the same question over and over. “What’s the point of having speech recognition if I need to press a button to activate it?”

Some of our earliest customers, like VOS Systems, used a hands-free trigger to control a light switch. This was a particularly good fit, because a switch plugged into the wall doesn’t have to worry about battery drain.

The “Phrase Spotting” technology has advanced over the years, and we recently introduced a new spin on it that we call “Truly Hands-Free” for Bluetooth car kits. It is being extremely well received, and we consistently hear high praise for its performance in noise. It really hits the RIGHT combination of minimizing false accepts AND false rejects, all with minimal power drain for something that is always listening for a trigger word.
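One common way to keep an always-listening spotter from firing on a single noisy frame is to require the per-frame score to stay above threshold for several consecutive frames, and then to lock out further triggers for a short while. The sketch below shows that generic pattern; it is not Sensory’s Truly Hands-Free algorithm. The frame scores are assumed to come from some acoustic model, and `threshold`, `min_frames`, and `lockout` are hypothetical tuning knobs.

```python
def spot(frame_scores, threshold=0.8, min_frames=3, lockout=10):
    """Return the frame indices at which the trigger phrase is declared detected."""
    detections = []
    run = 0        # consecutive frames at or above threshold
    cooldown = 0   # frames left before another detection is allowed
    for i, score in enumerate(frame_scores):
        if cooldown > 0:
            cooldown -= 1
            run = 0
            continue
        run = run + 1 if score >= threshold else 0
        if run >= min_frames:
            detections.append(i)
            run = 0
            cooldown = lockout
    return detections

# A lone high-scoring frame is ignored; a sustained run of high scores fires once.
print(spot([0.1, 0.9, 0.2, 0.85, 0.9, 0.95, 0.9, 0.9]))
```

Raising `min_frames` trades a few more false rejects for fewer false accepts, the same balance described above.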

Now we’re starting to apply this technology to some new and interesting applications:

  1. Answer/Ignore for Bluetooth headsets and car kits. One of the most desired features of Sensory’s BlueGenie Voice Interface is that it allows answering a phone without having to touch it, for example in a Bluetooth headset or hands-free car kit. The challenge has been getting this to work well in the presence of really loud ring tones and background noises like a car radio or wind noise. The solution…we’ve implemented a Phrase Spotting version of Answer/Ignore that is completely robust to noise and ALWAYS does the right thing.
  2. Interactive Books. Imagine a book that offers an interactive experience with parents and children while they are reading at night. For example, I say “Jack and Jill went up a hill” and Jack grunts and says “This is hard work!”, and then I say “to fetch a pail of water”, and I hear a water pouring sound, etc. Pretty fun! In the past this would have been difficult because the surrounding talking would have messed up the recognition, but Phrase Spotting can pick out a phrase even in the middle of a sentence!
  3. Remote-less Home Controls. If you are my age, you might remember the days of having to walk up to a TV set and manually crank the channel and volume knobs. That’s unheard of today, and nobody would ever buy a TV like that…but we do buy thermostats, microwaves, clocks, fans, heaters, lights, radios, and virtually everything else around the house that requires a manual interface. Why not use voice triggers? Sensory is currently working with many different consumer electronics manufacturers to implement this revolutionary recognition technology into a new generation of voice controlled devices.

Lots of exciting stuff in development here!! Next time, maybe I’ll write about our voice morphing TTS!

Todd
sensoryblog@sensoryinc.com

Jawbone Makers’ Dream   May 1st, 2009

I saw an interesting article in Wired this week.

It was an interview with Hosain Rahman and Alexander Asseily, the founders of Aliph, which makes the Jawbone headset. Aliph has done a nice job with the design and marketing of the Jawbone and has made it one of the most widely recognized and best-selling Bluetooth® headsets here in the US. Aliph has just announced the Jawbone Prime, which offers a variety of new skin colors with amusing names plus a host of “improvements” that some might call “bug fixes” over the Jawbone II.

I’ve got to hand it to Hosain and Alexander for the well-articulated strategy in Wired…it’s a vision I share. Here are a few quotes from the article, with my comments:

“Aliph hopes to take Jawbone out of the ‘yet another Bluetooth headset’ category and transform it into a device that could become an ‘audio gateway’ for the consumer. Think news, weather, music or even language learning modules combined with a headset in a way that would bring the term ‘wearable computing’ to life.”

This sounds like Sensory’s plans for the BlueGenie Voice Interface. In fact, we’ve already implemented these plans with companies like Google and Microsoft, letting a speech user interface access and dial any business in the US, get stock quotes or stock market updates, get weather information and driving directions, and much more. We expect to add customized entertainment, voice-to-text messaging, social networking with voice input, and more! The BlueGenie Voice Interface enables a whole lot more features without adding buttons!

“We are looking at wearable computing, where we see an opportunity for us to use the audio medium extensively,” says Asseily. Aliph is currently [exploring] technologies such as speech recognition as a way to bring more functionality to its headset. The company could take a leaf out of Apple’s playbook there. Apple launched its latest iPod shuffle with speech recognition that tells users what song it is playing, the artist and the names of the playlists. Asseily and Rahman won’t reveal when Aliph will release a device with a comparable speech recognition feature but say they are big believers in the technology.

Oops, the editor got that one wrong. Apple uses text-to-speech, not speech recognition, for the new Shuffle, and the TTS actually runs in iTunes on the computer, not on the Shuffle. As a funny side note, it appears Apple used a very mediocre-sounding TTS voice for the Windows PC version of iTunes and a good-sounding voice for the Mac version. What’s stored on the Shuffle is just an audio file moved over from the computer, so use a Mac and you get a good voice!

Hats off to Aliph for understanding the value of speech recognition in a Bluetooth headset for a “future” version of the Jawbone (it’s not in the Prime)… Of course, Sensory has the BlueGenie Voice Interface running on CSR Bluetooth chips with speech recognition, compressed speech, text-to-speech, voice morphing and MORE. This makes headsets easier to use and safer while driving, and gives them lots more features!

BlueAnt’s new Q1 headset will use the BlueGenie Voice Interface and should hit stores this month!!!!

Todd
sensoryblog@sensoryinc.com