Yeah, everyone’s writing about the new Google phone. I’ve heard various reports calling it underwhelming and in need of the kind of marketing hype Apple is so good at. Everybody loves to compare the iPhone with the Nexus One on screen size, weight, camera capabilities, software, etc.

Here’s my 2 cents on speech recognition and Bluetooth for these devices:

Apple’s initial iPhone release had speech recognition-phobia, with no factory options for voice recognition commands. It was such a shocking omission that many mainstream reviewers pointed it out. In various industry conversations I heard “Steve doesn’t like speech recognition.” As a result, 50 speech recognition applications quickly appeared in the App Store, and Apple soon had to implement Voice Control for music and voice dialing. I assume Apple licensed Nuance technology, most likely in a local version that runs on the iPhone.

What Google’s done with the Nexus is WAY different. They are embracing speech recognition from the start, and not just implementing “me too” features. Google is pushing the boundaries by including speech recognition for dictation (text messaging, email, social networking, etc.) and for mapping/GPS functions. I remember the original Android announcements named Nuance as their speech partner, but it seems like all the big guys like to start with Nuance and then switch away. My guess is that the Nexus One uses homegrown (Mike Cohen and Co.) speech recognition, and since it is server based, it should keep adapting and improving with the data Google is collecting. Kudos to Google for this!

On the Bluetooth side of things, we were shocked and hurt that we couldn’t use our BlueGenie Voice Interface Bluetooth headsets to easily call up recognizers on the iPhone for name dialing. Although Bluetooth defines a clear protocol for this, it wasn’t implemented on the initial iPhone. Newer iPhone versions do support it, but Apple never clearly thought through the importance of a cohesive user interface and functionality between its phones and connected Bluetooth devices, especially when speech recognition is involved.

If Google is smart, they won’t just introduce a Nexus One phone; they’ll also come out with a really cool Nexus One headset that TAKES ADVANTAGE of all the great speech recognition software on the handset, with one seamless voice user interface! The Nexus One has been blasted as nothing really new, but this kind of integration with a hands-free headset or car kit could make it TOTALLY REVOLUTIONARY.

Hey Google – make a BLUEGENIE VOICE INTERFACE HEADSET!

Todd
sensoryblog@sensoryinc.com

My last blog was about TTS. When things aren’t pronounced right from a TTS engine, the linguists can go in and add “exceptions” so the standard pronunciation rules don’t have to apply in specific cases.

Of course, the easy way to get TTS to pronounce things right is to spell everything phonetically; then no exceptions or special rules are ever necessary.
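The exception mechanism described above amounts to a lookup table consulted before the general pronunciation rules. Here is a minimal Python sketch, assuming a hypothetical `EXCEPTIONS` lexicon and a deliberately crude stand-in `letter_rules()` function; real TTS engines use far richer letter-to-sound rules and proper phoneme symbols, and the respelling below is only illustrative.

```python
# Hypothetical exception lexicon: words whose pronunciation should NOT come
# from the general rules. The respelling is illustrative, not phonemes from
# any real TTS engine.
EXCEPTIONS = {
    "verizon": "vuh-RY-zun",
}


def letter_rules(word: str) -> str:
    """Crude stand-in for letter-to-sound rules: spell the word out."""
    return "-".join(word.lower())


def pronounce(word: str) -> str:
    """Check the exception lexicon first, then fall back to the rules."""
    key = word.lower()
    if key in EXCEPTIONS:
        return EXCEPTIONS[key]
    return letter_rules(word)


print(pronounce("Verizon"))  # vuh-RY-zun
print(pronounce("Led"))      # l-e-d
```

Adding an entry is a data change, not a change to the rules, which is why fixing a single mispronounced word should be cheap.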

I was watching a documentary on Led Zeppelin last night. I got a kick out of one of the band’s early friends saying they spelled “Lead” as “Led” so that Americans would pronounce it right. The irony of this intentional misspelling is that it led other bands to pay homage by misspelling in Led Zep’s footsteps. Def Leppard didn’t need to change their spelling to get their name pronounced right. I suppose Motley Crue, with all their umlauts, actually mispronounce their own misspelled name. Even some of heavy metal’s big-name singers (Axl Rose??) might have had more “normal” names if this strange tradition had never started with Led Zeppelin…

…and on a wildly different note, my heavy metal documentary recommendation goes not to Zep but to “Anvil! The Story of Anvil.” Another great music-related film to consider: “Les Triplettes de Belleville.” And if you want a music-related film that’s closer to home, see the documentary “Standing in the Shadows of Motown,” about the Funk Brothers.

Todd
sensoryblog@sensoryinc.com

The new Android OS doesn’t have this problem! I read about one of these devices with TTS (Text-To-Speech) built in and voice commands too, so of course I had to try one out. I put it into TTS mode where it speaks everything, hit the recognition button and it prompted “SPEAK NOW.” I said something like “Starbucks in Sunnyvale, California”…and guess what it recognized??? “SPEAK NOW.” I guess the recognizer started listening too early and heard the TTS itself saying “SPEAK NOW.”

Listening at the right time is always a challenge for speech recognizers. In Speech Recognition 101, programmers learn to make the recognizer listen AFTER the prompt is spoken. In Speech Recognition 201, they are taught to trim the silence at the end of the prompt. Otherwise, the Speech Reco 101 approach starts listening too late: prompts usually end with a silent tail that users don’t hear, so users start talking before the prompt has technically finished, and the first few hundred milliseconds of their speech get clipped off.
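To make the 201 lesson concrete: if the recognizer opens only when the whole prompt buffer has finished playing, the silent tail delays it. A minimal Python sketch, assuming the prompt is available as a list of PCM samples and using a simple peak-amplitude threshold (a hypothetical stand-in for a real voice activity detector; `threshold` and `frame_len` are illustrative, untuned values), computes when listening should start once the tail is trimmed:

```python
def trim_trailing_silence(samples, threshold=500, frame_len=160):
    """Remove trailing frames whose peak absolute amplitude is below threshold.

    samples: sequence of PCM sample values (e.g. 16-bit ints).
    threshold and frame_len are illustrative defaults, not tuned values.
    """
    end = len(samples)
    while end > 0:
        start = max(0, end - frame_len)
        if max(abs(s) for s in samples[start:end]) >= threshold:
            break  # last frame that still contains audible prompt audio
        end = start
    return samples[:end]


def listen_start_ms(prompt_samples, sample_rate=8000, **kwargs):
    """Time (ms from prompt start) at which to open the recognizer."""
    trimmed = trim_trailing_silence(prompt_samples, **kwargs)
    return 1000.0 * len(trimmed) / sample_rate


# One second of "speech" followed by a half-second silent tail at 8 kHz.
prompt = [1000] * 8000 + [0] * 4000
print(listen_start_ms(prompt))  # 1000.0 rather than 1500.0
```

In this toy case the recognizer opens 500 ms earlier than it would by waiting for the end of the buffer, which is exactly the window in which users tend to start talking.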

That same TTS on the Android device was a Verizon product. Guess how it pronounces “Verizon”? Not the way I’ve ever heard it pronounced. TTS isn’t easy, but this should be an easy fix. Someone at Google or Verizon will figure it out soon, and Nuance will probably get a call.

I heard a great NPR report the other day about the Amazon Kindle. The product is being boycotted by groups as diverse as Syracuse University, the National Federation of the Blind, and the Burton Blatt Institute for Disability Studies. The complaint is that while the Kindle offers Text-To-Speech as an option, it only reads the books and does not provide a friendly user interface for the visually impaired. In fact, one spokesperson said the Text-To-Speech function is just about impossible for a blind person to use. Basically, Amazon needed a mode where the TTS reads aloud every button that is pressed, which shouldn’t have added any real cost. Better yet, they could have added a little speech recognition so the buttons weren’t even necessary!

Todd
sensoryblog@sensoryinc.com

We have had a lot of requests over the years for products that are always on and listening for a key “trigger” word. The challenge is making a trigger that doesn’t fire when the word is NOT spoken, but also doesn’t fail to fire when it IS spoken. The trade-off between these two types of errors (false accepts and false rejects) is not simple, since improving one usually makes the other worse, and background noise, especially other people talking, typically degrades voice interfaces. And that doesn’t even take into account the constant energy drain of a device that is always on and listening.
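The trade-off between the two error types shows up as soon as a detector score is compared against a threshold. Here is a minimal Python sketch, assuming the trigger detector outputs one score per audio segment and fires when the score is at or above a threshold; the scores below are made up purely for illustration and are not measurements of any Sensory product.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) at a given threshold.

    The trigger fires when a detector score is >= threshold.
    target_scores: scores for audio where the trigger WAS spoken.
    nontarget_scores: scores for audio where it was NOT spoken.
    """
    false_rejects = sum(1 for s in target_scores if s < threshold)
    false_accepts = sum(1 for s in nontarget_scores if s >= threshold)
    return (false_rejects / len(target_scores),
            false_accepts / len(nontarget_scores))


# Made-up detector scores, purely for illustration.
targets = [0.9, 0.8, 0.75, 0.6, 0.4]
nontargets = [0.7, 0.5, 0.3, 0.2, 0.1]

for t in (0.5, 0.65):
    frr, far = error_rates(targets, nontargets, t)
    print(f"threshold={t}: false rejects={frr:.0%}, false accepts={far:.0%}")
```

With these numbers, raising the threshold from 0.5 to 0.65 cuts false accepts from 40% to 20% but doubles false rejects from 20% to 40%. Moving the threshold only trades one error for the other; getting both down takes a better detector.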

Nevertheless, we have gotten the same question over and over. “What’s the point of having speech recognition if I need to press a button to activate it?”

Some of our earliest customers, like VOS Systems, used a hands-free trigger to control a light switch. This was a particularly useful application, because it could be plugged into a wall without battery drain.

The “Phrase Spotting” technology has advanced over the years, and recently we introduced a new spin on it that we call “Truly Hands-Free” for Bluetooth car kits. This technology is being extremely well received, and we consistently hear high praise for its performance in noise. It hits the RIGHT combination of minimizing false accepts AND false rejects, with minimal power drain considering it is always listening for a trigger word.

Now we’re starting to apply this technology to some new and interesting applications:

  1. Answer/Ignore for Bluetooth headsets and car kits. One of the most desired features of Sensory’s BlueGenie Voice Interface is that it allows answering a phone without having to touch it, for example in a Bluetooth headset or hands-free car kit. The challenge has been getting this to work well in the presence of really loud ring tones and background noises like a car radio or wind noise. The solution…we’ve implemented a Phrase Spotting version of Answer/Ignore that is completely robust to noise and ALWAYS does the right thing.
  2. Interactive Books. Imagine a book that offers an interactive experience for parents and children reading together at night. For example, I say “Jack and Jill went up the hill” and Jack grunts and says “This is hard work!”, then I say “to fetch a pail of water” and I hear a water-pouring sound, etc. Pretty fun! In the past this would have been difficult because the surrounding speech would have confused the recognizer, but Phrase Spotting works even when the phrase is embedded in the middle of a sentence!
  3. Remote-less Home Controls. If you are my age, you might remember the days of having to walk up to a TV set and manually crank the channel and volume knobs. That’s unheard of today, and nobody would ever buy a TV like that…but we do buy thermostats, microwaves, clocks, fans, heaters, lights, radios, and virtually everything else around the house that requires a manual interface. Why not use voice triggers? Sensory is currently working with many different consumer electronics manufacturers to implement this revolutionary recognition technology into a new generation of voice controlled devices.

Lots of exciting stuff in development here!! Next time, maybe I’ll write about our voice morphing TTS!

Todd
sensoryblog@sensoryinc.com

I stopped at Walgreens last week to get some new blades for my razor. Usually when I go in to buy new blades I end up just buying a new razor with blades, since it usually costs about the same. This time was different…I bought an electric razor instead.

It’s an Eltron brand cordless rechargeable electric razor, and it actually holds a charge quite well. It includes a flip-out beard trimmer, and a separate nostril trimmer came with it too. The price was $9.95. It wasn’t on sale at Walgreens, although the standard Eltron packaging said “normally $49.95 now specially priced” (or something to that effect).

I figured it was going to be junk, but it works just fine, and it even has some nice features like being wet/dry so it can be used in the shower. Now, I would guess that Walgreens likes to make around a 35% margin, which means they probably buy them for $6 or $7. The manufacturer needs to mark up its cost of goods by at least 3x to make a profit and cover shipping, assembly, support and testing, so the actual cost must be no more than about $2 (and if a distributor is involved, it could be a lot less!)
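Written out, the back-of-the-envelope estimate looks like this, assuming the 35% margin is measured on the $9.95 retail price and the 3x markup is applied to the price Walgreens pays (both are guesses, not reported figures):

```latex
\begin{aligned}
\text{Walgreens' cost} &\approx \$9.95 \times (1 - 0.35) \approx \$6.47 \\
\text{manufacturer's cost} &\lesssim \frac{\$6.47}{3} \approx \$2.16
\end{aligned}
```

So roughly $2 is the right ballpark, and a distributor in the chain would push the factory cost lower still.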

How can Gillette, Norelco, Braun and others compete? They sell electric razors for $50-$150…are they really that much better? I guess the answer must be features and quality, but it wouldn’t surprise me if these companies were hurting pretty badly from such low-cost competition.

It’s not so hard for low-cost manufacturers to copy features and then compete on price. It’s a lot harder to invest in the R&D needed to develop differentiating features. I just saw some numbers from Gartner that show Apple’s success with smartphones. Apple is king when it comes to creating high-margin, feature-rich products with AWESOME user experiences. They are now #3 in the smartphone market, with the fastest year-over-year growth BY FAR of any player.

[Chart: Global Smartphone Sales, Q2 2009]

Apple isn’t resting on its laurels. I’m sure they are determined to be #1, and at the rate they are growing it could happen within a few years. Why are they growing so quickly? They keep adding value to their products. For instance, Apple hasn’t been afraid to change the user interface on their consumer electronics. They were one of the first to embrace touch technologies, and now they are embracing voice technologies. Their iPhones are not just phones, but also media players, video cameras, navigation systems, and much more…and this will continue to grow. Apple will be responsible for turning the smartphone into a consumer appliance for every room and every purpose imaginable. There are already 85,000 apps in the App Store, and the number is growing by thousands every month. I don’t think low-cost competitors can steal away this business!

Watch out Eltron…when my iPhone has a built in electric razor, I’m throwing you out!

Todd
sensoryblog@sensoryinc.com

See Jane Drive   August 19th, 2009

Since Sensory has become very actively involved in providing speech recognition for Bluetooth® based products, I have been asking friends and family about their experiences with various “hands-free” wireless devices.

I recently had an interesting conversation that I’ll share. A woman I know (I’ll call her Jane) uses a Jabra SP-200 Bluetooth® car kit. She says she had tried a wireless headset, but found the car kit much more comfortable and convenient since she really only uses it while driving. Jane found the initial pairing process clumsy and uncomfortable, but after much reading and experimentation is now very happy with her Jabra car kit.

When I pressed Jane more about what she likes and doesn’t like, here’s what I found:

Likes:

  1. Doesn’t have to wear it on her head
  2. Call quality is good
  3. Simple and easy to use

Doesn’t like:

  1. Every once in a while it makes a call accidentally
  2. There is no easy way to call people back when she gets disconnected
  3. Doesn’t always understand the different flashing lights

I found this particularly interesting, since on the one hand she said it was simple and easy to use, but on the other she said the lights were confusing, it sometimes placed calls on its own, and it was too difficult to call someone back.

Of course, if you know Sensory’s BlueGenie™ Car Kit product then you understand that ALL these issues are solved with a BlueGenie™ Voice Interface! (By the way, have you seen the BlueGenie™ car kit video on the Sensory website front page with my daughter Samantha? Smart kid.)

I decided to go a little more in-depth on the SP-200 and looked it up on the web. Interestingly, Jabra markets it as “hands-free” (of course it’s not), and calls it part of the EASY series (it could be a lot easier with BlueGenie™ …) Jabra must understand it’s not Truly Hands-Free, because in some places they call it “hands-free talking.”

Here’s what I learned from the manual:

  • It has 3 LEDs (Blue, Green, and Red) that each mean a different thing. Sometimes they are solid, sometimes they blink, and SOMETIMES THEY BLINK AT DIFFERENT SPEEDS. No wonder Jane found this confusing. The same color doing the same thing can even mean something different in a different mode (e.g. solid blue can mean it’s on, or it can mean it paired successfully).
  • There’s a single big button to tap. This is part of what makes it EASY, I guess. However, Jabra differentiates between a TAP and a PRESS: a tap is short and a press is long. And there can be DOUBLE TAPS and PRESS AND HOLD, and the HOLD can be for 1 second or 5 seconds, etc. For example, you “tap” to answer a call, you “press” to reject an incoming call, and you double press to redial. Maybe this has something to do with the “accidental” calls Jane mentioned??
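To see why one button with timing rules is easy to misuse, here is a minimal Python sketch of how a single-button device might map press timings to gestures. The thresholds `TAP_MAX_MS` and `DOUBLE_GAP_MS` are hypothetical values chosen for illustration, and this is not Jabra’s actual firmware logic; it only shows that a difference of a hundred milliseconds can change which command gets sent.

```python
TAP_MAX_MS = 500      # hypothetical: releases shorter than this count as a tap
LONG_HOLD_MS = 5000   # the manual's 5-second hold
DOUBLE_GAP_MS = 400   # hypothetical: max gap between the two taps of a double tap


def classify(events):
    """Map (press_ms, release_ms) pairs, in time order, to gesture names.

    Works on a recorded list of events; a live device would also have to
    wait DOUBLE_GAP_MS after a tap before deciding it was a single tap.
    """
    gestures = []
    i = 0
    while i < len(events):
        down, up = events[i]
        held = up - down
        if held >= LONG_HOLD_MS:
            gestures.append("LONG_HOLD")
        elif held > TAP_MAX_MS:
            gestures.append("PRESS")
        else:
            # A short tap: is it the first half of a double tap?
            if i + 1 < len(events):
                next_down, next_up = events[i + 1]
                if (next_down - up <= DOUBLE_GAP_MS
                        and next_up - next_down <= TAP_MAX_MS):
                    gestures.append("DOUBLE_TAP")
                    i += 2
                    continue
            gestures.append("TAP")
        i += 1
    return gestures


# Holding 550 ms instead of 450 ms turns "answer" (tap) into "reject" (press).
print(classify([(0, 450)]))  # ['TAP']
print(classify([(0, 550)]))  # ['PRESS']
```

Every extra timing rule adds another way for a hurried thumb to trigger the wrong action, which is the kind of ambiguity a voice interface avoids.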

I think you absolutely must read and memorize the manual to know how to use this product…and once you do, you still need to touch it, touch your handset, and look at the car kit while driving. That’s not a Truly Hands-Free, Eyes-Free product.

On the other hand, BlueGenie™ car kits will hit the market in 2010, and they will change the world! People will understand what “Truly Hands-Free” really means!

Todd
sensoryblog@sensoryinc.com

The SCIDs are Coming!!!!   August 4th, 2009

No, we’re not under attack from missiles, and I’m not referring to the results of the current financial crisis. I’m talking about Speech Controlled Internet Devices. These are home consumer electronic devices that use a VUI (voice user interface) for the user to interact with the product. The products themselves can access data and information from the internet, and they use a client/server speech recognition system to achieve higher recognition accuracy than is possible with a client-only or server-only approach.

So what is Sensory’s role in this? Well, we originated the terminology, and we’re evangelizing the concept ahead of the release of our new chip in September. The new chip is designed to act as the main controller for SCIDs, although Sensory is looking for other partners on the chip side (like Intel or Philips) for higher-end, higher-cost SCIDs. By the way, we’re also looking for server-based speech recognition partners (like Microsoft, Google, Vlingo, Novauris, etc.), and even hardware partners like Cisco that know the Wi-Fi and consumer electronics space.

Some of the press and analysts out there are starting to think about the potential for SCIDs. Troy Wolverton (my favorite Mercury News columnist) had a bit of a change of heart after seeing some of my demos. I had originally contacted him because he thought speech recognition never worked, so I was quite happy that his column was titled “Speech Recognition Technology is Rapidly Improving.”

I’m not going to say a whole lot about SCIDs here because Dan Miller from Opus Research has already done an EXCELLENT job of writing up a summary of our conversation. Dan highlights the HUGE volume opportunity that SCIDs will enable over the coming few years.

A really interesting angle on SCIDs is the Voice Search opportunity they enable. Most people think of Voice Search as something for telephone handsets. (The quick idea behind voice search is that a multi-billion dollar ad/transaction business will emerge for it just as it has for conventional Google-like search, so all the major search players, such as Microsoft, Google, and Yahoo, are interested.) The thing is, there will be billions of consumer electronic products hooked up to home internet, potentially with VOIP connections, so handsets won’t be the only devices enabling search opportunities. SCIDs could become a MAJOR driver of search revenues. Michael over at the Kelsey Group keyed off of the opportunities that SCIDs bring to Voice Search and blogged a bit about that.

About the technology: it’s worth noting two very special things within SCIDs:

  1. Sensory’s new Truly Hands-Free phrase spotting allows SCIDs to be always on and always listening, so your voice becomes your remote control for accessing internet data through your SCID - no need to walk up and press buttons.
  2. Sensory will do really simple and accurate speech recognition on the client that provides standalone value when not connected to the internet, but ALSO ASSISTS THE SERVER RECOGNIZER by feeding categorized data along with the query.
    For example, if “Local News” (or time, weather, etc.) is requested from a news-oriented SCID, the client-side Sensory recognizer can recognize that and stream a local news report, and if “Other News” is requested, we can prompt “Please say the location where you would like news reports.” Then Sensory can send a very targeted query to a server-based recognizer, identifying the recording as a location for which recent news is requested. This simplifies the server’s task and improves the accuracy of the “say anything” approach to speech queries.
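The client-assisted flow in item 2 can be sketched as a simple routing step. This is a minimal Python illustration under stated assumptions: `LOCAL_COMMANDS`, `route()`, `Decision`, and the `news_location` category tag are hypothetical names, the client recognizer’s result is assumed to arrive as plain text, and no real Sensory or server API is implied.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical phrases the small on-device recognizer can handle by itself.
LOCAL_COMMANDS = {"local news", "time", "weather"}
LOCATION_PROMPT = "Please say the location where you would like news reports"


@dataclass
class Decision:
    handled_by: str                 # "client" or "server"
    action: str = ""                # what the client does locally
    prompt: Optional[str] = None    # prompt to play before recording
    server_query: dict = field(default_factory=dict)


def route(client_phrase: str, audio: bytes = b"") -> Decision:
    """Decide whether the client answers alone or sends a tagged server query.

    audio: the recording to forward to the server (for "other news", the
    user's answer to the location prompt).
    """
    phrase = client_phrase.lower()
    if phrase in LOCAL_COMMANDS:
        return Decision("client", action=f"stream {phrase}")
    if phrase == "other news":
        # The client already knows the category, so the server only has to
        # recognize a location, not an arbitrary "say anything" request.
        return Decision("server", prompt=LOCATION_PROMPT,
                        server_query={"category": "news_location",
                                      "audio": audio})
    # Anything else: fall back to an untagged, open-ended server query.
    return Decision("server", server_query={"category": None, "audio": audio})


print(route("Weather").action)                        # stream weather
print(route("Other News").server_query["category"])   # news_location
```

The design point is the category tag: the server recognizer is told what kind of thing it is hearing (a location), which narrows its search compared with an open-ended query.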

Todd
sensoryblog@sensoryinc.com

I normally don’t watch much television, but when I got home last night my family had gone out to dinner, so I grabbed a veggie burger and plopped down in front of the TV. I decided to watch Band of Brothers (I had read a positive review of it once.)

During a break I saw Apple’s Voice Control advertisement. It was pretty dull as far as commercials go, and it certainly wasn’t the first one to feature a voice recognition consumer product (more on that later…). However, there was something VERY SPECIAL about it: Apple was making a big promotion for speech recognition as a user interface in one of the most successful consumer products ever, the iPhone. Apple isn’t just any company; it is a company that ABSOLUTELY LEADS in revolutionizing the user experience. Building on the early days of the Mac, Lisa, and Apple II and all the technologies they developed (or learned from Xerox PARC) for computers, Apple went on to revolutionize the music player and smartphone markets using capacitive/resistive sensor-based touch screens and text-to-speech capabilities. And now, with the latest iPhone, Apple has finally legitimized the voice interface in a mainstream consumer product. By the way, as mentioned in an earlier blog, this is not Sensory’s recognition but probably Nuance’s, which has good technology and excellent language coverage.

For so many years I heard that “Steve doesn’t like speech recognition,” as Apple had very unrealistic demands for performance and accuracy, at least until now. It’s fantastic to see Apple not only implementing speech technologies ACROSS their product line, but also ADVERTISING it as a key feature.

So, a trivia question: what was the first broadcast commercial for a speech recognition consumer product? I’m not exactly sure, but I think Tomy and Tiger may have run television ads for Sensory-based products in 1995. In 1996, Hasbro had a GREAT commercial for RADAR, an educational robot that talked to kids and featured Sensory’s first speech chip, the RSC-164. Around 1998, Uniden launched a WONDERFUL commercial during the NCAA Final Four basketball tournament. They hired Konishiki, one of the world’s largest and most famous sumo wrestlers at the time, to rip apart phone books with his bare hands and then order a pizza by speaking to his Uniden Voice Dial phone (powered by a Sensory RSC-264 chip!). That was definitely the BEST speech recognition commercial ever!

Todd
sensoryblog@sensoryinc.com

Apple… It’s about time!   June 9th, 2009

OK, I guess I have to blog about Apple’s new iPhone 3GS and the new Voice Control feature. Yeah it’s a big deal. My main comment…It’s about time!

A lot of people and reviewers complained that speech recognition was missing when the iPhone first shipped. I repeatedly heard through the grapevine that “Steve doesn’t like recognition.” Then, miraculously, 20 or 30 different voice dialers and various other voice recognition applications appeared in the App Store. I tried a few, and they all worked pretty well. My favorite is NameDial by VoiceActivation (which uses Sensory technology, of course!)

For a long time Nuance was rumored to be swinging some kind of deal with Apple, and I guess they did. I haven’t seen the Nuance name mentioned yet, but with 30 different languages supported, I’m very confident Nuance is there behind the scenes…probably not making much money, if I know Apple!

It’s definitely an embedded engine, too. If you listen to the demo of the TTS (text-to-speech), you can hear that it’s embedded (i.e. not as good as a server-based TTS system would allow); it even sounds kind of like a Nuance voice.

So why did I say it’s a big deal??? Voice dialing is old hat, but music search by voice is pretty novel. I’ve only known of a couple of other MP3 player apps that embed speech recognition in the device.

Today Sensory announced its Truly Hands-Free technology for trigger-type phrase spotting. It allows a product to be activated solely by voice, with no button pressing necessary. We developed it for our BlueGenie car kits so drivers wouldn’t be distracted, but maybe Apple wants to license it to run with their new Voice Control!

Hey Steve – Wanna go Hands-Free?

Todd
sensoryblog@sensoryinc.com

I just heard that Sensory has won an award for “North American Voice User Interface Technology Innovation of the Year”… Hey that sounds really good, and boy do we deserve it! The BlueGenie Voice Interface is the best reviewed and most well received speech recognition product I’ve ever heard of!

The award is a product of a certain research company’s very detailed analysis. They do this research to make money selling it, so if you buy this company’s research, you can find out that Sensory won, and why Sensory won. Selling research is not the only way this company makes money; they are willing to sell me the right to announce that Sensory won an award from them.

For only $20,000 I can get the Basic Award Package. This lets me mention the award on my website, in a press release, and I even get to use the market research firm’s logo! Also I get 5 invites to the awards banquet, and an award plaque!

For $40,000, I get my ego stroked with a “Movers and Shakers” interview, plus everything from the basic package, PLUS additional seats at the awards banquet. If I want to pay $60,000, I also get a letter from the Chairman of the research firm congratulating me. Oh….and I get more award plaques.

This reminds me of the letters I get welcoming me into exclusive “Who’s Important” catalogs where I can get my name listed by paying $$. I don’t pay for those listings, and I’m not going to pay much for an award, either (yeah, I’d pay a little, but not more than a thousand dollars!).

Sensory’s money is better spent putting it back into research and development so we can continue making award winning products!

Todd
sensoryblog@sensoryinc.com