I normally don’t watch much television, but when I got home last night my family had gone out to dinner, so I grabbed a veggie burger and plopped down in front of the TV. I decided to watch Band of Brothers (I had read a positive review of it once).

During a break I saw Apple’s Voice Control advertisement. It was pretty dull as far as commercials go, and it certainly wasn’t the first one to feature a voice recognition consumer product (more on that later…). However, there was something VERY SPECIAL about it. Apple was making a big promotion for speech recognition as a user interface in one of the most successful consumer products ever - the iPhone. Now Apple isn’t just any company - Apple is a company that ABSOLUTELY LEADS in revolutionizing the user experience. After the early days of the Mac/Lisa/II and all the technologies that they developed (or learned from Xerox PARC) for computers, Apple went on to revolutionize the music player and smartphone markets using capacitive/resistive sensor-based touch screens and text-to-speech capabilities. And now with the latest iPhone, Apple has finally legitimized the voice interface in a mainstream consumer product. By the way, as mentioned in an earlier blog, this is not Sensory’s recognition, but probably from Nuance, which has a good technology and excellent language coverage.

For so many years, I heard that “Steve doesn’t like speech recognition”, as Apple had very unrealistic demands for performance and accuracy–at least until now. It’s fantastic to see Apple not only implementing speech technologies ACROSS their product line, but also ADVERTISING it as a key feature.

So a trivia question: What was the first broadcast commercial for a speech recognition consumer product? I’m not exactly sure, but I think Tomy and Tiger may have used television ads for Sensory-based products in 1995. In 1996, Hasbro had a GREAT commercial for an educational robot named RADAR that talked to kids and featured Sensory’s first speech chip, the RSC-164. Around 1998, Uniden launched a WONDERFUL commercial during the NCAA Final Four basketball tournament. They hired Konishiki, one of the world’s largest and most famous Sumo Wrestlers at the time, to rip apart phone books with his bare hands, then order a pizza by speaking to his Uniden Voice Dial phone (powered by a Sensory RSC-264 chip!). That was definitely the BEST speech recognition commercial ever!

Todd
sensoryblog@sensoryinc.com

Apple… It’s about time!   June 9th, 2009

OK, I guess I have to blog about Apple’s new iPhone 3GS and the new Voice Control feature. Yeah it’s a big deal. My main comment…It’s about time!

A lot of people and reviewers complained that speech recognition was missing when the iPhone first shipped. I repeatedly heard through the grapevine that “Steve doesn’t like recognition”. Then miraculously, 20 or 30 different voice dialers and various other voice recognition applications appeared in the App Store. I tried a few and they all worked pretty well. My favorite is NameDial by VoiceActivation (which uses Sensory technology of course!)

For a long time Nuance was rumored to be swinging some kind of deal with Apple, and I guess they did. I haven’t seen the Nuance name mentioned yet, but with 30 different languages supported, I’m very confident Nuance is there behind the scenes…probably not making much money, if I know Apple!

It’s definitely an embedded engine too. If you listen to the demo of the TTS (text-to-speech), you can hear it’s embedded (i.e. not as good as a server-based TTS system would allow); it even sounds kind of like a Nuance voice.

So why did I say it’s a big deal??? Voice dialing is old hat, but doing music search is pretty novel. I’ve only known of a couple other MP3 player apps that use speech recognition embedded into the devices.

Today Sensory announced its Truly Hands-Free technology for trigger-type phrase spotting. It allows a product to be activated solely by voice, with no button pressing necessary. We developed it to go with our BlueGenie car kits so drivers wouldn’t need to be distracted, but maybe Apple wants to license it to run with their new Voice Control!

Hey Steve – Wanna go Hands-Free?

Todd
sensoryblog@sensoryinc.com

I just heard that Sensory has won an award for “North American Voice User Interface Technology Innovation of the Year”… Hey that sounds really good, and boy do we deserve it! The BlueGenie Voice Interface is the best reviewed and most well received speech recognition product I’ve ever heard of!

The award is a product of a certain research company’s very detailed analysis. They do this research to make money selling it, so if you buy this company’s research, you can find out that Sensory won, and why Sensory won. Selling research is not the only way this company makes money; they are willing to sell me the right to announce that Sensory won an award from them.

For only $20,000 I can get the Basic Award Package. This lets me mention the award on my website, in a press release, and I even get to use the market research firm’s logo! Also I get 5 invites to the awards banquet, and an award plaque!

For $40,000, I get my ego stroked with a “Movers and Shakers” interview, plus everything from the basic package, PLUS additional seats at the awards banquet. If I want to pay $60,000, I also get a letter from the Chairman of the research firm congratulating me. Oh….and I get more award plaques.

This reminds me of the letters I get welcoming me into exclusive “Who’s Important” catalogs where I can get my name listed by paying $$. I don’t pay for those listings, and I’m not going to pay much for an award, either (yeah, I’d pay a little, but not more than a thousand dollars!).

Sensory’s money is better spent putting it back into research and development so we can continue making award winning products!

Todd
sensoryblog@sensoryinc.com

Jawbone Makers Dream   May 1st, 2009

I saw an interesting article in Wired this week.

It was an interview with Hosain Rahman and Alexander Asseily, the founders of Aliph, which makes the Jawbone headset. Aliph has done a nice job in the design and marketing of their Jawbone, and has made it one of the most widely recognized and best selling Bluetooth® headsets here in the US. Aliph has just announced the Jawbone Prime, which offers a variety of new skin colors with amusing names plus a host of “improvements” that some might call “bug fixes” over the Jawbone II.

I’ve got to hand it to Hosain and Alexander for their well articulated strategy in Wired…it’s a vision that I hold as well. Here are a few quotes from the article, with my comments:

“Aliph hopes to take Jawbone out of the “yet another Bluetooth headset” category and transform it into a device that could become an “audio gateway” for the consumer. Think news, weather, music or even language learning modules combined with a headset in a way that would bring the term ‘wearable computing’ to life.”

This sounds like Sensory’s plans for the BlueGenie Voice Interface. In fact we’ve already implemented these plans with companies like Google and Microsoft to allow a speech user interface to access and dial any business in the US, get stock quotes or stock market updates, get weather information, driving directions, and much more. We expect to add customized entertainment, voice to text messaging, social networking with voice input and more! The BlueGenie Voice Interface enables a whole lot more features without adding buttons!

“We are looking at wearable computing, which we see an opportunity for us to use the audio medium extensively,” says Asseily. Aliph is currently [exploring] technologies such as speech recognition as a way to bring more functionality to its headset. The company could take a leaf out of Apple’s playbook there. Apple launched its latest iPod shuffle with speech recognition that tells users what song it is playing, the artist and the names of the playlists. Asseily and Rahman won’t reveal when Aliph will release a device with a comparable speech recognition feature but say they are big believers in the technology.

Oops - The editor got that one wrong. Apple uses text-to-speech, not speech recognition, for the new Shuffle, and the technology is actually embedded in iTunes on the computer, not in the Shuffle. As a funny side note, it appears Apple used a very mediocre sounding TTS voice for their Windows PC version of iTunes and a good sounding voice for their Mac version. What’s stored on the Shuffle is an audio file moved over from the computer. Use a Mac and you get a good voice!

My hat’s off to Aliph for understanding the value of speech recognition in a Bluetooth headset for a “future” version of the Jawbone (it’s not in their Prime)…Of course Sensory has the BlueGenie Voice Interface running on CSR Bluetooth chips with speech recognition, compressed speech, text-to-speech, voice morphing and MORE. This allows headsets to be easier to use, safer while driving, and have lots more features!

BlueAnt’s new Q1 headset will use the BlueGenie Voice Interface and should hit stores this month!!!!

Todd
sensoryblog@sensoryinc.com

If you’ve read through my blogs you know that I really like my Hello Moshi Clock, and my BlueAnt V1 headset. These aren’t the only speech products I like, and some of the others don’t even use Sensory’s technology. I’ll mention a few of my other favorites, in no particular order:

  1. Midomi by Melodis. I’ve been playing with this free download on my iPhone. It lets you sing, hum, type, speak, or even hold your phone up to any song and it identifies it. I’m really amazed how well it works when I sing…and my singing isn’t very on pitch! You get to hear samples of different songs that their recognizer calls up. They record some users (it’s a little weird when you hear someone without any musical backing).
  2. 1800Call411 by Microsoft. This is my favorite of the free information services now available! You can get weather reports, stock market updates, call businesses, get movie info or buy tickets, find cheap gas, and lots more. The cool thing is that IT’S NOT MULTI-MODAL. Multimodal interfaces are great when you can sit staring and playing with your cellphone, but HORRIBLE when driving. Goog-411 is nice too, and possibly more accurate than Microsoft, but business listings alone do not suffice. Microsoft has the right idea…Google is being too conservative (are those crazy young guys getting gray hairs?).
  3. Vlingo. Another free download for my iPhone. This is the most accurate “say anything” interface I have ever seen. Unconstrained speech is a real bitch, and Vlingo has the first usable unconstrained interface I have ever tried. It usually works. Even when it’s not 100% accurate it does speed entry. I wish they had a version that was 100% voice interface. I like the mono-modal interfaces where I talk and it talks back, but these are definitely the toughest to conquer.
  4. Radar by Fisher-Price. Maybe I’m being a little nostalgic. This was one of Sensory’s first products released back around 1997, and it may be the best thought out educational play experience of any product we’ve ever done. It was a little robot with a phone. You would talk to it through the phone and play educational games. I can still remember my then 3-year-old son, mimicking Radar, would say, “What did you say? I can’t hear you. Please talk louder.” It had a true Voice User Interface (that is, there wasn’t a bunch of button and knob backup choices).
  5. Lightswitches by Voice Signal and VOS. These companies used to duke it out over this market that never fully emerged. VOS used Sensory IC’s (actually, so did Voice Signal in their very first implementation). Later, Voice Signal had its own technology that worked surprisingly well. The challenge in these products was to create a hands free trigger command that didn’t false trigger but worked well in noise. They both kept improving but never quite got there for mainstream acceptance.

I could keep going, but I’ll diverge here with thoughts coming from the hands-free trigger lightswitches. These were a huge opportunity that never seemed to emerge. The products were amazingly handy, and worked surprisingly well, given the technical challenge (especially given the huge success of things like the Clapper, which turns into a strobe light when music is played with a strong snare beat). It was the occasional trigger rejections and false fires that kept these from becoming mainstream. Sensory has been working on this hands-free trigger challenge for about 12 years, and we have a new patent-pending approach to dealing with it that will enable a whole new class of hands-free trigger devices ranging from lightswitches to clocks, to carkits, to ovens & microwaves, to picture frames, and other internet connected devices. We should have some interesting announcements with this technology in the months ahead!

Todd
sensoryblog@sensoryinc.com

The Best Vision-Free Products at CES   January 14th, 2009

CES came and went in a very hectic and busy flash. It was a record CES for Sensory, with 21 different products using our chips and technologies being shown across the floors and in the backrooms of hotel suites.

Last year Sensory announced 2 new things for CES, and both have proliferated into lots of successful products:

  • BlueGenie Voice Interface for Bluetooth chips. At CES, at least 4 headsets and 3 carkits were shown using Sensory’s BlueGenie. BlueAnt showed their new Q1 headset, which is really beautiful and seems to perform even better than the very well regarded V1 – the first headset to ship with the BlueGenie Voice Interface.
  • Timeset technology for clocks. Sensory had several timeset customers whose products (clocks that you talk to in order to get the time or temperature, set the time, etc.) received particular acclaim at the show. The Hello Moshi clock (one of my favorite Sensory products EVER) was labeled by Good Morning America as the Best Product of CES. The Jensen Clock was designated one of the Top 8 Products of CES by CNN.

The National Federation of the Blind in conjunction with legendary recording artist Stevie Wonder and Sendero gave out awards to the best “Vision-Free” products, and Moshi Clock, BlueAnt V1, and the Accenda IPOD Controller (all Sensory customers) won awards. Sensory was also given an award and was the only technology company to receive one.

Check out the photo of Sensory’s VP Sales, Jeff Rogers, hanging with Stevie and Mike May of Sendero:

Stevie Wonder and Mike May Present Award to Sensory

Todd
sensoryblog@sensoryinc.com

Rep Fair and Moshi   November 20th, 2008

Last month Sensory hosted its annual Rep Fair. Once a year, we invite all of our sales and distribution reps from around the world to attend this conference to talk about our new products and technologies.

In the past, we have had over two dozen attendees at these conferences, and have rented out large hotel meeting rooms for multiple days of activities. With the down economy, we scaled back a bit and hosted this year’s Rep Fair at a Sensory meeting room. Fifteen people representing eight countries attended, which is not bad considering the doom & gloom economy, but that’s not what I wanted to write about…

This year we demonstrated a few new products featuring Sensory technologies that are just now hitting the market. One of these is the Moshi IVR Alarm Clock by Moshi Lifestyle which uses our Natural Time Set technology. This clock allows a person to set the time and alarms by voice, using simple phrases like “8:45 AM”. This is a pretty tough grammar task for Sensory’s entry level chips which use only 2K RAM and less than 10 MIPS of processing, but the speech recognition is flawless and the product is excellent.

For fun we left the “voice trigger” mode on during the entire two days of nonstop meetings, presentations, and discussions. The voice trigger mode (or continuous listening) is one of the toughest tasks a speech recognizer can do, because the listening window is completely unconstrained, and everything spoken must be analyzed, with the right trigger phrase being responded to and the wrong words or phrases being rejected.

For this clock the trigger is “Hello Moshi.” I was expecting the clock to false trigger about once every hour, which would have been normal for an older technology release. However, our latest code is more finely tuned and calibrated for noisy environments, and after about four hours with no false triggers, I just had to check to see if it was even still on. So in the middle of a presentation on Sensory’s new VPC chip (not yet announced!), I said “Hello Moshi”…lo and behold, Moshi responded. It was on and listening the whole time, but never false triggered.
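
For a rough sense of how surprising four quiet hours would be at the old rate, here is a back-of-the-envelope sketch. It assumes false triggers arrive independently at a steady average rate (a Poisson model); that model and the function name are assumptions for illustration, not how Sensory evaluates its recognizers.

```python
import math

def prob_no_false_triggers(rate_per_hour: float, hours: float) -> float:
    """Probability of zero false triggers in `hours` of listening,
    assuming independent events at a constant average rate (Poisson model)."""
    return math.exp(-rate_per_hour * hours)

# Older release: about one false trigger per hour (figure from the post).
p = prob_no_false_triggers(rate_per_hour=1.0, hours=4.0)
print(f"Chance of 4 clean hours at 1 false trigger/hour: {p:.1%}")
```

Under that assumption, four clean hours would happen less than 2% of the time at the old rate, which is why it was worth checking whether the clock was still on.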

We went a full 2 days with no false triggers, and all the Sensory employees and old-time reps were blown away!

Todd
sensoryblog@sensoryinc.com

It’s All About the Music!   October 1st, 2008

My first startup was a company called ESS Technology, which originally stood for Electronic Speech Systems. We started out as a software speech synthesis developer, and found mild success making the early Commodore 64 and Apple II games speak. Our claim to fame was that every one of the 20 or so games that licensed our technology made it to the Billboard top 10 of software games.

ESS moved into speech chips, and sales grew dramatically when talking books started shipping. ESS’s sales really took off, though, when music synthesis industry pioneer Roi Peers started running music on our chips. This enabled ESS to release a single chip IC that essentially removed the need for Creative Sound Blaster boards on portable PC’s. It was ESS’s music chip sales that launched it as the most successful semiconductor IPO of 1995, and sales increased exponentially from tens of millions of dollars per year to hundreds of millions of dollars per year.

I love music. I used to work in the music industry, and I’ll play just about any instrument I can get my hands on. I used to hear the statistic that 1 in 10 people consider themselves a musician and 9 in 10 want to be one. I don’t know if that’s true, but it’s reasonable. I just read in my Costco Connection magazine (must-have reading while I wait for programs to load on my slow computer) a few interesting things about music game software:

- 2006 sales were $250 Million
- 2007 sales were $1.3 Billion
- 31% of PS2 dollars were spent on Guitar Hero when Guitar Hero was released
- Guitar Hero alone made over $820 million more than Mario and Halo combined!

Sensory hasn’t yet announced our next generation chip, but for those that read my blog, here’s a sneak preview…This chip is called the VPC and will be an AWESOME low cost music chip (while also offering many other high-quality audio technologies including speech recognition and synthesis):

  • STEREO 16 bit DAC output - Sensory’s current generation IC’s are mono 12 bit devices, making them applicable only for low fidelity toys.
  • 32 voice MIDI synthesis - it can play MIDI files and access the large available content base
  • MP3 decoding - yes, it’s an MP3 player!
  • Mixer/Effects - Reverb, EQ, echo and other effects are included
  • Sampler - we have a 16 bit stereo ADC so the chip can also record sounds
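
To put the jump from 12 bits to 16 bits in perspective, the sketch below computes the textbook signal-to-quantization-noise ratio of an ideal N-bit converter driven by a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB). These are theoretical ceilings under that idealized assumption, not measured specs for any Sensory chip, and the function name is made up for illustration.

```python
import math

def ideal_sqnr_db(bits: int) -> float:
    """Theoretical SQNR (dB) of an ideal N-bit converter for a full-scale
    sine wave: 20*log10(2**N) + 10*log10(1.5), roughly 6.02*N + 1.76."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (12, 16):
    print(f"{bits}-bit: {ideal_sqnr_db(bits):.1f} dB")
# prints:
# 12-bit: 74.0 dB
# 16-bit: 98.1 dB
```

Each extra bit buys about 6 dB, so the four extra bits are worth roughly 24 dB of theoretical headroom before real-world analog noise is taken into account.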

So is this the next great music chip to take over the Pro-audio market? NO. It’s also not a high-end audiophile-quality MP3 playback device. But it is very low cost, and 99% of the population couldn’t hear the difference and would rather save the money!

Maybe it’s the chip for a new generation of low cost Guitar Hero-like instruments that can be played and jammed on in a standalone or “group” jam environment. It should start shipping in 2009…but no guarantees about any features, ship dates, or anything else…it hasn’t been announced yet.

Todd
sensoryblog@sensoryinc.com

I hope it’s not Gaudy…   September 24th, 2008

Google launched a new “audio search indexing experiment that allows users to find spoken words inside videos.” Google Audio Indexing (GAUDI) was developed by Google Labs (is that where Nuance founder and hot jazz guitarist Michael Cohen lives these days?).

Gaudi is a fun name. Anyone that’s been to Barcelona has seen the unique art nouveau style of design and architecture from the very famous Antoni Gaudi.

Ironically, Gaudi sounds kind of like Gaudy, which Webster’s defines as “ostentatiously or tastelessly ornamented”. Now I’m a fan of Gaudi (the architect), but I could see how a critic might describe some of Gaudi’s works as “tastelessly ornamented”. They certainly can be ornamented and “tasteless” is just a matter of opinion. A quick Googling shows that gaudy has ancient Latin roots and no relationship to Gaudi.

Anyways…Gaudi (the software) “transforms spoken words into text and then indexes that text using search technology–users searching for spoken words inside video clips will be able to jump to portions of a video where the searched words are spoken.” Pretty cool! Isn’t that what Paul Leggo was doing over at Virage 10 or 15 years ago?
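
The general idea in that quote (turn speech into text, index it, jump to the spot) can be sketched in a few lines. This is only a toy illustration, not how Google’s system works: it assumes some recognizer has already produced a list of words with start times, and the transcript data and function names here are hypothetical.

```python
from collections import defaultdict

def build_index(transcript):
    """Map each lowercased word to the start times (in seconds) where it is
    spoken, in transcript order. `transcript` is a list of (seconds, word)."""
    index = defaultdict(list)
    for start_seconds, word in transcript:
        index[word.lower()].append(start_seconds)
    return index

def find_word(index, query):
    """Return every time offset where `query` was spoken (empty if never)."""
    return index.get(query.lower(), [])

# Hypothetical recognizer output for a short clip.
transcript = [(0.0, "Hello"), (0.6, "Moshi"), (2.1, "set"),
              (2.5, "alarm"), (9.8, "hello")]
index = build_index(transcript)
print(find_word(index, "hello"))  # [0.0, 9.8]
```

A video player could then seek to any of the returned offsets. The hard part, of course, is the recognizer that produces the timestamped words in the first place.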

I applaud Google for bringing this to the masses. The press release then gives its standard Google spokesperson quote “Google’s mission is to organize the world’s information and make it universally accessible and useful”. They left off the part about making money through ad content and hierarchies of information based on what advertisers will pay.

Hey, I love all this free access to content and information. I think it’s fine to have some advertising on the side bars; it’s certainly fair for companies to make money. I don’t mind transaction commissions, ads, etc. as long as I know when it’s happening. I wish there was a law, though, that forced disclosure anytime search results are ranked by commission or dollars paid.

Someone told me the other day that some mapping programs don’t necessarily take you through the fastest route, but instead bring you by billboards they want you to see. Could that be true? Very scary. Don’t be evil!

Todd
sensoryblog@sensoryinc.com

On Human Misrecognitions…   September 24th, 2008

My very first blog was called “Weapons for Christmas”…I had misunderstood my daughter when she said she wanted “Webkins for Christmas”. I’m always intrigued by errors in human speech recognition. I figure if we can’t do it right with all our sensory and extra sensory powers, then how in the world can a computer ever get it right? Or better yet, how can we apply the sensory tools in people to make our machines better?

One of Sensory’s Bluetooth engineers is a native Chinese speaker. Sometimes I have a difficult time understanding his accent, but he says that our BlueGenie Voice Interface on the headsets he works on always works for him. I wonder whether that is because Sensory’s technology is so good, or because he has been well trained by our technology on how to talk. I suspect it’s a combination of both.

A couple of months ago I was in New York. I had a meeting in a building with a security gate entrance. When I signed in at the counter I was given a barcode pass. Upon exiting, I slid the pass in the security gate, but the gate didn’t open. I tried again and it still didn’t open. The security guard gave me a mean look and said something to me. He was a local guy with a New York accent. I had no idea what he said. I tried swiping my card again…gate still didn’t open. Guard looked mad and grumbled the same thing again, sounded like “Japushida”. I had no idea what he meant, then he made a pushing motion with his hands…I wasn’t supposed to wait for it to open automatically, I was supposed to “just push it in” (I guess?). The body language clued me in!

I was on the phone yesterday and I heard the person on the other end tell me “My female is slowing down my system”…I quickly corrected that in my mind to be “my email is slowing down my system”, but the correction didn’t occur until I heard the word “system”…then the context made it all come together. I do remember a split second thinking “why is he talking about ‘his female’”…I didn’t know what he meant and it seemed so politically incorrect. Context certainly helps!

Todd
sensoryblog@sensoryinc.com