Haven’t blogged in a long time…I have plenty to say but have just been too busy. That’s good news. Sensory is signing up new deals at a very rapid rate, so 2011 should be an excellent year for us. I declare the economic recovery in full swing (although I do have some trepidation it could be short lived). Right now my biggest issue is chip SUPPLY! We’ve actually had some trouble getting enough chips (this is endemic to the entire chip market right now!). Luckily, our software business is exploding and a growing percentage of overall revenues is not dependent on buying silicon!

The cool thing is for the first time in Sensory’s 15 year history we are putting text-to-speech into products. We’ve done a handful of deals in just the last couple of months, and I expect that within 2 years we’ll have over 10 million TTS devices that will have hit the market (we’re at around 60 million speech recognition products right now).

I went to Voice Search last week. This is the show that Bill Meisel and AVIOS co-host every year. It’s my favorite speech industry show and pretty much the only one I attend. At the show I spoke on a consumer speech panel and demonstrated Sensory’s Truly Hands-Free Voice Trigger. Nobody thinks that wordspotting can be always on and always listening without false firing - and still catch the trigger word when it’s spoken. Sensory’s spotting technology WORKS! It’s my pet technology right now and I think it will change the world, by making speech recognition TRULY HANDSFREE (that was the title of my presentation)…anyways…I demoed it live. Nobody is supposed to do live speech recognition demos because they always fail (Microsoft has had the misfortune of proving that more than once!), so most people at the conferences show video clips. I know Sensory’s stuff works well, but I got a little nervous when I started talking and I could hear the echo of the microphone, and as I spoke I was hoping it wouldn’t false trigger and totally embarrass me. It didn’t false trigger…then it just had to recognize my trigger words. It got the first and the second one right. Then on the 3rd time the small device started sliding down the podium and the mic got covered up and for a brief moment my heart froze and I thought I was going to need to repeat my trigger word…then all of a sudden I felt my heart exploding as I waited microseconds for the response…then it spoke and it got it! No false fires and 3/3 triggers accurately recognized. Oh the trials and tribulations of a speech industry veteran! The technology is great and in a car it’s nearly flawless; it was this new acoustical environment that made me nervous. It came through!!!

So…Apple acquired SIRI, Inc., an iPhone developer that supplies a personal assistant application featuring speech recognition. Cool. That means Apple is in the game - the speech game, with apparently a slightly different twist than Microsoft or Google. All 3 companies are investing in speech recognition. But Apple is doing very light investing while Google and Microsoft are HEAVILY invested. Apple apparently isn’t using any of its home grown technologies as they keep licensing Nuance…and SIRI uses a Nuance engine as well. SIRI is a voice concierge type service that uses the Nuance recognizer then throws a layer of “meaning” interpretation or “intelligence” into the process. Anyways, I’m glad Apple is taking voice control seriously…they’re gonna have a tough time catching up with Google. My take is the Google stuff works best right now. I was playing with a Nexus One phone and the recognition on it is really amazing. BING is pretty good too and has wrapped better apps around their technology in BING411.

I remember a Keynote talk 15 years ago at a speech conference titled something like “the Ever-Imminent-Never-Arriving Speech Bonanza”…well it’s finally here, and I have to thank Google and Microsoft (and Vlingo too!) for clearly taking us over the hurdle and making speech recognition accessible and usable by the masses. Now it’s time for Apple to kick in and do its part…and now that HP has acquired Palm, it will need to get in the game too. I don’t even know if HP has a speech recognition team, but if they don’t they will soon. So will Cisco. So will all the major consumer electronics and automotive companies! Our time has come!!! Speech Recognition has arrived and is working for the masses! It will just get better!

Todd
sensoryblog@sensoryinc.com

The SCID’s are Coming!!!!   August 4th, 2009

No, we’re not under attack from missiles and I’m not referring to results of the current financial crisis. I’m talking about Speech Controlled Internet Devices. These are home consumer electronic devices that use a VUI (voice user interface) for the user to interact with the product. The products themselves are able to access data and information from the internet, and they use a client/server speech recognition system to obtain a higher recognition accuracy than is possible with a lone client or lone server approach.

So what is Sensory’s role in this? Well, we originated the terminology, and we’re evangelizing the concept in advance of the release of our new chip in September. The new chip is designed to act as the main controller for SCIDs, although Sensory is looking for other partners on the chip side (like Intel or Philips) for higher end/higher cost SCIDs. By the way, we’re also looking for server-based speech recognition partners (like Microsoft, Google, Vlingo, Novauris, etc.), and even hardware partners like Cisco that know the Wi-Fi and consumer electronics space.

Some of the press and analysts out there are starting to think about the potential for SCIDs. Troy Wolverton (my favorite Mercury News columnist) had a bit of a changed heart after seeing some of my demos. Earlier I had contacted him because he thought speech recognition never worked, so I was quite happy that his column was titled “Speech Recognition Technology is Rapidly Improving.”

I’m not going to say a whole lot about SCID’s here because Dan Miller from Opus Research has already done an EXCELLENT job of writing up a summary of our conversation. Dan highlights the HUGE volume opportunity that SCID’s will enable over the coming few years.

A really interesting angle on the SCIDs is the Voice Search opportunity they enable. Most people think of Voice Search as something for telephone handsets (the quick idea of “voice search” is that a multi-billion dollar ad/transaction business will emerge for voice search just like it has for conventional Google-like search, so all the major search players - Microsoft, Google, Yahoo, etc. - are interested). The thing is, there will be billions of consumer electronic products hooked up to home internet, potentially with VOIP connections, so handsets won’t be the only devices enabling search opportunities - SCIDs could become a MAJOR driver for search revenues. Michael over at the Kelsey Group keyed off of the interesting opportunities that SCIDs bring to Voice Search and blogged a bit about that.

About the technology - It’s worth noting two very special things within the SCID’s:

  1. Sensory’s new Truly Hands-Free phrase spotting allows SCIDs to be always on always listening, so your voice becomes your remote control for accessing internet data through your SCID - no need to walk up and press buttons.
  2. Sensory will do really simple and accurate speech recognition on the client that provides standalone value when not connected to the internet, but ALSO ASSISTS THE SERVER RECOGNIZER by feeding categorized data along with the query.
    For example, if “Local News” (or time, weather, etc.) is requested from a news-oriented SCID, the client Sensory recognizer can recognize that and stream a local news report, and if “Other News” is requested we can prompt “Please say the location where you would like news reports”. Then Sensory can send a very targeted query to a server based recognizer identifying the recording as a location where recent news is requested. This simplifies the server task, and improves the accuracy of the “say anything” approach to speech queries.
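The client/server split described above can be sketched roughly as follows. This is only an illustration, not Sensory's actual code: the command table, the `ServerQuery` fields, and the function names are all made up for the example. The idea it shows is that the small client recognizer handles a few commands on its own and, for everything else, tags the follow-up recording with a category before handing it to a server recognizer.

```python
from dataclasses import dataclass

# Hypothetical commands the client recognizer answers by itself.
LOCAL_COMMANDS = {"local news", "time", "weather"}

# Hypothetical commands that need a spoken follow-up, mapped to
# (prompt to play, category tag sent along to the server).
FOLLOW_UPS = {
    "other news": (
        "Please say the location where you would like news reports",
        "location",
    ),
}


@dataclass
class ServerQuery:
    category: str  # tells the server what kind of utterance to expect
    intent: str    # what the user asked for on the client
    audio: bytes   # recording of the follow-up answer


def route(command: str):
    """Decide on the client what to do with a recognized command."""
    key = command.strip().lower()
    if key in LOCAL_COMMANDS:
        return ("local", key)
    if key in FOLLOW_UPS:
        prompt, category = FOLLOW_UPS[key]
        return ("prompt", prompt, category)
    return ("unknown", key)


def build_server_query(command: str, audio: bytes) -> ServerQuery:
    """Package the follow-up recording with its category for the server."""
    key = command.strip().lower()
    _prompt, category = FOLLOW_UPS[key]
    return ServerQuery(category=category, intent=key, audio=audio)
```

Because the server is told the recording is a location, it can use a much narrower vocabulary than a true “say anything” recognizer would need.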

Todd
sensoryblog@sensoryinc.com

I normally don’t watch much television, but when I got home last night my family had gone out to dinner, so I grabbed a veggie burger and plopped down in front of the TV. I decided to watch Band of Brothers (I had read a positive review of it once.)

During a break I saw Apple’s Voice Control advertisement. It was pretty dull as far as commercials go, and it certainly wasn’t the first one to feature a voice recognition consumer product (more on that later…) However, there was something VERY SPECIAL about it. Apple was making a big promotion for speech recognition as a user interface in one of the most successful consumer products ever - the iPhone. Now Apple isn’t just any company - Apple is a company that ABSOLUTELY LEADS in revolutionizing the user experience. From the early days of the Mac/Lisa/II and all the technologies that they developed (or learned from Xerox PARC) for computers, Apple revolutionized the music player and smart phone markets using cap/resistive sensor-based touch screens and text-to-speech capabilities. And now with the latest iPhone, Apple has finally legitimized the voice interface in a mainstream consumer product. By the way, as mentioned in an earlier Blog, this is not Sensory’s recognition, but probably from Nuance, which has a good technology and excellent language coverage.

For so many years, I heard that “Steve doesn’t like speech recognition”, as Apple had very unrealistic demands for performance and accuracy–at least until now. It’s fantastic to see Apple not only implementing speech technologies ACROSS their product line, but also ADVERTISING it as a key feature.

So a trivia question: What was the first broadcast commercial for a speech recognition consumer product? I’m not exactly sure, but I think Tomy and Tiger may have used television ads for Sensory-based products in 1995. In 1996, Hasbro had a GREAT commercial for an educational robot named RADAR that talked to kids featuring Sensory’s first speech chip, the RSC-164. Around 1998, Uniden launched a WONDERFUL commercial during the NCAA Final Four basketball tournament. They hired Konishiki, one of the world’s largest and most famous Sumo Wrestlers at the time, to rip apart phone books with his bare hands, then order a pizza by speaking to his Uniden Voice Dial phone (powered by a Sensory RSC-264 chip!) That was definitely the BEST speech recognition commercial ever!

Todd
sensoryblog@sensoryinc.com

Rep Fair and Moshi   November 20th, 2008

Last month Sensory hosted its annual Rep Fair. Once a year, we invite all of our sales and distribution reps from around the world to attend this conference to talk about our new products and technologies.

In the past, we have had over two dozen attendees at these conferences, and have rented out large hotel meeting rooms for multiple days of activities. With the down economy, we scaled back a bit and hosted this year’s Rep Fair at a Sensory meeting room. Fifteen people representing eight countries attended, which is not bad considering the doom & gloom economy, but that’s not what I wanted to write about…

This year we demonstrated a few new products featuring Sensory technologies that are just now hitting the market. One of these is the Moshi IVR Alarm Clock by Moshi Lifestyle which uses our Natural Time Set technology. This clock allows a person to set the time and alarms by voice, using simple phrases like “8:45 AM”. This is a pretty tough grammar task for Sensory’s entry level chips which use only 2K RAM and less than 10 MIPS of processing, but the speech recognition is flawless and the product is excellent.

For fun we left the “voice trigger” mode on during the entire two days of nonstop meetings, presentations, and discussions. The voice trigger mode (or continuous listening), is one of the toughest tasks a speech recognizer can do, because the listening window is completely unconstrained, and everything spoken must be analyzed, with the right trigger phrase being responded to and the wrong words or phrases being rejected.

For this clock the trigger is “Hello Moshi.” I was expecting the clock to false trigger about once every hour, which would have been normal for an older technology release. However, our latest code is more finely tuned and calibrated for noisy environments, and after about four hours with no false triggers, I just had to check to see if it was even still on. So in the middle of a presentation on Sensory’s new VPC chip (not yet announced!), I said “Hello Moshi”…lo and behold Moshi responded. It was on and listening the whole time, but never false triggered.

We went a full 2 days with no false triggers, and all the Sensory employees and old time reps were blown away!
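For readers who want to see the accept/reject idea in miniature, here is a toy sketch. It is not how Sensory's spotter works internally: it assumes some recognizer already produces a best-guess phrase and a confidence score for each chunk of audio, and the threshold value and function names are invented for illustration. It simply shows how hits, misses, and false triggers get counted.

```python
TRIGGER = "hello moshi"   # the trigger phrase from the clock
THRESHOLD = 0.8           # assumed confidence cutoff, purely illustrative


def should_fire(hypothesis: str, score: float, threshold: float = THRESHOLD) -> bool:
    """Fire only when the guess is the trigger phrase AND confidence is high enough."""
    return hypothesis.strip().lower() == TRIGGER and score >= threshold


def count_events(stream, threshold: float = THRESHOLD):
    """Tally (hits, misses, false_triggers) over a labeled stream.

    Each item is (hypothesis, score, truly_spoken), where truly_spoken says
    whether the trigger phrase was actually said in that chunk of audio.
    """
    hits = misses = false_triggers = 0
    for hypothesis, score, truly_spoken in stream:
        fired = should_fire(hypothesis, score, threshold)
        if fired and truly_spoken:
            hits += 1
        elif fired:
            false_triggers += 1   # responded to something that was not the trigger
        elif truly_spoken:
            misses += 1           # trigger was said but rejected
    return hits, misses, false_triggers
```

Raising the threshold trades false triggers for misses, which is why getting both low at the same time in a noisy room is the hard part.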

Todd
sensoryblog@sensoryinc.com


It’s All About the Music!   October 1st, 2008

My first startup was a company called ESS Technologies, which originally stood for Electronics Speech Systems. We started out as a software speech synthesis developer, and found mild success making the early Commodore 64 and Apple II games speak. Our claim to fame was that every one of the 20 or so games that licensed our technology made it to the Billboard top 10 of software games.

ESS moved into speech chips, and sales grew dramatically when talking books started shipping. ESS’s sales really took off though, when music synthesis industry pioneer Roi Peers started running music on our chips. This enabled ESS to release a single chip IC that essentially removed the need for Creative Sound Blaster boards on portable PCs. It was ESS’s music chip sales that launched it as the most successful semiconductor IPO of 1995, and sales increased exponentially from tens of millions of dollars per year to hundreds of millions of dollars per year.

I love music. I used to work in the music industry, and I’ll play just about any instrument I can get my hands on. I used to hear the statistic that 1/10 people consider themselves a musician and 9/10 people want to be a musician. I don’t know if that’s true, but it’s reasonable. I just read in my Costco Connection magazine (must-have reading while I wait for programs to load on my slow computer) a few interesting things about music game software:

- 2006 sales were $250 Million
- 2007 sales were $1.3 Billion
- 31% of PS2 dollars were spent on Guitar Hero when Guitar Hero was released
- Guitar Hero alone made over $820 million more than Mario and Halo combined!

Sensory hasn’t yet announced our next generation chip, but for those that read my blog, here’s a sneak preview…This chip is called the VPC and will be an AWESOME low cost music chip (while also offering many other hi-quality audio technologies including speech recognition and synthesis):

  • STEREO 16 bit DAC output - Sensory’s current generation ICs are mono 12 bit devices, making them applicable only for low fidelity toys.
  • 32 voice MIDI synthesis - it can play midi files and access the large available content base
  • MP3 decoding - yes, it’s an MP3 player!
  • Mixer/Effects - Reverb, EQ, echo and other effects are included
  • Sampler - we have a 16 bit stereo ADC so the chip can also record sounds

So is this the next great music chip to take over the Pro-audio market? NO. It’s also not a high-end audiophile-quality MP3 playback device. But it is very low cost, and 99% of the population couldn’t hear the difference and would rather save the money!

Maybe it’s the chip for a new generation of low cost Guitar Hero like instruments that can be played and jammed on in a stand alone or “group” jam environment. It should start shipping in 2009…but no guarantees about any features, ship dates, or anything else…it hasn’t been announced yet.

Todd
sensoryblog@sensoryinc.com


Power Outages   January 7th, 2008

2008 is here, and in the Silicon Valley it comes with a series of powerful storms, winds up to 60 miles per hour and rain, rain, rain. Of course, what this means is power outages are upon us; a short one and the house will probably stay cold enough to not worry about the food going bad. We’ll build fires, light candles, load the flashlights with batteries, and when the power comes on, spend way too much time resetting our clocks.

Yeah, that’s my pet peeve. No one ever created a standard way to reset the time on clocks, so it always takes a bit of systematic experimentation to figure out exactly how to reset clocks and appliances like VCRs.

But wait - have you seen Sensory’s new time-set technology? This inconvenience could be a thing of the past if the clock uses a Sensory chip. Check out the YouTube video.

This is what customers have been asking us for years and years and the accuracy was never quite there, but we kept working on it. I’m happy to say, we’re there! Sensory now has a chip that sells in volumes for under $2 that can be integrated into clocks and uses voice recognition to set the alarm time with natural phrases like “Five thirty-five AM”. Recognizing digits in a natural context is one of the Holy Grails in speech recognition, and I’m proud to say ours works very accurately. Of course, shutting off alarms by voice commands or creating hands-free requests like “What time is it?” can be done as well.
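To give a feel for what happens after the words are recognized, here is a small sketch that turns an already-recognized phrase like “Five thirty-five AM” into an hour and minute. It is not the code on the chip (the hard part, recognizing the audio, is skipped entirely), and the function names and the tiny grammar it accepts are assumptions made for this example.

```python
_NAMES = ("one two three four five six seven eight nine ten eleven twelve "
          "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
UNITS = {name: i for i, name in enumerate(_NAMES, start=1)}   # "one" -> 1 ... "nineteen" -> 19
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}


def minutes_from_words(words):
    """Convert minute words such as ['thirty', 'five'] or ['ten'] to an int."""
    if len(words) == 1 and words[0] in UNITS:
        return UNITS[words[0]]
    if len(words) == 1 and words[0] in TENS:
        return TENS[words[0]]
    if len(words) == 2 and words[0] in TENS and UNITS.get(words[1], 99) <= 9:
        return TENS[words[0]] + UNITS[words[1]]
    raise ValueError(f"cannot read minutes from {words!r}")


def parse_spoken_time(phrase):
    """Return (hour, minute) on a 24-hour clock for phrases like 'Five thirty-five AM'."""
    tokens = phrase.lower().replace("-", " ").split()
    if len(tokens) < 2 or tokens[-1] not in ("am", "pm"):
        raise ValueError("expected a phrase ending in AM or PM")
    meridiem = tokens[-1]
    hour = UNITS.get(tokens[0])
    if hour is None or hour > 12:
        raise ValueError(f"bad hour word: {tokens[0]!r}")
    minute_words = tokens[1:-1]
    if minute_words[:1] == ["oh"]:          # "five oh five PM"
        minute_words = minute_words[1:]
    if not minute_words or minute_words == ["o'clock"]:
        minute = 0
    else:
        minute = minutes_from_words(minute_words)
    hour24 = hour % 12 + (12 if meridiem == "pm" else 0)
    return hour24, minute
```

For example, `parse_spoken_time("Five thirty-five AM")` gives `(5, 35)`, and a phrase such as “thirteen thirty AM” is rejected with a `ValueError` instead of setting a nonsense alarm.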

I hope to see low-cost clocks for under $30 hit the market by the end of the year that incorporate Sensory’s chips featuring this awesome new technology. It’s REALLY COOL, and I’m REALLY EXCITED about it!

Todd
sensoryblog@sensoryinc.com


Weapons for Christmas   August 29th, 2007

My seven year old daughter Sydney recently asked, “Daddy, when is Christmas?” I asked “Why?” She said (or more accurately, I heard) “Because I know what I want. I want weapons.”

I was a bit taken aback. But after some clarification I realized that I hadn’t heard her properly (more on her intent if you read on.) It’s always interesting to me when people hear things wrong. Humans have so many great clues about intent and context, yet we still occasionally get the wrong message. The best speech recognition systems actually try to take contextual probabilities into account. Dictation systems don’t just perform speech recognition, but get into “meaning” recognition as well.

I remember one system I read about from Bell Labs that included a camera to help improve accuracy by watching the speaker’s mouth and performing lip-reading. Humans utilize this approach too; I used to find it mildly amusing (back before I had my eyes lasered) to realize that when I took my contacts out I couldn’t always understand what people were saying. Too many years of playing in loud rock bands has damaged my hearing and I have learned to compensate by watching lips while I listen. The makers of the Jawbone Bluetooth headset have employed an interesting approach to noise reduction by “listening” in on the jawbone movements to help isolate the person speaking from the background noises.

Okay, so what does my daughter want for Christmas? Webkinz, not weapons. Webkinz are the latest Virtual Pet toy craze. Virtual Pets have been around for a long time, but really exploded with Bandai’s 1997 hit Tamagotchi, which sold something like 40-50 million units. Tiger’s 1998 phenomenal hit Furby (which used a Sensory/TI SC chip in its original introduction and a Sensory RSC chip in its 2005 re-introduction) was a big enough sensation that Hasbro bought the company for over $300 million.

The original Tamagotchi was a simple little virtual pet contained in a watch-like device with a small display. A few buttons enabled feeding, sleeping and other activities like exercise. The first Furby added mechanics to the mix by making a virtual creature that could move around and speak “Furbish”, while products like Sony’s Aibo and Furby II added more complex mechanics along with speech recognition. Webkinz use the Internet to take one step forward in technology. Users can log onto their accounts and do various things to and with their pets, but the “pets” themselves are really a step backwards in simplicity. No mechanics, no speech recognizers, not much really but a ball of plush!

Nevertheless, the idea of products that interact with the Internet is big today and it will just get bigger. Even my four year old son goes onto the Internet to play games. Kids are growing up with big monitors, big memories, and powerful processors, and toy companies can make their products more powerful by taking advantage of this. I think more and more toy products will have online personas and the ability to download new gameplays, voices and recognition sets in the near future. In fact, watch out for a new chip from Sensory in 2008 that includes a USB port to make this kind of communication really easy. This is not a new idea for Sensory; some of our early patents made claims for this kind of stuff, and it’s really fun and exciting to see it all coming to life!

Todd
sensoryblog@sensoryinc.com
