I hope it’s not Gaudy… September 24th, 2008
Google launched a new “audio search indexing experiment that allows users to find spoken words inside videos.” Google Audio Indexing (GAUDI) was developed by Google Labs (is that where Nuance founder and hot jazz guitarist Michael Cohen lives these days?).
Gaudi is a fun name. Anyone who’s been to Barcelona has seen the unique art nouveau design and architecture of the very famous Antoni Gaudi.
Ironically, Gaudi sounds kind of like Gaudy, which Webster’s defines as “ostentatiously or tastelessly ornamented”. Now I’m a fan of Gaudi (the architect), but I could see how a critic might describe some of Gaudi’s works as “tastelessly ornamented”. They certainly can be ornamented, and “tasteless” is just a matter of opinion. A quick Googling shows that gaudy has ancient Latin roots and no relationship to Gaudi.
Anyways…Gaudi (the software) “transforms spoken words into text and then indexes that text using search technology–users searching for spoken words inside video clips will be able to jump to portions of a video where the searched words are spoken.” Pretty cool! Isn’t that what Paul Leggo was doing over at Virage 10 or 15 years ago?
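For the curious, here is a toy sketch of that "index the text, then jump to the spot" idea. It is not how Gaudi actually works; I'm just assuming a speech recognizer hands back (start time, word) pairs, and the sample transcript and function names are made up for illustration.

```python
from collections import defaultdict

def build_index(transcript):
    """Map each lowercased word to the times (in seconds) where it was spoken.

    `transcript` is a list of (start_seconds, word) pairs, which is an
    assumed recognizer output format, not anything Google has documented.
    """
    index = defaultdict(list)
    for start_seconds, word in transcript:
        index[word.lower()].append(start_seconds)
    return index

def find_word(index, query):
    """Return the times to jump to for a spoken word (empty list if never spoken)."""
    return index.get(query.lower(), [])

# Hypothetical recognizer output for a short clip.
transcript = [
    (0.0, "Google"),
    (0.6, "launched"),
    (1.2, "audio"),
    (1.7, "indexing"),
    (42.5, "audio"),
]
index = build_index(transcript)
jump_points = find_word(index, "Audio")
```

Searching for "audio" gives back the two offsets where the word was recognized, and a player could seek straight to either one.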
I applaud Google for bringing this to the masses. The press release then gives its standard Google spokesperson quote “Google’s mission is to organize the world’s information and make it universally accessible and useful”. They left off the part about making money through ad content and hierarchies of information based on what advertisers will pay.
Hey, I love all this free access to content and information. I think it’s fine to have some advertising on the sidebars; it’s certainly fair for companies to make money. I don’t mind transaction commissions, ads, etc. as long as I know when it’s happening. I wish there were a law, though, that forced disclosure anytime search results are ranked by commission or dollars paid.
Someone told me the other day that some mapping programs don’t necessarily take you through the fastest route, but instead bring you by billboards they want you to see. Could that be true? Very scary. Don’t be evil!
On Human Misrecognitions… September 24th, 2008
My very first blog was called “Weapons for Christmas:…” I had misunderstood my daughter when she said she wanted “Webkins for Christmas”. I’m always intrigued by errors in human speech recognition. I figure if we can’t do it right with all our sensory and extrasensory powers, then how in the world can a computer ever get it right? Or better yet, how can we apply the sensory tools in people to make our machines better?
One of Sensory’s Bluetooth engineers is a native Chinese speaker. Sometimes I have a difficult time understanding his accent, but he says that our BlueGenie Voice Interface on the headsets he works on always works for him. I wonder whether that is because Sensory’s technology is so good, or because our technology has trained him well on how to talk. I suspect it’s a combination of both.
A couple of months ago I was in New York. I had a meeting in a building with a security gate entrance. When I signed in at the counter I was given a barcode pass. Upon exiting, I slid the pass in the security gate, but the gate didn’t open. I tried again and it still didn’t open. The security guard gave me a mean look and said something to me. He was a local guy with a New York accent. I had no idea what he said. I tried swiping my card again…gate still didn’t open. Guard looked mad and grumbled the same thing again, sounded like “Japushida”. I had no idea what he meant, then he made a pushing motion with his hands…I wasn’t supposed to wait for it to open automatically, I was supposed to “just push it in” (I guess?). The body language clued me in!
I was on the phone yesterday and I heard the person on the other end tell me “My female is slowing down my system”…I quickly corrected that in my mind to be “my email is slowing down my system”, but the correction didn’t occur until I heard the word “system”…then the context made it all come together. I do remember a split second thinking “why is he talking about ‘his female’”…I didn’t know what he meant and it seemed so politically incorrect. Context certainly helps!
Voice User Interfaces Everywhere! September 17th, 2008
I was talking with an industry analyst today. He had gotten the BlueAnt V1 Bluetooth headset with Sensory’s BlueGenie technology, and he was very pleasantly surprised by how it was both EASIER to use yet MORE FEATURE RICH all at the same time (OK, I’ll include my favorite reviewer’s quotes below…and by the way, it’s also SAFER!).
Let me sidetrack a bit, though, before talking about the industry analyst call. The BlueAnt V1 is really a great product, and a true innovation for the speech industry. First of all it WORKS. Not only does it really work, but it’s also the smallest speech I/O system to ever ship…and it’s the first “complex” consumer product with a true voice user interface. By “complex” I mean voice is used for more than simple on/off kinds of functions (like a voice lamp). All the other voice-based consumer products that have hit the market use speech as a feature. These are products like toys, cellphones, and remote controls that are designed to be held, looked at, and touched. For example, a cellphone is a multi-modal product…it has a keyboard and a display. It’s designed to be used while looking at it. A headset is totally different. It’s designed for use WITHOUT looking at it and basically without even touching it! A voice user interface is the perfect solution for Bluetooth headsets, and the BlueGenie interface is really bringing Sensory a lot of recognition (bad pun intended!).
Anyways, the analyst said “I now understand how your BlueGenie Voice Interface makes products easier to use. I don’t understand why touch technology is getting so popular instead of Voice User Interfaces”. Well, he hit the nail on the head. It’s very clear that one day voice user interfaces will be everywhere, and will overtake and combine with touch for improved interfaces on products. Voice is easier and more natural and even offers the opportunity for more features without complexity. The BlueAnt V1 doesn’t even need a manual because it’s all contained within the headset!
So why aren’t voice user interfaces everywhere today??? Because speech technologies still need to improve. What Sensory has found though is that for constrained task environments like a Bluetooth headset or repetitive but complex tasks like setting time or adjusting controls on a microwave oven, a voice user interface can very much be the magic solution of today!
Don’t believe me? Go buy a BlueAnt V1 and experience for yourself the magic of a voice user interface - The BlueGenie Voice Interface. (If you have noticed that I really like that BlueAnt product, then you are absolutely right…It’s the best and most important product Sensory has made in its 15-year history!).
Nuance Done Acquiring? September 8th, 2008
I just read an interesting analysis by Ketul Kirtikumar on seekingalpha.com.
Ketul claims, “The acquisition machine which fueled growth at Nuance might be slowing down due to the high debt that Nuance Communications (NUAN) has accumulated in the last two years.” He states their organic growth has been slowing while execs and insiders have been unloading shares.
I have mixed feelings about Nuance ceasing its aggressive acquisition strategy. Nuance’s acquisitions have created a wonderful consolidation in the embedded space, virtually removing all of Sensory’s competitive threats and allowing Sensory to be the only remaining major player with any substantive size in the embedded speech market. A few years back it was ART, Voice Signal and Sensory. Now ART and Voice Signal have been merged/acquired into Nuance. Hey, I like that!
Nuance has a habit of suing companies before acquiring them. This is the reason I’d be glad if they stopped acquiring. Patent infringement lawsuits are such nasty things. Sensory has had to build an arsenal of patents primarily as a defensive measure (even though Nuance is our friend and customer). Suing a company to acquire them seems kind of like spitting on girls to try and get a date.
The latest lawsuits I read about Nuance were with Zi Corp and with Vlingo. Zi is an intelligent text company that competes with Nuance’s Tegic (another acquisition). Zi and Tegic already battled it out years ago on patents, and after a long bitter feud they supposedly had it all settled. Guess not. Vlingo uses IBM’s speech technology, and it appears the lawsuit could be Nuance’s awkward way of courting, or possibly just revenge because a former Nuance CTO left to start Vlingo. Who knows??? I just like battles in the marketplace a lot more than in the courtroom.
So, the really interesting thing about Ketul’s article was the revenue numbers he showed for Nuance’s embedded handset business. It was close to $200M for 2008. Huh??? My wildest guesses from a few years back would have been Tegic was doing $40M, Voice Signal $20M, ART and other Nuance stuff might have totaled another $20M. So how did putting it together grow it from $80M a few years ago to just under $200M in 2008? I don’t think there’s been that much growth in the embedded market. Per-unit royalty rates have probably dropped with adoption rates. $193M is like Nuance getting $0.15 or $0.20 on every handset sold everywhere in the world. I don’t think so. Let’s see, 30 or 40 cents on half the handsets sold? Nope.
Interestingly, Ketul says “Nuance’s embedded solutions are used for voice command in embedded devices and are clearly market leaders in the segment. However, voice command embedded solutions haven’t moved beyond the visionary phase of the technology adoption cycle and show no signs of crossing the chasm at its current rate of usage.” He shows a graph of the technology adoption lifecycle that implies Nuance has about a 10% penetration (sounds OK to me for speech, but maybe a little low for adaptive text). So if 2008 has a 1.2B unit market for handsets, and Nuance has penetrated 120M units (10%), then that would imply they are making roughly $1.60/unit. I don’t know any handset guys that would pay even close to that for intelligent text and voice dialing. Go figure!!!
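If you want to check my back-of-the-envelope math, here it is as a few lines of Python. The inputs are just the rough numbers from this post ($193M in revenue, a 1.2B unit market, and the 10% penetration implied by the graph), so treat the outputs as ballpark figures and nothing more. The function name is my own.

```python
def revenue_per_unit(revenue_dollars, market_units, share_of_market):
    """Revenue per shipped unit if the revenue comes from a given share of the market."""
    return revenue_dollars / (market_units * share_of_market)

REVENUE_2008 = 193e6   # embedded handset revenue quoted from the article
MARKET_UNITS = 1.2e9   # assumed 2008 unit market

every_unit = revenue_per_unit(REVENUE_2008, MARKET_UNITS, 1.0)    # about $0.16
half_market = revenue_per_unit(REVENUE_2008, MARKET_UNITS, 0.5)   # about $0.32
ten_percent = revenue_per_unit(REVENUE_2008, MARKET_UNITS, 0.10)  # about $1.61

for label, value in [("every unit", every_unit),
                     ("half the market", half_market),
                     ("10% penetration", ten_percent)]:
    print(f"{label}: ${value:.2f} per unit")
```

Whichever share you pick, the per-unit number comes out higher than anything I would expect a handset maker to pay.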
So, something is strange in analyst land, but I hope Ketul is right that the lawsuit-to-acquisition spree is coming to an end!