Friday, February 12, 2010

Effectiveness of Automatic Speech Recognition Software (ASR)

Most of us talk to our computers, if only to curse them when a glitch destroys hours of work. Sadly, the computer doesn't usually listen, but new kinds of software are being developed that make conversing with a computer rather more productive.

The longest established of these is automatic speech recognition (ASR), the technology that converts the spoken word to text. More recently it has been joined by subtler techniques that go beyond what you say, and analyse how you say it. Between them they could help us communicate more effectively in situations where face-to-face conversation is not possible.

ASR has come a long way since 1964, when visitors to the World's Fair in New York were wowed by a device called the IBM Shoebox, which performed simple arithmetic calculations in response to voice commands. Yet people's perceptions of the usefulness of ASR have, if anything, diminished.

"State-of-the-art ASR has an error rate of 30 to 35 per cent," says Simon Tucker at the University of Sheffield, UK, "and that's just very annoying." Its shortcomings are highlighted by the plethora of web pages poking fun at some of the mistakes made by Google Voice, which turns voicemail messages into text.

What's more, even when ASR gets it right, the results can be unsatisfactory, because a straight transcription of what someone says often makes for awkward reading. People's speech can be peppered with repetition, or with sentences that just tail off.

"Even if you had perfect transcription of the words, it's often the case that you still couldn't tell what was going on," says Alex Pentland, who directs the Human Dynamics Lab at the Massachusetts Institute of Technology. "People's language use is very indirect and idiomatic," he points out.

Despite these limitations, ASR has its uses, says Tucker. With colleagues at Sheffield and Steve Whittaker at IBM Research in Almaden, California, he has developed a system called Catchup, designed to summarise in almost real time what has been said at a business meeting so that latecomers can... well, catch up with what they missed. Catchup identifies the important words and phrases in an ASR transcript and edits out the unimportant ones.

It does so by using the frequency with which a word appears as an indicator of its importance, having first ruled out a "stop list" of very common words. It leaves the text surrounding the important words in place to put them in context, and removes the rest.
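The idea described above can be sketched in a few lines of Python. This is only an illustration of frequency-based filtering with a stop list, not Catchup's actual code: the function names, the tiny stop list, and the `top_n` and `context` parameters are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
    "we", "so", "um", "er", "for", "on", "will", "be", "i", "think",
}


def important_words(tokens, top_n=3):
    """Return the top_n most frequent tokens that are not on the stop list."""
    counts = Counter(t for t in tokens if t and t not in STOP_WORDS)
    return {word for word, _ in counts.most_common(top_n)}


def compress(transcript, top_n=3, context=1):
    """Keep each important word plus `context` words either side; drop the rest."""
    words = transcript.split()
    # Normalise each word for counting: strip punctuation and lowercase it.
    tokens = [re.sub(r"[^\w']", "", w).lower() for w in words]
    keywords = important_words(tokens, top_n)

    keep = [False] * len(words)
    for i, token in enumerate(tokens):
        if token in keywords:
            start = max(0, i - context)
            end = min(len(words), i + context + 1)
            for j in range(start, end):
                keep[j] = True

    return " ".join(w for w, kept in zip(words, keep) if kept)


if __name__ == "__main__":
    transcript = (
        "um so the budget is the budget and we think "
        "the deadline for the budget will slip"
    )
    print(compress(transcript, top_n=1, context=1))
```

Raising `context` keeps more of the surrounding text, which trades brevity for readability. A real system would also have to cope with recognition errors in the transcript, which this sketch ignores.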

A key feature of Catchup is that it then presents the result in audio form, so the latecomer hears a spoken summary rather than having to plough through a transcript. "It provides a much better user experience," says Tucker.

