NEC develops comprehensive speech recognition technology

October 06, 2011

By SHU NOMURA / Staff Writer

NEC Corp. has developed a new technology that improves the accuracy of speech recognition software, which converts voice commands into text. The company hopes it will be used primarily in schedule management, reception work at hospitals, and similar tasks that require the input of large quantities of information.

The technology will also be marketed for users of personal computers. NEC plans to commercialize the new technology by the end of next year.

Speech recognition software matches spoken input against a dictionary in which a large number of words are registered, and converts the words that match into text.

The new technology first classifies each word as a "proper noun," "time," "place," etc., by referring to the context and to the adjacent words. It then selects the most suitable dictionary, such as a "dictionary focused on proper nouns" or a "dictionary focused on time," from among a set of available choices, and compares the voice input against it.

For example, when a user says, "I want to watch a TV program tomorrow night where AKB48 appears," the software estimates that "tomorrow" is a word indicating time and that "AKB48" is a proper noun, and converts each word using a dictionary focused on the respective category. The recognized words are stored as text, and NEC also envisages further applications in which the text is automatically fed into an Internet search engine and used to call up websites offering TV program information and other related content.
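The classify-then-look-up flow described above can be illustrated with a rough Python sketch. Everything in it is an assumption made for illustration: the dictionaries, the function names `classify`, `lookup` and `convert`, and the use of string similarity from the standard `difflib` module as a stand-in for acoustic matching. The toy classifier looks only at the word itself, whereas NEC says its technology refers to the context and adjacent words. NEC has not published its actual method.

```python
import difflib

# Hypothetical category dictionaries, for illustration only.
DICTIONARIES = {
    "time": ["today", "tomorrow", "tonight", "night"],
    "proper_noun": ["AKB48", "NEC", "Google"],
    "general": ["I", "want", "to", "watch", "a", "TV",
                "program", "where", "appears"],
}


def classify(token):
    """Toy classifier: guess a category from the token's surface form."""
    # Assumption: a token containing a digit is treated as a proper noun.
    if any(ch.isdigit() for ch in token):
        return "proper_noun"
    # Assumption: a token very close to a known time word is a time word.
    if difflib.get_close_matches(token.lower(), DICTIONARIES["time"],
                                 n=1, cutoff=0.8):
        return "time"
    return "general"


def lookup(token, category):
    """Match a token against only the dictionary chosen for its category."""
    by_lower = {entry.lower(): entry for entry in DICTIONARIES[category]}
    matches = difflib.get_close_matches(token.lower(), list(by_lower),
                                        n=1, cutoff=0.6)
    # Keep the raw token when nothing in the dictionary is close enough.
    return by_lower[matches[0]] if matches else token


def convert(tokens):
    """Classify each token, then convert it with the matching dictionary."""
    return [lookup(token, classify(token)) for token in tokens]


# Misspellings stand in for imperfectly recognized speech.
noisy = "i want to watch a tv program tomorow night where akb48 apears".split()
print(" ".join(convert(noisy)))
```

Because each word is compared against a small, category-specific list rather than one large vocabulary, there are fewer similar-sounding candidates to confuse it with, which is the intuition behind the approach.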

Existing technologies convert voice to text by comparing entire sentences against a single dictionary with a large number of word entries. That method has a misrecognition rate of about 15 percent, which the new system will reduce to about 7 percent, company officials said. NEC has filed patent applications for the new technology, which it calls the first of its kind.
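To put those figures in perspective, the short calculation below works out what the drop means in relative terms. It assumes the quoted rates are per-word error rates, which the company did not specify, and the 1,000-word sample size and the function name `relative_reduction` are chosen only for illustration.

```python
def relative_reduction(old_rate, new_rate):
    """Fraction of the old errors that the new system would eliminate."""
    return (old_rate - new_rate) / old_rate


old_rate, new_rate = 0.15, 0.07  # approximate rates quoted by NEC officials
words = 1000                     # hypothetical sample size

print("errors before:", round(old_rate * words))
print("errors after:", round(new_rate * words))
print("relative reduction (%):",
      round(relative_reduction(old_rate, new_rate) * 100))
```

Under that assumption, the number of misrecognized words would fall by slightly more than half.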

Speech recognition technologies are becoming more commonly used. For example, Google Inc.'s Voice Search feature for smartphones displays a list of appropriate bars and restaurants if you say, "Drinking places that feature Spanish dishes."

The Asahi Shimbun