Culture | October 26th, 2016
By Chuck Solly
rcsolly@gmail.com
Speech recognition software has made great strides in the last five years. You may have seen your doctor speaking into his digital voice recorder after he has finished examining you. My doctor uses a human transcriber; that is, someone somewhere is typing his electronic notes into a computer. My hospital, however, uses a machine transcriber and speech recognition to enter the text automatically into the computer.
So which system is faster? More to the point, which system is more accurate? If you are using Windows 7 or 10, you can try machine transcription yourself. There is a speech recognition program built into the operating system. Try it out for your class notes or email to friends. If you do try the Windows system, remember to speak slowly and clearly, and be prepared to make a lot of corrections to the text until you get the hang of it.
The terms "speech recognition" and "voice recognition" are sometimes used interchangeably. However, the two terms mean different things. Speech recognition is used to identify words in spoken language. Voice recognition is a biometric technology used to identify a particular individual's voice. On my Windows 7 computer, you will be asked to read several paragraphs to train the software to respond to your particular voice. This training adapts speech recognition to your voice; it is not voice recognition in the biometric sense, because its purpose is to understand your words, not to verify who you are.
Microsoft announced that its speech recognition technology has achieved a Word Error Rate (WER) of only 5.9%, which the company said was similar to what human transcribers are able to achieve. Yeah, maybe, but it isn't all that simple. The deletion rate is significantly smaller for the Microsoft system compared to humans; for substitution, the situation reverses. "Substitution" in this case refers to words being replaced with other words when the recording is being transcribed. "Deletion" refers to words that were spoken but left out of the transcript.
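For readers who want to see how a word error rate is figured, here is a minimal sketch in Python. It assumes the commonly used definition: substitutions plus deletions plus insertions (extra words the transcriber added), divided by the number of words actually spoken. The function name `wer_breakdown` and the sample sentences are made up for illustration; this is not Microsoft's scoring tool, and real scoring also normalizes punctuation and spelling in ways this sketch does not.

```python
def wer_breakdown(reference: str, hypothesis: str) -> dict:
    """Align a transcript (hypothesis) against what was actually said
    (reference) and count substitutions, deletions and insertions.
    Hypothetical helper for illustration only."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    rows, cols = len(ref) + 1, len(hyp) + 1
    # table[i][j] = (total edits, substitutions, deletions, insertions)
    # for the cheapest alignment of ref[:i] with hyp[:j]
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # every spoken word left out
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # every transcript word is extra

    for i in range(1, rows):
        for j in range(1, cols):
            if ref[i - 1] == hyp[j - 1]:
                table[i][j] = table[i - 1][j - 1]  # words match, no cost
                continue
            e, s, d, n = table[i - 1][j - 1]
            substitution = (e + 1, s + 1, d, n)
            e, s, d, n = table[i - 1][j]
            deletion = (e + 1, s, d + 1, n)
            e, s, d, n = table[i][j - 1]
            insertion = (e + 1, s, d, n + 1)
            # tuples compare by total edits first, so min() picks the cheapest
            table[i][j] = min(substitution, deletion, insertion)

    edits, subs, dels, ins = table[-1][-1]
    return {
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "wer": edits / len(ref),
    }


result = wer_breakdown("the cat sat on the mat", "the cat sit on mat")
print(result)
```

In this made-up example the transcript swaps "sat" for "sit" (one substitution) and drops the second "the" (one deletion), so two errors out of six spoken words gives a WER of about 33%. Two transcripts can share that same percentage while making entirely different mistakes.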
Even if the number of word errors that machines make is on par with humans, machines can still make significantly different kinds of errors. Therefore, sentences transcribed by a machine could be much more confusing to readers than they would be if a human had transcribed them, even if the error rate is the same.
Microsoft’s paper also noted that the ASR (Automatic Speech Recognition) system confused “backchannel” non-words such as “uh-huh,” which acknowledges what the other speaker is saying, with hesitations such as “uh,” which marks a pause before the speaker continues. Humans don’t make these mistakes because they know intuitively what these sounds represent.
Machine learning-based speech recognition may not yet be quite as good as humans in the real world, but just the fact that word error rates are now similar means that speech recognition software is getting close to achieving true human parity, or even surpassing humans in speech recognition.
The Microsoft services that take advantage of speech recognition, such as Cortana, will hopefully be easier to use and less frustrating in the future. Google’s recently announced near-human accuracy in machine translation, synthetic speech generation that sounds almost as good as a human, and better-than-human image recognition all suggest that maybe we won’t have to type faster.
I still like to type and I still feel better about expressing myself at the keyboard. How fast can you type?