How Artificial Intelligence Is Changing Voice Transcription

Transcription, an inseparable part of the language industry, takes years of experience to perfect. It takes a lot of practice to adequately capture the subtleties of different accents and the other idiosyncrasies of individual speakers. Not surprisingly, artificial intelligence is making a very clear impact on transcription services. Let's take a look at the part technology plays in this area of language services.

Transcription, just like translation, has a very long history. And just like translators, transcribers get tired and make mistakes, which is a very human thing to do.

Transcription, just like translation, requires a specific skillset.

To ease some of the difficulties that transcription entails, it is becoming increasingly common to use artificial intelligence to assist in the process.

Microsoft makes a statement

A couple of years ago, Microsoft published the results of a study comparing the error rates of professional human transcribers with those of an AI system created by Microsoft Research. Although the exact details of the arrangement are not publicly known, Microsoft hired a third-party service to transcribe a piece of audio.

The fact that there can be multiple speakers and that the audio might not be perfectly clear adds complexity to the task of transcribing, for machines as well as for humans.

The third party's process was to have one professionally trained transcriber type up the audio, and a second person then listen to the recording again and correct any errors in the first transcript.

This two-person process resulted in error rates of 5.9% and 11.5%. Microsoft's AI system, trained on 2,000 hours of human speech, scored marginally better: 5.9% and 11.3%.
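For context, figures like these are word error rates (WER), the standard accuracy metric in speech recognition: the number of word substitutions, deletions and insertions divided by the number of words in the reference transcript. The sketch below is a minimal Python illustration of the idea; the function name and example sentences are ours, not taken from Microsoft's study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j                  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of ten gives a 10% error rate.
print(word_error_rate("the quick brown fox jumps over the lazy dog today",
                      "the quick brown fox jumped over the lazy dog today"))  # 0.1
```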

These results pushed Microsoft to make a bold claim: the company publicly announced that its speech transcription AI performs better than professional human transcribers.

Although this is far too general a claim (the results of the study only suggest that the AI system is better under certain conditions), Microsoft has since rolled out the technology and integrated it into Cortana and its Xbox consoles.

Synergy rather than rivalry

But the point is not really about proving who does the job better, humans or AI. On the one hand, human transcribers are far more adept at correctly interpreting an utterance as a sign of hesitation or affirmation. Machines, on the other hand, struggle to grasp the emotion behind words and to handle sounds that are not words at all, such as the noises a speaker makes while thinking.

AI faces further problems when the audio is of poor quality, when several people talk at the same time, or when the dialogue is highly intricate.

However, the strength of AI technology is that it doesn't get tired the way a human does, and with tiredness come mistakes. Even if the accuracy of an AI system supporting the transcription process does not come close to that of a human, the machine can still take some of the hard work away from the transcriber, who can then focus on correcting the occasional mistake and perfecting the overall accuracy of the final transcript.

Catchy idea

Microsoft is not the only tech giant that has invested in exploring AI in the context of voice transcription. Amazon Transcribe, available through Amazon Web Services as an API, covers a number of use cases. For instance, it can transcribe customer service phone calls and generate subtitles for audio and video content.
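To give a flavour of how the API is used, the sketch below shows how a transcription job might be started with the AWS SDK for Python (boto3). The job name, bucket, file name and region are placeholders of our own, and you would need your own AWS credentials and S3 bucket; treat it as a rough outline rather than production code.

```python
import boto3

# Assumes AWS credentials are configured and the audio file has already
# been uploaded to an S3 bucket you own (names below are placeholders).
transcribe = boto3.client("transcribe", region_name="eu-west-1")

transcribe.start_transcription_job(
    TranscriptionJobName="customer-call-demo",          # hypothetical job name
    Media={"MediaFileUri": "s3://my-bucket/call.mp3"},   # hypothetical S3 location
    MediaFormat="mp3",
    LanguageCode="en-US",
)

# Jobs run asynchronously; poll until the transcript is ready.
status = transcribe.get_transcription_job(TranscriptionJobName="customer-call-demo")
print(status["TranscriptionJob"]["TranscriptionJobStatus"])
```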

The service supports all common audio formats, including WAV and MP3. It also adds a timestamp to every word, making it much easier to locate a passage in the original audio by searching the text and to verify the accuracy of the transcription.
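To show what those per-word timestamps make possible, here is a small sketch that searches a finished job's JSON output for a given word. The field names (results, items, start_time, end_time, alternatives) follow Amazon Transcribe's documented output format, but it is worth checking them against the current documentation; the helper function itself is ours.

```python
import json
import urllib.request

def find_word(transcript_uri: str, word: str):
    """Return (start, end) timestamps, in seconds, for each occurrence of a word."""
    with urllib.request.urlopen(transcript_uri) as response:
        data = json.load(response)
    hits = []
    for item in data["results"]["items"]:
        # Punctuation items carry no timestamps, so only look at spoken words.
        if item["type"] == "pronunciation":
            text = item["alternatives"][0]["content"]
            if text.lower() == word.lower():
                hits.append((float(item["start_time"]), float(item["end_time"])))
    return hits

# transcript_uri comes from get_transcription_job(...)["TranscriptionJob"]
# ["Transcript"]["TranscriptFileUri"] once the job has completed.
```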

What makes it even more attractive is that the solution can also recognise multiple speakers, something AI has traditionally struggled with.
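Speaker recognition is an opt-in setting: the job request from the earlier sketch can be extended with speaker labels, roughly as shown below. The two-speaker limit is just an example, and the Settings fields should be verified against Amazon's documentation.

```python
import boto3

transcribe = boto3.client("transcribe", region_name="eu-west-1")

transcribe.start_transcription_job(
    TranscriptionJobName="customer-call-demo-speakers",  # hypothetical job name
    Media={"MediaFileUri": "s3://my-bucket/call.mp3"},    # hypothetical S3 location
    MediaFormat="mp3",
    LanguageCode="en-US",
    Settings={
        "ShowSpeakerLabels": True,   # enable speaker diarisation
        "MaxSpeakerLabels": 2,       # e.g. an agent and a customer on a support call
    },
)
# The output JSON then includes a speaker_labels section that maps
# time segments to labels such as "spk_0" and "spk_1".
```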

Another example of how AI helps with voice-to-text is Otter, a free-to-use app that relies on algorithms similar to those behind Siri and Alexa. Not only does the app have an intuitive interface, but it also learns the nuances of your voice over time, which gradually improves the quality of its transcriptions.

Mobile phone applications make voice transcription more easily accessible for everyday users.

The app has its downsides too, as it tends to introduce spelling mistakes into the transcript, but thanks to its user-friendliness it is ideal for individual, non-commercial use.

As these technologies improve, we will find them becoming more and more commonplace. This could be good news for professional transcribers, as in the near future they might be able to considerably improve their daily throughput without having to work any harder.
