06 Aug 2014

Grammatical Habits of Non-native Speakers Offer Linguistic Clues

Machine translation tools are extremely useful in the world of business. They help professionals from different countries communicate with each other, allowing them to discuss terms and thrash out deals relatively quickly.

The rising popularity of machine translation, particularly among companies – both large and small – that use it to gain access to emerging foreign markets, means more and more is being done to improve it.

Companies are forking out large amounts of money to help plug the holes and bridge the gaps that exist in today’s machine translation technology.

Take technology giant Google, for example, which has invested millions of dollars, perhaps even billions, over the last few years in a bid to perfect its machine translation service – Google Translate.

But despite this global effort and investment to improve machine translation, the key to its future could lie in the grammatical habits of non-native speakers writing essays or various other texts in English.

By looking at the written mistakes of people who do not speak English as their first language, we get a glimpse into the relationship between different languages. This may ultimately help with the development of machine translation into a much more reliable tool.

Machine translation in a nutshell

Machine translation is automated translation. In other words, computer software is used to translate text from one language to another.

There are two types: Rule-Based Machine Translation and Statistical Machine Translation.

The former relies on a large set of built-in linguistic rules and extensive bilingual dictionaries for each language pair, while the latter uses statistical translation models whose parameters are derived from the analysis of monolingual and bilingual text.
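The difference between the two approaches can be made concrete with a toy sketch. Everything in it is an assumption for illustration: the three-word English-Spanish dictionary, the single reordering rule, and the phrase-table probabilities are invented, and real systems are vastly larger and more sophisticated.

```python
# Toy illustration only: the dictionary, the rule and the probabilities
# below are invented for this example.

# Rule-based: a bilingual dictionary plus a hand-written reordering rule.
DICTIONARY = {"the": "la", "red": "roja", "house": "casa"}
ADJECTIVES = {"red"}


def rule_based_translate(sentence):
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Rule: English "adjective noun" becomes "noun adjective" in Spanish.
        if words[i] in ADJECTIVES and i + 1 < len(words):
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(DICTIONARY.get(w, w) for w in reordered)


# Statistical: choose the most probable candidate seen in training data.
PHRASE_TABLE = {
    "the red house": {
        "la casa roja": 0.7,
        "la roja casa": 0.2,
        "el casa rojo": 0.1,
    },
}


def statistical_translate(sentence):
    candidates = PHRASE_TABLE.get(sentence.lower())
    if candidates is None:
        return None  # phrase never seen in the (invented) training data
    return max(candidates, key=candidates.get)
```

In this sketch both functions turn "The red house" into "la casa roja", but the statistical one returns nothing for a phrase it has never seen, while the rule-based one still applies its rule and dictionary.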

Both ultimately do the same thing, yet there are notable differences. Here we list the pros and cons of each.

Rule-Based Machine Translation

Pros

  • Consistent and predictable quality
  • Good out-of-domain translation quality
  • Knows grammatical rules
  • High performance and robustness
  • Consistency between versions

Cons

  • Lack of fluency
  • Hard to handle exceptions to rules
  • High development and customization costs

Statistical Machine Translation

Pros

  • Good fluency
  • Good for catching exceptions to rules
  • Rapid and cost-effective development

Cons

  • Unpredictable translation quality
  • Poor out-of-domain translation quality
  • Doesn’t know grammar
  • High computer and disk space requirements
  • Inconsistency between versions

Given the pros and cons of both, it could be argued that a third, more balanced approach is needed when it comes to machine translation – one that helps users achieve higher quality translations but isn’t too expensive to develop and maintain.

Do essays hold the answer?

Computer scientists at the Massachusetts Institute of Technology (MIT) in the United States recently discovered that grammatical habits in written English reveal linguistic features of the languages of non-native speakers.

These linguistic features could be extremely valuable in the field of electronic translation, perhaps plugging the holes and bridging the gaps where technology companies have tried and failed in the past.

MIT built a system that combed through more than 1,000 English-language essays written by native speakers of 14 different languages, analysing the parts of speech of the words in every sentence of every essay and the relationships between them.

All of the nine languages that are in the Indo-European family were found to be clearly distinct from the five that aren’t, while the Romance languages and the Slavic languages were more similar to each other than they were to the other Indo-European languages.

The findings could be used to predict typological features of a language for which there is no linguistic knowledge.

This includes things like the typical order of subject, object and verb, how negations are formed, plus whether nouns take articles, as well as many other syntactic patterns that linguists use to characterise languages.
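The kind of comparison described in this section can be sketched in a few lines. This is not the MIT system: the hand-tagged sentences, the speaker labels and the choice of part-of-speech pairs with cosine similarity are all assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical, hand-tagged part-of-speech sequences standing in for essays.
# The tags and the speaker labels are invented for illustration.
ESSAYS = {
    "speaker_a": [["PRON", "VERB", "NOUN"], ["PRON", "VERB", "ADJ", "NOUN"]],
    "speaker_b": [["PRON", "VERB", "NOUN"], ["PRON", "VERB", "DET", "NOUN"]],
    "speaker_c": [["NOUN", "PRON", "VERB"], ["NOUN", "ADJ", "PRON", "VERB"]],
}


def bigram_profile(sentences):
    """Relative frequency of each adjacent pair of part-of-speech tags."""
    counts = Counter()
    for tags in sentences:
        counts.update(zip(tags, tags[1:]))
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(p[k] * q.get(k, 0.0) for k in p)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


profiles = {name: bigram_profile(s) for name, s in ESSAYS.items()}
sim_ab = cosine_similarity(profiles["speaker_a"], profiles["speaker_b"])
sim_ac = cosine_similarity(profiles["speaker_a"], profiles["speaker_c"])
```

With this invented data, speakers A and B score as more similar to each other than A and C do, because A and B share more of the same word-order patterns. A real study would need automatic tagging, far more text and a much richer set of features.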

Lost in translation

Machine translation has improved dramatically in recent years, but it’s still far from the finished article.

The meaning of a text can sometimes get lost with either Rule-Based Machine Translation or Statistical Machine Translation.

Word-sense disambiguation, for instance, is a problem. It arises when a word has more than one meaning and the software has to pick the right one. Non-standard or casual speech is another, and both can lead to inaccurate translation and embarrassing errors. Named entities – including people, organizations, companies and places – can also prove extremely difficult.
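To make the word-sense problem concrete, here is a minimal sketch that picks a meaning for the English word "bank" by counting overlapping context words. The two senses and their cue words are invented for this example, and the overlap-counting approach is a deliberately simple stand-in, not how any particular translation product works.

```python
# Invented mini sense inventory for the English word "bank".
SENSES = {
    "financial institution": {"money", "account", "loan", "deposit"},
    "river edge": {"river", "water", "fishing", "shore"},
}


def disambiguate(sentence):
    """Return the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().split())
    # On a tie (including no overlap at all), the first sense listed wins.
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))
```

Called on "She opened an account at the bank", the function picks the financial sense. Called on "We went fishing by the river bank", it picks the river sense. A sentence with no cue words at all falls back to the first sense, which is exactly the kind of blind guess that produces embarrassing errors.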

These are just some of the reasons why electronic translation cannot yet take the place of professional translators, who provide the human touch that is needed to produce accurate translations.

Machine translation is undoubtedly making big strides in its efforts to catch up with translators when it comes to written text, and Google Translate is testament to this. Yet it remains considerably off the pace in terms of spoken language.

Spoken language is too quick and fragmented for machine translation. False starts and unintentional errors make the process particularly hard, while tone of voice, cultural references, idiom and humour add to the challenges.

So, translators can rest assured that machine translation is still a long way from being good enough for brands to trust with their marketing material.



 
 
