Machine Translation and Security

Public machine translation platforms such as Google Translate and Microsoft Bing have been in extensive use for many years. They cover a wide range of language combinations, come with an intuitive user interface and are almost free of charge. At first glance, this offering seems almost too good to be true. In reality, however, the quality of the translations these systems produce is only one of the concerns. Using them also has security implications that cannot be ignored.

The “free” aspect of public machine translation platforms may seem very alluring at face value. Millions of users all over the world rely on them, accepting lower translation quality and occasionally having some fun laughing at results that are ridiculously wrong. Many of them do not realize that using those solutions is in fact not free at all.

What “free” really means

In reality, users pay a high price for their machine translation by unwittingly sharing their personal data with third parties, which can then use it for their own ends.

There is no way for Internet users to know who has access to the information they have given away by uploading their files to a free machine translation platform, what exactly happens to it, where it is stored, and how it is used by the parties that have gained access to it.

In an era when technology progresses at a very fast pace, it is all too easy to harvest not only the uploaded text but also the user's IP address and location, as well as the timestamp, which makes it straightforward to link that data back to a specific company.

Security risks

The above might not be so frightening if you want to machine translate a post you found on Facebook about your favorite pop star, or a piece of current affairs news you found on a news website. That content is already publicly available, and in the worst-case scenario the data from your request could be used to work out which type of news you are interested in.

In the corporate environment, however, a lot of the content that employees handle daily (contracts, HR information, financial reports, purchase records, etc.) is confidential. Releasing that content into public machine translation systems may cause serious data privacy breaches, as the uploaded data may become public and searchable by other Internet users.

This risk is particularly acute for legal firms, which are known to work under top security measures.

Legal firms collaborate with their clients and suppliers under full confidentiality, and any breach of data privacy could cost them not only hefty fines but also their reputation.

Yet, due to a lack of awareness, employees of those businesses, under pressure to get documents translated in a very short time frame, often resort to using the likes of Google Translate, thus putting sensitive information at risk.

The solution

The way to avoid data privacy risks is not simply to stop using machine translation. With so much content needing translation in the modern world, it seems unavoidable that machine translation will remain part of the picture, particularly because being unable to obtain instant translations can delay business decisions and often results in missed opportunities.

The answer instead is to invest in secure, customized machine translation systems. Those can be deployed on each company’s own servers, can be encrypted if necessary and are reachable via a secure data connection, which is not something that public MT systems can guarantee.

They stop any data from leaving the secure environment and therefore prevent it from being harvested and used by any third parties.
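As a rough illustration of keeping text inside a secure environment, the sketch below checks a translation endpoint before anything is sent to it. It only accepts HTTPS URLs whose host is on an internal allowlist. The host name `mt.internal.example` and both function names are hypothetical and chosen for illustration; they do not refer to any particular vendor's product or API.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of in-house MT hosts (an assumption for this sketch).
ALLOWED_MT_HOSTS = {"mt.internal.example"}


def is_safe_mt_endpoint(url, allowed_hosts=ALLOWED_MT_HOSTS):
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed_hosts


def check_before_sending(url):
    """Raise instead of letting text go to an unapproved endpoint."""
    if not is_safe_mt_endpoint(url):
        raise ValueError(f"Refusing to send text to unapproved endpoint: {url}")
```

A check like this could sit in front of whatever client an organization uses to call its own MT server, so that a request to a public service, or to the internal host over plain HTTP, fails before any document text leaves the machine.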

A number of providers offer this service, and although their solutions entail installation, setup and usage costs, in the long run they bring a return on investment in terms of security, safeguarding reputation, and also the quality of the translations.

Custom machine translation systems can be tailored to the required field, which can be anything from technical manuals to celebrity gossip, as well as to style, tone of voice, terminology and even output length. They are known to return significantly better translation results than “free” generic MT solutions that are not adapted to any specific subject domain or style.

Although it is not realistic to expect that all businesses in the world will be open to deploying private, customized MT systems on their servers, promoting awareness of the risks stemming from using public MT solutions is the first step in the right direction.

Written by Kasia Kosmaczewska
Kasia Kosmaczewska is Machine Translation Programme Manager at TranslateMedia. She has extensive linguistic experience and a keen interest in machine learning. She spends her free time reading about socio-politics, practicing pilates and travelling.