Wikipedia has recently integrated Google Translate into its content translation tool to provide a more streamlined way to localise its articles. This change, long requested by the Wikipedia community, has expanded the number of supported languages. However, as machine translation grows in popularity, it has also become apparent that its output can carry signs of gender bias.
In recent years, we have seen a keen interest across the globe in exposing and actively fighting gender bias in all shapes and forms.
As artificial intelligence and machine translation (MT) develop, more and more ethical questions arise alongside them. It turns out that MT is not free from gender bias either.
What we see is that machine translation systems, with Google Translate as a prime example, will translate English words such as “strong” or “doctor” into a masculine form in other languages, while words such as “beautiful” or “nurse” will be rendered in a feminine form.
English is a much more gender-neutral language than, for example, Spanish. That is why the bias shows itself in translations from English into less gender-neutral languages rather than the other way round.
The reason behind this phenomenon is very simple. Machine translation engines simply “believe” what they are told: they were trained on millions of bilingual sentences which featured gender bias in the first place. As a result, the translations they produce merely reflect the bias already present in their training data.
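To see how a skewed corpus leads to skewed output, consider a deliberately simplified toy sketch below. It is not how any real neural MT system works: the invented “parallel corpus” and the pick-the-most-frequent-form rule only illustrate how a model that learns from biased frequencies ends up reproducing them.

```python
from collections import Counter

# Invented toy "parallel corpus": each pair maps an English word to a
# Spanish form observed in training. The skew towards "el doctor" mirrors
# the kind of imbalance found in real-world text.
toy_corpus = [
    ("doctor", "el doctor"), ("doctor", "el doctor"),
    ("doctor", "el doctor"), ("doctor", "la doctora"),
    ("nurse", "la enfermera"), ("nurse", "la enfermera"),
    ("nurse", "el enfermero"),
]

def most_frequent_translation(source_word, corpus):
    """Pick the target form seen most often for this source word."""
    counts = Counter(tgt for src, tgt in corpus if src == source_word)
    return counts.most_common(1)[0][0]

print(most_frequent_translation("doctor", toy_corpus))  # -> el doctor
print(most_frequent_translation("nurse", toy_corpus))   # -> la enfermera
```

Even though a feminine form of “doctor” appears in the data, the majority form wins every time, which is the essence of the problem described above.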
We live in a biased world, so the training data will also reflect that.
Neural networks’ strength lies in their ability to automatically uncover and learn linguistic patterns and associations. Unfortunately, this strength is also a weakness: the system faithfully reproduces the biased patterns it has been taught.
To tackle this issue, Google Translate has recently started showing two translations instead of just one. It now provides both feminine and masculine forms whenever possible to reduce the impact of the bias learned by its neural MT models.
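The user-facing behaviour can be sketched as follows. Note that this is purely illustrative: the hand-written lookup table stands in for Google’s actual system, which detects gender-ambiguous source text and decodes both variants with its neural model.

```python
# Hypothetical lookup of gendered Spanish forms for illustration only;
# a real system would generate these with a neural model, not a dict.
GENDERED_FORMS = {
    "doctor": {"feminine": "la doctora", "masculine": "el doctor"},
    "nurse": {"feminine": "la enfermera", "masculine": "el enfermero"},
}

def translate_both(word):
    """Return both gendered translations when the source word is ambiguous."""
    forms = GENDERED_FORMS.get(word)
    if forms is None:
        return None  # unknown or unambiguous: one translation would suffice
    return (forms["feminine"], forms["masculine"])

print(translate_both("doctor"))  # -> ('la doctora', 'el doctor')
```

Presenting both forms side by side leaves the gender choice to the reader instead of silently committing to the statistically dominant one.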
Google is not the only institution that is actively trying to fight the gender bias in machine translation. Researchers from MIT and the Qatar Computing Research Institute have created an application that allows users to trace and visualise which parts of the neural machine translation network, also called “neurons”, are responsible for creating specific parts of the translation. What is more, it allows users to see what importance is assigned to those individual neurons.
With that knowledge, it is possible to trace which neurons in the network are responsible for gender. Once those are identified, it might be possible to manipulate them so that the system translates a word in the opposite gender, thereby promoting gender equality.
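The general idea of locating a gender-associated neuron can be sketched with synthetic data. The activation values and labels below are invented, and the crude mean-difference test is only a stand-in for the analysis the MIT/QCRI tool performs on real NMT networks.

```python
# Synthetic activations: word -> (activation vector, gender label),
# where label 1 = masculine, 0 = feminine. All numbers are made up.
activations = {
    "doctor":    ((0.2, 0.9, 0.1), 1),
    "strong":    ((0.3, 0.8, 0.2), 1),
    "nurse":     ((0.2, 0.1, 0.2), 0),
    "beautiful": ((0.3, 0.2, 0.1), 0),
}

def gender_neuron(data):
    """Return the index of the neuron whose mean activation differs most
    between masculine- and feminine-labelled words."""
    masc = [vec for vec, label in data.values() if label == 1]
    fem = [vec for vec, label in data.values() if label == 0]
    n_dims = len(next(iter(data.values()))[0])
    gaps = [
        abs(sum(v[i] for v in masc) / len(masc)
            - sum(v[i] for v in fem) / len(fem))
        for i in range(n_dims)
    ]
    return max(range(n_dims), key=gaps.__getitem__)

print(gender_neuron(activations))  # -> 1 (the second neuron encodes gender)
```

Once such a dimension is identified, one could in principle dampen or flip it at translation time, which is the kind of intervention the researchers suggest.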
Some research suggests that de-biasing machine translation models does not affect their performance, but it is still not entirely clear whether reducing gender bias comes at a cost to translation quality.
It is important to realise that biases can manifest themselves on many levels, not only on the basis of gender. They can relate to anything from age and race to membership of minority groups. That makes it even harder to find solutions to de-bias machine translation systems.
While it is important to reduce and, wherever possible, eliminate gender bias, it is clear that in the context of machine translation there are no straightforward answers. Each language expresses gender differently, in more or less obvious ways. A one-size-fits-all solution has certainly not yet emerged.