In the mid-nineties, it was estimated that English made up perhaps 80% of all the content available on the internet. Other languages are now much better represented online but it’s thought that English still constitutes around a third of all web content.
Chinese languages, Arabic, Spanish and Portuguese are well-represented online these days. But not all world languages have a significant presence on the internet, and if you don’t have command of one of the dominant web languages it can really restrict what content is available to you online.
In many parts of the world, if you perform an English language search for a local business such as a hairdresser or restaurant you’ll find many more results compared to a search in a minority language such as Polish or Lahnda.
If you speak Arabic or Estonian, you’ll find you have access to fewer Wikipedia articles compared to a Finnish or Swedish speaker. The scope and depth of content in minority languages just can’t compare to the dominant language content online. As the Guardian newspaper recently put it, “the internet is only as big as your language.”
Not only does the language with which you access the internet affect how much you see, it also affects what you see online. Wikipedia is a good yardstick for internet activity because it’s a resource of over 40 million articles, in 288 separate languages, crowdsourced by internet users.
Different language groups tend to produce more content around certain topics, often based on particular cultural interests, obsessions and hobbyhorses. Japanese Wikipedia has a huge amount of content about pop culture, while French Wikipedia has far fewer pages on the topic. Even the same topic page can say very different things in different languages.
In some ways that makes perfect sense. It would be odd and unnecessary for French Wikipedia to cover minutiae of Japanese pop culture that French users don’t search for. But 74% of concepts are only covered by one language on the website.
If you’re interested in any of these concepts, then you need to speak the relevant language. A lot of information is only available in a small number of languages, and search engines tend to only show us the available content in our own language.
The internet may be a key information resource for humanity, but we’re not great at sharing the information we gather with different language groups.
Inequality of Information
One troubling aspect of the information inequality online is that information and its growth can be very one-sided. Take descriptions of particular regions of the world, for example. Wikipedia articles describing the different geographies tend to be focused in a small number of languages.
That’s particularly true of the world’s entire southern hemisphere, which is described online almost exclusively by dominant languages. For example, European languages account for most of the world’s content about many African countries.
In fact, it’s remarkable how closely the way the world is described online mirrors former patterns of colonial dominance. As researchers at the Oxford Internet Institute put it, “Rich countries largely get to define themselves and poor countries largely get defined by others.” This pattern is troubling and deeply unfair.
The Language of Search
Search is the starting point of most web journeys, and the language we use for our web searches seems to be particularly important to what content we can access. Google, the world’s most popular search engine, is only available in 130 of the world’s 6000 or so languages. It recognizes only one African language, yet 30 European ones. So one of the internet’s most popular tools isn’t even available to you unless you speak a majority language.
One piece of research in the West Bank found that searches conducted in English delivered four or five times more results than the equivalent search in Arabic, even though Arabic is a key local language.
It’s not really fair to blame the search engine for this: Google just sets the rules and content creators provide the content.
The paucity of Wikipedia pages in Arabic has been attributed to wider illiteracy and lack of internet access in Arabic-speaking regions of the world compared to other language groups. The content needs to be created by the language speakers in order for it to be indexed.
Questions of language dominance on the internet aren’t just important because they dictate how many restaurants you can find in local search; they also bear on language survival and the cultural autonomy of minority language groups.
As we rely more and more on the internet as a key source of information and key route of access to vital services, the importance of speaking a web language increases. This validates the languages used on the internet and diminishes the power of languages that aren’t well-represented online.
In 2003 linguist David Crystal argued that the use of English around the world had reached a critical mass, and said he believed that it would be a world language in perpetuity. Not everyone agrees with that perspective, but it’s likely that English and a small number of ‘power’ languages such as Mandarin may continue to dominate in the near future.