For more than a decade, computer scientists and linguists have attempted to get computers to learn human languages by programming semantics using software. Semantics is the study of meaning, i.e. the relation between words and phrases.
Scientists used to hard-code human logic or decipher dictionaries to teach computers languages, but Erk tried a different approach: she feeds computers a broad body of texts which act as a reflection of human knowledge and create a map of relationships by using the implicit connections between the words.
This technique requires a great deal of words and texts in order to create a model which can correctly recreate the intuitive capability of distinguishing word meanings.
Erk said that “the lower end for this kind of research is a text collection of 100 million words.”
Initially, she conducted her research on desktop computers but later began using parallel computing systems. With the help of Hadoop-optimised subsystems, the researchers could expand the scope of their testing.
Hadoop is a software library framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Erk explains that humans either think of words being far apart from each other (for instance criminal charges and battery charge) or close together (for example criminal charges and accusations). Hence, people visualise different meanings of words as points in space.
The meaning of a word in a specific contextual relationship is a point in this space. There is no need for humans to mention how many meanings a word has, but instead they choose a word which is close to the usage in another sentence, but far away from the third use, Erk says.
As mentioned in the previous article Will the Semantic Web mean better Machine Translation?, computers have so far only recognised how you say things (also called syntax), but not the meanings of the words (semantics). When we use a search engine, thousands of documents are scanned which contain the words or phrases that you look for, but this word-matching process is a rather low level search.
If computers were able to understand the actual meanings of the words we use, they would not only be more beneficial for search engines, but also for machine translations. Computers working with the Semantic Web, a web that relates how things are connected to each other, would be able to detect the context from the metadata inside the webpage and then apply the correct machine translations engine, based on the markup.
We can only imagine the countless possibilities that would arise from this innovation. How could the semantic understanding of languages in computers enrich your life? Let us know!