08 Mar 2013

Will the Semantic Web mean better Machine Translation?

“The Web of today is about documents, the Semantic Web is about things and how these things are related to each other”

But what exactly does this mean?

So far, computers recognise only how you say things (i.e. the syntax), not what the words mean (i.e. the semantics). When you use a search engine, it scans thousands of documents for the words or phrases you typed; this word matching is a rather low-level form of search. If you were searching the Semantic Web, however, you could find related items and their relationships to each other.

For instance, instead of searching for the weather this weekend (rainy in London…surprise…), you could search for ‘How does the weather affect birth rates?’. And this concept doesn’t only work for the relationship between the weather and birth rates; it can be applied to anything you can think of: people, places, music, films, events, organisations, and more. Anything in the world.

That means that including semantic context in webpages could help search engines become more accurate, because they would depend not just on the keywords in a page but also on what those words mean.

But could the Semantic Web be beneficial for Machine Translation quality?

Machine Translation engines are more accurate when trained on a specific field. If you train an engine on legal or financial text alone, its output in that field is generally expected to be better than that of a generalised engine such as Google Translate.

Computers working with the Semantic Web could detect the context from the metadata inside the webpage. They could then apply the right Machine Translation engine, based on the markup. So, with lots of pre-prepared Machine Translation engines stored in the cloud, lots of webpages with semantic markup, and some clever routing, could each sentence on this page be translated according to its semantic markup and context, creating better translation quality?
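
To make the "clever routing" idea a little more concrete, here is a minimal sketch in Python. It assumes that each sentence has already been paired with a schema.org type taken from the page's markup. The engine names and the type-to-engine table are hypothetical and invented purely for illustration; they do not refer to any real translation service.

```python
# Hypothetical mapping from schema.org types to domain-trained MT engines.
# The engine names are invented for illustration only.
ENGINE_BY_TYPE = {
    "MedicalCondition": "medical-engine",
    "Product": "ecommerce-engine",
    "NewsArticle": "news-engine",
}
DEFAULT_ENGINE = "general-engine"


def pick_engine(schema_type):
    """Return the engine name for a schema.org type, or the general engine."""
    if schema_type is None:
        return DEFAULT_ENGINE
    # Accept full URLs such as "http://schema.org/Product" as well as bare names.
    name = schema_type.rstrip("/").rsplit("/", 1)[-1]
    return ENGINE_BY_TYPE.get(name, DEFAULT_ENGINE)


def route(sentences):
    """Pair each (sentence, schema_type) with the engine that should translate it."""
    return [(text, pick_engine(schema_type)) for text, schema_type in sentences]
```

Anything without markup, or with a type the table does not know, falls back to the general engine, so a page with no semantic markup would be translated just as it is today.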

It sounds like science fiction, but research in this area has already started. In 2012 two German academics, Harriehausen-Muehlbauer and Heuss, produced a research paper called Semantic Web-based Machine Translation, which notes that, especially ‘with translations, it is often crucial to understand the source text correctly, as otherwise ambiguities may result in incomprehensible target language translations’.

The idea of the Semantic Web was first introduced by Tim Berners-Lee, inventor of the World Wide Web, HTTP and HTML. In 2011 the big search engines Google, Yahoo! and Bing made a start on putting it into practice by launching www.schema.org. The website consists of lists of item types with codes that webpages can use to tell computers what their content means. A recent Econsultancy article claims that Google’s Knowledge Graph is the first step towards a truly Semantic Web.
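
As a rough illustration of what such markup looks like to a computer, the sketch below reads the schema.org type out of a small page that uses microdata attributes (`itemscope`, `itemtype`, `itemprop`). The sample page is invented for this example, and the class and function names are our own; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class ItemTypeCollector(HTMLParser):
    """Collect the itemtype of every element that declares an itemscope."""

    def __init__(self):
        super().__init__()
        self.item_types = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # itemscope is a boolean attribute, so its value is None when present.
        if "itemscope" in attributes and attributes.get("itemtype"):
            self.item_types.append(attributes["itemtype"])


def find_item_types(html):
    """Return the schema.org types declared in a piece of HTML."""
    collector = ItemTypeCollector()
    collector.feed(html)
    return collector.item_types


# Invented sample page using schema.org microdata.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="genre">Science fiction</span>
</div>
"""

print(find_item_types(SAMPLE))  # prints ['http://schema.org/Movie']
```

A human reader sees only the film title and genre, but the markup tells a program that this block describes a film rather than, say, a blue-skinned character or a profile picture.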

If you’re just getting interested in the Semantic Web, you might find this video informative:


