By Philip Bethge
Och has now perfected this statistical process for Google. During his doctoral work Och, who is from northern Bavaria, specialized in language recognition. Then he went to the University of Southern California. The Pentagon soon began to show an interest in his work. After 9/11, the US intelligence services wanted to be able to monitor Arab newspapers, chat rooms and websites more closely.
But in 2004, Google convinced the language tamer to come to Mountain View, where Och could have the Internet giant's massive computing power at his disposal. Och isn't willing to mention any numbers. However, the Google databases contain billions of entries for many language pairs. Important resources for the word archive include, for example, the Bible, which has been translated into many languages, United Nations transcripts and European Union documents, which are available in 23 languages.
Such "parallel texts" are something of a Rosetta stone of the digital age. The ancient prototype bears the same inscription in Greek, Demotic and hieroglyphs.
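Scholars once used the stone to decipher hieroglyphs by comparing its parallel inscriptions, and Och's software now does much the same thing: it learns how languages correspond by comparing the same text in different languages. One of the strengths of the system is that one and the same source code works for all languages. The only catch is that enough translated text has to exist.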
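How a program can learn word correspondences from nothing but parallel texts can be illustrated with a toy example. The sketch below is not Google's software, whose details Och does not disclose. It is a minimal version of a textbook technique (expectation-maximization in the style of IBM Model 1), and the three German-English sentence pairs and the function names are assumptions made up for illustration. The point is only that the same code would run unchanged on any language pair, provided enough translated text exists.

```python
from collections import defaultdict


def train_word_translations(pairs, iterations=10):
    """Estimate P(target word | source word) from sentence pairs.

    Toy expectation-maximization in the style of IBM Model 1.
    It uses no grammar and no dictionary, only co-occurrence in parallel text.
    """
    corpus = [(src.lower().split(), tgt.lower().split()) for src, tgt in pairs]
    target_vocab = {word for _, tgt in corpus for word in tgt}

    # Start with no knowledge: every target word is equally likely.
    uniform = 1.0 / len(target_vocab)
    prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) pairings
        total = defaultdict(float)  # expected count of each source word

        # E-step: share each target word among the source words of its sentence.
        for src, tgt in corpus:
            for t_word in tgt:
                norm = sum(prob[(t_word, s_word)] for s_word in src)
                for s_word in src:
                    share = prob[(t_word, s_word)] / norm
                    count[(t_word, s_word)] += share
                    total[s_word] += share

        # M-step: turn expected counts back into probabilities.
        prob = defaultdict(float)
        for (t_word, s_word), c in count.items():
            prob[(t_word, s_word)] = c / total[s_word]

    return prob


def best_translation(prob, source_word):
    """Return the most probable target word, or None for an unseen source word."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical miniature "parallel text" (German source, English target).
    example_pairs = [
        ("das Haus", "the house"),
        ("das Buch", "the book"),
        ("ein Buch", "a book"),
    ]
    model = train_word_translations(example_pairs)
    for word in ["das", "haus", "buch", "ein"]:
        print(word, "->", best_translation(model, word))
```

With only three sentence pairs the program has very little to go on, which is the catch described above: estimates of this kind become useful only when the parallel corpus is large.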
A letter-cruncher as a universal interpreter? Many linguists say that such statistical tricks are rubbish. "Statistical translation will quickly reach its limits," says linguist Martin Kay of Stanford University. "The approach ignores the complex structure of language." For example, the technology stumbles over the placement of main and auxiliary verbs in German sentences. According to Kay, it also has trouble distinguishing between subject and object.
"For really good results we have to look somewhat deeper into the language," says Hassan Sawaf, chief developer with the US software maker AppTek, which uses a hybrid approach. In addition to statistical algorithms, Sawaf also applies classic rules of grammar. "This makes the system so much better and considerably improves sentence structure and clarity."
Comprehending Chinese
Sawaf is also critical of the fact that Och's system only works online. "Anyone who works offline can forget about Google Translate." German computer scientist Alex Waibel is also skeptical. "Imagine you're in a foreign country and you want to converse with a salesperson. First you have to find a network, and on top of that, you'll also be paying high roaming fees. It isn't practical."
The fact that Google Translate only works on the Internet is one of its greatest weaknesses. Nevertheless, the California-based company remains undeterred. Its scientists are already developing a special version of the program with integrated voice recognition for Google's Android mobile phone operating system. The ability to have text on photos translated in no time is also just around the corner. It would enable someone traveling in China, for example, to take a picture of a sign written in Chinese characters, and promptly learn that he is on his way to Beijing.
Another moneymaker for the Internet giant seems to be in the works. But Och demurs. Like many Google employees, he prefers to see himself as part of a campaign for freedom and equality on the Internet. "Someone who doesn't speak English can only use a fraction of the Internet," he says. His goal, he claims, is to make the richness of the Internet available to everyone.
There is at least one indication of the programmer's noble intentions. Och and his team have developed a special program that allows interpreters to feed translations into the system on their own, including translations of extremely exotic idioms in the Bantu language Xhosa, in the language of the Ainu ethnic group in Japan, and in the Inuit language Inuktitut. The software developers hope that the program will give a voice to languages that are in danger of being forgotten. Te Taka Keegan, a computer engineer at the University of Waikato in New Zealand, has already tested the program with the language spoken by the Maori people. Keegan recently spent six months at Google to figure out whether the digital language miracle from Mountain View could protect the idiomatic expressions of New Zealand's indigenous people from extinction. His experiences have been consistently positive.
"The quantity and quality of Maori translations are growing constantly with the help of this tool," Keegan reports. According to Keegan, a digital archive is being developed that will give the language a significant boost.
"The digital world is our children's future," says Keegan. "The language will only survive if we manage to make Maori part of this world."
Translated from the German by Christopher Sultan
© SPIEGEL ONLINE 2010
All Rights Reserved
Reproduction only allowed with the permission of SPIEGELnet GmbH