Modern machine translation is not immune from ridicule. Last month on The Jimmy Fallon Show, singer Miley Cyrus covered popular songs that had been translated from English into Croatian and back with Google Translate.
Ed Sheeran’s Shape of You put through the engine resulted in choice lines like: “But my organ drops right out, I like that cadaver” (originally: “Although my heart is falling too, I’m in love with your body”).
Song lyrics, though – often nonsensical and lacking context to begin with – are an especially tricky test for machine translation, which has seen major improvements in recent years.
The latest neural machine translation techniques have resulted in accuracy scores close to those achieved by human translators. Google, a leader in the field, says it has halved the number of errors its translation engines make in the last year alone.
But even if machine translation can surpass humans in quality, there is still some way to go before the systems can produce perfect output.
To reach that goal, machine translators will need more than an extensive knowledge of language, says Google’s director of research Peter Norvig. They’ll also “need to understand the world”.
Machine translation has been widely available since 1997 with AltaVista’s Babelfish (named after the creature in Douglas Adams’ Hitchhiker’s Guide to the Galaxy, which users place in their ear to “instantly understand anything said to you in any form of language”).
Major providers like Babelfish, as well as Google Translate and Microsoft Translator, for many years used a method called Statistical Machine Translation (SMT). In Google’s case, this was a form called Phrase-Based Machine Translation.
The technique works by digesting a huge index of content that has already been translated by humans. The machine uses statistical analysis to discover patterns and, from these, 'learns' a language.
As its name suggests, Phrase-Based Machine Translation works with blocks of word sequences, although these are far shorter than whole sentences.
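To give a rough sense of what 'learning' from human translations means, the Python sketch below counts how often each source phrase was paired with each target phrase and turns those counts into probabilities. It is a toy illustration only, not Google's or any other provider's implementation: the English-Croatian phrase pairs are invented for the example, and the names PHRASE_PAIRS, build_phrase_table and best_translation are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: phrase pairs of the kind that might be extracted
# from human-translated text (English -> Croatian). Invented for illustration.
PHRASE_PAIRS = [
    ("my heart", "moje srce"),
    ("my heart", "moje srce"),
    ("my heart", "srce moje"),
    ("your body", "tvoje tijelo"),
    ("your body", "tvoje tijelo"),
    ("in love with", "zaljubljen u"),
]


def build_phrase_table(pairs):
    """Estimate p(target | source) as the relative frequency of observed pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1

    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: n / total for t, n in targets.items()}
    return table


def best_translation(table, source):
    """Return the most probable target phrase, or None if the phrase is unseen."""
    options = table.get(source)
    if not options:
        return None
    return max(options, key=options.get)


phrase_table = build_phrase_table(PHRASE_PAIRS)
print(best_translation(phrase_table, "my heart"))
```

A production phrase-based system would combine a table like this with other components, such as a model of fluent target-language text and rules for reordering phrases. The sketch covers only the counting step.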
In more recent years, machine translation engines have begun using artificial neural networks, giving rise to Neural Machine Translation (NMT), which instead considers the entire input sentence as a unit for translation.
“Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference,” wrote Google’s Yonghui Wu et al. in their 2016 research paper. “These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential.”