The role of AI in breaking language barriers
The role of AI in providing access to information: Collectively, the human race speaks more than 7,000 languages. Of those, north of 4,000 have a writing system; the rest have only ever been spoken. Languages with little written or translated text available to learn from are known as low-resource languages. But even languages that have been codified in text don't always lend themselves well to automatic translation: in fact, automatic translation engines (like Google Translate) can work with just over 100 languages. That leaves a massive gap in potential communication across languages, which the US' intelligence research arm IARPA is looking to bridge by funding various research teams to develop a system that can find, translate and summarize information from any low-resource language, according to the BBC.
What is a low-resource language? Common languages like English, Spanish, French and German are translated in abundance by multilingual institutions like the European Parliament, which in the last 10 years has produced 1.37 billion words across 23 languages. Much of that output was published online, making it easily accessible to AI-powered translation engines, whose algorithms learn from these massive human-translated data sets. The limitations arise with languages that may be widely spoken but for which few high-quality translations are published.
So, how does the new model work? It uses neural network technology that mimics aspects of human thought and allows AI models to understand the meaning of words and sentences instead of just memorizing them. The concept seems simple enough; the challenge is reducing how much data the network needs to yield the desired results.
Machines use much more data to learn languages than humans do: “Whenever you study a language, you would never see the amount of data today's machine translation systems use for learning English-to-French translation,” says MIT researcher Regina Barzilay. “You see a tiny fraction, which enables you to generalise and to understand French. So in the same way, you want to look at the next generation of machine-translation systems that can do a great job even without having this kind of data-hungry behaviour.”
The neural networks can be pre-trained to understand the general features and structure of sentences, which lets researchers harvest monolingual data on low-resource languages from the internet. Once pre-trained on many languages, the neural models can learn to translate between individual language pairs using very little bilingual training material.
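A classic illustration of this "lots of monolingual data, tiny bilingual dictionary" idea is cross-lingual embedding alignment: word vectors are learned separately for each language from unlabeled text, and a small seed dictionary is then enough to learn a mapping between the two vector spaces. The sketch below (not the IARPA system; the 3-d vectors are toy stand-ins for real monolingual embeddings, and the word lists are invented) learns a linear map from three French-English pairs and uses it to translate a held-out word:

```python
import numpy as np

# Toy "monolingual" embeddings. In practice each set would be learned
# independently from large unlabeled text in that language.
src = {"chien":  np.array([1.0, 0.1, 0.0]),
       "chat":   np.array([0.9, 0.0, 0.2]),
       "maison": np.array([0.0, 1.0, 0.1]),
       "eau":    np.array([0.1, 0.0, 1.0])}
tgt = {"dog":    np.array([0.1, 1.0, 0.0]),
       "cat":    np.array([0.0, 0.9, 0.2]),
       "house":  np.array([1.0, 0.0, 0.1]),
       "water":  np.array([0.0, 0.1, 1.0])}

# The scarce resource: a tiny bilingual seed dictionary.
pairs = [("chien", "dog"), ("maison", "house"), ("eau", "water")]
X = np.stack([src[s] for s, _ in pairs])
Y = np.stack([tgt[t] for _, t in pairs])

# Least-squares fit of a linear map W so that src @ W ~ tgt.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def translate(word):
    """Map a source word into target space, return the nearest
    target word by cosine similarity."""
    v = src[word] @ W
    return max(tgt, key=lambda t: np.dot(tgt[t], v) /
               (np.linalg.norm(tgt[t]) * np.linalg.norm(v)))

print(translate("chat"))  # -> "cat", though it was not in the seed pairs
```

Real systems align much higher-dimensional embeddings (and, as in the pre-training described above, learn shared sentence-level representations), but the economics are the same: the monolingual data does most of the work, and the bilingual data only has to pin down the correspondence.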
FURTHER READING: Have you ever seen robot poetry? Read Neukom Institute for Computational Sciences director Dan Rockmore’s fascinating “What happens when machines learn to write poetry” in the New Yorker.