Monday, 10 May 2021

The role of AI in breaking language barriers

The role of AI in providing access to information: Collectively, the human race speaks more than 7k languages. Of those, north of 4k have a writing system and the rest have only ever been spoken. But even languages that have been codified in text don’t always lend themselves well to automatic translation. In fact, there are just over 100 languages that automatic translation engines (like Google Translate) are able to work with; the many languages that lack enough translated text to train these engines are known as low-resource languages. That leaves a massive gap in potential communication across languages, which the US intelligence research arm IARPA is looking to bridge by funding various research teams to develop a system that can find, translate and summarize information from any low-resource language, according to the BBC.

What is a low-resource language? Common languages like English, Spanish, French and German are translated in abundance by multilingual institutions like the European Parliament, which in the last 10 years produced 1.37 bn words in 23 languages, much of which was published online, making it easily accessible to AI-powered translation engines. The algorithms that power translation systems learn from these massive human-translated data sets. The limitations arise with low-resource languages: those that may be widely spoken but have few published, high-quality translations to learn from.

So, how does the new model work? It uses neural network technology that mimics aspects of human thought and allows AI models to understand the meaning of words and sentences instead of just memorizing them. The concept seems simple enough, but the challenge is reducing how much data the network needs to yield the desired results.

Machines use much more data to learn languages than humans do: “Whenever you study a language, you would never see the amount of data today's machine translation systems use for learning English-to-French translation,” says MIT researcher Regina Barzilay. “You see a tiny fraction, which enables you to generalise and to understand French. So in the same way, you want to look at the next generation of machine-translation systems that can do a great job even without having this kind of data-hungry behaviour.”

The neural networks can be pre-trained to understand general features and structures of sentences using monolingual text, which researchers can harvest from the internet even for low-resource languages. Once pre-trained on many languages, the neural models can learn to translate between individual languages using very little bilingual training material.

FURTHER READING: Have you ever seen robot poetry? Read Neukom Institute for Computational Sciences director Dan Rockmore’s fascinating “What happens when machines learn to write poetry,” in the New Yorker.

Enterprise is a daily publication of Enterprise Ventures LLC, an Egyptian limited liability company (commercial register 83594), and a subsidiary of Inktank Communications. Summaries are intended for guidance only and are provided on an as-is basis; kindly refer to the source article in its original language prior to undertaking any action. Neither Enterprise Ventures nor its staff assume any responsibility or liability for the accuracy of the information contained in this publication, whether in the form of summaries or analysis. © 2022 Enterprise Ventures LLC.
