Data science comes to Egypt
Studying data science at university is finally a thing in Egypt: Big Data, as our friend Dame Nemat “Minouche” Shafik, director of the London School of Economics, told us, is “changing everything, from services, to marketing, to medicine.” With more data available on every facet of our lives than ever before, the chance to mine that data for useful information has never been greater or more relevant. Enter Dr Ali Hadi, chair of the Department of Mathematics and Actuarial Science at the American University in Cairo (AUC) and the founder of Egypt’s first undergraduate major in data science, launching this fall. Below are edited excerpts from our conversation with Hadi about the program and the importance of data science in Egypt:
The program was launched this year because data scientists are in great demand and in scarce supply. This is partly because there are no programs currently in Egypt to produce data scientists. Even abroad, there are very few data science programs and the majority of them are graduate programs. This is where our program has added value: It’s one of very few four-year undergraduate degrees in data science. The program will be small — 15 students in the first year and 25 after that — because we want to test the market and make sure our graduates can find jobs. We also need to fill some faculty vacancies. And we keep class sizes small to ensure quality of education. I established the actuarial science program in 2004 and I know all the people who went through it by name. I expect there will be high demand, but if we see otherwise, we’ll adjust. We’ll plan for the worst and hope for the best.
In a nutshell, data science is the science of generalizable knowledge extraction from data for the purpose of informed decision-making. The availability of data by itself is not useful; you need to know how to extract knowledge from it. Insurance is a good example. You have a ton of insurance claims that could involve fraud or medical malpractice. But this is raw data, and on its own it can’t tell you anything that helps you make informed decisions. A data scientist is the link between the data and the decision makers.
Because data science requires domain knowledge, the data scientist usually works with an expert in a particular field. If we’re talking about medicine, they’ll work with doctors; if we’re talking about a social problem, they’ll work with sociologists. The domain expert will take the results provided by the data scientist and make decisions from there. Sometimes the decision-maker is an actuary, who makes decisions based on actuarial models whose parameters were estimated from the data by the data scientist.
The role of the data scientist is increasingly important because we have more data available than ever before. Twenty or 25 years ago, the number of observations in an experiment was 500 or 1,000. But now it’s millions of observations, an amount of information that goes beyond classical statistics. A scatterplot with a million points takes time to generate and it appears as a blot of ink. There’s no visible pattern or structure. And this is the challenge: To extract knowledge from that black graph at the speed with which we need to make decisions.
Companies see a shortage in quantitative reasoning: Part of the demand comes from the fact that Egypt is lacking in quantitative literacy (the ability to draw correct conclusions), and lots of Egyptian companies are hungry for this knowledge, especially banks, IT companies, and even some government agencies.
Despite the demand, you can count the sectors in Egypt using data science on both hands. The Health Ministry, for example, is using data science to determine health insurance policies. But beyond that, even if companies wanted data scientists, they wouldn’t find any. Financial analysts or risk managers are doing the work, but they’re not properly trained in data science. The difficulty is that data science is both interdisciplinary and multidisciplinary, drawing from mathematics, probability, statistics, and computer science.
Every industry and segment has its own way of applying data science. Let’s take insurance companies. The data scientist works with an insurance expert and an actuarial scientist to make sure the insurance company has enough reserves for unexpected claims, to monitor fraudulent policies, and to price the policy correctly.
As for the banking sector, if a bank wants to market a new credit card, the data scientist will segment the market and the industry expert will decide to mail to X not Y. This will save money and increase the response rate.
In government, the Finance Ministry uses data scientists to estimate taxes, which is important because the record here in Egypt for determining the right amount of tax is not great. Another example is estimating the value of the EGP relative to foreign currencies for the budget, which is partly in EGP and partly in foreign currency.
If I could revolutionize an industry with data science, it would be insurance because insurance companies are able to use very little of the huge amount of data available to them. They have tons and tons of data but few people know it exists and fewer people can use it. They have the data but they don’t have the expertise to extract knowledge from it.
Data science has been well established as a field for some time but Egypt has been late to the game. Egypt is late to a lot of things, but better late than never. Someone has to take the initiative. Two years ago, I saw the need for data scientists so I decided to establish this program. I did the same for the actuarial science program back in 2000. I hope it will thrive like the actuarial science program, which is one of the top programs at AUC now in terms of quality of students.
In Egypt, there are also difficulties relating to access to data and its quality. Branches of the government consider their data to be highly sensitive, or different ministries might have different numbers. In the private sector, companies treat data as their secret wealth and keep a lid on it. You have to be working inside the company or sign a nondisclosure agreement to get access. These challenges aren’t specific to data science; they affect decision makers in general. You could be a minister who wants data from another ministry and you’ll have difficulty obtaining it.
There are ethical implications to the job: The same data can produce different results, which can lead to very different decisions. This is especially important when the decision-maker is not numerically literate and can’t interrogate the data scientist’s results. Most of the time, different results happen because the data scientists’ methods are grounded in different assumptions. An informed data analyst will understand the assumptions behind a method and validate them before going ahead with the method, let alone interpreting the results. Some people do use data to try to mislead others, so we teach students how to look out for these biases.
Is Egypt at a stage where it needs something like the EU’s General Data Protection Regulation? These kinds of things do not usually hurt unless they go overboard. For example, if investors are deciding whether or not to invest in Egypt but the data they need to make that decision is unavailable, this could be a hindrance. On the other hand, if these same investors come and invest, you have to protect their privacy. Legislation is like any tool: You can use it or abuse it, overdo it or underdo it.