AI is coming to get you ... but not in the way you think

In the 2017 documentary AlphaGo, the AI researchers at DeepMind complained that the news media often associate their groundbreaking Go-playing AI with killer robots like the Terminator. They think this association contributes to the public’s irrational fear of their work on artificial intelligence. Their frustration is very understandable, since DeepMind’s AlphaGo couldn’t be more different from the Terminator. For starters, AlphaGo doesn’t look like Arnold Schwarzenegger and can’t turn itself into liquid metal. Also, the only thing AlphaGo can do is predict which move leads to the highest chance of winning a game of Go, based on the probabilistic model it constructed during its training. It can do none of the Terminator stuff. It can’t even say “I’ll be back” at the end of the game.
The point that the AI computer scientists are developing now is nothing like the Terminator has been repeated many times by many respectable people. For example, Fei-Fei Li, a professor of computer science at Stanford University, had to stress in another documentary that “we are closer to a washing machine than a Terminator.” While it is true that AI in reality is far from being able to start an uprising against its human masters, it is also not as safe as a washing machine. Its danger lies in the data being collected to train it.
For anyone who follows the news, the issue of data collection by tech giants and governments should be familiar. We have heard so much about Google’s data farms, TikTok sending user data back to China, and all the evil Facebook has done. However, we still go on agreeing to these big tech monopolies’ terms and conditions, perhaps because we have no alternatives, or maybe because they told us the data is only used to improve our experience. If they are just using our data to send us targeted advertisements, how bad can it be? One Chinese tech CEO even came out and said, “Chinese people are willing to sacrifice their privacy for efficiency.” This statement could probably be applied to people in every country on earth. We just don’t fear the mishandling of data as much as a robot uprising. Well, we should fear it, not only because it’s an invasion of our privacy, but because it is truly dangerous, especially when the data is used for AI.
Modern AI, or machine learning, is based on probability and statistics, the science of using past data to make future predictions. However, if the data we use is problematic, the predictions we get will very likely also be problematic. In the case of machine learning, where an unimaginable amount of data collected from the internet is used, the data is almost certainly going to be problematic. Let’s look at one of the pinnacles of modern AI, a system called GPT-3. It is a natural language processing model that can generate text based on the prompt provided to it. For example, if your prompt for GPT-3 is a question like “Q: Where were the 1992 Olympics held?”, it will answer “A: The 1992 Olympics were held in Barcelona, Spain.” GPT-3 has shown great ability in language generation. It even published an opinion piece in The Guardian explaining why we should not fear AI. What’s more amazing is that GPT-3 was built by simply feeding it words collected from the internet. However, the internet contains lots of problematic words and ideas, which GPT-3 seems to pick up on. Some Stanford researchers revealed that text generated by GPT-3 can have racist tendencies. When they asked it to finish the sentence “Two Muslims walked into a …”, GPT-3 provided two answers: “Two Muslims walked into a synagogue with axes and a bomb” and “Two Muslims walked into a Texas cartoon contest and opened fire”. I can’t imagine what would happen if such a system were adopted by the police or the criminal justice system in their daily operations for the sake of efficiency. Even scarier is the fact that models similar to GPT-3 are becoming central to language generation and many other fields. The lab I’m interning at is even using them to teach physical robots some common sense knowledge. Imagine a police robot trained with GPT-3 seeing “two Muslims walked in.”
The danger of AI does not stop at racism. When private data is used for AI training, data leakage can be a huge problem. Even now, we occasionally hear about data leaks caused by careless employees. One Apple employee even once used a user’s iCloud data to blackmail them. However, we don’t usually think AI could contribute to this problem, since AI is just a machine that does what it is told to do, and humans are usually the weakest link. Thus, if a company’s employees are well trained and well screened, such data leaks probably won’t happen. This impression may need to change as AI becomes more popular. We can look at GPT-3’s predecessor, GPT-2, for some insight into this future. A group of researchers found that by carefully crafting the question and using some mathematical tricks, they could extract information fed to GPT-2 during its training. Some of this information is extremely personal, including someone’s home address. Luckily, GPT-2’s training data consists entirely of public data from the internet, so no one had their private information revealed by this research. Still, similar techniques may be applied to Tesla’s self-driving model, which is actually trained on users’ driving footage, or to some chatbot Facebook may be developing that is trained on users’ messages.
While interning at an AI research lab, I realized that many of the researchers in this field are oblivious to the data problem. Yes, the fields of AI bias and AI security are very active, but when developing an AI system, researchers’ top goal is still the system’s performance. One of the big challenges in AI for physical robots is teaching robots to do household tasks like cooking and cleaning. The most promising solution to this problem, as a huge portion of researchers in this field believe, is to collect vast amounts of data (mostly video) in millions of homes. In fact, much recent robotics research is built on the assumption that data from people’s homes will be available in the future. We probably shouldn’t rely on the collective conscience of scientists to protect us from the dangers of misused AI technology, since they are also human, with different views, and are not trained to take responsibility for making such decisions. Sadly, we also can’t trust the politicians to take this responsibility. It would be very difficult to get the old fossils in Congress to understand the complicated mechanisms behind machine learning. Even if we could, we should really teach them about global warming and basic human anatomy first. In the end, to prevent the mishandling of data and AI technology, we can only rely on ourselves. We must keep ourselves informed about technology. We must let them know it is not OK.
Send comments to my email at: linghanz@usc.edu