The Imitation Game

Posted on Oct 15, 2022

Han Zhong Stephen Mack Writing 340 15 October 2022

“Any sufficiently advanced technology is indistinguishable from magic,” said Arthur C. Clarke, the famous science fiction writer. This statement is especially true in the field of artificial intelligence. In recent years, we have seen AI beat human champions at the game of Go. We have seen it create art and win prizes. There is even an essay written by AI and published in The Guardian that tries to convince us that “robots come in peace”. We seem to be getting closer and closer to acquiring the magical power of creating intelligent beings without help from God or millions of years of evolution. Along with the rise of AI come questions and doubts from the public: “Does AI art have the same value as human art?”, “Do AI-generated texts represent what AI thinks?”, and most importantly, “Are we going to be replaced by AI?” At the center of all this confusion lies one fundamental problem: whether computer programs can be considered to possess intelligence or consciousness. It is true that AI programs can solve difficult problems and exhibit complex behaviors, but they do so with computer algorithms, statistical models, and mathematical equations. Can these combinations of algorithms, models, and equations be considered the same as intelligence? It seems that to determine whether programs can have intelligence, we must first find a concrete definition of intelligence itself. This will not be an easy task, since the nature of intelligence has been debated in philosophy for centuries without a definitive answer. However, we can find a shortcut using the one fact we do know for certain about intelligence: humans are intelligent. Thus, in this paper, we will argue that AI will eventually gain true human-like intelligence by imitating human behavior.

To examine the potential capabilities of artificial intelligence, we must first understand the goal of this branch of computer science. In the book “Artificial Intelligence: A Modern Approach”, the authors provide two ways of measuring the success of an AI agent: “human performance” and “ideal performance” (Russell and Norvig 2). “Human performance” measures how well an agent can simulate a human’s reasoning process and behavior, whereas “ideal performance” measures whether a system makes the optimal decision given what it knows. One of the first lessons a computer science student learns in an AI class is that the goal of artificial intelligence is to build rational systems that optimize “ideal performance”. Under this direction, computer scientists have indeed created many successful AI agents to perform various tasks. For example, “Shakey the Robot” was built at the Stanford Research Institute in the 1960s as the first mobile robot with the ability to perceive and reason about its surroundings. It navigates around obstacles by “estimating difficulty of reaching the goal” (Nilsson 220) with a mathematical model and choosing the shortest route. Another, more significant example is Deep Blue, the machine that beat world chess champion Garry Kasparov in 1997. Similar to Shakey’s navigation method, Deep Blue predicts the outcome of each possible move and chooses the move that maximizes its estimated chance of winning (Hsu et al. 1). As successful as these agents are, such a rational system can hardly be considered “intelligent”, at least in the sense of “human intelligence”. A system built under this goal will typically use a mathematical model to calculate and optimize the outcome of its decisions. To construct such a mathematical model, the problem scope must be well defined. Thus, an agent cannot generalize beyond what it is designed to do. Shakey cannot produce a single chess move, and Deep Blue will never move around the room in the middle of a chess game.
Moreover, intelligent computer scientists must design the system, so the intelligence of the agent can hardly be considered its own, since all the reasoning has already been done by the people backstage. In terms of intelligence, these “rational systems” are no better than a toaster that decides when to pop the toast out based on its user’s settings. This is not to say that these systems cannot evolve or learn and are only capable of doing exactly what they are instructed to do. In fact, a majority of machine learning systems are designed under the principle of maximizing “ideal performance”. However, this “learning” mostly consists of estimating, from data, the parameters crucial to the system’s mathematical model. The agent still cannot evolve beyond the model pre-defined by its human designers and create its own method of reasoning. After all, a toaster that can decide when to pop the toast by “learning” the temperature of the toast is still just a toaster.
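To make the “evaluate every possible move and pick the best one” idea concrete, here is a minimal sketch in Python. It is not Deep Blue’s or Shakey’s actual method. It assumes a toy game invented for illustration: players alternately remove 1, 2, or 3 stones from a pile, and whoever takes the last stone wins. The names `value` and `best_move` are hypothetical.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # assumed toy rule: remove 1, 2, or 3 stones per turn


@lru_cache(maxsize=None)
def value(stones: int) -> int:
    """Return +1 if the player to move can force a win, otherwise -1."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1
    # A position is as good as the best move available from it; each move's
    # worth is the negation of the opponent's value in the resulting position.
    return max(-value(stones - m) for m in MOVES if m <= stones)


def best_move(stones: int) -> int:
    """Choose the legal move whose resulting position is best for the mover."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -value(stones - m))


if __name__ == "__main__":
    for pile in (3, 5, 7):
        print(f"With {pile} stones, take {best_move(pile)}")
```

The sketch also illustrates the limitation discussed above: the rules of the game are written into `MOVES` and `value` by the designer, so the program cannot do anything outside this one narrowly defined problem.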

It seems that the goal of maximizing “ideal performance” is at odds with creating “intelligence”, which strengthens the claim that computer programs can never be intelligent like humans. However, this is only true for AI in the more traditional sense. In recent years, the line between “ideal performance” and “human performance” has become less and less well defined. As researchers move from problems like simple navigation and calculating chess moves to more complex problems like machine translation or text generation, it gets increasingly hard to design a good mathematical model for finding the optimal decisions. This motivates computer scientists to try solving these problems by maximizing “human performance” first. By imitating the decisions made by humans, a system may not always produce optimal output, but it can at least solve many difficult problems as well as humans do. One of the most influential AI systems of this kind is AlphaGo, which defeated human Go champion Lee Sedol in 4 of 5 games. Unlike in chess, evaluating all possible outcomes of every move in a game of Go is far beyond the computational ability of even the best supercomputer, which is why many believed AI would not beat humans at the game of Go for at least a few more decades. From the paper published by DeepMind, we see that AlphaGo’s solution is a hybrid of the “ideal performance” approach and the “human performance” approach (Silver et al. 2016). Similar to Deep Blue, AlphaGo still evaluates possible moves and looks for the best one. What is different is that, instead of using mathematical models defined by researchers, AlphaGo learns from thousands of human Go games to imitate the ways human masters make their decisions. We must note here that AlphaGo is not merely a copy of Go masters’ intelligence, since it can still make good decisions even in situations it has never seen before. After AlphaGo, we have seen even more success in AI from maximizing “human performance”.
One notable example is GPT-3, a language generation AI that can finish sentences and answer questions. It is trained by imitating an enormous body of text collected from the internet, and the resulting model not only sounds human but also generalizes to produce factual answers, faithful translations, and complex arguments (Brown et al. 2). It even generated a full article that ended up being published in The Guardian. Although the texts and behaviors produced by systems like GPT-3 are not yet completely indistinguishable from those generated by humans, the possibility of a human-like system seems significant enough to be seriously considered. However, it is not obvious whether the intelligence of such an AI system can be considered equivalent to human intelligence.
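The idea of learning by imitating human text can be illustrated with a deliberately tiny sketch. This is not how GPT-3 works internally (GPT-3 is a large neural network); it is a simple word-pair counting model, assumed here only to show what “imitation” of text means at the smallest scale. The function names and the three-sentence corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every word, which words follow it in the example sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Imitate the corpus: return the most frequent follower of `word`, if any."""
    followers = counts.get(word.lower())
    if not followers:
        return None  # the model has never seen this word followed by anything
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical miniature "training set" standing in for human-written text.
    corpus = [
        "the cat sat on the mat",
        "the cat ate the fish",
        "a dog sat on the rug",
    ]
    model = train_bigrams(corpus)
    print(predict_next(model, "the"))
    print(predict_next(model, "sat"))
```

Nothing in this program states a rule of English; whatever regularities it reproduces come entirely from the human-written examples it was given.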

Even though the technologies behind AlphaGo and GPT-3 only came into existence in the last 20 years, the idea of creating machine intelligence by imitating intelligent beings dates as far back as the 1950s, when general-purpose computers had just moved from being a theoretical possibility to being real working machines. One of the earliest works on this idea is the 1950 paper “Computing Machinery and Intelligence” by Alan Turing. In this paper, Turing described an experiment, called the Imitation Game, to determine whether a machine can think. It is commonly known today as “the Turing Test”. The Stanford Encyclopedia of Philosophy has a great summary of “the Imitation Game” proposed by Turing:

Suppose that we have a person, a machine, and an interrogator. The interrogator is in a room separated from the other person and the machine. The object of the game is for the interrogator to determine which of the other two is the person, and which is the machine. The interrogator knows the other person and the machine by the labels ‘X’ and ‘Y’—but, at least at the beginning of the game, does not know which of the other person and the machine is ‘X’—and at the end of the game says either ‘X is the person and Y is the machine’ or ‘X is the machine and Y is the person’. The interrogator is allowed to put questions to the person and the machine of the following kind: “Will X please tell me whether X plays chess?” Whichever of the machine and the other person is X must answer questions that are addressed to X. The object of the machine is to try to cause the interrogator to mistakenly conclude that the machine is the other person; the object of the other person is to try to help the interrogator to correctly identify the machine. (Oppy and Dowe)

Turing stated that this “Imitation Game” has the advantage of “drawing a fairly sharp line between the physical and the intellectual capacities of a man” (Turing 435). It would allow us to examine the intelligence of machines independently of all other factors. Furthermore, since the questions the interrogator may ask are unlimited, we can introduce “almost any one of the fields of human endeavor that we wish to include”. Turing himself provided some examples of questions and answers we could use in this experiment:

Q: Please write me a sonnet on the subject of the Forth Bridge.

A: Count me out on this one. I never could write poetry.

Q: Add 34957 to 70764

A: (Pause about 30 seconds and then give as answer) 105621.

Q: Do you play chess?

A: Yes.

Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?

A: (After a pause of 15 seconds) R-R8 mate. (434)

Turing believed that the question of whether machines can think is “too meaningless to deserve discussion” (442). A more accurate form of this question would be “can machines do well in the imitation game?” (442). This makes sense since, as we argued before, the meaning of “thinking” or “intelligence” is not well defined enough to determine whether such terms can be applied to machines. However, in his original paper, Turing did not provide any argument for why the existence of intelligence can be shown by “doing well in the imitation game”, which creates an opening for counterarguments against the validity of “the imitation game”.

One argument against the Turing test (or “the imitation game”) is that a seemingly good imitation of a human can be engineered using some simple but clever strategies. In fact, many AI agents that perform well in real-life Turing tests do not have complex understanding, reasoning, or learning abilities. They achieve their good performance by exploiting the weaknesses of the Turing test. One of these programs is “Eugene Goostman”, which successfully convinced 33% of its judges that it was human at a Turing test contest in 2014. Even though the judges were not completely fooled by “Eugene Goostman”, this can still be considered a good performance, since Turing himself saw 30% as a good benchmark and predicted that by the end of the 20th century some program should be able to reach it (442). “Eugene Goostman” indeed passes that benchmark, but looking at its strategy, it is not difficult to see that it does not possess anything that can be considered intelligence. Eugene is portrayed as a 13-year-old boy from Odessa, Ukraine. The choice of this identity is deliberate. Eugene is 13 years old since it is an age “not too old to know everything and not too young to know nothing” (Schofield). This gives Eugene an excuse for being unable to answer questions like the chess question or the poetry question in Turing’s example. Eugene also uses its Ukrainian persona to induce the people who “converse” with it to forgive minor grammatical errors in its responses. Moreover, Eugene tends to use humor and personality quirks to misdirect the judges when facing a question it cannot answer. We can see this strategy in the following conversation between Eugene and computer scientist Scott Aaronson:

Scott: How many legs does a camel have?

Eugene: Something between 2 and 4. Maybe, three? :-))) By the way, I still don’t know your specialty – or, possibly, I’ve missed it?

Clearly, even though Eugene has no idea what a camel is or how many legs animals should have, it is still able to give a somewhat human-like answer to trick the judges. This seems to be a good indication that the Turing test is fundamentally flawed. However, we must take into account the limitations of implementing the Turing test in real life. One of them is that human beings are not perfect interrogators in the Turing test setting. Although we are good at acting like humans, determining whether a behavior is human-like is a completely different skill. This especially applies to Eugene’s success. In the contest where it scored 33%, most of the judges were not experts in the field of artificial intelligence (Marcus). Once Eugene is questioned by a professional computer scientist like Scott Aaronson, its inability to produce reasonable answers quickly becomes obvious (Aaronson). Another limitation is that in real-life Turing tests the questions asked are mostly short and relatively simple, and are limited to text form. We cannot show Eugene a documentary on camels first and then ask it “How many legs does a camel have?”. This makes it even harder to determine whether a wrong answer is given simply because the respondent has never seen a camel. Such limitations of real-life Turing tests allow AI agents to perform well without achieving the level of imitation that the Turing test requires in its theoretical form.

Another objection to the Turing test directly questions whether a perfect imitation of intelligent conversation can be considered equivalent to intelligence itself. To reject Turing-test-style imitation as a form of intelligence, one must still avoid the problem of defining intelligence, since without a universally agreed definition we cannot have a universally agreed decision on what is not intelligent. John Searle’s “Chinese Room Argument” does avoid this problem: it points out that an AI imitation of intelligent speech and behavior is not the same as an imitation of human intelligence. The core of the “Chinese Room Argument” is a thought experiment, first introduced in 1980, which Searle revisits in his 1990 essay “Is the Brain’s Mind a Computer Program?”. The thought experiment is summarized in the Stanford Encyclopedia of Philosophy as follows:

Imagine a native English speaker who knows no Chinese locked in a room full of boxes of Chinese symbols (a data base) together with a book of instructions for manipulating the symbols (the program). Imagine that people outside the room send in other Chinese symbols which, unknown to the person in the room, are questions in Chinese (the input). And imagine that by following the instructions in the program the man in the room is able to pass out Chinese symbols which are correct answers to the questions (the output). The program enables the person in the room to pass the Turing Test for understanding Chinese but he does not understand a word of Chinese. (Cole)

In his essay, Searle built upon this thought experiment and proposed the following axioms:

Computer programs are syntactic

Human minds have mental semantics

Syntax by itself is neither constitutive of nor sufficient for semantics (27)

Searle did not give a clear definition of “syntactic” and “semantics”. He distinguished the two concepts by saying that the Chinese symbols themselves are “syntactic” and that “semantics” is the “language understanding” in our mind. From these three axioms, he concluded that “Programs are neither constitutive of nor sufficient for minds” (27). This conclusion is a direct argument against what he called the “Strong AI” hypothesis, which claims that “thinking is merely the manipulation of formal symbols and that is exactly what the computer does”. We must note that the “Chinese Room Argument” was first proposed in 1980, when AI technologies were in their infancy. Up to that point, the most prominent form of AI had been the “expert system”. Typically, such a system would consist of many expert-defined instructions specifying the exact operation of the AI agent in every possible situation. Expert systems are indeed similar to “the Chinese Room”, where input symbols are processed with predefined rules to generate an output. However, AI agents nowadays are very different from expert systems. Although symbol manipulation still sits at the lowest level of computation, new AI agents are capable of semantic interpretation in their higher-level reasoning. For example, in the text generator GPT-3, each word is mapped to a point in a “semantic space” where words that share similar meanings or are related to each other are placed close together. It represents “parrots” and “finches” as semantically similar animals, much as we do. Furthermore, even if an AI system operates by manipulating meaningless symbols, the system as a whole can still generate meaningful behaviors. Although the man and the instructions in the Chinese Room do not understand Chinese as separate parts, the room as a complete system does.
As “Artificial Intelligence: A Modern Approach” puts it, at the very bottom level, human brains and computers both function by manipulating electrical currents. All computation, reasoning, and consciousness exist through the combination of millions or billions of such manipulations (1031). Thus, there is no fundamental difference between human intelligence and a program’s imitation of it.
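The “semantic space” idea mentioned above can be sketched with a few lines of Python. The three-dimensional vectors below are hand-picked, hypothetical values chosen only for illustration; they are not taken from GPT-3, whose learned representations have far more dimensions. The sketch measures closeness with cosine similarity, a common choice for comparing such vectors.

```python
import math

# Hypothetical word vectors (assumed values, not from any real model).
EMBEDDINGS = {
    "parrot": [0.90, 0.80, 0.10],
    "finch": [0.85, 0.75, 0.15],
    "toaster": [0.10, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(word_a, word_b):
    """Look up two words and compare their positions in the toy semantic space."""
    return cosine_similarity(EMBEDDINGS[word_a], EMBEDDINGS[word_b])


if __name__ == "__main__":
    print("parrot vs finch:  ", round(similarity("parrot", "finch"), 3))
    print("parrot vs toaster:", round(similarity("parrot", "toaster"), 3))
```

With these assumed vectors, “parrot” and “finch” come out far closer to each other than either is to “toaster”, which is the sense in which related words are said to sit near each other in the space.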

We have now established that an AI agent that passes the theoretical Turing test is a perfect imitation of human intelligence. We also know that such a system is on the horizon with the development of technologies like GPT-3 and AlphaGo. One question remains: can a perfect imitation of intelligence be considered intelligence itself? To answer this question, let us consider an AI agent that can in fact imitate humans perfectly. When it is impossible even to tell this agent apart from a human, it seems unreasonable to say that one is intelligent and the other is not. In fact, our judgments of external entities in this world are never based on their true nature. We always judge things based on surface-level observation. In his “Meditations on First Philosophy”, Descartes pointed out the possibility that an evil deceiver of “utmost power and cunning has employed all his energies in order to deceive me” (9). This means that the external world could be an illusion created by such a deceiver, which makes it impossible to be sure we are acquiring true knowledge by observing it. Descartes resolved this doubt by appealing to God’s being no deceiver, which should give us faith that observation leads to truth. For atheists, this resolution cannot be satisfactory. Yet in reality everybody, including atheists, generally accepts that their observations are real. If we already judge the nature of all things through appearances, there is no reason not to do the same for intelligence.

One of the questions that repeatedly appears in discussions of AI is “what is the ultimate difference between human intelligence and artificial intelligence?” People have come up with all sorts of answers, like “ability to create art”, “ability to reason”, or “having emotions”. However, as technology evolves, all these proposed differences are growing less and less significant. Art, arguments, and even emotions are no longer exclusively produced by humans. We must now accept a future where intelligence can be created from programs. After all, it is human nature to replicate and improve what we see in the natural world. We saw the sun shining above and created light bulbs to eliminate darkness in the world. We saw birds flying in the sky and created airplanes that took us above the clouds. We saw fish swimming in water and created ships that took us across the oceans. Artificial intelligence is just like these human creations. It is a technology that is destined to take us into a new future.

Bibliography

Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education Limited, 2022.

Nilsson, Nils J. The Quest for Artificial Intelligence. Cambridge University Press, 2009.

Hsu, Feng-hsiung, et al. “Deep Blue System Overview.” Proceedings of the 9th International Conference on Supercomputing, Association for Computing Machinery, 1995, pp. 240–244.

Silver, David, et al. “Mastering the Game of Go with Deep Neural Networks and Tree Search.” Nature, vol. 529, 2016, pp. 484–489.

Turing, Alan. “Computing Machinery and Intelligence.” Mind, vol. LIX, no. 236, 1950, pp. 433–460.

Oppy, Graham and David Dowe, “The Turing Test”, The Stanford Encyclopedia of Philosophy (Winter 2021 Edition), Edward N. Zalta (ed.), URL = https://plato.stanford.edu/archives/win2021/entries/turing-test/.

Schofield, Jack. “Computer Chatbot ‘Eugene Goostman’ Passes the Turing Test.” ZDNET, https://www.zdnet.com/article/computer-chatbot-eugene-goostman-passes-the-turing-test/.

Aaronson, Scott. “My Conversation with ‘Eugene Goostman,’ the Chatbot that’s All Over the News for Allegedly Passing the Turing Test.” Shtetl-Optimized, 2014.

Marcus, Gary. “What Comes after the Turing Test?” The New Yorker, 9 June 2014, https://www.newyorker.com/tech/annals-of-technology/what-comes-after-the-turing-test.

Searle, John R. “Is the Brain’s Mind a Computer Program?” Scientific American, vol. 262, no. 1, 1990, pp. 26–31, https://doi.org/10.1038/scientificamerican0190-26.

Cole, David, “The Chinese Room Argument”, The Stanford Encyclopedia of Philosophy (Winter 2020 Edition), Edward N. Zalta (ed.), URL = https://plato.stanford.edu/archives/win2020/entries/chinese-room/.

Descartes, Rene. Meditations on First Philosophy. Translated by Michael Moriarty, Oxford University Press, 2008.