Whether machines could actually think was, Turing believed, a question “too meaningless to deserve discussion.” Nevertheless, the “Turing test” has become a benchmark in artificial intelligence. Over the decades, various computer programs have competed to pass it using cheap conversational tricks, with some success.
In recent years, wealthy tech companies such as Google, Facebook, and OpenAI have developed a new class of computer programs called “large language models,” with conversational capabilities far beyond the rudimentary chatbots of yore. One such model, Google’s LaMDA, convinced Google engineer Blake Lemoine that it was not merely intelligent but sentient.
If Lemoine could be won over by LaMDA’s lifelike answers, it seems plausible that many others with far less understanding of artificial intelligence (AI) could be too – a testament to the technology’s potential as a tool of deception and manipulation in the wrong hands.
For many in the field, then, LaMDA’s remarkable aptitude for Turing’s imitation game is nothing to celebrate. On the contrary, it suggests that the venerable test has outlived its usefulness as a benchmark for artificial intelligence.
“These tests aren’t really getting at intelligence,” said Gary Marcus, a cognitive scientist and co-author of the book “Rebooting AI.” What they measure is the capacity of a given piece of software to pass for human, at least under certain conditions – which, come to think of it, might not be such a good thing for society.
“I don’t think that’s an advance in intelligence,” Marcus said of programs like LaMDA generating humanlike prose or conversation. “It’s an advance in fooling people into thinking you have intelligence.”
Lemoine may be an outlier among his peers in the industry: Google and outside AI experts alike say the program doesn’t, and couldn’t, have anything like the inner life he imagines. We don’t need to worry about LaMDA turning into Skynet, the malevolent machine intelligence of the Terminator movies, anytime soon.
But now that we live in the world Turing predicted, there is a different set of worries: a world in which computer programs are advanced enough that they can seem to people to possess agency of their own, even if they actually don’t.
State-of-the-art artificial intelligence programs, such as OpenAI’s GPT-3 text generator and DALL-E 2 image generator, focus on generating eerily human-like creations based on huge sets of data and vast computing power. They represent a far more powerful and sophisticated approach to software development than was possible when programmers in the 1960s gave a chatbot called ELIZA canned responses to various verbal cues in an attempt to trick human interlocutors. And they may have commercial applications in everyday tools, such as search engines, autocomplete suggestions, and voice assistants like Apple’s Siri and Amazon’s Alexa.
It should also be noted that the AI industry has largely moved past the Turing test as an explicit benchmark. Designers of large language models now aim for high scores on tests such as the General Language Understanding Evaluation, or GLUE, and the Stanford Question Answering Dataset, or SQuAD. And unlike ELIZA, LaMDA was not built with the specific intention of passing as human; it’s just very good at assembling and spitting out plausible-sounding answers to all sorts of questions.
Yet beneath this sophistication, today’s models and tests share with the Turing test the underlying goal of producing output that is as humanlike as possible. This “arms race,” as AI ethicist Margaret Mitchell called it in a Twitter Spaces chat with Washington Post reporters on Wednesday, has come at the expense of all sorts of other possible goals for language models – including ensuring that their workings are understandable and that they don’t mislead people or inadvertently reinforce harmful biases. Mitchell and her former colleague Timnit Gebru were fired by Google in 2021 and 2020, respectively, after co-authoring a paper highlighting these and other risks of large language models.
While Google has distanced itself from Lemoine’s claims, it and other industry leaders have at other times celebrated their systems’ ability to fool people, as Jeremy Kahn pointed out this week in his Fortune newsletter, “Eye on AI.” At a public event in 2018, for example, the company proudly played recordings of a voice assistant called Duplex, complete with verbal tics like “umm” and “mm-hm,” that fooled receptionists into thinking it was a human when it called to book appointments. (After a backlash, Google promised the system would identify itself as automated.)
“The most troubling legacy of the Turing test is an ethical one: the test is fundamentally about deception,” Kahn wrote. “And here the test’s impact on the field has been very real and disturbing.”
Kahn reiterated a call, often voiced by AI critics and commentators, to retire the Turing test and move on. Of course, the industry has already done so, in the sense that it has replaced the imitation game with more scientific benchmarks.
But Lemoine’s story suggests that the Turing test might serve a different purpose in an era when machines are increasingly adept at sounding human. Rather than an aspirational standard, the Turing test should serve as an ethical red flag: any system capable of passing it carries the danger of deceiving people.