AI21 Labs conducted a social experiment this spring in which more than 2 million participants engaged in more than 15 million conversations on its website. At the end of each chat, a participant had to guess whether their conversation partner was a human or an AI bot. Nearly one-third guessed wrong.
As ChatGPT and other AI chatbots have grown more popular, so have questions about whether such AI tools can be as intelligent as humans, whether the content they generate can pass for human creations, and whether AI threatens people’s jobs.
Also: 4 things Claude AI can do that ChatGPT can’t
AI21 Labs drew inspiration for the “Human or Not?” experiment from Alan Turing’s proposal for evaluating whether a machine can exhibit behavior indistinguishable from a human’s.
This type of evaluation came to be known as the Turing Test, after the mathematician, who predicted in 1950: “I believe that in 50 years’ time, it will be possible to make computers play the imitation game so well that an average interrogator will have no more than 70% chance of making the right identification after 5 minutes of questioning.”
The results of the Human or Not experiment bear out Turing’s prediction: Overall, participants guessed correctly 68% of the time, just under Turing’s 70% threshold. When paired with an AI bot, participants guessed correctly only about 60% of the time; when the conversation partner was another human, they guessed correctly 73% of the time.
Though it wasn’t a rigorous Turing Test, AI21 Labs’ Human or Not experiment showed that AI models can mimic human conversation convincingly enough to deceive people, challenging common assumptions about AI’s limits and raising questions for AI ethics.
Also: 40% of workers will have to reskill in the next three years due to AI, says IBM study
The experiment found that participants used a range of strategies to spot the AI bots, such as asking personal questions, inquiring about current events, and gauging the politeness of the responses.
The bots, in turn, confused players with human-like behaviors: using slang, making typos, being rude, and showing awareness of the game’s context.
“We created ‘Human or Not’ with the goal of enabling the general public, researchers, and policymakers to further understand the state of AI in early 2023,” said Amos Meron, creative product lead at AI21 Labs at the time of the experiment. One objective, he added, was “not looking at AI just as a productivity tool, but as future members of our online world, in a time when people are questioning how AI should be implemented in our futures.”
Also: The new Turing test: Are you human?
When I played the game while it was available, I was paired with humans each time and guessed correctly each time. The answer seemed clear because my conversation partners would use internet slang (“idk,” for example), refuse to answer questions, or simply not know the answers.
Some players also tried to confuse their partners by imitating AI chatbots, dropping phrases like “as an AI language model,” but the imitation was usually imperfect, and the humans on the other end saw through it.