What the New GPT-4 AI Can Do
Tech research company OpenAI has just released an updated version of its text-generating artificial intelligence program, called GPT-4, and demonstrated some of the language model’s new abilities. Not only can GPT-4 produce more natural-sounding text and solve problems more accurately than its predecessor; it can also process images in addition to text. But the AI is still vulnerable to some of the same problems that plagued earlier GPT models: displaying bias, overstepping the guardrails intended to prevent it from saying offensive or dangerous things and “hallucinating,” or confidently making up falsehoods not found in its training data.
On Twitter, OpenAI CEO Sam Altman described the model as the company’s “most capable and aligned” to date. (“Aligned” means it is designed to follow human ethics.) But “it is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it,” he wrote in the tweet.
Perhaps the most significant change is that GPT-4 is “multimodal,” meaning it works with both text and images. Although it cannot output pictures (as do generative AI models such as DALL-E and Stable Diffusion), it can process and respond to the visual inputs it receives. Annette Vee, an associate professor of English at the University of Pittsburgh who studies the intersection of computation and writing, watched a demonstration in which the new model was told to identify what was funny about a humorous image. Being able to do so means “understanding context in the image. It’s understanding how an image is composed and why and connecting it to social understandings of language,” she says. “ChatGPT wasn’t able to do that.”
A device with the ability to analyze and then describe images could be enormously valuable for people who are visually impaired or blind. For instance, a mobile app called Be My Eyes can describe the objects around a user, helping those with low or no vision interpret their surroundings. The app recently incorporated GPT-4 into a “virtual volunteer” that, according to a statement on OpenAI’s website, “can generate the same level of context and understanding as a human volunteer.”
But GPT-4’s image analysis goes beyond describing the picture. In the same demonstration Vee watched, an OpenAI representative sketched an image of a simple website and fed the drawing to GPT-4. Next the model was asked to write the code required to produce such a website—and it did. “It looked basically like what the image is. It was very, very simple, but it worked pretty well,” says Jonathan May, a research associate professor at the University of Southern California. “So that was cool.”
Even without its multimodal capability, the new program outperforms its predecessors at tasks that require reasoning and problem-solving. OpenAI says it has run both GPT-3.5 and GPT-4 through a variety of tests designed for humans, including a simulation of a lawyer’s bar exam, the SAT and Advanced Placement tests for high schoolers, the GRE for college graduates and even a couple of sommelier exams. GPT-4 achieved human-level scores on many of these benchmarks and consistently outperformed its predecessor, although it did not ace everything: it performed poorly on English language and literature exams, for example. Still, its extensive problem-solving ability could be applied to any number of real-world applications—such as managing a complex schedule, finding errors in a block of code, explaining grammatical nuances to foreign-language learners or identifying security vulnerabilities.
Additionally, OpenAI claims the new model can interpret and output longer blocks of text: more than 25,000 words at once. Although previous models were also used for long-form applications, they often lost track of what they were talking about. And the company touts the new model’s “creativity,” described as its ability to produce different kinds of artistic content in specific styles. In a demonstration comparing how GPT-3.5 and GPT-4 imitated the style of Argentine author Jorge Luis Borges in English translation, Vee noted that the more recent model produced a more accurate attempt. “You have to know enough about the context in order to judge it,” she says. “An undergraduate may not understand why it’s better, but I’m an English professor.... If you understand it from your own knowledge domain, and it’s impressive in your own knowledge domain, then that’s impressive.”
May has also tested the model’s creativity himself. He tried the playful task of ordering it to create a “backronym” (an acronym reached by starting with the abbreviated version and working backward). In this case, May asked for a cute name for his lab that would spell out “CUTE LAB NAME” and that would also accurately describe his field of research. GPT-3.5 failed to generate a relevant label, but GPT-4 succeeded. “It came up with ‘Computational Understanding and Transformation of Expressive Language Analysis, Bridging NLP, Artificial intelligence And Machine Education,’” he says. “‘Machine Education’ is not great; the ‘intelligence’ part means there’s an extra letter in there. But honestly, I’ve seen way worse.” (For context, his lab’s actual name is CUTE LAB NAME, or the Center for Useful Techniques Enhancing Language Applications Based on Natural And Meaningful Evidence.) In another test, the model showed the limits of its creativity. When May asked it to write a specific kind of sonnet—he requested a form used by Italian poet Petrarch—the model, unfamiliar with that poetic setup, defaulted to the sonnet form preferred by Shakespeare.
Of course, fixing this particular issue would be relatively simple. GPT-4 merely needs to learn an additional poetic form. In fact, when humans goad the model into failing in this way, they help the program develop: it can learn from everything that unofficial testers enter into the system. Like its less fluent predecessors, GPT-4 was originally trained on large swaths of data, and this training was then refined by human testers. (GPT stands for generative pretrained transformer.) But OpenAI has been secretive about just how it made GPT-4 better than GPT-3.5, the model that powers the company’s popular ChatGPT chatbot. According to the paper published alongside the release of the new model, “Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.” OpenAI’s lack of transparency reflects this newly competitive generative AI environment, where GPT-4 must vie with programs such as Google’s Bard and Meta’s LLaMA. The paper does go on to suggest, however, that the company plans to eventually share such details with third parties “who can advise us on how to weigh the competitive and safety considerations ... against the scientific value of further transparency.”
Those safety considerations are important because smarter chatbots have the ability to cause harm: without guardrails, they might provide a terrorist with instructions on how to build a bomb, churn out threatening messages for a harassment campaign or supply misinformation to a foreign agent attempting to sway an election. Although OpenAI has placed limits on what its GPT models are allowed to say in order to avoid such scenarios, determined testers have found ways around them. “These things are like bulls in a china shop—they’re powerful, but they’re reckless,” scientist and author Gary Marcus told Scientific American shortly before GPT-4’s release. “I don’t think [version] four is going to change that.”
And the more humanlike these bots become, the better they are at fooling people into thinking there is a sentient agent behind the computer screen. “Because it mimics [human reasoning] so well, through language, we believe that—but underneath the hood, it’s not reasoning in any way similar to the way that humans do,” Vee cautions. If this illusion fools people into believing an AI agent is performing humanlike reasoning, they may trust its answers more readily. This is a significant problem because there is still no guarantee that those responses are accurate. “Just because these models say anything, that doesn’t mean that what they’re saying is [true],” May says. “There isn’t a database of answers that these models are pulling from.” Instead, systems like GPT-4 generate an answer one word at a time, with the most plausible next word informed by their training data—and that training data can become outdated. “I believe GPT-4 doesn’t even know that it’s GPT-4,” he says. “I asked it, and it said, ‘No, no, there’s no such thing as GPT-4. I’m GPT-3.’”
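To make May’s “one word at a time” description concrete, here is a minimal, purely illustrative Python sketch of how a language model extends a prompt by repeatedly sampling a plausible next word. The toy vocabulary and probability table are invented for illustration and have nothing to do with GPT-4’s actual weights, architecture or tokenizer, which OpenAI has not disclosed.

```python
import random

# Toy "model": for each context word, a hand-made distribution over possible
# next words. A real LLM computes such probabilities with a neural network
# trained on vast amounts of text; the numbers here are invented.
NEXT_WORD_PROBS = {
    "the":   {"cat": 0.5, "model": 0.3, "answer": 0.2},
    "cat":   {"sat": 0.7, "slept": 0.3},
    "model": {"predicts": 0.6, "generates": 0.4},
    "sat":   {"down": 1.0},
}

def generate(prompt: str, max_tokens: int = 5) -> str:
    """Extend the prompt one word at a time by sampling from the toy distribution."""
    words = prompt.split()
    for _ in range(max_tokens):
        context = words[-1]              # real models condition on far more context
        dist = NEXT_WORD_PROBS.get(context)
        if not dist:                     # no known continuation: stop
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))   # e.g. "the cat sat down" -- fluent, but nothing is looked up in a database
```

The sketch captures only the mechanism May describes: each word is chosen because it is statistically plausible given what came before, not retrieved from a store of verified facts, which is why fluent output can still be factually wrong.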
OpenAI just released an updated version of its text-generating artificial intelligence program. Here’s how GPT-4 improves on its predecessor
www.scientificamerican.com
GPT-4
openai.com
AI makes plagiarism harder to detect, argue academics – in paper written by chatbot
An academic paper entitled Chatting and Cheating: Ensuring Academic Integrity in the Era of ChatGPT was published this month in an education journal, describing how artificial intelligence (AI) tools “raise a number of challenges and concerns, particularly in relation to academic honesty and plagiarism”.
What readers – and indeed the peer reviewers who cleared it for publication – did not know was that the paper itself had been written by the controversial AI chatbot ChatGPT.
“We wanted to show that ChatGPT is writing at a very high level,” said Prof Debby Cotton, director of academic practice at Plymouth Marjon University, who pretended to be the paper’s lead author. “This is an arms race,” she said. “The technology is improving very fast and it’s going to be difficult for universities to outrun it.”
Cotton, along with two colleagues from Plymouth University who also claimed to be co-authors, tipped off editors of the journal Innovations in Education and Teaching International. But the four academics who peer-reviewed it assumed it was written by these three scholars.
For years, universities have been trying to banish the plague of essay mills selling pre-written essays and other academic work to any students trying to cheat the system. But now academics suspect even the essay mills are using ChatGPT, and institutions admit they are racing to catch up with – and catch out – anyone passing off the popular chatbot’s work as their own.
The Observer has spoken to a number of universities that say they are planning to expel students who are caught using the software.
Thomas Lancaster, a computer scientist and expert on contract cheating at Imperial College London, said many universities were “panicking”.
“If all we have in front of us is a written document, it is incredibly tough to prove it has been written by a machine, because the standard of writing is often good,” he said. “The use of English and quality of grammar is often better than from a student.”
Lancaster warned that GPT-4, the latest version of the AI model behind ChatGPT, which was released last week, was meant to be much better and capable of writing in a way that felt “more human”.
Nonetheless, he said academics could still look for clues that a student had used ChatGPT. Perhaps the biggest of these is that it does not properly understand academic referencing – a vital part of written university work – and often uses “suspect” references, or makes them up completely.
Cotton said that in order to ensure their academic paper hoodwinked the reviewers, references had to be changed and added to.
Lancaster thought that ChatGPT, which was created by the San Francisco-based tech company OpenAI, would “probably do a good job with earlier assignments” on a degree course, but warned it would let students down in the end. “As your course becomes more specialised, it will become much harder to outsource work to a machine,” he said. “I don’t think it could write your whole dissertation.”
Bristol University is one of a number of academic institutions to have issued new guidance for staff on how to detect that a student has used ChatGPT to cheat. This could lead to expulsion for repeat offenders.
Prof Kate Whittington, associate pro vice-chancellor at the university, said: “It’s not a case of one offence and you’re out. But we are very clear that we won’t accept cheating because we need to maintain standards.”
She added: “If you cheat your way to a degree, you might get an initial job, but you won’t do well and your career won’t progress the way you want it to.”
Irene Glendinning, head of academic integrity at Coventry University, said: “We are redoubling our efforts to get the message out to students that if they use these tools to cheat, they can be withdrawn.”
Anyone caught would have to do training on appropriate use of AI. If they continued to cheat, the university would expel them. “My colleagues are already finding cases and dealing with them. We don’t know how many we are missing but we are picking up cases,” she said.
Glendinning urged academics to be alert to language that a student would not normally use. “If you can’t hear your student’s voice, that is a warning,” she said. Another is content with “lots of facts and little critique”.
She said that students who can’t spot the weaknesses in what the bot is producing may slip up. “In my subject of computer science, AI tools can generate code but it will often contain bugs,” she explained. “You can’t debug a computer program unless you understand the basics of programming.”
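As a hypothetical illustration of Glendinning’s point, here is a small, invented Python example (not actual ChatGPT output) of the kind of plausible-looking but subtly broken code she describes; the function and its bug are made up for this sketch.

```python
def average(numbers):
    """Return the arithmetic mean of a list of numbers."""
    total = 0
    for n in numbers:
        total += n
    # Looks reasonable, but divides by zero when the list is empty.
    return total / len(numbers)

print(average([2, 4, 6]))   # 4.0 -- appears to work
try:
    print(average([]))      # the hidden edge case
except ZeroDivisionError:
    print("crashes on an empty list -- the kind of bug a student must know how to find")
```

Spotting and fixing the empty-list case takes exactly the basic programming understanding Glendinning says a student cannot fake.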
With fees at £9,250 a year, students were only cheating themselves, said Glendinning. “They’re wasting their money and their time if they aren’t using university to learn.”
Lecturers say programs capable of writing competent student coursework threaten academic integrity
www.theguardian.com
The era of receiving investment advice from artificial intelligence assistants has also begun.
GCHQ warns that ChatGPT and rival chatbots are a security threat
The spy agency GCHQ has warned that ChatGPT and other AI-powered chatbots are an emerging security threat.
In an advisory note published on Tuesday the National Cyber Security Centre warns that companies such as ChatGPT maker OpenAI and its investor Microsoft “are able to read queries” typed into AI-powered chatbots.
GCHQ’s cyber security arm said: “The query will be visible to the organisation providing the [chatbot], so in the case of ChatGPT, to OpenAI.”
Microsoft’s February launch of a chatbot service, Bing Chat, took the world by storm thanks to the software’s ability to hold a human-like conversation with its users.
The NCSC’s warning on Tuesday cautions that curious office workers experimenting with chatbot technology could reveal sensitive information through their search queries.
Cyber security experts from the GCHQ agency said, referring to large language model [LLM] tech that powers AI chatbots: “Those queries are stored and will almost certainly be used for developing the LLM service or model at some point.
“This could mean that the LLM provider (or its partners/contractors) are able to read queries, and may incorporate them in some way into future versions.
“As such, the terms of use and privacy policy need to be robustly understood before asking sensitive questions.”
Microsoft disclosed in February that its staff are reading users’ conversations with Bing Chat, monitoring them to detect “inappropriate behaviour”.
Immanuel Chavoya, a senior security manager at cyber security company Sonicwall, said: “While LLM operators should have measures in place to secure data, the possibility of unauthorized access cannot be entirely ruled out.
“As a result, businesses need to ensure they have strict policies in place backed by technology to control and monitor the use of LLMs to minimize the risk of data exposure.”
The NCSC also warned that AI-powered chatbots can “contain some serious flaws”, as both Microsoft and its arch-rival Google have learnt.
An error generated by Google’s Bard AI chatbot wiped $120bn (£98.4bn) from the company’s market valuation after the software gave a wrong answer about scientific discoveries made with the James Webb Space Telescope.
The error was prominently featured in Google promotional material used to launch the Bard service.
City firm Mishcon de Reya has banned its lawyers from typing client data into ChatGPT for fear that legally privileged material might leak or be compromised.
Accenture has also warned its 700,000 staff worldwide against using ChatGPT for similar reasons, as nervous bosses fear customers’ confidential data will end up in the wrong hands.
Other companies around the world have become increasingly wary of chatbot technology.
Softbank, the Japanese tech conglomerate that owns computer chip company Arm, has warned its staff not to enter “company identifiable information or confidential data” into AI chatbots.
Other businesses, however, have been quick to embrace AI chatbot technology.
City law firm Allen & Overy has deployed a chatbot tool called Harvey, built in partnership with ChatGPT maker OpenAI.
Harvey is designed to automate some legal drafting work, although the firm says humans will continue to review its output before using it for real.
Microsoft is reportedly working on a new release of ChatGPT capable of turning text queries into videos, an analogue of OpenAI’s DALL-E image-generation technology, which is built on software related to ChatGPT.
Meanwhile the government is also concerned that Britain may be falling behind in the global AI race and is launching a new task force to encourage AI chatbot technology development in the UK.
Technology Secretary Michelle Donelan said on Monday: “Establishing a task force that brings together the very best in the sector will allow us to create a gold-standard global framework for the use of AI and drive the adoption of foundation models in a way that benefits our society and economy.”
Matt Clifford, chief executive of the government’s Advanced Research and Invention Agency, has been appointed to lead the task force.
Concerns centre on the possibility that sensitive search queries could be revealed
www.telegraph.co.uk