ChatGPT may be as good as or better than students at assessments in around a quarter of university courses. However, this generally only applies to questions with a clear answer that require memory recall, rather than critical analysis.
Yasir Zaki and his team at New York University Abu Dhabi in the United Arab Emirates contacted colleagues in other departments asking them to provide assessment questions from courses taught at the university, including computer science, psychology, political science and business.
These colleagues also provided real student answers to the questions. The questions were then run through the artificial intelligence chatbot ChatGPT, which supplied its own responses.
Next, both sets of responses were sent to a team of graders. “These graders were not made aware of the sources of these answers, nor were they aware of the purpose of the grading,” says Zaki.
In nine out of the 32 courses surveyed, ChatGPT’s answers were rated as good as or better than those of students. At times, its answers were substantially better. For example, it achieved almost double the average score of students when answering questions from a course called Introduction to Public Policy.
“ChatGPT performed much better on questions that required information recall, but performed poorly on questions which required critical analysis,” says Zaki.
The results highlight an issue with the way university assessments are set, says Thomas Lancaster at Imperial College London. They should probe students’ critical thinking, which may not be achieved by ChatGPT. “If [better answers are] possible [with ChatGPT], it suggests that there are flaws in the assessment design.”
Lancaster also says that many of the assessments susceptible to cheating via ChatGPT were probably already vulnerable to existing contract cheating services, where students pay professional essay writers to do their work. Those writers, like the chatbot, may not perform genuine critical analysis.
Separately, Zaki and his team surveyed academics and students in the UK, US, India, Japan and Brazil about their attitudes towards ChatGPT. Across all of the countries, the students were more likely to say that they would use the chatbot than the academics thought they would.
Journal reference:
Scientific Reports DOI: 10.1038/s41598-023-38964-3