Political Science Exams in the Age of AI

Political Science Educator: volume 28, issue 1

Reflections


By Erik Ringmar (erik@ringmar.net)

When ChatGPT-3.5 was released to the world in the fall of 2022, there was a collective sense of amazement. This was not a ghost in a machine, but instead something suspiciously close to a human being. The chatbot engaged in conversations on any topic, wrote iambic verse in the voice of a pirate, and even declared its love for its users. It also did really well on exams. ChatGPT-4, released in March 2023, scored in the top 10% of test takers on a simulated bar exam (OpenAI et al. 2024). When I asked it to take the political science exams I gave to my undergraduate students, it provided answers in the B+ to A- range. It was obvious that teaching, or at least examining students, would never be the same again.

The first, and very understandable, reaction from some teachers was to ban the damn thing. Relying on AI to write a take-home exam is obviously very similar to asking someone else to write it for you. And that is cheating! The question was only what to do about it. Although the regular AI voice is quite distinct in some ways, it is at the same time easy to disguise. You can even ask the AI to disguise itself, making its output virtually undetectable. As a consequence, many teachers concluded, we have to go back to the next-to-abandoned format of hand-written, in-class exams—and require students to leave all electronic devices at the door as they enter the classroom!

The alternative is to find a way to cohabitate with the beast. Or better yet: let’s think positive! AI is obviously a fantastic resource that has the potential to improve every aspect of university education. However, for this to happen, we must be prepared to think creatively and experiment a bit. There are a number of promising suggestions. AI can offer personalized support based on students’ engagement and performance, identify struggling students, and serve as a virtual TA by helping answer routine queries. If we “flip the classroom,” AI can deliver interactive lessons for students to work through at home, while what used to be homework is done in class. In this way, it is easier for teachers to make sure that students are not simply copying and pasting.

During the past three semesters, I have experimented with a new form of exam. I ask the students to write “stories” about what I call “items.” An item can be anything—a text we read in class, a YouTube clip, a picture, a poem—and the story is what connects the items to each other. The task, in other words, is to identify and describe the respective items but, above all, to join them together by means of an overarching plot. The items are, in a sense, speaking to each other, and the job of the students is to tell me what they are saying. I have a certain plot in mind, but there is no right answer as such. Students can connect the items in ways I hadn’t expected, and those answers can be equally good. When I grade, students are rewarded both for their knowledge of the material we have covered in class and for their creativity.

So how does AI do on these exams? Not very well. ChatGPT-4 is perfectly capable of identifying the individual items, and it can describe them in great detail. The connections it draws between them, though, are always superficial and often nonsensical. And this is not surprising. The connections are generally topics we discussed in class, and AI wasn’t there to listen and take notes. It could also be that this exercise is beyond what AI is capable of (for now). You have to see the connection—it is a sort of aha! moment—and AI doesn’t seem to be having those. As a result, ChatGPT and its colleagues are not a threat to this exam format. Indeed, I can safely allow students to use AI, even encourage them to use it!

Let me give you an example. This is an exam question from my Introduction to International Relations course. The three items are 1) an excerpt from Chapter XIII of Thomas Hobbes’ Leviathan; 2) an idyllic, AI-generated picture of life in a stateless society; and 3) a link to a newspaper article in which Donald Tusk, the Prime Minister of Poland, warns about the possibility of a coming European war. By means of artificial intelligence, it is easy to identify these items. But what is the connection between them? The idyllic depiction of life in a stateless society is clearly a riposte to Hobbes. But where does that then leave the Polish Prime Minister? A student who has taken my course, and paid proper attention, will remember that Hobbes’ thirteenth chapter has been used by IR theorists to describe the concept of “anarchy,” and that concept clearly speaks to Tusk’s concerns. But that story cannot easily include the picture of the stateless society. A better student will remember that Hedley Bull (on the reading list) compared the international system to a stateless society, and the very best student will connect that description first with Hobbes and then with the prospect of another war in Europe. In this way, this becomes a story about the nature of the international system, the role of social norms in regulating human conduct, and so on. Or this, at least, was my idea.

Another example comes from my course on diplomatic history. Here item 1) is an AI-generated picture of a Hindu avatar; item 2) is an extract from Samuel Pepys’ diary of September 30, 1661, which describes a violent clash at the Tower of London between the diplomatic delegations of France and Spain; and item 3) is a chapter from John Bassett Moore’s The Principles of American Diplomacy (1918). The connection I had in mind concerns the way ambassadors in early modern Europe were treated as avatars of the states they represented, and how this constantly led to diplomatic “incidents” and quarrels over protocol. Moore’s chapter explains how post-independence Americans refused to interact on these terms and favored a more democratic and businesslike form of diplomacy. For my students, the avatar was clearly the biggest stumbling block. Many didn’t remember that we had discussed the role of early modern ambassadors in these terms, and many didn’t realize that “avatar” is a Hindu concept. The best students understood this connection and went on to discuss how European diplomacy has its origins in an aristocratic court culture and how 19th-century Americans sought to replace this culture with a “republican” form of diplomacy.

In this way, students come to place themselves quite neatly along an A-to-F spectrum. The best students do well on these exams, and the not-so-good do not-so-well. The worst answers are those that never go beyond the information that AI can provide. The best answers are those that find the connections and go on exploring, analyzing, and critiquing them. The very best students come up with stories that surprise me. And many of the answers are a joy to read—some read like film scripts, fairy tales, or political pamphlets. The format is also popular with students. Of the 40 respondents to my informal survey, 35% said they “loved it,” 57% called it “an interesting experiment” they were prepared to try again, and only 4% wanted to go back to more traditional forms of exams.

This format is obviously not a panacea. There must be many political science courses that cannot be examined this way. The format may also be biased, or unreliable, in ways I haven’t considered. For example, I have yet to find out whether my colleagues would grade the answers the same way I do. These exams obviously reward students who are good writers, but those who are not can legitimately rely on AI for help. The same is true for students whose first language is not English. There is no doubt that the format can be expanded and improved in a number of ways. But one thing is certain: The advent of artificial intelligence means that we have no choice but to try new things. And let’s be honest, university teaching was well overdue for a bit of a shakeup anyway.

References

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, et al. 2024. “GPT-4 Technical Report.” arXiv. https://doi.org/10.48550/arXiv.2303.08774.

The Chronicle of Higher Education. 2023. “Big Bot on Campus: The Perils and Potential of ChatGPT and Other AI.” Washington D.C. https://store.chronicle.com/products/big-bot-on-campus.

———. 2023. “Perspectives on Generative AI: College Leaders Assess the Promise and the Threat of a Game-Changing Tool.” Washington D.C. https://connect.chronicle.com/CHE-CI-WC-2023-09-25-C-AI-CHE_LP.html.


Erik Ringmar is a fellow at the Research Center in Political Science (CICP) at the University of Minho, Braga, Portugal.


Published since 2005, The Political Science Educator is the newsletter of the Political Science Education Section of the American Political Science Association. All issues of The Political Science Educator can be viewed here.

Editors: Colin Brown (Northeastern University), Matt Evans (Northwest Arkansas Community College)

Submissions: editor.PSE.newsletter@gmail.com


As part of APSA’s mission to support political science education across the discipline, APSA Educate has republished The Political Science Educator since 2021. Please visit APSA Educate’s Political Science Educator digital collection here.
