Political Science Educator: volume 28, issue 1
Reflections
By Justin Robertson (justin.robertson@cityu.edu.hk)
The arrival of ChatGPT has sparked existential questions about the future of the humanities and social sciences and has provoked a hardline response in some quarters: some instructors treat any use of ChatGPT as a punishable form of plagiarism. The following analysis is an attempt to determine whether one can engage constructively with ChatGPT in a student assignment. It does so in the form of a new exercise in which students are asked to interact simultaneously with two significant developments in the field of information analysis: ChatGPT, as an example of artificial intelligence, and collective intelligence, in the form of crowdsourced forecasting. Crowdsourced forecasting is founded on the belief that aggregating a large number of insights will yield predictions that are as accurate as, or more accurate than, expert opinion. Major news outlets have run high-profile stories on crowdsourced forecasting in recent years (New York Times 2023; The Guardian 2022). When asked to assess their experience integrating information from these two sources, the students reported only lukewarm satisfaction with the performance of ChatGPT, while offering somewhat higher reviews for the role of crowdsourced forecasting. Among the most significant outcomes, the students thought more deeply about bias in new information processes such as ChatGPT and crowdsourced forecasting and learned how to formulate better prompts.
As instructors, we typically design assignments that confront historical and emerging patterns. However, there is scope for a limited use of future-oriented thinking in which students closely analyze recent trends and lay out future scenarios. Doing so forces students to think through their key theoretical assumptions about structures, processes, and actors, and above all the potential for social change. Lastly, making predictions is increasingly seen to stimulate learning. As Lang (2016: 61) writes, “Predictions make us curious—I wonder whether I will be right?—and curiosity is an emotion that has been recently demonstrated to boost memory when it is heightened prior to exposure to new material.” I am not the first to design an assignment around forecasting, but the relatively small number of existing articles has concentrated almost entirely on forecasting elections (Berg and Chambers 2019; Leiter 2023). International relations presents a broader and livelier canvas on which to forecast; for example, during the assignment period, students were confronted with crises in Gaza and Ukraine.
A long-running major experiment recently found that social science experts performed no better than members of the general public at forecasting social phenomena (The Forecasting Collaborative 2023). The research team concluded with a silver lining: “The good news is that forecasting skills can be improved” (The Forecasting Collaborative 2023: 493). One of the key mechanisms singled out is that forecasters who paid particularly close attention to prior data were more likely to be accurate (The Forecasting Collaborative 2023: 484). This finding raises an important pedagogical question that is particularly relevant today: can ChatGPT play a supportive role in the learning process, in this case by more easily and effectively compiling prior data for students who are preparing forecasts? Students were not asked to directly test ChatGPT’s ability to forecast; one recent study found that ChatGPT fared significantly worse than human participants at forecasting (Schoenegger and Park 2023). Instead, the goal is to bring ChatGPT alongside the student as a research assistant, fine-tuning the student’s ability to harness information and to determine whether such information can help form a position on a forecast.
There are now many examples of how university instructors are designing creative assignments that feature an AI component (Chronicle of Higher Education 2024). Ethan Mollick of the University of Pennsylvania is a leader here, with his One Useful Thing newsletter on Substack and his pedagogy effectively summarized in a recent interview (The Ezra Klein Show 2024). Few of these first-generation experiments in teaching with AI turn the attention back on AI itself and ask students, first and foremost, to write critically and reflectively about the learning experience. Most assignments are output-oriented with the assistance of AI. The contribution of my course exercise is to ensure that students are reflecting on how AI is impacting knowledge processes at this early stage in the development of AI. Students were told that this assignment was not a conventional research assignment in which they would formulate a research question, collect evidence, and develop an argument. They would be taking part in a new experience of forecasting assisted by AI research and they would provide analysis, observations and reflection based on this experience.
The Mechanics of the Exercise
Students in a second-year international relations theory course at a university in Hong Kong critically compared and contrasted the information they collected on issues in world politics using both technologies. I partnered with Good Judgment Open (GJO), one of the three leading crowdsourced forecasting platforms, which enabled my students to be active members of its forecasting community. This entailed making a large number of forecasts as well as putting forward their own forecasting questions. GJO is a credible organization with a rising user base, and its forecasts are featured regularly in The Economist. Most recently, its forecasts were profiled as a key barometer of the world in 2024 (The Economist 2023).
The GJO partnership was powerful because the platform presents comparative data that allow students to set their own views in the context of others’. Before submitting a forecast on a given question, a student is unaware of the consensus forecast. Once a forecast is submitted, students are able to see in real time how their forecasts compare to their peer group (the classmates in the course), to the full GJO sample, and, for a limited number of questions, to expert forecasters. Students can also adjust their forecasts over time and observe how the consensus forecast moves up and down over a longer period.
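To make the comparative logic concrete, the idea of a “consensus forecast” can be sketched in a few lines of Python. This is an illustrative assumption, not GJO’s actual method: the sketch uses a simple unweighted mean of individual probability forecasts, whereas real platforms typically apply weighting and recency adjustments, and all the numbers below are invented.

```python
# Illustrative sketch only: an unweighted mean as a stand-in for a
# platform's consensus forecast. GJO's actual aggregation is more
# sophisticated (e.g., weighting and recency adjustments).

def consensus(forecasts):
    """Average a list of probability forecasts (each between 0 and 1)."""
    return sum(forecasts) / len(forecasts)

# Hypothetical numbers: one student's forecast set against classmates
# and the wider crowd on the same question.
student = 0.30
classmates = [0.25, 0.40, 0.35, 0.20]
crowd = [0.30, 0.45, 0.50, 0.25, 0.40, 0.35]

print(f"class consensus: {consensus(classmates):.2f}")
print(f"crowd consensus: {consensus(crowd):.2f}")
print(f"student sits {student - consensus(crowd):+.3f} from the crowd")
```

Even this toy version conveys the pedagogical point: a student can see at a glance whether their probability sits above or below the class and the wider crowd.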
Students researched and highlighted factors drawn from ChatGPT searches that bear on the forecast. I explained to students that there would be substantial data generated as a result of the interaction between generative AI and crowdsourced forecasting, data that they would analyze and report on in written assignments.
Students presented their findings in two written pieces. The first assignment involved answering two questions in up to 900 words:
- What observations can you offer concerning the types of forecasting questions posed, the forecasts you made, and how your forecasts were similar or different to the crowd?
- Did the forecasting experience shift some of your views on international issues, generally and/or on any specific issues? Why or why not?
The second and longer assignment of up to 2,000 words required students to answer four questions:
- Show how you used ChatGPT to prepare for your forecasts. What steps did you take with ChatGPT? Critically evaluate the quality of information that ChatGPT generated. Note any potential limitations or biases in the information provided by ChatGPT.
- Talk about one forecast, including how you reached your forecast position, how credible you think the crowd forecast was, and what this forecast tells us about international relations. Attach a relevant example of text that ChatGPT generated for you with this forecast and tell me why it was helpful or unhelpful (the ChatGPT text should be excluded from your word count).
- Do ChatGPT and forecasting complement each other, or not, in trying to think about international issues?
- Based on this experience, do you believe that forecasting should be a tool more widely used in society? Why or why not?
The two assignments challenged students to engage with different types of analysis. The first assignment centers on broader data assessment: determining patterns in the forecasting data generated during this period. Some students skillfully produced tables comparing the forecasts made by the crowd, their peers, and themselves. The data analysis is combined with personal reflection in the form of reporting any shifts in perspectives on international relations. The case study method sits at the center of the second assignment, and the objectives are a) to draw out ChatGPT’s relative merits and weaknesses and b) to answer whether ChatGPT and forecasting work well together and whether forecasting as a methodology should be elevated in society.
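For instructors who want students to score their own accuracy once questions resolve, the standard metric in forecasting tournaments is the Brier score. The sketch below is hypothetical (the question labels and probabilities are invented, and my students built their comparison tables by hand), but it shows the kind of student-versus-crowd comparison the first assignment invites:

```python
# Hypothetical sketch of the kind of accuracy comparison students
# tabulated by hand: Brier scores on resolved binary questions.

def brier(prob, outcome):
    """Brier score for one binary question: (forecast - outcome)^2."""
    return (prob - outcome) ** 2

# (question, student forecast, crowd forecast, resolved outcome 1/0)
# All values below are invented for illustration.
resolved = [
    ("Question A", 0.70, 0.55, 1),
    ("Question B", 0.20, 0.35, 0),
    ("Question C", 0.60, 0.80, 1),
]

student_avg = sum(brier(s, o) for _, s, _, o in resolved) / len(resolved)
crowd_avg = sum(brier(c, o) for _, _, c, o in resolved) / len(resolved)
print(f"student mean Brier: {student_avg:.3f}")
print(f"crowd mean Brier:   {crowd_avg:.3f}")
```

A lower mean Brier score indicates more accurate forecasting; 0 is a perfect record, while 0.25 is what always answering 50% would earn.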
Outcomes and Findings
The exercise empowered student voices with the prospect that their forecasting questions might reach a wide audience. Naturally, many students were initially unable to form sufficiently strong questions to be posted by GJO, but, at a minimum, the brainstorming process brought them closer to the ideals of forecasting. Students reported that proposing questions about the future was both challenging and rewarding in identifying key debates about trends in international relations and critical case studies. Importantly, some students furnished informative questions that were published by GJO and forecast upon by the general public, including for example:
- Will COP28 result in an agreement that includes explicit language committing to phase out all fossil fuels for energy production by a specific date?
- Will the “natural growth rate” of the population of China in 2023 be negative?
Several clear trends can be identified among the student papers. About half of the students articulated significant disappointment with ChatGPT, while the other half found value in its research support. Students adopted a more positive tone towards crowdsourced forecasting than towards ChatGPT, with only one-third of the class presenting a critical interpretation of forecasting.
The most common ChatGPT strengths cited by students included its capability to support background research, especially providing historical context; its power to generate ideas and hypotheses; and its ability to summarize arguments for and against a given position. On the other hand, students complained that ChatGPT is prone to generic responses that are often too brief and particularly mediocre on current affairs. One student spoke of the limited role in which ChatGPT can be deployed: “ChatGPT helps to provide skeleton outlines but separate detailed research is always needed to actually write papers.”
The most pronounced criticism of crowdsourced forecasting is that it is unrepresentative of global issues. Students noted that forecasts are unduly geared to a Western, and particularly American, perspective. One student wrote: “This bias reflects the dominance of the US and other Western powers in shaping international affairs with a tendency to prioritize Western political discourse and issues. This leads to a relative lack of emphasis on Asian-focused questions.” Two examples illustrate how students drew on empirical research to make this claim. First, a student reported that on one sample day during the exercise 16 out of the 20 questions listed under the category “In the News 2024” related to political and economic developments in the US and Europe. Second, another student asked ChatGPT to suggest ten forecasting questions and demonstrated that ChatGPT’s roster of forecasting questions was more globally representative than the US-based crowdsourced platforms. There is space, students felt, for a more global outlook in crowdsourced forecasting and one student recommended that a more global audience could be attracted should prediction platforms “strive to incorporate multiple languages and employ automatic translation for each entry.”
Forecasting further introduced students to the study of bias, with several noticing that ChatGPT was more reticent to address divisive topics in Western countries in detail than similar questions elsewhere in the world. The exercise shifted few student views, but it forced them to research unfamiliar topics—quite a few mentioned the global price of oil—and in the process exposed them to new information and new research tools. The exercise taught students how to frame better prompts with generative AI, and this represents a useful problem-solving skill. As one student wrote, “If the question is not specific, the AI tends to provide a broad response, whereas a well-structured question yields crucial information relevant to the forecasting task.”
Possible Modifications for Future Exercises
The three major crowdsourced forecasting platforms, GJO, INFER Public, and Metaculus, were all willing to support my assignment. If you are interested in incorporating a crowdsourced forecasting experiment into one of your courses, you are likely to find a willing organizational partner.
There is significant room to craft your own forecasting exercise and the model in this article serves as one illustrative example, with a focus on reflection. Students could also integrate crowdsourced forecasting and generative AI into a paper by tackling one issue/theme in a research-based analysis. Such a paper would involve the following steps:
- Choose a major issue in international relations
- Conduct ChatGPT research
- Search for predictions, data points, and discussions related to the issue on forecasting platforms
- Compare and contrast information from these two tools
- Incorporate material from both ChatGPT and crowdsourced forecasting—and document the process—in the formation and defence of your argument
A number of smaller modifications deserve consideration. First, and building on the finding that forecasting improves when forecasters collaborate in teams (Friedman et al. 2018), you could ask your class to negotiate as a whole and carry out a select number of forecasts with one class position. It would be exciting to discuss an issue, take a vote, and then for the students to follow the progression of the class forecast.
Second, there would be value in asking students to consult more than one forecasting platform. This would increase the comparative data from which they could draw in developing arguments. Some platforms, like GJO, restrict forecasts to questions that must have a known outcome within one year; this is understandable, but sites like Metaculus have the advantage of allowing forecasts on longer-term trends. When proposing questions, my students framed a number of interesting questions whose resolution would only be clear over a longer time span. For example, only Metaculus would have published these questions submitted by students:
- Will Africa have a company in the Fortune 500 by 2028?
- Will peak oil demand be reached in 2028 and subsequently fall in every year after?
- If Trump wins the US Presidency, will the US leave NATO?
Finally, instructors could stipulate that students must revisit concluded forecasts during the course and evaluate the factors that explain why their forecast and the crowdsourced forecast were either accurate or inaccurate.
Conclusion
ChatGPT, as a new technology, is ripe for close analysis and reflection by students just as much as by instructors. This article has set out one example of how ChatGPT can be integrated into a course assignment. Students weighed the value of AI research, identified patterns in crowdsourced forecasting outcomes, and reflected on the significance of these results after experiencing both processes first-hand. Students found crowdsourced forecasting to be a stronger learning tool than ChatGPT. The exercise impacted students by challenging them to research less-known topics with a new research tool and a future-oriented perspective. Crowdsourced platforms, while small, are growing, well-organized, and able to generate substantial comparative data for students to evaluate. Are they able to foster insights on the direction of international relations? There is significant potential for political science instructors to pair ChatGPT with their own crowdsourced forecasting assignments to shed further light on this fundamental question.
References
Berg, Lukas and John Chambers. 2019. “Bet Out the Vote: Prediction Markets as a Tool to Promote Undergraduate Political Engagement.” Journal of Political Science Education, 15(1): 2-16. https://doi.org/10.1080/15512169.2018.1446342
The Economist. 2023. “What the ‘Superforecasters’ Predict for Major Events in 2024.” November 13. https://www.economist.com/the-world-ahead/2023/11/13/what-the-superforecasters-predict-for-major-events-in-2024
The Ezra Klein Show. 2024. “How Should I Be Using AI Right Now?” New York Times, April 2. https://www.nytimes.com/2024/04/02/opinion/ezra-klein-podcast-ethan-mollick.html
The Forecasting Collaborative. 2023. “Insights Into the Accuracy of Social Scientists’ Forecasts of Societal Change.” Nature Human Behaviour 7: 484–501. https://doi.org/10.1038/s41562-022-01517-1
Friedman, Jeffrey A., Joshua D. Baker, Barbara A. Mellers, Philip E. Tetlock, and Richard Zeckhauser. 2018. “The Value of Precision in Probability Assessment: Evidence from a Large-Scale Geopolitical Forecasting Tournament.” International Studies Quarterly, 62(2): 410–422. https://doi.org/10.1093/isq/sqx078
Glover, Sam. 2022. “The Big Idea: Can You Learn to Predict the Future?” The Guardian, September 26. https://www.theguardian.com/books/2022/sep/26/the-big-idea-can-you-learn-to-predict-the-future
Lang, James M. 2016. Small Teaching: Everyday Lessons from the Science of Learning. San Francisco: Jossey-Bass.
Leiter, Debra. 2023. “Teaching Forecasting Without Teaching Methods.” Journal of Political Science Education, 19(2): 185-194. https://doi.org/10.1080/15512169.2022.2116333
Roose, Kevin. 2023. “The Wager That Betting Can Change the World.” New York Times, October 8. https://www.nytimes.com/2023/10/08/technology/prediction-markets-manifold-manifest.html
Schoenegger, Philipp and Peter S. Park. 2023. “Who is the Better Forecaster: Humans or Generative AI?” LSE Blogs, November 9. https://blogs.lse.ac.uk/impactofsocialsciences/2023/11/09/who-is-the-better-forecaster-humans-or-generative-ai/
Torres, J.T. and Adam Nemeroff. 2024. “Are We Asking the Wrong Questions About ChatGPT?” Chronicle of Higher Education, April 15. https://www.chronicle.com/article/are-we-asking-the-wrong-questions-about-chatgpt
—
Justin Robertson is an Associate Professor in the Department of Public and International Affairs at the City University of Hong Kong. His scholarship on teaching and learning has been published in European Political Science, Interactive Learning Environments, Journal of Political Science Education, and International Studies Perspectives.
Published since 2005, The Political Science Educator is the newsletter of the Political Science Education Section of the American Political Science Association. All issues of The Political Science Educator can be viewed here.
Editors: Colin Brown (Northeastern University), Matt Evans (Northwest Arkansas Community College)
Submissions: editor.PSE.newsletter@gmail.com
As part of APSA’s mission to support political science education across the discipline, APSA Educate has republished The Political Science Educator since 2021. Please visit APSA Educate’s Political Science Educator digital collection here.



