Blog of the International Journal of Constitutional Law

Editorial: ChatGPT and Law Exams

J.H.H. Weiler, NYU School of Law

To suggest that AI is upending our world in a myriad of ways is by now a banality. To suggest that it poses a challenge to the very human condition, perhaps more so than previous technological revolutions, is, if not a banality, at least a matter of extensive public discussion and debate. That there are no easy, consensus solutions has become self-evident. Here is but one much discussed example. Set aside the issue of deception. Imagine an AI that produces, say, a piece of music, a painting, or a poem. Openly and transparently. Acknowledging that it is in the “school of” Mozart, Titian, or Szymborska, respectively. Imagine further that the “machine” did a really good job. Would you, should you, enjoy it differently than you would if it were produced by a human of flesh and blood? The literature is conflicted, and I cannot even disentangle my own reactions. This, as mentioned, is but one tiny example. It will not be a short, or easy, process of civilizational and, perhaps, regulatory adjustment.

Be all that as it may, the advent of AI in general, and of one of its most accessible tools, ChatGPT, in particular, poses some daunting challenges in the world of higher education, not least when it comes to setting exams. Here we cannot set aside the issue of deception. In answering an exam, or in writing a term paper, a Master’s thesis, etc., a student who uses ChatGPT without acknowledging it is simply cheating—cheating their teachers and institution, their fellow students and, in a deeper sense, themselves.

The alarm is spreading: What are we to do when we set written exams, especially when these take the form of “open book,” and even more so when they are “take home” (a practice very common in the US, less so in other jurisdictions)?

This is no longer a hypothetical future. Colleagues from around the world are reporting increasing use, or suspicion of use, of ChatGPT by students. And since the “young” are often ahead of the curve when it comes to technology, the problem is likely to grow exponentially. (As an aside, I should mention that at our Journals we have already received articles that we are quite sure were written by GPT, and who knows whether there are some we did not spot. More on this particular issue in a future Editorial.)

In a recent faculty seminar at New York University Law School, we were shown examples of the potential of the current version of ChatGPT (3.5) to tackle even sophisticated exam questions. The results in some cases were impressive. Even when the AI reply was not perfect, not deserving, say, an A, in several cases it would certainly have earned a passing and even above-average B mark. ChatGPT 4.0 is already available and is said to have enhanced capabilities, rendering the challenge even more acute. You (and your faculty) would be well advised to put it to the test with some real questions from prior years’ exams. You will almost certainly be surprised and even, perhaps, shocked.

As mentioned above, current discussions in different fora define the problem, when it comes to exams, with obvious merit, as one of cheating and fairness. An exam, after all, is meant to test the abilities of the student, and having ChatGPT answer the questions is not all that different from asking someone else, perhaps with more experience, to write one’s answers. It is true that ChatGPT can also be used as a search engine—not the best of them. But since the lines to be drawn are pretty porous at this stage of the game, the undisclosed use of ChatGPT in replying to exams should be avoided and probably altogether prohibited.

When it comes to testing and grading student work, the ChatGPT challenge presents itself differently in relation to exams (whether in class or “take home”) and seminar papers. I will state immediately that the seminar paper challenge is far more daunting—ChatGPT is “at its best” when asked to write an essay, or at least a certain type of essay, and to answer open-ended questions. Given these strengths and weaknesses of the software, I will defer the seminar paper challenge to another time and confine my reflections here to exams stricto sensu.

We are in the early days of AI and higher education, and it will take time for the Academy to adapt. But the problem of exams is already with us, and the following might be seen as no more than possible “stop-gap” solutions pending long-term adaptation.

The most commonly discussed solutions are those which would ban the use of ChatGPT in exams and find ways of enforcing such a ban.

In an earlier Editorial, “Advice to young scholars VII: Taking exams seriously” (volume 20:1), I expressed my skepticism as regards the 20–30 minute oral exam as a mode of assessment, as practiced in quite a few jurisdictions. At least on this issue (ChatGPT), my colleagues who were displeased with my critique of oral exams must be smiling. But, for better or worse, given that critique, I cannot offer the oral exam as “the” solution to the problem. One would be sacrificing too much, in my view. As indicated in that Editorial, a combination of written and oral examination (as we do with doctoral dissertations) would probably be ideal, but it is not practical in many situations and institutions.

So yes, I do think that as regards other forms of exams, one should prohibit the use of ChatGPT. Enforcement, however, is not as easy as it may appear. In most universities with which I am familiar, students are allowed to use their computers in in-class exams. Even without ChatGPT this poses a challenge to teachers or institutions who still believe (I do not) in “closed book” exams. Still, we are familiar with technologies that purport to prevent students in in-class exams from accessing internet resources and even from downloading resources onto their computers. We also know that for many “blocking technologies” a “counter-blocking technology” rapidly emerges, and it is never clear who is ahead of the game.

For online teaching, which is increasingly common (even post Covid), this challenge is even greater. An online exam is, by definition, a “take home” exam. No matter how sophisticated our online exam software is, how are we, if we take this prohibition approach, to ensure that the student is not using a secondary device on which they will happily ChatGPT (used as a verb) and then transfer the output to their principal computer?

One magic solution being discussed is to require, by regulation, that platforms offering ChatGPT-type programs place an automatic watermark on their products. It is a chimera. What is to stop the cheating student from producing a ChatGPT text (with the watermark) and simply retyping it into their exam or paper? (Cutting and pasting will not work, as the watermark would remain.)

I recently experienced the online challenge in person. I am currently enrolled, as a student, in an online Master’s degree. In taking exams we were required to write the exam by hand (!) and to have two devices (phone, iPad, etc.) with the cameras turned on, enabling the proctors to ensure that we were not using extraneous resources. One does not know whether to laugh or cry…. And if the proctoring is not human, another set of issues is likely to emerge, as we know from the various shortcomings of facial recognition and the like.

And yet, all this reminds me somewhat of the shrill cries of woe when cheap handheld electronic calculators were introduced in the 1970s and pupils began to bring them to school. Ban them, banish them, prohibit them was the initial reaction. It was a losing battle. The assumption today is that pupils, even in grade school, will make use of handheld calculators (present in every smartphone), and we have simply adapted the way we teach and examine arithmetic and mathematics.

Be this as it may, it is an occasion to remind our students about integrity. Here is a formulation which resonated with me.

If you cheat in this class, the best-case scenario for you is that you aren’t caught, and that your [GPA] goes up a little …. [A]fter your first job, no one will ever ask you for your GPA again. To buy those temporary [advantages], you’ll have practiced running roughshod over your conscience, making it easier to ignore that still small voice the next time. (Leah Libresco Sargeant, “Cheating with ChatGPT,” First Things, 6 April 2023, https://www.firstthings.com/web-exclusives/2023/04/cheating-with-chatgpt)

Will it solve the problem? You may smile or grimace cynically, but that does not mean that this kind of message should not be heard in the classroom. We are educators, after all.

I want to suggest, tentatively, an altogether different approach to the ChatGPT exam problem. As in medicine, where AI is increasingly used in diagnostic procedures, AI is most likely, in one form or another, to become part of legal practice. In fact, it is already there. If this is the case, we would be remiss if, in designing our courses (and consequently our exams), we stuck our heads in the sand and pretended that AI does not exist. By this I do not mean simply that we have to have, as is already the case, courses dealing with AI and the law, raising issues such as liability when AI screws up and causes damage. I mean, instead, the integration of AI, including ChatGPT, into the regular teaching of all legal subjects.

So first some good news. At least in its current state, AI, including ChatGPT, will not obviate the need for legal education, even, perhaps surprisingly, in the classical ways in which it is conducted now. To simplify just a bit, and again borrowing somewhat from the experience in medicine, one needs to “know the law” in order to use ChatGPT effectively: what questions to ask (what “prompts” to give, in the lingo), what tasks to assign to AI, and how to read its output intelligently and responsibly.

So no, you will not have to change your teaching radically. At least for now, we teachers will not be out of a job.

This, however, does not mean that we can go on teaching in exactly the same way we do now, whether you belong to the lecturing guild or to the interactive, so-called Socratic guild. You will, by whatever method, have to teach and train your students in the uses and abuses of AI, ChatGPT, and the similar programs that will no doubt emerge. This is not all that different from the way we teach our students to use LexisNexis, Westlaw, and other “analogical” online resources for legal research.

I want to illustrate this with one possible example and then move on to exams. Let’s say you have finished teaching some area of positive law—say, state responsibility (in international law), with the myriad provisions of the Draft Articles. A good teacher will want to satisfy themselves that their students have not only understood the various provisions but also have the capacity to analyze a complex problem of state responsibility, identify the issues it raises, correctly choose the relevant provisions of the Draft Articles, and finally apply them to those issues with sophistication. So, in one form or another, good teaching would have to include a “practicum” following the doctrinal segment. It can take the form, for example, of a hypothetical or the analysis of a complex case, in class or as homework. There is nothing radical or esoteric in that.

Enter ChatGPT. We might, for example, go through the hypothetical or the case and ask the students how they would frame the ChatGPT inquiry to get the most effective responses. We might GPT (used as a verb) the problem or the issues ourselves, present the result to the students, and then analyze where it was helpful and where it was misleading. The students will thus experience both the potential and the risks of ChatGPT usage. I think my readers get the idea. I will add that this type of exercise will also hone the ability of the teacher to write sophisticated exams without GPT anxiety.

Here, then, are some thoughts on how one may approach exam writing in the GPT era. I do not want to pretend that I have found the holy grail, but if one accepts the view that AI is here to stay and will, in one form or another, be part of higher legal education and of legal practice, this discussion must begin. My suggestions are intended to stimulate such discussion, solicit critique, and produce further and perhaps more sophisticated approaches.

I am now going to suggest three non-mutually exclusive ways of tackling the GPT problem in legal exams.

  1. Take a slew of questions of different types (see my Editorial on exams) that you have used in previous exams. GPT them yourself (for the technically inclined, a sketch of how one might automate this appears after this list). You will soon enough discover the kinds of questions the AI answers well, or at least acceptably, and those where it does a very poor job. You will thus identify one kind of question you can set in a GPT-anxiety-free manner. As the software develops, you will need to repeat this exercise from time to time. When you have set an exam question that falls in this category, GPT it yourself as a reality check. You are most likely to discover, as I learnt also from our very own guru of law and technology, Thomas Streinz, that ChatGPT works best with open-ended, essay-type exam questions and worst with complex fact-pattern problems.
  2. You might discover that there is a category of questions you like to give on which GPT does an OK job. Maybe not an A but, say, a B- or B. Consider the following possibility. You set the question, you GPT it yourself, and in your exam you set out both the question and the GPT answer. The task of the students, on which they will be graded, would be to identify the various weaknesses or mistakes in the GPT answer. A variation on this theme, suggested in a very thoughtful Tweet by Dave Hoffman, would be to ask the students to write an improved version of the ChatGPT product. One advantage of this approach is that it may well replicate the type of use made of GPT in law firms. This works for both doctrinal and non-doctrinal questions.
  3. Assuming you have integrated GPT into your course design and wish to test the students’ proficiency, here is another possibility. Set the question (to repeat, it should not be of the essay type, such as “To what extent does international law-making enjoy democratic legitimacy?”). The task for the students would be to break the question down into its components and formulate the most effective GPT prompts to be used. This is a reminder to our students (and to us) that without having studied law well in the traditional way, GPT is, if not useless, a defective tool.
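
For those curious readers (or their research assistants) who would like to automate the “GPT them yourself” exercise in point 1, here is a minimal sketch in Python. It assumes the OpenAI Python library (the pre-1.0 interface, in which openai.ChatCompletion.create was the entry point) and a valid API key; the file name, model name, and system prompt are merely illustrative, and colleagues should adapt them to their own setup.

    # A minimal sketch: run a file of past exam questions through ChatGPT.
    # Assumes the OpenAI Python library (pre-1.0 interface) and a valid API key.
    # The file name, model name, and system prompt are illustrative only.
    import openai

    openai.api_key = "sk-..."  # your own API key goes here

    # Hypothetical input file: one past exam question per block,
    # with blank lines separating the blocks.
    with open("past_exam_questions.txt", encoding="utf-8") as f:
        questions = [q.strip() for q in f.read().split("\n\n") if q.strip()]

    for i, question in enumerate(questions, start=1):
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # or "gpt-4", if available to you
            messages=[
                {"role": "system",
                 "content": "You are a law student answering an exam question."},
                {"role": "user", "content": question},
            ],
        )
        answer = response.choices[0].message.content
        print(f"--- Question {i} ---\n{question}\n--- GPT answer ---\n{answer}\n")

Grading the printed answers by hand will quickly reveal which types of questions the software handles well and which it does not, which is precisely the sorting exercise point 1 calls for.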

I imagine that many would not wish to go so far down this route, but we should keep in mind that the use of AI in legal practice is likely to grow and that some of our students, regardless of our preaching, will make use of it. One way or another, one cannot escape the challenge.

JHHW
