What Would an AI Grading App Look Like?

While a whole lot of people (academics and non-academics alike) have been losing their minds lately about the potential of students using ChatGPT to cheat on their writing assignments, I haven’t read/heard/seen much about the potential of teachers using AI software to read, grade, and comment on student writing. Maybe it’s out there in the firehose stream of stories about AI I see every day (I’m trying to keep up a list on pinboard) and I’ve just missed it.

I’ve searched and found some discussion of using ChatGPT to grade on Reddit (here and here), and I’ve seen other posts about how teachers might use the software to do things other than grading, but that’s about it. In fact, the reason I’m thinking about this again now is not because of another AI story but because I watched a South Park episode about AI called “Deep Learning.” South Park has been a pretty uneven show for several years, but if you are a fan and/or if you’re interested in AI, this is a must-see. A lot happens in this episode, but my favorite reaction to ChatGPT comes from the kids’ infamous teacher, Mr. Garrison. While Garrison is complaining about grading a stack of long and complicated essays (which the students completed with ChatGPT), Rick (Garrison’s boyfriend) tells him about ChatGPT, and Mr. Garrison has far too honest of a reaction: “This is gonna be amazing! I can use it to grade all my papers and no one will ever know! I’ll just type the title of the essay in, it’ll generate a comment, and I don’t even have to read the stupid thing!”

Of course, even Mr. Garrison knows that would be “wrong” and he must keep this a secret. That probably explains why I still haven’t come across much about an AI grading app. But really though: shouldn’t we be having this discussion? Doesn’t Mr. Garrison have a point?

Teacher concerns about grading/scoring writing with computers are not new, and one of the nice things about having kept a blog so long is I can search and “recall” some of these past discussions. Back in 2005, I had a post about NCTE coming out against the SAT writing test and machine scoring of those tests. There was also a link in that post to an article about a sociologist at the University of Missouri named Edward Brent who had developed a way of giving students feedback on their writing assignments. I couldn’t find the original article, but this one from the BBC in 2005 covers the same story. It seems like it was a tool developed very specifically for the content of Brent’s courses and I’m guessing it was quite crude by today’s standards. I do think Brent makes a good point on the value of these kinds of tools: “It makes our job more interesting because we don’t have to deal so much with the facts and concentrate more on thinking.”

About a decade ago, I also had a couple of other posts about machine grading, both of which grew out of discussions on the now mostly defunct WPA-L. There was this one from 2012, which included a link to a New York Times article about Educational Testing Service’s product “e-rater,” “Facing a Robo-Grader? Just Keep Obfuscating Mellifluously.” The article features Les Perelman, who was the director of writing at MIT, demonstrating ways to fool e-rater with nonsense and inaccuracies. At the time, I thought Perelman was correct, but I also thought a good argument could be made that if a student was smart enough to fool e-rater, maybe they deserved the higher score.

Then in 2013, there was another kerfuffle on WPA-L, this one involving a petition drive against machine grading at the website humanreaders.org. In my post back then, I agreed with the main goal of the petition, that “Machine grading software can’t recognize things like a sense of humor or irony, it tends to favor text length over conciseness, it is fairly easy to circumvent with gibberish kinds of writing, it doesn’t work in real world settings, it fuels high stakes testing, etc., etc., etc.” But I also had some questions about all that. I made a comparison between these new tools and the initial resistance to spell checkers, and then I also wrote this:

As a teacher, my least favorite part of teaching is grading. I do not think that I am alone in that sentiment. So while I would not want to outsource my grading to someone else or to a machine (because again, I teach writing, I don’t just assign writing), I would not be against a machine that helps make grading easier. So what if a computer program provided feedback on a chunk of student writing automatically, and then I as the teacher followed behind those machine comments, deleting ones I thought were wrong or unnecessary, expanding on others I thought were useful? What if a machine printed out a report that a student writer and I could discuss in a conference? And from a WPA point of view, what if this machine helped me provide professional development support to GAs and part-timers in their commenting on students’ work?

By the way, an ironic/odd tangent about that post: the domain name humanreaders.org has clearly changed hands. In 2013, it looked like this (this link is from the Internet Archive): basically, a petition form. The domain now redirects to this page on a content farm website called we-heart.com. This page, from 2022, is a list of the “six top online college paper writing websites today.”

Anyway, let me state the obvious: I’m not suggesting an AI application for replacing all teacher feedback (as Mr. Garrison is suggesting) at all. Besides the fact that it wouldn’t be “right” no matter how you twist the ethics of it, I don’t think it would work well– yet. Grading/commenting on student writing is my least favorite part of the job, so I understand where Mr. Garrison is coming from. Unfortunately though, reading/grading/commenting on student writing is essential to teaching writing. I don’t know how I can evaluate a student’s writing without reading it, and I also don’t know how to help students think about how to revise their writing (and, hopefully, learn how to apply these lessons and advice to writing these students do beyond my class) without making comments.

However, this is A LOT of work that takes A LOT of time. I’ve certainly learned some things that make grading a bit easier than it was when I started. For example, I’ve learned that less is more: marking up every little mistake in the paper and then writing a really long end comment is a waste of time because it confuses and frustrates students, and it literally takes longer. But it still takes me about 15-20 minutes to read and comment on each long-ish student essay (typically a bit shorter than this blog post). So in a full writing class of 25 students, it takes me 8-10 hours to completely read, comment on, and grade all of their essays; multiply that by two or three or more (since I’m teaching three writing classes a term), and it adds up pretty quickly. Plus we’re talking about student writing here. I don’t mind reading it and students often have interesting and inspiring observations, but by definition, these are writers who are still learning and who often have a lot to learn. So this isn’t like reading The New Yorker or a long novel or something you can get “lost” in as a reader. This ain’t reading for fun– and it’s also one of the reasons why, after reading a bunch of student papers in a day, I’m much more likely to just watch TV at night.

So hypothetically, if there was a tool out there that could help me make this process faster, easier, and less unpleasant, and if this tool also helped students learn more about writing, why wouldn’t I want to use it?

I’ve experimented a bit with ChatGPT with prompts along the lines of “offer advice on how to revise and improve the following text” and then pasting in a student essay. The results are a mix of (IMO) good, bad, and wrong, and mostly written in the robotic voice typical of AI writing. I think students would have a hard time sorting through these mixed messages. Plus I don’t think there’s a way (yet) for ChatGPT to comment on specific passages in a piece of student writing: that is, it can provide an overall end comment, but it cannot comment on individual sentences and paragraphs and have those comments appear in the margins like the comment feature in Word or Google Docs. Like most writing teachers, I do a lot of my commenting in those margins, so an AI that can’t do that (yet) at all just isn’t that useful to me.
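Just to make that experiment concrete, here’s a minimal sketch of the kind of thing I’ve been trying, written against OpenAI’s Python library. The model name and the prompt wording are my own assumptions rather than any kind of recipe, and the point of the sketch is also the limitation: everything comes back as one overall end comment.

```python
# A minimal sketch of the experiment described above: asking a chat model
# for an overall "end comment" on a student essay. Model name and prompt
# wording are assumptions, not a tested recipe.
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

def end_comment(essay_text: str) -> str:
    """Return one overall comment on the essay. Note what's missing: there
    is no way here to anchor feedback to particular sentences the way
    margin comments in Word or Google Docs do."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; any chat model would do
        messages=[
            {"role": "system",
             "content": "You are a writing teacher giving brief, constructive feedback."},
            {"role": "user",
             "content": "Offer advice on how to revise and improve the following text:\n\n"
                        + essay_text},
        ],
    )
    return response.choices[0].message.content

# Example use: print(end_comment(open("student_essay.txt").read()))
```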

But the key phrase there is “yet,” and it does not take a tremendous amount of imagination to figure out how this could work in the near future. For example, what if I could train my own grading AI by feeding it a few classes’ worth of previous student essays with my comments? I don’t know exactly how that would work logistically, but I am willing to bet that with enough training, a Krause-centric version of ChatGPT would anticipate most of the comments I would make myself on a student writing project. I’m sure it would be far from perfect, and I’d still want to do my own reading and evaluation. But I bet this would save me a lot of time.
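For what it’s worth, the training data for that hypothetical Krause-bot isn’t hard to imagine, even if I can’t vouch for the results. OpenAI’s fine-tuning jobs accept JSONL files of example conversations (one example per line, each pairing a student essay with the comment I actually wrote), so assembling one might look something like this sketch. The file name, the examples, and the whole premise are hypothetical:

```python
# A hypothetical sketch of assembling fine-tuning data: one JSONL line per
# (student essay, the comment I actually wrote) pair. Whether a model
# fine-tuned this way would really anticipate my comments is an open question.
import json

# Stand-ins for a few classes' worth of graded essays:
past_papers = [
    ("Text of a student essay from a previous semester...",
     "The end comment I actually wrote on that essay..."),
    # ...many more pairs...
]

with open("krause_grader_training.jsonl", "w") as f:
    for essay, my_comment in past_papers:
        example = {
            "messages": [
                {"role": "system",
                 "content": "You comment on first-year student essays in this teacher's style."},
                {"role": "user", "content": essay},
                {"role": "assistant", "content": my_comment},
            ]
        }
        f.write(json.dumps(example) + "\n")
```

The resulting file would then get uploaded to a fine-tuning job; the actual mechanics are in OpenAI’s documentation.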

Maybe, some time in the future, this will be a real app. But there’s another use of ChatGPT I’ve been playing around with lately, one I hesitate to try but one that would both help some of my struggling students and save me time on grading. I mentioned this in my first post about using ChatGPT to teach way back in December. What I’ve found in my ChatGPT noodling (so far) is that if I take a piece of writing that has a ton of errors in it (incomplete sentences, punctuation in the wrong place, run-on/meandering sentences, stuff like that– all very common issues, especially for first year writing students) and prompt ChatGPT to revise the text so it is grammatically correct, it does a wonderful job. It doesn’t change the meaning or argument of the writing– just the grammar. It generally doesn’t make different word choices and it certainly doesn’t make the student’s argument “smarter”; it just arranges everything so it’s correct.
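Here’s a sketch of that grammar-cleanup prompt in the same style as before. Again, the exact wording is mine; the one knob worth pointing out is turning the temperature down to zero, which (at least in my understanding) discourages the model from getting creative with rewording:

```python
# A sketch of the grammar-cleanup experiment: a narrow instruction plus
# temperature 0 to keep the output conservative. Prompt wording is my own.
from openai import OpenAI

client = OpenAI()

def clean_grammar(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model
        temperature=0,  # we want corrections, not creative rewriting
        messages=[{
            "role": "user",
            "content": "Revise the following text so it is grammatically correct. "
                       "Do not change the meaning, the argument, or the word "
                       "choices any more than necessary:\n\n" + text,
        }],
    )
    return response.choices[0].message.content
```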

That might not seem like much, but for a lot of students who struggle with getting these basics right, using ChatGPT like this could really help. And to paraphrase Edward Brent from way back in 2005, if students could use a tool like this to at least deal with basic issues like writing more or less grammatically correct sentences, then I might be able to spend more time concentrating on the student’s analysis, argument, use of evidence, and so forth.

And yet– I don’t know, it even feels to me like a step too far.

I have students with diagnosed learning difficulties of one sort or another who show me letters of accommodation from the campus disability resource center that specifically tell me I should allow them to use Grammarly in their writing process. I encourage students to go to the writing center all the time, in part because I want my students– especially the struggling ones– to sit down with a consultant who will help them go through their essays so they can revise and improve them. I never have a problem with students wanting to get feedback on their work from a parent or a friend who is “really good” at writing.

So why does it feel like encouraging students to try this in ChatGPT is more like cheating than it does for me to encourage students to be sure to spell check and to check out the grammar suggestions made by Google Docs? Is it too far? Maybe I’ll find out in class next week.

The Problem is Not the AI

The other day, I heard the opening of this episode of the NPR call-in show 1A, “Know It All: ChatGPT In the Classroom.” It opened with this recorded comment from a listener named Kate:

“I teach freshman English at a local university, and three of my students turned in chatbot papers written this past week. I spent my entire weekend trying to confirm they were chatbot written, then trying to figure out how to confront them, to turn them in as plagiarist, because that is what they are, and how I’m going penalize their grade. This is not pleasant, and this is not a good temptation. These young men’s academic careers now hang in the balance because now they’ve been caught cheating.”

Now, I didn’t listen to the show for long beyond this opener (I was driving around running errands), and based on what’s available on the website, the discussion also included information about incorporating ChatGPT into teaching. Also, I don’t want to be too hard on poor Kate; she’s obviously really flustered and I am guessing there were a lot of teachers listening to Kate’s story who could very personally relate.

But look, the problem is not the AI.

Perhaps Kate was teaching a literature class and not a composition and rhetoric class, but let’s assume whatever “freshman English” class she was teaching involved a lot of writing assignments. As I mentioned in the last post I had about AI and teaching with GPT-3 back in December, there is a difference between teaching writing and assigning writing. This is especially important in classes where the goal is to help students become better at the kind of writing skills they’ll need in other classes and “in life” in general.

Teaching writing means a series of assignments that build on each other, that involve brainstorming and prewriting activities, and that involve activities like peer reviews, discussions of revision, reflection from students on the process, and so forth. I require students in my first year comp/rhet classes to “show their work” through drafts, in a way similar to what they’d be expected to do in an Algebra or Calculus course. It’s not just the final answer that counts. In contrast, assigning writing is when teachers give an assignment (often a quite formulaic one, like write a 5 paragraph essay about ‘x’) with no opportunities to talk about getting started, no consideration of audience or purpose, no interaction with the other students who are trying to do the same assignment, and no opportunity to revise or reflect.

While obviously more time-consuming and labor-intensive, teaching writing has two enormous advantages over only assigning writing. First, we know it “works” in that this approach improves student writing– or at least we know it works better than only assigning writing and hoping for the best. We know this because people in my field have been studying this for decades, despite the fact that there are still a lot of people just assigning writing, like Kate. Second, teaching writing makes it extremely difficult to cheat in the way Kate’s students have cheated– or maybe cheated. When I talk to my students about cheating and plagiarism, I always ask “why do you think I don’t worry much about you doing that in this class?” Their answer typically is “because we have to turn in all this other stuff too” and “because it would be too much work,” though I also like to believe that because of the way the assignments are structured, students become interested in their own writing in a way that makes cheating seem silly.

Let me just note that what I’m describing has been the conventional wisdom among specialists in composition and rhetoric for at least the last 30 (and probably more like 50) years. None of this is even remotely controversial in the field, nor is any of this “new.”

But back to Kate: certain that these three students turned in “chatbot papers,” she spent the “entire weekend” working to prove that these students committed the crime of plagiarism and that they deserve to be punished. She thinks this is a remarkably serious offense– their “academic careers now hang in the balance”– but I don’t think she’s going through all this because of some sort of abstract and academic ideal. No, this is personal. In her mind, these students did this to her and she’s going to punish them. This is beyond a sense of justice. She’s doing this to get even.

I get that feeling, that sense that her students betrayed her. But there’s no point in making teaching about “getting even” or “winning” because as the teacher, you create the game and the rules, you are the best player and the referee, and you always win. Getting even with students is like getting even with a toddler.

Anyway, let’s just assume for a moment that Kate’s suspicions are correct and these three students handed in essays created entirely by ChatGPT. First off, anyone who teaches classes like “Freshman English” should not need an entire weekend or any special software to figure out if these essays were written by an AI. Human writers– at all levels, but especially comparatively inexperienced human writers– do not compose the kind of uniform, grammatically correct, and robotically plodding prose generated by ChatGPT. Every time I see an article with a passage of text that asks “was this written by a robot or a student,” I always guess right– well, almost always I guess right.

Second, if Kate did spend her weekend trying to find “the original” source ChatGPT used to create these essays, she certainly came up empty handed. That was the old school way of catching plagiarism cheats: you look for the original source the student plagiarized and confront the student with it, courtroom-drama style. But ChatGPT (and other AI tools) do not “copy” from other sources; rather, the AI creates original text every time. That’s why there have been several different articles crediting an AI as a “co-author.”

Instead of wasting a weekend, what Kate should have done is call each of these students into her office or take them aside one by one in a conference and ask them about their essays. If the students cheated, they would not be able to answer basic questions about what they handed in, and 99 times out of 100, the confronted cheating student will confess.

Because here’s the thing: despite all the alarm out there that all students are cheating constantly, my experience has been that the vast majority do not cheat like this, and they don’t want to cheat like this. Oh sure, students will sometimes “cut corners” by looking over at someone else’s answers on an exam, or maybe by adding a paragraph or two from something without citing it. But in my experience, the kind of over-the-top cheating Kate is worried about is extremely rare. Most students want to do the right thing by doing the work, trying to learn something, and trying their best– plus students don’t want to get in trouble for cheating either.

Further, the kinds of students who do try to blatantly plagiarize are not “criminal masterminds.” Far from it. Rather, students blatantly plagiarize when they are failing and desperate, and they are certainly not thinking of their “academic careers.” (And as a tangent: seems to me Kate might be overestimating the importance of her “Freshman English” class a smidge).

But here’s the other issue: what if Kate actually talked to these students, and what if it turned out they did not realize using ChatGPT was cheating, or they used ChatGPT in a way that wasn’t significantly different from getting some help from the writing center or a friend? What do you do then? Because– and again, I wrote about this in December— when I asked students to use GPT-3 (OpenAI’s software before ChatGPT) to write an essay and to then reflect on that process, a lot of them described the software as being a brainstorming tool, sort of like a “coach,” and not a lot different from getting help from others in peer review or from a visit to the writing center.

So like I said, I don’t want to be too hard on Kate. I know that there are a lot of teachers who are similarly freaked out about students using AI to cheat, and I’m not trying to suggest that there is nothing to worry about either. I think a lot of what is being predicted as the “next big thing” with AI is either a lot further off in the future than we might think, or it is in the same category as other famous “just around the corner” technologies like flying cars. But no question that this technology is going to continue to improve, and there’s also no question that it’s not going away. So for the Kates out there: instead of spending your weekend on the impossible task of proving that those students cheated, why not spend a little of that time playing around with ChatGPT and seeing what you find out?

AI Can Save Writing by Killing “The College Essay”

I finished reading and grading the last big project from my “Digital Writing” class this semester, an assignment that was about the emergence of openai.com’s artificial intelligence technologies GPT-3 and DALL-E. It was interesting and I’ll probably write more about it later, but the short version for now is my students and I have spent the last month or so noodling around with software and reading about both the potentials and dangers of rapidly improving AI, especially when it comes to writing.

So the timing of Stephen Marche’s recently published commentary with the clickbaity title “The College Essay Is Dead” in The Atlantic could not be better– or worse? It’s not the first article I’ve read this semester along these lines, that GPT-3 is going to make cheating on college writing so easy that there simply will not be any point in assigning it anymore. Heck, it’s not even the only one in The Atlantic this week! Daniel Herman’s “The End of High-School English” takes a similar tack. In both cases, they claim, GPT-3 will make the “essay assignment” irrelevant.

That’s nonsense, though it might not be nonsense in the not so distant future. Eventually, whatever comes after GPT-3 and ChatGPT might really mean teachers can’t get away with only assigning writing. But I think we’ve got a ways to go before that happens.

Both Marche and Herman (and just about every other mainstream media article I’ve read about AI) make it sound like GPT-3, DALL-E, and similar AIs are as easy as working the computer on the Starship Enterprise: ask the software for an essay about some topic (Marche’s essay begins with a paragraph about “learning styles” written by GPT-3), and boom! you’ve got a finished and complete essay, just like asking the replicator for Earl Grey tea (hot). That’s just not true.

In my brief and amateurish experience, using GPT-3 and DALL-E is all about entering a carefully worded prompt. Figuring out how to come up with a good prompt involved trial and error, and I thought it took a surprising amount of time. In that sense, I found the process of experimenting with prompts similar to the kind of invention/pre-writing activities I teach to my students and that I use in my own writing practices all the time. None of my prompts produced more than about two paragraphs of useful text at a time, and that was the case for my students as well. Instead, what my students and I both ended up doing was entering in several different prompts based on the output we were hoping to generate. And my students and I still had to edit the different pieces together, write transitions between AI generated chunks of texts, and so forth.

In their essays, some students reflected on the usefulness of GPT-3 as a brainstorming tool. These students saw the AI as a sort of “collaborator” or “coach,” and some wrote about how GPT-3 made suggestions they hadn’t thought of themselves. In that sense, GPT-3 stood in for the feedback students might get from peer review, a visit to the writing center, or just talking with others about ideas. Other students did not think GPT-3 was useful, writing that while they thought the technology was interesting and fun, it was far more work to try to get it to “help” with writing an essay than it was for the student to just write the thing themselves.

These reactions square with the results in more academic/less clickbaity articles about GPT-3. This is especially true of Paul Fyfe’s “How to cheat on your final paper: Assigning AI for student writing.” The assignment I gave my students was very similar to what Fyfe did and wrote about– that is, we both asked students to write (“cheat”) with AI (GPT-2 in the case of Fyfe’s article) and then reflect on the experience. And if you are a writing teacher reading this because you are curious about experimenting with this technology, go and read Fyfe’s article right away.

Oh yeah, one of the other major limitations of GPT-3’s usefulness as an academic writing/cheating tool: it cannot do even basic “research.” If you ask GPT-3 to write something that incorporates research and evidence, it either doesn’t comply or it completely makes stuff up, citing articles that do not exist. Let me share a long quote from a recent article at The Verge by James Vincent on this:

This is one of several well-known failings of AI text generation models, otherwise known as large language models or LLMs. These systems are trained by analyzing patterns in huge reams of text scraped from the web. They look for statistical regularities in this data and use these to predict what words should come next in any given sentence. This means, though, that they lack hard-coded rules for how certain systems in the world operate, leading to their propensity to generate “fluent bullshit.”
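To put that “predict what words should come next” idea in toy form: the sketch below builds a crude next-word predictor out of nothing but bigram counts over a made-up corpus. Real LLMs use neural networks trained on enormous amounts of text, but the basic framing (statistical regularities used to guess the next word, with no model of facts underneath) is the same:

```python
# A toy illustration of next-word prediction from "statistical regularities."
# Real LLMs are vastly more sophisticated; this just shows the framing.
from collections import Counter, defaultdict

corpus = ("the students wrote the essay and the teacher read the essay "
          "and the teacher graded the essay").split()

# Count which words follow which word in the corpus.
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))      # -> "essay" (follows "the" three times)
print(predict_next("teacher"))  # -> "read" (tied with "graded"; first seen wins)
```

Notice the predictor has no idea what an essay or a teacher is; it only knows what tends to come next. That is the “fluent bullshit” problem in miniature.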

I think this limitation (along with the limitation that GPT-3 and ChatGPT are not capable of searching the internet) makes using GPT-3 as a plagiarism tool in any kind of research writing class kind of a deal-breaker. It certainly would not get students far in most sections of freshman comp where they’re expected to quote from other sources.

Anyway, the point I’m trying to make here (and this is something that I think most people who teach writing regularly take as a given) is that there is a big difference between assigning students to write a “college essay” and teaching students how to write essays or any other genre. Perhaps when Marche was still teaching Shakespeare (before he was a novelist/cultural commentator, Marche earned a PhD specializing in early English drama), he assigned his students to write an essay about one of Shakespeare’s plays. Perhaps he gave his students some basic requirements about the number of words and some other mechanics, but that was about it. This is what I mean by only assigning writing: there’s no discussion of audience or purpose, there are no opportunities for peer review or drafts, there is no discussion of revision.

Teaching writing is a process. It starts by making writing assignments that are specific and that require an investment in things like prewriting and a series of assignments and activities that are “scaffolding” for a larger writing assignment. And ideally, teaching writing includes things like peer reviews and other interventions in the drafting process, and there is at least an acknowledgment that revision is a part of writing.

Most poorly designed assigned-writing tasks read a lot like prompts you could enter directly into GPT-3. The results are definitely impressive, but I don’t think it’s quite useful enough to produce work a would-be cheater can pass off as their own. For example, I asked ChatGPT (twice) to “write a 1000 word college essay about the theme of insanity in Hamlet” and it came up with this and this essay. ChatGPT produced some impressive results, sure, but besides the fact that both of these essays are significantly shorter than the 1000 word requirement, they both kind of read like… well, like a robot wrote them. I think that most instructors who received this essay from a student– particularly in an introductory class– would suspect that the student cheated. When I asked ChatGPT to write a well researched essay about the theme of insanity in Hamlet, it managed to produce an essay that quoted from the play, but not from any research about Hamlet.

Interestingly, I do think ChatGPT has some potential for helping students revise. I’m not going to share the example here (because it was based on actual student writing), but I asked ChatGPT to “revise the following paragraph so it is grammatically correct” and I then added a particularly pronounced example of “basic” (developmental, grammatically incorrect, etc.) writing. ChatGPT didn’t improve the ideas in the writing, and it changed only a few words. But it did transform the paragraph into a series of grammatically correct (albeit not terribly interesting) sentences.

In any event, if I were a student intent on cheating on this hypothetical assignment, I think I’d just do a Google search for papers on Hamlet instead. And that’s one of the other things Marche and these other commentators have left out: if a student wants to complete a badly designed “college essay” assignment by cheating, there are much much better and easier ways to do that right now.

Marche does eventually move on from “the college essay is dead” argument by the end of his commentary, and he discusses how GPT-3 and similar natural language processing technologies will have a lot of value to humanities scholars. Academics studying Shakespeare now have a reason to talk to computer science-types to figure out how to make use of this technology to analyze the playwright’s origins and early plays. Academics studying computer science and other fields connected to AI will now have a reason to maybe talk with the English-types as to how well their tools actually can write. As Marche says at the end, “Before that space for collaboration can exist, both sides will have to take the most difficult leaps for highly educated people: Understand that they need the other side, and admit their basic ignorance.”

Plus I have to acknowledge that I have only spent so much time experimenting with my openai.com account because I still only have the free version. That was enough access for my students and me to noodle around and complete a short essay composed with the assistance of GPT-3 and to generate an accompanying image with DALL-E. But that was about it. Had I signed up for openai.com’s “pay as you go” payment plan, I might learn more about how to work this thing, and maybe I would have figured out better prompts for that Hamlet assignment. Besides all that, this technology is getting better alarmingly fast. We all know whatever comes after ChatGPT is going to be even more impressive.

But we’re not there yet. And when it is actually as good as Marche fears it might be, and if that makes teachers rethink how they might teach rather than assign writing, that would be a very good thing.