What Would an AI Grading App Look Like?

While a whole lot of people (academics and non-academics alike) have been losing their minds lately about the potential of students using ChatGPT to cheat on their writing assignments, I haven’t read/heard/seen much about the potential of teachers using AI software to read, grade, and comment on student writing. Maybe it’s out there in the firehose stream of stories about AI I see every day (I’m trying to keep up a list on pinboard) and I’ve just missed it.

I’ve searched and found some discussion of using ChatGPT to grade on Reddit (here and here), and I’ve seen other posts about how teachers might use the software to do things other than grading, but that’s about it. In fact, the reason I’m thinking about this again now is not because of another AI story but because I watched a South Park episode about AI called “Deep Learning.” South Park has been a pretty uneven show for several years, but if you are a fan and/or if you’re interested in AI, this is a must-see. A lot happens in this episode, but my favorite reaction to ChatGPT comes from the kids’ infamous teacher, Mr. Garrison. While Mr. Garrison is complaining about grading a stack of long and complicated essays (which the students completed with ChatGPT), Rick (Garrison’s boyfriend) tells him about ChatGPT, and Mr. Garrison has a far too honest reaction: “This is gonna be amazing! I can use it to grade all my papers and no one will ever know! I’ll just type the title of the essay in, it’ll generate a comment, and I don’t even have to read the stupid thing!”

Of course, even Mr. Garrison knows that would be “wrong” and he must keep this a secret. That probably explains why I still haven’t come across much about an AI grading app. But really though: shouldn’t we be having this discussion? Doesn’t Mr. Garrison have a point?

Teacher concerns about grading/scoring writing with computers are not new, and one of the nice things about having kept a blog so long is I can search and “recall” some of these past discussions. Back in 2005, I had a post about NCTE coming out against the SAT writing test and machine scoring of those tests. There was also a link in that post to an article about a sociologist at the University of Missouri named Edward Brent who had developed a way of giving students feedback on their writing assignments. I couldn’t find the original article, but this one from the BBC in 2005 covers the same story. It seems like it was a tool developed very specifically for the content of Brent’s courses and I’m guessing it was quite crude by today’s standards. I do think Brent makes a good point on the value of these kinds of tools: “It makes our job more interesting because we don’t have to deal so much with the facts and concentrate more on thinking.”

About a decade ago, I also had a couple of other posts about machine grading, both of which grew out of discussions on the now mostly defunct WPA-L. There was this one from 2012, which included a link to a New York Times article about Educational Testing Service’s product “e-rater,” “Facing a Robo-Grader? Just Keep Obfuscating Mellifluously.” The article features Les Perelman, who was the director of writing at MIT, demonstrating ways to fool e-rater with nonsense and inaccuracies. At the time, I thought Perelman was correct, but I also thought a good argument could be made that if a student was smart enough to fool e-rater, maybe they deserved the higher score.

Then in 2013, there was another kerfuffle on WPA-L about machine grading that involved a petition drive at the website humanreaders.org against machine grading. In my post back then, I agreed with the main goal of the petition, that “Machine grading software can’t recognize things like a sense of humor or irony, it tends to favor text length over conciseness, it is fairly easy to circumvent with gibberish kinds of writing, it doesn’t work in real world settings, it fuels high stakes testing, etc., etc., etc.” But I also had some questions about all that. I made a comparison between these new tools and the initial resistance to spell checkers, and then I also wrote this:

As a teacher, my least favorite part of teaching is grading. I do not think that I am alone in that sentiment. So while I would not want to outsource my grading to someone else or to a machine (because again, I teach writing, I don’t just assign writing), I would not be against a machine that helps make grading easier. So what if a computer program provided feedback on a chunk of student writing automatically, and then I as the teacher followed behind those machine comments, deleting ones I thought were wrong or unnecessary, expanding on others I thought were useful? What if a machine printed out a report that a student writer and I could discuss in a conference? And from a WPA point of view, what if this machine helped me provide professional development support to GAs and part-timers in their commenting on students’ work?

By the way, an ironic/odd tangent about that post: the domain name humanreaders.org has clearly changed hands. In 2013, it looked like this (this link is from the Internet Archive): basically, a petition form. The current site domain humanreaders.org redirects to this page on some content farm website called we-heart.com. This page, from 2022, is a list of the “six top online college paper writing websites today.”

Anyway, let me state the obvious: I’m not suggesting an AI application for replacing all teacher feedback (as Mr. Garrison is suggesting) at all. Besides the fact that it wouldn’t be “right” no matter how you twist the ethics of it, I don’t think it would work well– yet. Grading/commenting on student writing is my least favorite part of the job, so I understand where Mr. Garrison is coming from. Unfortunately though, reading/grading/commenting on student writing is essential to teaching writing. I don’t know how I can evaluate a student’s writing without reading it, and I also don’t know how to help students think about how to revise their writing (and, hopefully, learn how to apply these lessons and advice to writing these students do beyond my class) without making comments.

However, this is A LOT of work that takes A LOT of time. I’ve certainly learned some things that make grading a bit easier than it was when I started. For example, I’ve learned that less is more: marking up every little mistake or thing in the paper and then writing a really long end comment is a waste of time because it confuses and frustrates students and it literally takes longer. But it still takes me about 15-20 minutes to read and comment on each long-ish student essay, and these essays are typically a bit shorter than this blog post. So in a full (25 students) writing class, it takes me 8-10 hours to completely read, comment on, and grade all of their essays; multiply that by two or three or more (since I’m teaching three writing classes a term), and it adds up pretty quickly. Plus we’re talking about student writing here. I don’t mind reading it and students often have interesting and inspiring observations, but by definition, these are writers who are still learning and who often have a lot to learn. So this isn’t like reading The New Yorker or a long novel or something you can get “lost” in as a reader. This ain’t reading for fun– and it’s also one of the reasons why, after reading a bunch of student papers in a day, I’m much more likely to just watch TV at night.

So hypothetically, if there was a tool out there that could help me make this process faster, easier, and less unpleasant, and if this tool also helped students learn more about writing, why wouldn’t I want to use it?

I’ve experimented a bit with ChatGPT using prompts along the lines of “offer advice on how to revise and improve the following text,” followed by a pasted-in student essay. The results are a mix of (IMO) good, bad, and wrong, and mostly written in the robotic voice typical of AI writing. I think students would have a hard time sorting through these mixed messages. Plus I don’t think there’s a way (yet) for ChatGPT to comment on specific passages in a piece of student writing: that is, it can provide an overall end comment, but it cannot comment on individual sentences and paragraphs and have those comments appear in the margins like the comment feature in Word or Google Docs. Like most writing teachers, I do a lot of that kind of commenting, so an AI that can’t do that (yet) at all just isn’t that useful to me.
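For anyone who wants to try the same experiment, here is a minimal sketch of the "instruction plus pasted essay" step in Python. It only assembles the text to paste into a chat window; it does not call ChatGPT or any other service. The function name and the exact wording of the instruction are assumptions made for illustration, not a tested or recommended prompt.

```python
# Hypothetical helper: the names below are illustrative only and are not
# part of ChatGPT or any other tool.
INSTRUCTION = "Offer advice on how to revise and improve the following text:"


def build_feedback_prompt(essay_text: str, instruction: str = INSTRUCTION) -> str:
    """Join the instruction and the essay into one block of text to paste into a chat."""
    essay = essay_text.strip()
    if not essay:
        # An empty essay would produce a prompt with nothing to comment on.
        raise ValueError("essay_text is empty")
    # A blank line separates the instruction from the student's writing.
    return f"{instruction}\n\n{essay}"


if __name__ == "__main__":
    sample = "Their are many reason why grading take time."
    print(build_feedback_prompt(sample))
```

Keeping the instruction in one place like this would at least make it easy to reuse the same wording across a whole stack of essays and compare the responses.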

But the key phrase there is “yet,” and it does not take a tremendous amount of imagination to figure out how this could work in the near future. For example, what if I could train my own grading AI by feeding it a few classes worth of previous student essays with my comments? I don’t logistically know how that would work, but I am willing to bet that with enough training, a Krause-centric version of ChatGPT would anticipate most of the comments I would make myself on a student writing project. I’m sure it would be far from perfect, and I’d still want to do my own reading and evaluation. But I bet this would save me a lot of time.
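To make the "feed it previous essays with my comments" idea a little more concrete, here is a rough sketch of what the very first step might look like: pairing each past essay with the comments it received and writing the pairs out one per line as JSON. Everything here is an assumption for illustration. The function names and the record layout (`essay`, `comments`) are made up, and the sketch says nothing about how an actual model would be trained on the result.

```python
import json


def to_training_records(graded_essays):
    """Turn (essay, comments) pairs into dicts, skipping essays that have no comments.

    `graded_essays` is a hypothetical list of (essay_text, list_of_comment_strings).
    """
    records = []
    for essay, comments in graded_essays:
        # Drop blank comments so only real feedback is kept.
        cleaned = [c.strip() for c in comments if c.strip()]
        if cleaned:
            records.append({"essay": essay.strip(), "comments": cleaned})
    return records


def to_jsonl(records):
    """Serialize the records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    past_work = [
        ("First essay text.", ["Clear thesis.", "Add evidence in paragraph 2."]),
        ("Second essay text.", []),  # no comments, so this one is skipped
    ]
    print(to_jsonl(to_training_records(past_work)))
```

Any real version of this would also need student names and other identifying details removed before the essays went anywhere near a third-party tool.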

Maybe, some time in the future, this will be a real app. But there’s another use of ChatGPT I’ve been playing around with lately, one I hesitate to try but one that would both help some of my struggling students and save me time on grading. I mentioned this in my first post about using ChatGPT to teach way back in December. What I’ve found in my ChatGPT noodling (so far) is if I take a piece of writing that has a ton of errors in it (incomplete sentences, punctuation in the wrong place, run-on/meandering sentences, stuff like that– all very common issues, especially for first year writing students) and prompt ChatGPT to revise the text so it is grammatically correct, it does a wonderful job. It doesn’t change the meaning or argument of the writing– just the grammar. It generally doesn’t make different word choices and it certainly doesn’t make the student’s argument “smarter”; it just arranges everything so it’s correct.

That might not seem like much, but for a lot of students who struggle with getting these basics right, using ChatGPT like this could really help. And to paraphrase Edward Brent from way back in 2005, if students could use a tool like this to at least deal with basic issues like writing more or less grammatically correct sentences, then I might be able to spend more time concentrating on the student’s analysis, argument, use of evidence, and so forth.

And yet– I don’t know, it even feels to me like a step too far.

I have students who have diagnosed learning difficulties of one sort or another who show me letters of accommodation from the campus disability resource center which specifically tell me I should allow students to use Grammarly in their writing process. I encourage students to go to the writing center all the time, in part because I want my students– especially the struggling ones– to sit down with a consultant who will help them go through their essays so they can revise and improve them. I never have a problem with students wanting to get feedback on their work from a parent or a friend who is “really good” at writing.

So why does it feel like encouraging students to try this in ChatGPT is more like cheating than it does for me to encourage students to be sure to spell check and to check out the grammar suggestions made by Google Docs? Is it too far? Maybe I’ll find out in class next week.
