(My friend Bill Hart-Davidson unexpectedly died last week. At some point, I’ll write more about Bill here, probably. In the meantime, I thought I’d finish this post I started a while ago about Instructify’s webinar on its AI grading app. Bill and I had been texting/talking more about AI lately, and I wish I had had a chance to text/talk with him about this. Or anything else.)
In March 2023, I wrote a blog post titled “What Would an AI Grading App Look Like?” I was inspired by what I still think is one of the best episodes of South Park I have seen in years, “Deep Learning.” Follow this link for a detailed summary or look at my post from last year, but in a nutshell, the kids start using ChatGPT to write a paper assignment and Mr. Garrison figures out how to use ChatGPT to grade those papers. Hijinks ensue.
Well, about a month ago, at a time when I was up to my eyeballs in grading, I saw a webinar presentation from Instructify about their AI product called TALIA. The title of the webinar was “How To Save Dozens of Hours Grading Essays Using AI.” I missed the live event, but I watched the recording– and you can too, if you want– or at least you could when I started writing this. Much more about it after the break, but the tl;dr version is that this AI grading tool is not the one I am looking for (not surprisingly), and I think it would be a good idea for these tech startups to include people with actual experience teaching writing on their development teams.
As I mentioned in that AI grading post last year, “robograding” was an issue in the field in the 2000s when machine scoring was introduced to score things like state-based writing competency exams. Back then, I was an outlier among my colleagues about these things– at least based on the talk at forums like the WPA-L listserv, which used to have lots of vigorous debate about lots of different things before it imploded a few years ago. Most comp/rhet folks thought robograding was evil; I thought then (and I still think now) that maybe we shouldn’t be so dismissive of these tools. Maybe they could help teachers reduce the grading workload and thus help students as well.
Let me be clear: it would be bad to let AI do all of the evaluating/reading/commenting on/grading of student writing, for two important reasons. First, when it comes to teaching writing (rather than assigning writing), the instructor’s reading/commenting/responding to their students’ writing is a key part of the pedagogical process. This is different from most other college courses. Usually, the instructor gives students some content (from lectures, textbooks, other materials) which students take in, discuss, and study. To figure out how well the students learned that content, the instructor evaluates them. The exam (or quiz or essay or whatever) results in a score that becomes the grade. Assessment happens after the teaching.
In process-based writing courses– first year writing of course, but lots of others– feedback on student writing (comments in the margins about both mechanics and content, along with summative comments at the end) is intended to help students think about how to improve their writing on other assignments, or potentially to revise that assignment. So the grading and assessment are a critical part of the teaching, not something that comes after it. AI can probably help with grading (more on that below), but I think the importance of humans reading and responding to any kind of text written by other humans (including student writing) is an obvious given.
All that said, no writing teacher I know likes grading. It’s a hell of a lot of work, and, because a lot of students– especially first year writing students– are often not very good writers, it can be tedious. Do not get me wrong! I don’t expect students to be very good writers, especially in classes like first year writing! That’s why they are taking those courses! I’m just saying reading and commenting on hundreds of pages of not very good writing is a lot of work. (This is also one of the reasons why it is such a reward when I come across a great essay). So if there were some kind of software that could help me save some time with grading– help me, not replace me– then I am all for it.
Which leads me to the second problem with robograding (at least as it existed before the rise of AI): it doesn’t work for writing assignments that are open-ended, where students “invent” an argument based on their own evidence and analysis. So for example, when students are given writing assignments where they are expected to answer a specific question– like “Describe the events that led up to the United States’ invasion of Iraq in 2003” or, as I blogged about a few weeks ago, “Explain the key components of the concept of sociological imagination”– the software can assign a basic score just as well as a human rater. And by “basic score,” I mean a ranking, like four out of five. But this software cannot score writing from prompts like “Do you believe the 2003 invasion of Iraq was justified? Why or why not?” or “Describe a time when your individual experience aligned with the concept of sociological imagination.”
So what’s the difference between the robograding that’s been around for a long time now and AI– besides the obvious imperative to market any and every technological innovation as AI right now? Judging from that Instructify webinar, not much.
Like I said, you should try to watch the video yourself– and if it doesn’t load for you right away, try putting in your name and email address, which might sign you up for some mailing list or something. There is not nearly as much to it as the 40-minute runtime might suggest. Here’s my recap:
The webinar began with almost eight minutes of sitting around and waiting for people to log on, followed by introductions of the key folks involved (all of whom are technology and business folks, none of whom seem to have any experience with education), followed by a sort of ice-breaker survey, and then, about 12 minutes later, on to the presentation itself. So yeah, skip ahead.
After some introductory stuff, they showed a basic GPT they had built on ChatGPT for scoring SAT writing tests. GPTs are personalized/customized AI tools that use OpenAI’s software (there was a pretty good article about this back in November in the New York Times here). You have to have access to the paid version of ChatGPT to use or build GPTs. Funny enough, the link to the GPT they created no longer works, and I could not find it by searching for it. I don’t know why this just went away, but it doesn’t give me a lot of confidence in these people.
Then, about 20 minutes into the webinar, they moved on to discussing TALIA, which stands for “Teaching and Learning Intelligent Assistant.” It’s a product Instructify has had for a while, which they described as a “digital teaching assistant to professors while they are teaching the course.” Now they have added an AI grading feature. To use it, the instructor first needs to provide TALIA with a specific assignment rubric, and the instructor also needs to “train” TALIA with examples of student writing that “calibrate” with the rubric. So in the example they shared, they uploaded an essay that met the grading rubric criteria– one that, they said, “had a really good score on comprehension, an adequate score on analysis, and a partial score on grammar.” An instructor can upload many examples to train TALIA, and presumably, the more examples, the better. Once that’s built by the instructor, TALIA does more or less a “first draft” of the “scoring” relative to the rubric, and it also provides a summary of the grade at the end.
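To make that workflow concrete for myself, here is a minimal sketch of what rubric-plus-calibration-example scoring could look like if you wired it up with the OpenAI API. This is just my guess at the general approach, not anything Instructify showed; the rubric text, the calibration essay and scores, and the model name are all placeholders I made up.

```python
# A rough sketch, not TALIA's actual implementation: score an essay against a
# rubric, using one instructor-scored essay as a calibration example.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """Comprehension (0-4): ...
Analysis (0-4): ...
Grammar (0-4): ..."""  # placeholder rubric text

CALIBRATION_ESSAY = "..."  # an essay the instructor has already scored
CALIBRATION_SCORES = (
    "Comprehension: 4 (really good). Analysis: 3 (adequate). Grammar: 2 (partial)."
)

def draft_score(student_essay: str) -> str:
    """Return a first-draft, per-criterion score plus a short summary."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "You help an instructor grade essays. Score the essay against "
                "the rubric, one score per criterion, and end with a short "
                "summary. The instructor reviews and revises every score."
            )},
            {"role": "user", "content": f"Rubric:\n{RUBRIC}\n\nEssay:\n{CALIBRATION_ESSAY}"},
            {"role": "assistant", "content": CALIBRATION_SCORES},
            {"role": "user", "content": f"Rubric:\n{RUBRIC}\n\nEssay:\n{student_essay}"},
        ],
    )
    return response.choices[0].message.content
```

The calibration essay works like a few-shot example: the more instructor-scored examples you stack into the conversation, the closer the “first draft” presumably gets to how you would have scored the essay yourself.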
Here’s a video Instructify has on YouTube to demonstrate how this looks:
I have no idea if this actually works. The webinar featured a video similar to this one rather than a live demo, so for all I know, what they showed us was just another simulation.
Obviously, this isn’t at all useful for the kinds of writing assignments common in first year writing classes. And fundamentally, I don’t think these people understand exactly why faculty talk about how much they don’t like grading student writing. It’s not the content of student work that makes it so time-consuming and tedious to read and grade; it’s the writerly/mechanical issues and mistakes.
“But it’s not possible to separate the content and the mechanics in a student’s essay! Besides, you don’t want to mark every mistake anyway!” Sure, but let’s just put a pin in that and instead talk about the realities of grading. About a week ago, I wrapped up my first (and I hope only!) semester where I taught nothing but first year writing. (The experience has given me a new appreciation for the lecturers we have who teach four or five sections a semester, but that’s a different topic). EMU has one required first year composition course. A lot of students either test out of it (with a high enough SAT or ACT English score) or they took a class in high school (usually AP English) which we count for credit. Most of the students I have in junior/senior level writing classes tell me they never took first year writing at EMU.
So what that means is that many– not a majority, but many– of the students in first year writing at EMU are what I would classify as basic writers, or at least students who struggle with writing basics: paragraphing, what counts as a complete sentence, where to punctuate, what to capitalize, and so forth. Further, just about every student in this class struggles with citation, how to work quotes and paraphrases into their writing (and how to do so without accidentally plagiarizing), the stylistic conventions of things like introductions and organization, how to explain evidence to a reader, and of course many other more subtle and complex writerly issues.
Now, I do not want nor do I need an AI to respond to the student’s content– even if it could– because that’s the part I am most interested in reading. The “fun” part of reading and grading my students’ writing is engaging with their ideas, so I don’t want or need an AI to do that. The tedious part is all of the mechanical stuff, both because writing that isn’t very good is simply harder to read and because I write comments on the draft about these issues.
So what I want out of an AI grading tool/assistant is help with these mechanical issues: the ones that are tedious, boring, and less important than content, but still important for helping students learn how to write. Given that at least some of my students are already using Grammarly and similar tools, what I’m asking for shouldn’t be too difficult. Should it?
Well, now that the semester is over, I’ve signed up for the $20/month plan with ChatGPT in part to see if I can figure out how to set up my own GPT to answer that question myself.
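For what it’s worth, here’s a rough first draft of the instructions I have in mind for that GPT. This is just me sketching out the configuration I’d paste into the GPT builder; it’s not something I’ve actually built or tested yet.

```
You are a writing-mechanics assistant for a first year college writing course.
Read the student draft I paste in and flag only sentence-level and formatting
problems: incomplete sentences, punctuation, capitalization, paragraphing, and
citation format. For each problem, quote the exact phrase from the draft and
add a one-sentence note about what is wrong and how to fix it. Do not comment
on the writer's ideas, argument, or evidence, and do not rewrite the draft;
responding to the content is the instructor's job.
```

The key design choice is that last instruction: the content is the part of reading student writing I want to keep for myself, so the assistant should stay out of it.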
I’m glad to see your distinction between “teaching writing” and “assigning writing.”
I wouldn’t mind a bot that helpfully flagged summative phrases like “The author talks about” or “In this quote you can see it says that” and clunky non-transitions like “The next thing I want to talk about is…” but the purpose wouldn’t be to “grade” the assignment. The purpose would be to flag areas for the student to work on. I spend a lot of my time circling phrases like “many people say” and asking “How many? Who counted, and who decided that the number they found should be described as ‘many’ instead of ‘some’ or ‘a few’ or ‘almost all’? On what page of whose study can I read about the counting process?”
Of course students who are determined not to learn will try running their text through a text spinner to hide those signal phrases, but if they don’t replace the vague filler with actual content that shows how their critical thinking skills are developing, that won’t help their grade.
The whole idea of looking to technology to solve our troubles (like my colleagues who ask me what’s the best AI detector on the market), and the posture of vigilance that goes with it (the sense that it’s our job as educators to catch and punish students who misuse AI), isn’t the path that interests me. But I have adjusted my rubrics to give more weight to the kinds of things that bots cannot (yet) do.
Students who rely on GenAI to avoid doing the foundational work that’s supposed to prepare them for the more advanced assignments just aren’t ready for those assignments.
When a student turns in an assignment that does well the things that GenAI does well, does poorly the things that GenAI does poorly, and does not even attempt the things GenAI cannot do (like “make a connection between the readings and the class discussion”), I can simply say, “I see grammatically impeccable sentences and well-formed paragraphs that demonstrate the ability to summarize, but those aren’t the skills I want to assess in this assignment. Would you like to try again?”
The kind of student who copies and pastes what bots generate is, in my experience, not the kind of student who reads and eagerly follows up on instructor comments, so students only rarely take me up on my offer to let them re-do these preliminary assignments.
I continue to adjust my rubric and assignments to weigh more heavily the kinds of tasks that GenAI does not do well.
100% agree.
I do exactly the same thing with commenting on student work– though all with Google Docs for the last, IDK, 15 years at least. I try to make it a little easier by copying and pasting frequently used comments, and I have started making much better use of the Canvas rubric. But it still is time consuming and tedious. So what I want is some kind of AI app that can actually comment on my students’ Google Docs, highlighting phrases with notes like “this is an incomplete sentence” or “this is how you are supposed to cite this” or whatever. Then I’d go through, leaving the comments that I think are helpful and deleting the ones that aren’t, and then comment on the actual content as I go. Now that might actually help speed up the process.
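That app doesn’t exist as far as I know, but the AI half of it seems doable. Here’s a minimal sketch, assuming the OpenAI Python library and a placeholder model name, that runs essentially the same mechanics-only instructions as the GPT draft above over the plain text of a draft and returns quoted phrases paired with short notes. Getting those suggestions anchored as actual comments inside a Google Doc is the part I haven’t figured out; for now I’d just review the list and add the keepers myself.

```python
# A sketch of the AI half of the app I want: given a draft's plain text, return
# quoted phrases paired with short notes about mechanical issues only. This is
# a hypothetical helper I'm imagining, not an existing tool.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You flag mechanical problems in a first year college student's draft: "
    "incomplete sentences, punctuation, capitalization, paragraphing, and "
    "citation format. For each problem, output one line in the form\n"
    "QUOTE: <exact phrase from the draft> | NOTE: <one-sentence comment>\n"
    "Do not comment on the writer's ideas or argument."
)

def suggest_comments(draft_text: str) -> list[str]:
    """Return candidate comments for the instructor to review, keep, or delete."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": draft_text},
        ],
    )
    text = response.choices[0].message.content or ""
    return [line for line in text.splitlines() if line.startswith("QUOTE:")]

if __name__ == "__main__":
    # placeholder filename; in practice this would be text exported from a Google Doc
    for suggestion in suggest_comments(open("student_draft.txt").read()):
        print(suggestion)
```

Whether something like this actually saves time, or just hands me another stack of machine-generated comments to wade through, is part of what I want to figure out this summer.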
AI detection is a lost cause because it doesn’t work well now, and as the AIs improve, there’s no way the detection software is going to keep up. To me, there are two obvious things we should be working on instead. The first is to revise/change/move away from assigned writing to teaching writing, and by “teaching writing” I mean a series of scaffolded assignments, peer review, and revision. Also, all along the way, we should be asking students to “show their work” through the different exercises/assignments that build up to the larger writing projects. As I said to my first year students this year, y’all know that’s how this is supposed to work. Since I’ve taught fycomp about 1000 times, I have the descriptions of all of the various writing projects for the semester available from the start. If, after seeing the assignments and reading the syllabus and such, a student showed up a few days later and said to me, “Hey, Professor Krause, I finished all these assignments last night, and I want to hand them all in now, you know, early. Is that okay?” Of course that’s not okay! You know that’s not how this works!
So I suppose in part because of this approach, I don’t feel like I had many students who used ChatGPT or a similar AI. I didn’t find that all that surprising because in my experience, students who cheat in first year writing classes are not criminal masterminds. Rather, they are failing and they are desperate to try anything to pass.
The second thing is to teach students how to really use these AIs. That’s my project for the summer.