TALIA? This is Not the AI Grading App I Was Searching For

(My friend Bill Hart-Davidson unexpectedly died last week. At some point, I’ll write more about Bill here, probably. In the meantime, I thought I’d finish this post I started a while ago about the webinar about Instructify’s AI grading app. Bill and I had been texting/talking more about AI lately, and I wish I would have had a chance to text/talk more about this. Or anything else).

In March 2023, I wrote a blog post titled “What Would an AI Grading App Look Like?” I was inspired by what I still think is one of the best episodes of South Park I have seen in years, “Deep Learning.” Follow this link for a detailed summary or look at my post from last year, but in a nutshell, the kids start using ChatGPT to write a paper assignment and Mr. Garrison figures out how to use ChatGPT to grade those papers. Hijinks ensue.

Well, about a month ago and at a time when I was up to my eyeballs in grading, I saw a webinar presentation from Instructify about their AI product called TALIA. The title of the webinar was “How To Save Dozens of Hours Grading Essays Using AI.” I missed the live event, but I watched the recording– and you can too, if you want— or at least you could when I started writing this. Much more about it after the break, but the tl;dr version is this AI grading tool is not the one I am looking for (not surprisingly), and I think it would be a good idea for these tech startups to include people with actual experience with teaching writing on their development teams.


Once Again, the Problem is Not AI (a Response to Justus’ and Janos’ “Assessment of Student Learning is Broken”)

I most certainly do not have the time to be writing this because it’s the height of the “assessment season” (e.g., grading) for several different assignments my students have been working on for a while now. That’s why posting this took me a while– I wrote it during breaks in a week-long grading marathon. In other words, I have better things to do right now. But I find myself needing to write a bit in response to Zach Justus and Nik Janos’ Inside Higher Ed piece “Assessment of Student Learning is Broken,” and I figured I might as well make it into a blog entry. I don’t want to be a jerk about any of this and I’m sure Justus and Janos are swell guys and everything, but this op-ed bothered me a lot.

Justus and Janos are both professors at Chico State in California; Justus is a professor in Communications and is the director of the faculty development program there, and Janos is in sociology. They begin their op-ed about AI “breaking” assessment quite briskly:

Generative artificial intelligence (AI) has broken higher education assessment. This has implications from the classroom to institutional accreditation. We are advocating for a one-year pause on assessment requirements from institutions and accreditation bodies. We should divert the time we would normally spend on assessment toward a reevaluation of how to measure student learning. This could also be the start of a conversation about what students need to learn in this new age.

I hadn’t thought a lot about how AI might figure into institutional accreditation, so I kept reading. And that’s where I first began to wonder about the argument they’re making, because very quickly, they seem to equate institutional assessment with assessment in individual classes (grading). Specifically, most of this piece is about the problems caused by AI (supposedly) of a very specific assignment in a very specific sociology class.

I have no direct experience with institutional assessment, but as part of the Writing Program Administration work I’ve dipped into a few times over the years, I have some experience with program assessment. In those kinds of assessments, we’re looking at the forest rather than the individual trees. For example, maybe as part of a program assessment, the WPAs might want to consider the average grades of all sections of first year writing. That sort of measure could tell us stuff about the overall pass rate and grade distribution across sections, and so on. But that data can’t tell you much about grades for specific students or the practices of a specific instructor. As far as I can tell, institutional assessments are similar “big picture” evaluations.

Justus and Janos see it differently, I guess:

“Take an introductory writing class as an example. One instructor may not have an AI policy, another may have a “ban” in place and be using AI detection software, a third may love the technology and be requiring students to use it. These varied policies make the aggregated data as evidence of student learning worthless.”

Yes, different teachers across many different sections of the same introductory writing class take different approaches to teaching writing, including with (or without) AI. That’s because individual instructors are, well, individuals– plus each group of students is different as well. Some of Justus and Janos’ reaction to these differences probably has to do with their disciplinary presumptions about “data”: if it’s not uniform and if it is not something that can be quantified, then it is, as they say, “worthless.” Of course in writing studies, we have no problem with much more fuzzy and qualitative data. So from my point of view, as long as the instructors are more or less following the same outcomes/curriculum, I don’t see the problem.

But like I said, Justus and Janos aren’t talking about institutional assessment. Rather, they devote most of this piece to a very specific assignment. Janos teaches a sociology class that has an institutional writing competency requirement for the major. The class has students “writing frequently” with a variety of assignments for “nonacademic audiences,” like “letters-to-the-editor, … encyclopedia articles, and mock speeches to a city council” meeting. Justus and Janos say “Many of these assignments help students practice writing to show general proficiency in grammar, syntax and style.” That may or may not be true, but it’s not at all clear how this was assigned or what sort of feedback students received.

Anyway, one of the key parts of this class is a series of assignments about:

“a foundational concept in sociology called the sociological imagination (SI), developed by C. Wright Mills. The concept helps people think sociologically by recognizing that what we think of as personal troubles, say being homeless, are really social problems, i.e., homelessness.”

It’s not clear to me what students read and study to learn about SI, but it’s a concept that’s been around for a long time– Mills wrote about it in a book in the 1950s. So not surprisingly, there is A LOT of information about this available online, and presumably that has been the case for years.

Students read about SI and as part of their study, they “are asked to provide, in their own words and without quotes, a definition of the SI.” To help do this, students do activities like “role play” in which they are talking to friends or family about a social problem such as homelessness. “Lastly,” (to quote at length one last time):

…students must craft a script of 75 words or fewer that defines the SI and uses it to shed light on the social problem. The script has to be written in everyday language, be set in a gathering of friends or family, use and define the concept, and make one point about the topic.

Generative AI, like ChatGPT, has broken assessment of student learning in an assignment like this. ChatGPT can meet or exceed students’ outcomes in mere seconds. Before fall 2022 and the release of ChatGPT, students struggled to define the sociological imagination, so a key response was to copy and paste boilerplate feedback to a majority of the students with further discussion in class. This spring, in a section of 27 students, 26 nailed the definition perfectly. There is no way to know whether students used ChatGPT, but the outcomes were strikingly different between the pre- and post-AI era.

Hmm. Okay, I have questions.

  • You mean to tell me that the key deliverable/artifact that students produce in this class to demonstrate that they’ve met a university-mandated gen ed writing requirement is a passage of 75 words or fewer? That’s it? Really. Really? I am certainly not saying that being able to produce a lot of text should be the main factor for demonstrating “writing competency,” but this seems more than weird and hard to believe.
  • Is there any instructional apparatus for this assignment at all? In other words, do students have to produce drafts of this script? Is there any sort of in-class work with the role-play that’s documented in some way? Any reflection on the process? Anything?
  • I have no idea what the reading assignments and lectures were for this assignment, so I could very well be missing a key concept with SI. But I feel like I could have copied and pasted together a pretty good script just based on some Google searching around– if I was inclined to cheat in the first place. So given that, why are Justus and Janos confident that students hadn’t been cheating before Fall 2022?
  • The passage about the “before Fall 2022” approach to teaching this writing assignment says a lot. It sounds like there’s no actual discussion of what students wrote, and the main instruction to students back then was to follow “boilerplate feedback.” So, in assessing this assignment, was Janos evaluating the unique choices students made in crafting their SI scripts? Or rather, was he evaluating these SI scripts for the “right answer” he provided in the readings or lectures?
  • And as Justus and Janos note, there is no good way to know for certain if a student handed in something made in part or in whole by AI, so why are they assuming that all of those students who got the “right answer” with their SI scripts were cheating?

So, Justus and Janos conclude, because now instructors are evaluating “some combination of student/AI work,” it is simply impossible to make any assessment for institutional accreditation. Their solution is “we should have a one-year pause wherein no assessment is expected or will be received.” What kinds of assessments are they talking about? Why only a year pause? None of this is clear.

Clearly, the problem here is not institutional assessment or the role of AI; the problem is the writing assignment. The solutions are also obvious.

First, there’s teaching writing versus assigning it. I have blogged a lot about this in the last couple years (notably here), but teaching writing means a series of assignments where students need to “show their work.” That seems extremely doable with this particular assignment, too. Sure, it would require more actual instruction and evaluation than “boilerplate feedback,” but this seems like a small class (27 students), so that doesn’t seem that big of a deal.

Second, if you have an assignment in anything that can successfully be completed with a simple prompt into ChatGPT (as in “write a 75 word script explaining SI in everyday language”), then that’s definitely now a bad assignment. That’s the real “garbage in, garbage out” issue here.

And third, one of the things that AI has made me realize is if an instructor has an assignment in a class– and I mean any assignment in any class– which can be successfully completed without having any experience or connection to that instructor or the class, then that’s a bad assignment. Again, that seems like an extremely easy thing to address with the assignment that Justus and Janos describe. They’d have to make changes to the assignment and assessment, of course, but doesn’t that make more sense than trying to argue that we should completely revamp the institutional accreditation process?

I’m Still Dreaming of an AI Grading Agent (and a bunch of AI things about teaching and writing)

I’m in the thick of the fall semester and I’ve been too busy to think/read/write much about AI for a while. Honestly, I’m too busy to be writing this right now, but I’ve also got a bucket full of AI tabs open on my browser, so I thought I’d do a bit of a procrastination and “round up” post.

In my own classes, students seem to be either leery of or unimpressed with AI. I’ve encouraged my more advanced students to experiment with/play around with AI to help with the assignments, but absent me requiring them to do something with AI, they don’t seem too interested. I’ve talked to my first year writing students about using AI to brainstorm and to revise (and to be careful about trusting what the AI presents as “facts”), but again, they don’t seem interested. I have had at least one (and perhaps more than that) student who tried to use AI to cheat, but it was easy to spot. As I have said before, I think most students want to do the work themselves and to actually learn something, and the students who are inclined to cheat with AI (or just a Google search) are far from criminal geniuses.

That said, there is this report, “GenAI in Higher Education: Fall 2023 Update Time for Class Study,” which was research done by a consulting firm called Tyton Partners and sponsored by Turnitin. I haven’t had a chance to read beyond the executive summary, but they claim half of students are “regular users” of generative AI, though their use is “relatively unsophisticated.” Well, unless a lot of my students are not telling me the truth about using AIs, this isn’t my impression. Of course, they might be using AI stuff more for other classes.

Here’s a very local story about how AI is being used in at least one K-12 school district: “‘AI is here.’ Ypsilanti schools weigh integrity, ethics of new technology,” from MLive. Interestingly, a lot of what this story is about is how teachers are using AI to develop assignments, and also to do some things like helping students who don’t necessarily speak English as their native language:

Serving the roughly 30% of [Ypsilanti Community Schools] students who can speak a language other than English, the English Learner Department has found multiple ways to bring AI into the classroom, including helping teachers develop multilingual explanations of core concepts discussed in the curriculum — and save time doing it.

“A lot of that time saving allows us to focus more on giving that important feedback that allows students to grow and be aware of their progress and their learning,” [Teacher Connor] Laporte said.

Laporte uses an example of a Spanish-speaking intern who improved a vocabulary test by double-checking the translations and using ChatGPT to add more vocabulary words and exercises. Another intern then used ChatGPT to make a French version of the same worksheet.

A lot of the theme of this article is about how teachers have moved beyond being terrified of AI ruining everything to seeing it as a tool to work with in teaching. That’s happening in lots of places and lots of ways; for example, as Inside Higher Ed noted, “Art Schools Get Creative Tackling AI.” It’s a long piece with a somewhat similar theme: not necessarily embracing AI, but also recognizing the need to work with it.

MLA apparently now has “rules” for how to cite AI. I guess maybe it isn’t the end of the essay then, huh? Of course, that doesn’t mean that a lot of writers are going to be happy about AI.  This one is from a while ago, but in The Atlantic back in September, Alex Reisner wrote about “These 183,000 Books are Fueling the Biggest Fight in Publishing and Tech.” Reisner had written earlier about how Meta’s AI systems were being trained on a collection of more than 191,000 books that were often used without permission. The article has a search feature so you can see if your book(s) were a part of that collection. For what it’s worth, my book and co-edited collection about MOOCs did not make the cut.

Several famous people/famous writers are now involved in various lawsuits where the writers are suing the AI companies for using their work without permission to train (“teach?”) the AIs. There’s a part of me that is more than sympathetic to these lawsuits. After all, I never thought it was fair that companies like Turnitin can use student writing without permission as part of its database for detecting plagiarism. Arguably, this is similar.

But on the other hand, OpenAI et al didn’t “copy” passages from Sarah Silverman or Margaret Atwood or my friend Dennis Danvers (he’s in that database!) and then try to present that work as something the AI wrote. Rather, they trained (taught?) the AI by having the program “read” these books. Isn’t that just how learning works? I mean, everything I’ve ever written has been influenced in direct and indirect ways by other texts I’ve read (or watched, listened to, seen, etc). Other than scale (because I sure as heck have not read 183,000 books), what’s the difference between me “training” by reading the work of others and the AI doing this?

Of course, even with all of this training and the continual tweaking of the software, AIs still have the problem of making shit up. Cade Metz wrote in The New York Times “Chatbots May ‘Hallucinate’ More Often Than Many Realize.” Among other things, the article is about a new start-up called Vectara that is trying to estimate just how often AIs “hallucinate,” and (to leap ahead a bit) they estimated that different AIs hallucinate at different rates ranging from 3% to 27% of the time. But it’s a little more complicated than that.

Because these chatbots can respond to almost any request in an unlimited number of ways, there is no way of definitively determining how often they hallucinate. “You would have to look at all of the world’s information,” said Simon Hughes, the Vectara researcher who led the project.

Dr. Hughes and his team asked these systems to perform a single, straightforward task that is readily verified: Summarize news articles. Even then, the chatbots persistently invented information.

“We gave the system 10 to 20 facts and asked for a summary of those facts,” said Amr Awadallah, the chief executive of Vectara and a former Google executive. “That the system can still introduce errors is a fundamental problem.”

If I’m understanding this correctly, this means that even when you give the AI a fairly small data-set to analyze (10-20 “facts”), the AI still makes shit up with things not a part of that data-set. That’s a problem.

But it still might not stop me from trying to develop some kind of ChatGPT/AI-based grading tool, and that might be about to get a lot easier. (BTW, talk about burying the lede after that headline!)  OpenAI announced something they’re calling (very confusingly) “GPTs,” which (according to this article by Devin Coldewey in TechCrunch) is “a way for anyone to build their own version of the popular conversational AI system. Not only can you make your own GPT for fun or productivity, but you’ll soon be able to publish it on a marketplace they call the GPT Store — and maybe even make a little cash in the process.”

Needless to say, my first thought was could I use this to make an AI Grading tool? And do I have the technical skills?

As far as I can tell from OpenAI’s announcement about this, GPTs require upgrading to their $20 a month package and it’s just getting started– the GPT store is rolling out later this month, for example. Kevin Roose of The New York Times has a thoughtful and detailed article about the dangers and potentials of these things, “Personalized A.I. Agents Are Here. Is the World Ready for Them?” User-created agents will very soon be able to automate responses to questions (that OpenAI announcement has examples like a “Creative Writing Coach,” a “Tech Advisor” for trouble-shooting things, and a “Game Time” advisor that can explain the rules of card and board games). Roose writes a fair amount about how this technology could also be used by customer service or human resource offices, and to handle things like responding to emails or updating schedules. Plus none of this requires any actual programming skills, so I am imagining something like “If This Then That” but much more powerful.

AI agents might also be made to do evil things, which has a lot of security people worried for obvious reasons. Though I don’t think these agents are going to be powerful enough to do anything too terrible; actually, I don’t think these agents will have the capabilities to make the AI grading app I want, at least not yet. Roose got early access to the OpenAI project, and his article has a couple of examples of how he played around with it:

The first custom bot I made was “Day Care Helper,” a tool for responding to questions about my son’s day care. As the sleep-deprived parent of a toddler, I’m always forgetting details — whether we can send a snack with peanuts or not, whether day care is open or closed for certain holidays — and looking everything up in the parent handbook is a pain.

So I uploaded the parent handbook to OpenAI’s GPT creator tool, and in a matter of seconds, I had a chatbot that I could use to easily look up the answers to my questions. It worked impressively well, especially after I changed its instructions to clarify that it was supposed to respond using only information from the handbook, and not make up an answer to questions the handbook didn’t address.

That sounds pretty cool, and I bet I could create an AI agent capable of writing a summative end-comment on a student essay based on a detailed grading rubric I feed into the machine. But that’s a long way from doing the kind of marginal commenting on student essays that responds to particular sentences, phrases, and paragraphs. I want an AI agent/grading tool that can “read” a piece of student writing in a way that is more like how I would read and comment on a piece of student writing, and that isn’t limited to a rubric.

But this is getting a lot closer to being potentially useful– not a substitute for me actually reading and evaluating student writing, but as a tool to make it easier to do. Right now, the free version of ChatGPT does a good job of revising away grammar and style mistakes and errors, so maybe instead of me making marginal comments on a draft about these issues, students can first try using the AI to help them do this kind of low-level revision before they turn it in. That, combined with a detailed end comment from the AI, might actually work well. I’m not quite sure if this would actually save me any time since it seems like setting up the AI to do this would take a lot of time, and I have a feeling I’d have to set up the AI agent for every unique assignment. Plus, and in addition to the time it would take to set up, this would cost me $20 a month.

Maybe for next semester….

So, What About AI Now? (A talk and an update)

A couple of weeks ago, I gave a talk/led a discussion called “So, What About AI Now?” That’s a link to my slides. The talk/discussion was for a faculty development program at Washtenaw Community College, a program organized by my friend, colleague, and former student, Hava Levitt-Phillips.

I covered some of the territory I’ve been writing about here for a while now and I thought both the talk and discussion went well. I think most of the people at this thing (it was over Zoom, so it was a little hard to read the room) had seen enough stories like this one on 60 Minutes the other night: Artificial Intelligence is going to at least be as transformative of a technology as “the internet,” and there is not a zero percent chance that it could end civilization as we know it. All of which is to say we probably need to put the dangers of a few college kids using AI (badly) to cheat on poorly designed assignments into perspective.

I also talked about how we really need to question some of the more dubious claims in the MSM about the powers of AI, such as the article in the Chronicle of Higher Education this past summer, “GPT-4 Can Already Pass Freshman Year at Harvard.” I blogged about that nonsense a couple months ago here, but the gist of what I wrote there is that all of these claims of AI being able to pass all these tests and freshman year at Harvard (etc.) are wrong. Besides the fact that the way a lot of these tests are run makes the claims bogus (and that is definitely the case with this CHE piece), students in our classes still need to show up– and I mean that for both f2f and online courses.

And as we talked about at this session, if a teacher gives students some kind of assignment (an essay, an exam, whatever) that can be successfully completed without ever attending class, then that’s a bad assignment.

So the sense that I got from this group– folks teaching right now the kinds of classes where (according to a lot of the nonsense that’s been in MSM for months) the cheating with ChatGPT et al was going to just make it impossible to assign writing anymore, not in college and not in high school– is that it hasn’t been that big of a deal. Sure, a few folks talked about students who tried to cheat with AI who were easily caught, but for the most part it hadn’t been much of a problem. The faculty in this group seemed more interested in trying to figure out a way to make use of AI in their teaching than they were in worrying about cheating.

I’m not trying to suggest there’s no reason to worry about what AI means for the future of… well, everything, including education. Any of us who are “knowledge workers”– that is, teachers, professors, lawyers, scientists, doctors, accountants, etc. etc.– need to pay attention to AI because there’s no question this shit is going to change the way we do our jobs. But my sense from this group (and just the general vibe I get on campus and in social media) is that the freak-out about AI is over, which is good.

One last thing though:  just the other day (long after this talk), I saw what I believe to be my first case of a student trying to cheat with ChatGPT– sort of. I don’t want to go into too many details since this is a student in one of my classes right now. But basically, this student (who is struggling quite a bit) turned in a piece of writing that was first and foremost not the assignment I gave, and it also just happened this person used ChatGPT to generate a lot of the text. So as we met to talk about what the actual assignment was and how this student needed to do it again, etc., I also started asking about what they turned in.

“Did you actually write this?” I asked. “This kind of seems like ChatGPT or something.”

“Well, I did use it for some of it, yes.”

“But you didn’t actually read this book ChatGPT is citing here, did you?”

“Well, no…”

And so forth.  Once again, a good reminder that students who resort to cheating with things like AI are far from criminal masterminds.

No, an AI could not pass “freshman year” in college

I am fond of the phrase/quote/mantra/cliché “Ninety percent of success in life is just showing up,” which is usually attributed to Woody Allen. I don’t know if Woody was “the first” person to make this observation (probably not, and I’d prefer if it was someone else), but in my experience, this is very true.

This is why AIs can’t actually pass a college course or their freshman year or law school or whatever: they can’t show up. And it’s going to stay that way, at least until we’re dealing with advanced AI robots.

This is on my mind because my friend and colleague in the field, Seth Kahn, posted the other day on Facebook about this recent article from The Chronicle of Higher Education by Maya Bodnick, “GPT-4 Can Already Pass Freshman Year at Harvard.” (Bodnick is an undergraduate student at Harvard). It is yet another piece claiming that the AI is smart enough to do just fine on its own at one of the most prestigious universities in the world.

I agreed with all the other comments I saw on Seth’s post. In my comment (which I wrote before I actually read this CHE article), I repeated three points I’ve written about here or on social media before. First, ChatGPT and similar AIs can’t evaluate and cite academic research at even the modest levels I expect in a first year writing class. Second, while OpenAI proudly lists all the “simulated exams” where ChatGPT has excelled (LSAT, SAT, GRE, AP Art History, etc.), you have to click the “show more exams” button on that page to see that none of the versions of their AI has managed better than a “2” on the AP English Language (and also Literature) and Composition exams. It takes a “3” on this exam to get any credit at EMU, and probably a “4” at a lot of other universities.

Third, I think mainstream media and all the rest of us really need to question these claims of AIs passing whatever tests and classes and whatnot much MUCH more carefully than I think most of us have to date. What I was thinking about when I made that last comment was another article published in CHE in early July, “A Study Found That AI Could Ace MIT. Three MIT Students Beg to Differ,” by Tom Bartlett. In this article, Bartlett discusses a study (which I don’t completely understand because it’s too much math and details) conducted by 3 MIT students (class of 2024) who researched the claim that an AI could “ace” MIT classes. The students determined this was bullshit. What were the students’ findings (at least the ones I could understand)? In some of the classes where the AI supposedly had a perfect score, the exams include unsolvable problems, so it’s not even possible to get a perfect score. In other examples, the exam questions the AI supposedly answered correctly did not provide enough information for that to be possible either. The students posted their results online and at least some of the MIT professors who originally made the claims agreed and backtracked.

But then I read this Bodnick article, and holy-moly, this is even more bullshitty than I originally thought. Let me quote at length Bodnick describing her “methodology”:

Three weeks ago, I asked seven Harvard professors and teaching assistants to grade essays written by GPT-4 in response to a prompt assigned in their class. Most of these essays were major assignments which counted for about one-quarter to one-third of students’ grades in the class. (I’ve listed the professors or preceptors for all of these classes, but some of the essays were graded by TAs.)

Here are the prompts with links to the essays, the names of instructors, and the grades each essay received:

  • Microeconomics and Macroeconomics (Jason Furman and David Laibson): Explain an economic concept creatively. (300-500 words for Micro and 800-1000 for Macro). Grade: A-
  • Latin American Politics (Steven Levitsky): What has caused the many presidential crises in Latin America in recent decades? (5-7 pages) Grade: B-
  • The American Presidency (Roger Porter): Pick a modern president and identify his three greatest successes and three greatest failures. (6-8 pages) Grade: A
  • Conflict Resolution (Daniel Shapiro): Describe a conflict in your life and give recommendations for how to negotiate it. (7-9 pages). Grade: A
  • Intermediate Spanish (Adriana Gutiérrez): Write a letter to activist Rigoberta Menchú. (550-600 words) Grade: B
  • Freshman Seminar on Proust (Virginie Greene): Close read a passage from In Search of Lost Time. (3-4 pages) Grade: Pass

I told these instructors that each essay might have been written by me or the AI in order to minimize response bias, although in fact they were all written by GPT-4, the recently updated version of the chatbot from OpenAI.

In order to generate these essays, I inputted the prompts (which were much more detailed than the summaries above) word for word into GPT-4. I submitted exactly the text GPT-4 produced, except that I asked the AI to expand on a couple of its ideas and sequenced its responses in order to meet the word count (GPT-4 only writes about 750 words at a time). Finally, I told the professors and TAs to grade these essays normally, except to ignore citations, which I didn’t include.

Not only can GPT-4 pass a typical social science and humanities-focused freshman year at Harvard, but it can get pretty good grades. As shown in the list above, GPT-4 got all A’s and B’s and one Pass.

JFC. Okay, let’s just think about this for a second:

  • We’re talking about three “essays” that are less than 1000 words and another three that are slightly longer, and based on this work alone, GPT-4 “passed” a year of college at Harvard. That’s all it takes. Really; really?! That’s it?
  • I would like to know more about what Bodnick means when she says that the writing prompts were “much more detailed than the summaries above” because those details matter a lot. But as summarized, these are terrible assignments. They aren’t connected with the context of the class or anything else.  It would be easy to try to answer any of these questions with a minimal amount of Google searching and making educated guesses. I might be going out on a limb here, but I don’t think most writing assignments at Harvard or any other college– even badly assigned ones– are as simplistic as these.
  • It wasn’t just ChatGPT: she had to do some significant editing to put together ChatGPT’s short responses into longer essays. I don’t think the AI could have done that on its own. Unless it hired a tutor.
  • Asking instructors to not pay any attention to the lack of citation (and I am going to guess the need for sources to back up claims in the writing) is giving the AI way WAAAAYYY too much credit, especially since ChatGPT (and other AIs) usually make shit up (“hallucinate”) when citing evidence. I’m going to guess that even at Harvard, handing in hallucinations would result in a failing grade. And if the assignment required properly cited sources and the student didn’t do that, then that student would also probably fail.
  • It’s interesting (and Bodnick points this out too) that the texts that received the lowest grades are ones that ask students to “analyze” or to provide their opinions/thoughts, as opposed to assignments that were asking for an “information dump.” Again, I’m going to guess that, even at Harvard, there is a higher value placed on students demonstrating with their writing that they thought about something.

I could go on, but you get the idea. This article is nonsense. It proves literally nothing.

But I also want to return to where I started, the idea that a lot of what it means to succeed in anything (perhaps especially education) is showing up and doing the work. Because after what seems like the zillionth click-bait headline about how ChatGPT could graduate from college or be a lawyer or whatever because it passed a test (supposedly), it finally dawned on me what has been bothering me the most about these kinds of articles: that’s just not how it works! To be a college graduate or a lawyer or damn near anything else takes more than passing a test; it takes the work of showing up.

Granted, there has been a lot more interest and willingness in the last few decades to consider “life experience” credit as part of degrees, and some of these places are kind of legitimate institutions– Southern New Hampshire and the University of Phoenix immediately come to mind. But “life experience” credit is still considered mostly bullshit and the approach taken by a whole lot of diploma mills, and real online universities (like SNHU and Phoenix) still require students to mostly take actual courses, and that requires doing more than writing a couple papers and/or taking a couple of tests.

And sure, it is possible to become a lawyer in California, Vermont, Virginia and Washington without a law degree, and it is also possible to become a lawyer in New York or Maine with just a couple years of law school or an internship. But even these states still require some kind of experience with a law office, most states do require attorneys to have law degrees, and it’s not exactly easy to pass the bar without the experience you get from earning a law degree. Ask Kim Kardashian. 

Bodnick did not ask any of the faculty who evaluated her AI writing examples if it would be possible for a student to pass that professor’s class based solely on this writing sample because she already knew the answer: of course not.

Part of the grade in the courses I teach is based on attendance, participation in the class discussions and peer review, short responses to readings, and so forth. I think this is pretty standard– at least in the humanities. So if some eager ChatGPT enthusiast came to one of my classes– especially one like first year writing, where I post all of the assignments at the beginning of the semester (mainly because I’ve taught this course at least 100 times at this point)– and said to me “Hey Krause, I finished and handed in all the assignments! Does that mean I get an A and go home now?” Um, NO! THAT IS NOT HOW IT WORKS! And of course anyone familiar with how school works knows this.

Oh, and before anyone says “yeah, but what about in an online class?” Same thing! Most of the folks I know who teach online have a structure where students have to regularly participate and interact with assignments, discussions, and so forth. My attendance and participation policies for online courses are only slightly different from my f2f courses.

So please, CHE and MSM in general: stop. Just stop. ChatGPT can (sort of) pass a lot of tests and classes (with A LOT of prompting from the researchers who really really want ChatGPT to pass), but until that AI robot walks/rolls into a class or sets up its profile on Canvas all on its own, it can’t go to college.

Computers and Writing 2023: Some Miscellaneous Thoughts

Last week, I attended and presented at the 2023 Computers and Writing Conference at the University of California-Davis. Here’s a link to my talk, “What Does ‘Teaching Online’ Even Mean Anymore?” Some thoughts as they occur to me/as I look at my notes:

  • The first academic conference I ever attended and presented at was Computers and Writing almost 30 years ago, in 1994. Old-timers may recall that this was the 10th C&W conference, it was held at the University of Missouri, and it was hosted by Eric Crump. I just did a search and came across this article/review written by the late Michael “Mick” Doherty about the event. All of which is to say I am old.
  • This was the first academic conference I attended in person since Covid; I think that was the case for a lot of attendees.
  • Also worth noting right off the top here: I have had a bad attitude about academic conferences for about 10 years now, and my attitude has only gotten worse. And look, I know, it’s not you, it’s me. My problem with these things is they are getting more and more expensive, most of the people I used to hang out with at conferences have mostly stopped going themselves for whatever reason, and for me, the overall “return on investment” now is pretty low. I mean, when I was a grad student and then a just starting out assistant professor, conferences were extremely important to me. They furthered my education in both subtle and obvious ways, they connected me to lots of other people in the field, and conferences gave me the chance to do scholarship that I could also list on my CV. I used to get a lot out of these events. Now? Well, after (almost) 30 years, things start to sound a little repetitive and the value of yet another conference presentation on my CV is almost zero, especially since I am at a point where I can envision retirement (albeit 10-15 years from now). Like I said, it’s not you, it’s me, but I also know there are plenty of people in my cohort who recognize and even perhaps share a similarly bad attitude.
  • So, why did I go? Well, a big part of it was because I hadn’t been to any conference in about four years– easily the longest stretch of not going in almost 30 years. Also, I had assumed I would be talking in more detail about the interviews I conducted about faculty teaching experiences during Covid, and also about the next phases of research I would be working on during a research release or a sabbatical in 2024. Well, that didn’t work out, as I wrote about here, which inevitably changed my talk to being a “big picture” summary of my findings and an explanation of why I was done.
  • This conference has never been that big, and this year, it was a more “intimate” affair. If a more normal or “robustly” attended C&W gets about 400-500 people to attend (and I honestly don’t know what the average attendance has been at this thing), then I’d guess there were about 200-250 folks there. I saw a lot of the “usual suspects” of course, and also met some new people.
  • The organizers– Carl Whithaus, Kory Lawson Ching, and some other great people at UC-Davis– put a big emphasis on trying to make the hybrid delivery of panels work. So there were completely on-site panels, completely online (but on the schedule) panels held over Zoom, and hybrid panels which were a mix of participants on-site and online. There was also a small group of completely asynchronous panels as well. Now, this arrangement wasn’t perfect, both because of the inevitable technical glitches and also because there’s no getting around the fact that Zoom interactions are simply not equal to robust face to face interactions, especially for an event like a conference. This was a topic of discussion in the opening town hall meeting, actually.
  • That said, I think it all worked reasonably well. I went to two panels where there was one presenter participating via Zoom (John Gallagher in both presentations, actually) and that went off without (much of a) hitch, and I also attended at least part of a session where all the presenters were on Zoom– and a lot of the audience was on-site.
  • Oh, and speaking of the technology: They used a content management system specifically designed for conferences called Whova that worked pretty well. It’s really for business/professional kinds of conferences so there were some slight disconnects, and I was told by one of the organizers that they found out (after they had committed to using it!) that unlimited storage capacity would have been much more expensive. So they did what C&W folks do well: they improvised, and set up Google Drive folders for every session.
  • My presentation matched up well to my co-presenters, Rich Rice and Jenny Sheppard, in that we were all talking about different aspects of online teaching during Covid– and with no planning on our parts at all! Actually, all the presentations I saw– and I went to more than usual, both the keynotes, one and a half town halls, and four and a half panels– were really quite good.
  • Needless to say, there was a lot of AI and ChatGPT discussion at this thing, even though the overall theme was on hybrid practices. That’s okay– I am pretty sure that AI is just going to become a bigger issue in the larger field and academia as a whole in the next couple of years, and it might stay that way for the rest of my career. Most of what people talked about were essentially more detailed versions of stuff I already (sort of) knew about, and that was reassuring to me. There were a lot of folks who seemed mighty worried about AI, both in the sense of students using it to cheat and also the larger implications of it on society as a whole. Some of the big picture/ethical concerns may have been more amplified here because there were a lot of relatively local participants of course, and Silicon Valley and the Bay Area are more or less at “ground zero” for all things AI. I don’t disagree with the larger social and ethical implications of AI, but these are also things that seem completely out of all of our control in so many different ways.
  • For example, in the second town hall about AI (I arrived late to that one, unfortunately), someone in the audience had one of those impassioned “speech/questions” about how “we” needed to come up with a statement on the problems/dangers/ethical issues about AI. Well, I don’t think there’s a lot of consensus in the field about what we should do about AI at this point. But more importantly and as Wendi Sierra pointed out (she was on the panel, and she is also going to be hosting C&W at Texas Christian University in 2024), there is no “we” here. Computers and Writing is not an organization at all and our abilities to persuade are probably limited to our own institutions. Of course, I have always thought that this was one of the main problems with the Computers and Writing Conference and Community: there is no there there.
  • But hey, let me be clear– I thought this conference was great, one of the best versions of C&W I’ve been to, no question about it. It’s a great campus with some interesting quirks, and everything seemed to go off right on schedule and without any glitches at all.
  • Of course, the conference itself was the main reason I went– but it wasn’t the only reason.  I mean, if this had been in, say, Little Rock or Baton Rouge or some other place I would prefer not to visit again or ever, I probably would have sat this out. But I went to C&W when it was at UC-Davis back in 2009 and I had a great time, so going back there seemed like it’d be fun. And it was– though it was a different kind of fun, I suppose. I enjoyed catching up with a lot of folks I’ve known for years at this thing and I also enjoyed meeting some new people too, but it also got to be a little too, um, “much.” I felt a little like an overstimulated toddler after a while. A lot of it is Covid of course, but a lot of it is also what has made me sour on conferences: I don’t have as many good friends at these things anymore– that is, the kind of people I want to hang around with a lot– and I’m also just older. So I embraced opting out of the social events, skipping the banquet or any kind of meet-up with a group at a bar or bowling or whatever, and I played it as a solo vacation. That meant walking around Davis (a lively college town with a lot of similarities to Ann Arbor), eating at the bar at a couple of nice restaurants, and going back to my lovely hotel room and watching things that I know Annette had no interest in watching with me (she did the same back home and at the conference she went to the week before mine). On Sunday, I spent the day as a tourist: I drove through Napa, over to Sonoma Coast Park, and then back down through San Francisco to the airport. It’s not something I would have done on my own without the conference, but like I said, I wouldn’t have gone to the conference if I couldn’t have done something like this on my own for a day.

What Counts as Cheating? And What Does AI Smell Like?

Cheating is at the heart of the fear too many academics have about ChatGPT, and I’ve seen a lot of hand-wringing articles from MSM posted on Facebook and Twitter. One of the more provocative screeds on this I’ve seen lately was in the Chronicle of Higher Education, “ChatGPT is a Plagiarism Machine” by Joseph M. Keegin. In the nutshell, I think this guy is unhinged, but he’s also not alone.

Keegin claims he and his fellow graduate student instructors (he’s a PhD candidate in Philosophy at Tulane) are encountering loads of student work that “smelled strongly of AI generation,” and he and some of his peers have resorted to giving in-class handwritten tests and oral exams to stop the AI cheating. “But even then,” Keegin writes, “much of the work produced in class had a vague, airy, Wikipedia-lite quality that raised suspicions that students were memorizing and regurgitating the inaccurate answers generated by ChatGPT.”

(I cannot help but to recall one of the great lines from [the now problematically icky] Woody Allen in Annie Hall: “I was thrown out of college for cheating on a metaphysics exam; I looked into the soul of the boy sitting next to me.” But I digress.)

If Keegin is exaggerating in order to rattle readers and get some attention, then mission accomplished. But if he’s being sincere– that is, if he really believes his students are cheating everywhere on everything all the time and the way they’re cheating is by memorizing and then rewriting ChatGPT responses to Keegin’s in-class writing prompts– then these are the sort of delusions which should be discussed with a well-trained and experienced therapist. I’m not even kidding about that.

Now, I’m not saying that cheating is nothing to worry about at all, and if a student were to turn in whatever ChatGPT provided for a class assignment with no alterations, then a) yes, I think that’s cheating, but b) that’s the kind of cheating that’s easy to catch, and c) Google is a much more useful cheating tool for this kind of thing. Keegin is clearly wrong about ChatGPT being a “Plagiarism Machine” and I’ve written many many many different times about why I am certain of this. But what I am interested in here is what Keegin thinks does and doesn’t count as cheating.

The main argument he’s trying to make in this article is that administrators need to step in to stop this never-ending battle against ChatGPT plagiarism. Universities should “devise a set of standards for identifying and responding to AI plagiarism. Consider simplifying the procedure for reporting academic-integrity issues; research AI-detection services and software, find one that works best for your institution, and make sure all paper-grading faculty have access and know how to use it.”

Keegin doesn’t define what he means by cheating (though he does give some examples that don’t actually seem like cheating to me), but I think we can figure it out by reading what he means by a “meaningful education.” He writes (I’ve added the emphasis) “A meaningful education demands doing work for oneself and owning the product of one’s labor, good or bad. The passing off of someone else’s work as one’s own has always been one of the greatest threats to the educational enterprise. The transformation of institutions of higher education into institutions of higher credentialism means that for many students, the only thing dissuading them from plagiarism or exam-copying is the threat of punishment.”

So, I think Keegin sees education as an activity where students labor alone at mastering the material delivered by the instructor. Knowledge is not something shared or communal, and it certainly isn’t created through interactions with others. Rather, students receive knowledge, do the work they are asked to do by the instructor, they do that work alone, and then students reproduce that knowledge investment provided by the instructor– with interest. So any work a student might do that involves anyone or anything else– other students, a tutor, a friend, a google search, and yes ChatGPT– is an opportunity for cheating.

More or less, this is what Paulo Freire meant by the ineffective and unjust “banking model of education” which he wrote about over 50 years ago in Pedagogy of the Oppressed. Freire’s work remains very important in many fields specifically interested in pedagogy (including writing studies), and Pedagogy of the Oppressed is one of the most cited books in the social sciences. And yet, I think a lot of people in higher education– especially in STEM fields, business-oriented and other technical majors, and also in disciplines in the humanities that have not been particularly invested in pedagogy (philosophy, for example)– are okay with this system. These folks think education really is a lot like banking and “investing,” and they don’t see any problem with that metaphor. And if that’s your view of education, then getting help from anyone or anything that is not from the teacher is metaphorically like robbing a bank.

But I think it’s odd that Keegin is also upset with “credentialing” in higher education. That’s a common enough complaint, I suppose, especially when we talk about the problems with grading. But if we were to do away with degrees and grades as an indication of successful learning (or at least completion) and if we instead decided students should learn solely for the intrinsic value of learning, then why would it even matter if students cheated or not? That’d be completely their problem. (And btw, if universities did not offer credentials that have financial, social, and cultural value in the larger society, then universities would cease to exist– but that’s a different post).

Perhaps Keegin might say “I don’t have a problem with students seeking help from other people in the writing center or whatever. I have a problem with students seeking help from an AI.” I think that’s probably true with a lot of faculty. Even when professors have qualms about students getting a little too much help from a tutor, they still generally do see the value and usually encourage students to take advantage of support services, especially for students at the gen-ed levels.

But again, why is that different? If a student asks another human for help brainstorming a topic for an assignment, suggesting some ideas for research, creating an outline, suggesting some phrases to use, and/or helping out with proofreading, citation, and formatting, how is that not cheating when this help comes from a human but it is cheating when it comes from ChatGPT? And suppose a student instead turns to the internet and consults things like CliffsNotes, Wikipedia, Course Hero, other summaries and study guides, etc. etc.; is that cheating?

I could go on, but you get the idea. Again, I’m not saying that cheating in general and with ChatGPT in particular is nothing at all to worry about. And also to be fair to Keegin, he even admits “Some departments may choose to take a more optimistic approach to AI chatbots, insisting they can be helpful as a student research tool if used right.” But the more of these paranoid and shrill commentaries I read about “THE END” of writing assignments and how we have got to come up with harsh punishments for students so they stop using AI, the more I think these folks are just scared that they’re not going to be able to give students the same bullshitty non-teaching writing assignments that they’ve been doing for years.

My Talk About AI at Hope College (or why I still post things on a blog)

I gave a talk at Hope College last week about AI. Here’s a link to my slides, which also has all my notes and links. Right after I got invited to do this in January, I made it clear that I am far from an expert with AI. I’m just someone who had an AI writing assignment last fall (which was mostly based on previous teaching experiments by others), who has done a lot of reading and talking about it on Facebook/Twitter, and who blogged about it in December. So as I promised then, my angle was to stay in my lane and focus on how AI might impact the teaching of writing.

I think the talk went reasonably well. Over the last few months, I’ve watched parts of a couple of different ChatGPT/AI presentations via Zoom or as previously recorded, and my own take-away from them all has been a mix of “yep, I know that and I agree with you” and “oh, I didn’t know that, that’s cool.” That’s what this felt like to me: I talked about a lot of things that most of the folks attending knew about and agreed with, along with a few things that were new to them. And vice versa: I learned a lot too. It probably would have been a little more contentious had this taken place back when the freakout over ChatGPT was in full force. Maybe there still are some folks there who are freaked out by AI and cheating who didn’t show up. Instead, most of the people there had played around with the software and realized that it’s not quite the “cheating machine” being overhyped in the media. So it was a good conversation.

But that’s not really what I wanted to write about right now. Rather, I just wanted to point out that this is why I continue to post here, on a blog/this site, which I have maintained now for almost 20 years. Every once in a while, something I post “lands,” so to speak.

So for example: I posted about teaching a writing assignment involving AI at about the same time MSM was freaking out about ChatGPT. Some folks at Hope read that post (which has now been viewed over 3000 times), and they invited me to give this talk. Back in fall 2020, I blogged about how weird I thought it was that all of these people were going to teach online synchronously over Zoom. Someone involved with the Media & Learning Association, which is a European/Belgian organization, read it, invited me to write a short article based on that post and they also invited me to be on a Zoom panel that was a part of a conference they were having. And of course all of this was the beginning of the research and writing I’ve been doing about teaching online during Covid.

Back in April 2020, I wrote a post “No One Should Fail a Class Because of a Fucking Pandemic;” so far, it’s gotten over 10,000 views, it’s been quoted in a variety of places, and it was why I was interviewed by someone at CHE in the fall. (BTW, I think I’m going to write an update to that post, which will be about why it’s time to return to some pre-Covid requirements). I started blogging about MOOCs in 2012, which led to a short article in College Composition and Communication and numerous more articles and presentations, a few invited speaking gigs (including TWO conferences sponsored by the University of Naples on the Isle of Capri), an edited collection and a book.

Now, most of the people I know in the field who once blogged have stopped (or mostly stopped) for one reason or another. I certainly do not post here nearly as often as I did before the arrival of Facebook and Twitter, and it makes sense for people to move on to other things. I’ve thought about giving it up, and there have been times where I didn’t post anything for months. Even the extremely prolific and smart local blogger Mark Maynard gave it all up, I suspect because of a combination of burn-out, Trump being voted out, and the additional work/responsibility of the excellent restaurant he co-owns/operates, Bellflower.

Plus if you do a search for “academic blogging is bad,” you’ll find all sorts of warnings about the dangers of it– all back in the day, of course. Deborah Brandt seemed to think it was mostly a bad idea (2014), and The Guardian suggested it was too risky (2013), especially for grad students posting work in progress. There were lots of warnings like this back then. None of them ever made any sense to me, though I didn’t start blogging until after I was on the tenure-track here. And no one at EMU has ever said anything negative to me about doing this, and that includes administrators even back in the old days of EMUTalk.

Anyway, I guess I’m just reflecting/musing now about why this very old-timey practice from the olde days of the Intertubes still matters, at least to me. About 95% of the posts I’ve written are barely read or noticed at all, and that’s fine. But every once in a while, I’ll post something, promote it a bit on social media, and it catches on. And then sometimes, a post becomes something else– an invited talk, a conference presentation, an article. So yeah, it’s still worth it.

Is AI Going to be “Something” or “Everything?”

Way back in January, I applied for release time from teaching for one semester next year– either a sabbatical or what’s called here a “faculty research fellowship” (FRF)– in order to continue the research I’ve been doing about teaching online during Covid. This is work I’ve been doing since fall 2020, including a Zoom talk at a conference in Europe, a survey I ran for about six months, and from that survey, I was able to recruit and interview a bunch of faculty about their experiences. I’ve gotten a lot out of this work already: a couple conference presentations (albeit in the kind of useless “online/on-demand” format), a website article (which I had to code myself!), and, just last year, I was on one of those FRFs.

Well, a couple weeks ago, I found out that I will not be on sabbatical or FRF next year. My proposal, which was about seeking time to code and analyze all of the interview transcripts I collected last year, got turned down. I am not complaining about that: these awards are competitive, and I’ve been fortunate enough to receive several of these before, including one for this research. But not getting release time is making me rethink how much I want to continue this work, or if it is time for something else.

I think studying how Covid impacted faculty attitudes about online courses is definitely something important worth doing. But it is also looking backwards, and it feels a bit like an autopsy or one of those commissioned reports. And let’s be honest: how many of us want to think deeply about what happened during the pandemic, recalling the mistakes that everyone already knows they made? A couple years after the worst of it, I think we all have a better understanding now why people wanted to forget the 1918 pandemic.

It’s 20/20 hindsight, but I should have put together a sabbatical/research leave proposal about AI. With good reason, the committee that decides on these release time awards tends to favor proposals that are for things that are “cutting edge.” They also like to fund releases for faculty who have book contracts who are finishing things up, which is why I have been lucky enough to secure these awards both at the beginning and end of my MOOC research.

I’ve obviously been blogging about AI a lot lately, and I have casually started amassing quite a number of links to news stories and other resources related to Artificial Intelligence in general, ChatGPT and OpenAI in particular. As I type this entry in April 2023, I already have over 150 different links to things without even trying– I mean, this is all stuff that just shows up in my regular diet of social media and news. I even have a small invited speaking gig about writing and AI, which came about because of a blog post I wrote back in December— more on that in a future post, I’m sure.

But when it comes to me pursuing AI as my next “something” to research, I feel like I have two problems. First, it might already be too late for me to catch up. Sure, I’ve been getting some attention by blogging about it, and I had a “writing with GPT-3” assignment in a class I taught last fall, which I guess kind of puts me at least closer to being current with this stuff in terms of writing studies. But I also know there are already folks in the field (and I know some of these people quite well) who have been working on this for years longer than me.

Plus a ton of folks are clearly rushing into AI research at full speed. Just the other day, the CWCON at Davis organizers sent around a draft of the program for the conference in June. The Call For Proposals they released last summer describes the theme of this year’s event, “hybrid practices of engagement and equity.” I skimmed the program to get an idea of the overall schedule and some of what people were going to talk about, and there were a lot of mentions of ChatGPT and AI, which makes me think a lot of people are likely to be not talking about the CFP theme at all.

This brings me to the bigger problem I see with researching and writing about AI: it looks to me like this stuff is moving very quickly from being “something” to “everything.” Here’s what I mean:

A research agenda/focus needs to be “something” that has some boundaries. MOOCs were a good example of this. MOOCs were definitely “hot” from around 2012 to 2015 or so, and there was a moment back then when folks in comp/rhet thought we were all going to be dealing with MOOCs for first year writing. But even then, MOOCs were just a “something”  in the sense that you could be a perfectly successful writing studies scholar (even someone specializing in writing and technology) and completely ignore MOOCs.

Right now, AI is a myriad of “somethings,” but this is moving very quickly toward “everything.” It feels to me like very soon (five years, tops), anyone who wants to do scholarship in writing studies is going to have to engage with AI. Successful (and even mediocre) scholars in writing studies (especially someone specializing in writing and technology) are not going to be able to ignore AI.

This all reminds me a bit of what happened with word processing technology. Yes, this really was something people studied and debated way back when. In the 1980s and early 1990s, there were hundreds of articles and presentations about whether or not to use word processing to teach writing– for example, “The Word Processor as an Instructional Tool: A Meta-Analysis of Word Processing in Writing Instruction” by Robert L. Bangert-Drowns, or “The Effects of Word Processing on Students’ Writing Quality and Revision Strategies” by Ronald D. Owston, Sharon Murphy, and Herbert H. Wideman. These articles were both published in the early 1990s and in major journals, and both are trying to answer the question of which one is “better.” (By the way, most but far from all of these studies concluded that word processing is better in the sense it helped students generate more text and revise more frequently. It’s also worth mentioning that a lot of this research overlaps with studies about the role of spell-checking and grammar-checking with writing pedagogy).

Yet in my recollection of those times, this comparison between word processing and writing by hand was rendered irrelevant because everyone– teachers, students, professional writers (at least all but the most stubborn, as Wendell Berry declares in his now cringy and hopelessly dated short essay “Why I Am Not Going to Buy a Computer”)– switched to word processing software on computers to write. When I started teaching as a grad student in 1988, I required students to hand in typed papers and I strongly encouraged them to write at least one of their essays with a word processing program. Some students complained because they were never asked to type anything in high school. By the time I started my PhD program five years later in 1993, students all knew they needed to type their essays on a computer and generally with MS Word.

Was this shift a result of some research consensus that using a computer to type texts was better than writing texts out by hand? Not really, and obviously, there are lots of reasons why people still write some things by hand– a lot of personal writing (poems, diaries, stories, that kind of thing) and a lot of note-taking. No, everyone switched because everyone realized word processing made writing easier (but not necessarily better) in lots and lots of different ways and that was that. Even in the midst of this panicky moment about plagiarism and AI, I have yet to read anyone seriously suggest that we make our students give up Word or Google Docs and require them to turn in handwritten assignments. So, as a researchable “something,” word processing disappeared because (of course) everyone everywhere who writes obviously uses some version of word processing, which means the issue is settled.

One of the other reasons why I’m using word processing scholarship as my example here is because both Microsoft and Google have made it clear that they plan on integrating their versions of AI into their suites of software– and that would include MS Word and Google Docs. This could be rolling out just in time for the start of the fall 2023 semester, maybe earlier. Assuming this is the case, people who teach any kind of writing at any kind of level are not going to have time to debate if AI tools will be “good” or “bad,” and we’re not going to be able to study any sorts of best practices either. This stuff is just going to be a part of the everything, and for better or worse, that means the issue will soon be settled.

And honestly, I think the “everything” of AI is going to impact, well, everything. It feels to me a lot like when “the internet” (particularly with the arrival of web browsers like Mosaic in 1993) became everything. I think the shift to AI is going to be that big, and it’s going to have as big of an impact on every aspect of our professional and technical lives– certainly every aspect that involves computers.

Who the hell knows how this is all going to turn out, but when it comes to what this means for the teaching of writing, as I’ve said before, I’m optimistic. Just as the field adjusted to word processing (and spell-checkers and grammar-checkers, and really just the whole firehose of text from the internet), I think we’ll be able to adjust to this new something to everything too.

As far as my scholarship goes though: for reasons, I won’t be eligible for another release from teaching until the 2025-26 school year. I’m sure I’ll keep blogging about AI and related issues and maybe that will turn into a scholarly project. Or maybe we’ll all be on to something entirely different in three years….

 

What Would an AI Grading App Look Like?

While a whole lot of people (academics and non-academics alike) have been losing their minds lately about the potential of students using ChatGPT to cheat on their writing assignments, I haven’t read/heard/seen much about the potential of teachers using AI software to read, grade, and comment on student writing. Maybe it’s out there in the firehose stream of stories about AI I see every day (I’m trying to keep up a list on pinboard) and I’ve just missed it.

I’ve searched and found some discussion of using ChatGPT to grade on Reddit (here and here), and I’ve seen other posts about how teachers might use the software to do things other than grading, but that’s about it. In fact, the reason I’m thinking about this again now is not because of another AI story but because I watched a South Park episode about AI called “Deep Learning.” South Park has been a pretty uneven show for several years, but if you are a fan and/or if you’re interested in AI, this is a must-see. A lot happens in this episode, but my favorite reaction about ChatGPT comes from the kids’ infamous teacher, Mr. Garrison. While Mr. Garrison is complaining about grading a stack of long and complicated essays (which the students completed with ChatGPT), Rick (Garrison’s boyfriend) tells him about ChatGPT, and Mr. Garrison has far too honest of a reaction: “This is gonna be amazing! I can use it to grade all my papers and no one will ever know! I’ll just type the title of the essay in, it’ll generate a comment, and I don’t even have to read the stupid thing!”

Of course, even Mr. Garrison knows that would be “wrong” and he must keep this a secret. That probably explains why I still haven’t come across much about an AI grading app. But really though: shouldn’t we be having this discussion? Doesn’t Mr. Garrison have a point?

Teacher concerns about grading/scoring writing with computers are not new, and one of the nice things about having kept a blog so long is I can search and “recall” some of these past discussions. Back in 2005, I had a post about NCTE coming out against the SAT writing test and machine scoring of those tests. There was also a link in that post to an article about a sociologist at the University of Missouri named Edward Brent who had developed a way of giving students feedback on their writing assignments. I couldn’t find the original article, but this one from the BBC in 2005 covers the same story. It seems like it was a tool developed very specifically for the content of Brent’s courses and I’m guessing it was quite crude by today’s standards. I do think Brent makes a good point on the value of these kinds of tools: “It makes our job more interesting because we don’t have to deal so much with the facts and concentrate more on thinking.”

About a decade ago, I also had a couple of other posts about machine grading, both of which were posts that grew out of discussions from the now mostly defunct WPA-L. There was this one from 2012, which included a link to a New York Times article about Educational Testing Service’s product “e-rater,” “Facing a Robo-Grader? Just Keep Obfuscating Mellifluously.” The article features Les Perelman, who was the director of writing at MIT, demonstrating ways to fool e-rater with nonsense and inaccuracies. At the time, I thought Perelman was correct, but also a good argument could be made that if a student was smart enough to fool e-rater, maybe they deserved the higher score.

Then in 2013, there was another kerfuffle on WPA-L about machine grading that involved a petition drive at the website humanreaders.org against machine grading. In my post back then, I agreed with the main goal of the petition, that “Machine grading software can’t recognize things like a sense of humor or irony, it tends to favor text length over conciseness, it is fairly easy to circumvent with gibberish kinds of writing, it doesn’t work in real world settings, it fuels high stakes testing, etc., etc., etc.” But I also had some questions about all that. I made a comparison between these new tools and the initial resistance to spell checkers, and then I also wrote this:

As a teacher, my least favorite part of teaching is grading. I do not think that I am alone in that sentiment. So while I would not want to outsource my grading to someone else or to a machine (because again, I teach writing, I don’t just assign writing), I would not be against a machine that helps make grading easier. So what if a computer program provided feedback on a chunk of student writing automatically, and then I as the teacher followed behind those machine comments, deleting ones I thought were wrong or unnecessary, expanding on others I thought were useful? What if a machine printed out a report that a student writer and I could discuss in a conference? And from a WPA point of view, what if this machine helped me provide professional development support to GAs and part-timers in their commenting on students’ work?

By the way, an ironic/odd tangent about that post: the domain name humanreaders.org has clearly changed hands. In 2013, it looked like this (this link is from the Internet Archive): basically, a petition form. The current site domain humanreaders.org redirects to this page on some content farm website called we-heart.com. This page, from 2022, is a list of the “six top online college paper writing websites today.”

Anyway, let me state the obvious: I’m not suggesting an AI application for replacing all teacher feedback (as Mr. Garrison is suggesting) at all. Besides the fact that it wouldn’t be “right” no matter how you twist the ethics of it, I don’t think it would work well– yet. Grading/commenting on student writing is my least favorite part of the job, so I understand where Mr. Garrison is coming from. Unfortunately though, reading/grading/commenting on student writing is essential to teaching writing. I don’t know how I can evaluate a student’s writing without reading it, and I also don’t know how to help students think about how to revise their writing (and, hopefully, learn how to apply these lessons and advice to writing these students do beyond my class) without making comments.

However, this is A LOT of work that takes A LOT of time. I’ve certainly learned some things that make grading a bit easier than it was when I started. For example, I’ve learned that less is more: marking up every little mistake or thing in the paper and then writing a really long end comment is a waste of time because it confuses and frustrates students and it literally takes longer. But it still takes me about 15-20 minutes to read and comment on each long-ish student essay, and these essays are typically a bit shorter than this blog post. So in a full (25 students) writing class, it takes me 8-10 hours to completely read, comment on, and grade all of their essays; multiply that by two or three or more (since I’m teaching three writing classes a term), and it adds up pretty quickly. Plus we’re talking about student writing here. I don’t mind reading it and students often have interesting and inspiring observations, but by definition, these are writers who are still learning and who often have a lot to learn. So this isn’t like reading The New Yorker or a long novel or something you can get “lost” in as a reader. This ain’t reading for fun– and it’s also one of the reasons why, after reading a bunch of student papers in a day, I’m much more likely to just watch TV at night.

So hypothetically, if there was a tool out there that could help me make this process faster, easier, and less unpleasant, and if this tool also helped students learn more about writing, why wouldn’t I want to use it?

I’ve experimented a bit with ChatGPT with prompts along the lines of “offer advice on how to revise and improve the following text” and then paste in a student essay. The results are a mix of (IMO) good, bad, and wrong, and mostly written in the robotic voice typical of AI writing. I think students would have a hard time sorting through these mixed messages. Plus I don’t think there’s a way (yet) for ChatGPT to comment on specific passages in a piece of student writing: that is, it can provide an overall end comment, but it cannot comment on individual sentences and paragraphs and have those comments appear in the margins like the comment feature in Word or Google Docs. As with most writing teachers, that’s a lot of the commenting I do, so an AI that can’t do that (yet) at all just isn’t that useful to me.

But the key word there is “yet,” and it does not take a tremendous amount of imagination to figure out how this could work in the near future. For example, what if I could train my own grading AI by feeding it a few classes’ worth of previous student essays with my comments? I don’t logistically know how that would work, but I am willing to bet that with enough training, a Krause-centric version of ChatGPT would anticipate most of the comments I would make myself on a student writing project. I’m sure it would be far from perfect, and I’d still want to do my own reading and evaluation. But I bet this would save me a lot of time.

Maybe, some time in the future, this will be a real app. But there’s another use of ChatGPT I’ve been playing around with lately, one I hesitate to try but one that would both help some of my struggling students and save me time on grading. I mentioned this in my first post about using ChatGPT to teach way back in December. What I’ve found in my ChatGPT noodling (so far) is if I take a piece of writing that has a ton of errors in it (incomplete sentences, punctuation in the wrong place, run-on/meandering sentences, stuff like that– all very common issues, especially for first year writing students) and prompt ChatGPT to revise the text so it is grammatically correct, it does a wonderful job. It doesn’t change the meaning or argument of the writing– just the grammar. It generally doesn’t make different word choices and it certainly doesn’t make the student’s argument “smarter”; it just arranges everything so it’s correct.

That might not seem like much, but for a lot of students who struggle with getting these basics right, using ChatGPT like this could really help. And to paraphrase Edward Brent from way back in 2005, if students could use a tool like this to at least deal with basic issues like writing more or less grammatically correct sentences, then I might be able to spend more time concentrating on the student’s analysis, argument, use of evidence, and so forth.

And yet– I don’t know, it even feels to me like a step too far.

I have students who have diagnosed learning difficulties of one sort or another who show me letters of accommodation from the campus disability resource center which specifically tell me I should allow students to use Grammarly in their writing process. I encourage students to go to the writing center all the time, in part because I want my students– especially the struggling ones– to sit down with a consultant who will help them go through their essays so they can revise and improve them. I never have a problem with students wanting to get feedback on their work from a parent or a friend who is “really good” at writing.

So why does it feel like encouraging students to try this in ChatGPT is more like cheating than it does for me to encourage students to be sure to spell check and to check out the grammar suggestions made by Google Docs? Is it too far? Maybe I’ll find out in class next week.