I’m Still Dreaming of an AI Grading Agent (and a bunch of AI things about teaching and writing)

I’m in the thick of the fall semester and I’ve been too busy to think/read/write much about AI for a while. Honestly, I’m too busy to be writing this right now, but I’ve also got a bucket full of AI tabs open on my browser, so I thought I’d procrastinate a bit with a “round up” post.

In my own classes, students seem to be either leery of or unimpressed with AI. I’ve encouraged my more advanced students to experiment with/play around with AI to help with the assignments, but absent me requiring them to do something with AI, they don’t seem too interested. I’ve talked to my first year writing students about using AI to brainstorm and to revise (and to be careful about trusting what the AI presents as “facts”), but again, they don’t seem interested. I have had at least one (and perhaps more than that) student who tried to use AI to cheat, but it was easy to spot. As I have said before, I think most students want to do the work themselves and to actually learn something, and the students who are inclined to cheat with AI (or just a Google search) are far from criminal geniuses.

That said, there is this report, “GenAI in Higher Education: Fall 2023 Update Time for Class Study,” which was research done by a consulting firm called Tyton Partners and sponsored by Turnitin. I haven’t had a chance to read beyond the executive summary, but they claim half of students are “regular users” of generative AI, though their use is “relatively unsophisticated.” Well, unless a lot of my students are not telling me the truth about using AIs, this isn’t my impression. Of course, they might be using AI stuff more for other classes.

Here’s a very local story about how AI is being used in at least one K-12 school district: “‘AI is here.’ Ypsilanti schools weigh integrity, ethics of new technology,” from MLive. Interestingly, a lot of what this story is about is how teachers are using AI to develop assignments, and also to do some things like helping students who don’t necessarily speak English as their native language:

Serving the roughly 30% of [Ypsilanti Community Schools] students who can speak a language other than English, the English Learner Department has found multiple ways to bring AI into the classroom, including helping teachers develop multilingual explanations of core concepts discussed in the curriculum — and save time doing it.

“A lot of that time saving allows us to focus more on giving that important feedback that allows students to grow an be aware of their progress and their learning,” [Teacher Connor] Laporte said.

Laporte uses an example of a Spanish-speaking intern who improved a vocabulary test by double-checking the translations and using ChatGPT to add more vocabulary words and exercises. Another intern then used ChatGPT to make a French version of the same worksheet.

A lot of the theme of this article is about how teachers have moved beyond being terrified of AI ruining everything and toward treating it as a tool to work with in teaching. That’s happening in lots of places and lots of ways; for example, as Inside Higher Ed noted, “Art Schools Get Creative Tackling AI.” It’s a long piece with a somewhat similar theme: not necessarily embracing AI, but also recognizing the need to work with it.

MLA apparently now has “rules” for how to cite AI. I guess maybe it isn’t the end of the essay then, huh? Of course, that doesn’t mean that a lot of writers are going to be happy about AI.  This one is from a while ago, but in The Atlantic back in September, Alex Reisner wrote about “These 183,000 Books are Fueling the Biggest Fight in Publishing and Tech.” Reisner had written earlier about how Meta’s AI systems were being trained on a collection of more than 191,000 books that were often used without permission. The article has a search feature so you can see if your book(s) were a part of that collection. For what it’s worth, my book and co-edited collection about MOOCs did not make the cut.

Several famous people/famous writers are now involved in various lawsuits where the writers are suing the AI companies for using their work without permission to train (“teach?”) the AIs. There’s a part of me that is more than sympathetic to these lawsuits. After all, I never thought it was fair that a company like Turnitin can use student writing without permission as part of its database for detecting plagiarism. Arguably, this is similar.

But on the other hand, OpenAI et al didn’t “copy” passages from Sarah Silverman or Margaret Atwood or my friend Dennis Danvers (he’s in that database!) and then try to present that work as something the AI wrote. Rather, they trained (taught?) the AI by having the program “read” these books. Isn’t that just how learning works? I mean, everything I’ve ever written has been influenced in direct and indirect ways by other texts I’ve read (or watched, listened to, seen, etc.). Other than scale (because I sure as heck have not read 183,000 books), what’s the difference between me “training” by reading the work of others and the AI doing this?

Of course, even with all of this training and the continual tweaking of the software, AIs still have the problem of making shit up. Cade Metz wrote in The New York Times “Chatbots May ‘Hallucinate’ More Often Than Many Realize.” Among other things, the article is about a new start-up called Vectara that is trying to estimate just how often AIs “hallucinate,” and (to leap ahead a bit) they estimated that different AIs hallucinate at different rates ranging from 3% to 27% of the time. But it’s a little more complicated than that.

Because these chatbots can respond to almost any request in an unlimited number of ways, there is no way of definitively determining how often they hallucinate. “You would have to look at all of the world’s information,” said Simon Hughes, the Vectara researcher who led the project.

Dr. Hughes and his team asked these systems to perform a single, straightforward task that is readily verified: Summarize news articles. Even then, the chatbots persistently invented information.

“We gave the system 10 to 20 facts and asked for a summary of those facts,” said Amr Awadallah, the chief executive of Vectara and a former Google executive. “That the system can still introduce errors is a fundamental problem.”

If I’m understanding this correctly, this means that even when you give the AI a fairly small data-set to analyze (10-20 “facts”), the AI still makes shit up, inventing things that are not part of that data-set. That’s a problem.
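To make concrete the kind of checking Vectara is doing– and to be clear, this is my own toy sketch, not their actual method, which uses a trained model– you could flag the “facts” in a summary (numbers, proper names) that never appear in the source text:

```python
import re

def ungrounded_claims(source: str, summary: str) -> list[str]:
    """Return fact-like tokens (numbers and proper-noun-ish words) that
    appear in the summary but never in the source -- a crude hallucination flag."""
    def key_tokens(text: str) -> set[str]:
        # grab numbers (e.g. "2019", "400")...
        tokens = set(re.findall(r"\d+(?:[.,]\d+)*", text))
        # ...plus capitalized words that aren't at the start of a sentence
        for sent in re.split(r"(?<=[.!?])\s+", text):
            words = sent.split()
            tokens.update(w.strip(".,") for w in words[1:] if w[:1].isupper())
        return tokens

    return sorted(key_tokens(summary) - key_tokens(source))

source = "The plant opened in 2019. It employs 400 people in Ypsilanti."
good = "A plant in Ypsilanti opened in 2019 and employs 400 people."
bad = "The Detroit plant opened in 2017 and employs 400 people."

print(ungrounded_claims(source, good))  # → []
print(ungrounded_claims(source, bad))   # → ['2017', 'Detroit']
```

Even a crude check like this makes the finding easy to appreciate: if the summary mentions a number or a name the source never did, the AI brought it in from somewhere else– or made it up.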

But it still might not stop me from trying to develop some kind of ChatGPT/AI-based grading tool, and that might be about to get a lot easier. (BTW, talk about burying the lede after that headline!)  OpenAI announced something they’re calling (very confusingly) “GPTs,” which (according to this article by Devin Coldewey in TechCrunch) is “a way for anyone to build their own version of the popular conversational AI system. Not only can you make your own GPT for fun or productivity, but you’ll soon be able to publish it on a marketplace they call the GPT Store — and maybe even make a little cash in the process.”

Needless to say, my first thought was could I use this to make an AI Grading tool? And do I have the technical skills?

As far as I can tell from OpenAI’s announcement about this, GPTs require upgrading to their $20 a month package, and the rollout is just getting started– the GPT Store is coming later this month, for example. Kevin Roose of The New York Times has a thoughtful and detailed article about the dangers and potentials of these things, “Personalized A.I. Agents Are Here. Is the World Ready for Them?” User-created agents will very soon be able to automate responses to questions (that OpenAI announcement has examples like a “Creative Writing Coach,” a “Tech Advisor” for trouble-shooting things, and a “Game Time” advisor that can explain the rules of card and board games). Roose writes a fair amount about how this technology could also be used by customer service or human resource offices, and to handle things like responding to emails or updating schedules. Plus none of this requires any actual programming skills, so I am imagining something like “If This Then That” but much more powerful.

AI agents might also be made to do evil things, which has a lot of security people worried for obvious reasons. Though I don’t think these agents are going to be powerful enough to do anything too terrible; actually, I don’t think these agents will have the capabilities to make the AI grading app I want, at least not yet. Roose got early access to the OpenAI project, and his article has a couple of examples of how he played around with it:

The first custom bot I made was “Day Care Helper,” a tool for responding to questions about my son’s day care. As the sleep-deprived parent of a toddler, I’m always forgetting details — whether we can send a snack with peanuts or not, whether day care is open or closed for certain holidays — and looking everything up in the parent handbook is a pain.

So I uploaded the parent handbook to OpenAI’s GPT creator tool, and in a matter of seconds, I had a chatbot that I could use to easily look up the answers to my questions. It worked impressively well, especially after I changed its instructions to clarify that it was supposed to respond using only information from the handbook, and not make up an answer to questions the handbook didn’t address.

That sounds pretty cool, and I bet I could create an AI agent capable of writing a summative end-comment on a student essay based on a detailed grading rubric I feed into the machine. But that’s a long way from doing the kind of marginal commenting on student essays that responds to particular sentences, phrases, and paragraphs. I want an AI agent/grading tool that can “read” a piece of student writing more like how I would read and comment on a piece of student writing, and that isn’t limited to a rubric.

But this is getting a lot closer to being potentially useful– not as a substitute for me actually reading and evaluating student writing, but as a tool to make that work easier. Right now, the free version of ChatGPT does a good job of revising away grammar and style errors, so maybe instead of me making marginal comments on a draft about these issues, students can first try using the AI to help them do this kind of low-level revision before they turn it in. That, combined with a detailed end comment from the AI, might actually work well. I’m not quite sure if this would actually save me any time, since it seems like setting up the AI to do this would take a lot of time, and I have a feeling I’d have to set up the AI agent for every unique assignment. Plus, in addition to the time it would take to set up, this would cost me $20 a month.
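For the record, the scriptable version of this (as opposed to the no-code GPT builder) might look something like the sketch below. Everything here is an assumption on my part: the rubric categories are made up, not from any real class of mine, and the model name and API call (commented out at the bottom) are just placeholders for whatever OpenAI’s chat interface expects at the time.

```python
# A hypothetical rubric -- this is the part I'd have to rewrite for
# every unique assignment.
RUBRIC = {
    "Thesis and focus": "Is there a clear, arguable main claim?",
    "Use of evidence": "Are claims backed by cited, credible sources?",
    "Organization": "Do paragraphs build on each other logically?",
    "Style and mechanics": "Is the prose clear and mostly error-free?",
}

def build_messages(essay_text: str) -> list[dict]:
    """Assemble chat messages asking for a summative end comment keyed to
    each rubric category -- feedback only, no grade."""
    criteria = "\n".join(f"- {name}: {question}" for name, question in RUBRIC.items())
    system = (
        "You are a writing instructor's assistant. Write a summative end "
        "comment on the student essay below, with one short paragraph per "
        "rubric category. Do not assign a grade.\n\nRubric:\n" + criteria
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": essay_text},
    ]

messages = build_messages("(student essay text goes here)")

# The actual call would be something along these lines (untested):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(model="gpt-4", messages=messages)
# print(response.choices[0].message.content)
```

And notice what this sketch can’t do: it has no idea which sentence or paragraph a given comment applies to, which is exactly the marginal-commenting problem I described above.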

Maybe for next semester….

So, What About AI Now? (A talk and an update)

A couple of weeks ago, I gave a talk/led a discussion called “So, What About AI Now?” That’s a link to my slides. The talk/discussion was for a faculty development program at Washtenaw Community College, a program organized by my friend, colleague, and former student, Hava Levitt-Phillips.

I covered some of the territory I’ve been writing about here for a while now, and I thought both the talk and discussion went well. I think most of the people at this thing (it was over Zoom, so it was a little hard to read the room) had seen enough stories like this one on 60 Minutes the other night: artificial intelligence is going to be at least as transformative a technology as “the internet,” and there is not a zero percent chance that it could end civilization as we know it. All of which is to say we probably need to put the dangers of a few college kids using AI (badly) to cheat on poorly designed assignments into perspective.

I also talked about how we really need to question some of the more dubious claims in the MSM about the powers of AI, such as the article in the Chronicle of Higher Education this past summer, “GPT-4 Can Already Pass Freshman Year at Harvard.” I blogged about that nonsense a couple months ago here, but the gist of what I wrote there is that all of these claims of AI being able to pass all these tests and freshman year at Harvard (etc.) are wrong. Besides the fact that the way a lot of these tests are run makes the claims bogus (and that is definitely the case with this CHE piece), students in our classes still need to show up– and I mean that for both f2f and online courses.

And as we talked about at this session, if a teacher gives students some kind of assignment (an essay, an exam, whatever) that can be successfully completed without ever attending class, then that’s a bad assignment.

So the sense that I got from this group– folks teaching right now the kinds of classes where (according to a lot of the nonsense that’s been in MSM for months) cheating with ChatGPT et al was going to make it impossible to assign writing anymore, not in college and not in high school– is that it hasn’t been that big of a deal. Sure, a few folks talked about students who tried to cheat with AI and were easily caught, but for the most part it hadn’t been much of a problem. The faculty in this group seemed more interested in trying to figure out a way to make use of AI in their teaching than they were in worrying about cheating.

I’m not trying to suggest there’s no reason to worry about what AI means for the future of… well, everything, including education. All of us who are “knowledge workers”– that is, teachers, professors, lawyers, scientists, doctors, accountants, etc.– need to pay attention to AI because there’s no question this shit is going to change the way we do our jobs. But my sense from this group (and just the general vibe I get on campus and in social media) is that the freak-out about AI is over, which is good.

One last thing though: just the other day (long after this talk), I saw what I believe to be my first case of a student trying to cheat with ChatGPT– sort of. I don’t want to go into too many details since this is a student in one of my classes right now. But basically, this student (who is struggling quite a bit) turned in a piece of writing that was first and foremost not the assignment I gave, and it also just so happened that this person used ChatGPT to generate a lot of the text. So as we met to talk about what the actual assignment was and how this student needed to do it again, etc., I also started asking about what they turned in.

“Did you actually write this?” I asked. “This kind of seems like ChatGPT or something.”

“Well, I did use it for some of it, yes.”

“But you didn’t actually read this book ChatGPT is citing here, did you?”

“Well, no…”

And so forth.  Once again, a good reminder that students who resort to cheating with things like AI are far from criminal masterminds.

No, an AI could not pass “freshman year” in college

I am fond of the phrase/quote/mantra/cliché “Ninety percent of success in life is just showing up,” which is usually attributed to Woody Allen. I don’t know if Woody was “the first” person to make this observation (probably not, and I’d prefer if it was someone else), but in my experience, this is very true.

This is why AIs can’t actually pass a college course or their freshman year or law school or whatever: they can’t show up. And it’s going to stay that way, at least until we’re dealing with advanced AI robots.

This is on my mind because my friend and colleague in the field, Seth Kahn, posted the other day on Facebook about this recent article from The Chronicle of Higher Education by Maya Bodnick, “GPT-4 Can Already Pass Freshman Year at Harvard.” (Bodnick is an undergraduate student at Harvard). It is yet another piece claiming that the AI is smart enough to do just fine on its own at one of the most prestigious universities in the world.

I agreed with all the other comments I saw on Seth’s post. In my comment (which I wrote before I actually read this CHE article), I repeated three points I’ve written about here or on social media before. First, ChatGPT and similar AIs can’t evaluate and cite academic research at even the modest levels I expect in a first year writing class. Second, while OpenAI proudly lists all the “simulated exams” where ChatGPT has excelled (LSAT, SAT, GRE, AP Art History, etc.), you have to click the “show more exams” button on that page to see that none of the versions of their AI has managed better than a “2” on the AP English Language (and also Literature) and Composition exams. It takes a “3” on this exam to get any credit at EMU, and probably a “4” at a lot of other universities.

Third, I think mainstream media and all the rest of us really need to question these claims of AIs passing whatever tests and classes and whatnot much MUCH more carefully than I think most of us have to date. What I was thinking about when I made that last comment was another article published in CHE in early July, “A Study Found That AI Could Ace MIT. Three MIT Students Beg to Differ,” by Tom Bartlett. In this article, Bartlett discusses a study (which I don’t completely understand because it’s too much math and details) conducted by three MIT students (class of 2024) who researched the claim that an AI could “ace” MIT classes. The students determined this was bullshit. What were the students’ findings (at least the ones I could understand)? In some of the classes where the AI supposedly had a perfect score, the exams included unsolvable problems, so it wasn’t even possible to get a perfect score. In other examples, the exam questions the AI supposedly answered correctly did not provide enough information for that to be possible either. The students posted their results online, and at least some of the MIT professors who originally made the claims agreed and backtracked.

But then I read this Bodnick article, and holy-moly, this is even more bullshitty than I originally thought. Let me quote at length Bodnick describing her “methodology”:

Three weeks ago, I asked seven Harvard professors and teaching assistants to grade essays written by GPT-4 in response to a prompt assigned in their class. Most of these essays were major assignments which counted for about one-quarter to one-third of students’ grades in the class. (I’ve listed the professors or preceptors for all of these classes, but some of the essays were graded by TAs.)

Here are the prompts with links to the essays, the names of instructors, and the grades each essay received:

  • Microeconomics and Macroeconomics (Jason Furman and David Laibson): Explain an economic concept creatively. (300-500 words for Micro and 800-1000 for Macro). Grade: A-
  • Latin American Politics (Steven Levitsky): What has caused the many presidential crises in Latin America in recent decades? (5-7 pages) Grade: B-
  • The American Presidency (Roger Porter): Pick a modern president and identify his three greatest successes and three greatest failures. (6-8 pages) Grade: A
  • Conflict Resolution (Daniel Shapiro): Describe a conflict in your life and give recommendations for how to negotiate it. (7-9 pages). Grade: A
  • Intermediate Spanish (Adriana Gutiérrez): Write a letter to activist Rigoberta Menchú. (550-600 words) Grade: B
  • Freshman Seminar on Proust (Virginie Greene): Close read a passage from In Search of Lost Time. (3-4 pages) Grade: Pass

I told these instructors that each essay might have been written by me or the AI in order to minimize response bias, although in fact they were all written by GPT-4, the recently updated version of the chatbot from OpenAI.

In order to generate these essays, I inputted the prompts (which were much more detailed than the summaries above) word for word into GPT-4. I submitted exactly the text GPT-4 produced, except that I asked the AI to expand on a couple of its ideas and sequenced its responses in order to meet the word count (GPT-4 only writes about 750 words at a time). Finally, I told the professors and TAs to grade these essays normally, except to ignore citations, which I didn’t include.

Not only can GPT-4 pass a typical social science and humanities-focused freshman year at Harvard, but it can get pretty good grades. As shown in the list above, GPT-4 got all A’s and B’s and one Pass.

JFC. Okay, let’s just think about this for a second:

  • We’re talking about three “essays” that are less than 1,000 words and another three that are slightly longer, and based on this work alone, GPT-4 “passed” a year of college at Harvard. That’s all it takes? Really?! That’s it?
  • I would like to know more about what Bodnick means when she says that the writing prompts were “much more detailed than the summaries above” because those details matter a lot. But as summarized, these are terrible assignments. They aren’t connected with the context of the class or anything else.  It would be easy to try to answer any of these questions with a minimal amount of Google searching and making educated guesses. I might be going out on a limb here, but I don’t think most writing assignments at Harvard or any other college– even badly assigned ones– are as simplistic as these.
  • It wasn’t just ChatGPT: she had to do some significant editing to put together ChatGPT’s short responses into longer essays. I don’t think the AI could have done that on its own. Unless it hired a tutor.
  • Asking instructors to not pay any attention to the lack of citation (and, I am going to guess, the need for sources to back up claims in the writing) is giving the AI way WAAAAYYY too much credit, especially since ChatGPT (and other AIs) usually make shit up– er, “hallucinate”– when citing evidence. I’m going to guess that even at Harvard, handing in hallucinations would result in a failing grade. And if the assignment required properly cited sources and the student didn’t do that, then that student would also probably fail.
  • It’s interesting (and Bodnick points this out too) that the texts that received the lowest grades are ones that ask students to “analyze” or to provide their opinions/thoughts, as opposed to assignments that were asking for an “information dump.” Again, I’m going to guess that, even at Harvard, there is a higher value placed on students demonstrating with their writing that they thought about something.

I could go on, but you get the idea. This article is nonsense. It proves literally nothing.

But I also want to return to where I started, the idea that a lot of what it means to succeed in anything (perhaps especially education) is showing up and doing the work. Because after what seems like the zillionth click-bait headline about how ChatGPT could graduate from college or be a lawyer or whatever because it passed a test (supposedly), it finally dawned on me what has been bothering me the most about these kinds of articles: that’s just not how it works! To be a college graduate or a lawyer or damn near anything else takes more than passing a test; it takes the work of showing up.

Granted, there has been a lot more interest and willingness in the last few decades to consider “life experience” credit as part of degrees, and some of these places are kind of legitimate institutions– Southern New Hampshire and the University of Phoenix immediately come to mind. But “life experience” credit is still considered mostly bullshit and the approach taken by a whole lot of diploma mills, and real online universities (like SNHU and Phoenix) still require students to mostly take actual courses, and that requires doing more than writing a couple papers and/or taking a couple of tests.

And sure, it is possible to become a lawyer in California, Vermont, Virginia and Washington without a law degree, and it is also possible to become a lawyer in New York or Maine with just a couple years of law school or an internship. But even these states still require some kind of experience with a law office, most states do require attorneys to have law degrees, and it’s not exactly easy to pass the bar without the experience you get from earning a law degree. Ask Kim Kardashian. 

Bodnick did not ask any of the faculty who evaluated her AI writing examples if it would be possible for a student to pass that professor’s class based solely on this writing sample because she already knew the answer: of course not.

Part of the grade in the courses I teach is based on attendance, participation in the class discussions and peer review, short responses to readings, and so forth. I think this is pretty standard– at least in the humanities. So if some eager ChatGPT enthusiast came to one of my classes– especially one like first year writing, where I post all of the assignments at the beginning of the semester (mainly because I’ve taught this course at least 100 times at this point)– and said to me “Hey Krause, I finished and handed in all the assignments! Does that mean I get an A and go home now?” Um, NO! THAT IS NOT HOW IT WORKS! And of course anyone familiar with how school works knows this.

Oh, and before anyone says “yeah, but what about in an online class?” Same thing! Most of the folks I know who teach online have a structure where students have to regularly participate and interact with assignments, discussions, and so forth. My attendance and participation policies for online courses are only slightly different from my f2f courses.

So please, CHE and MSM in general: stop. Just stop. ChatGPT can (sort of) pass a lot of tests and classes (with A LOT of prompting from the researchers who really really want ChatGPT to pass), but until that AI robot walks/rolls into a class or sets up its profile on Canvas all on its own, it can’t go to college.

Computers and Writing 2023: Some Miscellaneous Thoughts

Last week, I attended and presented at the 2023 Computers and Writing Conference at the University of California-Davis. Here’s a link to my talk, “What Does ‘Teaching Online’ Even Mean Anymore?” Some thoughts as they occur to me/as I look at my notes:

  • The first academic conference I ever attended and presented at was Computers and Writing almost 30 years ago, in 1994. Old-timers may recall that this was the 10th C&W conference, it was held at the University of Missouri, and it was hosted by Eric Crump. I just did a search and came across this article/review written by the late Michael “Mick” Doherty about the event. All of which is to say I am old.
  • This was the first academic conference I attended in person since Covid; I think that was the case for a lot of attendees.
  • Also worth noting right off the top here: I have had a bad attitude about academic conferences for about 10 years now, and my attitude has only gotten worse. And look, I know, it’s not you, it’s me. My problem with these things is they are getting more and more expensive, most of the people I used to hang out with at conferences have mostly stopped going themselves for whatever reason, and for me, the overall “return on investment” now is pretty low. I mean, when I was a grad student and then a just-starting-out assistant professor, conferences were extremely important to me. They furthered my education in both subtle and obvious ways, they connected me to lots of other people in the field, and conferences gave me the chance to do scholarship that I could also list on my CV. I used to get a lot out of these events. Now? Well, after (almost) 30 years, things start to sound a little repetitive and the value of yet another conference presentation on my CV is almost zero, especially since I am at a point where I can envision retirement (albeit 10-15 years from now). Like I said, it’s not you, it’s me, but I also know there are plenty of people in my cohort who recognize and even perhaps share a similarly bad attitude.
  • So, why did I go? Well, a big part of it was because I hadn’t been to any conference in about four years– easily the longest stretch of not going in almost 30 years. Also, I had assumed I would be talking in more detail about the interviews I conducted about faculty teaching experiences during Covid, and also about the next phases of research I would be working on during a research release or a sabbatical in 2024. Well, that didn’t work out, as I wrote about here, which inevitably changed my talk to being a “big picture” summary of my findings and an explanation of why I was done.
  • This conference has never been that big, and this year, it was a more “intimate” affair. If a more normal or “robustly” attended C&W gets about 400-500 people to attend (and I honestly don’t know what the average attendance has been at this thing), then I’d guess there were about 200-250 folks there. I saw a lot of the “usual suspects” of course, and also met some new people too.
  • The organizers– Carl Whithaus, Kory Lawson Ching, and some other great people at UC-Davis– put a big emphasis on trying to make the hybrid delivery of panels work. So there were completely on-site panels, completely online (but on the schedule) panels held over Zoom, and hybrid panels which were a mix of participants on-site and online. There was also a small group of completely asynchronous panels as well. Now, this arrangement wasn’t perfect, both because of the inevitable technical glitches and also because there’s no getting around the fact that Zoom interactions are simply not equal to robust face to face interactions, especially for an event like a conference. This was a topic of discussion in the opening town hall meeting, actually.
  • That said, I think it all worked reasonably well. I went to two panels where there was one presenter participating via Zoom (John Gallagher in both presentations, actually) and that went off without (much of a) hitch, and I also attended at least part of a session where all the presenters were on Zoom– and a lot of the audience was on-site.
  • Oh, and speaking of the technology: They used a content management system specifically designed for conferences called Whova that worked pretty well. It’s really for business/professional kinds of conferences so there were some slight disconnects, and I was told by one of the organizers that they found out (after they had committed to using it!) that unlimited storage capacity would have been much more expensive. So they did what C&W folks do well: they improvised, and set up Google Drive folders for every session.
  • My presentation matched up well with those of my co-presenters, Rich Rice and Jenny Sheppard, in that we were all talking about different aspects of online teaching during Covid– and with no planning on our parts at all! Actually, all the presentations I saw– and I went to more than usual: both keynotes, one and a half town halls, and four and a half panels– were really quite good.
  • Needless to say, there was a lot of AI and ChatGPT discussion at this thing, even though the overall theme was on hybrid practices. That’s okay– I am pretty sure that AI is just going to become a bigger issue in the larger field and academia as a whole in the next couple of years, and it might stay that way for the rest of my career. Most of what people talked about were essentially more detailed versions of stuff I already (sort of) knew about, and that was reassuring to me. There were a lot of folks who seemed mighty worried about AI, both in the sense of students using it to cheat and also in the sense of its larger implications for society as a whole. Some of the big picture/ethical concerns may have been more amplified here because there were a lot of relatively local participants, of course, and Silicon Valley and the Bay Area are more or less at “ground zero” for all things AI. I don’t dismiss the larger social and ethical concerns about AI, but these are also things that seem completely out of all of our control in so many different ways.
  • For example, in the second town hall about AI (I arrived late to that one, unfortunately), someone in the audience had one of those impassioned “speech/questions” about how “we” needed to come up with a statement on the problems/dangers/ethical issues about AI. Well, I don’t think there’s a lot of consensus in the field about what we should do about AI at this point. But more importantly and as Wendi Sierra pointed out (she was on the panel, and she is also going to be hosting C&W at Texas Christian University in 2024), there is no “we” here. Computers and Writing is not an organization at all and our abilities to persuade are probably limited to our own institutions. Of course, I have always thought that this was one of the main problems with the Computers and Writing Conference and Community: there is no there there.
  • But hey, let me be clear– I thought this conference was great, one of the best versions of C&W I’ve been to, no question about it. It’s a great campus with some interesting quirks, and everything seemed to go off right on schedule and without any glitches at all.
  • Of course, the conference itself was the main reason I went– but it wasn’t the only reason.  I mean, if this had been in, say, Little Rock or Baton Rouge or some other place I would prefer not to visit again or ever, I probably would have sat this out. But I went to C&W when it was at UC-Davis back in 2009 and I had a great time, so going back there seemed like it’d be fun. And it was– though it was a different kind of fun, I suppose. I enjoyed catching up with a lot of folks I’ve known for years at this thing and I also enjoyed meeting some new people too, but it also got to be a little too, um, “much.” I felt a little like an overstimulated toddler after a while. A lot of it is Covid of course, but a lot of it is also what has made me sour on conferences: I don’t have as many good friends at these things anymore– that is, the kind of people I want to hang around with a lot– and I’m also just older. So I embraced opting out of the social events, skipping the banquet or any kind of meet-up with a group at a bar or bowling or whatever, and I played it as a solo vacation. That meant walking around Davis (a lively college town with a lot of similarities to Ann Arbor), eating at the bar at a couple of nice restaurants, and going back to my lovely hotel room and watching things that I know Annette had no interest in watching with me (she did the same back home and at the conference she went to the week before mine). On Sunday, I spent the day as a tourist: I drove through Napa, over to Sonoma Coast Park, and then back down through San Francisco to the airport. It’s not something I would have done on my own without the conference, but like I said, I wouldn’t have gone to the conference if I couldn’t have done something like this on my own for a day.

What Counts as Cheating? And What Does AI Smell Like?

Cheating is at the heart of the fear too many academics have about ChatGPT, and I’ve seen a lot of hand-wringing articles from MSM posted on Facebook and Twitter. One of the more provocative screeds on this I’ve seen lately was in the Chronicle of Higher Education, “ChatGPT is a Plagiarism Machine” by Joseph M. Keegin. In a nutshell, I think this guy is unhinged, but he’s also not alone.

Keegin claims he and his fellow graduate student instructors (he’s a PhD candidate in Philosophy at Tulane) are encountering loads of student work that “smelled strongly of AI generation,” and he and some of his peers have resorted to giving in-class handwritten tests and oral exams to stop the AI cheating. “But even then,” Keegin writes, “much of the work produced in class had a vague, airy, Wikipedia-lite quality that raised suspicions that students were memorizing and regurgitating the inaccurate answers generated by ChatGPT.”

(I cannot help but recall one of the great lines from [the now problematically icky] Woody Allen in Annie Hall: “I was thrown out of college for cheating on a metaphysics exam; I looked into the soul of the boy sitting next to me.” But I digress.)

If Keegin is exaggerating in order to rattle readers and get some attention, then mission accomplished. But if he’s being sincere– that is, if he really believes his students are cheating everywhere on everything all the time and the way they’re cheating is by memorizing and then rewriting ChatGPT responses to Keegin’s in-class writing prompts– then these are the sort of delusions which should be discussed with a well-trained and experienced therapist. I’m not even kidding about that.

Now, I’m not saying that cheating is nothing to worry about at all, and if a student were to turn in whatever ChatGPT provided for a class assignment with no alterations, then a) yes, I think that’s cheating, but b) that’s the kind of cheating that’s easy to catch, and c) Google is a much more useful cheating tool for this kind of thing. Keegin is clearly wrong about ChatGPT being a “Plagiarism Machine” and I’ve written many many many different times about why I am certain of this. But what I am interested in here is what Keegin thinks does and doesn’t count as cheating.

The main argument he’s trying to make in this article is that administrators need to step in to stop this never-ending battle against ChatGPT plagiarism. Universities should “devise a set of standards for identifying and responding to AI plagiarism. Consider simplifying the procedure for reporting academic-integrity issues; research AI-detection services and software, find one that works best for your institution, and make sure all paper-grading faculty have access and know how to use it.”

Keegin doesn’t define what he means by cheating (though he does give some examples that don’t actually seem like cheating to me), but I think we can figure it out by reading what he means by a “meaningful education.” He writes (I’ve added the emphasis) “A meaningful education demands doing work for oneself and owning the product of one’s labor, good or bad. The passing off of someone else’s work as one’s own has always been one of the greatest threats to the educational enterprise. The transformation of institutions of higher education into institutions of higher credentialism means that for many students, the only thing dissuading them from plagiarism or exam-copying is the threat of punishment.”

So, I think Keegin sees education as an activity where students labor alone at mastering the material delivered by the instructor. Knowledge is not something shared or communal, and it certainly isn’t created through interactions with others. Rather, students receive knowledge, do the work the instructor asks of them, do that work alone, and then reproduce the knowledge the instructor has invested in them– with interest. So any work a student might do that involves anyone or anything else– other students, a tutor, a friend, a Google search, and yes, ChatGPT– is an opportunity for cheating.

More or less, this is what Paulo Freire meant by the ineffective and unjust “banking model of education,” which he wrote about over 50 years ago in Pedagogy of the Oppressed. Freire’s work remains very important in many fields specifically interested in pedagogy (including writing studies), and Pedagogy of the Oppressed is one of the most cited books in the social sciences. And yet, I think a lot of people in higher education– especially in STEM fields, business-oriented and other technical majors, and also in disciplines in the humanities that have not been particularly invested in pedagogy (philosophy, for example)– are okay with this system. These folks think education really is a lot like banking and “investing,” and they don’t see any problem with that metaphor. And if that’s your view of education, then getting help from anyone or anything other than the teacher is metaphorically like robbing a bank.

But I think it’s odd that Keegin is also upset with “credentialing” in higher education. That’s a common enough complaint, I suppose, especially when we talk about the problems with grading. But if we were to do away with degrees and grades as an indication of successful learning (or at least completion) and if we instead decided students should learn solely for the intrinsic value of learning, then why would it even matter if students cheated or not? That’d be completely their problem. (And btw, if universities did not offer credentials that have financial, social, and cultural value in the larger society, then universities would cease to exist– but that’s a different post).

Perhaps Keegin might say “I don’t have a problem with students seeking help from other people in the writing center or whatever. I have a problem with students seeking help from an AI.” I think that’s probably true with a lot of faculty. Even when professors have qualms about students getting a little too much help from a tutor, they still generally do see the value and usually encourage students to take advantage of support services, especially for students at the gen-ed levels.

But again, why is that different? If a student asks another human for help brainstorming a topic for an assignment, suggesting some ideas for research, creating an outline, suggesting some phrases to use, and/or helping out with proofreading, citation, and formatting, why isn’t that cheating– but the same help from ChatGPT is? And suppose a student instead turns to the internet and consults things like CliffsNotes, Wikipedia, Course Hero, other summaries and study guides, etc. etc.; is that cheating?

I could go on, but you get the idea. Again, I’m not saying that cheating in general and with ChatGPT in particular is nothing at all to worry about. And also to be fair to Keegin, he even admits “Some departments may choose to take a more optimistic approach to AI chatbots, insisting they can be helpful as a student research tool if used right.” But the more of these paranoid and shrill commentaries I read about “THE END” of writing assignments and how we have got to come up with harsh punishments for students so they stop using AI, the more I think these folks are just scared that they’re not going to be able to give students the same bullshitty non-teaching writing assignments that they’ve been doing for years.

My Talk About AI at Hope College (or why I still post things on a blog)

I gave a talk at Hope College last week about AI. Here’s a link to my slides, which also has all my notes and links. Right after I got invited to do this in January, I made it clear that I am far from an expert with AI. I’m just someone who had an AI writing assignment last fall (which was mostly based on previous teaching experiments by others), who has done a lot of reading and talking about it on Facebook/Twitter, and who blogged about it in December. So as I promised then, my angle was to stay in my lane and focus on how AI might impact the teaching of writing.

I think the talk went reasonably well. Over the last few months, I’ve watched parts of a couple of different ChatGPT/AI presentations via Zoom or as previously recorded, and my own take-away from them all has been a mix of “yep, I know that and I agree with you” and “oh, I didn’t know that, that’s cool.” That’s what this felt like to me: I talked about a lot of things that most of the folks attending knew about and agreed with, along with a few things that were new to them. And vice versa: I learned a lot too. It probably would have been a little more contentious had this taken place back when the freakout over ChatGPT was in full force. Maybe there still are some folks there who are freaked out by AI and cheating who didn’t show up. Instead, most of the people there had played around with the software and realized that it’s not quite the “cheating machine” being overhyped in the media. So it was a good conversation.

But that’s not really what I wanted to write about right now. Rather, I just wanted to point out that this is why I continue to post here, on a blog/this site, which I have maintained now for almost 20 years. Every once in a while, something I post “lands,” so to speak.

So for example: I posted about teaching a writing assignment involving AI at about the same time the MSM was freaking out about ChatGPT. Some folks at Hope read that post (which has now been viewed over 3000 times), and they invited me to give this talk. Back in fall 2020, I blogged about how weird I thought it was that all of these people were going to teach online synchronously over Zoom. Someone involved with the Media & Learning Association, which is a European/Belgian organization, read it, invited me to write a short article based on that post, and also invited me to be on a Zoom panel that was a part of a conference they were having. And of course all of this was the beginning of the research and writing I’ve been doing about teaching online during Covid.

Back in April 2020, I wrote a post “No One Should Fail a Class Because of a Fucking Pandemic;” so far, it’s gotten over 10,000 views, it’s been quoted in a variety of places, and it was why I was interviewed by someone at CHE in the fall. (BTW, I think I’m going to write an update to that post, which will be about why it’s time to return to some pre-Covid requirements). I started blogging about MOOCs in 2012, which led to a short article in College Composition and Communication and numerous more articles and presentations, a few invited speaking gigs (including TWO conferences sponsored by the University of Naples on the Isle of Capri), an edited collection, and a book.

Now, most of the people I know in the field who once blogged have stopped (or mostly stopped) for one reason or another. I certainly do not post here nearly as often as I did before the arrival of Facebook and Twitter, and it makes sense for people to move on to other things. I’ve thought about giving it up, and there have been times where I didn’t post anything for months. Even the extremely prolific and smart local blogger Mark Maynard gave it all up, I suspect because of a combination of burn-out, Trump being voted out, and the additional work/responsibility of the excellent restaurant he co-owns/operates, Bellflower.

Plus if you do a search for “academic blogging is bad,” you’ll find all sorts of warnings about the dangers of it– all back in the day, of course. Deborah Brandt seemed to think it was mostly a bad idea (2014); The Guardian suggested it was too risky (2013), especially for grad students posting work in progress. There were lots of warnings like this back then. None of them ever made any sense to me, though I didn’t start blogging until after I was on the tenure-track here. And no one at EMU has ever said anything negative to me about doing this, and that includes administrators even back in the old days of EMUTalk.

Anyway, I guess I’m just reflecting/musing now about why this very old-timey practice from the olde days of the Intertubes still matters, at least to me. About 95% of the posts I’ve written are barely read or noticed at all, and that’s fine. But every once in a while, I’ll post something, promote it a bit on social media, and it catches on. And then sometimes, a post becomes something else– an invited talk, a conference presentation, an article. So yeah, it’s still worth it.

Is AI Going to be “Something” or “Everything?”

Way back in January, I applied for release time from teaching for one semester next year– either a sabbatical or what’s called here a “faculty research fellowship” (FRF)– in order to continue the research I’ve been doing about teaching online during Covid. This is work I’ve been doing since fall 2020, including a Zoom talk at a conference in Europe and a survey I ran for about six months; from that survey, I was able to recruit and interview a bunch of faculty about their experiences. I’ve gotten a lot out of this work already: a couple of conference presentations (albeit in the kind of useless “online/on-demand” format), a website (which I had to code myself!), an article, and, just last year, one of those FRFs.

Well, a couple weeks ago, I found out that I will not be on sabbatical or FRF next year. My proposal, which was about seeking time to code and analyze all of the interview transcripts I collected last year, got turned down. I am not complaining about that: these awards are competitive, and I’ve been fortunate enough to receive several of these before, including one for this research. But not getting release time is making me rethink how much I want to continue this work, or if it is time for something else.

I think studying how Covid impacted faculty attitudes about online courses is definitely something important worth doing. But it is also looking backwards, and it feels a bit like an autopsy or one of those commissioned reports. And let’s be honest: how many of us want to think deeply about what happened during the pandemic, recalling the mistakes that everyone already knows they made? A couple years after the worst of it, I think we all have a better understanding now why people wanted to forget the 1918 pandemic.

It’s 20/20 hindsight, but I should have put together a sabbatical/research leave proposal about AI. With good reason, the committee that decides on these release time awards tends to favor proposals that are for things that are “cutting edge.” They also like to fund releases for faculty who have book contracts who are finishing things up, which is why I have been lucky enough to secure these awards both at the beginning and end of my MOOC research.

I’ve obviously been blogging about AI a lot lately, and I have casually started amassing quite a number of links to news stories and other resources related to Artificial Intelligence in general, ChatGPT and OpenAI in particular. As I type this entry in April 2023, I already have over 150 different links to things without even trying– I mean, this is all stuff that just shows up in my regular diet of social media and news. I even have a small invited speaking gig about writing and AI, which came about because of a blog post I wrote back in December— more on that in a future post, I’m sure.

But when it comes to me pursuing AI as my next “something” to research, I feel like I have two problems. First, it might already be too late for me to catch up. Sure, I’ve been getting some attention by blogging about it, and I had a “writing with GPT-3” assignment in a class I taught last fall, which I guess kind of puts me at least closer to being current with this stuff in terms of writing studies. But I also know there are already folks in the field (and I know some of these people quite well) who have been working on this for years longer than me.

Plus a ton of folks are clearly rushing into AI research at full speed. Just the other day, the CWCON at Davis organizers sent around a draft of the program for the conference in June. The Call For Proposals they released last summer describes the theme of this year’s event, “hybrid practices of engagement and equity.” I skimmed the program to get an idea of the overall schedule and some of what people were going to talk about, and there were a lot of mentions of ChatGPT and AI, which makes me think a lot of people are probably not going to be talking about the CFP theme at all.

This brings me to the bigger problem I see with researching and writing about AI: it looks to me like this stuff is moving very quickly from being “something” to “everything.” Here’s what I mean:

A research agenda/focus needs to be “something” that has some boundaries. MOOCs were a good example of this. MOOCs were definitely “hot” from around 2012 to 2015 or so, and there was a moment back then when folks in comp/rhet thought we were all going to be dealing with MOOCs for first year writing. But even then, MOOCs were just a “something” in the sense that you could be a perfectly successful writing studies scholar (even someone specializing in writing and technology) and completely ignore MOOCs.

Right now, AI is a myriad of “somethings,” but this is moving very quickly toward “everything.” It feels to me like very soon (five years, tops), anyone who wants to do scholarship in writing studies is going to have to engage with AI. Successful (and even mediocre) scholars in writing studies (especially those specializing in writing and technology) are not going to be able to ignore AI.

This all reminds me a bit of what happened with word processing technology. Yes, this really was something people studied and debated way back when. In the 1980s and early 1990s, there were hundreds of articles and presentations about whether or not to use word processing to teach writing— for example, “The Word Processor as an Instructional Tool: A Meta-Analysis of Word Processing in Writing Instruction” by Robert L. Bangert-Drowns, or “The Effects of Word Processing on Students’ Writing Quality and Revision Strategies” by Ronald D. Owston, Sharon Murphy, and Herbert H. Wideman. Both articles were published in the early 1990s in major journals, and both try to answer the question of which approach is “better.” (By the way, most but far from all of these studies concluded that word processing is better in the sense that it helped students generate more text and revise more frequently. It’s also worth mentioning that a lot of this research overlaps with studies about the role of spell-checking and grammar-checking in writing pedagogy.)

Yet in my recollection of those times, this comparison between word processing and writing by hand was rendered irrelevant because everyone– teachers, students, professional writers (at least all but the most stubborn, as Wendell Berry declares in his now cringy and hopelessly dated short essay “Why I Am not Going to Buy a Computer”)– switched to word processing software on computers to write. When I started teaching as a grad student in 1988, I required students to hand in typed papers and I strongly encouraged them to write at least one of their essays with a word processing program. Some students complained because they were never asked to type anything in high school. By the time I started my PhD program five years later in 1993, students all knew they needed to type their essays on a computer and generally with MS Word.

Was this shift a result of some research consensus that using a computer to type texts was better than writing texts out by hand? Not really, and obviously, there are still lots of reasons why people write some things by hand– a lot of personal writing (poems, diaries, stories, that kind of thing) and a lot of note-taking. No, everyone switched because everyone realized word processing made writing easier (but not necessarily better) in lots and lots of different ways, and that was that. Even in the midst of this panicky moment about plagiarism and AI, I have yet to read anyone seriously suggest that we make our students give up Word or Google Docs and require them to turn in handwritten assignments. So, as a researchable “something,” word processing disappeared because (of course) everyone everywhere who writes uses some version of word processing, which means the issue is settled.

One of the other reasons why I’m using word processing scholarship as my example here is because both Microsoft and Google have made it clear that they plan on integrating their versions of AI into their suites of software– and that would include MS Word and Google Docs. This could be rolling out just in time for the start of the fall 2023 semester, maybe earlier. Assuming this is the case, people who teach any kind of writing at any kind of level are not going to have time to debate if AI tools will be “good” or “bad,” and we’re not going to be able to study any sorts of best practices either. This stuff is just going to be a part of the everything, and for better or worse, that means the issue will soon be settled.

And honestly, I think the “everything” of AI is going to impact, well, everything. It feels to me a lot like when “the internet” (particularly with the arrival of web browsers like Mosaic in 1993) became everything. I think the shift to AI is going to be that big, and it’s going to have as big of an impact on every aspect of our professional and technical lives– certainly every aspect that involves computers.

Who the hell knows how this is all going to turn out, but when it comes to what this means for the teaching of writing, as I’ve said before, I’m optimistic. Just as the field adjusted to word processing (and spell-checkers and grammar-checkers, and really just the whole firehose of text from the internet), I think we’ll be able to adjust to this new something-to-everything too.

As far as my scholarship goes, though: for reasons, I won’t be eligible for another release from teaching until the 2025-26 school year. I’m sure I’ll keep blogging about AI and related issues, and maybe that will turn into a scholarly project. Or maybe we’ll all be on to something entirely different in three years….


What Would an AI Grading App Look Like?

While a whole lot of people (academics and non-academics alike) have been losing their minds lately about the potential of students using ChatGPT to cheat on their writing assignments, I haven’t read/heard/seen much about the potential of teachers using AI software to read, grade, and comment on student writing. Maybe it’s out there in the firehose stream of stories about AI I see every day (I’m trying to keep up a list on pinboard) and I’ve just missed it.

I’ve searched and found some discussion of using ChatGPT to grade on Reddit (here and here), and I’ve seen other posts about how teachers might use the software to do things other than grading, but that’s about it. In fact, the reason I’m thinking about this again now is not because of another AI story but because I watched a South Park episode about AI called “Deep Learning.” South Park has been a pretty uneven show for several years, but if you are a fan and/or if you’re interested in AI, this is a must-see. A lot happens in this episode, but my favorite reaction about ChatGPT comes from the kids’ infamous teacher, Mr. Garrison. While complaining about grading a stack of long and complicated essays (which the students completed with ChatGPT), Rick (Garrison’s boyfriend) tells him about ChatGPT, and Mr. Garrison has far too honest of a reaction: “This is gonna be amazing! I can use it to grade all my papers and no one will ever know! I’ll just type the title of the essay in, it’ll generate a comment, and I don’t even have to read the stupid thing!”

Of course, even Mr. Garrison knows that would be “wrong” and he must keep this a secret. That probably explains why I still haven’t come across much about an AI grading app. But really though: shouldn’t we be having this discussion? Doesn’t Mr. Garrison have a point?

Teacher concerns about grading/scoring writing with computers are not new, and one of the nice things about having kept a blog so long is I can search and “recall” some of these past discussions. Back in 2005, I had a post about NCTE coming out against the SAT writing test and machine scoring of those tests. There was also a link in that post to an article about a sociologist at the University of Missouri named Edward Brent who had developed a way of giving students feedback on their writing assignments. I couldn’t find the original article, but this one from the BBC in 2005 covers the same story. It seems like it was a tool developed very specifically for the content of Brent’s courses and I’m guessing it was quite crude by today’s standards. I do think Brent makes a good point on the value of these kinds of tools: “It makes our job more interesting because we don’t have to deal so much with the facts and concentrate more on thinking.”

About a decade ago, I also had a couple of other posts about machine grading, both of which grew out of discussions from the now mostly defunct WPA-L. There was this one from 2012, which included a link to a New York Times article about Educational Testing Service’s product “e-rater,” “Facing a Robo-Grader? Just Keep Obfuscating Mellifluously.” The article features Les Perelman, who was the director of writing at MIT, demonstrating ways to fool e-rater with nonsense and inaccuracies. At the time, I thought Perelman was correct, but I also thought a good argument could be made that if a student was smart enough to fool e-rater, maybe they deserved the higher score.

Then in 2013, there was another kerfuffle on WPA-L about machine grading that involved a petition drive at the website humanreaders.org against machine grading. In my post back then, I agreed with the main goal of the petition, that “Machine grading software can’t recognize things like a sense of humor or irony, it tends to favor text length over conciseness, it is fairly easy to circumvent with gibberish kinds of writing, it doesn’t work in real world settings, it fuels high stakes testing, etc., etc., etc.” But I also had some questions about all that. I made a comparison between these new tools and the initial resistance to spell checkers, and then I also wrote this:

As a teacher, my least favorite part of teaching is grading. I do not think that I am alone in that sentiment. So while I would not want to outsource my grading to someone else or to a machine (because again, I teach writing, I don’t just assign writing), I would not be against a machine that helps make grading easier. So what if a computer program provided feedback on a chunk of student writing automatically, and then I as the teacher followed behind those machine comments, deleting ones I thought were wrong or unnecessary, expanding on others I thought were useful? What if a machine printed out a report that a student writer and I could discuss in a conference? And from a WPA point of view, what if this machine helped me provide professional development support to GAs and part-timers in their commenting on students’ work?

By the way, an ironic/odd tangent about that post: the domain name humanreaders.org has clearly changed hands. In 2013, it looked like this (this link is from the Internet Archive): basically, a petition form. The current site domain humanreaders.org redirects to this page on some content farm website called we-heart.com. This page, from 2022, is a list of the “six top online college paper writing websites today.”

Anyway, let me state the obvious: I’m not suggesting an AI application for replacing all teacher feedback (as Mr. Garrison is suggesting) at all. Besides the fact that it wouldn’t be “right” no matter how you twist the ethics of it, I don’t think it would work well– yet. Grading/commenting on student writing is my least favorite part of the job, so I understand where Mr. Garrison is coming from. Unfortunately though, reading/ grading/ commenting on student writing is essential to teaching writing. I don’t know how I can evaluate a student’s writing without reading it, and I also don’t know how to help students think about how to revise their writing (and, hopefully, learn how to apply these lessons and advice to writing these students do beyond my class) without making comments.

However, this is A LOT of work that takes A LOT of time. I’ve certainly learned some things that make grading a bit easier than it was when I started. For example, I’ve learned that less is more: marking up every little mistake or thing in the paper and then writing a really long end comment is a waste of time because it confuses and frustrates students and it literally takes longer. But it still takes me about 15-20 minutes to read and comment on each long-ish student essay, which are typically a bit shorter than this blog post. So in a full (25 students) writing class, it takes me 8-10 hours to completely read, comment on, and grade all of their essays; multiply that by two or three or more (since I’m teaching three writing classes a term), and it adds up pretty quickly. Plus we’re talking about student writing here. I don’t mind reading it and students often have interesting and inspiring observations, but by definition, these are writers who are still learning and who often have a lot to learn. So this isn’t like reading The New Yorker or a long novel or something you can get “lost” in as a reader. This ain’t reading for fun– and it’s also one of the reasons why, after reading a bunch of student papers in a day, I’m much more likely to just watch TV at night.

So hypothetically, if there was a tool out there that could help me make this process faster, easier, and less unpleasant, and if this tool also helped students learn more about writing, why wouldn’t I want to use it?

I’ve experimented a bit with ChatGPT with prompts along the lines of “offer advice on how to revise and improve the following text” and then paste in a student essay. The results are a mix of (IMO) good, bad, and wrong, and mostly written in the robotic voice typical of AI writing. I think students would have a hard time sorting through these mixed messages. Plus I don’t think there’s a way (yet) for ChatGPT to comment on specific passages in a piece of student writing: that is, it can provide an overall end comment, but it cannot comment on individual sentences and paragraphs and have those comments appear in the margins like the comment feature in Word or Google Docs. Like most writing teachers, that’s a lot of the commenting I do, so an AI that can’t do that (yet) at all just isn’t that useful to me.
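For what it’s worth, that kind of prompt is easy to script rather than paste by hand. Here’s a minimal, stdlib-only sketch that builds the request body for OpenAI’s chat completions endpoint; the model name is my assumption, and the actual HTTP call is left as a comment.

```python
# Minimal sketch of the "offer advice on how to revise" prompt workflow.
# Builds the JSON payload for OpenAI's chat completions endpoint; the
# model name is an assumption, and the HTTP call itself is left out.
import json

REVISION_INSTRUCTION = "Offer advice on how to revise and improve the following text:"

def build_revision_request(essay_text: str, model: str = "gpt-4") -> dict:
    """Return a chat-completions payload asking for revision advice on an essay."""
    prompt = f"{REVISION_INSTRUCTION}\n\n{essay_text.strip()}"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# POST the JSON-encoded payload to https://api.openai.com/v1/chat/completions
# with an "Authorization: Bearer <API key>" header to get the model's comments.
payload = build_revision_request("An example student essay goes here.")
print(json.dumps(payload, indent=2))
```

Even scripted, of course, the output is still one big end comment rather than marginal notes, which is exactly the limitation described above.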

But the key phrase there is “yet,” and it does not take a tremendous amount of imagination to figure out how this could work in the near future. For example, what if I could train my own grading AI by feeding it a few classes worth of previous student essays with my comments? I don’t know, logistically, how that would work, but I am willing to bet that with enough training, a Krause-centric version of ChatGPT would anticipate most of the comments I would make myself on a student writing project. I’m sure it would be far from perfect, and I’d still want to do my own reading and evaluation. But I bet this would save me a lot of time.
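To make the idea slightly more concrete: OpenAI’s fine-tuning API accepts training data as JSONL chat transcripts, so past essays paired with my comments could, in principle, be packaged like this. This is just a sketch; the system-prompt wording and the file name are my own placeholders, and I have no idea how many graded essays such a model would actually need.

```python
# Sketch of turning past (essay, instructor comments) pairs into JSONL
# training examples in the chat fine-tuning format. The system prompt
# wording and the output filename are my own placeholders.
import json

SYSTEM_PROMPT = "You are a writing instructor. Comment on the student essay below."

def to_training_example(essay_text: str, instructor_comments: str) -> str:
    """Serialize one graded essay as a single JSONL line."""
    record = {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": essay_text},
            {"role": "assistant", "content": instructor_comments},
        ]
    }
    return json.dumps(record)

def write_training_file(pairs, path="grading_train.jsonl"):
    """Write every (essay, comments) pair to a JSONL file for upload."""
    with open(path, "w", encoding="utf-8") as f:
        for essay, comments in pairs:
            f.write(to_training_example(essay, comments) + "\n")
```

Student privacy and consent would obviously need sorting out before anyone fed real student work into a training file like this.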

Maybe, some time in the future, this will be a real app. But there’s another use of ChatGPT I’ve been playing around with lately, one I hesitate on trying but one that would both help some of my struggling students and save me time on grading. I mentioned this in my first post about using ChatGPT to teach way back in December. What I’ve found in my ChatGPT noodling (so far) is if I take a piece of writing that has a ton of errors in it (incomplete sentences, punctuation in the wrong place, run-on/meandering sentences, stuff like that– all very common issues, especially for first year writing students) and prompt ChatGPT to revise the text so it is grammatically correct, it does a wonderful job. It doesn’t change the meaning or argument of the writing– just the grammar. It generally doesn’t make different word choices and it certainly doesn’t make the student’s argument “smarter”; it just arranges everything so it’s correct.
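Sketched as code, that grammar-only pass might look like the following. The instruction wording is my paraphrase, and `send_to_model` is a stand-in for whatever API call you prefer; working a paragraph at a time keeps each request small and makes it easy to see exactly what changed.

```python
# Sketch of the grammar-only cleanup pass, one paragraph at a time.
# GRAMMAR_INSTRUCTION is my paraphrase of the prompt described above;
# send_to_model is a placeholder for the actual API call.
GRAMMAR_INSTRUCTION = (
    "Revise the following text so it is grammatically correct. "
    "Do not change the meaning, the argument, or the word choices."
)

def build_grammar_prompt(paragraph: str) -> str:
    """Attach the standing instruction to a single paragraph."""
    return f"{GRAMMAR_INSTRUCTION}\n\n{paragraph.strip()}"

def clean_essay(essay_text: str, send_to_model) -> str:
    """Run the grammar prompt over each paragraph and reassemble the essay."""
    paragraphs = [p for p in essay_text.split("\n\n") if p.strip()]
    return "\n\n".join(send_to_model(build_grammar_prompt(p)) for p in paragraphs)
```

A student (or teacher) could then diff the cleaned-up version against the original to see what the model considered an error in the first place.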

That might not seem like much, but for a lot of students who struggle with getting these basics right, using ChatGPT like this could really help. And to paraphrase Edward Brent from way back in 2005, if students could use a tool like this to at least deal with basic issues like writing more or less grammatically correct sentences, then I might be able to spend more time concentrating on the student’s analysis, argument, use of evidence, and so forth.

And yet– I don’t know, it even feels to me like a step too far.

I have students who have diagnosed learning difficulties of one sort or another who show me letters of accommodation from the campus disability resource center which specifically tell me I should allow students to use Grammarly in their writing process. I encourage students to go to the writing center all the time, in part because I want my students– especially the struggling ones– to sit down with a consultant who will help them go through their essays so they can revise and improve them. I never have a problem with students wanting to get feedback on their work from a parent or a friend who is “really good” at writing.

So why does it feel like encouraging students to try this in ChatGPT is more like cheating than it does for me to encourage students to be sure to spell check and to check out the grammar suggestions made by Google Docs? Is it too far? Maybe I’ll find out in class next week.

The Problem is Not the AI

The other day, I heard the opening of this episode of the NPR call-in show 1A, “Know It All: ChatGPT In the Classroom.” It opened with this recorded comment from a listener named Kate:

“I teach freshman English at a local university, and three of my students turned in chatbot papers written this past week. I spent my entire weekend trying to confirm they were chatbot written, then trying to figure out how to confront them, to turn them in as plagiarist, because that is what they are, and how I’m going penalize their grade. This is not pleasant, and this is not a good temptation. These young men’s academic careers now hang in the balance because now they’ve been caught cheating.”

Now, I didn’t listen to the show for long beyond this opener (I was driving around running errands), and based on what’s available on the website, the discussion also included information about incorporating ChatGPT into teaching. Also, I don’t want to be too hard on poor Kate; she’s obviously really flustered and I am guessing there were a lot of teachers listening to Kate’s story who could very personally relate.

But look, the problem is not the AI.

Perhaps Kate was teaching a literature class and not a composition and rhetoric class, but let’s assume whatever “freshman English” class she was teaching involved a lot of writing assignments. As I mentioned in the last post I had about AI and teaching with GPT-3 back in December, there is a difference between teaching writing and assigning writing. This is especially important in classes where the goal is to help students become better at the kind of writing skills they’ll need in other classes and “in life” in general.

Teaching writing means a series of assignments that build on each other, that involve brainstorming and prewriting activities, and that involve activities like peer reviews, discussions of revision, reflection from students on the process, and so forth. I require students in my first year comp/rhet classes to “show their work” through drafts, in a way similar to how they’d be expected to in an Algebra or Calculus course. It’s not just the final answer that counts. In contrast, assigning writing is when teachers give an assignment (often a quite formulaic one, like write a 5 paragraph essay about ‘x’) with no opportunities to talk about getting started, no consideration of audience or purpose, no interaction with the other students who are trying to do the same assignment, and no opportunity to revise or reflect.

While obviously more time-consuming and labor-intensive, teaching writing has two enormous advantages over only assigning writing. First, we know it “works” in that this approach improves student writing– or at least we know it works better than only assigning writing and hoping for the best. We know this because people in my field have been studying this for decades, despite the fact that there are still a lot of people just assigning writing, like Kate. Second, teaching writing makes it extremely difficult to cheat in the way Kate’s students have cheated– or maybe cheated. When I talk to my students about cheating and plagiarism, I always ask “why do you think I don’t worry much about you doing that in this class?” Their answer typically is “because we have to turn in all this other stuff too” and “because it would be too much work,” though I also like to believe that because of the way the assignments are structured, students become interested in their own writing in a way that makes cheating seem silly.

Let me just note that what I’m describing has been the conventional wisdom among specialists in composition and rhetoric for at least the last 30 (and probably more like 50) years. None of this is even remotely controversial in the field, nor is any of this “new.”

But back to Kate: certain that these three students turned in “chatbot papers,” she spent the “entire weekend” working to prove these students committed the crime of plagiarism and they deserve to be punished. She thinks this is a remarkably serious offense– their “academic careers now hang in the balance”– but I don’t think she’s going through all this because of some sort of abstract and academic ideal. No, this is personal. In her mind, these students did this to her and she’s going to punish them. This is beyond a sense of justice. She’s doing this to get even.

I get that feeling, that sense that her students betrayed her. But there’s no point in making teaching about “getting even” or “winning” because as the teacher, you create the game and the rules, you are the best player and the referee, and you always win. Getting even with students is like getting even with a toddler.

Anyway, let’s just assume for a moment that Kate’s suspicions are correct and these three students handed in essays created entirely by ChatGPT. First off, anyone who teaches classes like “Freshman English” should not need an entire weekend or any special software to figure out if these essays were written by an AI. Human writers– at all levels, but especially comparatively inexperienced human writers– do not compose the kind of uniform, grammatically correct, and robotically plodding prose generated by ChatGPT. Every time I see an article with a passage of text that asks “was this written by a robot or a student,” I always guess right– well, almost always I guess right.

Second, if Kate did spend her weekend trying to find “the original” source ChatGPT used to create these essays, she certainly came up empty handed. That was the old school way of catching plagiarism cheats: you look for the original source the student plagiarized and confront the student with it, courtroom-drama style. But ChatGPT (and other AI tools) do not “copy” from other sources; rather, the AI creates original text every time. That’s why there have been several different articles crediting an AI as a “co-author.”

Instead of wasting a weekend, what Kate should have done is called each of these students into her office or taken them aside one by one in a conference and asked them about their essays. If the students cheated, they would not be able to answer basic questions about what they handed in, and 99 times out of 100, the confronted cheating student will confess.

Because here’s the thing: despite all the alarm out there that all students are cheating constantly, my experience has been the vast majority do not cheat like this, and they don’t want to cheat like this. Oh sure, students will sometimes “cut corners” by looking over to someone else’s answers on an exam, or maybe by adding a paragraph or two from something without citing it. But in my experience, the kind of over-the-top sort of cheating Kate is worried about is extremely rare. Most students want to do the right thing by doing the work, trying to learn something, and by trying their best– plus students don’t want to get in trouble for cheating either.

Further, the kinds of students who do try to blatantly plagiarize are not “criminal masterminds.” Far from it. Rather, students blatantly plagiarize when they are failing and desperate, and they are certainly not thinking of their “academic careers.” (And as a tangent: seems to me Kate might be overestimating the importance of her “Freshman English” class a smidge).

But here’s the other issue: what if Kate actually talked to these students, and what if it turned out they did not realize using ChatGPT was cheating, or they used ChatGPT in a way that wasn’t significantly different from getting some help from the writing center or a friend? What do you do then? Because– and again, I wrote about this in December— when I asked students to use GPT-3 (OpenAI’s software before ChatGPT) to write an essay and to then reflect on that process, a lot of them described the software as being a brainstorming tool, sort of like a “coach,” and not a lot different from getting help from others in peer review or from a visit to the writing center.

So like I said, I don’t want to be too hard on Kate. I know that there are a lot of teachers who are similarly freaked out about students using AI to cheat, and I’m not trying to suggest that there is nothing to worry about either. I think a lot of what is being predicted as the “next big thing” with AI is either a lot further off in the future than we might think, or it is in the same category as other famous “just around the corner” technologies like flying cars. But no question that this technology is going to continue to improve, and there’s also no question that it’s not going away. So for the Kates out there: instead of spending your weekend on the impossible task of proving that those students cheated, why not spend a little of that time playing around with ChatGPT and seeing what you find out?

AI Can Save Writing by Killing “The College Essay”

I finished reading and grading the last big project from my “Digital Writing” class this semester, an assignment that was about the emergence of openai.com’s artificial intelligence technologies GPT-3 and DALL-E. It was interesting and I’ll probably write more about it later, but the short version for now is my students and I have spent the last month or so noodling around with software and reading about both the potentials and dangers of rapidly improving AI, especially when it comes to writing.

So the timing of Stephen Marche’s recently published commentary with the clickbaity title “The College Essay Is Dead” in The Atlantic could not be better– or worse? It’s not the first article I’ve read this semester along these lines, that GPT-3 is going to make cheating on college writing so easy that there simply will not be any point in assigning it anymore. Heck, it’s not even the only one in The Atlantic this week! Daniel Herman’s “The End of High-School English” takes a similar tack. In both cases, they claim, GPT-3 will make the “essay assignment” irrelevant.

That’s nonsense, though it might not be nonsense in the not so distant future. Eventually, whatever comes after GPT-3 and ChatGPT might really mean teachers can’t get away with only assigning writing. But I think we’ve got a ways to go before that happens.

Both Marche and Herman (and just about every other mainstream media article I’ve read about AI) make it sound like GPT-3, DALL-E, and similar AIs are as easy as working the computer on the Starship Enterprise: ask the software for an essay about some topic (Marche’s essay begins with a paragraph about “learning styles” written by GPT-3), and boom! you’ve got a finished and complete essay, just like asking the replicator for Earl Grey tea (hot). That’s just not true.

In my brief and amateurish experience, using GPT-3 and DALL-E is all about entering a carefully worded prompt. Figuring out how to come up with a good prompt involved trial and error, and I thought it took a surprising amount of time. In that sense, I found the process of experimenting with prompts similar to the kind of invention/pre-writing activities I teach to my students and that I use in my own writing practices all the time. None of my prompts produced more than about two paragraphs of useful text at a time, and that was the case for my students as well. Instead, what my students and I both ended up doing was entering in several different prompts based on the output we were hoping to generate. And my students and I still had to edit the different pieces together, write transitions between AI generated chunks of texts, and so forth.

In their essays, some students reflected on the usefulness of GPT-3 as a brainstorming tool. These students saw the AI as a sort of “collaborator” or “coach,” and some wrote about how GPT-3 made suggestions they hadn’t thought of themselves. In that sense, GPT-3 stood in for the feedback students might get from peer review, a visit to the writing center, or just talking with others about ideas. Other students did not think GPT-3 was useful, writing that while they thought the technology was interesting and fun, it was far more work to try to get it to “help” with writing an essay than it was for the student to just write the thing themselves.

These reactions square with the results in more academic/less clickbaity articles about GPT-3. This is especially true about Paul Fyfe’s “How to cheat on your final paper: Assigning AI for student writing.” The assignment I gave my students was very similar to what Fyfe did and wrote about– that is, we both asked students to write (“cheat”) with AI (GPT-2 in the case of Fyfe’s article) and then reflect on the experience. And if you are a writing teacher reading this because you are curious about experimenting with this technology, go and read Fyfe’s article right away.

Oh yeah, one of the other major limitations of GPT-3’s usefulness as an academic writing/cheating tool: it cannot do even basic “research.” If you ask GPT-3 to write something that incorporates research and evidence, it either doesn’t comply or it completely makes stuff up, citing articles that do not exist. Let me share a long quote from a recent article at The Verge by James Vincent on this:

This is one of several well-known failings of AI text generation models, otherwise known as large language models or LLMs. These systems are trained by analyzing patterns in huge reams of text scraped from the web. They look for statistical regularities in this data and use these to predict what words should come next in any given sentence. This means, though, that they lack hard-coded rules for how certain systems in the world operate, leading to their propensity to generate “fluent bullshit.”

I think this limitation (along with the limitation that GPT-3 and ChatGPT are not capable of searching the internet) makes using GPT-3 as a plagiarism tool in any kind of research writing class kind of a deal-breaker. It certainly would not get students far in most sections of freshman comp where they’re expected to quote from other sources.

Anyway, the point I’m trying to make here (and this is something that I think most people who teach writing regularly take as a given) is that there is a big difference between assigning students to write a “college essay” and teaching students how to write essays or any other genre. Perhaps when Marche was still teaching Shakespeare (before he was a novelist/cultural commentator, Marche earned a PhD specializing in early English drama), he assigned his students to write an essay about one of Shakespeare’s plays. Perhaps he gave his students some basic requirements about the number of words and some other mechanics, but that was about it. This is what I mean by only assigning writing: there’s no discussion of audience or purpose, there are no opportunities for peer review or drafts, there is no discussion of revision.

Teaching writing is a process. It starts by making writing assignments that are specific and that require an investment in things like prewriting and a series of assignments and activities that are “scaffolding” for a larger writing assignment. And ideally, teaching writing includes things like peer reviews and other interventions in the drafting process, and there is at least an acknowledgment that revision is a part of writing.

Most poorly designed assigned writing tasks double as ready-made prompts you can enter into GPT-3. The results are definitely impressive, but I don’t think it’s quite useful enough to produce work a would-be cheater can pass off as their own. For example, I asked ChatGPT (twice) to “write a 1000 word college essay about the theme of insanity in Hamlet” and it came up with this and this essay. ChatGPT produced some impressive results, sure, but besides the fact that both of these essays are significantly shorter than the 1000 word requirement, they both kind of read like… well, like a robot wrote them. I think that most instructors who received this essay from a student– particularly in an introductory class– would suspect that the student cheated. When I asked ChatGPT to write a well-researched essay about the theme of insanity in Hamlet, it managed to produce an essay that quoted from the play, but not any research about Hamlet.

Interestingly, I do think ChatGPT has some potential for helping students revise. I’m not going to share the example here (because it was based on actual student writing), but I asked ChatGPT to “revise the following paragraph so it is grammatically correct” and I then added a particularly pronounced example of “basic” (developmental, grammatically incorrect, etc.) writing. The results didn’t improve the ideas in the writing and it changed only a few words. But it did transform the paragraph into a series of grammatically correct (albeit not terribly interesting) sentences.

In any event, if I were a student intent on cheating on this hypothetical assignment, I think I’d just do a Google search for papers on Hamlet instead. And that’s one of the other things Marche and these other commentators have left out: if a student wants to complete a badly designed “college essay” assignment by cheating, there are much, much better and easier ways to do that right now.

Marche does eventually move on from “the college essay is dead” argument by the end of his commentary, and he discusses how GPT-3 and similar natural language processing technologies will have a lot of value to humanities scholars. Academics studying Shakespeare now have a reason to talk to computer science-types to figure out how to make use of this technology to analyze the playwright’s origins and early plays. Academics studying computer science and other fields connected to AI will now have a reason to maybe talk with the English-types as to how well their tools actually can write. As Marche says at the end, “Before that space for collaboration can exist, both sides will have to take the most difficult leaps for highly educated people: Understand that they need the other side, and admit their basic ignorance.”

Plus I have to acknowledge that I have only spent so much time experimenting with my openai.com account because I still only have the free version. That was enough access for my students and me to noodle around enough to complete a short essay composed with the assistance of GPT-3 and to generate an accompanying image with DALL-E. But that was about it. Had I signed up for openai.com’s “pay as you go” payment plan, I might learn more about how to work this thing, and maybe I would have figured out better prompts for that Hamlet assignment. Besides all that, this technology is getting better alarmingly fast. We all know whatever comes after ChatGPT is going to be even more impressive.

But we’re not there yet. And when it is actually as good as Marche fears it might be, and if that makes teachers rethink how they might teach rather than assign writing, that would be a very good thing.