I most certainly do not have the time to be writing this because it’s the height of the “assessment season” (i.e., grading) for several different assignments my students have been working on for a while now. That’s why posting this took me a while: I wrote it during breaks in a week-long grading marathon. In other words, I have better things to do right now. But I find myself needing to write a bit in response to Zach Justus and Nik Janos’ Inside Higher Ed piece “Assessment of Student Learning is Broken,” and I figured I might as well make it into a blog entry. I don’t want to be a jerk about any of this, and I’m sure Justus and Janos are swell guys and everything, but this op-ed bothered me a lot.
Justus and Janos are both professors at Chico State in California; Justus is a professor in Communications and is the director of the faculty development program there, and Janos is in sociology. They begin their op-ed about AI “breaking” assessment quite briskly:
Generative artificial intelligence (AI) has broken higher education assessment. This has implications from the classroom to institutional accreditation. We are advocating for a one-year pause on assessment requirements from institutions and accreditation bodies. We should divert the time we would normally spend on assessment toward a reevaluation of how to measure student learning. This could also be the start of a conversation about what students need to learn in this new age.
I hadn’t thought a lot about how AI might figure into institutional accreditation, so I kept reading. And that’s where I first began to wonder about the argument they’re making, because very quickly, they seem to equate institutional assessment with assessment in individual classes (grading). Specifically, most of this piece is about the problems AI (supposedly) causes for a very specific assignment in a very specific sociology class.
I have no direct experience with institutional assessment, but as part of the Writing Program Administration work I’ve dipped into a few times over the years, I have some experience with program assessment. In those kinds of assessments, we’re looking at the forest rather than the individual trees. For example, as part of a program assessment, the WPAs might want to consider the average grades of all sections of first year writing. That sort of measure could tell us stuff about the overall pass rate and grade distribution across sections, and so on. But that data can’t tell you much about grades for specific students or the practices of a specific instructor. As far as I can tell, institutional assessments are similar “big picture” evaluations.
Justus and Janos see it differently, I guess:
“Take an introductory writing class as an example. One instructor may not have an AI policy, another may have a “ban” in place and be using AI detection software, a third may love the technology and be requiring students to use it. These varied policies make the aggregated data as evidence of student learning worthless.”
Yes, different teachers across many different sections of the same introductory writing class take different approaches to teaching writing, including with (or without) AI. That’s because individual instructors are, well, individuals, and each group of students is different as well. Some of Justus and Janos’ reaction to these differences probably has to do with their disciplinary presumptions about “data”: if it’s not uniform and if it’s not something that can be quantified, then it is, as they say, “worthless.” Of course, in writing studies, we have no problem with much fuzzier and more qualitative data. So from my point of view, as long as the instructors are more or less following the same outcomes/curriculum, I don’t see the problem.
But like I said, Justus and Janos aren’t really talking about institutional assessment. Rather, they devote most of this piece to a very specific assignment. Janos teaches a sociology class that fulfills an institutional writing competency requirement for the major. The class has students “writing frequently” with a variety of assignments for “nonacademic audiences,” like “letters-to-the-editor, … encyclopedia articles, and mock speeches to a city council” meeting. Justus and Janos say “Many of these assignments help students practice writing to show general proficiency in grammar, syntax and style.” That may or may not be true, but it’s not at all clear how this was assigned or what sort of feedback students received.
Anyway, one of the key parts of this class is a series of assignments about:
“a foundational concept in sociology called the sociological imagination (SI), developed by C. Wright Mills. The concept helps people think sociologically by recognizing that what we think of as personal troubles, say being homeless, are really social problems, i.e., homelessness.”
It’s not clear to me what students read and study to learn about SI, but it’s a concept that’s been around for a long time; Mills wrote about it in a book in the 1950s. So not surprisingly, there is A LOT of information about this available online, and presumably that has been the case for years.
Students read about SI, and as part of their study, they “are asked to provide, in their own words and without quotes, a definition of the SI.” To help do this, students do activities like “role play” where they pretend they are talking to friends or family about a social problem such as homelessness. “Lastly” (to quote at length one last time):
…students must craft a script of 75 words or fewer that defines the SI and uses it to shed light on the social problem. The script has to be written in everyday language, be set in a gathering of friends or family, use and define the concept, and make one point about the topic.
Generative AI, like ChatGPT, has broken assessment of student learning in an assignment like this. ChatGPT can meet or exceed students’ outcomes in mere seconds. Before fall 2022 and the release of ChatGPT, students struggled to define the sociological imagination, so a key response was to copy and paste boilerplate feedback to a majority of the students with further discussion in class. This spring, in a section of 27 students, 26 nailed the definition perfectly. There is no way to know whether students used ChatGPT, but the outcomes were strikingly different between the pre- and post-AI era.
Hmm. Okay, I have questions.
- You mean to tell me that the key deliverable/artifact that students produce in this class to demonstrate that they’ve met a university-mandated gen ed writing requirement is a passage of 75 words or fewer? That’s it? Really? Really? I am certainly not saying that being able to produce a lot of text should be the main factor for demonstrating “writing competency,” but this seems more than a little weird and hard to believe.
- Is there any instructional apparatus for this assignment at all? In other words, do students have to produce drafts of this script? Is there any sort of in-class work with the role-play that’s documented in some way? Any reflection on the process? Anything?
- I have no idea what the reading assignments and lectures were for this assignment, so I could very well be missing a key concept with SI. But I feel like I could have copied and pasted together a pretty good script just based on some Google searching around, if I were inclined to cheat in the first place. So given that, why are Justus and Janos confident that students hadn’t been cheating before Fall 2022?
- The passage about the “before Fall 2022” approach to teaching this writing assignment says a lot. It sounds like there was no actual discussion of what students wrote, and the main response students received back then was copied-and-pasted “boilerplate feedback.” So, in assessing this assignment, was Janos evaluating the unique choices students made in crafting their SI scripts? Or rather, was he evaluating these SI scripts against the “right answer” he provided in the readings or lectures?
- And as Justus and Janos note, there is no good way to know for certain if a student handed in something made in part or in whole by AI, so why are they assuming that all of those students who got the “right answer” with their SI scripts were cheating?
So, Justus and Janos conclude, because instructors are now evaluating “some combination of student/AI work,” it is simply impossible to make any assessment for institutional accreditation. Their solution is that “we should have a one-year pause wherein no assessment is expected or will be received.” What kinds of assessments are they talking about? Why only a one-year pause? None of this is clear.
Clearly, the problem here is not institutional assessment or the role of AI; the problem is the writing assignment. The solutions are also obvious.
First, there’s the difference between teaching writing and merely assigning it. I have blogged a lot about this in the last couple of years (notably here), but briefly: teaching writing means a series of assignments where students need to “show their work.” That seems extremely doable with this particular assignment, too. Sure, it would require more actual instruction and evaluation than “boilerplate feedback,” but this seems like a small class (27 students), so that doesn’t seem like that big of a deal.
Second, if you have an assignment in any subject that can be successfully completed with a simple prompt to ChatGPT (as in “write a 75 word script explaining SI in everyday language”), then that’s now definitely a bad assignment. That’s the real “garbage in, garbage out” issue here.
And third, one of the things that AI has made me realize is that if an instructor has an assignment in a class, and I mean any assignment in any class, that can be successfully completed without any experience of or connection to that instructor or the class, then that’s a bad assignment. Again, that seems extremely easy to address with the assignment that Justus and Janos describe. They’d have to make changes to the assignment and assessment, of course, but doesn’t that make more sense than arguing that we should completely revamp the institutional accreditation process?
I really appreciate this deep read of our piece and it points to several things Nik and I went back and forth about explaining further. In addition, your suggestions about alternative assignment structures are great and the kind of thing we recommend to people. Thanks for this thoughtful reply!