Three thoughts on the “Essay,” assessing, and using “robo-grading” for good

NPR had a story on Weekend Edition last week, “More States Opting to ‘Robo-Grade’ Student Essays By Computer,” that got some attention from other comp/rhet folks though not as much as I thought it might. Essentially, the story is about the use of computers to “assess” (really “rate,” but I’ll get to that in a second) student writing on standardized tests. Most composition and rhetoric scholars think this software is a bad idea. I think this is not not true, though I do have three thoughts.

First, I agree with what my friend and colleague Bill Hart-Davidson writes here about essays, though this is not what most people think “essay” means. Bill draws on the classic French origins of the word, noting that an essay is supposed to be a “try,” an attempt and often a wandering one at that. Read any of the quite old classics (de Montaigne comes to mind, though I don’t know his work as well as I should) or even the more modern ones (E.B. White or Joan Didion or the very contemporary David Sedaris) and you get more of a sense of this classic meaning. Sure, these writers’ essays are organized and have a point, but they wander to it and they are presented (presumably after much revision) as if the writer were discovering their point along with the reader.

In my own teaching, I tend to use the term “project” to describe what I assign students to do because I think it’s a term that can include a variety of different kinds of texts (including essays) and other deliverables. I hate the far too common term “paper” because it suggests writing that is static, boring, routine, uninteresting, and bureaucratic. It’s policing, as in “show me your papers” when trying to pass through a border. No one likes completing “paperwork,” but it is one of those necessary things grown-ups have to do.

Nonetheless, for most people– including most writing teachers– the terms “essay” and “paper” are synonymous. The original meaning of essay has been replaced by the school meaning of essay (or paper– same thing). Thus we have the five paragraph form, or even this comparatively enlightened advice from the Bow Valley College Library and Learning Commons, one of the first links that came up in a simple Google search. It’s a list (five steps, too!) for creating an essay (or paper) driven by a thesis and research. For most college students, papers (or essays) are training for white collar careers: practice in completing required office paperwork.

Second, while it is true that robo-grading standardized tests does not help anyone learn how to write, the most visible aspect of writing pedagogy to people who have no expertise in teaching (beyond experience as a student, of course) is not the teaching but the assessment. So in that sense, it’s not surprising this article focuses on assessment at the expense of teaching.

Besides, composition and rhetoric as a field is very into assessment, sometimes (IMO) at the expense of teaching and learning about writing. Much of the work of Writing Program Administration and scholarship in the field is tied to assessment– and a lot of (most?) comp/rhet specialists end up involved in WPA work at some point in their careers. WPAs have to consider large-scale assessment issues to measure outcomes across many different sections of first year writing, and they usually have to mentor instructors on small-scale assessment– that is, how to grade and comment on all these student papers in a way that is both useful to students and that does not take an enormous amount of time. There is a ton of scholarship on assessment– how to do it, what works or doesn’t, the pros and cons of portfolios, etc. There are books and journals and conferences devoted to assessment. Plenty of comp/rhet types have had very good careers as assessment specialists. Our field loves this stuff.

Don’t get me wrong– I think assessment is important, too. There is stuff to be learned (and to be shown to administrators) from these large scale program assessments, and while the grades we give to students aren’t always an accurate measure of what they learned or how well they can write, grades are critical to making the system of higher education work. Plus students themselves are too often a major part of the problem of over-assessing. I am not one to speak about the “kids today” because I’ve been teaching long enough to know students now are not a whole lot different than they were 30 years ago. But one thing I’ve noticed in recent years– I think because of “No Child Left Behind” and similar efforts– is the extent to which students nowadays seem puzzled about embarking on almost any writing assignment without a detailed rubric to follow.

But again, assessing writing is not the same thing as fostering an environment where students can learn more about writing, and it certainly is not how writing worth reading is created. I have never read an essay which mattered to me written by someone closely following the guidance of a typical assignment rubric. It’s really easy as a teacher to forget that, especially while trying to make the wheels of a class continue to turn smoothly with the help of tools like rubrics. As a teacher, I have to remind myself about that all the time.

The third thing: as long as writing teachers believe more in essays than in papers and as long as they are more concerned with creating learning opportunities than sites for assessment, “robo-grader” technology of the sort described in this NPR story is kind of irrelevant– and it might even be helpful.

I blogged about this several years ago here as well, but it needs to be emphasized again: this software is actually pretty limited. As I understand it, software like this can rate/grade the response to a specific essay question– “in what ways did the cinematic techniques of Citizen Kane revolutionize the way we watch and understand movies today”– but it is not very good at more qualitative questions– “did you think Citizen Kane was a good movie?”– and it is not very good at all at rating/grading pieces of writing with almost no constraints, as in “what’s your favorite movie?”

Furthermore, as the NPR story points out, this software can be tricked. Les Perelman has been demonstrating for years how these robo-graders can be fooled, though I have to say I am a lot more impressed with the ingenuity shown by some students in Utah who found ways to “game” the system: “One year… a student who wrote a whole page of the letter ‘b’ ended up with a good score. Other students have figured out that they could do well writing one really good paragraph and just copying that four times to make a five-paragraph essay that scores well. Others have pulled one over on the computer by padding their essays with long quotes from the text they’re supposed to analyze, or from the question they’re supposed to answer.” The raters keep “tweaking” the code to prevent these tricks, but of course, students will keep trying new tricks.

I have to say I have some sympathy with one of the arguments made in this article that if a student is smart enough to trick the software, then maybe they deserve a high rating anyway. We are living in an age in which it is an increasingly important and useful skill for humans to write texts in a way that can be “understood” both by other people and machines– or maybe just machines. So maybe mastering the robo-grader is worth something, even if it isn’t exactly what most of us mean by “writing.”

Anyway, my point is it really should not be difficult at all for composition and rhetoric folks to push back against the use of tools like this in writing classes because robo-graders can’t replicate what human teachers and students can do as readers: to be an actual audience. In that sense, this technology is not really all that much different than stuff like spell-checkers and grammar-checkers. I have been doing this work long enough to know that there were plenty of writing teachers who thought those tools were the beginning of the end, too.

Or, another way of putting it: I think the kind of teaching (and teachers) that can be replaced by software like this is pretty bad teaching.