No, an AI could not pass “freshman year” in college

I am fond of the phrase/quote/mantra/cliché “Ninety percent of success in life is just showing up,” which is usually attributed to Woody Allen. I don’t know if Woody was “the first” person to make this observation (probably not, and I’d prefer if it was someone else), but in my experience, this is very true.

This is why AIs can’t actually pass a college course or their freshman year or law school or whatever: they can’t show up. And it’s going to stay that way, at least until we’re dealing with advanced AI robots.

This is on my mind because my friend and colleague in the field, Seth Kahn, posted the other day on Facebook about this recent article from The Chronicle of Higher Education by Maya Bodnick, “GPT-4 Can Already Pass Freshman Year at Harvard.” (Bodnick is an undergraduate student at Harvard.) It is yet another piece claiming that the AI is smart enough to do just fine on its own at one of the most prestigious universities in the world.

I agreed with all the other comments I saw on Seth’s post. In my comment (which I wrote before I actually read this CHE article), I repeated three points I’ve written about here or on social media before. First, ChatGPT and similar AIs can’t evaluate and cite academic research at even the modest levels I expect in a first year writing class. Second, while OpenAI proudly lists all the “simulated exams” where ChatGPT has excelled (LSAT, SAT, GRE, AP Art History, etc.), you have to click the “show more exams” button on that page to see that none of the versions of their AI has managed better than a “2” on either the AP English Language and Composition exam or the AP English Literature and Composition exam. It takes a “3” on this exam to get any credit at EMU, and probably a “4” at a lot of other universities.

Third, I think mainstream media and all the rest of us really need to question these claims of AIs passing whatever tests and classes and whatnot much MUCH more carefully than most of us have to date. What I was thinking about when I made that last comment was another article published in CHE in early July, “A Study Found That AI Could Ace MIT. Three MIT Students Beg to Differ,” by Tom Bartlett. In this article, Bartlett discusses a study (which I don’t completely understand because it involves too much math and detail) conducted by three MIT students (class of 2024) who investigated the claim that an AI could “ace” MIT classes. The students determined this was bullshit. What were the students’ findings (at least the ones I could understand)? In some of the classes where the AI supposedly had a perfect score, the exams included unsolvable problems, so a perfect score wasn’t even possible. In other examples, the exam questions the AI supposedly answered correctly did not provide enough information for that to be possible either. The students posted their results online, and at least some of the MIT professors who originally made the claims agreed and backtracked.

But then I read this Bodnick article, and holy-moly, this is even more bullshitty than I originally thought. Let me quote at length Bodnick describing her “methodology”:

Three weeks ago, I asked seven Harvard professors and teaching assistants to grade essays written by GPT-4 in response to a prompt assigned in their class. Most of these essays were major assignments which counted for about one-quarter to one-third of students’ grades in the class. (I’ve listed the professors or preceptors for all of these classes, but some of the essays were graded by TAs.)

Here are the prompts with links to the essays, the names of instructors, and the grades each essay received:

  • Microeconomics and Macroeconomics (Jason Furman and David Laibson): Explain an economic concept creatively. (300-500 words for Micro and 800-1000 for Macro). Grade: A-
  • Latin American Politics (Steven Levitsky): What has caused the many presidential crises in Latin America in recent decades? (5-7 pages) Grade: B-
  • The American Presidency (Roger Porter): Pick a modern president and identify his three greatest successes and three greatest failures. (6-8 pages) Grade: A
  • Conflict Resolution (Daniel Shapiro): Describe a conflict in your life and give recommendations for how to negotiate it. (7-9 pages). Grade: A
  • Intermediate Spanish (Adriana Gutiérrez): Write a letter to activist Rigoberta Menchú. (550-600 words) Grade: B
  • Freshman Seminar on Proust (Virginie Greene): Close read a passage from In Search of Lost Time. (3-4 pages) Grade: Pass

I told these instructors that each essay might have been written by me or the AI in order to minimize response bias, although in fact they were all written by GPT-4, the recently updated version of the chatbot from OpenAI.

In order to generate these essays, I inputted the prompts (which were much more detailed than the summaries above) word for word into GPT-4. I submitted exactly the text GPT-4 produced, except that I asked the AI to expand on a couple of its ideas and sequenced its responses in order to meet the word count (GPT-4 only writes about 750 words at a time). Finally, I told the professors and TAs to grade these essays normally, except to ignore citations, which I didn’t include.

Not only can GPT-4 pass a typical social science and humanities-focused freshman year at Harvard, but it can get pretty good grades. As shown in the list above, GPT-4 got all A’s and B’s and one Pass.

JFC. Okay, let’s just think about this for a second:

  • We’re talking about three “essays” that are less than 1000 words and another three that are slightly longer, and based on this work alone, GPT-4 “passed” a year of college at Harvard. That’s all it takes. Really? Really?! That’s it?
  • I would like to know more about what Bodnick means when she says that the writing prompts were “much more detailed than the summaries above” because those details matter a lot. But as summarized, these are terrible assignments. They aren’t connected with the context of the class or anything else. It would be easy to try to answer any of these questions with a minimal amount of Google searching and making educated guesses. I might be going out on a limb here, but I don’t think most writing assignments at Harvard or any other college– even badly assigned ones– are as simplistic as these.
  • It wasn’t just ChatGPT: she had to do some significant editing to put together ChatGPT’s short responses into longer essays. I don’t think the AI could have done that on its own. Unless it hired a tutor.
  • Asking instructors to not pay any attention to the lack of citation (and I am going to guess the need for sources to back up claims in the writing) is giving the AI way WAAAAYYY too much credit, especially since ChatGPT (and other AIs) usually make shit up, or “hallucinate,” when citing evidence. I’m going to guess that even at Harvard, handing in hallucinations would result in a failing grade. And if the assignment required properly cited sources and the student didn’t provide them, then that student would also probably fail.
  • It’s interesting (and Bodnick points this out too) that the texts that received the lowest grades are ones that ask students to “analyze” or to provide their opinions/thoughts, as opposed to assignments that were asking for an “information dump.” Again, I’m going to guess that, even at Harvard, there is a higher value placed on students demonstrating with their writing that they thought about something.

I could go on, but you get the idea. This article is nonsense. It proves literally nothing.

But I also want to return to where I started, the idea that a lot of what it means to succeed in anything (perhaps especially education) is showing up and doing the work. Because after what seems like the zillionth click-bait headline about how ChatGPT could graduate from college or be a lawyer or whatever because it passed a test (supposedly), it finally dawned on me what has been bothering me the most about these kinds of articles: that’s just not how it works! To be a college graduate or a lawyer or damn near anything else takes more than passing a test; it takes the work of showing up.

Granted, there has been a lot more interest and willingness in the last few decades to consider “life experience” credit as part of degrees, and some of these places are kind of legitimate institutions– Southern New Hampshire and the University of Phoenix immediately come to mind. But “life experience” credit is still considered mostly bullshit and the approach taken by a whole lot of diploma mills, and real online universities (like SNHU and Phoenix) still require students to mostly take actual courses, and that requires doing more than writing a couple of papers and/or taking a couple of tests.

And sure, it is possible to become a lawyer in California, Vermont, Virginia and Washington without a law degree, and it is also possible to become a lawyer in New York or Maine with just a couple years of law school or an internship. But even these states still require some kind of experience with a law office, most states do require attorneys to have law degrees, and it’s not exactly easy to pass the bar without the experience you get from earning a law degree. Ask Kim Kardashian. 

Bodnick did not ask any of the faculty who evaluated her AI writing examples if it would be possible for a student to pass that professor’s class based solely on this writing sample because she already knew the answer: of course not.

Part of the grade in the courses I teach is based on attendance, participation in the class discussions and peer review, short responses to readings, and so forth. I think this is pretty standard– at least in the humanities. So if some eager ChatGPT enthusiast came to one of my classes– especially one like first year writing, where I post all of the assignments at the beginning of the semester (mainly because I’ve taught this course at least 100 times at this point)– and said to me “Hey Krause, I finished and handed in all the assignments! Does that mean I get an A and go home now?” Um, NO! THAT IS NOT HOW IT WORKS! And of course anyone familiar with how school works knows this.

Oh, and before anyone says “yeah, but what about in an online class?” Same thing! Most of the folks I know who teach online have a structure where students have to regularly participate and interact with assignments, discussions, and so forth. My attendance and participation policies for online courses are only slightly different from my f2f courses.

So please, CHE and MSM in general: stop. Just stop. ChatGPT can (sort of) pass a lot of tests and classes (with A LOT of prompting from the researchers who really really want ChatGPT to pass), but until that AI robot walks/rolls into a class or sets up its profile on Canvas all on its own, it can’t go to college.

3 Replies to “No, an AI could not pass “freshman year” in college”

  1. Steve,
    This critique is astute and devastating. The “author” you’re critiquing of course has an advantage you lack: She knows little or nothing about college teaching, aside from an a priori belief that this new technology will “change everything.” The history of technology has for centuries been filled with much hype and exaggeration, much like ChatGPT promotions are today.

    One thing I’d value your thoughts on. Many instructors (including me) at Eastern are now pressed or required to teach “online asynchronous” courses. The rationale for this is purely enrollment: lacking any class “meetings” of any sort, they draw students. How do you think such courses (they’re not true classes) might be affected by ChatGPT? Thanks!

    And thanks for your post. Very useful!

    1. I have spoken and published a fair amount about online teaching– especially during Covid– during the last few years, and I’m happy to talk with you more about this and/or to point you more specifically to things I’ve written/presented about this. You can search this site and you can also look at the CV I have here since I link to these materials there.

      But in brief:

      * Of course online asynchronous college courses are “true courses!” We’ve been teaching in formats like this in higher education for well over 120 years in the form of correspondence courses. There have been hundreds of studies and discussions over this time that have argued (successfully, I might add) that these courses can be just as effective as f2f ones. There are entire universities that only teach their courses in online and asynchronous formats. Before Covid, around a third of all college students across all types of institutions took at least one of their courses online. At EMU, we were teaching about a quarter of our courses online this way– and again, this was before Covid. So the basic assumption you seem to be making here (that an online asynchronous course is just not a “true” class, whatever that’s supposed to mean) is just really wrong.

      * Prior to Covid, around 85-90% of all online courses offered in the U.S. (and this is largely true internationally as well) were offered asynchronously, and basically for three reasons. First, the technology for synchronous online learning/teaching is not good. Zoom is an okay meeting software, but it is a terrible teaching software. Second, there is a simple logistical issue: asynchronous classes enable students to take college courses when they otherwise couldn’t because they can’t be in a specific location (on campus) at a specific time (synchronously). And look, the only reason why we have “distance learning” in higher education at all (and again, this goes back to correspondence courses) is we want to enable people to go to college who otherwise couldn’t attend. And third, there is pretty good research to suggest that asynchronous courses do a better job of maximizing the affordances of the online format.

      * I think the impact of ChatGPT and other AI technology in online courses is going to be basically the same as it is in f2f courses. If an instructor gives students an out of context and merely assigned (rather than taught) writing task where the instructor is expecting students to come up with a “right” answer, then there is perhaps an increased chance that students might “cheat,” depending on what your definition of cheating is. But guess what? They were probably cheating with Google before. On the other hand, if an instructor gives students writing assignments that are specific to the context of the class, that are scaffolded in a series of assignments (e.g., discussing brainstorming and drafting, having peer review, an expectation for revision) and also assignments that require students to support their points with research (particularly from academic sources), then ChatGPT might actually be a useful tool to help students out.
