I'd rather work late than let machines grade essays - Macleans.ca


Poor marks for a new technology


Photo by: Jessica Darmanin

Sometimes when I’m halfway through a pile of 40 essays, I get tired. At these moments, if I had a grading machine, I would probably be tempted to insert the remaining essays and watch them pop out all freshly marked.

However, after 20 years of grading university essays, I know this would be terribly misguided. I’m here to help my students learn how to thrive in university and beyond. In order to do that, they need to have strong thinking, reading, and writing skills. Machine-generated grades will not help them develop these skills. With this in mind, I pick up my pen and go back to providing the constructive human feedback that will help them.

Just to be clear, I am not some remote professor in an ivory tower above Lake Ontario. In fact, I am a non-tenured “gun-for-hire,” fighting hard in the trenches of the humanities. This past year alone I have taught 914 students in six classes in writing, literature, and film. I have graded hundreds of exercises and essays, some brilliant, some hard to understand. I have worked with 19 teaching assistants, generated over 20 evaluation rubrics, and assigned 12 essays. Basically, I have been battling to keep my head up, to keep teaching as well as I am able, and to keep grading the many essays that come my way.

This is nothing exceptional, actually. It’s what most humanities instructors do these days as classes expand and funds contract. While it can be extremely challenging, I think giving up this battle by letting machines grade students’ essays would have far greater negative consequences than even I would like to imagine.

Those who champion machine grading (most of whom, not coincidentally, also develop software) would rather focus on the inevitability of such automated education. The argument goes like this. Traditional higher education is old-fashioned and expensive. Our digital age demands highly specialized skills—not all those airy-fairy critical thinking, reading, and writing skills that are touted by elite, graying humanities professors. It follows that essays in the humanities—and perhaps the humanities altogether—must share in the pain that comes from such inevitable change. Of course there will be those who struggle against new pedagogical technologies, but they must be cranky luddites, frozen in the path of an avalanche that will destroy them if they do not jump on the digital bus… and fast.

Of course, the idea of machine grading essays is both economic and political. It is not simply a shortsighted idea generated by this or that brilliant engineer at M.I.T. or Harvard. It is an effect of a perceived demand for cheaper, faster, and more scientific (in other words, more consistent and objective) grading methods. But, as is the case with every effect, this effect will become the cause of some other effect: that is, machine grading will prioritize quantifiable aspects of writing (such as spelling and grammar), making those aspects most important to students who are writing-to-the-machine. This will cause students to focus on mechanics over content. This focus on mechanics will then lead students to generate shallow—though technically correct—essays, for which they will receive A’s. These A’s will shore up students’ sense of their own exceptionalism, even as they are really only signs of exceptional conformism. Sufficiently skilled and intellectually myopic, these workers will be ideal cogs in the wheels of progress because they won’t question their position in the machine.

The point is that this causal chain is reversible. We need to do five things to shift the course.

1. Challenge the emotionally loaded term, “progress.”

These days, the term “progress” is just a sound bite that evokes a jolt of emotions. For some, progress means Tesla electric cars; for others it means the Keystone Pipeline. For some involved in higher education, the term “progress” means cutting costs by reducing brick-and-mortar redundancies. The question we have to ask and answer is this: What values lie behind our particular use of the term “progress”?

Few even try to explore the values behind machine grading, unless you count the half-baked gesture of Dr. Anant Agarwal (a professor at M.I.T. and the president of edX, which is the biggest machine grading software producer in North America), who alludes to some vague “huge value in learning with instant feedback.”

What is this “huge value”? Well, Dr. Agarwal says, students like it. They say they “learn much better with instant feedback.”

Better? In comparison to what? And what is it they are actually learning, anyway? The answer is hard to miss: at best, they are learning basic literacy skills. Period.

Herein lies the Orwellian politics of machine grading: it’s for second-tier schools, not “prestigious universities,” which, as Professor Shermis confirms in the article, “do a much better job of providing feedback than a machine ever could.”

The circle is complete: the richest, best universities are spearheading technology that will enable the poorer schools to deal with their relative poverty by using grading software that will encourage dull, connect-the-dots writing without insight or critical thought. The elite will still be able to think outside the box (and so to make the rules of the game), while the poor will, increasingly, neither want to think outside the box nor be able to do so. They will be the ideal Proles, docile subjects who can manage to do the skilled white-collar work that the Inner Party Members find dull, but who can never quite formulate any clear thoughts about how they themselves are positioned in and by the system in the first place.

2. Define what we mean by “writing.”

University writing is critical writing. In other words, university students write essays in which they make thoughtful claims, which evolve and develop in a reciprocal relationship with the evidence they find and analyze. Ideally, a really strong essay will also explore what is at stake in the claims in the first place. Such claim-based writing is not about the blind regurgitation of received ideas. It is about seeing outside the given frame, researching beyond what pops up first on Google, evaluating credible (and in-credible) sources, figuring out how these sources appeal to readers (even if the sources seem to lack logical reasoning), understanding the different positions at play, synthesizing what you analyze, and arriving at thoughtful interpretations. Finally, it is about articulating your own arguments clearly and coherently.

This kind of critical thinking and writing is as challenging to do as it is to teach. Instructors spend years honing pedagogical approaches that work; they then adapt these approaches in response to the knowledge, needs, and biases of each class and of individual students. Commenting on prewriting work (like outlines and rough drafts) is often a critical part of this process. There is no question that this gets harder with larger classes, but if instructors stick to their core definition of writing as a process, then they can come up with different commenting options (like online responses to outlines) that let them continue to engage in the teaching process, so they can continue to help their students engage in the thinking and writing process.

3. Consider what is really gained (and lost) by using a new technology.

A lot of new technologies seem cool at first. Clickers, for example, were going to be the rage a few years ago. With clickers you can ask questions and get immediate data on how many students answered one way or another. The problem? The questions have to be multiple choice, prepared in advance to generate useful data on the spot. This means endless hours of scripting on the instructor’s part, and rather dull—yet quantifiable—results on everyone’s part. Clickers are still used in some large classes, but to me they seem better suited for training dogs than for engaging students.

More recently, programmers and hobbyists everywhere have been toying with peer-review and editing software. The problem? To date, these programs either act like glorified grammar checkers or demand so much from human beings that we humans are still much better off using our own brains to do the work. By using our brains we also benefit from the fact that we develop these brains, even as we hone our reviewing, editing, and communicating skills, all of which are increasingly necessary in today’s online matrix-structured work environments, where folks review one another’s proposals and drafts on a daily, if not hourly, basis.

4. Address human readers, not machines.

We are people who write to other people. In the process, we learn how to consider our audience as a person or people with particular needs, attitudes, and knowledge, and, in so doing, we learn how to accept that each of us is a person who both speaks and listens. Machines take away this all-important human element. We become passive observers in a truncated, virtual communication process that just doesn’t feel real… because it isn’t real.

5. Be alert when numbers replace logic.

Corporate innovators often dodge the work of logical reasoning by offering up numbers that seem to prove their point. This makes sense: if they want to make money, they need to find and present reassuring numbers to investors. As long as the numbers are accurate and objective, everything is fine. Right? Not so fast. What if the corporation is the one doing the research to figure out the numbers? What if they are designing the questions, picking the study subjects, and setting up the experiment? Of course, they will construct the study in a way that gets them the data they want—and they will publish the findings that they find informative.

This is precisely the case for edX, as noted by M.I.T. professor Les Perelman. The problem? Faulty methodology. EdX never actually compared “the software directly to human graders.” Why? Because they are not comparable. Certainly the machines can grade faster, but this raises the question of just what kind of grading they have done. I am reminded of Woody Allen’s joke: “I just finished a course in speed reading. It was fantastic. Afterwards, I was able to read War and Peace in twenty minutes. It’s about Russia.”

I want to end with one final bonus point in celebration of all us human graders, teachers, students, and writers.

6. “Break any of these rules sooner than [do] anything outright barbarous.” (George Orwell)

The machine grading of essays may not seem like an earth-shattering issue, but as I have tried to suggest in this commentary, it may have serious consequences. The points I have outlined above are meant to explore some of the potential consequences of our increasing acceptance of lateral thinking and regurgitative writing, both of which are reinforced, or at the very least left unchallenged, by digitalized pedagogy. My hope is that when we are considering automated grading and other such high-tech options, we will remember what Marshall McLuhan put so well: “We shape our tools, and our tools shape us.”  We have shaped the grading machine, and my comments here are intended to highlight how this tool will, in turn, shape us . . . perhaps rendering us obsolete in the process.

Flynn lectures on English at the University of Toronto and co-authored The McGraw-Hill Handbook.


I’d rather work late than let machines grade essays

  1. I am a Professor of Literature and Film at the University of Toronto. Finally, a machine that will help turn academia into a Walmart where grading can be outsourced to somewhere where people will work for pennies a day. Perhaps I’ll be allowed to stay on as a “greeter” under an ivy-covered arch for people who want to indulge in nostalgia. It’s called HUMANities, people, and so you can take out the humans if you want, but you really should then call it something else.

    Under the regime of the grading machines, “improving” will actually be “conforming,” which might be OK in such practical fields as learning real estate laws in order to be a better-informed realtor, for example, but won’t help much when pondering the enigma of “Waiting for Godot.”

    My favorite quote from this same article: “learning turns into a game, with students naturally gravitating toward resubmitting the work until they get it right,” said Daphne Koller, a computer scientist and a founder of Coursera.

    How cool is that? It’s the same dynamic as Mario Brothers. Understanding the importance of Godot’s failure to arrive will be translated into a quest to rescue Princess Peach.

    Or this one: “The challenge is that this requires human graders, and so they cost a lot more and they take a lot more time.”
    Really? “The challenge is that this requires human graders”?! What we used to, until five minutes ago apparently, call “teachers”?

    Silly me. I thought the challenge was to convince governments to adequately fund education as something more important than corporate tax breaks of the sort that ensure more money can be stockpiled on scenic islands, the modern-day equivalent of pirates burying their treasure chests in the sand. Much better to get rid of teachers, “freeing them up for other tasks” (!), and then let students face the unblinking eye of Gradotron, which will almost instantly suss out their suitability to supply menial labor for transnational money-generators.

    Clearly, I’m a critic of this technology, so how best to deal with the likes of me? Here we go:

    “With increasingly large classes, it is impossible for most teachers to give students meaningful feedback on writing assignments,” he said. Plus, he noted, critics of the technology have tended to come from the nation’s best universities, where the level of pedagogy is much better than at most schools.

    “Often they come from very prestigious institutions where, in fact, they do a much better job of providing feedback than a machine ever could,” Dr. Shermis said. “There seems to be a lack of appreciation of what is actually going on in the real world.”

    The old “real world” trick should knock me right out of the loop. Who made this “real world” where classes are large and getting larger while funding drops and faculty-student ratios plummet? Even this proponent of machine grading notes it can be done better by “humans” (who are these people, anyway, aliens?). It’s not that I don’t “know” the “real” world that is the problem, but that I know it too well, and in a way that is inconvenient for this amazing argument that, now, not only are “banks too big to fail” but universities are too over-enrolled to employ teachers. And I’M the one “out of touch”?! This is quite a juggernaut in the making. “Free” the professors to go do something else (collect unemployment), and then all those pesky critical thinkers will be out of the way, so we can really roll in the machines. Soon, you will not be able to receive a degree until your efforts have been judged adequate (don’t worry, you’ll be allowed to submit and re-submit until this happens, or you die, whichever comes first) and, effectively, education itself will be a pure “product” impossible to access without paying for the patented machinery that separates those who “pass” from those who don’t. (Something eerily similar is going on with the “gold rush” to patent human genes in anticipation of medical cures so that these, too, will be guided by and subject to the tender mercies of commercial capitalism.) Teachers, most of whom struggle heroically to deliver continued quality in the face of rising enrollments, are chided for failing to provide “meaningful feedback.” If that’s really your concern, and of course it isn’t, bring back tutorials for large lectures where a professor can give a great lecture and students can go over its implications in smaller groups with TAs who were actually in attendance at the lecture in the first place. It is very far from clear this machine “feedback” would rise to my definition of “meaningful.”

    And by the way, essay factories online provide pre-written essays by the thousands. A “human” grader can see they are mediocre at best, whereas the computer is likely to judge them refreshingly straightforward. Soon machines will also WRITE these purchasable essays. And then they can be submitted to a machine that will approve/grade them. And then we will ALL be free to pursue “other tasks” until, surrounded by unprecedented oceans of information, and encouraged to drift far out from shore by our various portable distraction devices, the first really big wave from the REALLY “real world” will swamp us, and we will drown in our own ignorance.

    To borrow an old adage from early computer days: “garbage in, garbage out.”

    In one example I read of challenging the grading machines, the prompt for the essay question asked the student to describe the difficulties and challenges of modern life. The essay was to be graded by a machine. The student submitted “The Waste Land” by T. S. Eliot as his response. The student flunked, receiving twenty points (for “adequate spelling”) out of a possible one hundred.

    Bring on the machines and we can double down on T. S. Eliot’s observation from this essay:

    What are the roots that clutch, what branches grow
    Out of this stony rubbish? Son of man,
    You cannot say, or guess, for you know only
    A heap of broken images, where the sun beats,
    And the dead tree gives no shelter, the cricket no relief,
    And the dry stone no sound of water.

    Oh, but wait, the machine flunked this so, oh well, never mind. Hey, anyway, did you see the YouTube video of the parrot that was taught the Gettysburg Address?

  2. Your commitment to your students deserves more recognition than your University will likely provide, but I am sure your students notice and appreciate the difference, and for me, hearing from a colleague who upholds the core values of humanities education in the face of what has become a strongly dehumanizing trend is extremely heartening.

    Machine grading (and machine writing) is a large part of an ever-growing problem that includes Universities’ aggressive promotion of web-based learning through distance ed and MOOCs (as though interrelations mediated by a screen are the equivalent of engaged and respectful real-time dialogue in a community of peers). The question of how digital learning might not only diminish the quality of students’ education but also diminish the University’s “brand” through the empty feelings prompted by the faceless, impersonal qualities of student-teacher and peer interactions is a hot topic of debate right now, but the consensus among Canadian and American Universities is that we must all launch ourselves unself-reflexively into the digital world where knowledge becomes information, intellectual curiosity becomes web browsing, insight is replaced by accuracy, and relevance is measured by Facebook “likes” and “dislikes.”

    Another issue you bring up, one that affects undergraduate education in live classrooms across Canada, has to do with class size. That humanities professors can teach 900 students in a year should make every student applying to University think twice. Professors will never know more than a small handful of students as individuals, and this is no small thing, for it means they cannot provide existentially meaningful support and encouragement. For many students who find their education alienating, the knowledge that someone who knows them well enough to solicit their best work cares about their success can make all the difference. Such inflated class sizes also mean that professors cannot write students convincing letters of recommendation for scholarships or jobs, and since most students hope to use their University education to give them an edge in the work force, they will need to find other ways to distinguish themselves (though how or where I do not know).

    I wonder if students know about this at all, and would love to hear their comments.

    Thanks, Professor Flynn.