Harvard academic Todd Rose on the fallacy of averages

Todd Rose, a high school dropout turned Harvard professor, explains why the idea of average is damaging, especially when it comes to education


 
Todd Rose, a professor at Harvard University and author of the new book "The End of Average: The Science of What Makes Us Different", poses for a portrait in Larson Hall at Harvard University in Cambridge, Massachusetts. (Photograph by Lauren DeCicca)

Born in rural Utah in 1974, Todd Rose dropped out of high school in 1993, with a D average and a pregnant girlfriend (now his wife), and took a job stocking shelves for $4.25 an hour. Two decades later, Rose is director of the mind, brain and education program at Harvard University’s graduate school of education. He is a leading figure in the science of individuality, a new interdisciplinary field that draws from ergodic mathematical theory (the study of dynamic systems), developmental biology, neuroscience and psychology. It’s dedicated, like Rose’s new book, The End of Average, to banishing statistical averages as a predictor of anything meaningful about individual human beings.

Q: Your book begins by explaining how we came to the age of average during the first wave of modernity—mass industrialization, mass world wars, the masses in general—when scientists suddenly had a lot of information about aggregate populations. Now the language is so embedded in our thinking it’s unconscious?

A: We’ve become so used to the concept as a measuring and sorting tool, that it and its correlates—below-average, above-average—are everyday speech. We don’t even question the language, although the challenges we face require a different mindset.

Q: Most people realize that “average” truly fits very few people, but you say it barely reaches anyone; even when you expand the average band to cover 90 per cent of the sample, only two per cent of the subjects fit in. In a study of 4,000 USAF pilots, exactly zero fit the average criteria established to help construct the most usable cockpit seat—height, weight, arm length, leg length, etc.—in every aspect.

A: That was one of my most surprising discoveries when I dug into the history of average-ism: When you actually get the data, it rarely captures anyone. Which then begs the question, why are we using this as a reference standard for human beings?

Q: The conceptual error at the heart of this is that average-ism insists, quite wrongly, that measuring a lot of people once gives you as valid results as measuring the same person many, many times?

A: That’s actually the whole conceptual underpinning for the way we do most of our sciences. If I wanted to know what, say, my daily spending habits were, the obvious way to do it would be to record what I spend for a while, then average it out. The assumption made in most social sciences is that, well, that’s kind of hard, that would take a lot of time. What if we could find a bunch of people like Todd, study their spending habits for one day, and average them out? Wouldn’t that equal my spending over time? Say it out loud, it seems ridiculous, but that’s the basis. The Age of Average gave us a lot. Take clothing: We’ve all benefited remarkably from large, medium and small sizes making things affordable and available, but when it really counts—the wedding gown and the pressurized fighter pilot suit—it’s bespoke all the way. It’s fine to pretend that people are one-dimensional, like in body size; the problem comes when you forget that you are just pretending.
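The spending example can be made concrete with a small sketch. The names and dollar figures below are invented purely for illustration (they are not from Rose's research); the point is only that a one-day average across several people need not match any one person's average over time.

```python
from statistics import mean

# Hypothetical daily spending (in dollars) over one week.
# All names and numbers are made up for illustration.
spending = {
    "todd": [5, 5, 5, 5, 5, 60, 5],
    "alex": [80, 75, 85, 80, 78, 82, 80],
    "sam": [0, 0, 120, 0, 0, 0, 90],
}


def time_average(person: str) -> float:
    """Average one person's spending across all recorded days."""
    return mean(spending[person])


def group_average(day: int) -> float:
    """Average everyone's spending on a single day (the shortcut)."""
    return mean(series[day] for series in spending.values())


if __name__ == "__main__":
    print(f"Group average on day 0: {group_average(0):.2f}")
    for person in spending:
        print(f"{person}'s own average over the week: {time_average(person):.2f}")
```

In this toy data the one-day group figure sits well above Todd's own weekly average and well below Alex's, so it describes neither of them. Real spending data would of course be messier.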

Q: People are probably more accepting of this in some areas than in others—talent for instance. I assume you get more push-back on the notion that character is as contextualized as talent.

A: You’re mapping exactly my experiences in this. Talent—really, everyone agrees, it’s multidimensional, and often overlooked in standard assessments. That’s not hard for people to accept. When you start digging into things like character, though, the notion that people have high character or low character is very strong. What’s crazy is that my thinking is not a new insight. The very first large-scale study of character, still one of the largest ever, was done in the early 1900s by Hugh Hartshorne, an ordained minister and a scientist. He thought he was going to find character was a thing, a trait—if you were honest in one aspect of life, you’d be honest in all—and instead he found that’s not so. Character is incredibly jagged, and incredibly contextualized, even to the point where I still feel uncomfortable thinking about it. When Hartshorne published his results, saying, ‘Hey look, the kid who cheats on a test at school is not the kid who cheats in sports, is not the kid who lies at home, it’s just not like that’—people freaked out.

Q: You have an atypical life story for a guy who runs a Harvard lab, one that illustrates the real-life roots of your thinking.

A: Growing up in rural Utah had a lot of benefits, but in an environment that prized conformity, fit wasn’t one of them. I ended up in my senior year with a 0.9 GPA, which I think you actually have to work pretty hard to get. In the exact same month they kicked me out of school, my girlfriend—still my wife today—told me she was pregnant. So, it was an interesting start to life: working 10 or 12 minimum-wage jobs; getting bored really quickly and quitting; having my in-laws—rightly—in full panic mode and thinking I had some kind of character flaw. Until my dad, who had thought hard about it, told me my problem was if I wasn’t interested in what I was doing, I just paid no attention at all to it. There was interesting work out there, but I had to figure out a way through school. I enrolled at a local college, but this time paid attention to myself—took only courses that really interested me, even if they weren’t in sequence; kept out of classes with people I knew from high school, because I tended to act like the class clown around them; selected teachers by their teaching style—until I could build up my study habits. I ended up graduating with a 3.97 GPA and got into Harvard for my doctorate.

Elevated view of university students walking up and down stairs. (David Schaffer/Getty Images)

Q: Now you run a Harvard graduate program and have a second, very cool-sounding job, as head of the university’s Laboratory for Individuality. What do you do there?

A: That’s our research lab where some of the ideas I outline in the book are actually contributing to the science. We have a brain individuality project, where we are looking at the different ways human brains respond to doing the exact same tasks and trying to understand the patterns that give rise to that. Our biggest project is actually more in the social sciences, where we are studying mastery—how people get good at things—only we do it from an individuality perspective.

Q: It’s apparent in The End of Average that, while you applaud personalized medicine, what really interests you is education.

A: Education and the workforce: I think these two things go together in terms of human potential. Historically, education has been about batch processing: standardize everything against the average, rank kids, sort them to see who gets more and who really doesn’t deserve to be there. The problem, even if you’re just being selfish from an economic standpoint, is we’re not producing the talent we need: companies tell me that even in their best attempts to hire people, it’s a 50/50 proposition whether that person pans out a year later. We need to develop people rather than process them. But if you accept that, there are huge implications, including a whole different organizing set of principles. Right now, for instance, we resist giving people extra time on exams or for assignments, as though it’s unfair to the faster students. Well, is the purpose of the assessment to understand what they know or to rank them against the average? The whole idea of timing tests is a century old, from a scientist who thought speed and ability were tightly correlated, which they are not. We don’t have that obsession with, say, a driver’s licence: take the test as often as you need; when you pass we’ll allow you to operate a multi-tonne machine.

Q: Are you still working on ways to personalize education?

A: Our role so far has been to clarify for the public a way of seeing this. We use the Air Force analogy: there were expensive things they had to do to get a cockpit suitable for a lot of pilots, like wraparound windshields, but their initial solutions, when they realized average didn’t work, were adjustable seats. How in the world did they not already have adjustable seats in their planes? We’re looking for adjustable seats for education, for basic things that we can do. Solutions are out there in piecemeal that need to be brought to the centre of the system. Abandon fixed-time, grade-based classes; if something is valuable, have mastery focus, where we give a flexibility in time and it’s all about getting you to competency.

Q: If your children were in high school and grading, for lack of a better term, the way you did in high school, do you have solid options that were not available to your parents?

A: Yes, but right now they’re decidedly skewed to people with money. That rubs me wrong. I care deeply about opportunity and fairness, because I grew up really poor. What motivates me is that, for the first time, we can have the knowledge to scale these kind of solutions and make them available to all. But that requires making good choices right now about the way we’ll use our technology, and the purpose we have for education. That’s not going to happen magically. We can make really bad choices and double-down on the system we have, so we have to work at it.


 

  1. For some reason, we humans like to generalize and, worse, freak out when this approach doesn’t work, leading to behaviors like racism and sectarianism. An average is one thing: simply the norm of a given measurement of a sample set, i.e. the sum of measures of the same attribute from multiple samplings divided by the number of measures. Many things called average, such as an average mark or an average body shape, are not actually averages at all, as they represent a summation of measures of unlike attributes. Why educators do this is unclear – what does the sum of a student’s achievement in composition and track say about them, and how does it rank them against other students? And then, what about a semester when either composition or track is not studied – is it even reasonable to continue the comparison, or to compare results in that semester to some other? As the article notes, as soon as one attempts to produce an average which is the concatenation of several independent parameters, the result is likely not representative of anything: I had a brother-in-law 6’3″ tall, 140 lbs, with broad shoulders – no off-the-rack shirt ever fit. Evaluating academic proficiency from test scores is deceptive: as we know, skill is largely developed through practice, and the amount of practice follows from interest, opportunity, availability, resources and priority; also, the measure, out of practical considerations, comprises an arbitrary subset of small measures and a tally which is an arbitrarily weighted summation (if questions are assigned 3, 4 or 5 marks, even if somehow in accurate proportion, there must be a considerable quantization error in the final tally).
It’s also not possible to set an examination that is not affected by context – I always hated sitting a highly theoretical physics exam in the same room with English lit students doing an open-book exam; more importantly, I schooled with a student who was brilliant in math but whose English as a second language was at a junior grade-school level – 10 minutes to understand a proposition, 30 seconds to solve it – a sure case against time limits. In practice, there is still more foolishness, such as assuming that all evaluators are equally capable and/or have the same bias (in process control problems one actually takes steps to measure that variability as well) and that there is only one acceptable solution. There’s also the false notion that all instruction is equally effective and, worse, that it can be; more exactly, the notion that excellence is the average, which is nonsense; along with that, there’s the uncertainty over whether or not to ‘teach to the test’. And finally, the notion that competent students will be good at everything (brain surgeons can also fix transmissions … right?).