The Judgment of Princeton seems to be this: who are we to judge?
Last Friday, nine judges blind-tasted 20 wines, some from France (mostly expensive) and some from New Jersey (mostly cheap), at the author George Taber’s homage to the 1976 California-vs.-France Judgment of Paris. Of those 20, there was just one bottle that we (and by “we,” I mean not me but Richard Quandt, economist and author of the canonical paper “On Wine Bullshit: Some New Software?”) can infer to have done better than the others for any reason other than chance alone. (Quandt, along with his Princeton colleague and conference host Orley Ashenfelter, set out the methodology for testing this in a 1999 paper called “Analyzing a Wine Tasting Statistically,” which reinterprets the Judgment of Paris results.)
Even that single winning wine—the Drouhin Clos des Mouches 2009, a white Burgundy—was ranked dead last (#10 out of 10) by one of the nine judges, and tied for second-to-last by two others. There was one red wine that seemed to have done worse than the others in a meaningful way—the Four JG’s Cabernet Franc 2008, from New Jersey—but as for the rest, it was an 18-way tie (or, to be more accurate, two nine-way ties). If the entire Judgment of Princeton were replayed, from start to finish, on another day, we wouldn’t even have enough evidence to conclude that any one of those 18 wines would be likely to do better or worse than any other.
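For anyone who wants to see roughly how a test like this can work, here is a minimal Python sketch. It is not Quandt's FORTRAN code. It assumes a rank-sum approach: each judge ranks the wines in a flight from 1 (best) to 10 (worst), the ranks are added up per wine, and each total is compared with what purely random rankings would produce. The function names, the 5% cutoff, and the sample rank sums at the bottom are my own illustrative assumptions, and the sketch ignores tied ranks and looks at each wine on its own.

```python
def rank_sum_counts(n_judges, n_wines):
    """Count the ways one wine can reach each rank sum when every judge
    ranks at random (its rank is then uniform on 1..n_wines per judge)."""
    counts = {0: 1}
    for _ in range(n_judges):
        nxt = {}
        for total, ways in counts.items():
            for rank in range(1, n_wines + 1):
                nxt[total + rank] = nxt.get(total + rank, 0) + ways
        counts = nxt
    return counts


def critical_values(n_judges, n_wines, alpha=0.05):
    """Return (lo, hi): the largest rank sum with P(sum <= lo) <= alpha and
    the smallest rank sum with P(sum >= hi) <= alpha, under random ranking.
    Assumes alpha is large enough that both cutoffs exist."""
    counts = rank_sum_counts(n_judges, n_wines)
    limit = alpha * n_wines ** n_judges  # alpha expressed as a count of outcomes

    lo, cum = None, 0
    for s in sorted(counts):  # walk up from the lowest possible sum
        cum += counts[s]
        if cum > limit:
            break
        lo = s

    hi, cum = None, 0
    for s in sorted(counts, reverse=True):  # walk down from the highest sum
        cum += counts[s]
        if cum > limit:
            break
        hi = s
    return lo, hi


def classify(rank_sums, lo, hi):
    """Label each wine by whether its rank sum falls outside the chance band.
    A low rank sum is good, because rank 1 is the best rank."""
    labels = {}
    for wine, total in rank_sums.items():
        if total <= lo:
            labels[wine] = "better than chance"
        elif total >= hi:
            labels[wine] = "worse than chance"
        else:
            labels[wine] = "indistinguishable"
    return labels


if __name__ == "__main__":
    lo, hi = critical_values(n_judges=9, n_wines=10)
    # Made-up rank sums for three hypothetical wines, not the real results.
    example = {"Wine A": 20, "Wine B": 50, "Wine C": 85}
    print(lo, hi, classify(example, lo, hi))
```

Any wine whose rank sum lands between the two cutoffs is, by this test, part of the tie.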
In what might seem like a brief brush with rationality, Château Mouton Rothschild, the $500+ first-growth Bordeaux, came out on top of the red wine standings, but it still didn’t beat the margin of error. The Mouton might have done well because it was good, but it also might have done well because of its advantageous position in the lineup: most judges tasted it second out of 10, and at first read, there looks to be a significant tasting-order bias in the results. Or, of course, it might have done well by chance alone. We can’t know which it was. Regardless, the fact that a $500 bottle can’t set itself apart definitively, statistically, from one that costs under $20, even with nine wine experts comparing the two, casts us right back out to Neverland.
If this is starting to sound like the frustrating result of a study that simply needs more data to yield useful results, consider this: our setup—nine experienced wine professionals and educators, tasting under highly controlled conditions, with no ad sales departments breathing down their necks, no winery dogs or estate owners dry-humping their legs—looks a lot more like a real scientific experiment than what any of the mainstream wine ratings agencies do when they taste. The real pundits’ panels tend to be much smaller, and they usually get fed a lot more biasing information about the wine before tasting. For instance, they generally know the appellation and vintage (and thus, by inference, the price category). Blind seers.
Karl Storchmann has posted the full results of the Judgment of Princeton tasting here, including the scores of each of the individual judges, who deserve credit for agreeing to have their scores publicly displayed. Academic openness was the whole point of this exercise, and is one of the whole points of the American Association of Wine Economists, which might have emerged as the single most defiantly, proudly weird professional organization in the field of economics—a quality that’s not unrelated to the prodigious amount of wine the group consumes over the course of three days. There’s not much spitting. After all, it is a core function of the organization, as Stephen Colbert once put it so eloquently while ridiculing the AAWE, “to get hammered on Merlot.”
Speaking of getting hammered, Quandt, for reasons I won’t go into, was consistently denied both food and alcohol throughout the process of tabulating the votes (I was denied food, but not alcohol)—although at one point, with the covert support of certain collaborators around the room, I snuck him a leftover half-glass of the Domaine Leflaive Puligny-Montrachet, which the judges had ranked just below the Unionville Pheasant Hill, Heritage Chardonnay, and Silver Decoy “Black Feather” from New Jersey. We both agreed with them. Quandt sipped for a little while with a guilty look on his face, but then the Prohibitionists showed up pretty quickly to seize the glass from the hands of the eminent econometrician.
Did the judges really like the Black Feather more than the Puligny? Do they even have wine preferences in the sense we’ve come to expect? Why do people seem to be more consistent in their dislikes than their likes? Is heterogeneity in taste preferences among experts too great for pooled opinions like these to be meaningful? Is it fair or unfair to judge an alcoholic drink while sober? Why do people convince themselves that they like the same things as each other when they really like different things? On Judgment Day, if there were a range of possible divine creatures who were going to assign an overall heaven/purgatory/hell score, would we be comfortable with the luck of the draw?
Albert Einstein, whose bronze bust (in defiance of his wishes) hangs out on one edge of the campus, complete with relativity equations etched into the base, was a contemporary of Pauli’s, and was on the Princeton faculty when Quandt graduated from the school in 1952. Einstein died here in 1955, a year before Quandt began his tenure as assistant professor. Fifty-six years later, Quandt is still teaching at Princeton, and the FORTRAN code for his excellent tasting-analysis software, it seems, can only be executed from an MS-DOS window on his computer, which is not equipped with a working USB port. I volunteered to record the results of the Judgment of Princeton on the chalkboard as Quandt spoke, and was thus granted the brief thrill of feeling like his teaching assistant for a few minutes.
What is it like to spend almost sixty years on one college campus? The last time I was here, it was 1993, I was 16, and I had never really been drunk. I was visiting colleges. This time, at night, making up for lost time, it turned out that everybody in the bars was from somewhere else, Trenton or Wilmington or the shore. As for the wine economists, they either didn’t want to pay the five-dollar cover charge for live music, or didn’t want to hear live music, or both, so we all ended up crowded into the sports bar, monogamists and polygamists alike, all of us singing.
But on the other side of Nassau Street, it’s so quiet in June. You walk a few feet to the campus gates, and the sound turns off completely except for birds, sometimes rain. You see the odd syllabus shred blowing around through the air, but the children are gone from here. There are only the stone gray Collegiate Gothic Oxford-knockoff buildings that seem way bigger and lower than they do when they’re full of teenagers. When I visited Princeton as a high school wannabe, it was like a church. Einstein walked here. The humid grass expanses had this benign shade of bright green, like stadium turf, and they still do. It’s crazy that something so crotchety is still possible, that these hulking stone cathedrals in the haze are still the trusted authorities on the shape of the universe, on the immensity and heaviness of everything.
Jeffrey Postman
The results of the so-called “Judgment of Princeton” seem outrageous and they probably are. I would like to try some of the New Jersey wines (am I missing a significant new wine area?) but it doesn’t seem possible to get hold of them without visiting the wineries. A friend of mine from New Jersey says they are terrible.
I am not that surprised by the result. I am convinced that the human brain is not structured to be able to make fine distinctions between wines when tasted in a blind setting. Please see my treatise on this which was published as a letter in the Journal of Wine Economics, Vol. 5, No. 1, p. 184.
The best thing to come out of the Princeton tasting is the idea that trying to compare wines in that fashion is farcical. People may finally catch on to the fact that the results of the 1976 Paris tasting prove nothing.
Santosh
Was the “familiarity effect” taken into account with the Mouton? Had any of the tasting experts tasted the Mouton before? As you know, it is really easy to spot a Mouton once you have tasted it previously… “They all say it must be good, so it must be good”… etc…
Toby Bailey
All this seems to be based on an assumption that there is a one-dimensional measure that determines every drinker’s value for a wine. This is daft. Mouton is expensive because enough people like it enough and its quantity is very limited.