The Judgment of Princeton seems to be this: who are we to judge?
Last Friday, nine judges blind-tasted twenty wines, some from France (mostly expensive) and some from New Jersey (mostly cheap), at the author George Taber’s homage to the 1976 California-vs.-France Judgment of Paris. Of those 20, there was just one bottle that we—and by “we,” I mean not me but Richard Quandt, economist and author of the canonical paper “On Wine Bullshit: Some New Software?”—can infer did better than the others for any reason other than chance alone. (Quandt, along with his Princeton colleague and conference host Orley Ashenfelter, set out the methodology for testing this in a 1999 paper called “Judging a Wine Tasting Scientifically,” which reinterprets the Judgment of Paris results.)
Even that single winning wine—the Drouhin Clos des Mouches 2009, a white Burgundy—was ranked dead last (#10 out of 10) by one of the nine judges, and tied for second-to-last by two others. There was one red wine that seemed to have done worse than the others in a meaningful way—the Four JG’s Cabernet Franc 2008, from New Jersey—but as for the rest, it was an 18-way tie (or, to be more accurate, two nine-way ties). If the entire Judgment of Princeton were replayed, from start to finish, on another day, we wouldn’t even have enough evidence to conclude that any one of those 18 wines would be likely to do better or worse than any other.
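For readers who want to see what "better than chance" means in a tasting like this, here is a minimal Python sketch of one simple way to frame the question. It is not necessarily the procedure Ashenfelter and Quandt use. It assumes that each of nine judges ranks ten wines from 1 (best) to 10 (worst) with no ties, and that under the null hypothesis every judge gives a particular wine a rank drawn uniformly at random, independently of the other judges. It then computes the exact distribution of that wine's rank sum and a two-sided p-value for an observed sum. The function names and the example rank sum are hypothetical and are not taken from the Princeton results.

```python
from fractions import Fraction


def rank_sum_null_distribution(n_judges: int, n_wines: int) -> dict[int, Fraction]:
    """Exact distribution of one wine's rank sum if every judge ranks at random.

    Assumption: each judge independently gives the wine a rank drawn uniformly
    from 1..n_wines (ties are ignored in this simplified model).
    """
    dist = {0: Fraction(1)}          # before any judge votes, the sum is 0
    p = Fraction(1, n_wines)         # probability of any single rank
    for _ in range(n_judges):
        nxt: dict[int, Fraction] = {}
        for total, prob in dist.items():
            for rank in range(1, n_wines + 1):
                nxt[total + rank] = nxt.get(total + rank, Fraction(0)) + prob * p
        dist = nxt
    return dist


def two_sided_p_value(observed: int, n_judges: int, n_wines: int) -> float:
    """Probability, under random ranking, of a rank sum at least as far from
    the expected sum as the observed one (in either direction)."""
    dist = rank_sum_null_distribution(n_judges, n_wines)
    mean = Fraction(n_judges * (n_wines + 1), 2)   # e.g. 9 * 5.5 = 49.5
    deviation = abs(observed - mean)
    tail = sum(prob for total, prob in dist.items() if abs(total - mean) >= deviation)
    return float(tail)


if __name__ == "__main__":
    # Hypothetical rank sum for a wine most of nine judges placed near the top.
    print(two_sided_p_value(25, n_judges=9, n_wines=10))
```

Because this sketch tests a single wine chosen in advance, singling out the best of ten after the fact would call for a multiple-comparison correction, and a tasting where judges tie wines would need a different null distribution.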
In what might seem like a brief brush with rationality, Château Mouton Rothschild—the $500+ first-growth Bordeaux—came out on top of the red wine standings, but it still didn’t beat the margin of error. The Mouton might have done well because it was good, but it also might have done well because of its advantageous position in the lineup—most judges tasted it second out of 10, and at first glance, there appears to be a significant tasting-order bias in the results. Or, of course, it might have done well by chance alone. We can’t know which it was. Regardless, the fact that a $500 bottle can’t set itself apart definitively, statistically, from one that costs under $20, even with nine wine experts comparing the two, casts us right back out to Neverland.
If this is starting to sound like the frustrating result of a study that simply needs more data to yield useful results, consider this: our setup—nine experienced wine professionals and educators, tasting under highly controlled conditions, with no ad sales departments breathing down their necks, no winery dogs or estate owners dry-humping their legs—looks a lot more like a real scientific experiment than what any of the mainstream wine ratings agencies do when they taste. The real pundits’ panels tend to be much smaller, and they usually get fed a lot more biasing information about the wine before tasting. For instance, they generally know the appellation and vintage (and thus, by inference, the price category). Blind seers.
Karl Storchmann has posted the full results of the Judgment of Princeton tasting online, including the scores of each of the individual judges, who deserve credit for agreeing to have their scores publicly displayed. Academic openness was the whole point of this exercise, and is one of the whole points of the American Association of Wine Economists, which might have emerged as the single most defiantly, proudly weird professional organization in the field of economics—a quality that’s not unrelated to the prodigious amount of wine the group consumes over the course of three days. There’s not much spitting. After all, it is a core function of the organization, as Stephen Colbert once put it so eloquently while ridiculing the AAWE, “to get hammered on Merlot.” And I love it.
Speaking of getting hammered, Quandt, for reasons I won’t go into, was consistently denied both food and alcohol throughout the process of tabulating the votes (I was denied food, but not alcohol)—although at one point, with the covert support of certain collaborators around the room, I snuck him a leftover half-glass of the Domaine Leflaive Puligny-Montrachet, which the judges had ranked just below the Unionville Pheasant Hill, Heritage Chardonnay, and Silver Decoy “Black Feather” from New Jersey. We both agreed with them. Quandt sipped for a little while, but then the Prohibitionists showed up pretty quickly to seize the glass from the hands of the eminent econometrician, who by this point already had a guilty look on his face.
Did the judges really like the Black Feather more than the Puligny? Do they even have wine preferences in the sense we’ve come to expect? Why do people seem to be more consistent in their dislikes than their likes? Is heterogeneity in taste preferences among experts too great for pooled opinions like these to be meaningful? Is it fair or unfair to judge an alcoholic drink while sober? Why do people convince themselves they like the same things as each other when they really like different things? On Judgment Day, if there were a range of possible divine creatures who were going to assign an overall heaven/purgatory/hell score, would we be comfortable with the luck of the draw? What if we happened to be assigned to a judge who valued power over complexity or vice versa, fruit-forwardness over aging potential or vice versa? If we’re going to play God with wine, it’s not clear to me that we’re even collecting the right information, or asking the right questions. It’s like that great dismissive quote from the quantum physicist Wolfgang Pauli: “Not only are you not right—you’re not even wrong.”
Albert Einstein, whose bronze bust (in defiance of his wishes) hangs out on one edge of the campus, complete with relativity equations etched into the base, was a contemporary of Pauli’s, and was in Princeton, at the Institute for Advanced Study, when Quandt graduated from the university in 1952. Einstein died here in 1955, a year before Quandt began his tenure as assistant professor. More than half a century later, Quandt is still teaching at Princeton, and the FORTRAN code for his excellent tasting-analysis software, it seems, can only be executed from an MS-DOS window on his computer, which is not equipped with a working USB port. I volunteered to record the results of the Judgment of Princeton on the chalkboard as Quandt spoke, and was thus granted the brief thrill of feeling like his teaching assistant for a few minutes.
What is it like to spend more than fifty years on one college campus? The last time I was here, it was 1993, I was 16, and I had never really been drunk. I was visiting colleges. This time, out at night making up for lost time, I found that everybody in the bars was from somewhere else: Trenton or Wilmington or the shore. As for the wine economists, they either didn’t want to pay the five-dollar cover charge for live music, or didn’t want to hear live music, or both, so they ended up crowded into the sports bar next to the tequila polygamists.
But on the other side of Nassau Street, it’s so quiet in June. You walk a few feet to the campus gates, and the sound turns off completely except for birds, sometimes rain. You see the odd syllabus shred blowing around through the air, but the children are gone from here. There are only the stone gray Collegiate Gothic Oxford-knockoff buildings that seem way bigger and lower than they do when they’re full of teenagers. The grass expanses have this humid shade of bright green, like stadium turf. It’s fucking crazy that something so crotchety and benign is still possible, that these hulking stone cathedrals are still a basis for something modern and expected, that the campus can just sit there and gravitate in the haze, haze you with Nobel gargoyles, press you through the universe without ever explaining itself, a warm dark matter, immense and heavy and patient.