I had a long conversation about prediction markets and data-crunching with Justin Wolfers at the Money:Tech conference last week, which continued over the weekend via email. Given that we’re both bloggers, it seems time to bring the debate out into the open. The two main questions: Was Hillary Clinton’s victory in New Hampshire a historic upset? I say: No. And what can we learn about Roger Clemens and doping from looking at his baseball statistics? I say: Very little.
Justin has spent a lot of time looking at political prediction markets, and on the basis of the data, he felt entirely justified in saying that Hillary Clinton’s victory in New Hampshire was extremely improbable.
Election-eve trading had suggested that Sen. Obama had a 92% chance to win in New Hampshire, while Sen. Clinton rated only a 7% chance.
Against this background, it is no exaggeration to term the result truly historic. Not that there haven’t been more dramatic upsets or come-from-behind wins that carried more significance — this was just an early primary, albeit a pivotal one. But in terms of unpredictability, or at least the failure of everyone to predict it, it may have no modern match.
My point, which I made at the time, was that it’s a bit shortsighted to glom onto that single 7% datapoint, without looking at the history of the contract. Clinton was nearly always the favorite in New Hampshire; she only dropped down to underdog status in the crazy final days after Barack Obama won the Iowa caucus.
So my view is that Clinton’s victory was not nearly as improbable as you might think from simply looking at the 7% datapoint. Last-minute polls are unreliable, and last-minute prediction-market trades tend to reflect last-minute polls. So it’s not really true to say that everybody failed to predict the Clinton victory. Rather, lots of people predicted the Clinton victory; they just didn’t do so during the craziness of the final days.
But is this just hindsight talking? Did anybody at the time discount the usefulness of last-minute polls and dismiss them as so much noise? Actually, yes. One such person is Janet Elder, the editor in charge of polling at the NYT. Not only does the NYT not conduct its own polls immediately before a primary, but it also tends to ignore others’ polls, as Clark Hoyt reports:
The decision looked great before the primary in New Hampshire, where polls suggested that Obama was about to deal Clinton’s campaign a potentially fatal blow. But Clinton won there, and embarrassed news media were left to dissect why the polls were wrong…
Elder said she had no second thoughts about forgoing a poll before last week’s vote. “We don’t want to appear to be projecting results,” she said. “We’ve learned that opinion is in such flux in the last days before people vote that we tend to stand back.”
Justin, by contrast, takes the view that "in a moderately efficient market today’s market price consolidates not only today’s wisdom, but also the wisdom of those who traded in the past." I think he’s wrong about that, and that New Hampshire actually proves him wrong. Which do you think is more likely: that Hillary Clinton scored a spectacularly unlikely upset in New Hampshire, or that in the craziness of the post-Iowa polling, markets overshot and started generating some garbage numbers?
I should also mention that Justin had a piece in the sports section of the NYT on Sunday, crunching the numbers of Roger Clemens. I’m not sure how useful this exercise is, since as the piece admits, number-crunching can prove neither innocence nor guilt when it comes to doping. It’s a point made in the same day’s paper by Alan Schwarz:
Mr. Souder and others have called for using baseball statistics as a deviance divining rod, but experts in baseball statistics are far more hesitant. They caution that even more important than knowing what numbers say is knowing what they do not. Perhaps less scientifically, the former major leaguer Toby Harrah compared statistics to bikinis: “They both show a lot,” Mr. Harrah said, “but not everything.”…
Most experts agree that a player’s use of so-called performance-enhancing drugs must in some way affect his numbers and therefore baseball’s entire statistical ecosystem — but they also argue that using them forensically is futile.
Schwarz points out that there are non-doping players who saw freakish late-career surges, as well as lots of alleged dopers whose performance actually declined. Justin’s right to say that the Roger Clemens report can’t prove Clemens’s innocence. But that’s a statement you can make without crunching any numbers and without even reading the report. In reality, statistics are much less useful when it comes to the doping scandal than either Clemens or Wolfers might have you believe.