Yesterday at the Australian Open, Ana Ivanovic defeated Serena Williams, despite having failed to take a set in four previous meetings. Later in the day, Tomas Berdych beat Kevin Anderson for the tenth straight time.
Commentators and bettors love head-to-head records. You’ll often hear people say, “tennis is a game of matchups,” which, I suppose, is hardly disprovable.
But how much do head-to-head records really mean? If Player A has a better record than Player B but Player B has won the majority of their career meetings, who do you pick? To what extent does head-to-head record trump everything (or anything) else?
It’s important to remember that, most of the time, head-to-head records don’t clash with any other measurement of relative skill. On the ATP tour, head-to-head record agrees with relative ranking 69% of the time–that is, the player who is leading the H2H is also the one with the higher ranking. When a pair of players have faced each other five or more times, H2H agrees with relative ranking 75% of the time.
Usually, then, the head-to-head record is right. It’s less clear whether it adds anything to our understanding. Sure, Rafael Nadal owns Stanislas Wawrinka, but would we expect anything much different from the matchup of a dominant number one and a steady-but-unspectacular number eight?
H2H against the rankings
If head-to-head records have much value, we’d expect them–at least for some subset of matches–to outperform the ATP rankings. That’s a pretty low bar–the official rankings are riddled with limitations that keep them from being very predictive.
To see if H2Hs met that standard, I looked at ATP tour-level matches since 1996. For each match, I recorded whether the winner was ranked higher than his opponent and what his head-to-head record was against that opponent. (I didn’t consider matches outside of the ATP tour in calculating head-to-heads.)
Thus, for each head-to-head record (for instance, five wins in eight career meetings), we can determine how many the H2H-favored player won, how many the higher-ranked player won, and so on.
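A rough sketch of that tally in Python might look like the following. The input format (a chronological list of `(winner, loser, winner_rank, loser_rank)` tuples) and the function name `tally_h2h_vs_rank` are assumptions made for illustration; this is not the code behind the numbers in this post.

```python
from collections import defaultdict


def tally_h2h_vs_rank(matches):
    """Group matches by the pre-match head-to-head record and count hits.

    `matches` is a hypothetical input: tuples of
    (winner, loser, winner_rank, loser_rank) in chronological order.
    """
    prior = defaultdict(int)  # prior[(a, b)] = wins by a over b so far
    tallies = defaultdict(
        lambda: {"n": 0, "h2h": 0, "rank": 0, "split": 0, "h2h_split": 0}
    )
    for winner, loser, w_rank, l_rank in matches:
        w_wins = prior[(winner, loser)]
        l_wins = prior[(loser, winner)]
        # Only matches with a clear H2H leader and a clear ranking favorite.
        if w_wins != l_wins and w_rank != l_rank:
            record = (max(w_wins, l_wins), min(w_wins, l_wins))
            h2h_right = w_wins > l_wins   # H2H leader won this match
            rank_right = w_rank < l_rank  # lower number = higher ranking
            t = tallies[record]
            t["n"] += 1
            t["h2h"] += h2h_right
            t["rank"] += rank_right
            if h2h_right != rank_right:   # the two indicators disagreed
                t["split"] += 1
                t["h2h_split"] += h2h_right
        # Update the head-to-head only after the match has been scored.
        prior[(winner, loser)] += 1
    return dict(tallies)
```

Each key is a record such as `(4, 1)`; `h2h / n` and `rank / n` give the two hit rates, and `h2h_split / split` gives the H2H hit rate when the indicators disagree.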
For instance, I found 1,040 matches in which one of the players had beaten his opponent in exactly four of their previous five meetings. 65.0% of those matches went the way of the player favored by the head-to-head record, while 68.8% went to the higher-ranked player. (54.5% of the matches fell in both categories.)
Things get more interesting in the 258 matches in which the two metrics did not agree. When the player with the 4-1 record was lower in the rankings, he won only 109 (42.2%) of those matchups. In other words, at least in this group of matches, you’d be better off going with ATP rankings than with head-to-head results.
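The arithmetic behind that 4-1 group can be rebuilt from the percentages quoted above. This is only a back-of-the-envelope check: the counts are reconstructed by rounding the published percentages, so treat them as approximate.

```python
total = 1040  # matches where one player led the head-to-head 4-1

# Counts reconstructed from the quoted percentages (rounded, so approximate).
both = round(total * 54.5 / 100)       # H2H leader won AND was higher ranked
h2h_wins = round(total * 65.0 / 100)   # matches won by the H2H leader
rank_wins = round(total * 68.8 / 100)  # matches won by the higher-ranked player

h2h_only = h2h_wins - both    # H2H right, ranking wrong
rank_only = rank_wins - both  # ranking right, H2H wrong
disagree = h2h_only + rank_only

print(disagree)                             # 258
print(h2h_only, rank_only)                  # 109 149
print(round(100 * h2h_only / disagree, 1))  # 42.2
```

The reconstructed figures line up with the 258 disagreements and the 109 wins for the lower-ranked H2H leader, which leaves roughly 149 of those matches going to the higher-ranked player.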
Broader view, similar conclusions
For almost every head-to-head record, the findings are the same. There were 26 head-to-head records–everything from 1-0 to 7-3–for which we have at least 100 matches’ worth of results, and in 20 of them, the player with the higher ranking did better than the player with the better head-to-head. In 19 of the 26 groups, when the ranking disagreed with the head-to-head, ranking was a more accurate predictor of the outcome.
If we tally the results for head-to-heads with at least five meetings, we get an overall picture of how these two approaches perform. 68.5% of the time, the player with the higher ranking wins, while 66.0% of the time, the match goes to the man who leads in the head-to-head. When the head-to-head and the relative ranking don’t match, ranking proves to be the better indicator 56.5% of the time.
The most extreme head-to-heads (that is, undefeated pairings such as 7-0, 8-0, and so on) are the only groups in which H2H consistently tells us more than ATP ranking does. 80% of the time, these matches go to the higher-ranked player, while 81.9% of the time, the undefeated man prevails. In the 78 matches for which H2H and ranking don’t agree, H2H is a better predictor exactly two-thirds of the time.
Explanations against intuition
When you weigh a head-to-head record more heavily than a pair of ATP rankings, you’re relying on a very small sample instead of a very big one. Yes, that small sample may be much better targeted, but it is also very small.
Not only is the sample small, often it is not as applicable as you might think. When Roger Federer defeated Lleyton Hewitt in the fourth round of the 2004 Australian Open, he had beaten the Aussie only twice in nine career meetings. Yet at that point in their careers, the 22-year-old, #2-ranked Fed was clearly in the ascendancy while Hewitt was having difficulty keeping up. Even though most of their prior meetings had been on the same surface and Hewitt had won the three most recent encounters, that small subset of Roger’s performances did not account for his steady improvement.
The most recent Fed-Hewitt meeting is another good illustration. Entering the Brisbane final, Roger had won 15 of their previous 16 matches, but while Hewitt has maintained a middle-of-the-pack level for the last several years, Federer has declined. Although the two had played 26 times before the Brisbane final, none of those contests had come in the previous two years.
Whether it’s surface, recency, injury, weather conditions, or any one of dozens of other factors, head-to-heads are riddled with external factors. That’s the problem with any small sample size–the noise is much more likely to overwhelm the signal. If noise can win out in the extensive Fed-Hewitt head-to-head, most one-on-one records don’t stand a chance.
Any set of rankings, whether the ATP’s points system or my somewhat more sophisticated (and more predictive) jrank algorithm, takes into account every match both players have been involved in for a fairly long stretch of time. In most cases, having all that perspective on both players’ current levels is much more valuable than a noise-ridden handful of matches. If head-to-heads can’t beat ATP rankings, they would look even worse against a better algorithm.
Some players surely do have an edge on particular opponents or types of opponents, whether it’s Andy Murray with lefties or David Ferrer with Nicolas Almagro. But most of the time, those edges are reflected in the rankings–even if the rankings don’t explicitly set out to incorporate such things.
Next time Kevin Anderson draws Berdych, he should take heart. His odds of beating the Czech aren’t that much different from those of any other man ranked around #20 facing someone in the bottom half of the top ten. Even accounting for the slight effect I’ve observed in undefeated head-to-heads, a lopsided one-on-one record isn’t fate.