Category Archives: Forecasting

The Limited Value of Head-to-Head Records

Yesterday at the Australian Open, Ana Ivanovic defeated Serena Williams, despite having failed to take a set in four previous meetings. Later in the day, Tomas Berdych beat Kevin Anderson for the tenth straight time.

Commentators and bettors love head-to-head records. You’ll often hear people say, “tennis is a game of matchups,” which, I suppose, is hardly disprovable.

But how much do head-to-head records really mean?  If Player A has a better record than Player B but Player B has won the majority of their career meetings, who do you pick? To what extent does head-to-head record trump everything (or anything) else?

It’s important to remember that, most of the time, head-to-head records don’t clash with any other measurement of relative skill. On the ATP tour, head-to-head record agrees with relative ranking 69% of the time; that is, the player who leads the H2H is also the higher-ranked one. When a pair of players have faced each other five or more times, H2H agrees with relative ranking 75% of the time.

Usually, then, the head-to-head record is right. It’s less clear whether it adds anything to our understanding. Sure, Rafael Nadal owns Stanislas Wawrinka, but would we expect anything much different from the matchup of a dominant number one and a steady-but-unspectacular number eight?

H2H against the rankings

If head-to-head records have much value, we’d expect them–at least for some subset of matches–to outperform the ATP rankings. That’s a pretty low bar–the official rankings are riddled with limitations that keep them from being very predictive.

To see if H2Hs met that standard, I looked at ATP tour-level matches since 1996. For each match, I recorded whether the winner was ranked higher than his opponent and what his head-to-head record was against that opponent. (I didn’t consider matches outside of the ATP tour in calculating head-to-heads.)

Thus, for each head-to-head record (for instance, five wins in eight career meetings), we can determine how many the H2H-favored player won, how many the higher-ranked player won, and so on.
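The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind these numbers: the `Match` record, its field names, and the sample matches are hypothetical, and rankings are assumed to be plain integers where a lower number means a better ranking.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Match:
    """One tour-level match (hypothetical record layout)."""
    winner_rank: int        # ATP ranking number at the time; lower is better
    loser_rank: int
    winner_prior_wins: int  # winner's wins in earlier tour-level meetings
    loser_prior_wins: int   # loser's wins in earlier tour-level meetings


def tally_by_h2h(matches):
    """Group matches by prior H2H record (leader-trailer) and count who won."""
    groups = defaultdict(lambda: {"n": 0, "h2h_fav": 0, "rank_fav": 0, "both": 0})
    for m in matches:
        if m.winner_prior_wins == m.loser_prior_wins:
            continue  # level H2H (including first meetings): no H2H favorite
        lead = max(m.winner_prior_wins, m.loser_prior_wins)
        trail = min(m.winner_prior_wins, m.loser_prior_wins)
        g = groups[f"{lead}-{trail}"]
        h2h_fav_won = m.winner_prior_wins > m.loser_prior_wins
        rank_fav_won = m.winner_rank < m.loser_rank
        g["n"] += 1
        g["h2h_fav"] += int(h2h_fav_won)
        g["rank_fav"] += int(rank_fav_won)
        g["both"] += int(h2h_fav_won and rank_fav_won)
    return dict(groups)


# Hypothetical sample: three matches between players with a 4-1 history,
# plus a first meeting that is skipped.
sample = [
    Match(5, 20, 4, 1),   # H2H leader and higher-ranked player both won
    Match(30, 8, 4, 1),   # H2H leader won despite the lower ranking
    Match(12, 3, 1, 4),   # trailing, lower-ranked player won
    Match(1, 2, 0, 0),    # no previous meetings
]
print(tally_by_h2h(sample))
# {'4-1': {'n': 3, 'h2h_fav': 2, 'rank_fav': 1, 'both': 1}}
```

Each group's counts turn directly into the kind of percentages quoted below: `h2h_fav / n`, `rank_fav / n`, and `both / n`.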

For instance, I found 1,040 matches in which one of the players had beaten his opponent in exactly four of their previous five meetings.  65.0% of those matches went the way of the player favored by the head-to-head record, while 68.8% went to the higher-ranked player. (54.5% of the matches fell in both categories.)

Things get more interesting in the 258 matches in which the two metrics did not agree.  When the player with the 4-1 record was lower in the rankings, he won only 109 (42.2%) of those matchups. In other words, at least in this group of matches, you’d be better off going with ATP rankings than with head-to-head results.
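The 4-1 figures hang together, and the arithmetic is worth spelling out. When ranking and H2H agree, the player they both favor either wins or loses, so the disagreements are exactly the matches won by only one of the two favorites. A quick check in Python, using only the percentages quoted above:

```python
n = 1040        # matches with a 4-1 prior head-to-head
h2h = 0.650     # share won by the H2H leader
rank = 0.688    # share won by the higher-ranked player
both = 0.545    # share won by a player who was both

h2h_only = round(n * (h2h - both))    # H2H leader won, but was ranked lower
rank_only = round(n * (rank - both))  # higher-ranked player won, but trailed in H2H
disagreements = h2h_only + rank_only

print(disagreements, h2h_only, f"{h2h_only / disagreements:.1%}")
# 258 109 42.2%
```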

Broader view, similar conclusions

For almost every head-to-head record, the findings are the same. There were 26 head-to-head records–everything from 1-0 to 7-3–for which we have at least 100 matches worth of results, and in 20 of them, the player with the higher ranking did better than the player with the better head-to-head.  In 19 of the 26 groups, when the ranking disagreed with the head-to-head, ranking was a more accurate predictor of the outcome.

If we tally the results for head-to-heads with at least five meetings, we get an overall picture of how these two approaches perform. 68.5% of the time, the player with the higher ranking wins, while 66.0% of the time, the match goes to the man who leads in the head-to-head. When the head-to-head and the relative ranking don’t match, ranking proves to be the better indicator 56.5% of the time.
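Pooling works the same way: add up the per-record counts for every H2H with enough meetings, then compare. In the minimal sketch below, the "4-1" counts are rounded back-calculations from the percentages quoted earlier (so they may be off by one), and the "1-0" counts are purely hypothetical, included only to show that short histories are filtered out. The dictionary layout is an assumption for illustration.

```python
def pooled_accuracy(groups, min_meetings=5):
    """Pool per-record tallies for H2Hs with at least `min_meetings` meetings.

    `groups` maps a record such as "4-1" to counts: n (matches), h2h_fav (won
    by the H2H leader), rank_fav (won by the higher-ranked player), and both
    (won by a player who was both). Returns the ranking's hit rate, the H2H
    hit rate, and the ranking's share of the disagreements.
    """
    n = h2h = rank = both = 0
    for record, g in groups.items():
        lead, trail = (int(x) for x in record.split("-"))
        if lead + trail < min_meetings:
            continue
        n += g["n"]
        h2h += g["h2h_fav"]
        rank += g["rank_fav"]
        both += g["both"]
    rank_only = rank - both  # disagreements won by the higher-ranked player
    h2h_only = h2h - both    # disagreements won by the H2H leader
    return rank / n, h2h / n, rank_only / (rank_only + h2h_only)


groups = {
    # Back-calculated from the rounded 4-1 percentages (approximate).
    "4-1": {"n": 1040, "h2h_fav": 676, "rank_fav": 716, "both": 567},
    # Hypothetical counts; excluded because 1-0 is only one meeting.
    "1-0": {"n": 500, "h2h_fav": 300, "rank_fav": 350, "both": 250},
}
rank_acc, h2h_acc, rank_share = pooled_accuracy(groups)
print(f"{rank_acc:.1%} {h2h_acc:.1%} {rank_share:.1%}")
# 68.8% 65.0% 57.8%
```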

The most extreme head-to-heads (that is, undefeated pairings such as 7-0, 8-0, and so on) are the only groups in which H2H consistently tells us more than ATP ranking does.  80% of the time, these matches go to the higher-ranked player, while 81.9% of the time, the undefeated man prevails. In the 78 matches for which H2H and ranking don’t agree, H2H is the better predictor exactly two-thirds of the time.

Explanations against intuition

When you weigh a head-to-head record more heavily than a pair of ATP rankings, you’re relying on a very small sample instead of a very big one. Yes, that small sample may be much better targeted, but it is also very small.

Not only is the sample small, often it is not as applicable as you might think. When Roger Federer defeated Lleyton Hewitt in the fourth round of the 2004 Australian Open, he had beaten the Aussie only twice in nine career meetings. Yet at that point in their careers, the 22-year-old, #2-ranked Fed was clearly in the ascendancy while Hewitt was having difficulty keeping up. Even though most of their prior meetings had been on the same surface and Hewitt had won the three most recent encounters, that small subset of Roger’s performances did not account for his steady improvement.

The most recent Fed-Hewitt meeting is another good illustration. Entering the Brisbane final, Roger had won 15 of their previous 16 matches, but while Hewitt has maintained a middle-of-the-pack level for the last several years, Federer has declined. And although the two had played 26 times before the Brisbane final, none of those contests had come in the last two years.

Whether it’s surface, recency, injury, weather conditions, or any one of dozens of other factors, head-to-heads are riddled with external factors. That’s the problem with any small sample size–the noise is much more likely to overwhelm the signal. If noise can win out in the extensive Fed-Hewitt head-to-head, most one-on-one records don’t stand a chance.

Any set of rankings, whether the ATP’s points system or my somewhat more sophisticated (and more predictive) jrank algorithm, takes into account every match both players have been involved in for a fairly long stretch of time. In most cases, having all that perspective on both players’ current levels is much more valuable than a noise-ridden handful of matches. If head-to-heads can’t beat ATP rankings, they would look even worse against a better algorithm.

Some players surely do have an edge on particular opponents or types of opponents, whether it’s Andy Murray with lefties or David Ferrer with Nicolas Almagro. But most of the time, those edges are reflected in the rankings–even if the rankings don’t explicitly set out to incorporate such things.

Next time Kevin Anderson draws Berdych, he should take heart. His odds of beating the Czech aren’t much different from those of any other man ranked around #20 against someone in the bottom half of the top ten. Even accounting for the slight effect I’ve observed in undefeated head-to-heads, a lopsided one-on-one record isn’t fate.


Filed under Forecasting, Head-to-Heads, Research

Winners and Losers in the 2014 Australian Open Men’s Draw

Every draw carries with it plenty of luck, but even by Grand Slam standards, this year’s Australian Open men’s singles draw seems a bit lopsided.  The top half makes possible a Rafael Nadal-Roger Federer semifinal, at least if Federer gets past Andy Murray and Nadal beats the likes of Bernard Tomic.

While Novak Djokovic is seeded below Nadal, he gets the benefit of a projected semifinal matchup with David Ferrer.  A more substantial challenge may arise one round earlier, as a possible quarterfinal opponent is Stanislas Wawrinka, who took Djokovic to a fifth set twice in the last four Grand Slams.

As I’ve done in the past, let’s quantify each player’s draw luck.  Using my forecast, combined with a forecast generated by randomizing the bracket, we can see who were the biggest winners and losers in yesterday’s draw ceremony.
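The comparison can be sketched in Python. This is not the author's forecast: the Elo-style `win_prob`, the ratings, and the point values are hypothetical stand-ins. For a fixed bracket, each player's chance of surviving each round can be computed exactly, since two players from sibling sub-brackets reach their meeting independently. The random baseline averages that calculation over shuffled brackets; shuffling every slot ignores seeding rules, so a closer model would shuffle only within the seeding constraints.

```python
import random


def win_prob(r_a, r_b):
    """Hypothetical Elo-style chance that a player rated r_a beats one rated r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def expected_points(bracket, ratings, points):
    """Exact expected ranking points for a single-elimination bracket.

    bracket: player names in draw order (length a power of two).
    points[k]: points for a player who wins exactly k matches.
    """
    n = len(bracket)
    rounds = n.bit_length() - 1
    alive = [1.0] * n                  # P(slot i has survived so far)
    reach = [[1.0] for _ in range(n)]  # reach[i][k] = P(slot i wins >= k matches)
    for r in range(rounds):
        block = 2 ** r                 # size of each already-decided sub-bracket
        new_alive = []
        for i in range(n):
            start = (i // (2 * block)) * 2 * block
            # Possible opponents come from the sibling sub-bracket.
            if (i // block) % 2 == 0:
                opponents = range(start + block, start + 2 * block)
            else:
                opponents = range(start, start + block)
            beat = sum(alive[j] * win_prob(ratings[bracket[i]], ratings[bracket[j]])
                       for j in opponents)
            new_alive.append(alive[i] * beat)
        alive = new_alive
        for i in range(n):
            reach[i].append(alive[i])
    result = {}
    for i, name in enumerate(bracket):
        probs = reach[i] + [0.0]
        result[name] = sum((probs[k] - probs[k + 1]) * points[k]
                           for k in range(rounds + 1))
    return result


def random_draw_points(players, ratings, points, trials=2000, seed=0):
    """Average expected points over fully shuffled brackets."""
    rng = random.Random(seed)
    totals = dict.fromkeys(players, 0.0)
    for _ in range(trials):
        draw = players[:]
        rng.shuffle(draw)
        for name, pts in expected_points(draw, ratings, points).items():
            totals[name] += pts
    return {name: total / trials for name, total in totals.items()}


def draw_luck(players, ratings, points):
    """Percentage change from random-draw to actual-draw expected points."""
    actual = expected_points(players, ratings, points)
    baseline = random_draw_points(players, ratings, points)
    return {p: (actual[p] - baseline[p]) / baseline[p] for p in players}


# Hypothetical four-man draw; points for 0, 1, or 2 match wins.
ratings = {"A": 2000, "B": 1600, "C": 1900, "D": 1850}
print(draw_luck(["A", "B", "C", "D"], ratings, [10, 45, 90]))
```

In this toy draw, A benefits from opening against the weakest player and B suffers for the same reason, which mirrors the Nadal-Tomic situation described below.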

The algorithmic approach is most useful in confirming our suspicions about the draw luck of the top players.  Djokovic and Ferrer, the top seeds in the bottom half, definitely came out ahead.  While Djokovic had a respectable 28.0% chance of winning the tournament in the randomized projection, he has a 33.7% chance given the way the draw turned out.  In terms of expected ranking points, the draw gave him a 10.7% boost, from an expectation of 747 points to one of 827 points.  In percentage terms, Ferrer’s expectation jumped even more, from 312 to 368 (18.0%).

Nadal, however, had the worst draw luck of the top ten seeds.  Before the bracket was arranged, he had a 30.7% chance of winning the title, with an expectation of 763 ranking points.  Once the draw was set, his title chances fell to 24.9% and his point expectation dropped to 662.  No one else in the top ten lost more than 7% of their expected ranking points on draw day; Nadal lost 13%.

It doesn’t take an algorithm, though, to identify the draw’s worst losers.  They’re placed where you’ll always find them: right next to the top two seeds.  In the randomized projection, Tomic had a 58% chance of winning his first-round match and a 27% chance of reaching the third round.  In reality, though, he’ll play Nadal first.  His slight chance of earning a place in the second round gives him an expectation of 29 ranking points (10 of which he earns simply by showing up).  In the random projection, his ranking point expectation was 75.

Lukas Lacko, the unlucky man who will play Djokovic in the first round, didn’t suffer quite so much, if only because his expectations weren’t as high in the first place.  Before the draw, he could expect 48 ranking points and a 15% chance of reaching the third round.  Now, his projection is a mere 24 ranking points, one of the worst in the entire draw.

The luckiest players are always those who had little chance of progressing far in the draw, but managed to draw someone equally inept.  At the Australian Open, the four luckiest guys have yet to be identified: all are qualifiers.  The luckiest man of all will be the one who is placed in the topmost qualifying spot, opposite Lucas Pouille.  At this stage, my rating system doesn’t think much of the Frenchman, so it is likely that the qualifier will be the heavy favorite entering that match.

In the randomized projection, each qualifier has a 29% chance of winning his first match and a 6% chance of winning his second, for a weighted average of 32 ranking points.  The man who plays Pouille, however, will enter the field with an expectation of 55 ranking points.  Other qualifiers with nearly the same happy outcome will be those who draw Federico Delbonis, Julian Reister, and Jan Hajek in the opening round.

Here are the pre-draw and post-draw expected ranking points of the men’s seeds, along with the percentage of pre-draw points they gained or lost:

Player                 Seed  Pre  Post  Change  
Rafael Nadal           1     763   662  -13.2%  
Novak Djokovic         2     747   827   10.7%  
David Ferrer           3     312   368   18.0%  
Andy Murray            4     473   488    3.1%  
Juan Martin Del Potro  5     421   393   -6.6%  
Roger Federer          6     411   397   -3.4%  
Tomas Berdych          7     264   317   20.2%  
Stanislas Wawrinka     8     290   279   -3.9%  

Player                 Seed  Pre  Post  Change
Richard Gasquet        9     186   186    0.1%  
Jo Wilfried Tsonga     10    151   187   23.8%  
Milos Raonic           11    223   234    5.0%  
Tommy Haas             12    207   222    7.5%  
John Isner             13    176   196   11.2%  
Mikhail Youzhny        14    190   193    1.5%  
Fabio Fognini          15    101    81  -19.3%  
Kei Nishikori          16    172   135  -21.6%  

Player                 Seed  Pre  Post  Change
Tommy Robredo          17     71    61  -13.4%  
Gilles Simon           18    116    95  -18.3%  
Kevin Anderson         19     80   107   33.9%  
Jerzy Janowicz         20     99   154   55.3%  
Philipp Kohlschreiber  21    125   132    6.2%  
Grigor Dimitrov        22    136   122  -10.1%  
Ernests Gulbis         23    125   107  -14.1%  
Andreas Seppi          24     94    49  -47.8%  

Player                 Seed  Pre  Post  Change
Gael Monfils           25    147   101  -31.4%  
Feliciano Lopez        26    100    80  -20.7%  
Benoit Paire           27     94    89   -5.5%  
Vasek Pospisil         28     82    81   -0.9%  
Jeremy Chardy          29    111   126   13.7%  
Dmitry Tursunov        30    101    80  -21.0%  
Fernando Verdasco      31    106   105   -0.8%  
Ivan Dodig             32    104   106    1.8%


Filed under Australian Open, Forecasting

Challenger Tour Finals Forecast

I wrote an extensive preview of this week’s Challenger Tour Finals for The Changeover, so you should check that out first.  (Also worth a read is the preview at Foot Soldiers of Tennis.)

Because so much less separates players at this level (compared to those at last year’s World Tour Finals), my forecast stops just short of throwing its hands up in dismay.  Coming into the event, Italian clay specialist Filippo Volandri was the favorite, with a 15.5% chance of winning the event.  He lost today to Alejandro Gonzalez, making it much less likely that he’ll progress out of the round-robin stage.

Today’s other winners were top seed Teymuraz Gabashvili, Oleksandr Nedovyesov, and Jesse Huta Galung.  My numbers now consider Huta Galung the favorite, with a better than 20% chance of winning the title.  The situation in Grupo Verde will become much clearer after tomorrow’s night match between Gabashvili and Nedovyesov.

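Here is the pre-tournament forecast, showing each player’s chances of finishing the round robin 3-0, 2-1, 1-2, or 0-3, along with reaching the semifinals (SF), reaching the final (F), and winning the title (W):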

Player       3-0  2-1  1-2  0-3     SF      F      W  
Gabashvili   12%  38%  37%  13%  49.8%  24.3%  12.0%  
Volandri     15%  40%  35%  10%  55.3%  29.3%  15.5%  
Nedovyesov   14%  39%  36%  11%  53.0%  26.9%  13.7%  
Huta Galung  14%  39%  36%  11%  53.8%  28.2%  14.6%  
Gonzalez     10%  35%  40%  15%  45.0%  21.8%  10.4%  
Ungur        10%  35%  40%  15%  45.0%  20.9%   9.8%  
Martin       11%  36%  39%  14%  46.0%  22.4%  10.7%  
Clezar       13%  38%  37%  11%  52.2%  26.3%  13.3%

And here is the forecast updated with the results of today’s four matches:

Player       3-0  2-1  1-2  0-3     SF      F      W  
Gabashvili   24%  50%  26%   0%  71.5%  35.0%  17.1%  
Volandri      0%  27%  50%  23%  30.2%  16.2%   8.6%  
Nedovyesov   28%  50%  22%   0%  75.7%  38.3%  19.5%  
Huta Galung  27%  50%  23%   0%  74.7%  39.0%  20.5%  
Gonzalez     23%  50%  27%   0%  70.1%  33.7%  15.8%  
Ungur         0%  22%  50%  29%  23.1%  10.8%   5.1%  
Martin        0%  23%  50%  27%  25.1%  12.2%   5.9%  
Clezar        0%  27%  50%  23%  29.6%  14.9%   7.4%

(My algorithm doesn’t implement the details of the number-of-sets-won tiebreaker, so Guilherme Clezar, the only loser today to win a set, probably has a slightly better chance of advancing than these numbers give him credit for.)
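The win-total columns follow from each player's individual match probabilities. A minimal sketch, assuming the round-robin results are independent and each has a fixed win probability (the probabilities in the example are hypothetical, not taken from the forecast): the distribution of total wins is built up one match at a time. A player who has already won a match can no longer finish 0-3, which is why the updated table shows 0% in that column for every one of today's winners.

```python
def win_total_distribution(match_probs, already_won=0):
    """Distribution of round-robin wins, assuming independent matches.

    match_probs: win probability for each remaining match.
    already_won: matches already won.
    Returns {total wins: probability}.
    """
    dist = [1.0]  # dist[k] = P(k wins among the matches processed so far)
    for p in match_probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # lose this match
            new[k + 1] += q * p    # win this match
        dist = new
    return {already_won + k: q for k, q in enumerate(dist)}


# Hypothetical: a player who won his opener and is a coin flip in both remaining matches.
print(win_total_distribution([0.5, 0.5], already_won=1))
# {1: 0.25, 2: 0.5, 3: 0.25}
```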

Challenger charting: The most interesting match of the day–if not the cleanest–was the last one, between Nedovyesov and Clezar.  I charted it, so you can check out detailed serve, return, and shot-by-shot stats for that contest.

And if you’re really into this stuff–Challengers and/or charting–here are my stat reports from yesterday’s first-round matches in Champaign between Ram and Giron and Sandgren and Peliwo.


Filed under Challengers, Elsewhere, Forecasting

Rafael Nadal, Top Twosomes, and the Future

The only match that either Rafael Nadal or Novak Djokovic lost in London was the final, when Nadal fell to Djokovic.  It was a good summary of the season as a whole.  The top two weren’t undefeated for the entire season, but they might as well have been.

Between them, Rafa and Novak lost only 16 matches this year, six of them to each other.  Fittingly, they split those six matches.  No single player poses a serious threat to their dominance.  Only Juan Martin del Potro defeated both this year, and he lost his five other encounters with the top-ranked duo.  The injured Andy Murray remains only a wildcard, having split Grand Slam finals with Djokovic this year but without having played Nadal since 2011.

Barring a huge upset loss in Davis Cup, Djokovic will end the season with the best-ever winning percentage for a #2-ranked player.  His 88.9% just edges out the 88.7% posted by Nadal in 2005, when he finished second to Roger Federer.  In the last thirty years, only five other #2’s won at least 85% of their matches.

Taking these six prior pairs as the best single-year twosomes the ATP has recently produced, it’s surprising to see what happened to them the following year.  In three of those cases, neither member of the dominant duo finished the next season at #1.  A third player overtook them both.

Here is the list of the seven most dominant twosomes of the last thirty years, along with their year-end rankings 12 months after the end of their notable seasons (Nx):

Yr  #1              W-L    Nx  #2              W-L    Nx  
83  John McEnroe    62-9    1  Mats Wilander   74-11   4  
85  Ivan Lendl      83-7    1  John McEnroe    72-10  14  
87  Ivan Lendl      70-7    2  Stefan Edberg   76-12   5  
89  Ivan Lendl      80-7    3  Boris Becker    58-8    2  
05  Roger Federer   81-4    1  Rafael Nadal    79-10   2  
12  Novak Djokovic  75-12   2  Roger Federer   74-13   6  
13  Rafael Nadal    76-7    ?  Novak Djokovic  72-9    ?

In 1988, Mats Wilander overcame both Ivan Lendl and Stefan Edberg to claim the #1 position.  In 1990, it was Edberg who leapfrogged Lendl and Boris Becker.  This year, of course, Nadal reclaimed the top spot from last year’s top two of Djokovic and Federer.

Those of us who watched the Tour Finals for the last week might find it hard to imagine that anyone–certainly not any of the other six men in London–would outperform either Rafa or Novak over the course of a season.  But injuries strike, slumps take hold, and–unlikely as it may seem in 2013–young players emerge and dominate. For all of the radical changes in the game since the late 80s, these precedents serve as an important reminder of the unpredictability of tennis.


Filed under Forecasting, Rankings

Round Robin Shutouts

At this year’s World Tour Finals, we were spared the knottiest sort of round robin tiebreakers.  Each group had a clear winner (Rafael Nadal and Novak Djokovic) who went undefeated, along with another player (David Ferrer and Richard Gasquet) who failed to win a single match.

Since 1987, 33 players have recorded a 3-0 record in Tour Finals round-robin play.  This year is the first time since 2010 (Nadal and Roger Federer) that two players have done so, and before that, we have to go back to 2005 (Federer and Nikolay Davydenko).  It’s not that rare an event, though: this year is the 11th time since 1987 that two players have beaten every opponent in their group.

Undefeated players are hardly guaranteed further advances, however.  Those 33 undefeated competitors have a mere 17-16 record in the semifinals, and the 17 men who reached the final won the title only nine times, against nine final-round losses.  (Twice, two undefeated players faced off in the finals–the aforementioned 2010 event along with 1993, when Michael Stich and Pete Sampras contested the title.)

The tiny sample of three round-robin matches pales in predictive value next to the old standby of ATP ranking.  In the last 26 years, the higher-ranked player has won 16 finals.  In the more top-heavy 21st century, the title has gone to the man with the superior ranking 11 of 13 times.  (Advantage: Nadal.)

That said, the gap between the two finalists is traditionally greater than it is expected to be tomorrow.  (If Stanislas Wawrinka upsets Novak Djokovic in the second semifinal, you can disregard this paragraph.  Sorry, Stan, but I’m betting against you.)  Only twice in the round-robin era have the top two players in the ATP rankings met in the concluding match of the Tour Finals–2010 (again) and 2012 (Djokovic d. Federer).

Not a shutout, but shut out

Exactly as many players (33 through 2012) have gone 0-3 in the round robin as have gone 3-0.  Ferrer and Gasquet find themselves in quality company.

Ferrer is the 7th player ranked in the top three to lose three round robin matches.  In 2001, #1 Gustavo Kuerten was winless, only a year after claiming the championship.  Jim Courier (1993), Juan Carlos Ferrero (2003), and Nadal (2009) went 0-3 from a #2 ranking, while Thomas Muster (1995) and Djokovic (2007) did so while ranked #3.

Ferrer is notable for another dubious achievement: going 0-3 twice.  He previously did so in 2010, so this year, he matches the mark of Michael Chang, the only other man in the round-robin era to post multiple 0-3s, having gone winless in both 1989 and 1992.

His age may work against him, but there is a glimmer of hope for Ferrer.  Four players (including Kuerten, mentioned above) have gone 0-3 at one Tour Finals and won the title at another.  Andre Agassi was winless in 1989, then won the event in 1990.  Stich was 0-3 in 1991, then claimed the title in 1993.  As we’ve seen, Djokovic failed to win a single match in 2007, yet came back to win the tournament in 2008.  (He did so again last year.)

If Nadal wins tomorrow, we can add one more name to this list, in his case finally adding the trophy to his collection four years after suffering through a winless week.  His 4-0 record so far this week may be no guarantee of success in the final, but it will hardly count against him.

Match reports: I charted today’s Federer-Nadal semifinal, as well as yesterday’s Federer-del Potro match.  Click the links for exhaustive serve, return, and shot statistics.

Worth a read: Carl Bialik analyzes ATP rematches–pairings like Fed-Delpo that faced off in back-to-back weeks.  As usual, we have to rewrite the rules for Rafa.


Filed under Forecasting, Records, World Tour Finals

2013 World Tour Finals Forecast

The field for the World Tour Finals next week is set, and the round robin groups are determined.  That allows us to simulate the event, and–using my player ratings–project the outcome.  (My ratings don’t yet incorporate Paris results. David Ferrer and Roger Federer may get mild boosts once their showings this week are considered.)

Obviously, Rafael Nadal is your favorite.  He has a substantial advantage in every category. He’s more likely than any other contender to progress through the round robin stage undefeated, to reach the final four, to play in the title match, and to win the championship.

Not only is Nadal the best player in the field, even on hard courts, but he was also favored by the draw.  For all of Ferrer’s success in Bercy, he is a weaker hard-court player than Juan Martin del Potro, who will play in Novak Djokovic’s half during the round robin stage.  Federer, despite his decline, is still more of a hard-court threat than Tomas Berdych, and Nadal drew Berdych.  The only blemish on Nadal’s draw is Stanislas Wawrinka, who is considerably more dangerous than Richard Gasquet.  As the forecast below shows, Gasquet is very unlikely to be a factor here.

Here is the complete forecast, showing each player’s chances of winning 3, 2, 1, or 0 matches in the round robin, along with reaching the semis, reaching the final, and winning the event:

Player     3-0  2-1  1-2  0-3     SF      F      W  
Nadal      35%  44%  18%   3%  81.0%  49.2%  31.1%  
Djokovic   25%  45%  26%   5%  70.8%  43.0%  25.0%  
Ferrer      8%  34%  42%  16%  42.4%  16.4%   6.0%  
Del Potro  15%  41%  35%   9%  55.9%  29.4%  14.1%  
Federer    11%  37%  39%  12%  48.4%  23.8%  10.7%  
Berdych     7%  32%  43%  18%  39.6%  15.2%   5.4%  
Wawrinka    6%  31%  43%  19%  37.0%  13.7%   4.8%  
Gasquet     4%  22%  45%  29%  24.9%   9.3%   3.1%

As I mentioned above, while Nadal (and, to a lesser extent, the other three members of his group) got the fortunate draw, the impact isn’t that great.  Here is a “draw-neutral” forecast, which randomizes the group assignments with each simulation:

Player        SF      F      W  
Nadal      77.9%  48.4%  30.2%  
Djokovic   74.4%  43.8%  25.7%  
Ferrer     40.6%  16.0%   5.9%  
Del Potro  57.3%  30.2%  14.5%  
Federer    50.5%  24.4%  10.6%  
Berdych    37.7%  14.6%   5.3%  
Wawrinka   32.4%  12.4%   4.3%  
Gasquet    29.3%  10.2%   3.4%
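Both forecasts come from simulating the event many times. A rough sketch of that kind of simulation follows; it is not the author's simulator. It assumes the standard format of two groups of four, a full round robin in each, each group winner meeting the other group's runner-up in the semifinals, then a final. The Elo-style `win_prob`, the player labels `P1` to `P8`, and their ratings are hypothetical; ties in the standings are broken at random rather than by the official tiebreak rules; and the draw-neutral option simply shuffles all eight players into two groups, which may not match how the author randomized them.

```python
import random
from collections import Counter


def win_prob(r_a, r_b):
    """Hypothetical Elo-style chance that a player rated r_a beats one rated r_b."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def play(a, b, ratings, rng):
    """Simulate one match and return the winner."""
    return a if rng.random() < win_prob(ratings[a], ratings[b]) else b


def round_robin(group, ratings, rng):
    """Play every pairing once; return (winner, runner-up, wins per player)."""
    wins = dict.fromkeys(group, 0)
    for i, a in enumerate(group):
        for b in group[i + 1:]:
            wins[play(a, b, ratings, rng)] += 1
    # Ties broken at random (a simplification of the official tiebreakers).
    order = sorted(group, key=lambda p: (-wins[p], rng.random()))
    return order[0], order[1], wins


def simulate(groups, ratings, trials=20000, randomize_groups=False, seed=1):
    """Estimate P(each win total), P(SF), P(F), and P(W) for each player."""
    rng = random.Random(seed)
    players = list(groups[0]) + list(groups[1])
    counts = {p: Counter() for p in players}
    for _ in range(trials):
        if randomize_groups:
            shuffled = players[:]
            rng.shuffle(shuffled)
            group_a, group_b = shuffled[:4], shuffled[4:]
        else:
            group_a, group_b = groups
        a1, a2, wins_a = round_robin(group_a, ratings, rng)
        b1, b2, wins_b = round_robin(group_b, ratings, rng)
        for p, w in {**wins_a, **wins_b}.items():
            counts[p][f"{w}-{3 - w}"] += 1
        for p in (a1, a2, b1, b2):
            counts[p]["SF"] += 1
        finalists = (play(a1, b2, ratings, rng), play(b1, a2, ratings, rng))
        for p in finalists:
            counts[p]["F"] += 1
        counts[play(*finalists, ratings, rng)]["W"] += 1
    return {p: {k: v / trials for k, v in c.items()} for p, c in counts.items()}


# Hypothetical ratings for eight players in two fixed groups.
ratings = {"P1": 2100, "P2": 2050, "P3": 1950, "P4": 1900,
           "P5": 1880, "P6": 1860, "P7": 1850, "P8": 1800}
groups = (["P1", "P4", "P6", "P7"], ["P2", "P3", "P5", "P8"])
fixed = simulate(groups, ratings, trials=10000)
neutral = simulate(groups, ratings, trials=10000, randomize_groups=True)
for p in ratings:
    print(p, round(fixed[p].get("SF", 0), 3), round(neutral[p].get("SF", 0), 3))
```

Comparing `fixed` with `neutral` for each player gives the same kind of draw-luck gap discussed below for Djokovic and Gasquet.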

The biggest losers in the draw ceremony were Djokovic and Gasquet.  While Novak’s chances of reaching and winning the final are similar, the draw pushed his probability of surviving the round robin stage from 74.4% down to 70.8%.  The odds are against Gasquet in any scenario, but the specific group assignments determined today knocked his chances of surviving the first three matches from 29.3% down to 24.9%.

The good news for Gasquet is that he’s a much, much better eight seed than Janko Tipsarevic was last year.  And with Richie at the end of what may be his career year, it’s that much more likely that anyone in the field of eight could make things interesting this week.

[update: Thanks to Jovan M. for catching some dodgy numbers in the first table. Due to a coding error, I showed each player’s chances of reaching each win total to be too low.  The SF/F/W columns in both tables are unchanged.]


Filed under Forecasting, World Tour Finals

The Most Lopsided ATP Semifinals

In the latest step of Rafael Nadal’s minor league rehab assignment (er, comeback), he’ll play Martin Alund tonight in the Sao Paulo semifinals.  Yes, the same Martin Alund who had never played a tour-level event before last week, has a career losing record in challengers, and only made the main draw as a lucky loser.

The jrank forecast gives Alund a 4.3% chance of beating Rafa tonight which, even having seen Nadal’s unconvincing win over Carlos Berlocq last night, seems a bit generous.

It also seems odd.  Even in lower-rung ATP events, players of Alund’s caliber (even a caliber or two above that) rarely reach the semis.  In San Jose this week, the lowest-ranked player in the semifinals is #22 Tommy Haas, assuring fans in California a very different level of play today.

As it turns out, hugely lopsided semifinals do occur now and then, and occasionally they even result in upsets.

Since the beginning of 2001, there have been about 1600 tour-level semifinals.  Using jrank, I estimated each player’s chances in those matches.  Nadal’s 95.7% probability of winning tonight doesn’t even rank in the top ten most lopsided semis.

Rafa has long been a stalwart of one-sided semifinals.  His dominance on clay is reflected in the numbers, and when he does play smaller events, he makes some opponents look woefully overmatched.  Of the 11 semifinals that were more lopsided than tonight’s showdown, Rafa was the favorite in four–including last week’s dismantling of Jeremy Chardy.  At the 2008 Barcelona event, Denis Gremelmayr had a mere 1.6% chance of triumphing over Rafa.  He won a single game.

(Chardy is rated quite a bit higher than Alund, but after last week’s loss to Horacio Zeballos, Nadal’s rating has fallen accordingly.  The jrank forecast for this week’s semifinal is thus almost identical to last week’s.)

Of course, there’s a big difference between a high probability and a certainty, and some of these lopsided matchups have generated surprises.  In Washington in 2007, the virtually unknown John Isner took out Gael Monfils, despite a mere 2.4% chance of victory.  The same year in Amersfoort, qualifier and eventual champion Steve Darcis defeated Mikhail Youzhny, overcoming a pre-match probability of only 6.1%.

Even Nadal has suffered in these situations.  The third-biggest ATP semifinal upset was Rafa’s 2010 Bangkok loss at the hands of Guillermo Garcia Lopez.

In all of those examples the underdog was a player of undeniable talent, while Alund has stumbled into his first ATP semifinal.  But as Nadal’s stumbles against Zeballos and Berlocq have shown us, it doesn’t matter so much who is across the net–the king of clay is far from his usual invincible self.

(After the break, find a list of the 63 most lopsided ATP semifinals since 2001. Asterisks denote upsets.)
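Building such a list is mostly a sort. A minimal sketch using a few of the matches mentioned above: each entry records the favorite, the underdog, the event, the favorite's pre-match win probability (the jrank figures quoted in this post), and whether the favorite won, with `None` for tonight's unplayed match. The tuple layout and the `lopsided_list` helper are hypothetical.

```python
# (favorite, underdog, event, favorite's win probability, favorite won?)
semis = [
    ("Rafael Nadal", "Denis Gremelmayr", "2008 Barcelona", 0.984, True),
    ("Gael Monfils", "John Isner", "2007 Washington", 0.976, False),
    ("Mikhail Youzhny", "Steve Darcis", "2007 Amersfoort", 0.939, False),
    ("Rafael Nadal", "Martin Alund", "Sao Paulo", 0.957, None),  # not yet played
]


def lopsided_list(matches):
    """Sort by the favorite's win probability, most lopsided first; star upsets."""
    lines = []
    for fav, dog, event, p, fav_won in sorted(matches, key=lambda m: m[3], reverse=True):
        star = "*" if fav_won is False else ""
        lines.append(f"{star}{event}: {fav} vs. {dog} ({p:.1%})")
    return lines


for line in lopsided_list(semis):
    print(line)
```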



Filed under Forecasting

The 2012 World Tour Finals Forecast

With Jo Wilfried Tsonga’s win last night over Nicolas Almagro, the field is set for the tour finals.  Novak Djokovic and Roger Federer will each head one of the two round robin groups, and will be joined by Andy Murray, David Ferrer, Tomas Berdych, Juan Martin Del Potro, Tsonga, and Janko Tipsarevic.

Despite Federer’s dominance on indoor hard courts last year, he is hardly the same unstoppable force this season.  Not only did he lose in last week’s final to Del Potro, but my rating algorithm, jrank, views him as a slightly inferior hard-court player to Murray.  Though it will certainly be close, my forecast favors both the Serb and the Brit over the soon-to-be world #2:

Player         SF      F      W  
Djokovic    77.7%  47.7%  28.8%  
Murray      70.0%  41.9%  23.3%  
Federer     72.6%  40.4%  22.3%  
Del Potro   45.9%  20.2%   8.3%  
Ferrer      45.4%  17.7%   6.5%  
Berdych     38.8%  15.2%   5.5%  
Tsonga      30.4%  11.3%   3.8%  
Tipsarevic  19.2%   5.5%   1.5%

As always, there are as many reasons to question these numbers as there are to put one’s faith in them.  Djokovic’s loss to Sam Querrey this week raises serious questions about his current ability to play his best tennis.  Murray’s loss to rising star Jerzy Janowicz isn’t quite so troubling, but it also fails to fit the profile of a dominant player.

In the bottom half of the pack, one or two of these guys are likely to play in the Paris final, meaning they’ll be relatively tired upon arrival in London.  It’s one thing to play the first round of a tournament on weak legs; it’s another when that event is the Tour Finals and your first opponent is a fellow top-tenner.

[UPDATE, 3 Nov]

The draw is set.  Federer is joined in Group B with Ferrer, Del Potro, and Tipsarevic, leaving Djokovic with Murray, Berdych, and Tsonga.  This is a dream setup for Federer, and even dreamier for Delpo.

Federer’s career H2H against the three men in his group is 31-3.  His career H2H against Novak’s opponents is 27-18.  He might prefer not to face Del Potro again so soon, but historically, the Argentine hasn’t been any more dangerous for Roger than any of the three men Djokovic will have to face.

As noted, it’s the absolute perfect draw for Delpo, too.  Statistically, Federer is weaker than Djokovic.  My numbers might overstate Ferrer’s competitiveness in London (and they still aren’t very high), and Tipsarevic is essentially a non-factor.  In the pre-draw simulation above, Del Potro has a 45.9% chance of reaching the semis and an 8.3% chance of winning it all.  Post-draw, those figures are 54.4% and 9.2%.  It’s an uphill battle no matter what the draw, but avoiding the Murray group is a huge help.

Here are the projections, now reflecting the draw:

Player         SF      F      W  
Djokovic    74.0%  47.2%  28.2%  
Federer     76.7%  41.2%  23.0%  
Murray      68.5%  41.6%  22.6%  
Del Potro   54.4%  22.4%   9.2%  
Ferrer      46.9%  17.9%   6.8%  
Berdych     31.2%  13.5%   5.0%  
Tsonga      26.3%  10.4%   3.6%  
Tipsarevic  22.1%   5.8%   1.6%

Thanks to his relatively weak round-robin group, Federer has the best shot at reaching the semis, but only the third best chance of reaching the final, since he’s likely to face either Djokovic or Murray in his semi.  Despite the tougher draw, Djokovic remains the favorite to win the event and put an exclamation point on his season-ending #1 ranking.

(A quick programming note for regular readers: I won’t be able to update these predictions throughout the tournament on, and due to an uncooperative travel schedule, the next update (including Bercy results) may not occur until Tuesday or Wednesday.)


Filed under Forecasting, World Tour Finals

The Five-Set Advantage

Last night, the heavily-favored Janko Tipsarevic won his first round match against Guillaume Rufin despite dropping the first two sets.  Had Rufin taken the first two sets against Janko in Cincinnati, Monte Carlo, or just about anywhere else on the ATP tour, he would’ve scored his first top-ten scalp.

Other seeds have similar stories.  Milos Raonic, Marin Cilic, Gilles Simon, and Alexandr Dolgopolov all would be headed home had their matches been judged on the first three sets.  Only two seeds had the opposite experience: Juan Monaco and Tommy Haas were each up two sets to love before losing their next three.

Simply (if tongue-twistingly) put, the five-set format favors favorites.

In all grand slam first rounds since 1991, seeds have come back from 0-2 or 1-2 down against unseeded players 125 times, while seeds have squandered 2-0 or 2-1 advantages only 71 times.  Just looking at those 32 matches per slam, that’s almost one upset averted per tournament.  The US Open draw would look awfully different right now if Tipsarevic, Raonic, Cilic, Simon, and Dolgopolov were among the first-round losers, even if Haas and Monaco replaced them in the second round.

Set theory

These numbers shouldn’t surprise us, since longer formats should do a better job of revealing the better player.  There are reasons why the baseball World Series is best-of-7 instead of a single game and the final sets of singles matches aren’t super-tiebreaks.  The difference between best-of-3 and best-of-5 isn’t quite so simple–fitness and mental strength play a part–but from a purely mathematical perspective, there should be fewer upsets in best-of-5s than best-of-3s.

Take Raonic for example.  My numbers (which don’t differentiate between 3-set and 5-set matches–shame on me) gave him approximately a 70% chance of beating Santiago Giraldo.  If 70% is his probability of winning a three-set match and sets are independent (more on that in a minute), that number implies a 63.7% chance of winning any given set.  A 63.7% chance of winning a set translates into a 74.4% shot at winning a best-of-five.
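For readers who want to check the Raonic arithmetic, here is a minimal Python sketch under the same assumption the paragraph makes: sets are independent and each is won with the same probability. The function names are hypothetical, and this is an illustration of the calculation, not the code behind the forecasts. `best_of` sums over the ways a player can take the required number of sets; `set_prob_from_match` inverts the best-of-three case by bisection.

```python
from math import comb


def best_of(p: float, n_sets: int) -> float:
    """Chance of winning a best-of-n_sets match if every set is won
    independently with the same probability p."""
    need = n_sets // 2 + 1  # 2 sets for best-of-3, 3 for best-of-5
    q = 1.0 - p
    # The winner takes the last set; k counts sets the opponent won before it.
    return sum(comb(need - 1 + k, k) * p**need * q**k for k in range(need))


def set_prob_from_match(match_p: float, n_sets: int = 3, tol: float = 1e-10) -> float:
    """Invert best_of() by bisection (it is increasing in p)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if best_of(mid, n_sets) < match_p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Raonic vs. Giraldo: about a 70% favorite in a best-of-three
set_p = set_prob_from_match(0.70, n_sets=3)
bo5 = best_of(set_p, 5)
print(f"per set: {set_p:.3f}, best-of-five: {bo5:.3f}")
# prints: per set: 0.637, best-of-five: 0.744
```

Under independence, a player who wins exactly half his sets gets 50% in either format; the longer format only widens the gap once the per-set edge moves away from 50%.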

A four- or five-point increase doesn’t radically change the complexion of the tournament, but it does make a difference.  My original numbers suggested that we could expect 20 or 21 first-round upsets.  If we adjust my odds in the manner I described for Raonic, the likely number of upsets falls to 18.
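The upset count follows from the same idea: the expected number of first-round upsets is the sum, over all 64 matches, of the underdog's chance of winning. The sketch below assumes the "original numbers" are the R64 column of the 2012 US Open projections table, takes the favorite's probability in each first-round match from it, and treats each one as a best-of-three figure to be converted the way the Raonic example describes. "Upset" here simply means the projected underdog winning; the list and function names are illustrative.

```python
from math import comb


def best_of(p: float, n_sets: int) -> float:
    """Chance of winning a best-of-n_sets match with independent sets won at rate p."""
    need = n_sets // 2 + 1
    q = 1.0 - p
    return sum(comb(need - 1 + k, k) * p**need * q**k for k in range(need))


def set_prob_from_match(match_p: float, n_sets: int = 3, tol: float = 1e-10) -> float:
    """Bisection: the per-set probability implied by a match-win probability."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if best_of(mid, n_sets) < match_p else (lo, mid)
    return (lo + hi) / 2


# Favorite's first-round win probability in each of the 64 matches,
# read from the R64 column of the 2012 US Open projections table
FAVORITES = [
    0.906, 0.671, 0.501, 0.849, 0.771, 0.886, 0.675, 0.643,
    0.529, 0.513, 0.541, 0.802, 0.649, 0.686, 0.624, 0.713,
    0.876, 0.771, 0.659, 0.681, 0.638, 0.533, 0.544, 0.691,
    0.706, 0.706, 0.531, 0.943, 0.841, 0.554, 0.577, 0.833,
    0.816, 0.591, 0.559, 0.608, 0.541, 0.549, 0.538, 0.663,
    0.821, 0.827, 0.645, 0.724, 0.682, 0.511, 0.694, 0.724,
    0.701, 0.511, 0.711, 0.690, 0.894, 0.770, 0.556, 0.612,
    0.618, 0.678, 0.595, 0.712, 0.647, 0.659, 0.605, 0.936,
]


def expected_upsets(fav_probs) -> float:
    """Expected number of matches the underdog wins (sum of underdog chances)."""
    return sum(1.0 - p for p in fav_probs)


original = expected_upsets(FAVORITES)
# Treat each favorite's chance as best-of-three, then re-score it as best-of-five
adjusted = expected_upsets(best_of(set_prob_from_match(p), 5) for p in FAVORITES)
print(f"expected upsets: {original:.1f} as given, {adjusted:.1f} with five-set adjustment")
```

With the table's figures, the unadjusted total comes to about 20.6, consistent with the "20 or 21" estimate.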

The most important implication here is the effect it has on the chances that top players reach the final rounds.  Earlier this week a commenter took me to task for my unintuitively low probabilities that Federer and Djokovic would reach the semifinals.  Obviously, if you give an overwhelming favorite a boost in every round, as the five-set format does, the cumulative effect is substantial.  For the top seeds, it can halve their probability of losing against a much lower-ranked opponent.

For Federer, adjusting the odds to reflect the theoretical advantage of the best-of-five format raises his chances of reaching the semis from 52.5% to over 65%.  Djokovic’s numbers are almost identical.
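To see why a modest per-match boost compounds, consider a hypothetical top seed who is an 88% favorite, in best-of-three terms, in each of the five matches he must win to reach the semifinals. The 88% figure is an assumption chosen so the best-of-three path lands near the 52.5% quoted for Federer; his actual round-by-round odds are not given here. The sketch reuses the independent-sets conversion described for Raonic.

```python
from math import comb, prod


def best_of(p: float, n_sets: int) -> float:
    """Chance of winning a best-of-n_sets match with independent sets won at rate p."""
    need = n_sets // 2 + 1
    q = 1.0 - p
    return sum(comb(need - 1 + k, k) * p**need * q**k for k in range(need))


def set_prob_from_match(match_p: float, n_sets: int = 3, tol: float = 1e-10) -> float:
    """Bisection: the per-set probability implied by a match-win probability."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if best_of(mid, n_sets) < match_p else (lo, mid)
    return (lo + hi) / 2


# Hypothetical: an 88% favorite (best-of-three terms) in each of five rounds
per_round = [0.88] * 5

# Reaching the semis means winning all five matches, so the chances multiply
reach_sf_bo3 = prod(per_round)
reach_sf_bo5 = prod(best_of(set_prob_from_match(p), 5) for p in per_round)
print(f"reach semis: {reach_sf_bo3:.1%} (best-of-3), {reach_sf_bo5:.1%} (best-of-5)")
```

Each match moves only a few points, but multiplying five of them turns a roughly 53% path into one close to 70% in this toy example.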

Dependent outcomes

Everything I’ve said so far seems intuitively sound, with one caveat.  Earlier I mentioned the assumption that sets are independent.  That is, a player has the same chance of winning a particular set no matter what the outcome of the previous sets–there is no “hangover effect” based on what has come before.

Tennis players, even professionals, aren’t robots, so the assumption probably isn’t completely valid.  Sometimes frustration with one’s own performance, the environment, or line calls can carry over into the next set and give one’s opponent an advantage.  Perhaps more importantly, the result of one set sometimes reveals that pre-match expectations were wrong in the first place.  Had David Nalbandian played this week instead of withdrawing, no number of sets would reveal that he was a better player–his health would prevent him from playing at his usual level.
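One simple way to relax independence is to let a player's set probability depend on whether he won the previous set, a basic "hangover" model. The sketch below is a hypothetical illustration, not something the forecasts use: the favorite wins the first set with probability `p_first`, and later sets with `p_after_win` or `p_after_loss`. The specific numbers are made up, and the model captures only carry-over, not the point that early sets can reveal a wrong pre-match estimate.

```python
from functools import lru_cache


def match_prob_hangover(p_first, p_after_win, p_after_loss, n_sets=5):
    """Favorite's chance of winning a best-of-n_sets match when each set's
    probability depends on the result of the previous set."""
    need = n_sets // 2 + 1

    @lru_cache(maxsize=None)
    def win_from(won, lost, won_last):
        if won == need:
            return 1.0
        if lost == need:
            return 0.0
        if won_last is None:  # first set: no previous result
            p = p_first
        else:
            p = p_after_win if won_last else p_after_loss
        return p * win_from(won + 1, lost, True) + (1 - p) * win_from(won, lost + 1, False)

    return win_from(0, 0, None)


# Independent sets: the same 63.7% regardless of what came before
independent = match_prob_hangover(0.637, 0.637, 0.637)
# Hypothetical hangover: the favorite's set chance drops to 55% after losing a set
hangover = match_prob_hangover(0.637, 0.637, 0.55)
print(f"independent: {independent:.3f}, with hangover: {hangover:.3f}")
```

Setting all three probabilities equal recovers the independent best-of-five figure (about 74.4% for a 63.7% set winner); lowering `p_after_loss` can only reduce the favorite's chances.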

Another related caveat is that beyond a certain match length, the outcome is no longer dependent on the same skills.  When Michael Russell played Yuichi Sugita in the Wimbledon qualifying round, the two men looked equal for four sets.  In the fifth, Russell’s fitness gave him an advantage that didn’t exist in the first couple of hours.  In this case, an estimate of Russell’s probability of winning a set against Sugita may be independent of previous outcomes, but it is not the same for every set.
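Sets can also be independent without being identical, as in the Russell example. A small variation handles that by giving each set its own probability. The numbers are hypothetical: two evenly matched players for four sets, with the fitter one winning 70% of fifth sets.

```python
from functools import lru_cache


def match_prob_by_set(set_probs):
    """Chance of winning a best-of-len(set_probs) match where set i is won
    independently with probability set_probs[i]."""
    n_sets = len(set_probs)
    need = n_sets // 2 + 1

    @lru_cache(maxsize=None)
    def win_from(won, lost):
        if won == need:
            return 1.0
        if lost == need:
            return 0.0
        p = set_probs[won + lost]  # probability for the next set to be played
        return p * win_from(won + 1, lost) + (1 - p) * win_from(won, lost + 1)

    return win_from(0, 0)


even = match_prob_by_set([0.5] * 5)
fitter = match_prob_by_set([0.5, 0.5, 0.5, 0.5, 0.7])
print(f"even: {even:.3f}, fitter in the fifth: {fitter:.3f}")
# prints: even: 0.500, fitter in the fifth: 0.575
```

The whole edge comes from the fifth set: the match reaches 2-2 with probability 0.375, and 0.375 × (0.7 − 0.5) = 0.075 is the gain over a coin flip.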

These allowances aside, there is little doubt that favorites are more likely to win best-of-five matches than best-of-threes.  Whether you want to watch the entire thing … that’s another story.


Filed under Forecasting, The Rules

2012 US Open Men’s Projections

Here are my pre-tournament odds for the 2012 US Open.  For some background reading, follow the links for more on my player rating system, current rankings, and how I simulate tournaments.

I’ve made one tweak to the algorithm (for men only) since last posting odds.  As many of you have noticed, I seem to underestimate the chances that the very best players will progress through the draw.  Some analysis of past results showed that this is correct, so for now, there’s a bit of a band-aid in the system, boosting the odds of the current top ten in a way that reflects how they’ve outperformed my projections in the past.

Still, Federer and Djokovic both have well under 30% chances of winning the Open, and fall just short of 50% between them.  My rankings give Djokovic a very slight edge despite Federer’s big season, and the tournament draw, which places Murray in Federer’s half, firmly tilts the scales in the Serb’s favor.

    Player                    R64    R32    R16        W  
1   Roger Federer           90.6%  84.0%  74.0%    23.2%  
    Donald Young             9.4%   5.4%   2.5%     0.0%  
    Maxime Authom           32.9%   2.3%   0.7%     0.0%  
    Bjorn Phau              67.1%   8.3%   3.7%     0.0%  
    Albert Ramos            50.1%  15.1%   1.7%     0.0%  
    Robby Ginepri           49.9%  14.8%   1.7%     0.0%  
    Rui Machado             15.1%   5.5%   0.4%     0.0%  
25  Fernando Verdasco       84.9%  64.6%  15.4%     0.3%  

    Player                    R64    R32    R16        W  
23  Mardy Fish              77.1%  50.6%  33.9%     1.3%  
    Go Soeda                22.9%   8.8%   3.3%     0.0%  
    Nikolay Davydenko       88.6%  39.4%  21.4%     0.2%  
    Guido Pella             11.4%   1.2%   0.1%     0.0%  
    Ivo Karlovic            67.5%  34.2%  14.7%     0.1%  
    Jimmy Wang              32.5%  10.9%   3.0%     0.0%  
    Michael Russell         35.7%  16.2%   5.4%     0.0%  
16  Gilles Simon            64.3%  38.6%  18.1%     0.3%  

    Player                    R64    R32    R16        W  
11  Nicolas Almagro         52.9%  33.6%  20.2%     0.3%  
    Radek Stepanek          47.1%  28.5%  16.5%     0.2%  
    Nicolas Mahut           48.7%  18.2%   8.6%     0.0%  
    Philipp Petzschner      51.3%  19.6%   9.5%     0.0%  
    Blaz Kavcic             45.9%  15.3%   4.8%     0.0%  
    Flavio Cipolla          54.1%  19.8%   6.9%     0.0%  
    Jack Sock               19.8%   7.7%   1.9%     0.0%  
22  Florian Mayer           80.2%  57.2%  31.6%     0.5%  

    Player                    R64    R32    R16        W  
27  Sam Querrey             64.9%  51.7%  27.6%     0.7%  
    Yen-Hsun Lu             35.1%  23.9%   9.3%     0.1%  
    Ruben Ramirez Hidalgo   31.4%   4.8%   0.8%     0.0%  
    Somdev Devvarman        68.6%  19.6%   5.5%     0.0%  
    Denis Istomin           62.4%  23.8%  11.8%     0.1%  
    Jurgen Zopp             37.6%  10.2%   3.8%     0.0%  
    David Goffin            28.7%  14.8%   6.9%     0.0%  
6   Tomas Berdych           71.3%  51.3%  34.3%     1.7%  

    Player                    R64    R32    R16        W  
3   Andy Murray             87.6%  76.3%  63.9%    13.7%  
    Alex Bogomolov Jr.      12.4%   6.3%   2.7%     0.0%  
    Hiroki Moriya           22.9%   1.8%   0.4%     0.0%  
    Ivan Dodig              77.1%  15.7%   7.8%     0.1%  
    Thomaz Bellucci         65.9%  29.0%   6.6%     0.1%  
    Pablo Andujar           34.1%   9.9%   1.4%     0.0%  
    Robin Haase             31.9%  15.6%   3.0%     0.0%  
30  Feliciano Lopez         68.1%  45.5%  14.1%     0.3%  

    Player                    R64    R32    R16        W  
24  Marcel Granollers       63.8%  37.7%  19.2%     0.2%  
    Denis Kudla             36.2%  16.4%   6.3%     0.0%  
    Lukas Lacko             46.7%  20.6%   8.4%     0.0%  
    James Blake             53.3%  25.2%  10.8%     0.1%  
    Paul-Henri Mathieu      45.6%  14.3%   5.9%     0.0%  
    Igor Andreev            54.4%  19.2%   8.7%     0.0%  
    Santiago Giraldo        30.9%  16.5%   7.7%     0.0%  
15  Milos Raonic            69.1%  50.0%  33.0%     1.0%  

    Player                    R64    R32    R16        W  
12  Marin Cilic             70.6%  56.4%  31.1%     0.9%  
    Marinko Matosevic       29.4%  18.6%   6.5%     0.0%  
    Daniel Brands           70.6%  20.5%   6.0%     0.0%  
    Adrian Ungur            29.4%   4.5%   0.7%     0.0%  
    Tim Smyczek             53.1%  15.1%   5.8%     0.0%  
    Bobby Reynolds          46.9%  12.1%   4.3%     0.0%  
    Guido Andreozzi          5.7%   0.9%   0.1%     0.0%  
17  Kei Nishikori           94.3%  71.9%  45.6%     1.7%  

    Player                    R64    R32    R16        W  
32  Jeremy Chardy           84.1%  55.5%  23.6%     0.3%  
    Filippo Volandri        15.9%   4.3%   0.7%     0.0%  
    Tatsuma Ito             44.6%  16.6%   4.5%     0.0%  
    Matthew Ebden           55.4%  23.6%   7.3%     0.0%  
    Martin Klizan           42.3%   8.7%   3.2%     0.0%  
    Alejandro Falla         57.7%  14.7%   6.4%     0.0%  
    Karol Beck              16.7%   8.2%   3.2%     0.0%  
5   Jo-Wilfried Tsonga      83.3%  68.5%  51.2%     3.9%  

    Player                    R64    R32    R16        W  
8   Janko Tipsarevic        81.6%  69.4%  49.7%     1.9%  
    Guillaume Rufin         18.4%  10.4%   3.8%     0.0%  
    Brian Baker             40.9%   7.1%   1.8%     0.0%  
    Jan Hajek               59.1%  13.1%   4.5%     0.0%  
    Grega Zemlja            55.9%  22.5%   8.1%     0.0%  
    Ricardo Mello           44.1%  15.5%   4.7%     0.0%  
    Cedrik-Marcel Stebe     39.2%  21.6%   8.2%     0.0%  
29  Viktor Troicki          60.8%  40.4%  19.2%     0.2%  

    Player                    R64    R32    R16        W  
19  Philipp Kohlschreiber   54.1%  32.9%  16.2%     0.3%  
    Michael Llodra          45.9%  26.1%  11.9%     0.2%  
    Grigor Dimitrov         54.9%  23.7%   9.8%     0.1%  
    Benoit Paire            45.1%  17.4%   6.4%     0.0%  
    Mikhail Kukushkin       46.2%  14.5%   6.0%     0.0%  
    Jarkko Nieminen         53.8%  18.3%   8.2%     0.1%  
    Xavier Malisse          33.7%  19.2%   9.6%     0.1%  
9   John Isner              66.3%  48.0%  31.9%     1.6%  

    Player                    R64    R32    R16        W  
13  Richard Gasquet         82.1%  51.9%  27.6%     0.9%  
    Albert Montanes         17.9%   5.3%   1.3%     0.0%  
    Jurgen Melzer           82.7%  39.6%  18.1%     0.3%  
    Bradley Klahn           17.3%   3.1%   0.5%     0.0%  
    Steve Johnson           35.5%   5.3%   1.1%     0.0%  
    Rajeev Ram              64.5%  15.4%   4.7%     0.0%  
    Ernests Gulbis          27.6%  18.4%   7.6%     0.0%  
21  Tommy Haas              72.4%  60.9%  39.1%     2.5%  

    Player                    R64    R32    R16        W  
28  Mikhail Youzhny         68.2%  49.4%  22.9%     0.6%  
    Gilles Muller           31.8%  17.4%   5.2%     0.0%  
    Tobias Kamke            48.9%  15.9%   4.2%     0.0%  
    Lleyton Hewitt          51.1%  17.2%   4.6%     0.0%  
    Igor Sijsling           69.4%  17.1%   7.3%     0.0%  
    Daniel Gimeno-Traver    30.6%   4.0%   1.0%     0.0%  
    Kevin Anderson          27.6%  18.3%   9.8%     0.1%  
4   David Ferrer            72.4%  60.6%  44.9%     3.9%  

    Player                    R64    R32    R16        W  
7   Juan Martin Del Potro   70.1%  55.3%  45.2%     4.6%  
    David Nalbandian        29.9%  18.4%  12.2%     0.3%  
    Benjamin Becker         48.9%  12.7%   7.0%     0.0%  
    Ryan Harrison           51.1%  13.6%   7.7%     0.1%  
    Lukasz Kubot            71.1%  38.8%  11.8%     0.1%  
    Leonardo Mayer          28.9%  10.0%   1.5%     0.0%  
    Tommy Robredo           31.0%  11.8%   2.1%     0.0%  
26  Andreas Seppi           69.0%  39.5%  12.5%     0.1%  

    Player                    R64    R32    R16        W  
20  Andy Roddick            89.4%  57.3%  36.9%     1.1%  
    Rhyne Williams          10.6%   2.0%   0.4%     0.0%  
    Carlos Berlocq          23.0%   5.2%   1.5%     0.0%  
    Bernard Tomic           77.0%  35.5%  19.7%     0.3%  
    Edouard Roger-Vasselin  44.4%  14.4%   4.3%     0.0%  
    Fabio Fognini           55.6%  21.1%   7.3%     0.0%  
    Guillermo Garcia-Lopez  38.8%  22.5%   8.9%     0.0%  
10  Juan Monaco             61.2%  41.9%  21.0%     0.4%  

    Player                    R64    R32    R16        W  
14  Alexandr Dolgopolov     61.8%  36.8%  19.6%     0.3%  
    Jesse Levine            38.2%  18.1%   7.7%     0.0%  
    Marcos Baghdatis        67.8%  34.5%  17.2%     0.2%  
    Matthias Bachinger      32.2%  10.6%   3.5%     0.0%  
    Steve Darcis            59.5%  23.6%  10.8%     0.1%  
    Malek Jaziri            40.5%  12.6%   4.6%     0.0%  
    Sergiy Stakhovsky       28.8%  14.1%   5.8%     0.0%  
18  Stanislas Wawrinka      71.2%  49.8%  30.9%     0.8%  

    Player                    R64    R32    R16        W  
31  Julien Benneteau        64.7%  43.7%   9.6%     0.3%  
    Olivier Rochus          35.3%  18.7%   2.8%     0.0%  
    Dennis Novikov          34.1%   9.6%   1.0%     0.0%  
    Jerzy Janowicz          65.9%  28.1%   4.4%     0.0%  
    Rogerio Dutra Silva     39.5%   2.5%   0.6%     0.0%  
    Teymuraz Gabashvili     60.5%   5.4%   1.9%     0.0%  
    Paolo Lorenzi            6.4%   3.6%   1.2%     0.0%  
2   Novak Djokovic          93.6%  88.6%  78.5%    26.5%


Filed under Forecasting, U.S. Open