Category Archives: Elsewhere

Sloan Conference Presentation on Tennis Analytics

Last weekend at the Sloan Sports Analytics Conference in Boston, I gave a talk, “First Service: The Advent of Actionable Tennis Analytics.” The presentation was in three parts:

  1. The sorry state of tennis data
  2. Schedule optimization (based in part on this blog post)
  3. The Match Charting Project (more about that in this post, among others)

The conference video-recorded all presentations, and I understand that video will be posted on the Sloan site. When it becomes available, I’ll post a link here.

In the meantime, many people have asked for my slide deck: First Service.

Also, Jim Pagels wrote a brief piece for Forbes drawing on my talk, which you can read here.

Filed under Elsewhere

Challenger Tour Finals Forecast

I wrote an extensive preview of this week’s Challenger Tour Finals for The Changeover, so you should check that out first.  (Also worth a read is the preview at Foot Soldiers of Tennis.)

Because so much less separates players at this level (compared to those at last year’s World Tour Finals), my forecast stops just short of throwing its hands up in dismay.  Coming into the event, Italian clay specialist Filippo Volandri was the favorite, with a 15.5% chance of winning the event.  He lost today to Alejandro Gonzalez, making it much less likely that he’ll progress out of the round-robin stage.

Today’s other winners were top seed Teymuraz Gabashvili, Oleksandr Nedovyesov, and Jesse Huta Galung.  My numbers now consider Huta Galung the favorite, with a better than 20% chance of winning the title.  The situation in Grupo Verde will become much clearer after tomorrow’s night match between Gabashvili and Nedovyesov.

Here is the pre-tournament forecast:

Player       3-0  2-1  1-2  0-3     SF      F      W  
Gabashvili   12%  38%  37%  13%  49.8%  24.3%  12.0%  
Volandri     15%  40%  35%  10%  55.3%  29.3%  15.5%  
Nedovyesov   14%  39%  36%  11%  53.0%  26.9%  13.7%  
Huta Galung  14%  39%  36%  11%  53.8%  28.2%  14.6%  
Gonzalez     10%  35%  40%  15%  45.0%  21.8%  10.4%  
Ungur        10%  35%  40%  15%  45.0%  20.9%   9.8%  
Martin       11%  36%  39%  14%  46.0%  22.4%  10.7%  
Clezar       13%  38%  37%  11%  52.2%  26.3%  13.3%

And here is the forecast updated with the results of today’s four matches:

Player       3-0  2-1  1-2  0-3     SF      F      W  
Gabashvili   24%  50%  26%   0%  71.5%  35.0%  17.1%  
Volandri      0%  27%  50%  23%  30.2%  16.2%   8.6%  
Nedovyesov   28%  50%  22%   0%  75.7%  38.3%  19.5%  
Huta Galung  27%  50%  23%   0%  74.7%  39.0%  20.5%  
Gonzalez     23%  50%  27%   0%  70.1%  33.7%  15.8%  
Ungur         0%  22%  50%  29%  23.1%  10.8%   5.1%  
Martin        0%  23%  50%  27%  25.1%  12.2%   5.9%  
Clezar        0%  27%  50%  23%  29.6%  14.9%   7.4%

(My algorithm doesn’t implement the details of the number-of-sets-won tiebreaker, so Guilherme Clezar, the only loser today to win a set, probably has a slightly better chance of advancing than these numbers give him credit for.)
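To illustrate how the record columns (3-0, 2-1, 1-2, 0-3) in tables like these can be derived, here is a minimal sketch. It is not the forecasting algorithm used for the numbers above, which this post doesn't describe. It assumes each match is independent, and the function name `record_distribution` and the per-match win probabilities are hypothetical.

```python
from itertools import product

def record_distribution(win_probs):
    """Return {wins: probability} for one player's round-robin matches.

    win_probs holds the player's chance of winning each match.
    Matches are assumed independent (a simplifying assumption).
    """
    dist = {wins: 0.0 for wins in range(len(win_probs) + 1)}
    # Enumerate every win/loss combination and add up its probability.
    for outcome in product([True, False], repeat=len(win_probs)):
        prob = 1.0
        for won, p in zip(outcome, win_probs):
            prob *= p if won else 1.0 - p
        dist[sum(outcome)] += prob
    return dist

# A match already played is entered as 1.0 (won) or 0.0 (lost).
# Hypothetical example: one win in hand, two toss-up matches remaining.
print(record_distribution([1.0, 0.5, 0.5]))
```

Entering a completed win as 1.0 is why a first-day winner shows 0% in the 0-3 column, and a first-day loser shows 0% in the 3-0 column.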

Challenger charting: The most interesting match of the day–if not the cleanest–was the last one, between Nedovyesov and Clezar.  I charted it, so you can check out detailed serve, return, and shot-by-shot stats for that contest.

And if you’re really into this stuff–Challengers and/or charting–here are my stat reports from yesterday’s first-round matches in Champaign between Ram and Giron and Sandgren and Peliwo.

Filed under Challengers, Elsewhere, Forecasting


As keen-eyed readers will have noted, my last few posts have included links to a new tennis stats site, Tennis Abstract. I’ve developed this site over the last few months, and it’s finally ready for the public.

Tennis Abstract is my response to the frustration of finding and filtering match results. For any player in the last 20 years, the site shows you all of his tour-level results. Best of all, you have the ability to sort by any number of stats and identify the exact subset of matches you want to see.

The best way to see what the site has to offer is to go to your favorite player’s page (here’s Djokovic, Nadal, Federer, and Berlocq) and start clicking around. Once you get accustomed to what Tennis Abstract makes possible, you may find it difficult to go back to other results sites. I certainly have.

For all that, the site is very much a work in progress. There are plenty of as-yet unresolved bugs I know about, and there are probably far more that I don’t. I have a long list of features I’d like to add, but as this is just a spare-time project for me, improvements will only be incremental.

As it is now, you can find the full career records of every active player, along with those of many retired greats. You can filter results in more than 10 different ways, and you can find head-to-head results for any pair of players.

One note: The site has been developed using Google Chrome, and it is optimized for that browser. It also works very well (maybe even a bit faster) in Firefox. However, there are serious performance issues in Internet Explorer, so for now, you’ll get an error message if you try to use the site in IE. In that case, download Chrome already!

Feedback (especially bug reporting) is welcome. Feel free to comment on this post or drop me an email.


Filed under Elsewhere

The Speed of Every Surface

Last week, I wrote an article for the Wall Street Journal noting the relatively slow speed of this year’s U.S. Open.  It’s not clear whether the surface itself is the cause, or whether the main factor is the humidity from Hurricane Irene and Tropical Storm Lee.  For whatever reason, aces were lower than usual, creating an environment more favorable to, say, Novak Djokovic than someone like Andy Roddick.

The limited space in the Journal prevented me from going into much detail about the methodology or showing results from tournaments other than the slams.  There’s no word limit here at Heavy Topspin, so here goes…

Aces and Server’s Winning Percentage

Surface speed is tricky to measure–as I’ve already mentioned, “surface speed” is really a jumble of many factors, including the court surface, but also heavily influenced by the atmosphere and altitude.  (And, possibly, different types of balls.)  If you were able to physically move the clay courts in Madrid to the venue of the Rome Masters, you would get different results.  But teasing out the different environmental influences is little more than semantics–we’re interested in how the ball bounces off the court, and how that affects the style of play.

So then, what stats best reflect surface speed?  Rally length would be useful, as would winner counts–shorter rallies and more winners would imply a faster court.  But we don’t have those for more than a few tournaments.  Instead, I stuck with the basics: aces, and the percentage of points won by the server.

Important in any analysis of this sort is to control for the players at each tournament.  The players who show up for a lower-rung clay tournament are more likely to be clay specialists, and the men who get through qualifying are more likely to be comfortable on clay.  Also, the players who reach the later rounds are more likely to be better on the tournament’s surface.  Thus, the number of aces at, say, the French Open is partially influenced by surface, and partially influenced by who plays, and how much each player plays.

Thus, instead of looking at raw numbers (e.g. 5% of points at Monte Carlo were aces), I took each server in each match, and compared his ace rate to his season-long ace rate.  Then I aggregated those comparisons for all matches in the tournament.  This allows us to measure each tournament’s ace rate against a neutral, average-speed surface.
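The adjustment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text doesn't say exactly how the per-match comparisons were aggregated, so the sketch assumes a simple ratio of actual aces to the aces each server would be expected to hit at his season-long rate. The function name, the input layout, and the sample numbers are all hypothetical.

```python
from collections import defaultdict

def tournament_ace_index(matches, season_ace_rate):
    """Return {tournament: actual aces / expected aces}.

    matches: iterable of (tournament, server, aces, service_points).
    season_ace_rate: {server: season-long aces per service point}.
    A value of 1.0 means servers hit aces at their usual rate.
    """
    actual = defaultdict(float)
    expected = defaultdict(float)
    for tournament, server, aces, service_points in matches:
        actual[tournament] += aces
        # Aces this server would hit on a neutral surface (assumed model).
        expected[tournament] += season_ace_rate[server] * service_points
    return {t: actual[t] / expected[t] for t in actual if expected[t] > 0}

# Made-up data: two servers who each usually ace 10% of service points.
sample_matches = [
    ("Slow Event", "Player X", 5, 100),
    ("Slow Event", "Player Y", 10, 100),
]
sample_rates = {"Player X": 0.10, "Player Y": 0.10}
print(tournament_ace_index(sample_matches, sample_rates))
```

Weighting by service points means a player who goes deep in the draw counts for more, which matches the idea of controlling for how much each player plays.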

The Path to Blandness

The ace rate numbers varied widely.  While the Australian Open and this year’s US Open were close to a hypothetical neutral surface speed, other tourneys feature barely half the average number of aces, and still others have nearly half-again the number of aces of a neutral surface.   I’ve included a long list of tournaments and their ace rates below; you won’t be surprised to see the indoor and grass tournaments on the high end and clay events at the other extreme.

But there’s a surprise waiting.  I also calculated the percentage of points won by the server, and like ace rate, I controlled for the mix of players in every event.  While ace rate varies from 53% of average to 145% of average, the percentage of points won by the server never falls below 90% of average, rarely drops below 95%, and never exceeds 105%.  53 of the 67 tournaments listed below fall between 97% and 103%–suggesting that surface influences the outcome of only a handful of points per match.

That may defy intuition, but think back to the mix of players at each tournament.  Big-serving Americans don’t show up at Monte Carlo, while South Americans generally skip every non-mandatory event in North America.  The nominal rate at which servers win points varies quite a bit, but that’s because of the players in the mix.

Also, this finding suggests that, as a stat, aces are overrated.  They may be a useful proxy for server dominance–if a player hits 15 aces in a match, he’s probably a pretty good server–but they come nowhere near telling the whole story.  Aces on grass turn into service winners on hard courts, and then become weak returns and third-shot winners on clay.  The end result is usually the same, but Milos Raonic is a lot scarier when the serves bounce over your head.

Finally, it would be a mistake to say that a variation of 3-5% in serve points won is meaningless.  It may be less than expected, but especially between good servers, 3-5% can be the difference.  Move Saturday’s Federer/Djokovic semifinal to a surface like Wimbledon’s, and we’d be looking at a different champion.
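To see why a few percent matters, it helps to turn serve points won into the chance of holding serve. The sketch below uses the standard formula for winning a game when every point is won independently with the same probability, which is a simplifying assumption and not something taken from the analysis above. The 65% baseline is a hypothetical figure chosen for illustration.

```python
def hold_probability(p):
    """Chance the server wins a game, given point-win probability p.

    Assumes every point is independent with the same probability.
    """
    q = 1.0 - p
    # Win the game to love, 15, or 30: the server takes the final point
    # after dropping 0, 1, or 2 of the points before it.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (three points each, 20 orderings), then win from deuce.
    from_deuce = p**2 / (p**2 + q**2)
    return before_deuce + 20 * p**3 * q**3 * from_deuce

baseline = 0.65  # hypothetical season-average share of serve points won
for relative in (0.97, 1.00, 1.03):
    p = baseline * relative
    print(f"serve points won {p:.3f} -> hold probability {hold_probability(p):.3f}")
```

Because a game takes at least four points, a small shift in points won is magnified in games held, and again across sets and matches.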

All the Numbers

Here is the breakdown of ace rate and serve points won, compared to season average, for nearly every current ATP event.

Since I am using each season’s average, you may wonder whether the averages themselves have changed from year to year.  I’ve read that courts are getting slower, but in the five-year span I’ve studied here, the ace rate has actually crept up a tiny bit.  Each tournament varies quite a bit–probably due to weather–but generally ends up at the same numbers.

Below, find the 2011 ace rate and percentage of serve points won, as well as the average back to 2007.   Again, these are controlled for the mix of players (including how much each guy played), and the numbers are all relative to season average.

The little letter next to the tournament name is surface: c = clay, h = hard, g = grass, and i = indoor.

Tournament          2011Ace  2011Sv%    AvgAce  AvgSv%  
Estoril          c    57.5%    96.6%     53.3%   94.3%  
Monte Carlo      c    52.0%    92.1%     53.9%   91.2%  
Umag             c    58.6%    95.2%     58.7%   94.3%  
Serbia           c    54.2%    93.5%     61.0%   94.8%  
Rome             c    62.5%    95.9%     62.9%   94.4%  
Buenos Aires     c    61.9%    99.0%     62.9%   98.6%  
Houston          c    64.9%    97.2%     66.6%   96.8%  
Valencia         i                       68.0%   96.4%  
Barcelona        c    55.7%    94.3%     68.0%   96.2%  
Dusseldorf       c    45.7%    96.5%     72.8%   97.2%  

Hamburg          c    78.0%    96.6%     74.3%   96.4%  
Bastad           c    63.8%    94.5%     76.8%   97.7%  
Roland Garros    c    78.0%    98.4%     77.1%   97.5%  
Santiago         c    84.5%    98.5%     81.5%   99.4%  
Costa do Sauipe  c    83.4%   101.7%     84.2%   98.9%  
Nice             c    88.5%    97.4%     84.3%   98.1%  
Casablanca       c    79.1%    99.0%     84.9%   98.2%  
Acapulco         c    70.9%    95.6%     86.0%   98.7%  
Madrid           c    77.0%    98.5%     86.1%   98.0%  
Munich           c    87.9%   100.1%     86.5%  100.0%  

Beijing          h                       86.7%   97.3%  
Los Angeles      h    84.7%    97.2%     87.7%   97.3%  
Kitzbuhel        c    95.8%    97.9%     89.0%   98.6%  
Toronto          h                       89.6%   98.3%  
Chennai          h    82.3%    98.0%     89.6%   98.7%  
Stuttgart        c    77.0%    95.8%     89.7%   98.1%  
Indian Wells     h    88.9%    99.0%     90.9%   98.0%  
Doha             h   125.5%   101.9%     91.2%   97.6%  
Auckland         h   103.1%   102.0%     93.9%   98.7%  
Miami            h    94.5%    97.9%     94.4%   98.0%  

Shanghai         h                       94.6%   98.1%  
Australian Open  h    97.6%    97.3%     96.5%   96.9%  
Kuala Lumpur     h                       97.1%   97.3%  
Sydney           h   105.8%   100.0%     97.4%   99.1%  
St. Petersburg   i                       97.8%  101.7%  
Montreal         h    91.3%    98.4%     98.1%   98.2%  
Delray Beach     h   106.2%    99.9%     99.1%   98.6%  
Gstaad           c   104.5%   100.1%    101.2%  101.4%  
Dubai            h   102.7%    96.5%    103.2%   98.2%  
US Open          h   101.3%    97.4%    104.0%   98.7%  

Vienna           i                      105.8%  101.4%  
Johannesburg     h   110.0%   102.7%    106.0%  101.0%  
Washington DC    h    97.5%   100.1%    106.8%   99.8%  
Newport          g    93.3%    99.0%    107.5%  101.7%  
Winston-Salem    h   108.1%    99.6%    108.1%   99.6%  
Atlanta          h   110.0%   100.9%    108.4%   99.0%  
Bangkok          h                      110.5%  101.6%  
Cincinnati       h    96.2%    98.9%    111.7%  100.5%  
Zagreb           i   107.0%    99.2%    112.3%  102.3%  
Moscow           i                      113.0%  101.3%  

Brisbane         h   130.6%   100.3%    113.4%  100.0%  
Eastbourne       g   111.2%   101.8%    114.1%  102.9%  
Paris Indoors    i                      115.4%   99.6%  
Rotterdam        i   123.8%   103.7%    115.9%  101.0%  
Basel            i                      117.7%  101.3%  
San Jose         i   108.6%   103.0%    120.0%  102.7%  
Wimbledon        g   119.4%   102.8%    120.7%  103.0%  
Queen's Club     g   113.3%   101.8%    121.5%  103.2%  
Halle            g   122.9%   104.7%    123.2%  102.5%  
Marseille        i   127.4%   102.8%    124.2%  102.2%  

Stockholm        i                      124.4%   99.8%  
Metz             i                      124.6%  101.7%  
Tokyo            h                      124.7%  100.5%  
's-Hertogenbosch g   110.9%   102.1%    126.3%  104.0%  
Memphis          i   117.1%   101.2%    129.1%  102.0%  
Montpellier      i                      145.4%  104.5%


Filed under Elsewhere, Research

Video: The Present and Future of Statistics in Tennis

Last Tuesday, I gave a talk at the Longwood Cricket Club in Boston about tennis statistics.  Many thanks to Rick Devereaux for extending the invitation, and to everyone at Longwood for their hospitality.  (And for their beautiful grass courts!)

In the talk, I discuss the value of different types of statistics in sports, what tennis stats are out there now, and what we can expect in the not-too-distant future. I also detour into baseball analysis to show some of the potential for research in tennis.  It’s about 36 minutes long.

Apologies for the video quality–the room was dark to accommodate the projector, and my handy little Flip camera could only do so much.  Still, the audio is generally clear.


Filed under Elsewhere

At ESPN, On Andy Murray

At ESPN today, you can read my article on Andy Murray’s quietly excellent clay-court season, and what it might portend for the rest of Andy’s year.

Filed under Elsewhere