Category Archives: Hawkeye

There Is No Analytics Revolution In Tennis

I’m sure you’ve heard about the trend. First, statistics overhauled baseball, and teams in every major sport now employ quants to search out that extra edge. Tennis has lagged behind the others, but with the help of big data, we’re on the cusp of a whole new era.

That’s the story, anyway. Yesterday brought us another example.

What happened in baseball is, quite simply, never going to happen in tennis.

To oversimplify a bit, the “Moneyball revolution” refers to front offices using analytics to identify underrated and underpriced players. To a lesser extent, it refers to deploying those players in a smarter way, such as rearranging the batting order or attempting fewer stolen bases.

In tennis, there are no front offices. Players aren’t paid salaries by teams. And there are no managers to decide how best to use their players.

In short: There are no organizations with both the incentives and the resources to analyze data.

Of course, when people get breathless about all the raw data floating around in tennis, that isn’t what they’re talking about. (No one really thinks Hawkeye data is going to revolutionize, say, the World Team Tennis draft.) Instead, they are implying that the data can be analyzed in such a way as to be actionable for players.

That’s an admirable objective. In theory, Kevin Anderson’s coach could look at all the data from all the matches between Anderson and Tomas Berdych, identify which tactics worked and which didn’t, and make recommendations accordingly. Of course, Kevin’s coach is already watching all those matches, taking notes, reviewing video, and presumably making recommendations, so if big data is going to change the game, it needs to somehow offer coaches demonstrably better insights.

With all the cameras pointed at tennis’s show courts, that’s certainly possible. The closest analogue in baseball is the pitch f/x system, which tracks the speed, location, and movement of every pitch. Some pitchers have been able to use pitch f/x data to analyze and improve upon their own performance. The same could eventually happen in tennis. But there are systemic reasons why it hasn’t yet, and those root causes are unlikely to disappear anytime soon.

What needs to change

Hawkeye cameras are aimed at a lot of courts and can collect an enormous amount of data. That’s how broadcasts are able to bring you stats like average net clearance and meters run. Those cameras also help generate graphics like those showing where all of a player’s serves landed.

After a match is over, with no calls left to be overturned and no broadcast needs likely to arise, what happens to the data? For all practical purposes, it gets stashed in the attic and forgotten. (Here’s a more thorough explanation.) Contrast that with Major League Baseball, which makes all pitch f/x data available immediately (to the public, for free) and keeps it archived indefinitely.

If tennis is to see any meaningful analytical breakthroughs, Hawkeye data needs to be aggregated in a single database. Results from one match are sometimes interesting (hey look, Andy’s net clearance is 15% greater than Roger’s!), but if we’re always looking at one match, or one tournament, at a time, we’ll never learn which of these Hawkeye-derived statistics matter, or how much.
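To see why pooling matters, here is a minimal sketch of one crude question a shared database could answer: across many matches, how often did the player with the higher value of some Hawkeye stat win? The `MatchRecord` layout is a hypothetical per-match summary invented for illustration, not Hawkeye’s or IBM’s actual format, and the three toy matches are made up purely to exercise the function.

```python
from dataclasses import dataclass


@dataclass
class MatchRecord:
    # Hypothetical per-match summary; NOT an actual Hawkeye or IBM schema.
    player_a: str
    player_b: str
    stat_a: float  # e.g., player A's average net clearance
    stat_b: float  # the same stat for player B
    winner: str


def higher_stat_win_rate(matches: list[MatchRecord]) -> float:
    """Share of matches won by the player with the higher stat (stat ties skipped)."""
    decided = [m for m in matches if m.stat_a != m.stat_b]
    if not decided:
        raise ValueError("no matches with differing stat values")
    # A match counts when "A won" and "A had the higher stat" agree.
    wins = sum(
        1 for m in decided
        if (m.winner == m.player_a) == (m.stat_a > m.stat_b)
    )
    return wins / len(decided)


# Invented toy data, for illustration only.
toy = [
    MatchRecord("P1", "P2", 0.71, 0.64, "P1"),  # higher stat won
    MatchRecord("P3", "P4", 0.58, 0.66, "P4"),  # higher stat won
    MatchRecord("P5", "P6", 0.70, 0.62, "P6"),  # higher stat lost
]
print(round(higher_stat_win_rate(toy), 3))  # 0.667
```

With real data, a rate far from 0.5 in either direction would hint that a stat tracks winning, while a rate near 0.5 would suggest it mostly doesn’t. A serious analysis would need to account for player quality and much more, but even this crude check can’t be run one match at a time.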

IBM, the collector of much of this information, may already maintain some version of that database. But the results are jaw-droppingly uninspiring. On broadcasts, we get the same old stats and graphics. When IBM has ventured into predicting match outcomes, their “millions of data points” are outperformed by my much simpler model.

IBM is the one organization in the sport with the resources to do the kind of analysis that will transform tennis. But they have no incentive to do so. To IBM (and now SAP, in the women’s game), tennis is a public relations opportunity, one that allows them to brand tournament websites and on-screen graphics with their logo. (Not to mention those suspiciously pro-IBM trend pieces linked to above.)

Players might eventually benefit from data-based insights, but only a tiny fraction of them could afford to hire even a single analyst. (Hi Simona! Text me anytime!)

Once again, we have to turn to baseball for a precedent. Even in that immense sport, with its billion-dollar franchises, it was amateurs, outsiders to the game, who did the work that brought about the analytics revolution. Even now, with teams aggressively hiring promising talent from outside the game, many of the most profitable insights still come from independent researchers. If MLB made its data as inaccessible as tennis does, that trend would’ve ground to a halt long ago.

Nice as it is to dream about a better world of tennis data, we’re unlikely to see it anytime soon. Tennis doesn’t have a commissioner, so there’s no one to appoint a data czar, let alone anyone who could convince the alphabet soup of the ATP, WTA, ITF, IBM, SAP, and Hawkeye to aggregate their data in any meaningful way.

Until that happens, and until the data is publicly available, there will be no analytics revolution in tennis. We’ll continue to get what we have now: the occasional Hawkeye stat, free of context, illustrating the same sort of analysis we’ve been hearing for decades.


Filed under Hawkeye, Rants

Halep’s Beatdown, Challenges by Gender, Djokovic Unthreatened

Thanks to the dominance of players like Serena Williams and Victoria Azarenka, it’s not much of a surprise to see a scoreline like 6-1 6-0 in the first week of a Grand Slam.  But when an upset comes with scores like that, we should sit up and take notice.

That’s what Simona Halep did to Maria Kirilenko, and trust me, it wasn’t any closer than the score suggests.  Halep has a deceptively big game, content to counterpunch but always looking for an opening for what can be a monster backhand.  I charted her match yesterday (along with Vika’s third-rounder against Alize Cornet), so look for some detailed stats from those matches later today.

Even before the first matches were played, it was clear that the Romanian had landed in the right part of the draw, in a quarter free of Serena, Vika, Agnieszka Radwanska, and Na Li.  With the early upsets of Sara Errani and Caroline Wozniacki, the two highest-ranked women in her quarter, Halep’s position looks even better.

Strangely enough, though, her next two opponents are women she might prefer not to face.  Flavia Pennetta, who will play her in the round of 16, was the last woman outside of the top 20 to beat Halep.  (Granted, Simona retired in the third set with a lower back injury.)  Her likely quarterfinal opponent, Roberta Vinci, is a more interesting case.  The pair have already faced off three times this year, and on the first of those occasions, Vinci dished out Halep’s worst loss of the year, a 6-0 6-3 drubbing on the carpet in Paris.  Since then, Simona has won two equally lopsided matches, one on clay and one on grass.

The way Halep was playing yesterday, though, we can safely pencil her into the semifinals, regardless of who she draws in the meantime.

Did you know that, at Grand Slams, men use the challenge system more than women do?

At the Open so far this year, men have made 7.52 challenges per match, while women have made 3.38.  The same pattern held at the Australian Open and Wimbledon this year.  In general, there are about twice as many challenges in a men’s Slam match as in a women’s Slam match.

Of course, a big part of that discrepancy arises because men play best-of-5 matches while women play best-of-3.  The more sets, the more points, and the more points, the more potential reasons to challenge.

Still, the structural difference doesn’t entirely account for the gap.  For instance, there were roughly 90 men’s matches and 90 women’s matches played on Hawkeye courts in Melbourne this year, and the men’s matches averaged about 60% more points.  Men challenged calls once every 32 points, while women challenged once every 37 points.

That per-point gap of roughly 16% (one challenge every 32 points versus one every 37) is not quite as dramatic as the 2:1 ratio we started with, but it’s still notable, and it has remained consistent throughout multiple Slams this year.
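The two effects (longer matches and a higher per-point challenge rate) can be separated with a little arithmetic. This sketch uses only the Melbourne and US Open figures quoted above; the function names are illustrative. Since challenges per match equal points per match divided by points per challenge, the per-match ratio is the match-length ratio times the per-point ratio.

```python
def per_point_ratio(men_points_per_challenge: float,
                    women_points_per_challenge: float) -> float:
    """How many times more often men challenge than women, per point played."""
    # Fewer points between challenges means a higher challenge rate,
    # so the ratio of rates is women's interval divided by men's.
    return women_points_per_challenge / men_points_per_challenge


def implied_per_match_ratio(per_point: float, length_ratio: float) -> float:
    """Per-match challenge ratio implied by a per-point ratio and match length.

    length_ratio = average men's points per match / average women's points per match.
    """
    return per_point * length_ratio


# Melbourne figures quoted above: men challenged once every 32 points,
# women once every 37, and men's matches averaged about 60% more points.
pp = per_point_ratio(32, 37)
pm = implied_per_match_ratio(pp, 1.6)
print(f"per point: {pp:.2f}, per match: {pm:.2f}")  # per point: 1.16, per match: 1.85

# US Open per-match averages quoted above, for comparison.
print(f"US Open per match: {7.52 / 3.38:.2f}")  # US Open per match: 2.22
```

By this rough decomposition, the longer format (a factor of about 1.6) accounts for more of the per-match gap than the difference in per-point challenge rates (about 1.16). Both inputs are rounded, so treat the result as approximate.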

One possibility is that men challenge more because, on average, they hit the ball harder, particularly on the serve.  The harder the shot, the tougher it is for everyone to see exactly where it lands, and the greater the likelihood of disagreement.  To test that idea, it would be interesting to know whether chair umpires are more or less likely to overrule in men’s matches.

Yesterday I noted that Djokovic had a remarkably easy path to the quarterfinals.  If Marcel Granollers beats Tim Smyczek, the Spaniard will be Novak’s highest-ranked opponent en route to the quarters.  (That’s assuming Djokovic beats 95th-ranked Joao Sousa today.)

If Granollers advances, Djokovic’s first four opponents will have the following rankings: 112, 87, 95, and 43.  In 20 of Novak’s 24 previous Grand Slam quarterfinal runs, he needed to beat at least one player in the top 40, and in 17 of them, at least one player in the top 30.
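A quick check of the rankings listed above makes the comparison concrete. The helper names are illustrative; the only subtlety is that the “highest-ranked” opponent is the one with the numerically smallest ranking.

```python
def highest_ranked(opponent_ranks: list[int]) -> int:
    """Return the best ranking among opponents (the numerically smallest number)."""
    return min(opponent_ranks)


def faced_top_n(opponent_ranks: list[int], n: int) -> bool:
    """True if at least one opponent was ranked n-th or better."""
    return highest_ranked(opponent_ranks) <= n


# Granollers scenario: Djokovic's first four opponents, rankings as quoted above.
granollers_path = [112, 87, 95, 43]
print(highest_ranked(granollers_path))   # 43
print(faced_top_n(granollers_path, 40))  # False
print(faced_top_n(granollers_path, 30))  # False

# Smyczek scenario: Smyczek's own ranking isn't quoted here, only that
# 87th-ranked Becker would be the highest-ranked opponent, so use the first three.
print(highest_ranked([112, 87, 95]))     # 87
```

Even in the Granollers scenario, then, Djokovic would reach the quarterfinals without beating a top-40 player, which by the counts above happened in only 4 of his 24 previous runs.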

If, as all patriotic Americans fervently hope, Smyczek wins today, we’ll venture into more extreme territory.  In that case, Djokovic’s highest-ranked opponent will have been 87th-ranked Benjamin Becker.  One suspects that a fair number of ATP players could advance to the quarterfinals given this draw.

In the Smyczek scenario, Djokovic will have faced an easier path than Roger Federer ever has in his 36 Slam quarterfinal showings.  As Carl Bialik reported during last year’s French Open, Roger’s first four rounds at Roland Garros were the easiest of his career: his highest-ranked opponent was #78 Tobias Kamke.

Federer’s experience leaves it unclear whether such a friendly draw is a good thing.  In the quarterfinals of that tournament, he lost his first two sets to Juan Martin del Potro before charging back for the five-set victory.  Perhaps we can expect such a thriller from Djokovic and Tommy Haas next week.

Want to know more about Tim Smyczek?  Here’s a good place to start.

Here’s another excellent win probability graph from Betting Market Analytics, this time covering the five-setter between Hewitt and del Potro.


Filed under Hawkeye, Simona Halep