From BP: How valid is that data?



M2
04-14-2010, 11:09 AM
Colin Wyers takes some time to dig under the surface of ball-in-play data (http://baseballprospectus.com/article.php?articleid=10523) and finds that it's far from a clean set of numbers.

Here's one of the key findings:


Let's face it: no matter how much we massage the data, there simply is not a way to objectively define the difference between a fly ball and a line drive. It is inherently a subjective and somewhat arbitrary distinction. ... a simple stopwatch could provide more accurate, quantifiable data than what we're getting right now.

If you had a half dozen sources collecting BIP data, it looks like you could get a half dozen significantly different sets of findings from that data. That's a fairly huge problem for the newer defensive metrics.

camisadelgolf
04-14-2010, 11:11 AM
In my opinion, there are two major things that define the difference between a fly ball and a pop-up: velocity and angle. With all the advances in technology, I don't think it'd be too difficult to come up with a universally accepted definition of the two types of contact.
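To make the velocity-and-angle idea concrete, here is a minimal sketch in Python. Every cutoff below (the launch-angle bands and the soft-contact speed) is a hypothetical placeholder chosen only for illustration, not an official or agreed-upon standard. The point is that once angle and speed are measured, the label becomes a deterministic function of the numbers.

```python
from dataclasses import dataclass

# HYPOTHETICAL cutoffs, for illustration only; not an official or agreed standard.
GROUND_BALL_MAX_ANGLE = 10.0  # degrees; below this, call it a ground ball
LINE_DRIVE_MAX_ANGLE = 25.0   # degrees; from the ground-ball cutoff up to here, a line drive
POP_UP_MIN_ANGLE = 50.0       # degrees; at or above this, always a pop-up
POP_UP_MAX_SPEED = 70.0       # mph; weak contact in the fly-ball angle band counts as a pop-up


@dataclass
class BattedBall:
    launch_angle_deg: float  # vertical angle off the bat, in degrees
    exit_speed_mph: float    # ball speed off the bat, in mph


def classify(ball: BattedBall) -> str:
    """Return a batted-ball label computed only from measured angle and speed."""
    angle = ball.launch_angle_deg
    if angle < GROUND_BALL_MAX_ANGLE:
        return "ground ball"
    if angle < LINE_DRIVE_MAX_ANGLE:
        return "line drive"
    # Fly ball vs. pop-up: a very steep angle, or soft contact in the fly-ball band.
    if angle >= POP_UP_MIN_ANGLE or ball.exit_speed_mph < POP_UP_MAX_SPEED:
        return "pop-up"
    return "fly ball"


if __name__ == "__main__":
    for ball in (BattedBall(18.0, 95.0), BattedBall(35.0, 95.0), BattedBall(35.0, 60.0)):
        print(ball, "->", classify(ball))
```

Any two people running this on the same measurements get the same label; the disagreement moves from the scorer's eye to the choice of cutoffs, which can at least be published and argued over.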

Brutus
04-14-2010, 11:28 AM
I don't see there being much of a problem.

I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line. There are certainly going to be some in-between hits, but I disagree with the notion there's a major flaw in the labeling of these batted balls.

And as someone said, whatever issues we do have with the perhaps 10% of balls that are in-between, with f/x data on batted balls on the brink of full-scale use, we can easily come up with standards to fill in that piece of the puzzle.

Cedric
04-14-2010, 11:41 AM
BABIP is the most overused stat in baseball, IMO. Lately people have been using faulty data or blatantly wrong data to make baseball a black-and-white sport. It's a grey area sport and some things can't be deduced with one tidy number.

RedsManRick
04-14-2010, 12:09 PM
I think Colin takes it a bit too far in terms of throwing his hands up at the end of the article, though he definitely has a valid point. Given the choice, I would probably defer to the people who are using video rather than those watching live from a vantage point above the play, particularly in light of his research showing the bias introduced by height.

All that said, I still think we're fairly far along the accuracy curve. The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation. At that point we can very easily categorize hit types. The last frontier would then be the interaction between hit types and fielder positioning.

The critique is definitely appropriate, but I guess I'm more optimistic.

lollipopcurve
04-14-2010, 02:35 PM
All that said, I still think we're fairly far along the accuracy curve. The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation. At that point we can very easily categorize hit types. The last frontier would then be the interaction between hit types and fielder positioning.

Of course. It will revolutionize defensive metrics, which include pitching. We're still medieval until that happens, in my opinion.

M2
04-14-2010, 02:38 PM
BABIP is the most overused stat in baseball, IMO. Lately people have been using faulty data or blatantly wrong data to make baseball a black-and-white sport. It's a grey area sport and some things can't be deduced with one tidy number.

I agree in general, though I'll note the one thing BABIP has going for it is it doesn't delve into the qualitative, which is where the garbage is piling up. BABIP is just a big, dumb number. It's got some value for pitchers in that they tend to stick to a general norm. It doesn't seem to have much use for hitters since some are routinely pathetic and others are routinely awesome when it comes to BABIP (and the information tends to tell us what we already knew from watching them).

Probably its best use is its flip side at the team level - DER. That will give you a good snapshot of how a team's defense as a whole has played.
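For reference, a small Python sketch of the two numbers being discussed. BABIP uses the standard formula (H - HR) / (AB - SO - HR + SF). For DER the sketch uses a simplification, 1 minus BABIP allowed (the share of balls in play turned into outs); published DER formulas also account for things like plays where the batter reached on an error, so treat `simple_der` as an approximation. The team totals in the demo are made up.

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    balls_in_play = ab - so - hr + sf
    return (h - hr) / balls_in_play


def simple_der(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """Simplified defensive efficiency: share of balls in play converted into outs.

    Assumption: DER ~= 1 - BABIP allowed; reached-on-error plays are ignored.
    """
    return 1.0 - babip(h, hr, ab, so, sf)


if __name__ == "__main__":
    # Hypothetical team totals allowed, for illustration only.
    print(round(babip(1400, 150, 5500, 1100, 45), 3))
    print(round(simple_der(1400, 150, 5500, 1100, 45), 3))
```

Note that neither number needs any batted-ball classification, which is why both stay clear of the fly-ball-versus-liner problem.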


The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation.

I was under the impression that it could be at least a decade before we have that. Maybe some teams will have that data sooner, but it's not terribly close to being at the fingertips of fans like you and me.

I'll also note that it's likely to take a long time to digest that information once we start getting it. People have barely begun to unspool pitch f/x data. They're going to try to take a pile of different data points on a given ball in play and synthesize it into a single number (feeding right into the problem Cedric underlined). Chances are the proper interpretation of such varied data will extend far beyond a single number. So we have that to look forward to even after we figure out how to drink from a firehose.

And even then some bystander is going to walk by and note that we're still not getting direct metrics on the thing we're supposed to be assessing - the fielder.


I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line.

The problem is it's a relatively small percentage of balls in play that separate the good fielders from the bad fielders. Most defenders get to most of the balls in play, and most of those are the balls everybody would agree upon.

That leaves us with the difficult-to-classify balls forming an inordinately large percentage of the plays we're using to stratify fielders with supposedly objective numbers.

On top of that, you might be surprised how little people would agree on how to classify a ball in play. The entire field of perceptual psychology revolves around how not everything is as obvious as a given individual may think it is. The human brain is as likely to be reading the fielder's reaction to the ball in play as it is the ball itself.

And the human brain might very well be right to be doing that. Of course, that doesn't make for a clean objective interpretation of the event.

dougdirt
04-14-2010, 02:38 PM
Of course. It will revolutionize defensive metrics, which include pitching. We're still medieval until that happens, in my opinion.

It's actually happening now. The system is in place. We just don't have the data publicly available yet.

pahster
04-14-2010, 02:49 PM
Just correlate the categorization of balls in play made by the various sources. Is the correlation high? If so, no problem.
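This check is easy to run. A minimal Python sketch, assuming each source reports a per-player line-drive rate (the rates below are invented, not real data): it computes the Pearson correlation and, alongside it, the average offset between sources, because a high correlation alone cannot reveal a source that labels more liners across the board.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of rates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def mean_offset(xs: list[float], ys: list[float]) -> float:
    """Average amount by which the second source's rate exceeds the first's."""
    return sum(y - x for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Made-up line-drive rates for four players, NOT real data.
    source_a = [0.18, 0.20, 0.22, 0.25]
    source_b = [rate + 0.03 for rate in source_a]  # same ranking, every rate 3 points higher
    print(pearson(source_a, source_b))      # essentially 1.0
    print(mean_offset(source_a, source_b))  # about 0.03
```

In this toy case the two sources correlate perfectly yet disagree on every player's line-drive rate, so a systematic bias can hide behind a high correlation.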

Brutus
04-14-2010, 02:51 PM
The problem is it's a relatively small percentage of balls in play that separate the good fielders from the bad fielders. Most defenders get to most of the balls in play, and most of those are the balls everybody would agree upon.

That leaves us with the difficult-to-classify balls forming an inordinately large percentage of the plays we're using to stratify fielders with supposedly objective numbers.

On top of that, you might be surprised how little people would agree on how to classify a ball in play. The entire field of perceptual psychology revolves around how not everything is as obvious as a given individual may think it is. The human brain is as likely to be reading the fielder's reaction to the ball in play as it is the ball itself.

And the human brain might very well be right to be doing that. Of course, that doesn't make for a clean objective interpretation of the event.

I agree regarding the defenders, but that information predominantly is coming simply in the form of where balls are landing in the zones. Right now, they're not really parsing whether or not defenders should get to a ball based on trajectory and velocity (though unquestionably they should be), but simply taking the grid and splitting up the field into the zones that are playable and the ones that are out of the realm of expectation.

It's the hitters, at least on an individual basis, that are more affected by the classification of batted ball types. While BABIP is not an all-encompassing stat, xBABIP takes it one step further and gives a better snapshot of what a hitter's average should be based on historical data of line drives, fly balls, etc. You're right that sometimes even 10% can absolutely make a big difference in what we consider a good to great fielder or good to great hitter. But for now, I still think the common sense approach is at least good enough to give us a usable ballpark figure for what we want to accomplish.
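One simple way to build an expected BABIP from a batted-ball mix is a weighted average of per-type BABIP rates. This Python sketch is not the specific xBABIP formula referred to above; the per-type rates are hypothetical placeholders, not published league figures. It shows how a handful of borderline balls moving between "fly ball" and "line drive" moves the estimate.

```python
# HYPOTHETICAL league BABIP by batted-ball type; placeholders, not published figures.
BABIP_BY_TYPE = {"ground ball": 0.24, "line drive": 0.70, "fly ball": 0.14}


def expected_babip(counts: dict[str, int]) -> float:
    """Weight each batted-ball type's assumed BABIP by how often the hitter produced it."""
    total = sum(counts.values())
    return sum(BABIP_BY_TYPE[kind] * n for kind, n in counts.items()) / total


if __name__ == "__main__":
    # One hitter's 400 balls in play as labeled by two hypothetical sources that
    # disagree on 10 borderline balls (fly ball vs. line drive).
    source_a = {"ground ball": 200, "line drive": 100, "fly ball": 100}
    source_b = {"ground ball": 200, "line drive": 110, "fly ball": 90}
    print(round(expected_babip(source_a), 3))
    print(round(expected_babip(source_b), 3))
```

With these placeholder rates, relabeling 10 of 400 balls raises the expected BABIP by about .014, which is roughly the "one hit a week" scale of difference that matters when judging a hitter.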

I agree with the premise there's room for improvement. I don't think, though, that anyone refers to BABIP as gospel - at least any more than the other stats get so heavily clung to.

MississippiRed
04-14-2010, 02:58 PM
There are several comments/questions that come to my mind regarding this thread.

1. Social scientists have used training programs to establish inter-rater reliability among data collectors for years. If baseball is serious about these stats, they should do something similar. It really isn't that hard to find some good examples (not the obvious ones, but the ones people might initially disagree on) and reach group consensus based on expert analysis. Of course, getting experts to agree on the criteria may be difficult, but getting the raters to agree can be done fairly simply.

2. When you look at a pitcher's ground ball/fly ball percentages, where did line drives get categorized? Do they have to go to the outfield in the air? If line drives get categorized as ground balls, what about a line drive home run?

3. When we look at HR/FB ratios, does this include line drives that make it to the outfield but don't get caught? I could envision a system where any ball that gets to the outfield that isn't caught could be considered a line drive. (It was obviously hit hard enough to fall in safely, right?)

Inquiring minds want to know.
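The HR/FB question comes down to what goes in the denominator. The thread doesn't settle which convention published HR/FB numbers follow, so this Python sketch just makes the choice an explicit, hypothetical parameter. The counts are made up.

```python
def hr_per_fb(home_runs: int, fly_balls: int, line_drives: int = 0,
              include_line_drives: bool = False) -> float:
    """Home runs per fly ball; optionally count line drives in the denominator.

    `include_line_drives` is a hypothetical switch for comparing conventions,
    not a description of how any particular site computes HR/FB.
    """
    denominator = fly_balls + (line_drives if include_line_drives else 0)
    return home_runs / denominator


if __name__ == "__main__":
    # Hypothetical pitcher: 20 HR, 200 fly balls, 150 line drives.
    print(round(hr_per_fb(20, 200), 3))                  # 0.1
    print(round(hr_per_fb(20, 200, 150, True), 3))       # 0.057
```

The same pitcher reads as having a 10% or a 5.7% HR/FB depending on the convention, and even under a fixed convention, every borderline ball a scorer moves between "fly ball" and "line drive" shifts the denominator.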

pahster
04-14-2010, 02:59 PM
There are several comments/questions that come to my mind regarding this thread.

1. Social scientists have used training programs to establish inter-rater reliability among data collectors for years. If baseball is serious about these stats, they should do something similar. It really isn't that hard to find some good examples (not the obvious ones, but the ones people might initially disagree on) and reach group consensus based on expert analysis. Of course, getting experts to agree on the criteria may be difficult, but getting the raters to agree can be done fairly simply.


Yes. I do this all the time.
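Cohen's kappa is one standard statistic for the kind of inter-rater reliability described here (the choice of statistic is mine, not something named in the thread). It compares how often two raters agree with how often they would agree by chance given their own label frequencies. A minimal Python sketch; the labels are a toy example.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters on the same items, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if each rater labeled at random with their own label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy example: two scorers label the same four balls in play.
    scorer_1 = ["LD", "FB", "FB", "GB"]
    scorer_2 = ["FB", "FB", "FB", "GB"]
    print(round(cohens_kappa(scorer_1, scorer_2), 3))  # 0.556
```

The two scorers agree on 3 of 4 balls (75%), but kappa is only about 0.56 because much of that agreement would happen by chance. A training program would aim to push kappa up on exactly the borderline cases.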

M2
04-14-2010, 03:01 PM
Just correlate the categorization of balls in play made by the various sources. Is the correlation high? If so, no problem.

There are only two sources at the moment. If there were more I might agree with you that we could apply some wisdom of crowds. As it is, we're only getting enough information to note that there's significant variance. We might need three more sources to encompass the full scope of the variance.

Raisor
04-14-2010, 03:09 PM
I don't see there being much of a problem.

I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line. There are certainly going to be some in-between hits, but I disagree with the notion there's a major flaw in the labeling of these batted balls.


We're just over a week into the season, and in the NL alone, there have been 3158 balls in play. 10% of that is 316.

10% may not seem like "a lot", but we'll be talking about thousands of balls in play by the end of the season.
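The arithmetic here is simple, but a short Python sketch makes the scaling explicit. The 10% share and the 3158 count come from the post; the season projection assumes a constant weekly pace, and both `weeks_elapsed` and `season_weeks` are inputs you supply (the post only says "just over a week").

```python
def borderline_count(balls_in_play: int, borderline_share: float = 0.10) -> int:
    """Number of balls in play falling in the ambiguous share, rounded to a whole ball."""
    return round(balls_in_play * borderline_share)


def project_season_total(count_so_far: int, weeks_elapsed: float, season_weeks: float) -> int:
    """Straight-line projection; assumes balls in play keep accumulating at a constant weekly pace."""
    return round(count_so_far * season_weeks / weeks_elapsed)


if __name__ == "__main__":
    print(borderline_count(3158))  # 316, matching the NL figure above
```

Feeding a full-season projection into `borderline_count` shows why a fixed share of ambiguous balls turns into thousands over a season.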

Brutus
04-14-2010, 03:13 PM
We're just over a week into the season, and in the NL alone, there have been 3158 balls in play. 10% of that is 316.

10% may not seem like "a lot", but we'll be talking about thousands of balls in play by the end of the season.

But even if there's some disagreement, it doesn't mean a majority of people wouldn't agree with labeling a ball a line drive or fly ball. The point is, while some people may reasonably disagree, it doesn't mean the data is terribly skewed or that we can't get some reasonable conclusions about players because of it.

RedsManRick
04-14-2010, 03:40 PM
Just correlate the categorization of balls in play made by the various sources. Is the correlation high? If so, no problem.

The correlation is moderately high, as you would expect. But there are some systematic biases, particularly regarding the classification of line drives, and we have no way to determine which is more correct.

wolfboy
04-14-2010, 03:45 PM
This is an interesting response to the Wyers article. http://www.fangraphs.com/blogs/index.php/some-thoughts-on-batted-ball-data

pahster
04-14-2010, 03:45 PM
The correlation is moderately high, as you would expect. But there are some systematic biases, particularly regarding the classification of line drives, and we have no way to determine which is more correct.

Neither is more correct unless we know what the difference between a fly ball and a liner is. We don't know this because there's not really a true cutpoint between the two. That's why access to data on the trajectory and speed of batted balls is such a leap forward for analysts.

RedsManRick
04-14-2010, 04:13 PM
Neither is more correct unless we know what the difference between a fly ball and a liner is. We don't know this because there's not really a true cutpoint between the two. That's why access to data on the trajectory and speed of batted balls is such a leap forward for analysts.

Fair point; "correct" was not the right word to use. They are just different. The definitions at this point are subjective and should be treated as such; in particular, we should not commingle the data.

IslandRed
04-14-2010, 04:52 PM
The point is, while some people may reasonably disagree, it doesn't mean the data is terribly skewed or that we can't get some reasonable conclusions about players because of it.

Reasonably broad, you mean? Yes. Except that we like to be more specific about that when we argue about ballplayers, and in a sport where the proverbial one hit a week makes a huge difference in how we look at a hitter's total value, vagueness in the underlying data can be a problem. I was struck by the article's example of Wandy Rodriguez, and how the two sources of batted-ball data, fed into the same estimation formula that is supposed to give you his "true" worth independent of luck and defense, came out half a run different. I think we'd all agree that half a run per game in the context of a pitcher is not a small difference whatsoever.