Thread: From BP: How valid is that data?

  1. #1
    Posting in Dynarama M2's Avatar
    Join Date
    Sep 2000
    Location
    Boston
    Posts
    28,137

    From BP: How valid is that data?

    Colin Wyers takes some time to dig under the surface of ball-in-play data and finds that it's far from a clean set of numbers.

    Here's one of the key findings:

    Let's face it -- no matter how much we massage the data, there simply is not a way to objectively define the difference between a fly ball and a line drive. It is inherently a subjective and somewhat arbitrary distinction. ... a simple stopwatch could provide more accurate, quantifiable data than what we're getting right now.
    If you had a half dozen sources collecting BIP data, it looks like you could get a half dozen significantly different sets of findings based on that data. That's a fairly huge problem for the newer defensive metrics.
    Baseball isn't a magic trick ... it doesn't get spoiled if you figure out how it works. - gonelong

    I'm witchcrafting everybody.

  2. #2
    Vampire Weekend @Bernie's camisadelgolf's Avatar
    Join Date
    Dec 2004
    Location
    Cincinnati, OH
    Posts
    11,431

    Re: From BP: How valid is that data?

    In my opinion, there are two major things that define the difference between a fly ball and a pop-up: velocity and angle. With all the advances in technology, I don't think it'd be too difficult to come up with a universally accepted definition of the two types of contact.
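    A minimal sketch of what a velocity-and-angle definition could look like. Every cutoff below is a hypothetical number picked for illustration, not an accepted standard, and the names (`BattedBall`, `classify`) are made up for this example.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only to illustrate the idea.
POP_UP_MIN_ANGLE_DEG = 50.0       # anything this steep is a pop-up
SOFT_POP_UP_MIN_ANGLE_DEG = 35.0  # steep AND softly hit also counts as a pop-up
SOFT_CONTACT_MAX_MPH = 75.0
FLY_BALL_MIN_ANGLE_DEG = 25.0
LINE_DRIVE_MIN_ANGLE_DEG = 10.0


@dataclass(frozen=True)
class BattedBall:
    launch_angle_deg: float  # vertical angle off the bat
    exit_speed_mph: float    # speed off the bat


def classify(ball: BattedBall) -> str:
    """Label a batted ball using only its angle and speed off the bat."""
    angle, speed = ball.launch_angle_deg, ball.exit_speed_mph
    if angle >= POP_UP_MIN_ANGLE_DEG:
        return "pop-up"
    if angle >= SOFT_POP_UP_MIN_ANGLE_DEG and speed < SOFT_CONTACT_MAX_MPH:
        return "pop-up"
    if angle >= FLY_BALL_MIN_ANGLE_DEG:
        return "fly ball"
    if angle >= LINE_DRIVE_MIN_ANGLE_DEG:
        return "line drive"
    return "ground ball"


if __name__ == "__main__":
    print(classify(BattedBall(launch_angle_deg=30.0, exit_speed_mph=95.0)))
```

    The hard part is not the code but agreeing on the cutoffs; two scorers with different thresholds would still label the same ball differently.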

  3. #3
    Et tu, Brutus? Brutus's Avatar
    Join Date
    Jul 2006
    Location
    Atlanta, Ga.
    Posts
    10,477

    Re: From BP: How valid is that data?

    I don't see there being much of a problem.

    I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line. There are certainly going to be some in-between hits, but I disagree with the notion there's a major flaw in the labeling of these batted balls.

    And as someone said, whatever issues we do have with the perhaps 10% of balls that are in-between, f/x data for batted balls is on the brink of being used on a full scale, so we can easily come up with standards to fill in that piece of the puzzle.
    "No matter how good you are, you're going to lose one-third of your games. No matter how bad you are you're going to win one-third of your games. It's the other third that makes the difference." ~Tommy Lasorda

  4. #4
    Member Cedric's Avatar
    Join Date
    Aug 2002
    Location
    Monroe
    Posts
    6,371

    Re: From BP: How valid is that data?

    BABIP is the most overused stat in baseball, IMO. Lately people have been using faulty data or blatantly wrong data to make baseball a black or white sport. It's a grey area sport and some things can't be deduced with one tidy number.
    This is the time. The real Reds organization is back.

  5. #5
    Posting in Dynarama M2's Avatar
    Join Date
    Sep 2000
    Location
    Boston
    Posts
    28,137

    Re: From BP: How valid is that data?

    Quote Originally Posted by Cedric View Post
    BABIP is the most overused stat in baseball, IMO. Lately people have been using faulty data or blatantly wrong data to make baseball a black or white sport. It's a grey area sport and some things can't be deduced with one tidy number.
    I agree in general, though I'll note the one thing BABIP has going for it is it doesn't delve into the qualitative, which is where the garbage is piling up. BABIP is just a big, dumb number. It's got some value for pitchers in that they tend to stick to a general norm. It doesn't seem to have much use for hitters since some are routinely pathetic and others are routinely awesome when it comes to BABIP (and the information tends to tell us what we already knew from watching them).

    Probably its best use is its flipside at the team level - DER. That will give you a good snapshot of how a total team defense has played.
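    For reference, a rough sketch of the two numbers using the commonly cited formulas. Other versions exist (some DER variants handle errors differently), so treat this as one approximation; the function and parameter names are my own.

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    balls_in_play = ab - so - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


def der(h: int, roe: int, hr: int, pa: int, bb: int, so: int, hbp: int) -> float:
    """Defensive efficiency ratio, one commonly cited approximation:
    1 - (H + ROE - HR) / (PA - BB - SO - HBP - HR).
    """
    balls_in_play = pa - bb - so - hbp - hr
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return 1.0 - (h + roe - hr) / balls_in_play


if __name__ == "__main__":
    # Made-up stat lines, only to show the call shape.
    print(round(babip(h=150, hr=20, ab=550, so=100, sf=5), 3))
    print(round(der(h=1400, roe=80, hr=160, pa=6200, bb=500, so=1100, hbp=50), 3))
```

    Neither formula needs anyone to judge what kind of ball was hit, which is the "big, dumb number" appeal.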

    Quote Originally Posted by RedsManRick
    The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation.
    I was under the impression that it could be at least a decade before we have that. Maybe some teams will have that data sooner, but it's not terribly close to being at the fingertips of fans like you and me.

    I'll also note that it's likely to take a long time to digest that information once we start getting it. People have barely begun to unspool pitch f/x data. They're going to try to take a pile of different data points on a given ball in play and synthesize it into a single number (feeding right into the problem Cedric underlined). Chances are the proper interpretation of such varied data will extend far beyond a single number. So we have that to look forward to even after we figure out how to drink from a firehose.

    And even then some bystander is going to walk by and note that we're still not getting direct metrics on the thing we're supposed to be assessing - the fielder.

    Quote Originally Posted by Brutus the Pimp
    I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line.
    The problem is it's a relatively small percentage of balls in play that separate the good fielders from the bad fielders. Most defenders get to most of the balls in play, and most of those are the balls everybody would agree upon.

    That leaves us with the difficult-to-classify balls forming an inordinately large percentage of the plays we're using to stratify fielders with supposedly objective numbers.

    On top of that, you might be surprised how little people would agree on how to classify a ball in play. The entire field of perceptual psychology revolves around how not everything is as obvious as a given individual may think it is. The human brain is as likely to be reading the fielder's reaction to the ball in play as it is the ball itself.

    And the human brain might very well be right to be doing that. Of course, that doesn't make for a clean objective interpretation of the event.

  6. #6
    Et tu, Brutus? Brutus's Avatar
    Join Date
    Jul 2006
    Location
    Atlanta, Ga.
    Posts
    10,477

    Re: From BP: How valid is that data?

    Quote Originally Posted by M2 View Post
    The problem is it's a relatively small percentage of balls in play that separate the good fielders from the bad fielders. Most defenders get to most of the balls in play, and most of those are the balls everybody would agree upon.

    That leaves us with the difficult-to-classify balls forming an inordinately large percentage of the plays we're using to stratify fielders with supposedly objective numbers.

    On top of that, you might be surprised how little people would agree on how to classify a ball in play. The entire field of perceptual psychology revolves around how not everything is as obvious as a given individual may think it is. The human brain is as likely to be reading the fielder's reaction to the ball in play as it is the ball itself.

    And the human brain might very well be right to be doing that. Of course, that doesn't make for a clean objective interpretation of the event.
    I agree regarding the defenders, but that information predominately is coming simply in the form of where balls are landing in the zones. Right now, they're not really parsing whether or not defenders should get to a ball based on trajectory and velocity (though unquestionably they should be), but simply taking the grid and splitting up the field into the zones that are playable and the ones that are out of the realm of expectation.

    It's the hitters, at least on an individual basis, that are more affected by the classification of batted ball types. While BABIP is not an all-encompassing stat, xBABIP takes it one step further and gives a better snapshot of what a hitter's average should be based on historical data of line drives, fly balls, etc. You're right that sometimes even 10% can absolutely make a big difference in what we consider a good to great fielder or good to great hitter. But for now, I still think the common sense approach is at least good enough to give us a usable ballpark figure for what we want to accomplish.

    I agree with the premise there's room for improvement. I don't think, though, that anyone refers to BABIP as gospel - at least any more than the other stats get so heavily clung to.

  7. #7
    RaisorZone Raisor's Avatar
    Join Date
    Jun 2001
    Location
    Charlotte, Nc
    Posts
    15,088

    Re: From BP: How valid is that data?

    Quote Originally Posted by Brutus the Pimp View Post
    I don't see there being much of a problem.

    I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line. There are certainly going to be some in-between hits, but I disagree with the notion there's a major flaw in the labeling of these batted balls.
    We're just over a week into the season, and in the NL alone, there have been 3158 balls in play. 10% of that is 316.

    10% may not seem like "a lot", but we'll be talking thousands of balls in play by the end of the season.
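    The arithmetic, as a quick sketch. The 3158 figure and the 10% share come from this thread; the 8 days elapsed and the 182-day season are my own round-number assumptions, and the projection assumes balls in play keep coming at the same daily pace.

```python
NL_BALLS_IN_PLAY_SO_FAR = 3158  # figure quoted in this post
DISPUTED_SHARE = 0.10           # the "10% in-between" share under discussion

# Assumptions for the projection only (not from the thread):
DAYS_ELAPSED = 8                # "just over a week"
SEASON_LENGTH_DAYS = 182        # roughly 26 weeks


def disputed_balls(balls_in_play: float, share: float = DISPUTED_SHARE) -> float:
    """Number of balls in play falling into the hard-to-classify share."""
    return balls_in_play * share


def project_full_season(count_so_far: float, days_elapsed: int, season_days: int) -> float:
    """Scale a partial-season count linearly to a full season."""
    return count_so_far * season_days / days_elapsed


if __name__ == "__main__":
    print(round(disputed_balls(NL_BALLS_IN_PLAY_SO_FAR)))  # 316
    season_total = project_full_season(
        NL_BALLS_IN_PLAY_SO_FAR, DAYS_ELAPSED, SEASON_LENGTH_DAYS
    )
    print(round(disputed_balls(season_total)))
```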
    "But I do know Joey's sister indirectly (or foster sister) and I have heard stories of Joey being into shopping, designer wear, fancy coffees, and pedicures."

  8. #8
    Et tu, Brutus? Brutus's Avatar
    Join Date
    Jul 2006
    Location
    Atlanta, Ga.
    Posts
    10,477

    Re: From BP: How valid is that data?

    Quote Originally Posted by Raisor View Post
    We're just over a week into the season, and in the NL alone, there have been 3158 balls in play. 10% of that is 316.

    10% may not seem like "a lot", but we'll be talking thousands of balls in play by the end of the season.
    But even if there's some disagreement, it doesn't mean a majority of people wouldn't agree with labeling a ball a line drive or fly ball. The point is, while some people may reasonably disagree, it doesn't mean the data is terribly skewed or that we can't get some reasonable conclusions about players because of it.

  9. #9
    Charlie Brown All-Star IslandRed's Avatar
    Join Date
    May 2001
    Location
    Melbourne, FL
    Posts
    4,835

    Re: From BP: How valid is that data?

    Quote Originally Posted by Brutus the Pimp View Post
    The point is, while some people may reasonably disagree, it doesn't mean the data is terribly skewed or that we can't get some reasonable conclusions about players because of it.
    Reasonably broad, you mean? Yes. Except that we like to be more specific about that when we argue about ballplayers, and in a sport where the proverbial one hit a week makes a huge difference in how we look at a hitter's total value, vagueness in the underlying data can be a problem. I was struck by the article's example of Wandy Rodriguez, and how the two sources of batted-ball data, fed into the same estimation formula that is supposed to give you his "true" worth independent of luck and defense, came out half a run different. I think we'd all agree that half a run per game in the context of a pitcher is not a small difference whatsoever.
    Not all who wander are lost

  10. #10
    Stat Wanker Hodiernus RedsManRick's Avatar
    Join Date
    Dec 2004
    Location
    Chicago, IL
    Posts
    15,898

    Re: From BP: How valid is that data?

    I think Colin takes it a bit too far in terms of throwing his hands up at the end of the article, though he definitely has a valid point. Given the choice, I would probably defer to the people who are using video rather than watching live from a perspective above the play, particularly in light of his research, which shows the bias introduced by height.

    All that said, I still think we're fairly far along the accuracy curve. The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation. At that point we can very easily categorize hit types. The last frontier would then be the interaction between hit types and fielder positioning.

    The critique is definitely appropriate, but I guess I'm more optimistic.
    Games are won on run differential -- scoring more than your opponent. Runs are runs, scored or prevented they all count the same. Worry about scoring more and allowing fewer, not which positions contribute to which side of the equation or how "consistent" you are at your current level of performance.

  11. #11
    Member
    Join Date
    Apr 2003
    Location
    Shelburne Falls, MA
    Posts
    10,064

    Re: From BP: How valid is that data?

    All that said, I still think we're fairly far along the accuracy curve. The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation. At that point we can very easily categorize hit types. The last frontier would then be the interaction between hit types and fielder positioning.
    Of course. It will revolutionize defensive metrics, which include pitching. We're still medieval until that happens, in my opinion.
    "Baseball is a very, very complex business. It's more of a people business than most businesses." - Bob Castellini

  12. #12
    The Boss dougdirt's Avatar
    Join Date
    Jan 2006
    Posts
    34,877

    Re: From BP: How valid is that data?

    Quote Originally Posted by lollipopcurve View Post
    Of course. It will revolutionize defensive metrics, which include pitching. We're still medieval until that happens, in my opinion.
    It's actually happening now. The system is in place. We just don't have the data publicly available yet.

  13. #13
    Something clever pahster's Avatar
    Join Date
    Feb 2005
    Location
    Columbia, MO
    Posts
    1,907

    Re: From BP: How valid is that data?

    Just correlate the categorization of balls in play made by the various sources. Is the correlation high? If so, no problem.
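    Since the labels are categories rather than numbers, one way to sketch that check is Cohen's kappa, which measures agreement between two labelers beyond what chance would produce. This is my choice of statistic, not something from the article, and the sample labels are invented.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two sources labeling the same plays."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of plays where both sources gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each source's own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    source_a = ["LD", "LD", "FB", "FB"]  # invented labels
    source_b = ["LD", "FB", "FB", "FB"]
    print(cohens_kappa(source_a, source_b))  # 0.5
```

    A kappa near 1 would mean the sources mostly agree; a high raw agreement rate alone can hide the fact that most plays are easy calls.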

  14. #14
    Posting in Dynarama M2's Avatar
    Join Date
    Sep 2000
    Location
    Boston
    Posts
    28,137

    Re: From BP: How valid is that data?

    Quote Originally Posted by pahster View Post
    Just correlate the categorization of balls in play made by the various sources. Is the correlation high? If so, no problem.
    There are only two sources at the moment. If there were more I might agree with you that we could apply some wisdom of crowds. As it is, we're only getting enough information to note that there's significant variance. We might need three more sources to encompass the full scope of the variance.

  15. #15
    Stat Wanker Hodiernus RedsManRick's Avatar
    Join Date
    Dec 2004
    Location
    Chicago, IL
    Posts
    15,898

    Re: From BP: How valid is that data?

    Quote Originally Posted by pahster View Post
    Just correlate the categorization of balls in play made by the various sources. Is the correlation high? If so, no problem.
    The correlation is moderately high, as you would expect. But there are some systematic biases, particularly regarding the classification of line drives, and we have no way to determine which is more correct.
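    To show what a systematic bias looks like next to a decent correlation, here is a small sketch that compares how often each source hands out each label. The helper names and the sample labels are invented for illustration; no real source data is used.

```python
from collections import Counter
from typing import Dict, Sequence


def label_rates(labels: Sequence[str]) -> Dict[str, float]:
    """Share of plays given each label by one source."""
    if not labels:
        raise ValueError("need at least one label")
    n = len(labels)
    return {label: count / n for label, count in Counter(labels).items()}


def rate_gaps(labels_a: Sequence[str], labels_b: Sequence[str]) -> Dict[str, float]:
    """Per-label difference in rates (source A minus source B).

    A gap that stays well away from zero for one label suggests one
    source is systematically more generous with that label.
    """
    rates_a, rates_b = label_rates(labels_a), label_rates(labels_b)
    all_labels = sorted(set(rates_a) | set(rates_b))
    return {label: rates_a.get(label, 0.0) - rates_b.get(label, 0.0) for label in all_labels}


if __name__ == "__main__":
    source_a = ["LD", "LD", "FB", "GB"]  # invented labels
    source_b = ["LD", "FB", "FB", "GB"]
    print(rate_gaps(source_a, source_b))
```

    A gap like this tells you the sources differ, but not which one is closer to the truth.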


RedsZone.com is a privately owned website and is not affiliated with the Cincinnati Reds or Major League Baseball