Thread: From BP: How valid is that data?

  1. #1
    Posting in Dynarama M2's Avatar
    Join Date
    Sep 2000
    Location
    Boston
    Posts
    28,159

    From BP: How valid is that data?

    Colin Wyers takes some time to dig under the surface of ball-in-play data and finds that it's far from a clean set of numbers.

    Here's one of the key findings:

    Let's face it -- no matter how much we massage the data, there simply is not a way to objectively define the difference between a fly ball and a line drive. It is inherently a subjective and somewhat arbitrary distinction. ... a simple stopwatch could provide more accurate, quantifiable data than what we're getting right now.
    If you had a half dozen sources collecting BIP data, it looks like you could get a half dozen significantly different sets of findings based on that data. That's a fairly huge problem for the newer defensive metrics.
    Baseball isn't a magic trick ... it doesn't get spoiled if you figure out how it works. - gonelong

    I'm witchcrafting everybody.

  3. #2
    Vampire Weekend @Bernie's camisadelgolf's Avatar
    Join Date
    Dec 2004
    Location
    Cincinnati, OH
    Posts
    11,431

    Re: From BP: How valid is that data?

    In my opinion, there are two major things that define the difference between a fly ball and a pop-up: velocity and angle. With all the advances in technology, I don't think it'd be too difficult to come up with a universally-accepted definition of the two types of contact.

  4. #3
    Et tu, Brutus? Brutus's Avatar
    Join Date
    Jul 2006
    Location
    Atlanta, Ga.
    Posts
    10,485

    Re: From BP: How valid is that data?

    I don't see there being much of a problem.

    I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line. There are certainly going to be some in-between hits, but I disagree with the notion there's a major flaw in the labeling of these batted balls.

    And as someone said, whatever issues we do have with the roughly 10% of balls that are in-between, FX data on batted balls is on the brink of being used at full scale, and with it we can easily come up with standards to fill in that piece of the puzzle.
    "No matter how good you are, you're going to lose one-third of your games. No matter how bad you are you're going to win one-third of your games. It's the other third that makes the difference." ~Tommy Lasorda

  5. #4
    Member Cedric's Avatar
    Join Date
    Aug 2002
    Location
    Monroe
    Posts
    6,383

    Re: From BP: How valid is that data?

    BABIP is the most overused stat in baseball, IMO. Lately people have been using faulty data or blatantly wrong data to make baseball a black or white sport. It's a grey area sport and some things can't be deduced with one tidy number.
    This is the time. The real Reds organization is back.

  6. #5
    Stat Wanker Hodiernus RedsManRick's Avatar
    Join Date
    Dec 2004
    Location
    Chicago, IL
    Posts
    15,910

    Re: From BP: How valid is that data?

    I think Colin takes it a bit too far in terms of throwing his hands up at the end of the article, though he definitely has a valid point. Given the choice, I would probably defer to the people who are using video rather than those watching live from a perspective above the play, particularly in light of his research, which shows the bias introduced by height.

    All that said, I still think we're fairly far along the accuracy curve. The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation. At that point we can very easily categorize hit types. The last frontier would then be the interaction between hit types and fielder positioning.

    The critique is definitely appropriate, but I guess I'm more optimistic.
    Games are won on run differential -- scoring more than your opponent. Runs are runs, scored or prevented they all count the same. Worry about scoring more and allowing fewer, not which positions contribute to which side of the equation or how "consistent" you are at your current level of performance.

  7. #6
    Member
    Join Date
    Apr 2003
    Location
    Shelburne Falls, MA
    Posts
    10,070

    Re: From BP: How valid is that data?

    All that said, I still think we're fairly far along the accuracy curve. The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation. At that point we can very easily categorize hit types. The last frontier would then be the interaction between hit types and fielder positioning.
    Of course. It will revolutionize defensive metrics, which include pitching. We're still medieval until that happens, in my opinion.
    "Baseball is a very, very complex business. It's more of a people business than most businesses." - Bob Castellini

  8. #7
    Posting in Dynarama M2's Avatar
    Join Date
    Sep 2000
    Location
    Boston
    Posts
    28,159

    Re: From BP: How valid is that data?

    Quote Originally Posted by Cedric View Post
    BABIP is the most overused stat in baseball, IMO. Lately people have been using faulty data or blatantly wrong data to make baseball a black or white sport. It's a grey area sport and some things can't be deduced with one tidy number.
    I agree in general, though I'll note the one thing BABIP has going for it is it doesn't delve into the qualitative, which is where the garbage is piling up. BABIP is just a big, dumb number. It's got some value for pitchers in that they tend to stick to a general norm. It doesn't seem to have much use for hitters since some are routinely pathetic and others are routinely awesome when it comes to BABIP (and the information tends to tell us what we already knew from watching them).

    Probably its best use is its flipside at the team level - DER. That will give you a good snapshot of how a total team defense has played.
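For reference, here is a minimal sketch of how that number is computed, assuming the commonly cited formula BABIP = (H - HR) / (AB - K - HR + SF). The function names and the sample team totals are hypothetical. Treating DER as one minus BABIP allowed is only a rough approximation: published DER formulas also account for things like batters reaching on errors.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def der_approx(hits, home_runs, at_bats, strikeouts, sac_flies):
    """ROUGH approximation of team DER as 1 - BABIP allowed.

    ASSUMPTION: ignores reached-on-error and other adjustments
    that fuller DER formulas include.
    """
    return 1.0 - babip(hits, home_runs, at_bats, strikeouts, sac_flies)


# Hypothetical season totals allowed by a team's pitchers (not real data).
allowed = dict(hits=1400, home_runs=160, at_bats=5500, strikeouts=1200, sac_flies=40)
print(round(babip(**allowed), 3))       # 0.297
print(round(der_approx(**allowed), 3))  # 0.703
```

Nothing here depends on anyone's judgment about batted-ball type, which is the point being made about BABIP and DER.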

    Quote Originally Posted by RedsManRick
    The next big improvement, which we should have very soon, is accurate hit data which includes not just vector and distance but height, ball speed and ball rotation.
    I was under the impression that it could be at least a decade before we have that. Maybe some teams will have that data sooner, but it's not terribly close to being at the fingertips of fans like you and me.

    I'll also note that it's likely to take a long time to digest that information once we start getting it. People have barely begun to unspool pitch f/x data. They're going to try to take a pile of different data points on a given ball in play and synthesize it into a single number (feeding right into the problem Cedric underlined). Chances are the proper interpretation of such varied data will extend far beyond a single number. So we have that to look forward to even after we figure out how to drink from a firehose.

    And even then some bystander is going to walk by and note that we're still not getting direct metrics on the thing we're supposed to be assessing - the fielder.

    Quote Originally Posted by Brutus the Pimp
    I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line.
    The problem is it's a relatively small percentage of balls in play that separate the good fielders from the bad fielders. Most defenders get to most of the balls in play, and most of those are the balls everybody would agree upon.

    That leaves us with the difficult-to-classify balls forming an inordinately large percentage of the plays we're using to stratify fielders with supposedly objective numbers.
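A worked illustration of that argument, with made-up numbers. Every figure below is hypothetical, and the sketch assumes the worst case in which all the hard-to-classify balls fall among the non-routine plays.

```python
def ambiguous_shares(total_bip, ambiguous, routine):
    """Return (share of all BIP that is ambiguous,
               share of non-routine BIP that is ambiguous).

    ASSUMPTION: routine plays are never ambiguous, so every ambiguous
    ball lands among the non-routine plays (an upper bound).
    """
    non_routine = total_bip - routine
    if non_routine <= 0:
        raise ValueError("need at least one non-routine ball in play")
    ambiguous_non_routine = min(ambiguous, non_routine)
    return ambiguous / total_bip, ambiguous_non_routine / non_routine


# Hypothetical: 1000 balls in play, 100 hard to classify,
# 850 routine plays that nearly every fielder makes.
overall, among_non_routine = ambiguous_shares(1000, 100, 850)
print(overall)                      # 0.1
print(round(among_non_routine, 2))  # 0.67
```

Under those assumptions, a 10% disagreement rate overall becomes roughly two-thirds of the plays that actually distinguish one fielder from another.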

    On top of that, you might be surprised how little people would agree on how to classify a ball in play. The entire field of perceptual psychology revolves around how not everything is as obvious as a given individual may think it is. The human brain is as likely to be reading the fielder's reaction to the ball in play as it is the ball itself.

    And the human brain might very well be right to be doing that. Of course, that doesn't make for a clean objective interpretation of the event.
    Baseball isn't a magic trick ... it doesn't get spoiled if you figure out how it works. - gonelong

    I'm witchcrafting everybody.

  9. #8
    The Boss dougdirt's Avatar
    Join Date
    Jan 2006
    Posts
    35,040

    Re: From BP: How valid is that data?

    Quote Originally Posted by lollipopcurve View Post
    Of course. It will revolutionize defensive metrics, which include pitching. We're still medieval until that happens, in my opinion.
    It's actually happening now. The system is in place. We just don't have the data publicly available yet.

  10. #9
    Something clever pahster's Avatar
    Join Date
    Feb 2005
    Location
    Columbia, MO
    Posts
    1,907

    Re: From BP: How valid is that data?

    Just correlate the categorization of balls in play made by the various sources. Is the correlation high? If so, no problem.
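A minimal sketch of what that check could look like. Because the labels are categories rather than numbers, this uses raw agreement and Cohen's kappa (agreement corrected for chance) in place of an ordinary correlation coefficient. The function name and the sample labels are hypothetical, and the sketch assumes two sources have labeled the same balls in play in the same order.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of balls in play on which both sources chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each source's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # both sources used one identical label throughout
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical labels for the same eight balls in play (not real data).
source_a = ["FB", "LD", "GB", "FB", "LD", "GB", "FB", "LD"]
source_b = ["FB", "FB", "GB", "FB", "LD", "GB", "LD", "LD"]
observed, kappa = agreement_and_kappa(source_a, source_b)
print(observed)         # 0.75
print(round(kappa, 3))  # 0.619
```

A kappa near 1 would support the "no problem" reading. A value well below the raw agreement rate suggests that much of the apparent agreement is what chance alone would produce.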

  11. #10
    Et tu, Brutus? Brutus's Avatar
    Join Date
    Jul 2006
    Location
    Atlanta, Ga.
    Posts
    10,485

    Re: From BP: How valid is that data?

    Quote Originally Posted by M2 View Post
    The problem is it's a relatively small percentage of balls in play that separate the good fielders from the bad fielders. Most defenders get to most of the balls in play, and most of those are the balls everybody would agree upon.

    That leaves us with the difficult-to-classify balls forming an inordinately large percentage of the plays we're using to stratify fielders with supposedly objective numbers.

    On top of that, you might be surprised how little people would agree on how to classify a ball in play. The entire field of perceptual psychology revolves around how not everything is as obvious as a given individual may think it is. The human brain is as likely to be reading the fielder's reaction to the ball in play as it is the ball itself.

    And the human brain might very well be right to be doing that. Of course, that doesn't make for a clean objective interpretation of the event.
    I agree regarding the defenders, but that information predominately is coming simply in the form of where balls are landing in the zones. Right now, they're not really parsing whether or not defenders should get to a ball based on trajectory and velocity (though unquestionably they should be), but simply taking the grid and splitting up the field into the zones that are playable and the ones that are out of the realm of expectation.

    It's the hitters, at least on an individual basis, that are more affected by the classification of batted ball types. While BABIP is not an all-encompassing stat, xBABIP takes it one step further and gives a better snapshot of what a hitter's average should be based on historical data of line drives, fly balls, etc. You're right that sometimes even 10% can absolutely make a big difference in what we consider a good to great fielder or good to great hitter. But for now, I still think the common sense approach is at least good enough to give us a usable ballpark figure for what we want to accomplish.

    I agree with the premise there's room for improvement. I don't think, though, that anyone refers to BABIP as gospel - at least any more than the other stats get so heavily clung to.
    "No matter how good you are, you're going to lose one-third of your games. No matter how bad you are you're going to win one-third of your games. It's the other third that makes the difference." ~Tommy Lasorda

  12. #11
    Member MississippiRed's Avatar
    Join Date
    Apr 2006
    Location
    Starkville, Mississippi
    Posts
    328

    Re: From BP: How valid is that data?

    There are several comments/questions that come to my mind regarding this thread.

    1. Social scientists have used training programs to establish inter-rater reliability among data collectors for years. If baseball is serious about these stats, they should do something similar. It really isn't that hard to find some good examples (not the obvious ones, but the ones people might initially disagree on) and reach group consensus based on expert analysis. Of course, getting experts to agree on the criteria may be difficult, but getting the raters to agree can be done fairly simply.

    2. When you look at a pitcher's ground ball/fly ball percentages, where did line drives get categorized? Do they have to go to the outfield in the air? If line drives get categorized as ground balls, what about a line drive home run?

    3. When we look at HR/FB ratios, does this include line drives that make it to the outfield but don't get caught? I could envision a system where any ball that gets to the outfield that isn't caught could be considered a line drive. (It was obviously hit hard enough to fall in safely, right?)

    Inquiring minds want to know.
    Win some, lose some, some get rained out.

  13. #12
    Something clever pahster's Avatar
    Join Date
    Feb 2005
    Location
    Columbia, MO
    Posts
    1,907

    Re: From BP: How valid is that data?

    Quote Originally Posted by MississippiRed View Post
    There are several comments/questions that come to my mind regarding this thread.

    1. Social scientists have used training programs to establish inter-rater reliability among data collectors for years. If baseball is serious about these stats, they should do something similar. It really isn't that hard to find some good examples (not the obvious ones, but the ones people might initially disagree on) and reach group consensus based on expert analysis. Of course, getting experts to agree on the criteria may be difficult, but getting the raters to agree can be done fairly simply.
    Yes. I do this all the time.

  14. #13
    Posting in Dynarama M2's Avatar
    Join Date
    Sep 2000
    Location
    Boston
    Posts
    28,159

    Re: From BP: How valid is that data?

    Quote Originally Posted by pahster View Post
    Just correlate the categorization of balls in play made by the various sources. Is the correlation high? If so, no problem.
    There are only two sources at the moment. If there were more I might agree with you that we could apply some wisdom of crowds. As it is, we're only getting enough information to note that there's significant variance. We might need three more sources to encompass the full scope of the variance.
    Baseball isn't a magic trick ... it doesn't get spoiled if you figure out how it works. - gonelong

    I'm witchcrafting everybody.

  15. #14
    RaisorZone Raisor's Avatar
    Join Date
    Jun 2001
    Location
    Charlotte, Nc
    Posts
    15,122

    Re: From BP: How valid is that data?

    Quote Originally Posted by Brutus the Pimp View Post
    I don't see there being much of a problem.

    I would argue that about 90% of the balls hit would be agreed upon by 90% of people watching as either a fly or a line. There are certainly going to be some in-between hits, but I disagree with the notion there's a major flaw in the labeling of these batted balls.

    We're just over a week into the season, and in the NL alone, there have been 3158 balls in play. 10% of that is 316.

    10% may not seem like "a lot", but we'll be talking thousands of balls in play by the end of the season.
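A quick sketch of that arithmetic. The 3158 figure and the 10% rate come from the posts above. The day counts used for the projection are assumptions ("just over a week" taken as 8 days, and a season of roughly 180 days), and the function name is hypothetical.

```python
bip_so_far = 3158      # NL balls in play, as quoted in the post
disputed_rate = 0.10   # the "in-between" share from the 90% estimate

print(round(bip_so_far * disputed_rate))  # 316


def project_disputed(bip, days_elapsed, season_days, rate):
    """Scale the current pace to a full season and apply the disputed rate."""
    return bip / days_elapsed * season_days * rate


# ASSUMPTION: 8 days played so far, about 180 days in the season.
print(project_disputed(bip_so_far, 8, 180, disputed_rate))
```

With those assumed day counts the projection comes out a little over 7,000 disputed balls in play for the NL alone, which fits "thousands".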
    "But I do know Joey's sister indirectly (or foster sister) and I have heard stories of Joey being into shopping, designer wear, fancy coffees, and pedicures."

  16. #15
    Et tu, Brutus? Brutus's Avatar
    Join Date
    Jul 2006
    Location
    Atlanta, Ga.
    Posts
    10,485

    Re: From BP: How valid is that data?

    Quote Originally Posted by Raisor View Post
    We're just over a week into the season, and in the NL alone, there have been 3158 balls in play. 10% of that is 316.

    10% may not seem like "a lot", but we'll be talking thousands of balls in play by the end of the season.
    But even if there's some disagreement, it doesn't mean a majority of people wouldn't agree with labeling a ball a line drive or fly ball. The point is, while some people may reasonably disagree, it doesn't mean the data is terribly skewed or that we can't get some reasonable conclusions about players because of it.
    "No matter how good you are, you're going to lose one-third of your games. No matter how bad you are you're going to win one-third of your games. It's the other third that makes the difference." ~Tommy Lasorda

