Results 1 to 15 of 21

Thread: Interesting new look at BABIP and 'luck'

  1. #1
    The Boss dougdirt
    Join Date
    Jan 2006
    Posts
    35,045

    Interesting new look at BABIP and 'luck'

    I thought this would be something of interest for most of the people on here. It's an article written by Chris Dutton and Peter Bendix and is currently on TheHardballTimes.com. It's a longish read, but some fairly interesting stuff.
    http://www.hardballtimes.com/main/ar...ers-and-babip/
    What we did

    Batting average on balls in play—the rate at which batted balls other than home runs become hits—is commonly used as a measure of pitching performance. However, precious little work has been done to explore BABIP from the hitter’s perspective. While luck is bound to play a large role in determining whether a ball in play will become a hit or an out, there are certainly some quantifiable aspects of hitting ability that give a batter at least some control of the outcome.

    Some people like to add .120 to a batter's Line Drive Percentage to predict his BABIP (a guideline originally suggested by Dave Studeman). But one would expect that BABIP depends on more than just the ability to hit line drives. Speed, for instance, clearly seems to play a significant role. And what about the ability to control the strike zone, make consistent contact and hit the ball to all fields?

    For example, if Jacoby Ellsbury hits a ground ball in the hole between short and third, he has a higher chance of getting a hit than if Bengie Molina hits the exact same ball in the exact same place. Anecdotally, this is how Ichiro manages to get so many hits every year. And fans of the Red Sox, Yankees and Rays can tell you that David Ortiz, Jason Giambi and Carlos Pena have been robbed of many a base hit because of the extreme defensive shifts used against them, whereas Dustin Pedroia, Derek Jeter and BJ Upton have gotten more hits because of their batting eye and their ability to use the whole field. Surely, these factors contribute to whether or not a batted ball becomes a hit.

    We endeavored to take a more scientific look at batted ball data to develop a better method of finding a hitter’s expected BABIP. Using Baseball Prospectus data from 2002-2008, we calculated a range of variables that we considered to be the primary factors in determining BABIP:
    Code:
    Variable        Description
    BABIP           Batting average on balls in play, calculated as non-home-run hits divided by balls in play: (h-hr)/(pa-so-bb-hr).
    Hitter_Eye      A measure of plate discipline and knowledge of the strike zone, calculated as (BB rate/SO rate).
    Pitches_perEBH  Pitches per extra-base hit, a measure of how often a hitter makes solid contact: pitches/(doub+trip+hr).
    LD_per          Line drive percentage, as defined by MLB Advanced Media and provided by Baseball Prospectus.
    FB_GB_ratio     Fly ball/ground ball ratio, using percentages provided by Baseball Prospectus.
    Speed Score     A comprehensive measure of speed, developed by Bill James: the average of five individual formulas based on stolen base percentage, stolen base attempts, triples, runs per time on base and double plays.
    Contact_Rate    A measure of the ability to make contact and avoid striking out, calculated as (ab-so)/ab.
    Spray           A measure of how well a hitter distributes balls in play to the entire field, calculated as |1(LF%) + -1(RF%)|, i.e. |LF% - RF%| (lower values mean a more even left/right split).
    Pitches         A hitter’s average number of pitches per plate appearance, to account for patience and selectiveness at the plate.
    Park            A vector of binary stadium variables, to account for the influence of park effects on BABIP.
    Year            A vector of year variables from 2002 through 2007, to account for potential time effects.
    Lefty           A binary variable equal to “1” if the hitter is a lefty, “0” otherwise.
    Switch          A binary variable equal to “1” if the player is a switch hitter, “0” otherwise.
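The non-indicator rate variables in the table can be computed straight from a season's counting stats. Below is a minimal Python sketch. The `SeasonLine` container, its field names and the sample numbers are hypothetical. It assumes BB rate and SO rate share the same denominator (per PA), so Hitter_Eye reduces to BB/SO. Speed Score, Park, Year, Lefty and Switch are left out.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    """Hypothetical container for one hitter-season of counting stats."""
    h: int          # hits
    hr: int         # home runs
    pa: int         # plate appearances
    ab: int         # at-bats
    so: int         # strikeouts
    bb: int         # walks
    doubles: int
    triples: int
    pitches: int    # total pitches seen
    lf_pct: float   # share of batted balls hit to left field (0-1)
    rf_pct: float   # share of batted balls hit to right field (0-1)


def rate_variables(s: SeasonLine) -> dict:
    """Compute the table's non-indicator rate variables for one season (sketch)."""
    balls_in_play = s.pa - s.so - s.bb - s.hr
    return {
        "BABIP": (s.h - s.hr) / balls_in_play,
        # Assumes both rates are per PA, so this equals bb / so.
        "Hitter_Eye": (s.bb / s.pa) / (s.so / s.pa),
        "Pitches_perEBH": s.pitches / (s.doubles + s.triples + s.hr),
        "Contact_Rate": (s.ab - s.so) / s.ab,
        # |1(LF%) + -1(RF%)| is just the absolute left/right difference.
        "Spray": abs(s.lf_pct - s.rf_pct),
        "Pitches": s.pitches / s.pa,
    }


# Made-up season line, for illustration only.
line = SeasonLine(h=150, hr=20, pa=600, ab=530, so=100, bb=60,
                  doubles=30, triples=5, pitches=2400,
                  lf_pct=0.40, rf_pct=0.25)
stats = rate_variables(line)
print(round(stats["BABIP"], 3))   # (150 - 20) / 420
```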
    Using this dataset, we designed a regression model to determine the relationship between each factor and a hitter’s BABIP. Essentially, the model takes seven years’ worth of data and compresses it into a single formula that inputs the variables above and spits out a predicted BABIP. Using this, we can compare players’ actual and predicted BABIP to identify instances in which a player significantly outperformed or underperformed his expectations. Furthermore, we can use the model to strip luck from the equation and calculate a “luck-neutral” measure of BABIP.
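The article does not publish its coefficients, so what follows is only a generic sketch of the kind of model described. It uses ordinary least squares with an intercept and 0/1 indicator columns for categorical factors such as park and year. "Luck" is taken as actual minus predicted BABIP. Function names and the toy numbers are hypothetical, and NumPy's `np.linalg.lstsq` does the fitting.

```python
import numpy as np


def indicator_columns(labels):
    """0/1 columns for a categorical variable such as park or year.

    The alphabetically first level is dropped so the columns do not
    duplicate the intercept.
    """
    levels = sorted(set(labels))[1:]
    rows = [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]
    return np.array(rows, dtype=float).reshape(len(labels), len(levels))


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ X (X is 2-D)."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    """Predicted values (here, xBABIP) for a 2-D feature matrix X."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


# Toy data: made-up LD% and Hitter_Eye values plus park labels.
numeric = np.array([[0.18, 0.6], [0.22, 0.4], [0.20, 0.9],
                    [0.25, 0.5], [0.17, 0.3], [0.21, 0.7]])
parks = ["BOS", "CIN", "BOS", "NYY", "CIN", "NYY"]
X = np.column_stack([numeric, indicator_columns(parks)])
babip = np.array([0.290, 0.310, 0.305, 0.330, 0.270, 0.315])

coef = fit_ols(X, babip)
xbabip = predict(coef, X)
luck = babip - xbabip   # positive = outperformed the model ("lucky")
```

With an intercept in the model, the luck values sum to zero across the sample. "Lucky" and "unlucky" are therefore always relative to the fitted average.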

    Our regression model yields an R-squared value of .348, and all non-vector explanatory variables are significant at the 1 percent level. This suggests that the factors included are all highly significant, and jointly explain roughly 35 percent of the variance in a hitter’s BABIP. As an additional test of accuracy, we find a robust 59 percent correlation between actual and predicted BABIP for all players in our sample.

    Given the tremendous uncertainty regarding the outcome of balls in play, these results are extremely promising. By contrast, commonly used models based on line drive percentage alone explain only about 3 percent of the variance in BABIP when applied to the same dataset, and yield a mere 18 percent correlation between predicted and actual values.
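A quick arithmetic check connects the two pairs of accuracy figures above. Take an ordinary least-squares fit with an intercept and evaluate it on the same data it was fit to. In that case the correlation between actual and fitted values is exactly the square root of R². Assuming that is how the 59 and 18 percent correlations were computed (the article does not say), they restate the R² values rather than adding an independent test:

```latex
r_{y\hat{y}} = \sqrt{R^2}, \qquad
\sqrt{0.348} \approx 0.590, \qquad
\sqrt{0.03} \approx 0.173 \quad (\text{and } 0.18^2 \approx 0.032)
```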

    As mentioned above, all of our key independent variables are statistically significant at the 1 percent level. That is to say, if any one of these variables truly had no effect, estimates this large would turn up by chance less than 1 percent of the time. Our regression results show positive effects for hitter’s eye, line drive percentage, speed score and pitches per plate appearance, all of which conform to common sense. On the other hand, we find negative coefficients on pitches per extra-base hit, fly ball/ground ball ratio, spray and contact rate.

    One might expect a higher contact rate to lead to a higher BABIP, but the opposite actually seems to be the case. This is likely caused by the correlation between strikeouts and power, since players who swing hard tend to either miss entirely or crush the ball for hits. If this theory is reflected in our data, it makes sense that we would expect a player with a lower contact rate to generate a higher predicted BABIP. This is consistent with Studeman's follow-up work on BABIP.

    What does it mean?

    Okay, now you know what we did. Let’s discuss what it means.

    We’ve developed a new and better way of finding a batter’s expected BABIP. We will call our model’s predicted BABIP "xBABIP," in contrast to the old way of estimating expected BABIP, which was LD% + .120. We will refer to this old model of calculating expected BABIP as "old-xBABIP."

    The idea is to separate skill from variance. We’ve isolated a batter’s skill at getting hits on balls in play, so we can assume that most deviation in BABIP from our model’s predicted BABIP is due to random fluctuation and is therefore unlikely to be repeated.
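In code, the old rule of thumb and the resulting "luck" measure are one-liners. Here is a minimal sketch; the function names are hypothetical. xBABIP itself would come from the fitted regression, which is not published. The example uses Joey Gathright's 2008 figures reported later in the article (BABIP .278, xBABIP .156).

```python
def old_xbabip(ld_pct: float) -> float:
    """Old rule of thumb: line-drive rate (as a fraction) plus .120."""
    return ld_pct + 0.120


def luck(actual_babip: float, expected_babip: float) -> float:
    """Actual minus expected BABIP; positive means "lucky"."""
    return actual_babip - expected_babip


print(round(old_xbabip(0.20), 3))      # a 20% line-drive hitter
print(round(luck(0.278, 0.156), 3))    # Gathright, 2008
```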

    We can actually test this theory by looking to the past. Let’s examine the players whose actual BABIPs differed most in 2007 from their xBABIPs (the expected BABIP predicted by our model), and then look at what happened in 2008. Our hypothesis is that these players shouldn’t consistently under- or over-perform their xBABIP.

    Let’s start with players who were “unlucky” in 2007.
    Code:
      YEAR    NAME             BABIP xBABIP  Diff  YEAR  NAME            BABIP xBABIP  Diff
      2007  Ramon Vazquez      .258    .322 -.063  2008  Ramon Vazquez    .342   .322  .020
      2007  John Buck          .233    .283 -.049  2008  John Buck        .269   .282 -.013
      2007  Bobby Crosby       .253    .303 -.050  2008  Bobby Crosby     .275   .276 -.001
      2007  Julio Lugo         .258    .309 -.051  2008  Julio Lugo       .312   .284  .029
      2007  Ray Durham         .231    .276 -.044  2008  Ray Durham       .342   .313  .029
      2007  Lyle Overbay       .270    .321 -.051  2008  Lyle Overbay     .311   .310  .001
      2007  Rickie Weeks       .270    .321 -.050  2008  Rickie Weeks     .266   .294 -.028
      2007  Dioner Navarro     .243    .286 -.043  2008  Dioner Navarro   .313   .303  .010
      2007  Brad Wilkerson     .269    .316 -.046  2008  Brad Wilkerson   .267   .279 -.011
      2007  Jay Payton         .261    .299 -.038  2008  Jay Payton       .266   .304 -.038
      2007  Adam Lind          .265    .303 -.038  2008  Adam Lind        .313   .302  .011
      2007  Ian Kinsler        .267    .305 -.038  2008  Ian Kinsler      .325   .295  .030
      2007  Nick Punto         .251    .285 -.034  2008  Nick Punto       .331   .304  .027
      2007  Dan Uggla          .268    .304 -.036  2008  Dan Uggla        .313   .294  .018
    Wow, that’s pretty compelling evidence for the model. We didn’t cherry-pick these, either: these were the “unluckiest” hitters of 2007 who also had enough plate appearances to qualify for our model in 2008. Only Rickie Weeks and Jay Payton saw their actual BABIP remain well below their xBABIP in 2008; everyone else had a 2008 BABIP that was either very close to their xBABIP or above it. Had we seen these numbers after 2007, we might have been able to predict the rise of Vazquez, Navarro, Lind, Kinsler and Uggla, all of whom seemingly "came out of nowhere" in 2008.
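The claim can be re-checked against the table itself. This sketch copies the two Diff columns above and averages them. It also lists who stayed well below xBABIP in 2008. The -.020 cutoff for "well below" is an assumption for illustration, not the authors'.

```python
# (name, 2007 BABIP - xBABIP, 2008 BABIP - xBABIP), copied from the table above.
unlucky_2007 = [
    ("Ramon Vazquez", -0.063, 0.020), ("John Buck", -0.049, -0.013),
    ("Bobby Crosby", -0.050, -0.001), ("Julio Lugo", -0.051, 0.029),
    ("Ray Durham", -0.044, 0.029), ("Lyle Overbay", -0.051, 0.001),
    ("Rickie Weeks", -0.050, -0.028), ("Dioner Navarro", -0.043, 0.010),
    ("Brad Wilkerson", -0.046, -0.011), ("Jay Payton", -0.038, -0.038),
    ("Adam Lind", -0.038, 0.011), ("Ian Kinsler", -0.038, 0.030),
    ("Nick Punto", -0.034, 0.027), ("Dan Uggla", -0.036, 0.018),
]

WELL_BELOW = -0.020  # assumed cutoff for "remained well below xBABIP"

mean_2007 = sum(d07 for _, d07, _ in unlucky_2007) / len(unlucky_2007)
mean_2008 = sum(d08 for _, _, d08 in unlucky_2007) / len(unlucky_2007)
still_unlucky = [name for name, _, d08 in unlucky_2007 if d08 < WELL_BELOW]

print(f"mean Diff 2007: {mean_2007:+.3f}, 2008: {mean_2008:+.3f}")
print("still well below xBABIP in 2008:", still_unlucky)
```

On these rows the average gap moves from about -.045 in 2007 to about +.006 in 2008. Only Weeks and Payton fall under the assumed cutoff.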

    And what about hitters who were particularly lucky in 2007?
    Code:
      YEAR    NAME             BABIP  xBABIP Diff  YEAR   NAME           BABIP xBABIP  Diff
      2007  Matt Kemp          .411    .301  .110  2008  Matt Kemp        .359   .312  .047
      2007  Ichiro Suzuki      .384    .317  .067  2008  Ichiro Suzuki    .330   .307  .023
      2007  Willy Taveras      .355    .293  .062  2008  Willy Taveras    .282   .292 -.010
      2007  Magglio Ordonez    .379    .315  .064  2008  Magglio Ordonez  .331   .303  .028
      2007  Howie Kendrick     .374    .314  .060  2008  Howie Kendrick   .351   .316  .035
      2007  Jayson Werth       .380    .322  .058  2008  Jayson Werth     .319   .314  .005
      2007  Mark Reynolds      .368    .313  .055  2008  Mark Reynolds    .319   .304  .014
      2007  Edgar Renteria     .373    .319  .053  2008  Edgar Renteria   .289   .301 -.012
      2007  Mike Lowell        .335    .288  .047  2008  Mike Lowell      .278   .282 -.003
      2007  Ryan Braun         .353    .304  .050  2008  Ryan Braun       .301   .287  .014
      2007  David Ortiz        .352    .306  .046  2008  David Ortiz      .269   .302 -.033
      2007  Jose Vidro         .333    .290  .043  2008  Jose Vidro       .242   .290 -.048
      2007  B.J. Upton         .387    .338  .048  2008  B.J. Upton       .340   .340  .000
      2007  Luis Castillo      .318    .284  .034  2008  Luis Castillo    .258   .245  .013
    Again, good results, although more mixed. Kemp, Ichiro and Kendrick again significantly beat their xBABIP in 2008. Interestingly, Ichiro and Kendrick are both known to be unique hitters. Does Matt Kemp do anything differently than most other hitters?

    But nearly all of the "lucky" players in 2007 regressed in 2008. The model predicted the downfall of Renteria, Taveras, Vidro (although he was also quite unlucky in 08) and Castillo. It correctly predicted a return to earth for Upton, Reynolds, Ortiz, Braun and Lowell.
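As a companion check on the "lucky" table, this sketch copies the 2007 and 2008 BABIP columns and measures how far each hitter's BABIP moved. It is purely descriptive and does not use the model.

```python
# (name, 2007 BABIP, 2008 BABIP), copied from the "lucky in 2007" table above.
lucky_2007 = [
    ("Matt Kemp", 0.411, 0.359), ("Ichiro Suzuki", 0.384, 0.330),
    ("Willy Taveras", 0.355, 0.282), ("Magglio Ordonez", 0.379, 0.331),
    ("Howie Kendrick", 0.374, 0.351), ("Jayson Werth", 0.380, 0.319),
    ("Mark Reynolds", 0.368, 0.319), ("Edgar Renteria", 0.373, 0.289),
    ("Mike Lowell", 0.335, 0.278), ("Ryan Braun", 0.353, 0.301),
    ("David Ortiz", 0.352, 0.269), ("Jose Vidro", 0.333, 0.242),
    ("B.J. Upton", 0.387, 0.340), ("Luis Castillo", 0.318, 0.258),
]

# Year-over-year BABIP change per hitter (negative = BABIP fell).
changes = {name: b08 - b07 for name, b07, b08 in lucky_2007}
fell = [name for name, delta in changes.items() if delta < 0]
mean_change = sum(changes.values()) / len(changes)

print(f"{len(fell)} of {len(lucky_2007)} had a lower BABIP in 2008")
print(f"average change: {mean_change:+.3f}")
```

On these 14 rows every hitter's BABIP fell, by about 60 points on average.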

    Next, let’s look at hitters for whom xBABIP disagreed strongly with old-xBABIP. Here are the top cases where old-xBABIP overrated players in 2008:
    Code:
      YEAR  NAME                BABIP        xBABIP   old-xBABIP
      2008  Brian Schneider     .275         .289         .355
      2008  Ryan Ludwick        .333         .325         .404
      2008  Kevin Millar        .244         .257         .311
      2008  Jesus Flores        .309         .293         .353
      2008  Omar Infante        .323         .294         .356
      2008  Joey Gathright      .278         .156         .215
      2008  Jose Lopez          .302         .286         .346
      2008  Khalil Greene       .251         .276         .336
      2008  Cesar Izturis       .271         .286         .344
      2008  Todd Helton         .296         .312         .369
      2008  John Bowker         .296         .308         .364
      2008  Damion Easley       .278         .262         .317
      2008  Paul Konerko        .243         .280         .330
      2008  Clint Barmes        .322         .308         .360
      2008  Freddy Sanchez      .282         .304         .356
      2008  Jack Wilson         .284         .299         .348
      2008  Omar Vizquel        .239         .277         .326
      2008  Dioner Navarro      .313         .303         .352
      2008  Xavier Nady         .327         .321         .370
    For these players, the old guideline would lead you to believe that the players had been rather unlucky this season. However, our new model shows that these players were far less unlucky than previously thought. In other words, simply using line-drive percentage to predict BABIP overrated these players.
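The gap between the two estimates can be backed out of the table, along with the line-drive rate that the old rule implies. This sketch uses only the first six rows. The "implied LD%" assumes old-xBABIP was computed exactly as LD% + .120.

```python
OLD_RULE_OFFSET = 0.120  # old-xBABIP = LD% + .120

# (name, xBABIP, old-xBABIP): first six rows of the 2008 table above.
rows = [
    ("Brian Schneider", 0.289, 0.355), ("Ryan Ludwick", 0.325, 0.404),
    ("Kevin Millar", 0.257, 0.311), ("Jesus Flores", 0.293, 0.353),
    ("Omar Infante", 0.294, 0.356), ("Joey Gathright", 0.156, 0.215),
]

old_by_name = {name: old for name, _, old in rows}

# Positive gap = the LD%-only rule expects more hits than the fuller model.
gaps = sorted(((old - new, name) for name, new, old in rows), reverse=True)
for gap, name in gaps:
    implied_ld = old_by_name[name] - OLD_RULE_OFFSET   # back out LD%
    print(f"{name:16s} gap {gap:+.3f}  implied LD% {implied_ld:.3f}")
```

Among these six, Ludwick shows the largest gap (.079), with an implied LD% of .284.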

    And the players that were most underrated by the old model:
    Code:
      YEAR  NAME                BABIP        xBABIP   old-xBABIP
      2008  Gary Matthews Jr.   .289         .307         .252
      2008  Hunter Pence        .298         .290         .236
      2008  Jeff Mathis         .231         .269         .217
      2008  Alexi Casilla       .288         .281         .235
      2008  Fred Lewis          .365         .336         .293
      2008  Carlos Gomez        .324         .301         .260
      2008  Delmon Young        .334         .306         .268
      2008  Nick Punto          .331         .304         .267
      2008  Jacoby Ellsbury     .305         .326         .290
      2008  Lance Berkman       .336         .309         .273
      2008  Rickie Weeks        .266         .294         .260
      2008  Denard Span         .328         .338         .306
      2008  Michael Bourn       .283         .277         .246
      2008  Yunel Escobar       .303         .296         .265
      2008  Erick Aybar         .297         .304         .274
      2008  Brendan Harris      .312         .297         .273
      2008  Jason Varitek       .270         .295         .272
      2008  Coco Crisp          .308         .321         .298
      2008  Howie Kendrick      .351         .316         .294
    For the most part, our model considers these players’ actual BABIPs more in line with expectations than old-xBABIP does. In other words, old-xBABIP may think that Alexi Casilla got lucky, but our model suggests he hit in line with expectations. Simply using line-drive percentage to predict BABIP underrated these players.

    Finally, let’s take a look at the players who were the most lucky and unlucky this season. We’d expect many of these players to regress in 2009. Not all of them will, as some will simply be lucky or unlucky again. However, we can be confident that most of these players will experience regression in '09.

    Let’s start with 2008’s luckiest hitters:
    Code:
      YEAR  NAME                BABIP   xBABIP   Diff
      2008  Joey Gathright       .278    .156    .122
      2008  Chipper Jones        .382    .325    .058
      2008  Matt Kemp            .359    .312    .047
      2008  Ryan Theriot         .335    .291    .044
      2008  Felipe Lopez         .324    .287    .037
      2008  Milton Bradley       .375    .334    .041
      2008  Aaron Miles          .337    .301    .037
      2008  Yadier Molina        .307    .274    .033
      2008  Shin-soo Choo        .359    .320    .039
      2008  Geovany Soto         .331    .295    .036
      2008  Mike Aviles          .355    .317    .038
      2008  Reed Johnson         .338    .302    .036
      2008  Jason Bay            .318    .285    .033
      2008  Chone Figgins        .329    .295    .034
      2008  Chase Headley        .356    .319    .036
      2008  Howie Kendrick       .351    .316    .035
      2008  Edgar V Gonzalez     .335    .302    .033
      2008  Ryan Doumit          .328    .297    .031
      2008  Manny Ramirez        .360    .326    .034
      2008  Aaron Rowand         .318    .288    .029
    Unsurprisingly, this list includes a lot of 2008’s surprises: Bradley, Miles, Aviles, Doumit, Choo, Lopez. Interestingly, Gathright’s xBABIP of .156 was nearly 90 points lower than the next-lowest player’s (and remember, we do take speed into account in the model). Maybe Geovany Soto isn’t quite this good. Perhaps Manny Ramirez and Milton Bradley will disappoint whoever signs them. The Cardinals’ middle infielders aren't as good as they seemed.

    And 2008’s unluckiest hitters:
    Code:
      YEAR  NAME                BABIP   xBABIP   Diff
      2008  Brandon Inge         .229    .292   -.063
      2008  Corey Patterson      .210    .262   -.051
      2008  Carlos Ruiz          .230    .282   -.052
      2008  Willy Aybar          .261    .314   -.054
      2008  Jason Giambi         .234    .282   -.048
      2008  Nick Swisher         .245    .294   -.049
      2008  Jose Vidro           .242    .290   -.048
      2008  Kenji Johjima        .226    .266   -.040
      2008  Austin Kearns        .242    .284   -.042
      2008  Jeff Mathis          .231    .269   -.038
      2008  Omar Vizquel         .239    .277   -.038
      2008  Adrian Beltre        .275    .319   -.044
      2008  Mike Jacobs          .259    .300   -.040
      2008  Paul Konerko         .243    .280   -.038
      2008  Brandon Boggs        .296    .342   -.046
      2008  Jim Edmonds          .246    .283   -.037
      2008  Eric Hinske          .267    .306   -.038
      2008  Willie Harris        .268    .306   -.038
      2008  Jay Payton           .266    .304   -.038
      2008  Gabe Gross           .272    .308   -.036
    Some team is going to get a steal in Jason Giambi. Willy Aybar is deserving of full-time action. Nick Swisher and Austin Kearns are a lot better than they showed in 08. Would you believe that Brandon Boggs had the highest xBABIP in 2008 of any player in our database? Jim Edmonds may not be done quite yet. Adrian Beltre is very underrated.

    While our model cannot explain all of the variation in BABIP, we believe that it is an improvement over current explanations of BABIP, as it takes into account many factors that influence a hitter’s BABIP. By finding players who over- and under-performed their expected BABIP, we can further isolate skill from luck, and infer that players such as Mike Aviles are likely to regress and players such as Nick Swisher are likely to improve.



    References and Resources
    We owe a tremendous amount of thanks to Leanne, Dave, Jeremy, Steven and Kevin, who actively conducted the research with us, as part of Baseball Analysis at Tufts’ (BAT) Research Committee. The Committee, headed by Dutton, met once a week throughout the 2007/2008 academic year to discuss and conduct research, as well as analyze the results.

    BAT, founded by Bendix and Matt Gallagher in 2005, is the first baseball analysis club on a college campus. It has hosted such speakers as Bill James, Alan Schwarz, Keith Law, sportswriters from the Boston Globe and more. It continues to host various speakers and events, as well as provide a forum for intelligent baseball discussion and research on the Tufts campus. For more information, please contact Peter Bendix at peterabendix@gmail.com or Chris Dutton at csdutton33@gmail.com.

    Chris Dutton and Peter Bendix established a sabermetrics fan club and research committee as Tufts University students in 2006. Bendix became co-founder and President of Baseball Analysis at Tufts (BAT) while Dutton founded and directed the research team. As a group, BAT has conducted a variety of research projects using economic analysis and statistical tools. Additional work by the authors can be found at Beyond the Boxscore, FanGraphs, and Bleacher Report.

  3. #2
    It's showtime! RedEye
    Join Date
    Feb 2006
    Location
    Atlanta, GA
    Posts
    7,941

    Re: Interesting new look at BABIP and 'luck'

    Beltre comes out well here.

    Some team is going to get a steal in Jason Giambi. Willy Aybar is deserving of full-time action. Nick Swisher and Austin Kearns are a lot better than they showed in 08. Would you believe that Brandon Boggs had the highest xBABIP in 2008 of any player in our database? Jim Edmonds may not be done quite yet. Adrian Beltre is very underrated.
    Nice to be over 2,000 posts.
    "I'll kind of have a foot on the back of my own butt. That's just how I do things." -- Bryan Price, 10/22/2013

  4. #3
    Five Tool Fool jojo
    Join Date
    Nov 2006
    Posts
    18,616

    Re: Interesting new look at BABIP and 'luck'

    Nick Swisher and Austin Kearns are a lot better than they showed in 08. Adrian Beltre is very underrated.
    Too much holiday turkey in mom's basement?
    "This isn't stats vs scouts - this is stats and scouts working together, building an organization that blends the best of both worlds. This is the blueprint for how a baseball organization should be run. And, whether the baseball men of the 20th century like it or not, this is where baseball is going."---Dave Cameron, U.S.S. Mariner

  5. #4
    He has the Evil Eye! flyer85
    Join Date
    Jul 2004
    Location
    south of the border
    Posts
    23,858

    Re: Interesting new look at BABIP and 'luck'

    the new BABIP took a big chunk out of Jay Bruce. xBABIP from .343 to .306
    What are you, people? On dope? - Mr Hand

  6. #5
    Member SMcGavin
    Join Date
    Jan 2005
    Location
    Indianapolis, IN
    Posts
    1,483

    Re: Interesting new look at BABIP and 'luck'

    Very interesting article. It continues to boggle my mind that research like this is available for free.

    Quote Originally Posted by flyer85
    the new BABIP took a big chunk out of Jay Bruce. xBABIP from .343 to .306
    But his actual BABIP (.296) was pretty close to his xBABIP. A little unlucky if anything, but Bruce's 08 numbers were close to his true skill (according to this).

    Meanwhile Encarnacion, a guy we have talked about a lot on here: Actual BABIP .257, expected BABIP .275. This suggests that he was a little unlucky in 08, but not much.

    As for other Reds, this model has Joey Votto right on his expected BABIP (which is very high, because Joey Votto is an awesome hitter). It has Phillips a touch unlucky in 08.

  7. #6
    Something clever pahster
    Join Date
    Feb 2005
    Location
    Columbia, MO
    Posts
    1,907

    Re: Interesting new look at BABIP and 'luck'

    They didn't report the actual results of the model, nor did they report the adjusted R^2, which is far more important than the R^2 when a model involves that many independent variables. Also, they need to present a correlation matrix of their independent variables. It would not at all surprise me if their model suffered from multicollinearity, because a lot of those IVs (independent variables) seem to be derived from the same data points.
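Readers who want to run the diagnostic pahster describes on their own data could start from this NumPy sketch. It builds a correlation matrix of the independent variables and adds variance inflation factors (VIFs). The VIF is a standard multicollinearity check that regresses each variable on all the others. The data matrix is hypothetical, since the study's dataset is not available in this thread.

```python
import numpy as np


def correlation_matrix(X):
    """Pearson correlations between the columns of X (rows = player-seasons)."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Hypothetical data: column 1 is almost a copy of column 0.
a = np.arange(1.0, 9.0)
example = np.column_stack([
    a,
    a + np.array([0.01, -0.01] * 4),
    np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]),
])
print(np.round(correlation_matrix(example), 3))
print(np.round(variance_inflation_factors(example), 1))
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a warning sign. The two near-duplicate columns here land far above that range.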

  8. #7
    BobC, get a legit F.O.! Mario-Rijo
    Join Date
    Apr 2005
    Location
    Springfield, Ohio
    Posts
    9,052

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by pahster
    They didn't report the actual results of the model, nor did they report the adjusted R^2, which is far more important than the R^2 when a model involves that many independent variables. Also, they need to present a correlation matrix of their independent variables. It would not at all surprise me if their model suffered from multicollinearity because a lot of those IVs seem to be derived from the same data points.
    I have no idea what you just said. But if what you are getting at is that you are not gonna take this at face value just yet, due to them not giving you enough info to work with, I'll just have to trust you on it.

    That said, I hope this is more correct than the previous model, and to the extent they suggest. I happen to buy it to some extent because the variables they used did need to be included in the equation, or at least most of them did. I'm not sure about the handedness part and why it is important, but I assume it somehow has its place.
    Last edited by Mario-Rijo; 12-02-2008 at 03:23 PM.
    "You can't let praise or criticism get to you. It's a weakness to get caught up in either one."

    --Woody Hayes

  9. #8
    Ripsnort wheels
    Join Date
    Dec 2002
    Location
    Columbus, Ohio
    Posts
    7,524

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by pahster
    They didn't report the actual results of the model, nor did they report the adjusted R^2, which is far more important than the R^2 when a model involves that many independent variables. Also, they need to present a correlation matrix of their independent variables. It would not at all surprise me if their model suffered from multicollinearity because a lot of those IVs seem to be derived from the same data points.
    Holy crap.

    Are you a scientist?
    "We know we're better than this, but we can't prove it." - Tony Gwynn

  10. #9
    Member Cedric
    Join Date
    Aug 2002
    Location
    Monroe
    Posts
    6,384

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by wheels
    Holy crap.

    Are you a scientist?
    No, but he did stay at a Holiday Inn last night.

    All right. I had to do it.
    This is the time. The real Reds organization is back.

  11. #10
    Ripsnort wheels
    Join Date
    Dec 2002
    Location
    Columbus, Ohio
    Posts
    7,524

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by Cedric
    No, but he did stay at a Holiday Inn last night.

    All right. I had to do it.
    "We know we're better than this, but we can't prove it." - Tony Gwynn

  12. #11
    Moderator RedlegJake
    Join Date
    Dec 2004
    Location
    North Kansas City, Mo
    Posts
    5,664

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by pahster
    They didn't report the actual results of the model, nor did they report the adjusted R^2, which is far more important than the R^2 when a model involves that many independent variables. Also, they need to present a correlation matrix of their independent variables. It would not at all surprise me if their model suffered from multicollinearity because a lot of those IVs seem to be derived from the same data points.
    How many people besides Steel even have a clue what he just said?

  13. #12
    Something clever pahster
    Join Date
    Feb 2005
    Location
    Columbia, MO
    Posts
    1,907

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by wheels
    Holy crap.

    Are you a scientist?
    Er, not yet.

  14. #13
    Stat Wanker Hodiernus RedsManRick
    Join Date
    Dec 2004
    Location
    Chicago, IL
    Posts
    15,910

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by RedlegJake
    How many people besides Steel even have a clue what he just said?
    I do. Let me try to simplify it.

    Firstly, a standard regression is simply asking the question, "How well can I predict A if I know B?" The more strongly correlated they are (R^2 close to 1), the better predictor B is of A (if I know B, I know A). A multiple regression is basically the same thing, except instead of just B, you have B, C, D, and E.

    A, the thing you are trying to predict, is called the dependent variable. B, C, D, and E are independent variables. In general, an added variable can do one of two things. It can either improve your model, helping you predict A and increasing your correlation, or it can add nothing. Given that, the more crap you throw in the model, the better it will predict, at least on the data it was fit to.

    Now, that's all well and good if all you care about is predicting A as well as you can. But if you want to know which variables are actually helping, you've got a problem. If the independent variables are correlated to each other, the "credit" for their contribution to predicting A is going to be unfairly distributed and you can't tell which variables are really helping. Even though he says each variable is statistically significant (meaning it's helpful in predicting A), if there is multicollinearity, that significance could be due to getting credit it doesn't deserve just because that variable happens to be correlated to one that really does matter.

    Adjusted R^2 is a way of penalizing you for adding more variables to the model. It helps you compare models with different numbers of variables, so you can tell whether the added variables really contribute information or are just junk.
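The point about adding variables can be checked numerically. In the sketch below (synthetic data, made-up coefficients), adding a column of pure noise never lowers in-sample R^2. Adjusted R^2, 1 - (1 - R^2)(n - 1)/(n - p - 1) for n observations and p predictors, charges a penalty for each extra predictor. The article reports neither n nor p, so this cannot be applied to its .348 directly.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on the columns of X plus an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for p predictors (not counting the intercept) and n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 0.30 + 0.02 * x + rng.normal(scale=0.03, size=n)   # synthetic "BABIP"
junk = rng.normal(size=n)                               # unrelated predictor

r2_one = r_squared(x[:, None], y)
r2_two = r_squared(np.column_stack([x, junk]), y)       # never lower than r2_one
print(f"R^2:          {r2_one:.4f} -> {r2_two:.4f}")
print(f"adjusted R^2: {adjusted_r_squared(r2_one, n, 1):.4f} -> "
      f"{adjusted_r_squared(r2_two, n, 2):.4f}")
```

Adjusted R^2 rises only when a new variable improves the fit by more than its penalty. It can therefore fall even while plain R^2 creeps up.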

    So, given all that, I'm not sold that all of those variables listed are really significant. However, I see no reason to seriously question the predicted BABIP the model generates. Those potential extraneous variables just add noise. The ultimate conclusion of the article is still accurate.

    Of course, I haven't taken a stats class in over 4 years. So if I'm dead wrong -- sorry.
    Last edited by RedsManRick; 12-02-2008 at 09:05 PM.
    Games are won on run differential -- scoring more than your opponent. Runs are runs, scored or prevented they all count the same. Worry about scoring more and allowing fewer, not which positions contribute to which side of the equation or how "consistent" you are at your current level of performance.

  15. #14
    Something clever pahster
    Join Date
    Feb 2005
    Location
    Columbia, MO
    Posts
    1,907

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by RedsManRick
    I do. Let me try to simplify it.

    Firstly, a standard regression is simply asking the question "How well can I predict A if I know B". The more strongly correlated they are (R being close to 1 or -1), the better predictor B is of A. A multi-variate regression is basically the same thing, except instead of just B, you have B, C, D, and E.

    A, the thing you are trying to predict, is called the dependent variable. B, C, D, and E are independent variables. In general, an added variable can do one of two things. It can either improve your model, helping you predict A and increase your correlation, or it can add nothing. Given that, the more crap you throw in the model, the better it will predict.

    Now, that's all well and good if all you care about is predicting A as good as you can. But if you want to know which variables are actually helping, you've got a problem. If the independent variables are correlated to each other, the "credit" for their contribution to predicting A is going to be unfairly distributed and you can't tell which variables are really helping. Even though he says each variable is statistically significant (meaning it's helpful in predicting A), if there is multicollinearity, that significance could be due to getting credit it doesn't deserve just because that variable happens to be correlated to one that really does matter.

    Adjusted R^2 is a way of penalizing you for adding more variables to the model, helping you figure out which model really uses the variables that add information and get rid of those which are junk.

    So, given all that, I'm not sold that all of those variables listed are really significant. However, I see no reason to seriously question the predicted BABIP the model generates. Those potential extraneous variables just add noise. The ultimate conclusion of the article is still accurate.

    Of course, I haven't taken a stats class in over 4 years. So if I'm dead wrong -- sorry.
    This is all correct.

  16. #15
    Member SteelSD
    Join Date
    Mar 2002
    Posts
    9,322

    Re: Interesting new look at BABIP and 'luck'

    Quote Originally Posted by RedlegJake
    How many people besides Steel even have a clue what he just said?
    pahster's right, and I can guarantee he's more mathematically competent than I am. Basically, there are a number of independent variables that may correlate too highly with each other because they're derived from similar data points. When that happens, it may add noise to the data, and that may be an issue with the variances the authors are seeing. If two or more highly correlated independent variables are identified, the study should back out one of them and then gauge how that affects a regression coefficient. If the swing is large, then the study results might suffer from the effect pahster noted.
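The drop-one-variable check described above might look like this in NumPy. The data are synthetic: two nearly identical predictors, `a` and `b`, share the credit in the full model. Removing `b` shifts its share onto `a`, while the unrelated predictor `c` barely moves. All names and numbers are hypothetical.

```python
import numpy as np


def ols_slopes(X, y):
    """OLS slope coefficients (the intercept is fitted but not returned)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]


def coefficient_swing(X, y, drop):
    """Slopes of the kept columns, fitted with and without column `drop`."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    full = np.delete(ols_slopes(X, y), drop)
    reduced = ols_slopes(np.delete(X, drop, axis=1), y)
    return full, reduced


rng = np.random.default_rng(1)
n = 200
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)        # highly correlated with a
c = rng.normal(size=n)                  # unrelated predictor
y = 0.5 * a + 0.5 * b + 0.2 * c + 0.05 * rng.normal(size=n)

full, reduced = coefficient_swing(np.column_stack([a, b, c]), y, drop=1)
print("slopes on a, c with b:   ", np.round(full, 2))
print("slopes on a, c without b:", np.round(reduced, 2))
```

The slope on `a` roughly doubles once its correlated partner is removed, which is the kind of large swing in question. A stable slope, like the one on `c`, is reassuring.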

    I'm not saying that the study is invalid. And I'm not saying the results are worse than the incredibly odd notion that we should just add .120 to a hitter's Line Drive rate (a dumb idea due to the fact that Line Drive rates aren't exactly repeatable). But I'd need to have the formula to properly evaluate the independent variables and their relationship to each other. If they all have minimal correlation with each other, that's good. Otherwise, not so much.
    "The problem with strikeouts isn't that they hurt your team, it's that they hurt your feelings..." --Rob Neyer

    "The single most important thing for a hitter is to get a good pitch to hit. A good hitter can hit a pitch that's over the plate three times better than a great hitter with a ball in a tough spot."
    --Ted Williams




RedsZone.com is a privately owned website and is not affiliated with the Cincinnati Reds or Major League Baseball

