
View Full Version : How Big of a Sample is Big Enough?

RedsManRick
07-16-2012, 03:01 PM
BP just released an updated version of its study about when certain metrics become more signal than noise.

Statistic       Definition               Stabilized at       Notes

GB rate         GB/BIP                   80 BIP              Min 1000 BIP, Retrosheet classifications used
FB rate         (FB+PU)/BIP              80 BIP              Min 1000 BIP including HR
HR per FB       HR/FB                    50 FBs (~125 BIP)   Min 500 FB
LD rate         LD/BIP                   600 BIP             Min 1000 BIP including HR, Estimate*
BABIP           Hits/BIP                 820 BIP             Min 1000 BIP, HR not included

Strikeout rate  K/PA                     60 PA
Walk rate       BB/PA                    120 PA              IBB's not included
ISO             (2B+2*3B+3*HR)/AB        160 AB              Min 2000 ABs, Cronbach's alpha used
HR rate         HR/PA                    170 PA
HBP rate        HBP/PA                   240 PA
Single rate     1B/PA                    290 PA
SLG             (1B+2*2B+3*3B+4*HR)/AB   320 AB (~350 PA)    Min 2000 ABs, Cronbach's alpha used, Estimate*
OBP             (H+HBP+BB)/PA            460 PA
AVG             H/AB                     910 AB (~1000 PA)   Min 2000 ABs
XBH rate        (2B+3B)/PA               1610 PA             Estimate*

For the sake of argument, let's just say that about 2/3 of plate appearances, on average, result in a ball in play. So you can multiply the BIP figures by 1.5 to get a rough comparison to PA.
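To make that conversion concrete, here is a rough Python sketch. The 2/3 ball-in-play share is the ballpark assumption from this post, not a measured figure, and the names `BIP_PER_PA` and `bip_to_pa` are made up for illustration. The BIP numbers are copied from the table.

```python
# Assumption from the post: roughly 2/3 of plate appearances end with a ball in play.
BIP_PER_PA = 2 / 3

# Stabilization points in BIP, copied from the table in this post.
BIP_THRESHOLDS = {
    "GB rate": 80,
    "FB rate": 80,
    "LD rate": 600,
    "BABIP": 820,
}


def bip_to_pa(bip, bip_per_pa=BIP_PER_PA):
    """Convert a ball-in-play count to an approximate plate-appearance count."""
    return bip / bip_per_pa


# Round to whole plate appearances; these are rough comparisons only.
approx_pa = {stat: round(bip_to_pa(bip)) for stat, bip in BIP_THRESHOLDS.items()}

for stat, pa in approx_pa.items():
    print(f"{stat}: ~{pa} PA")
```

Under that assumption, BABIP's 820 BIP works out to roughly 1230 PA, which is about two full seasons for an everyday player.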

My big takeaway: There's a reason the three true outcomes are given the most attention. They stabilize the most quickly, suggesting some combination of two things:

1) They are inherently more a function of skill than the other metrics
2) The skills they indicate are subject to less natural fluctuation than the skills behind the other metrics

Not surprisingly, the more other people (defense) and things (park effects) affect the outcomes, the bigger the sample needs to be to tell you about the skill the player himself possesses.

One easy way to interpret this is thus: If we don't have a big enough sample yet (per this table), then the player's career stats (or a projection based on them) are a better predictor of how he's likely to perform moving forward than how he's performed in this sample.
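That rule of thumb can be written as a simple lookup. This is only a sketch: the function name `has_stabilized` is made up for illustration, the thresholds are the PA-based rows copied from the table in this post, and a hard cutoff is a simplification of what is really a gradual shift from noise to signal.

```python
# Stabilization points in PA, copied from the PA-based rows of the table in this post.
STABILIZATION_PA = {
    "K rate": 60,
    "BB rate": 120,
    "HR rate": 170,
    "HBP rate": 240,
    "1B rate": 290,
    "OBP": 460,
}


def has_stabilized(stat, sample_pa):
    """Return True if the sample has reached the table's stabilization point."""
    return sample_pa >= STABILIZATION_PA[stat]


# Example: a hitter 150 PA into the season.
sample_pa = 150
for stat in STABILIZATION_PA:
    if has_stabilized(stat, sample_pa):
        print(f"{stat}: past the stabilization point, more signal than noise")
    else:
        print(f"{stat}: too small, lean on career stats or a projection")
```

At 150 PA, only strikeout rate and walk rate clear their thresholds, so for everything else on the list the career numbers are still the better guide.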

nate
07-16-2012, 09:40 PM
Excellent post, Rick.

Vottomatic
07-16-2012, 10:13 PM