Sample Size

**Ironman92** · 08-25-2013, 11:30 PM

What's enough of a sample size for you?

Hitter starting their career?

Innings for a pitcher?

Head to head pitching/hitting match ups?

How good a team is?

**BearcatShane** · 08-25-2013, 11:32 PM

What's enough of a sample size for you?

Hitter starting their career? 1000 PA's

Innings for a pitcher? 400

Head to head pitching/hitting match ups? 40

How good a team is? Pythag after 80 games

But that's just my 2 cents.
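For anyone unfamiliar with "Pythag": it refers to the Pythagorean expectation, which estimates a team's winning percentage from runs scored and runs allowed. Below is a minimal sketch using the classic exponent of 2 (other exponents, such as 1.83, are also in common use). The function name and the run totals are made up for illustration.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 380 runs scored and 340 allowed through 80 games.
pct = pythagorean_win_pct(380, 340)
print(f"Pythag winning pct after 80 games: {pct:.3f}")
print(f"Pace over a 162-game season: {pct * 162:.1f} wins")
```

The appeal is that run differential tends to be less noisy than the win-loss record itself over the same number of games.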

**dougdirt** · 08-26-2013, 01:04 AM

Originally Posted by Ironman92

What's enough of a sample size for you?

Hitter starting their career?

Innings for a pitcher?

Head to head pitching/hitting match ups?

How good a team is?

It all depends.

Starting their career.... what did their minor league career and scouting report say? How are the peripherals?

Innings for a pitcher.... depends on what question I am trying to answer.

Depends. If a guy is 10-10 off of a guy, it is enough of a sample. If he is 3-10, it isn't.

A team.... that depends. Do the team stats look like they are right or is the team sporting a .340 BABIP? Is the team relying on pitching with a team BABIP against of .250?

**AtomicDumpling** · 08-26-2013, 04:00 AM

Originally Posted by Ironman92

What's enough of a sample size for you?

Hitter starting their career?

Innings for a pitcher?

Head to head pitching/hitting match ups?

How good a team is?

Hitter starting their career? 1200 plate appearances

Innings for a pitcher? 300 innings

Head to head pitching/hitting match ups? 1200 plate appearances (pitcher/hitter matchup stats are totally worthless)

How good a team is? 162 games of pythag

The larger the sample size the more certainty you have. You can still develop an estimate of a player's value with a smaller sample size but you would have a very large margin of error which shrinks very slowly as the sample size grows.
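To put rough numbers on how slowly that margin of error shrinks, here is a small sketch. It assumes every plate appearance is an independent trial with the same chance of success and uses the normal approximation for a proportion. The .330 on-base rate and the function name are hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a rate p observed over n trials."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical hitter with a .330 on-base rate at several sample sizes.
for pa in (50, 300, 1200, 4800):
    moe = margin_of_error(0.330, pa)
    print(f"{pa:>5} PA: .330 +/- {moe:.3f}")
```

Because the margin scales with one over the square root of the sample size, it takes four times as many plate appearances to cut the margin in half.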

**mth123** · 08-26-2013, 06:35 AM

Originally Posted by AtomicDumpling

Hitter starting their career? 1200 plate appearances

Innings for a pitcher? 300 innings

Head to head pitching/hitting match ups? 1200 plate appearances (pitcher/hitter matchup stats are totally worthless)

How good a team is? 162 games of pythag

The larger the sample size the more certainty you have. You can still develop an estimate of a player's value with a smaller sample size but you would have a very large margin of error which shrinks very slowly as the sample size grows.

Mostly agree with this. I think there are definitely Hitter/Pitcher match-ups that should be avoided/exploited depending on your perspective, but I'm not sure the Stats are where to look for that information.

Less than these can still give clues, but I'm generally a believer that first-year stats are not to be trusted when projecting a guy going forward, and second-year stats may actually mislead in the opposite direction. I think the third full season tells the tale. Part-time stats can be skewed by managing a guy's match-ups so that he succeeds (platoons, pitcher type, etc.). They can possibly tell you something about how to match him up, but as we've repeatedly seen (Nunnally, Stynes, Heisey, Paul, Janish, etc.), they have little relationship to how a guy would perform on a daily basis.

**nate** · 08-26-2013, 09:12 AM

Originally Posted by junkhead

Even when the sample sizes are small, pitcher/hitter matchup stats are not totally worthless.
Below are Choo's batting lines vs Reds pitchers when he was an Indian.

Code:

                                                                    
                   PA AB  H 2B 3B HR RBI BB SO   BA  OBP   SLG   OPS
Bronson Arroyo     15 14  8  3  0  4   7  1  0 .571 .600 1.643 2.243
Homer Bailey       11  8  3  0  1  0   0  3  2 .375 .545  .625 1.170
Mat Latos           7  7  3  1  0  1   2  0  2 .429 .429 1.000 1.429
Mike Leake          6  5  3  1  0  2   2  1  1 .600 .667 2.000 2.667
Sam LeCure          5  2  0  0  0  0   0  3  0 .000 .600  .000  .600
Alfredo Simon       3  3  1  0  0  0   1  0  2 .333 .333  .333  .667
Jonathan Broxton    2  1  0  0  0  0   0  1  0 .000 .500  .000  .500
Aroldis Chapman     2  2  1  0  0  0   0  0  0 .500 .500  .500 1.000
J.J. Hoover         2  2  0  0  0  0   0  0  2 .000 .000  .000  .000
Logan Ondrusek      1  1  0  0  0  0   0  0  0 .000 .000  .000  .000
Total              54 45 19  5  1  7  12  9  9 .422 .519 1.044 1.563

Against Arroyo, 8(3 doubles 4 homers) for 14.
Was Choo just lucky? No.

If you're trying to determine what would happen going forward, these are worthless. I wouldn't even consider the total to be meaningful.

If you're trying to say "yep, that's what happened," these are priceless.

**nate** · 08-26-2013, 09:13 AM

RMR does a great yearly post of what makes a meaningful sample. Maybe he will repost it.

**AtomicDumpling** · 08-26-2013, 05:47 PM

Originally Posted by junkhead

Even when the sample sizes are small, pitcher/hitter matchup stats are not totally worthless.
Below are Choo's batting lines vs Reds pitchers when he was an Indian.

Code:

                                                                    
                   PA AB  H 2B 3B HR RBI BB SO   BA  OBP   SLG   OPS
Bronson Arroyo     15 14  8  3  0  4   7  1  0 .571 .600 1.643 2.243
Homer Bailey       11  8  3  0  1  0   0  3  2 .375 .545  .625 1.170
Mat Latos           7  7  3  1  0  1   2  0  2 .429 .429 1.000 1.429
Mike Leake          6  5  3  1  0  2   2  1  1 .600 .667 2.000 2.667
Sam LeCure          5  2  0  0  0  0   0  3  0 .000 .600  .000  .600
Alfredo Simon       3  3  1  0  0  0   1  0  2 .333 .333  .333  .667
Jonathan Broxton    2  1  0  0  0  0   0  1  0 .000 .500  .000  .500
Aroldis Chapman     2  2  1  0  0  0   0  0  0 .500 .500  .500 1.000
J.J. Hoover         2  2  0  0  0  0   0  0  2 .000 .000  .000  .000
Logan Ondrusek      1  1  0  0  0  0   0  0  0 .000 .000  .000  .000
Total              54 45 19  5  1  7  12  9  9 .422 .519 1.044 1.563

Against Arroyo, 8(3 doubles 4 homers) for 14.
Was Choo just lucky? No.

No it doesn't mean he was "lucky", it doesn't mean anything at all really. It records the fact that he had those hits, but it says absolutely nothing about how Choo is likely to hit against Arroyo the next time they square off. It would be absurd to think that Choo would continue to hit .571 if he faced Arroyo another 100 times.

If you flipped a coin 10 times and got 8 heads and 2 tails, it doesn't mean that you are a good heads-flipper. Your chance of getting heads is still 50%, regardless of the fact that you "earned" an .800 batting average in your small sample.

If a batter has a "True Batting Average" of .300 it doesn't mean that he is going to hit exactly .300 in every sample. It means that he has a 30% chance of getting a hit each at-bat, but he will still have small samples where he gets a hit 40% or 60% or even 100% of the time, and in other samples he will get hits in only 20%, 10% or 0% of his at-bats. Even if that batter were to face the same pitcher in all 650 of his plate appearances during a season we wouldn't expect him to get exactly 3 hits in every 10 ABs.
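The coin and the .300 hitter can both be checked with a binomial calculation. This is a sketch that assumes each flip or at-bat is independent with a fixed probability, which is a simplification for real at-bats. The function name is made up for illustration.

```python
from math import comb


def prob_at_least(successes: int, trials: int, p: float) -> float:
    """Chance of at least `successes` hits in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Fair coin: 8 or more heads in 10 flips (56 of the 1024 possible sequences).
print(f"8+ heads in 10 flips: {prob_at_least(8, 10, 0.5):.4f}")

# A true .300 hitter going 8-for-14 or better against one pitcher.
print(f"8+ hits in 14 AB at .300: {prob_at_least(8, 14, 0.300):.4f}")
```

Any single hot matchup is unlikely, but a hitter faces many pitchers over a career, so a few lines like that are expected to turn up somewhere by chance alone.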

Similarly, if you divide up a hitter's season into some arbitrary splits that have nothing to do with his talent or skills, you are likely to get different results in those splits. For example, if you divided up Choo's season stats by day of the week, do you think his batting average would be exactly the same on Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday? Probably not. If the results are different, would you conclude that the day of the week has an effect on his hitting skill, or would you conclude that it is random fluctuation (some people mistakenly call that luck) based on the sample size?
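The day-of-the-week thought experiment is easy to simulate. In this sketch the hitter has exactly the same true average every day, so any difference between days is noise by construction. The .285 average, the 85 at-bats per weekday, and the seed are all invented for illustration.

```python
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def simulate_weekday_splits(true_avg: float = 0.285,
                            at_bats_per_day: int = 85,
                            seed: int = 2013) -> dict:
    """Simulate a season of at-bats split by weekday with constant talent."""
    rng = random.Random(seed)
    splits = {}
    for day in DAYS:
        # Each at-bat is a hit with probability true_avg, whatever the day.
        hits = sum(rng.random() < true_avg for _ in range(at_bats_per_day))
        splits[day] = hits / at_bats_per_day
    return splits


splits = simulate_weekday_splits()
for day, avg in splits.items():
    print(f"{day}: {avg:.3f}")
print(f"Best day minus worst day: {max(splits.values()) - min(splits.values()):.3f}")
```

Changing the seed reshuffles which day looks "best", which is the point: the split carries no information about the hitter.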

There have been quite a few studies done on batter/pitcher matchup stats and the results have shown these stats to have no predictive value.

**nate** · 08-26-2013, 05:50 PM

Originally Posted by AtomicDumpling

If a batter has a "True Batting Average" of .300 it doesn't mean that he is going to hit exactly .300 in every sample. It means that he has a 30% chance of getting a hit each at-bat, but he will still have small samples where he gets a hit 40% or 60% or even 100% of the time, and in other samples he will get hits in only 20%, 10% or 0% of his at-bats.

This is really the crux.

Most splits outside of handedness have no predictive value.

Historic? Yep, that's what happened.

**RedsManRick** · 08-26-2013, 06:35 PM

It depends completely on what you're trying to do with the data. If you're talking about estimations of talent (which I assume you are), it still depends on what skill you're looking at. Certain attributes "stabilize" very quickly.

One of the notable sabermetricians (and real world stats teacher) "Pizza Cutter" used a method called "split-half reliability" to find the sample size at which the majority of the variation in a player's performance can be explained by factors within the player himself.

http://www.fangraphs.com/library/pri...s/sample-size/

Code:

Stabilization Points for Offense Statistics:

60   PA: Strikeout rate
120  PA: Walk rate
240  PA: HBP rate
290  PA: Single rate
1610 PA: XBH rate
170  PA: HR rate
910  AB: AVG
460  PA: OBP
320  AB: SLG
160  AB: ISO
80  BIP: GB rate
80  BIP: FB rate
600 BIP: LD rate
50  FBs: HR per FB
820 BIP: BABIP


Stabilization Points for Pitching Statistics:

70   BF: Strikeout rate
170  BF: Walk rate
640  BF: HBP rate
670  BF: Single rate
1450 BF: XBH rate
1320 BF: HR rate
630  BF: AVG
540  BF: OBP
550  AB: SLG
630  AB: ISO
70  BIP: GB rate
70  BIP: FB rate
650 BIP: LD rate
400  FB: HR per FB
2000 BIP: BABIP

You can ballpark IP as BF (batters faced) divided by 4, or BIP divided by 2.
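For a feel of how the split-half idea works, here is a simplified simulation. It is not Pizza Cutter's actual procedure, whose details aren't given in this thread. It invents a population of hitters with different true strikeout rates, splits each hitter's plate appearances into even and odd halves, and correlates the two halves across players. The talent distribution, player count, and function names are all assumptions.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def split_half_reliability(n_players, pa_per_player, rng):
    """Correlate even-numbered vs odd-numbered PA strikeout rates."""
    first_half, second_half = [], []
    for _ in range(n_players):
        # Hypothetical talent spread: mean 20% K rate, clamped to 2%-50%.
        k_rate = min(max(rng.gauss(0.20, 0.05), 0.02), 0.50)
        outcomes = [rng.random() < k_rate for _ in range(pa_per_player)]
        evens, odds = outcomes[0::2], outcomes[1::2]
        first_half.append(sum(evens) / len(evens))
        second_half.append(sum(odds) / len(odds))
    return pearson(first_half, second_half)


rng = random.Random(42)
for pa in (60, 240, 960):
    r = split_half_reliability(500, pa, rng)
    print(f"{pa:>4} PA per player: split-half r = {r:.2f}")
```

With more plate appearances per player, the two halves should agree more closely, and a stabilization point is roughly the sample size at which that agreement crosses a chosen threshold.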

**davereds24** · 08-26-2013, 06:41 PM

threeve

**RedsManRick** · 08-26-2013, 06:47 PM

Originally Posted by AtomicDumpling

No it doesn't mean he was "lucky", it doesn't mean anything at all really. It records the fact that he had those hits, but it says absolutely nothing about how Choo is likely to hit against Arroyo the next time they square off. It would be absurd to think that Choo would continue to hit .571 if he faced Arroyo another 100 times.

Good post. However, to clarify, it's very possible that Choo was "lucky". It's possible he had batted balls that usually become outs but which, due to circumstances beyond his control, happened to fall in that day. Maybe the fielder slipped. Maybe he was just a poor defender. Maybe the wind was playing tricks with the ball.

As you described, the problem with the small sample is simply that we don't have enough observations for either the random variation of performance OR luck to even out.

Generally speaking, a player's underlying ability will result in a stable level of performance over a large enough sample. Some of that is just a function of him having more chances to fail or succeed. But it's also a function of his "luck" evening out. Again, as you point out, we should just be careful not to confuse true luck (external, idiosyncratic, but real influences on the outcome) with performance variance, which is within the player's control.

**Ironman92** · 08-26-2013, 10:14 PM

Personally....for a hitter, after his second season. This allows for him to get a good start, pitchers adjust....he adjusts back, sophomore slump.....then we'll see what he's made of. I wanna see Cespedes next year.

For pitchers.....a couple of times thru the league and several situations of bad luck and bad calls.

Head to head....the best and worst need little. 4/5 with 2 doubles and a HR...he's seeing him well. 0-7 with 4 K...good enough for me. The 3 for 10's likely play out that way

A full season tells me about the team

**Old school 1983** · 08-27-2013, 07:47 PM

Originally Posted by davereds24

threeve

Texas with a dollar sign

**max venable** · 08-27-2013, 08:00 PM

Evidently, sample size depends on who you're talking about. If you bring up the fact that Votto is 1 for 8 with the bases loaded this year, people will be super-quick to point out that the sample size is waaaaay too small.

But if you talk about Allen Craig being 7 for 10 with the bases loaded, you will hear about what an outstanding clutch hitter he is.
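One way to see how little either bases-loaded line says is to put an interval around each. The sketch below uses the Wilson score interval at roughly 95% confidence and treats each at-bat as an independent trial, which is a simplifying assumption. The function name is made up for illustration.

```python
import math


def wilson_interval(hits: int, at_bats: int, z: float = 1.96):
    """Wilson score interval for a success rate of hits/at_bats."""
    p = hits / at_bats
    denom = 1 + z**2 / at_bats
    center = (p + z**2 / (2 * at_bats)) / denom
    half = z * math.sqrt(
        p * (1 - p) / at_bats + z**2 / (4 * at_bats**2)
    ) / denom
    return center - half, center + half


for name, hits, at_bats in (("Votto", 1, 8), ("Craig", 7, 10)):
    low, high = wilson_interval(hits, at_bats)
    print(f"{name}: {hits}-for-{at_bats}, plausible range {low:.3f} to {high:.3f}")
```

The two ranges overlap, so on these numbers alone you can't rule out that both hitters have the same underlying bases-loaded ability.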
