I hear "small sample size" quite a bit on this website. We are now more than 10% of the way into the season. When is it no longer a small sample size? In every situation I've ever seen (outside of baseball), a 10% sample is pretty adequate.
One tenth of anything is not adequate.
10% is fine in manufacturing, but not in cases with a great deal of variation, such as a baseball season. This year that 10% came in abnormally cold conditions, which favor the pitchers. Give it time for the climate to even things out and for slow starters to get going, and then look at your sample. I would compare this to purchasing a new machine, sampling the first 100 pieces, and expecting the next 900 to be the same. You wouldn't do that, because there is a learning curve for the operators and usually a break-in period for the machine. I think it works similarly in baseball.
Sometimes 10% is adequate, but that depends on what you're researching. And a sample in a poll is quite different from the first 10% of a baseball season.
I believe that a period of 3 weeks is a fairly short amount of time to determine what the season will be like. We can look at the Astros over the past 2 years and see that how you start doesn't determine how you finish.
If you sampled 10% of the season's games at random (so 16ish games), you could probably get a semi-accurate prediction of what a player's season looked like. But what you have here is the first ten percent of the season, which is intrinsically linked to the situations that occurred over the first 16ish games: cold weather, a hitting slump because of a minor injury, etc.
Tom Shearn... who knew?
Reds record when I attend in 2007: 6-1
Exactly. The way these stats are being used isn't a sample, it's an extrapolation. A sample implies a piece of the whole, and the whole is as yet unknown. A myriad of things will happen over a season to affect the whole; to get a picture of them, one would take a random sample, not 20 games in a row.
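A rough simulation can illustrate the difference between a random sample and the first 16 games in a row. It assumes, purely for illustration, a hypothetical hitter whose true average is .260 but who hits .200 over the first 16 games (standing in for cold weather or a minor injury), with 4 AB per game; none of these numbers come from the thread.

```python
import random

GAMES = 162
AB_PER_GAME = 4  # assumption: every game has exactly 4 at-bats


def simulate_season(rng, cold_games=16, cold_avg=0.200, normal_avg=0.260):
    """Hits in each game of one simulated season (hypothetical numbers)."""
    hits = []
    for g in range(GAMES):
        p = cold_avg if g < cold_games else normal_avg
        hits.append(sum(rng.random() < p for _ in range(AB_PER_GAME)))
    return hits


def ba(hits_per_game):
    """Batting average over a list of per-game hit counts."""
    return sum(hits_per_game) / (AB_PER_GAME * len(hits_per_game))


def compare(trials=1000, seed=2007):
    """Average BA of the first 16 games, a random 16 games, and the full season."""
    rng = random.Random(seed)
    first = rand = full = 0.0
    for _ in range(trials):
        season = simulate_season(rng)
        first += ba(season[:16])              # consecutive games at the start
        rand += ba(rng.sample(season, 16))    # 16 games drawn from the whole season
        full += ba(season)
    return first / trials, rand / trials, full / trials


if __name__ == "__main__":
    first, rand, full = compare()
    print(f"first 16: {first:.3f}  random 16: {rand:.3f}  full season: {full:.3f}")
```

Under these assumptions the random 16 games land close to the full-season average, while the first 16 games stay stuck near the cold-start rate.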
"I'm just like everybody else. I have two arms, two legs and 4,000 hits."
-Pete Rose
NEVER MIND THIS POST... I see what you meant. I disagree: when you look at 16-game stretches you don't get an accurate sample, or even a semi-accurate one. Real quick, here is Dunn's BA for each month last year (each month is a much larger sample than your 16 games): .265, .212, .221, .354, .188, .157. Most of those numbers are not even close to his .234 season BA. Face it, baseball is a game of hot and cold streaks; go to espn.com and look at players' stats for last year. Most have dramatic swings from month to month.
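The arithmetic behind that point is quick to check. This sketch measures how far each listed month sits from the .234 season BA; the 20-point cutoff for "not even close" is an assumption chosen for illustration.

```python
# Dunn's monthly batting averages, as listed in the post above.
monthly_ba = [0.265, 0.212, 0.221, 0.354, 0.188, 0.157]
season_ba = 0.234

# Assumed cutoff: a month is "not even close" if it misses by more than 20 points.
CUTOFF = 0.020

deviations = [round(m - season_ba, 3) for m in monthly_ba]
far_off = [m for m, d in zip(monthly_ba, deviations) if abs(d) > CUTOFF]
spread = max(monthly_ba) - min(monthly_ba)

print("deviation from season BA:", deviations)
print(f"{len(far_off)} of {len(monthly_ba)} months off by more than {CUTOFF:.3f}")
print(f"spread between best and worst month: {spread:.3f}")
```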
Last edited by kaldaniels; 04-23-2007 at 04:30 PM.
Everyone’s floor is that they get the bubonic plague and take out the whole roster before they die themselves. - Dave Cameron
Good explanation, Rotnoid.
Rem
Because I have no life, I looked up a random number generator online and had it randomly pick out 16 games for AD during the 2006 season. I compiled the numbers and multiplied by 10. It seems as if this RNG is not a big AD fan.
This was only a test using a random sampling to see how close it would get.
Code:
ave - .183
slg - .350
obp - .258
ops - .608
ab - 600
hits - 110
runs - 90
hr - 20
rbi - 70
k - 300
bb - 60
I just wanted to see what it would look like. Infer nothing.
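The procedure described above (draw 16 of the season's games at random, total the counting stats, multiply by 10) could be sketched like this. The `Game` record and its fields are hypothetical stand-ins for whatever game log was used, only a subset of the stats is projected, and OBP is simplified to (H + BB) / (AB + BB), ignoring hit-by-pitches and sacrifice flies.

```python
import random
from dataclasses import dataclass


@dataclass
class Game:
    """One game's line from a hypothetical game log."""
    ab: int
    h: int
    tb: int  # total bases
    bb: int
    hr: int
    so: int


def project_from_random_games(log, n_games=16, scale=10, seed=None):
    """Pick n_games at random without replacement, total them, scale counting stats."""
    rng = random.Random(seed)
    picked = rng.sample(log, n_games)
    ab = sum(g.ab for g in picked)
    h = sum(g.h for g in picked)
    tb = sum(g.tb for g in picked)
    bb = sum(g.bb for g in picked)
    avg = h / ab
    obp = (h + bb) / (ab + bb)  # simplified: no HBP or sacrifice flies
    slg = tb / ab
    return {
        # Rate stats come from the raw totals; scaling does not change them.
        "ave": round(avg, 3), "obp": round(obp, 3),
        "slg": round(slg, 3), "ops": round(obp + slg, 3),
        # Counting stats are multiplied up to roughly a full season.
        "ab": ab * scale, "hits": h * scale, "bb": bb * scale,
        "hr": sum(g.hr for g in picked) * scale,
        "k": sum(g.so for g in picked) * scale,
    }
```

Passing a list of 162 `Game` entries and a fixed `seed` makes a run repeatable; different seeds show how much a 16-game projection can swing.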
Nice work, super sleuth. I guess this RNG picked some games Dunner was not doing so hot in. For the record, I don't think 10% is that great a sample in baseball even if you draw from the whole season (like you did). Not bashing your work, just disagreeing with the original post that 10% is enough to draw conclusions, especially over a 162-game season.
You can go further with this and assume that there is a basic "ability" level which determines how well a batter will do (say, hit .300).
All other variations, such as weather, opposing pitchers, situation, and small injuries, are just randomness in the sampling process.
You can then calculate the probability of said hitter attaining a certain performance (use BA; it's easier) based on sample size.
Some examples:
Prob. of a .300 hitter underperforming:
.200 or less after 50 AB: 7.9%
.140 or less after 50 AB: 0.7%
.200 or less after 100 AB: 1.7%
.140 or less after 100 AB: 0.02%
So, if you believe your guy is .300 "talent" and he slumps badly over a couple of weeks...it can be randomness. If the slump is really bad...it could be a "career" slump.
If the "career" slump continues for another two weeks...most likely your guy isn't .300 talent anymore.
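The percentages above are consistent with a simple binomial model in which each at-bat is an independent hit with probability .300. A minimal sketch under that assumption (the function name is illustrative); the BA threshold is passed in thousandths so the hit cutoff is computed with integer math rather than float rounding.

```python
from math import comb


def prob_ba_at_or_below(true_avg, at_bats, ba_thousandths):
    """P(observed BA <= ba_thousandths / 1000) if every AB is an independent
    hit with probability true_avg (binomial model)."""
    max_hits = (ba_thousandths * at_bats) // 1000  # e.g. .200 over 50 AB -> 10 hits
    return sum(
        comb(at_bats, h) * true_avg**h * (1 - true_avg) ** (at_bats - h)
        for h in range(max_hits + 1)
    )


for ab in (50, 100):
    for ba in (200, 140):
        p = prob_ba_at_or_below(0.300, ab, ba)
        print(f".{ba} or less after {ab} AB: {100 * p:.2f}%")
```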
"A person is smart. People are dumb, panicky, dangerous animals and you know it."
http://dalmady.blogspot.com