Saber Shenanigans

I would just like to make a couple of comments on an article posted over on ORG the other day. Here is a link to that article: http://www.redszone.com/forums/showthread.php?t=80750

In short, the article deals with modeling systems and baseball in general. It focuses on one particular modeling system, the Marcels, and how it performed with regard to home runs hit by the top home run hitters in 2009.

The Marcels projected that their top 13 home run hitters for 2009 would combine for 401 HR. The actual total for these hitters in 2009 was 400. In the words of the author of the blog, 'They nailed it!'

In conclusion, the author of the blog states: "So, the forecasting systems work… if you know how to properly interpret what it is they are trying to tell you."

Soon a number of skeptics replied pretty much along the lines of "It was a fluke". The supporters soon shot back with comments such as "the model is capturing the cohort really well...i.e. it works and the population is being accurately defined.....".

This is the statement that got me thinking, and I consider it a very reasonable one: "But what do you consider accurate? What if it's within 3 HR 90% of the time? There are plenty of studies out there that show the accuracy of the various projections systems. You might be surprised how accurate a simple model based on regression to the mean can be."

So I decided to test how accurate this model is. A quick aside: I found it odd that the author of the blog would make such a statement about his model using just one year's data. That does not strike me as a very rigorous test. Surely the author could have checked the Marcel forecasts against what actually happened in rather short order. It would have made for a far more convincing article if he had shown that he 'nailed it' 3 or 4 years running.

I decided to first take a look at 2009. He set the cutoff at 28 projected home runs, which resulted in 13 players. I didn't see any explanation in the article as to why he picked 28 homers, or whether 13 players (data points) is enough to test a model. So I moved the cutoff by one home run and made it 29 homers. This cut the list down to 9 players. How did that work out?

Code:
```
NAME      pHR29  aHR29
Howard       40     45
ARod         32     30
Braun        32     32
Fielder      32     46
Dunn         32     38
Pujols       31     47
Pena         31     39
Thome        30     23
Dye          29     27

Total       289    327
```
Marcels predicted 289 home runs and the actual total was 327. This is a difference of 38. Fewer data points probably mean less accuracy, but I would say Marcels did not nail it. That is definitely not within 3 HR, or even 3%. It missed by about 13%.
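For anyone who wants to check the arithmetic, here is a minimal Python sketch that uses only the nine rows from the table above. The function name `summarize` and the `tolerance` parameter are my own choices for illustration (the 3 HR tolerance borrows the "within 3 HR" figure from the quote earlier), and the percentage is taken relative to the projected total.

```python
# 2009 rows copied from the table above: (player, projected HR, actual HR)
rows_2009 = [
    ("Howard", 40, 45),
    ("ARod", 32, 30),
    ("Braun", 32, 32),
    ("Fielder", 32, 46),
    ("Dunn", 32, 38),
    ("Pujols", 31, 47),
    ("Pena", 31, 39),
    ("Thome", 30, 23),
    ("Dye", 29, 27),
]


def summarize(rows, tolerance=3):
    """Return group totals, the gap between them, and a per-player hit count.

    tolerance is an illustrative threshold: a player counts as a "hit"
    when the projection is within that many HR of the actual total.
    """
    projected = sum(p for _, p, _ in rows)
    actual = sum(a for _, _, a in rows)
    gap = actual - projected                  # positive: group beat projection
    pct = 100 * abs(gap) / projected          # miss relative to projected total
    within = sum(1 for _, p, a in rows if abs(a - p) <= tolerance)
    return projected, actual, gap, pct, within


projected, actual, gap, pct, within = summarize(rows_2009)
print(f"projected={projected} actual={actual} gap={gap} ({pct:.1f}%)")
print(f"players within 3 HR: {within} of {len(rows_2009)}")
```

By this count only 3 of the 9 players (ARod, Braun and Dye) landed within 3 HR of their projection, so the group total is not the only place where the misses show up.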

So maybe 9 players is too small a sample (although it isn't that much smaller than 13). I decided to work backwards through earlier seasons using the top 13 players and ties. How did that work out? Here are the lists for the years 2008, 2007 and 2006. Remember, I took the top 13 players and ties, so there is not the same number of players for each year. What does that show us?

Code:
```
Last      pHR08  aHR08    Last      pHR07  aHR07    Last      pHR06  aHR06
Howard       39     48    Howard       40     47    Howard       40     58
ARod         36     35    Ortiz        40     35    Ortiz        40     54
Ortiz        34     23    Pujols       38     32    Pujols       38     49
Pujols       33     37    Dunn         36     40    Dunn         36     40
Fielder      33     34    Jones        35     26    Jones        35     41
Dunn         32     40    Glaus        33     20    Glaus        33     38
Pena         31     31    Konerko      33     31    Konerko      33     35
Berkman      30     29    Ramirez      33     20    Ramirez      33     35
Jones        30      3    ARod         33     54    ARod         33     35
Soriano      29     29    Soriano      33     33    Soriano      33     46
Hafner       28      5    Teixeira     33     30    Teixeira     33     33
Thome        28     34    Ramirez      32     26    Ramirez      32     38
Dye          28     34    Hafner       31     24    Hafner       31     42
                          Sexson       31     21    Sexson       31     34

Total       439    404    Total       512    463    Total       512    616
```
In 2008 Marcels was off by 35 HR or 8.0%, in 2007 by 49 HR or 9.6%, and in 2006 by 104 HR or 20.3%.

I readily admit I don't know much about modeling and perhaps these are acceptable margins of error. But at a glance I would say Marcels did not nail any of those 3 years.
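The same check can be run across seasons. This is a rough Python sketch that takes the group totals exactly as stated in this post rather than re-adding the rows. The names `season_totals` and `pct_miss` are illustrative, and the sign convention (actual minus projected) is my own assumption.

```python
# (projected total, actual total) for each group, as stated in the post
season_totals = {
    "2009, blog's 13 players": (401, 400),
    "2009, 29 HR cutoff": (289, 327),
    "2008": (439, 404),
    "2007": (512, 463),
    "2006": (512, 616),
}


def pct_miss(projected, actual):
    """Absolute miss as a percentage of the projected total."""
    return 100 * abs(actual - projected) / projected


for label, (projected, actual) in season_totals.items():
    gap = actual - projected  # negative: the group fell short of projection
    print(f"{label:<26} gap={gap:+4d} HR  miss={pct_miss(projected, actual):.1f}%")
```

Note the sign flips: the 2007 and 2008 groups fell short of their projections, while the 2006 group beat them by a wide margin. The blog's 13-player sample for 2009 is the only group here that comes in under 1%.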

One defender of Marcels had this to say, "Opponents of the sabermetric approach often like to cherry pick specific examples to question the credibility of a model." It would appear that the author of the blog did a little cherry picking of his own.

In fairness, the author of this blog is probably a post-grad student of sabermetrics while I would consider myself about a second grader. Sabermetrically speaking, I am riding a very short bus. And, while I don't understand a lot about modeling and the uses and the limitations, I think the author was more than a little premature in patting himself on the back.

3. ## Re: Saber Shenanigans

Believe it or not, this is what the "Are you smarter than a monkey?" thread is about.

4. ## Re: Saber Shenanigans

Nice work. Stats are best at measuring the players that are best at that stat. Marcels did a good job of predicting total HRs of the best HR hitters. I think as you go down the scale of HR hitters, there is going to be even more variance. This would not be reliable to predict production for a team.

5. ## Re: Saber Shenanigans

It can all be thrown out the window if baseball decides to do something such as "tighten the baseballs" and not tell anyone about it. Or, something else, such as have the umpires make the "strike zone" 1 inch higher at the top level and 1 inch higher at the bottom level, and not tell anyone about it.

Don't put it past Bud Selig to go out with a bang this year and try to attract extra attention to the sport by raising the number of home runs again. He can say that they've done the steroids testing and that it's just that several players are having better than normal seasons while others are in their peak years (26-30).

