View Full Version : Sun Deck's "The Correlates of Run-scoring" Thread



Far East
06-27-2007, 11:38 PM
In pahster's Sun Deck thread about the correlates of run-scoring, 1998-2006, he noted that GPA (Gross Production Average) correlates more strongly with runs scored than any of his other stat columns.

According to pahster, the formula for GPA "is (1.8*OBP)+SLG/4. It is a more 'precise' form of OPS in that it weights OBP by an additional 80%. This is because previous studies have indicated that it is that much more valuable than is slugging."
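For reference, here is a minimal sketch of the two formulas being compared. It assumes the commonly cited definition of GPA, in which the whole weighted sum is divided by 4, i.e. (1.8*OBP + SLG)/4. The function names and the sample OBP/SLG values are made up for illustration and are not from pahster's data.

```python
def ops(obp: float, slg: float) -> float:
    """On-base plus slugging: both components weighted equally."""
    return obp + slg


def gpa(obp: float, slg: float) -> float:
    """Gross Production Average, ASSUMING the common definition
    (1.8 * OBP + SLG) / 4, where the whole sum is divided by 4."""
    return (1.8 * obp + slg) / 4


# Hypothetical team line, not taken from the thread's dataset.
obp, slg = 0.350, 0.450
print(f"OPS = {ops(obp, slg):.3f}")
print(f"GPA = {gpa(obp, slg):.3f}")
```

Under this definition the division by 4 only rescales the result so that it reads like a batting average. Multiplying every team's value by the same constant does not change a correlation, so the only substantive difference from OPS is the 1.8 weight on OBP.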

Here is pahster's pertinent row showing that AVG correlates least of all the stats and GPA correlates the most with runs scored.

     R      AVG    OBP    SLG    OPS    ModOPS   GPA
R    1      .812   .898   .890   .946   .917     .954

My question for pahster -- I'm not permitted to post on Sun Deck -- would be the following: Why doesn't GPA (.954) correlate with runs at a disproportionately higher rate compared to plain old OPS (.946)?

Why are GPA (where OBP has been weighted almost 2X SLG, coupled with SLG's being reduced again by being divided by 4) and OPS so similar to each other in their correlation to runs?

Isn't OPS virtually equal to GPA as a correlate to runs scored?

dfs
06-28-2007, 10:56 AM
I'm not Pahster, but I think I can help here.

The claim is that GPA is a better match for runs scored than OPS because it weights the two components slightly differently. (It's far more important to avoid making an out than it is to add power. Interestingly enough, this weighting changes for different eras of baseball.)


Originally posted by Far East:

My question for pahster -- I'm not permitted to post on Sun Deck -- would be the following: Why doesn't GPA (.954) correlate with runs at a disproportionately higher rate compared to plain old OPS (.946)?

Why are GPA (where OBP has been weighted almost 2X SLG, coupled with SLG's being reduced again by being divided by 4) and OPS so similar to each other in their correlation to runs?

Isn't OPS virtually equal to GPA as a correlate to runs scored?

Question 1: Why doesn't GPA correlate better than OPS? The numbers you give show that it does correlate better. The max is 1.00; you simply CAN'T do better than that. A correlation of 1.00 would tell you that the two variables move in perfect lockstep. OPS alone correlates with runs scored at .946, which, squared, accounts for roughly 89.5% of the variation in runs scored. That's good, but the question is whether you can do a little bit better. Can we find some simple function that accounts for more of what's left? Well, it turns out that by moving from OPS to GPA you bump the correlation to .954, or roughly 91% of the variation. That leaves about 9% of the variation in runs scored to non-OPS components (stolen bases, baserunning, clutch).
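To put the correlations on a "share of runs explained" footing, the usual step is to square them (the coefficient of determination for a single-predictor linear fit). A small sketch, using only the correlations quoted in this thread; the function name is illustrative:

```python
# Correlations with runs scored, as quoted from pahster's table in this thread.
correlations = {
    "AVG": 0.812,
    "OBP": 0.898,
    "SLG": 0.890,
    "OPS": 0.946,
    "ModOPS": 0.917,
    "GPA": 0.954,
}


def variance_explained(r: float) -> float:
    """Share of variance in runs explained by a one-predictor linear fit."""
    return r * r


for stat, r in correlations.items():
    print(f"{stat:7s} r = {r:.3f}   r^2 = {variance_explained(r):.3f}")
```

On this scale the gap between OPS and GPA is about a point and a half of explained variance: small, but in GPA's favor.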

Question 2: Why are the two numbers so similar to each other in their correlation to runs? That's a math question, and it has to do with the two numbers (OPS and GPA) sharing the same components. Because both are built from OBP and SLG, they are going to be very close to each other, so their correlations to a third variable (which is largely a function of those same components) will be similar.
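The shared-components point can be illustrated with a quick simulation. This is only a sketch: the OBP and SLG values below are SYNTHETIC draws from made-up normal distributions, not real team data, and GPA is assumed to be (1.8*OBP + SLG)/4.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# SYNTHETIC team-seasons: OBP and SLG drawn independently from made-up
# distributions. These are not real baseball numbers.
rng = random.Random(0)
obp = [rng.gauss(0.335, 0.015) for _ in range(270)]
slg = [rng.gauss(0.420, 0.030) for _ in range(270)]

# Two different weightings of the same two components.
ops = [o + s for o, s in zip(obp, slg)]
gpa = [(1.8 * o + s) / 4 for o, s in zip(obp, slg)]  # assumed GPA definition

r_ops_gpa = pearson(ops, gpa)
print(f"corr(OPS, GPA) on synthetic data = {r_ops_gpa:.3f}")
```

Even with OBP and SLG drawn independently, the two composites track each other closely. In real data OBP and SLG tend to rise together, which would push OPS and GPA closer still, so near-identical correlations with runs are what one would expect.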

Hope that makes sense. Let me know if you need more. I too miss the ability to respond to posts that interest me.

pahster
07-04-2007, 01:36 AM
First off, thanks for the callup, everyone!

Sorry my follow-up took so long. I'm trying to trim my 50-page undergraduate thesis down to 5 pages summarizing the results of my research. It's gotta be done by the 15th to make the deadline for submissions to a particular academic journal. Paring it down is turning out to be quite a bit more difficult than I had anticipated.

Here are my initial tables.

http://i12.tinypic.com/4r09ft3.jpg

http://i9.tinypic.com/4r9uteu.jpg

The next stage of my research focuses on predictive measurements. This, of course, means regression. Unfortunately, the variables I have been using would create an inappropriate model were I to place them together, because they covary so strongly with one another. While there is no concrete rule in the social sciences (at least not in political science), models that include predictors correlating with one another at or above |.7| are generally viewed as methodologically unsound. While I can (and will) create regression models using these variables, each will have to stand alone in its own model.
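As a rough sketch of that |.7| rule of thumb, the check below flags every pair of candidate predictors whose correlation meets the threshold. The helper names are hypothetical, and the six rows of OBP/SLG values are made up purely to exercise the code; they are not from the thread's dataset.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def collinear_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# MADE-UP values for six hypothetical team-seasons (illustration only).
obp = [0.320, 0.331, 0.340, 0.352, 0.328, 0.345]
slg = [0.400, 0.415, 0.440, 0.455, 0.405, 0.430]
predictors = {"OBP": obp, "SLG": slg, "OPS": [o + s for o, s in zip(obp, slg)]}

for a, b, r in collinear_pairs(predictors):
    print(f"{a} vs {b}: r = {r:.3f} (too collinear to share a model)")
```

OPS is an exact sum of OBP and SLG, so any model containing all three is collinear by construction, whatever the data look like.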

http://i14.tinypic.com/66l6ykk.jpg

GPA appears to explain the most variance (nearly 91%) of all of the models I created. It is followed closely by OPS, which explains 89.5% of the variance. Batting average (65.7%) is the least effective predictor, though certainly not an entirely useless one. That said, OBP and SLG provide higher levels of prediction. From a purely mathematical standpoint, GPA and OPS are the best predictors of team run scoring, at least from 1998-2006. As I have said before, though, I have methodological issues with each as measures.
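These percentages line up with the correlations quoted earlier in the thread. A hedged sketch of the arithmetic: for a one-predictor model, R-squared is the squared correlation, and adjusted R-squared applies a small penalty for sample size. The sample size below is an ASSUMPTION (30 teams over the 9 seasons 1998-2006); the thread does not state it.

```python
def adjusted_r_squared(r_squared: float, n: int, p: int = 1) -> float:
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# ASSUMPTION: one observation per team-season, 30 teams x 9 seasons.
N_TEAM_SEASONS = 30 * 9

# Correlations with runs scored, as quoted earlier in the thread.
for stat, r in {"AVG": 0.812, "OPS": 0.946, "GPA": 0.954}.items():
    adj = adjusted_r_squared(r * r, N_TEAM_SEASONS)
    print(f"{stat}: r = {r:.3f}, adjusted R^2 = {adj:.3f}")
```

With that assumed sample size, the results come out at roughly 0.895 for OPS and 0.910 for GPA, in line with the figures above. Small differences (AVG, for instance) are to be expected, since the correlations are rounded to three decimals.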

Here are scatter plots of runs scored by GPA and OPS. The results are unsurprising, but it's always nice to survey the data visually.

http://i17.tinypic.com/6gwzhxe.jpg

http://i13.tinypic.com/4l6q535.jpg

The two are quite similar. The data points are tightly clustered together, hence the high adjusted R-squared scores. Now let's look at batting average.

http://i15.tinypic.com/5250lde.jpg

The general trend is still easily discernible, but you can definitely tell that there is a lot more variation in the data. Interesting, but hardly groundbreaking.

At some point, I'm going to start looking for better datasets. I'd like to create a model incorporating nearly everything - outs of each type, type of hit, walks, fielder's choices, etc. I'm fairly limited in what I can do for now by what I have. I'll get around to it at some point, I suppose.