I feel dirty (I tried offensive correlation)
Yes, as promised in another thread, I entered offensive statistics into a spreadsheet and ran a correlation test.
(I did it overnight and got it done in a couple of hours)
Anyways, first I'll explain what I did and then I'll tell you my personal findings.
I considered entering everything from modern-day baseball, but I decided the game's offensive explosion really started within the last 15 years or so, and offensive dynamics have changed since then with new ballparks and expansion.
So I made my cutoff 1991.
I also considered doing both the AL and NL, but with pitchers batting, more sacrifices, double switches, bunting, steals, etc., I decided I would just use the National League for more consistency.
So what I did was enter every National League team's offensive statistics from 1991 to 2005, one team season per line. Each line had the routine information you'd expect (R/G, AB, R, H, 2B, 3B, HR, BB, SO, OBP, SLG).
It was sorted by year and then by runs per game. I added columns so that OPS, Batter's Run Average (BRA), Total Average (TA) and Runs Created (RC) were also calculated for each team and each season. Altogether there were 208 teams entered into this database.
First, I ran a correlation test between each team's runs per game and every major offensive category, including the ones I added (OPS, TA, BRA and RC).
I admit, I was somewhat overwhelmed with the result.
Of the major indicators you would expect to correlate with a team's runs scored for the season, in these 15 years of National League data, Batter's Run Average (OBP*SLG) was the category that correlated most strongly with runs per game. It had a .9402 correlation.
Second, OPS correlated in this data at .9358 and runs created at .9279. For clarification purposes, I decided to use the modified version of runs created involving steals: (H+BB-CS)*(TB+(.55*SB))/(AB+BB). I did this because I feel a stolen base threat should be included, as swiping bases obviously influences the possibility of runs.
There was a brief drop off from these three categories, then came SLG (.8903), OBP (.8860), BA (.8292) and TA (.7820).
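For anyone who wants to try this outside a spreadsheet, here is a minimal sketch of the same kind of correlation test in Python. The six team rows are made up purely for illustration (they are not my 1991-2005 data), and the `pearson` helper is my own assumption about what the spreadsheet's correlation function computes (an ordinary Pearson coefficient).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical team seasons: (R/G, OBP, SLG, BA). Invented numbers, not real data.
teams = [
    (5.10, 0.339, 0.446, 0.261),
    (4.85, 0.341, 0.423, 0.270),
    (4.60, 0.330, 0.418, 0.262),
    (4.42, 0.333, 0.395, 0.259),
    (4.21, 0.322, 0.391, 0.256),
    (3.95, 0.315, 0.380, 0.252),
]

rpg = [t[0] for t in teams]
metrics = {
    "BRA": [t[1] * t[2] for t in teams],  # Batter's Run Average = OBP * SLG
    "OPS": [t[1] + t[2] for t in teams],  # OPS = OBP + SLG
    "SLG": [t[2] for t in teams],
    "OBP": [t[1] for t in teams],
    "BA": [t[3] for t in teams],
}

# Correlate each metric with runs per game and list strongest first.
correlations = {name: pearson(values, rpg) for name, values in metrics.items()}
for name, r in sorted(correlations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {r:.4f}")
```

With real data you would load one row per team season in place of the `teams` list; the rest stays the same.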
Just based on this test and looking at the numbers, I felt runs created overall was the best measurement. A quick scan of the runs created numbers against actual runs scored was striking. It was quite shocking to me how close the two numbers were.
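To be explicit about the runs created version I used, here is a small sketch of the stolen-base formula, reading the denominator as AB and BB added together: (H+BB-CS)*(TB+.55*SB)/(AB+BB). The function name and the sample team totals are hypothetical, made up only to show the arithmetic.

```python
def runs_created(h, bb, cs, tb, sb, ab):
    """Stolen-base version of runs created.

    (H + BB - CS) * (TB + 0.55 * SB) / (AB + BB)
    """
    on_base = h + bb - cs          # times on base, minus runners lost stealing
    advancement = tb + 0.55 * sb   # total bases plus partial credit for steals
    opportunities = ab + bb        # the whole sum is the denominator
    return on_base * advancement / opportunities

# Hypothetical season totals (not a real team).
rc = runs_created(h=1450, bb=550, cs=40, tb=2300, sb=90, ab=5500)
print(f"Runs created: {rc:.1f}")
```

Note that without the parentheses around AB+BB, a spreadsheet would divide by AB first and then add BB, which gives a very different number.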
Then I tried a rank/percentile test.
Within a season, teams rank roughly 1-15 on R/G, so each spot in that ranking is worth about 6.7 percentile points. What I did was put all 208 team seasons from all 15 years into one aggregate ranking.
Then the OPS, RC, BRA and BA figures were each ranked the same way, given a percentile out of the 208 possible ranks, and stacked up against the 1-208 rank/percentile of that team's R/G figure.
For BRA (Batter's Run Average), 180 of the 208 teams finished within a 13.4 percentile of their R/G ranking in the given season. That means that the BRA percentile was within two (2) ranking spots, theoretically, in 180 of the 208 cases of where that team was ranked in the R/G category.
For OPS, this held true for 175 of the 208 teams being within 13.4 percentile or the theoretical 2 ranking spots per season. For RC, 172 of 208 entries passed this test.
So to make it clear... last year the Cincinnati Reds were No. 1 in runs per game at 5.03. Being within 13.4 percentile points would mean that in the OPS category, they finish no worse than 3rd. For OPS specifically, this test was passed in 175 of the 208 cases.
Batting average, by the way, passed this test only 133 of the 208 times.
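Here is a rough sketch of how I would write the rank/percentile test in code. It is one reading of the procedure described above: rank every team season on R/G and on the metric, convert both ranks to percentiles, and count the teams whose two percentiles are within 13.4 points. The function names and the way ties are broken (order of appearance) are my own assumptions.

```python
def percentile_ranks(values):
    """Percentile (0-100) of each value's rank, with the largest value ranked 1.

    Ties are broken by order of appearance (an assumption for this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    pct = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        pct[idx] = 100.0 * rank / n
    return pct

def count_within(metric, rpg, tolerance=13.4):
    """Count teams whose metric percentile is within `tolerance` points of their R/G percentile."""
    metric_pct = percentile_ranks(metric)
    rpg_pct = percentile_ranks(rpg)
    return sum(1 for m, r in zip(metric_pct, rpg_pct) if abs(m - r) <= tolerance)

# Hypothetical ten team seasons (invented numbers): R/G and OPS side by side.
rpg = [5.10, 4.95, 4.80, 4.66, 4.51, 4.40, 4.28, 4.15, 4.02, 3.90]
ops = [0.790, 0.770, 0.781, 0.745, 0.760, 0.731, 0.738, 0.705, 0.722, 0.700]

passed = count_within(ops, rpg)
print(f"{passed} of {len(rpg)} teams within 13.4 percentile points")
```

With 208 team seasons each rank is worth under half a percentile point, so the 13.4 window is what translates back to about two spots in a single season's ranking.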
After these tests, I played around with some numbers a little bit. Basically I took the Runs Created numbers and averaged them out per game. In most cases, the RC average was within 15 hundredths of a run of the actual R/G number. In many cases it was within a few hundredths. There were a few exceptions where the RC average was as much as 4 tenths off the R/G figure, but this was definitely rare.
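The per-game comparison is simple enough to sketch as well. This assumes each row holds a team's season runs created, actual runs and games played; the three rows are invented for illustration, and the 0.15 threshold is just the "15 hundredths" figure from above.

```python
def rc_gap_summary(rows, tight=0.15):
    """Compare runs created per game with actual runs per game.

    rows: iterable of (runs_created, runs_scored, games) tuples.
    Returns how many teams fall within `tight` runs per game, plus the largest gap.
    """
    gaps = [abs(rc / games - runs / games) for rc, runs, games in rows]
    return {
        "teams": len(gaps),
        "within_tight": sum(1 for g in gaps if g <= tight),
        "max_gap": max(gaps),
    }

# Hypothetical team seasons: (RC, R, G). Not real data.
rows = [
    (810.0, 820, 162),
    (700.0, 690, 162),
    (650.0, 715, 162),
]

summary = rc_gap_summary(rows)
print(f"{summary['within_tight']} of {summary['teams']} within 0.15 runs per game")
print(f"largest gap: {summary['max_gap']:.2f} runs per game")
```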
Just as a quick sample, I tried the correlation test for pitching in the same time frame. I was curious what the biggest indicators were.
To no one's surprise, WHIP had the highest correlation (.9545). Second, kind of surprisingly to me, was HR/game (.9452). Hits/game came in third (.8543) and K/BB ratio was fourth (.8153). Of the major pitching categories, BB/game came fifth.
So my thesis is that BRA, OPS and RC do appear to have much higher relevancy when predicting a team's run figure for a season. And in that line of thinking, I do conclude they are the best measurements for a baseball player in examining his hitting productivity.
As you know from the last few days, I have not so much questioned whether many on this site are right to believe that as questioned the confidence with which they claim these statistics are accurate. I still feel there is enough correlation with the lesser measurements that in a few cases there is a gray area, but I'm definitely more of a believer in these statistics today than I had been.
An interesting test I'd love to run, if I could find the numbers, is how BA w/RISP correlates with run production compared with OPS, RC and BRA. I'm interested to see if the "single handed metric" in these situations scores much higher than batting average as a whole. After all, BA in run-scoring situations directly impacts scoring, so I'd like to see that kind of comparison.
I remain skeptical about how they hold up against that particular measurement, but I will conclude on the whole that BRA and RC along with OPS are tremendous representations of what a team will do, provided the players meet their projections. And in that line of thinking, I would much rather have players who, as a whole, score well in these measurements.