I feel dirty (I tried offensive correlation)

04-21-2006, 08:18 AM
Yes, as promised on another thread, I entered offensive statistics into a spreadsheet and ran a correlation test.

(I did it overnight and got it done in a couple of hours)

Anyways, first I'll explain what I did and then I'll tell you my personal findings.

I considered entering all the information from modern-day baseball, but I decided the game's offensive explosion really started within the last 15 years or so, and that offensive dynamics have changed since then with new ballparks and expansion.

So I made my cutoff 1991.

I also considered doing both the AL and NL, but with pitchers batting, more sacrifices, double switches, bunting, steals, etc., I decided I would just use the National League for more consistency.

So what I did was enter every team's offensive statistics on its own line from 1991 to 2005 in the National League. It was complete with the routine information you'd expect (R/G, AB, R, H, 2B, 3B, HR, BB, SO, OBP, SLG).

It was sorted by year and then by runs per game. I added columns so that OPS, Batter's Run Average, total average and runs created were also calculated for each team and each season. Altogether there were 208 teams entered into this database.

First, I did a correlation test from Runs per game by each team to every major offensive category, including the ones I added (OPS, TA, BRA and RC).

I admit, I was somewhat overwhelmed with the result.

Of the major indicators you would figure correlate to a team's total runs scored for the season, in these 15 years of National League data, Batter's Run Average (OBP*SLG) was the category that most correlated to runs per game. It had a .9402 correlation.

Second, OPS correlated in this data at .9358 and runs created at .9279. For clarification purposes, I decided to use the modified version of runs created involving steals: (H+BB-CS)*(TB+(.55*SB))/(AB+BB). I did this because I feel a stolen base threat should be included, as swiping bases obviously influences the possibility of runs.
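For anyone who wants to plug in their own numbers, here is a rough sketch of that stolen-base version of runs created. It assumes the whole of AB+BB is the denominator, and the season totals in the example are made up for illustration, not taken from the spreadsheet.

```python
def runs_created_sb(h, bb, cs, tb, sb, ab):
    """Stolen-base runs created: (H + BB - CS) * (TB + 0.55*SB) / (AB + BB)."""
    opportunities = ab + bb  # assumed denominator: at-bats plus walks
    if opportunities == 0:
        raise ValueError("AB + BB must be greater than zero")
    on_base = h + bb - cs          # times on base, minus runners erased stealing
    advancement = tb + 0.55 * sb   # total bases, with partial credit for steals
    return on_base * advancement / opportunities


# Made-up team season totals (hypothetical, for illustration only).
rc = runs_created_sb(h=1450, bb=550, cs=40, tb=2300, sb=100, ab=5500)
print(f"Estimated runs created: {rc:.1f}")
```

The function name and argument names are just placeholders; any spreadsheet formula with the same terms does the same job.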

There was a brief drop off from these three categories, then came SLG (.8903), OBP (.8860), BA (.8292) and TA (.7820).
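If you'd rather not use a spreadsheet, the correlation test itself is easy to sketch in a few lines. This is only an illustration: it assumes a plain Pearson correlation (which is what a spreadsheet's correlation function normally computes), and the five team seasons below are invented, not the real 1991-2005 data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def batters_run_average(obp, slg):
    """Batter's Run Average as described in the post: OBP * SLG."""
    return obp * slg


# Invented team seasons (hypothetical): (OBP, SLG, runs per game).
sample = [
    (0.310, 0.380, 3.9),
    (0.325, 0.400, 4.3),
    (0.335, 0.420, 4.6),
    (0.345, 0.440, 5.0),
    (0.355, 0.455, 5.3),
]

bra = [batters_run_average(o, s) for o, s, _ in sample]
ops = [o + s for o, s, _ in sample]
rpg = [r for _, _, r in sample]

print("BRA vs R/G:", round(pearson(bra, rpg), 4))
print("OPS vs R/G:", round(pearson(ops, rpg), 4))
```

With real data you would have one row per team season and repeat the `pearson` call for each column you want to compare against R/G.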

Just based on this test and looking at the numbers, I felt runs created overall was the best measurement. Just a quick scan of the runs created numbers compared to runs scored, and the results were alarming. It was quite shocking to me how close the two numbers were.

Then I tried a rank/percentile test.

Based on R/G, teams were ranked 1-15 obviously. Each spot in that ranking is worth about a 6.7 percentile. What I did is put all 208 teams in an aggregate ranking for their individual season's data in all 15 years.

Then, all the OPS, RC, BRA and BA information were ranked and given their percentile out of 208 possible ranks and stacked up against the 1-208 rank/percentile of the R/G figure for each season.

For BRA (Batter's Run Average), 180 of the 208 teams finished within a 13.4 percentile of their R/G ranking in the given season. That means that the BRA percentile was within two (2) ranking spots, theoretically, in 180 of the 208 cases of where that team was ranked in the R/G category.

For OPS, this held true for 175 of the 208 teams being within 13.4 percentile or the theoretical 2 ranking spots per season. For RC, 172 of 208 entries passed this test.

So to make it clear... last year the Cincinnati Reds were No. 1 in runs per game at 5.03. Being within the 13.4 percentile would mean that in the OPS category, they finished no worse than 3rd. So for OPS specifically, this test was passed in 175 of the 208 cases.

Batting average, by the way, passed this test only 133 of the 208 times.
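The rank/percentile test can be sketched roughly like this. It is one reading of the description above, with some assumptions: every team season goes into a single pool, rank 1 is the highest value, a rank is turned into a percentile by dividing by the pool size, and ties are broken by input order. The five-team pool is made up.

```python
def percentile_ranks(values):
    """Rank values from highest (rank 1) to lowest and express each rank
    as a percentile of the pool size. Ties are broken by input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    pct = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        pct[idx] = 100.0 * rank / n
    return pct


def count_within(metric, rpg, tolerance=13.4):
    """Count team seasons whose metric percentile is within `tolerance`
    percentile points of their runs-per-game percentile."""
    metric_pct = percentile_ranks(metric)
    rpg_pct = percentile_ranks(rpg)
    return sum(1 for m, r in zip(metric_pct, rpg_pct) if abs(m - r) <= tolerance)


# Made-up pool of five team seasons (hypothetical OPS and R/G values).
ops = [0.700, 0.750, 0.800, 0.720, 0.780]
rpg = [4.0, 4.6, 5.1, 4.7, 4.8]

passed = count_within(ops, rpg)
print(f"{passed} of {len(ops)} team seasons within 13.4 percentile points")
```

The 13.4 default is the "two ranking spots" from a 15-team season (2 x 6.7); change `tolerance` to try a stricter or looser cutoff.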

After these tests, I played around with some numbers a little bit. Basically I took the Runs Created numbers and averaged them out per game. In most cases, the RC average was within 15 hundredths of a run of the actual R/G number. In many cases it was within a few hundredths. There were a few exceptions where the RC average was as much as 4 tenths off the R/G figure, but this was definitely rare.
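The per-game comparison is just simple arithmetic, sketched below. The season totals are invented and the function name is only a placeholder.

```python
def rc_gap_per_game(runs_created, runs_scored, games):
    """Return (runs created per game) minus (actual runs per game)."""
    if games <= 0:
        raise ValueError("games must be positive")
    return runs_created / games - runs_scored / games


# Made-up team season (hypothetical totals, not from the real data set).
gap = rc_gap_per_game(runs_created=760.0, runs_scored=745.0, games=162)
print(f"RC per game is off by {gap:+.2f} runs")
```

A gap whose absolute value is under 0.15 would fall in the "within 15 hundredths" group described above.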

Just as a quick sample, I tried the correlation test for pitching in the same time frame. I was curious what the biggest indicators were.

To no one's surprise, WHIP was the highest correlation (.9545). Second, kind of surprisingly to me was HR/game (.9452). Hits/game came in at (.8543) and fourth was K/BB ratio (.8153). Of the major pitching categories, BB/game came 5th.

So my thesis is that BRA, OPS and RC do appear to have much higher relevancy when predicting a team's run figure for a season. And in that line of thinking, I do conclude they are the best measurements for a baseball player in examining his hitting productivity.

As you know from the last few days, I have not necessarily questioned the accuracy of the many on this site who believe that, but rather the level of confidence with which they can claim these statistics are accurate. I still feel there is enough correlation with the lesser measurements that in a few cases there is a gray area, but I'm definitely more of a believer in these statistics today than I had been.

An interesting test I'd love to run, if I could find the numbers, is the correlation of run production with BA w/RISP versus its correlation with OPS, RC and BRA. I'm interested to see whether the "single handed metric" in these situations scores much higher than batting average as a whole. After all, BA in run situations directly impacts scoring, so I'm interested to see that kind of comparison.

I remain skeptical about the significance against that particular measurement, but I will conclude on the whole that BRA and RC along with OPS are tremendous representations of what a team will do provided they meet their projections. And in that line of thinking, I would much rather have players that as a whole, score well in these measurements.

04-21-2006, 08:25 AM
Nice info. RedsZoners will eat this up like a fat boy to fudge.

04-21-2006, 09:17 AM
There ya go! Glad you ran the numbers yourself.

Just think about it like this ...

OPS ---> Merely the sum of two factors, OBP and SLG.

OBP ---> The rate at which a batter avoids making an out.
SLG ---> The rate at which a batter acquires bases.

That's the roots of offensive run production, avoiding outs and acquiring bases. Since OPS combines those two factors, it becomes the best short-hand metric to use. The longer methods such as Runs Created become even more accurate and precise, but the foundation of Runs Created is the same: avoiding outs and acquiring bases.
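To put numbers on that, here is a small sketch using the standard definitions of OBP and SLG (the post itself only describes them in words, so the exact formulas are filled in here). The stat line is made up.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: how often the batter avoids making an out."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(tb, ab):
    """Slugging percentage: total bases acquired per at-bat."""
    return tb / ab


def ops(h, bb, hbp, ab, sf, tb):
    """OPS is simply the sum of the two rates."""
    return obp(h, bb, hbp, ab, sf) + slg(tb, ab)


# Made-up stat line (hypothetical), for illustration only.
line = dict(h=150, bb=60, hbp=5, ab=500, sf=5, tb=250)
print(f"OPS: {ops(**line):.3f}")
```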

So now when you see this chart, it all makes sense:


IIRC, the correlation of BA w/RISP to actual run scoring is somewhere in the neighborhood of 0.600, which is far lower than the correlation of the other metrics.

04-21-2006, 09:34 AM
Yeah, Buckaholic, it's pretty interesting and more personal when you run the numbers yourself, and it helps you understand what these things are.




Johnny Footstool
04-21-2006, 10:14 AM
I applaud you for not simply believing what you read or what people told you.

We may still disagree about some things, but you've proven that you're motivated to find the truth for yourself and you're willing to reformulate your ideas and concepts based on what you learn. Not many people do that.

Chip R
04-21-2006, 10:33 AM
I was told there would be no math here. ;)

04-21-2006, 10:39 AM

Nerd alert!


04-21-2006, 10:41 AM
This is awesome. I am glad someone finally "proved" that all these stats are as valuable as some claim.

04-21-2006, 02:25 PM
I applaud you for not simply believing what you read or what people told you.

We may still disagree about some things, but you've proven that you're motivated to find the truth for yourself and you're willing to reformulate your ideas and concepts based on what you learn. Not many people do that.

Well I wasn't lying when I said I would not be narrow-minded about the possibilities. I stressed from the beginning that it's not that I was against the concept, I merely felt there can't be certainties in measuring for these results.

I guess what really sold me was looking at the runs created number and seeing how close they were to the runs scored number in almost every case. That was where I determined that these stats had the most meaning even beyond the successful correlation percentages.

Like MUgrad, we're still going to disagree slightly, perhaps, on the interpretation of some of the stats. I'm FULLY convinced they're the most telling statistics, so you won't hear an argument from me to the contrary.

But I do feel baseball has a lot of human element to the game and room for the little things. And as I said in the other debates, I still feel there are a lot of things done "when making an out" that can consistently and greatly enhance a team's chances of winning. Perhaps these measurements supersede those things, but I really feel it's still something that shouldn't be ignored.

I also still believe that in the specific case of Adam Dunn, the batting average correlation, although lower, is still high enough it can be determined with RISP that he's leaving enough runners stranded - but, I also have budged enough to realize his overall value is higher because of the other measurements.

There are probably small philosophical idiosyncrasies where I'll find myself disagreeing and agreeing with some of the other people on this argument, but I did owe it to myself and to everyone here that I was debating to check these numbers. I did, and I'll be the first to admit I have a much different perception of them than I did before yesterday.

04-21-2006, 02:33 PM
I also still believe that in the specific case of Adam Dunn, the batting average correlation, although lower, is still high enough it can be determined with RISP that he's leaving enough runners stranded - but, I also have budged enough to realize his overall value is higher because of the other measurements.


Check it out. You can go to ESPN.com and find Dunn's Runs Created w/RISP.

They have a great sorting function.

04-21-2006, 02:38 PM
Thanks, I'll check that out Raisor.

04-21-2006, 02:45 PM
Great thread. I think one of the major frustrations of "stathead" wanna-be GMs and managers is that the proper weight is not given to the various skill sets by many people.

Additionally frustrating is that often the stats guys are pigeonholed into this "stolen bases and defense don't matter" corner when in fact it's just a function of how much relative value they have. This manifests itself in a player like Dunn, whose incredible power and plate discipline more than make up for his strikeouts and poor defense in terms of overall value. The anti-stats group will often spend a lot of time focusing on negative aspects of the player's skill set, while virtually overlooking the less heralded, but more valuable, positive skills.

The inverse is also true. A guy like Tony Womack may be lauded for his speed and versatility when at the end of the day, his near complete inability to avoid outs completely dominates those positives. He could steal 100 bases and if he had a .290 OBP, Kevin Youkilis would still be more valuable. Yet many would consider Womack an all-star.

An interesting question to pose for various media members... If Adam Dunn reduced the number of strikeouts AND walks by 50%, would he be more or less valuable? I can guess who would answer what...

04-21-2006, 02:46 PM
I was told there would be no math here. ;)
Break out the pocket calculators.

04-21-2006, 02:49 PM
Thanks, I'll check that out Raisor.

Here's a link:


It's based on a min of 100 PA's w/RISP last season.

Dunn ranks 7th in the NL in RC/27 (which essentially equalizes PA's).

He was 9th in the NL in gross RC's w/RISP, while finishing 15th in total PA's w/RISP.

04-21-2006, 02:50 PM
Nice work, Buck.

One of the things "stat guys" like me tend to do is focus on the big picture and try to evaluate the total ballplayer. Little things matter, but there's a reason they're called little things and not big things. Nonetheless, there are little things, and one can believe Dunn is the most productive player on the team while not necessarily making him the first choice to be at the plate in every situation.

One of the more interesting article series I read on Baseball Prospectus last year had to do with "one-run value yield" -- essentially, an analysis of when to switch from macro to micro view, when the typical strategies of maximizing run-scoring potential should yield to the need to get across a single run. And I'll agree, in a pure littleball situation where the runner on third is the ballgame, Dunn is not the ideal guy to have at the plate. His contact rate is low and the walks he draws may not be of real benefit in that situation, depending on what bases are empty. Oddly enough, that's a situation where he's probably going to be pitched around anyway. But the situations where his walks and power add no extra value are rare enough that they're not really worth worrying about in the big picture.

04-21-2006, 04:49 PM
Break out the pocket calculators.

Mine's a SHARP elsi mate EL-233S

Just sayin.

04-21-2006, 04:58 PM
Mine's a SHARP elsi mate EL-233S

Just sayin.

I like my TI-85, but having MATLAB on my computer kicks butt.
Just saying. ;)

04-21-2006, 05:38 PM
Mine's a SHARP elsi mate EL-233S

Just sayin.
Calculator of champions

04-21-2006, 05:43 PM

04-21-2006, 06:03 PM
If a key to creating runs is avoiding outs, would not creating outs for your opponent (defense) be equally important in winning a baseball game?

04-21-2006, 06:07 PM
Of course it is. No one has ever argued otherwise. I think I know what you're getting at, but the difference between good hitters and bad hitters as it relates to outs avoided is significantly larger than the difference between good fielders and bad fielders and the number of extra outs given to the other team.