Brutus

06-20-2011, 05:55 PM

A few weeks ago, a discussion centered on the Reds and their run distribution. At the time, some people (myself included) hypothesized that the Reds' offensive inconsistency was costing them some (theoretical) wins.

The foundation for that belief, at least for me, was in part a study (http://www.sloansportsconference.com/wp-content/uploads/2010/09/beyondpyth-paper.pdf) done last year by Kerry Whisnant, a physics and astronomy department head, which found that among teams with similar runs per game, a tighter run distribution generally led to more wins.

This idea makes sense, since runs per game is just an average over all runs scored during the season. Scoring 10 runs in each of two games but being held to one run in each of the next three means you averaged 4.6 runs per game, but very likely went only 2-3 in that five-game stretch. Spread those same 23 runs out more evenly, say 4, 5, 4, 5 and 5, and you give yourself a chance to win three, four or even all five games.
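To put numbers on that example, here is a minimal Python sketch comparing the two five-game stretches. It only uses the run totals quoted above; the variable names are mine, and I'm using the population standard deviation as the spread measure.

```python
from statistics import mean, pstdev

streaky = [10, 10, 1, 1, 1]  # boom-or-bust stretch from the example
steady = [4, 5, 4, 5, 5]     # same 23 runs, spread evenly

for label, runs in (("streaky", streaky), ("steady", steady)):
    # Same total and same average, very different spread
    print(f"{label}: total={sum(runs)}, mean={mean(runs):.1f}, sd={pstdev(runs):.2f}")

# streaky: total=23, mean=4.6, sd=4.41
# steady: total=23, mean=4.6, sd=0.49
```

Both stretches average 4.6 runs, so runs per game alone can't tell them apart. The standard deviation can.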

But the question was asked: were the Reds actually deficient in distribution? I wanted to do the study myself but, quite honestly, was too lazy. I knew that pasting results data from Baseball-Reference into a spreadsheet and writing a few formulas would take no more than an hour or two, but I still didn't have the energy. Fortunately, my curiosity won out.

I grabbed the data for all 30 teams for this season. If I were going for a more scientific sample, I'd go a few years back. But I really just wanted a snapshot of this particular Reds team, so this year will have to do. And let's be honest... it's a sample of 2,190 team-games league-wide (73 for the Reds), so it's not completely lacking in data.

I made this simple:

* First, I calculated the standard deviation of each team's runs scored per game. This is roughly the typical distance of a single game's run total from the team's average. If runs were normally distributed, about two-thirds of a team's games (roughly 68%) would fall within one standard deviation of its mean. Run totals are actually skewed, so treat that as a rule of thumb.

* Second, I compared each team's standard deviation to the league-wide average of the team standard deviations, and also to the standard deviation of all games pooled together, to find each team's standard score. Standard scores (also known as Z-scores) measure how many standard deviations above or below the league mean a team is.
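For anyone who would rather script those two steps than use a spreadsheet, here is a rough Python sketch. The function names and the three made-up teams are purely illustrative, not real data. It also assumes the standard score divides by the spread of the 30 team standard deviations, which is one common reading of the description above and not necessarily the exact spreadsheet formula.

```python
from statistics import mean, pstdev

def team_sds(runs_by_team):
    """Standard deviation of runs scored per game, for each team."""
    return {team: pstdev(runs) for team, runs in runs_by_team.items()}

def standard_scores(sds):
    """Z-score of each team's SD relative to the league's team SDs.

    Assumption: divide by the SD of the team SDs (hypothetical choice).
    """
    values = list(sds.values())
    center = mean(values)
    spread = pstdev(values)
    return {team: (sd - center) / spread for team, sd in sds.items()}

# Made-up game logs for three placeholder teams
runs_by_team = {
    "Team A": [4, 5, 4, 5, 5],
    "Team B": [10, 10, 1, 1, 1],
    "Team C": [3, 6, 2, 7, 5],
}

sds = team_sds(runs_by_team)
z = standard_scores(sds)
for team in sorted(z, key=z.get):
    print(f"{team}: sd={sds[team]:.2f}, z={z[team]:+.2f}")
```

A negative score means a tighter-than-average distribution. Dividing by a different constant (such as the pooled all-games standard deviation) would rescale every score but would not change the team ranking or any correlation computed from the scores.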

I found the Reds' standard deviation is 2.96 runs, which means roughly two-thirds of their games fall within 2.96 runs of their average (currently 4.78).

The league-wide mean of the team standard deviations is 2.89, and the standard deviation of all games pooled together is 2.92. Remember, according to the theory, the lower the number the better (meaning a tighter distribution). The Reds have the 12th-highest standard deviation of the 30 teams, meaning 18 teams have tighter (and theoretically better) distributions.

Basically, though, the Reds are close to average as far as distribution goes. So my conclusion is that they're not far from where they need to be, though all five other teams in the NL Central do have tighter distributions than they do.

But as an aside, I tried one more thing out of morbid curiosity: I tested each team's Pythagorean differential (wins above or below the expected win total) against its standard score.

To see if the theory held weight for this season, I ran a correlation of the differential (e.g. 1 game better than the Pythagorean win expectation would be a +1) against the standard score. I came up with a fairly solid .42. Squaring that gives about .18, which is another way of saying that about 18% of the variation in teams' Pythagorean luck can be explained simply by how tight their run distribution is.
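For reference, here is a hedged sketch of how the differential and the correlation could be computed in Python. It assumes the classic Pythagorean formula with an exponent of 2 (other exponents are in use, and this may differ from the one behind the numbers here), and the helper names are my own. Only the .42 at the bottom comes from the actual results.

```python
from math import sqrt

def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage (assumed exponent of 2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def pythag_differential(wins, games, runs_scored, runs_allowed):
    """Actual wins minus Pythagorean expected wins (+1 = one game 'lucky')."""
    return wins - games * pythag_win_pct(runs_scored, runs_allowed)

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

r = 0.42  # the correlation reported above
print(f"r^2 = {r ** 2:.3f}")  # r^2 = 0.176
```

The r-squared is where the "about 18%" comes from. With only 30 teams in the sample, a correlation of this size is suggestive, not conclusive.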

That doesn't seem like a ton, but the raw splits were even more intriguing.

Of the 15 teams with the widest run distributions (highest standard deviations), only 3 had POSITIVE Pythagorean differentials. That is, only three were out-performing their Pythagorean.

Of the 15 teams with the tightest run distributions (lowest standard deviations), 10 had positive Pythagorean differentials. That is, 10 of those 15 were out-performing their Pythagorean.
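The half-and-half split is easy to reproduce once each team has a standard deviation and a differential. A small sketch, with a hypothetical function name and four made-up teams standing in for the real 30:

```python
def count_positive_by_half(teams):
    """teams: list of (name, run_sd, pythag_diff) tuples.

    Returns (positives among the tighter half, positives among the wider half).
    """
    ranked = sorted(teams, key=lambda t: t[1])  # tightest distribution first
    half = len(ranked) // 2
    tight, wide = ranked[:half], ranked[half:]

    def positives(group):
        return sum(1 for _, _, diff in group if diff > 0)

    return positives(tight), positives(wide)

# Made-up placeholder values, not actual team figures
teams = [
    ("Team A", 2.60, 3),
    ("Team B", 2.75, 1),
    ("Team C", 3.05, -2),
    ("Team D", 3.20, 1),
]
print(count_positive_by_half(teams))  # (2, 1)
```

With the real data, the same split would give the 10-of-15 and 3-of-15 counts quoted above.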

The four lowest standard scores belong to San Francisco, Baltimore, Minnesota and Florida. While two of those teams are in last place, and Minnesota only recently climbed out of the cellar, they are beating their Pythag by 4, 2, 2 and 1 games, respectively. In fact, five of the six teams running 2-4 games above their Pythag are also among the 10 tightest run distributions.

So in summation, the Reds seem to be close to the middle of the pack in distribution. But while the theory doesn't seem to apply to them, the theory itself seems to have a little bit of merit.
