The Reds; Run scoring & run distribution



Brutus
06-20-2011, 05:55 PM
A few weeks ago, a discussion centered around the Reds and their run distribution. At the time, some people (myself included) hypothesized that the Reds' inconsistency offensively was costing them some (theoretical) wins.

The foundation for the belief, at least for me, was due in part to a study (http://www.sloansportsconference.com/wp-content/uploads/2010/09/beyondpyth-paper.pdf) done last year by Kerry Whisnant, head of the Department of Physics and Astronomy, which found that, among teams with similar runs per game, a tighter run distribution generally led to more wins.

This philosophy makes sense, as runs per game are a product of all runs scored over the season. However, scoring 10 runs in each of two games but being held to one run in the next three games means you averaged 4.6 runs per game, but very likely went only 2-3 over that 5-game stretch. The idea is that by spreading those runs out more evenly, say a distribution of 4, 5, 4, 5 and 5, you give yourself a chance to win probably three, four or even all five games.
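To put rough numbers on that example, here's a quick sketch in Python using a few of the league win percentages by runs scored that RedsManRick quotes later in this thread (the two sequences are the made-up ones from the paragraph above):

# Expected wins for two five-game sequences with the same total (23 runs),
# using a few win% values from the runs-scored table quoted later in this thread.
win_pct = {1: .077, 4: .471, 5: .593, 10: .921}

bunched = [10, 10, 1, 1, 1]   # 4.6 runs per game, but three near-certain losses
spread = [4, 5, 4, 5, 5]      # also 4.6 runs per game

for label, runs in (("bunched", bunched), ("spread", spread)):
    expected_wins = sum(win_pct[r] for r in runs)
    print(label, sum(runs) / len(runs), round(expected_wins, 2))
# bunched -> about 2.1 expected wins; spread -> about 2.7 expected wins from the same 23 runs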

But the question was asked: were the Reds actually deficient in distribution? I wanted to do the study myself, but quite honestly was too lazy to do it. I knew a simple copy/paste of results data from Baseball-Reference into a spreadsheet would mean the project wouldn't take more than an hour or two, since I could quickly create formulas to calculate what I needed, but nonetheless I didn't have the energy. Fortunately, my curiosity won out.

I grabbed the data for all 30 teams for this season. If I were going for a more scientific sample, I'd go a few years back, but I really just wanted a snapshot of this particular Reds team, so this year will have to do. And let's be honest... it was a sample of 2,190 team-games league-wide (73 for the Reds), so it's not completely lacking in data.

I made this simple:

* First, I calculated each team's standard deviation of runs scored per game. Standard deviation measures how spread out a team's games are around its average: roughly two-thirds (about 68%) of a team's games fall within one deviation above or below its mean, assuming a roughly bell-shaped distribution.

* Second, I compared each team's standard deviation to the league-wide average deviation, using the total standard deviation of all games as the scale, to find the team's standard score. Standard scores (also known as z-scores) are essentially a measurement of how many deviations above or below the league mean a team sits. (A rough sketch of both steps follows this list.)
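Roughly, in code form (a minimal sketch only -- the run logs here are made-up placeholders rather than the actual 2011 data, and the choice of scale in step 2 is one reading of the description above):

import statistics

# Each team's runs scored per game; placeholder numbers, not real game logs.
runs_by_team = {
    "CIN": [4, 0, 7, 2, 5, 10, 3, 6],
    "STL": [3, 4, 5, 4, 6, 2, 4, 5],
    "MIL": [1, 8, 2, 9, 0, 6, 3, 7],
}

# Step 1: each team's standard deviation of runs scored per game.
sd_by_team = {team: statistics.pstdev(runs) for team, runs in runs_by_team.items()}

# Step 2: standard (z-) score -- how many deviations each team's spread sits above
# or below the league-average spread, scaled by the deviation of all games pooled.
league_mean_sd = statistics.mean(sd_by_team.values())
all_games_sd = statistics.pstdev([r for runs in runs_by_team.values() for r in runs])

z_by_team = {team: (sd - league_mean_sd) / all_games_sd for team, sd in sd_by_team.items()}
# A negative z-score means a tighter-than-average distribution.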

I found the Reds' standard deviation is 2.96 runs, which means about two-thirds of their games fall within +/- 2.96 runs of their average (currently 4.78 runs per game).

The league-wide mean is a deviation of 2.89, with the total standard deviation of all games being 2.92. Remember, according to the studied theory, the lower the number the better (meaning a tighter distribution). The Reds have the 12th-largest deviation of the 30 teams, meaning 18 teams have smaller (and theoretically better) distributions.

Basically, though, the Reds are close to average as far as distribution goes. So my conclusion is that they're not far removed from being where they need to be, though all five other teams in the NL Central do have tighter distributions than they do.

But as an aside, I tried one more thing out of morbid curiosity: I tested each team's Pythagorean differential (wins above or below the expected win total) against its standard score.

To see if the theory held weight for this season, I ran a correlation of the differential (i.e. 1 game better than the Pythagorean win expectation would be a +1) against the standard score. I came up with a very solid .42; squaring that (r^2 ≈ .18) is another way of saying that about 18% of the variation in teams' "luck" can be explained simply by how tight their run distribution is.
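For the curious, that check might look something like this in code -- the team rows are invented for illustration, and the Pythagorean expectation here uses the basic exponent of 2:

import statistics

# (wins, losses, runs scored, runs allowed, standard score of run distribution) -- made-up rows.
teams = [
    (40, 33, 349, 324, -0.6),
    (38, 35, 330, 318, -0.2),
    (35, 38, 310, 335, 0.4),
    (33, 40, 295, 330, 0.9),
]

def pythag_differential(wins, losses, rs, ra):
    games = wins + losses
    expected_wins = games * rs**2 / (rs**2 + ra**2)  # basic Pythagorean expectation, exponent 2
    return wins - expected_wins                      # +1 means one game better than expected

differentials = [pythag_differential(w, l, rs, ra) for w, l, rs, ra, _ in teams]
z_scores = [z for *_, z in teams]

r = statistics.correlation(differentials, z_scores)  # Pearson r (Python 3.10+)
print(round(r, 2), round(r * r, 2))                  # r, and the share of variance it explains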

That doesn't seem like a ton, but empirically, the results were even more intriguing.

Of the 15 teams with the widest run distributions, only 3 had POSITIVE Pythagorean differentials. That's to say only three of those teams were out-performing their Pythagorean expectation.

Of the 15 teams with the tightest run distributions, 10 had positive Pythagorean differentials, or rather 10 of those 15 were out-performing their Pythagorean expectation.

The four lowest standard scores belong to San Francisco, Baltimore, Minnesota and Florida. While two of those teams are in last place, with Minnesota only recently climbing out of the cellar, those four are out-performing their Pythagorean expectations by 4, 2, 2 and 1 games, respectively. In fact, five of the six teams sitting 2-4 games above their Pythag are also in the top 10 for tightest run distribution.

So in summation, the Reds seem to be close to the middle of the pack in distribution. But while the theory doesn't seem to apply to them, the theory itself seems to have a little bit of merit.

Kc61
06-20-2011, 06:06 PM
If 18 teams have better run distributions than the Reds, including all five other NL Central teams, why is that acceptable?

As I read it, your analysis seems to show that the Reds are deficient in offensive consistency.

kaldaniels
06-20-2011, 06:11 PM
The first question that needs to be answered is "What is ideal run distribution?"

Brutus
06-20-2011, 06:16 PM
If 18 teams have better run distributions than the Reds, including all five other NL Central teams, why is that acceptable?

As I read it, your analysis seems to show that the Reds are deficient in offensive consistency.

I prefer they be near the top, so I don't want to suggest that's a good thing. I just mean they're hovering near league average.

I suppose Kal's question is appropriate... what is an ideal distribution? If the theory is true that a tighter distribution leads to more wins, then in principle a deviation of practically zero would be ideal.

RedsManRick
06-20-2011, 06:23 PM
The real takeaway from a more comprehensive study would likely be this: Scoring more runs, period, is a much more important factor in winning baseball games than is the manner in which said runs are distributed.

It's sort of like debating where the guy with the .300 OBP should hit. If the goal is to score more runs, stop worrying about batting order and find somebody who can get on base.

But this points to another issue: Is the distribution of runs something a team can control? If so, how? Like the example above, is trying to control it missing the forest for the trees? And if not, why do we care, other than perhaps as a tool to help explain differences between actual W-L records and the pythag expectation?

If I had to guess, teams with high variation in runs scored have two basic factors:
1) They score more runs (given a fixed lower bound of zero, this gives them more opportunity for variance)
2) They have a greater range of talent within their lineup. (a lineup full of studs or scrubs will have less variance than one with a mix)

pahster
06-20-2011, 06:24 PM
Sounds like none of the teams' RPG are significantly (p ≤ .05) different from one another. Given that this is most likely true (at least for most of the sample), it seems as if we don't really have enough data from this season yet to draw any conclusions about what effect, if any, the distribution of runs scored per game has on the probability of winning.

Here's a suspicion I have: I bet teams that score fewer runs experience less variation in their RPG.

Brutus
06-20-2011, 06:28 PM
Here's a suspicion I have: I bet teams that score fewer runs experience less variation in their RPG.

Well, that has to be true to some extent, because there's a floor on how few runs a team can score -- they can't score fewer than zero -- but there's no ceiling on the high end. A team averaging 4.0 runs per game only has to score 9 runs in a game to be farther above its mean than a shutout would put it below.

So yes, it's true that teams with a lower RPG are naturally going to have a built-in bias toward a smaller deviation, because there's a floor on the distribution in one of the two directions.
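A quick simulation makes the floor effect visible. It assumes, purely for illustration, that runs per game behave roughly like a Poisson process, which is not a claim about real MLB scoring:

import numpy as np

# Simulated per-game runs for teams with different scoring averages; the point is just
# that the spread grows with the average when the distribution is floored at zero.
rng = np.random.default_rng(0)

for mean_rpg in (3.5, 4.0, 4.5, 5.0):
    runs = rng.poisson(mean_rpg, size=100_000)
    print(mean_rpg, round(runs.std(), 2))
# For a Poisson-like process the SD is roughly sqrt(mean): ~1.87 at 3.5 RPG vs ~2.24 at 5.0 RPG.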

Brutus
06-20-2011, 06:34 PM
The real takeaway from a more comprehensive study would likely be this: Scoring more runs, period, is a much more important factor in winning baseball games than is the manner in which said runs are distributed.

It's sort of like debating where the guy with the .300 OBP should hit. If the goal is to score more runs, stop worrying about batting order and find somebody who can get on base.

But this points to another issue: Is the distribution of runs something a team can control? If so, how? Like the example above, is trying to control it missing the forest for the trees? And if not, why do we care, other than perhaps as a tool to help explain differences between actual W-L records and the pythag expectation?

If I had to guess, teams with high variation in runs scored have two basic factors:
1) They score more runs (given a fixed lower bound of zero, this gives them more opportunity for variance)
2) They have a greater range of talent within their lineup. (a lineup full of studs or scrubs will have less variance than one with a mix)

I definitely agree it's more important, but can't both be important in theory? It doesn't have to be one or the other. There are a lot of factors involved in winning baseball games, and ideally, you'd like to address all of them if you could.

That is not to say we know with any real certainty you can address distribution or find players that are more consistent. To my knowledge, no one has tackled the issue of distribution or consistency with players themselves (though I actually think that would be a neat study). But if we know that mere distribution can add a few more wins a year, that can sometimes make the difference between winning a World Series and sitting at home in October.

I'm not trying to insinuate that run distribution is a monumental factor, by the way. I don't want to overplay its significance. But I think we have some evidence to suggest it's important; the questions are how much so, and whether it can be knowingly addressed by a team wishing to improve its consistency.

I would like to see a study grouping teams with like runs scored & runs allowed over the course of several years and then compare their winning percentages with their run distribution. That would obviously be a much more conclusive study about the effect (though not necessarily the causes).
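A rough sketch of how that grouping study might be set up -- the team-season rows and column names here are made up, standing in for the real multi-year data:

import pandas as pd

# Hypothetical team-seasons: runs scored, runs allowed, per-game SD of runs, winning pct.
seasons = pd.DataFrame({
    "RS":      [720, 735, 710, 905, 890, 910, 640, 655, 630],
    "RA":      [700, 705, 715, 880, 900, 895, 660, 650, 645],
    "run_sd":  [2.7, 3.1, 2.9, 3.4, 3.0, 3.6, 2.5, 2.8, 2.4],
    "win_pct": [.525, .500, .494, .540, .556, .530, .470, .488, .481],
})

# Bucket seasons by similar scoring and run prevention (coarse bins for the tiny example).
seasons["rs_bin"] = pd.cut(seasons["RS"], bins=3)
seasons["ra_bin"] = pd.cut(seasons["RA"], bins=3)

# Within each bucket, a negative correlation would mean tighter distributions go with more wins.
within_bucket = (seasons.groupby(["rs_bin", "ra_bin"], observed=True)
                        .apply(lambda g: g["run_sd"].corr(g["win_pct"])))
print(within_bucket.dropna())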

kaldaniels
06-20-2011, 06:37 PM
I have a feeling run distribution is one of those things that is helpful looking back, when asking why team X wound up with won-loss record Y. But projecting run distribution going forward, I would hypothesize, is a complex task with fruitless results.

RedsManRick
06-20-2011, 07:52 PM
I definitely agree it's more important, but can't both be important in theory? It doesn't have to be one or the other. There are a lot of factors involved in winning baseball games, and ideally, you'd like to address all of them if you could.

That is not to say we know with any real certainty you can address distribution or find players that are more consistent. To my knowledge, no one has tackled the issue of distribution or consistency with players themselves (though I actually think that would be a neat study). But if we know that mere distribution can add a few more wins a year, that can sometimes make the difference between winning a World Series and sitting at home in October.

I'm not trying to insinuate that run distribution is a monumental factor, by the way. I don't want to overplay its significance. But I think we have some evidence to suggest it's important; the questions are how much so, and whether it can be knowingly addressed by a team wishing to improve its consistency.

I would like to see a study grouping teams with like runs scored & runs allowed over the course of several years and then compare their winning percentages with their run distribution. That would obviously be a much more conclusive study about the effect (though not necessarily the causes).

I agree with you in theory. But as you start to describe, this isn't just one study. It's a series of studies.

- Is there an effect? (preliminary analysis suggests yes)
- What is the size of the effect?
- What causes teams to have different distributions?
- Can that thing be controlled?
- If so, can it be controlled without negatively impacting other factors?

I'll take a look tonight and see what I can contribute. But if I had to guess I'd say our conclusion will be this: Yes, run distribution has a measurable effect. However, there is nothing a team can do to control its run distribution that doesn't have a greater effect on other factors. And further, I would guess that the biggest portion of the effect is tied up in simply not getting shut out. I'd also guess that runs-scored distributions are pretty strongly skewed, meaning your standard significance tests don't quite work.
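On the skew guess, a quick check is simple to run -- the game log below is a made-up placeholder, not a real team's data:

import numpy as np

# Sample skewness of a per-game run log; a positive value means a long right tail
# (occasional blowouts pulling the distribution away from the floor at zero).
run_log = np.array([0, 2, 3, 1, 7, 4, 10, 2, 5, 0, 6, 3, 1, 8, 4], dtype=float)

def skewness(x):
    m, s = x.mean(), x.std()
    return ((x - m) ** 3).mean() / s ** 3

print(round(skewness(run_log), 2))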

Brutus
06-20-2011, 08:00 PM
I agree with you in theory. But as you start to describe, this isn't just one study. It's a series of studies.

- Is there an effect? (preliminary analysis suggests yes)
- What is the size of the effect?
- What causes teams to have different distributions?
- Can that thing be controlled?
- If so, can it be controlled without negatively impacting other factors?

I'll take a look tonight and see what I can contribute.

I agree with this synopsis.

It would require first establishing that players have an ability to be more consistent. To establish that, we'd have to differentiate the ones who are consistent from the ones who are not, and then we'd have to show a predictive ability to know which ones are likely to sustain that consistency. Then we'd have to regress that within yearly projections along with the other factors (whether it's WAR or other variations of the projection systems).

mth123
06-20-2011, 08:46 PM
Sounds like none of the teams' RPG are significantly (p ≤ .05) different from one another. Given that this is most likely true (at least for most of the sample), it seems as if we don't really have enough data from this season yet to draw any conclusions about what effect, if any, the distribution of runs scored per game has on the probability of winning.

Here's a suspicion I have: I bet teams that score fewer runs experience less variation in their RPG.

Logical suspicion. There isn't an upper limit, so things would naturally tend to be more spread out as they get bigger. There is a clear lower limit, and as teams approach it, there just isn't room for separation.

RedsManRick
06-20-2011, 09:05 PM
This was actually tackled here on RZ (http://www.redszone.com/forums/showthread.php?t=68279) a few years back.


When you consider that each additional run beyond the mean has a decreasing marginal win value and that there's a lower limit, it naturally follows that given a fixed total number of runs scored (fixed mean), less variation is better. That is, if you can trade runs above the mean for runs below the mean (e.g. trade a 2 & 6 for two 4s), you do it. In that example, you'd go from an average .447 winning percentage over those two games (given the data below) to a .471. Over a full season, that's 4 games worth of difference. Now, that's the difference between a team with 0 variance and one with quite a bit of variance.




Runs   Win%    Marginal Gain
 0     .000    .000
 1     .077    .077
 2     .208    .131
 3     .339    .131
 4     .471    .132
 5     .593    .122
 6     .686    .092
 7     .776    .090
 8     .840    .064
 9     .874    .034
10     .921    .047
11     .939    .018
12     .963    .025
13     .987    .024
14     .978    -.009
15     .976    -.001
16     .983    .007
17    1.000    .017

In terms of winning ballgames, the second through the fifth runs have the most impact, followed by the sixth and seventh runs, and then the first run.
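Checking the 2-and-6 for two-4s trade described above against the quoted table:

# Win% values taken from the runs-scored table quoted above.
win_pct = {2: .208, 4: .471, 6: .686}

before = (win_pct[2] + win_pct[6]) / 2   # one 2-run game and one 6-run game -> .447 on average
after = win_pct[4]                       # two 4-run games instead -> .471
print(round(before, 3), round(after, 3))
print(round((after - before) * 162, 1))  # roughly 4 extra wins over a 162-game season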

The implication is clear: for a given average, the lower the variance of runs scored per game, the better record your team will likely have. To illustrate this another way, if the average runs per game were 5 for the NL in ’08, then the average total runs scored per team would be 810 (5 x 162). If the Reds managed to average 10 runs a game and score 1,620 runs in ’08, you would expect them to be world beaters with an extremely high winning percentage. If the Reds scored 0 runs in 81 games and 20 runs in 81 games, however, they would have averaged 10 runs a game but could at best finish .500. This is why the distribution is so important.

Now, obviously, it would be best if a team could manage to score just one run more than the opposition in every game, and sometimes 10 or more runs would be needed to accomplish this, but the fact remains that if a team were able to always score its average, the benefit of avoiding those low-scoring games outweighs the marginal gains of the runs scored beyond seven.