BernieCarbo

09-19-2014, 10:31 AM

I found one particular thread in the main forum pretty interesting (Would you move Lorenzen to the bullpen), and since I have a wealth of information in a database and a method of extracting it quite easily, I thought I’d build on it and clear up some misconceptions with real data. I won’t mention specific posters, and I don’t care who said what or who won an argument. I am in the data analysis business, and I just like to take raw data and let it tell the story.

Although my data encompasses every major league game (long story, but it includes all the Retrosheet data as well as data from other sources, with the structure massaged quite a bit to make extraction easier), for this exercise I used only games from 1920-2013 (a total of 161,000 games) and excluded tied, rain-shortened, forfeited, and extra-inning games. Of course these games have some meaning, and I also ran a separate analysis that included them, but it barely changed the percentages, and it didn’t make sense to compare the runs scored in the 1st with the runs in the 9th if the 9th didn’t even exist. Extra-inning games are special cases as well, and there were some oddities I didn’t expect when I ran some numbers against them (a discussion for another time).
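For anyone who wants to reproduce the filtering on their own data, here is a minimal sketch of the exclusion rules described above. The `Game` record and its field names are assumptions for illustration only; the actual database layout isn't described in this post.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical layout; the real database schema is not shown in this post.
    year: int
    innings: int          # innings actually played
    away_runs: int
    home_runs: int
    forfeited: bool = False


def is_eligible(g: Game) -> bool:
    """True for decided, non-forfeited, nine-inning games from 1920-2013."""
    return (
        1920 <= g.year <= 2013
        and g.innings == 9                # drops rain-shortened and extra-inning games
        and g.away_runs != g.home_runs    # drops ties
        and not g.forfeited
    )


# Made-up rows, only to exercise the filter.
sample = [
    Game(1975, 9, 3, 2),                  # kept
    Game(1975, 12, 7, 6),                 # extra innings
    Game(1919, 9, 5, 1),                  # before 1920
    Game(1988, 6, 1, 0),                  # shortened
    Game(1960, 9, 4, 4),                  # tie
    Game(1995, 9, 9, 0, forfeited=True),  # forfeit
]
eligible = [g for g in sample if is_eligible(g)]
print(len(eligible))
```

Only the first made-up row survives the filter; every other row trips exactly one of the exclusions.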

Anyway, here are some numbers from a much larger sample set. If someone wants me to run the numbers differently or use specific circumstances, I could do that. One estimate posted in the thread:

What I can give you is a very rough estimate of what the odds are of a team winning that scores 1-4 runs in the first vs. the odds of a team winning that scores 1-4 runs in the 9th.

Teams that score 1-4 runs in the Top of 1 win 75.1% of their games.

Teams that score 1-4 runs in the Top of 9 win 87.6% of their games.

These are very rough, based on just one week of data, which really isn't enough to get the right frequencies of each situation. But it's the best that I can do.

Actually, this shows the danger of a small sample size. I don’t know if other qualifiers were used in this example, but taken literally as it is stated, the records turn out nearly identical:

Teams that score 1-4 runs in the Top of 1 win 59.1% of their games (23823-16485).

Teams that score 1-4 runs in the Top of 9 win 59.6% of their games (21472-14529).

I’d expect the number of games in which a team scores in the 1st to exceed the number in which it scores in the 9th, since the top of the order always bats in the 1st (unless there is a manager predisposed to putting a scrappy little fireplug banjo hitter in the two hole year after year), and it does. Still, the percentages are quite similar.
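As a quick check, the percentages and game counts can be recomputed straight from the two won-lost records quoted above. This is just arithmetic on the posted figures; the `win_pct` helper is a name made up for this sketch.

```python
def win_pct(wins: int, losses: int) -> float:
    """Winning percentage (0-100) from a won-lost record."""
    return 100.0 * wins / (wins + losses)


# Won-lost records quoted in this post for teams scoring 1-4 runs.
top1_wins, top1_losses = 23823, 16485
top9_wins, top9_losses = 21472, 14529

top1_games = top1_wins + top1_losses
top9_games = top9_wins + top9_losses

print(f"Top of 1: {top1_games} games, {win_pct(top1_wins, top1_losses):.1f}%")
print(f"Top of 9: {top9_games} games, {win_pct(top9_wins, top9_losses):.1f}%")
```

The records work out to 40,308 games for the 1st and 36,001 for the 9th, so scoring 1-4 runs in the 1st is indeed the more common event.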

Later in the thread, these stats were presented:

Here is a quick summation of each run scored in those situations up to 4 runs. The numbers are rough, because I only used a few examples to make the averages, but you'll see that there's not much room for adjustments with more data.

Teams that score one run in the Top of 1 win 63% of their games.

Teams that score 2 runs in the Top of 1 win 77% of their games.

Teams that score 3 runs in the Top of 1 win 88% of their games.

Teams that score 4 runs in the Top of 1 win 92% of their games.

Teams that score one run in the Top of 9 win 81% of their games.

Teams that score 2 runs in the Top of 9 win 89% of their games.

Teams that score 3 runs in the Top of 9 win 94% of their games.

Teams that score 4 runs in the Top of 9 win 97% of their games.

As teams score more runs, the numbers get closer, but with just one or two runs scoring, there is a big difference. I can't do an average since I don't know how many times each one occurred in the history of MLB, but we can safely assume that as the number of runs gets larger, the frequency diminishes, which means the difference between the 1-run games will have a stronger pull on the average than the bigger-number games. Meaning, even as we add more and more runs-scored examples, the average isn't going to change much from the difference between the 1-run games.

Here are the actual numbers:

Teams that score one run in the Top of 1 win 51.8% of their games.

Teams that score 2 runs in the Top of 1 win 62.8% of their games.

Teams that score 3 runs in the Top of 1 win 71.5% of their games.

Teams that score 4 runs in the Top of 1 win 82.1% of their games.

Teams that score one run in the Top of 9 win 53.0% of their games.

Teams that score 2 runs in the Top of 9 win 63.2% of their games.

Teams that score 3 runs in the Top of 9 win 73.6% of their games.

Teams that score 4 runs in the Top of 9 win 82.5% of their games.

Scoring in the 9th gives only a very marginal edge over scoring in the 1st, and other factors are at work there as well.
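To put the two sets of numbers side by side, here is a small sketch that computes the 9th-minus-1st gap in percentage points for each run total. It uses only the percentages quoted in this post; the dictionary layout and the `gaps` helper are made up for illustration.

```python
# Win percentages quoted in this post, keyed by runs scored in the half-inning.
rough = {
    "top1": {1: 63, 2: 77, 3: 88, 4: 92},
    "top9": {1: 81, 2: 89, 3: 94, 4: 97},
}
actual = {
    "top1": {1: 51.8, 2: 62.8, 3: 71.5, 4: 82.1},
    "top9": {1: 53.0, 2: 63.2, 3: 73.6, 4: 82.5},
}


def gaps(table):
    """Top-of-9 minus top-of-1 win percentage, in points, per run total."""
    return {
        runs: round(table["top9"][runs] - table["top1"][runs], 1)
        for runs in sorted(table["top1"])
    }


print("rough: ", gaps(rough))
print("actual:", gaps(actual))
```

The rough estimates imply gaps of 5 to 18 points, while the full data set shows gaps of about 2 points or less at every run total.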

Someone else asked a very specific question that I was curious about myself:

Teams that score in the Top of 1 win ___% of their games.

Teams that score in the Top of 9 win ___% of their games.

The results:

Teams that score in the Top of 1 win _59__% of their games.

Teams that score in the Top of 9 win _59__% of their games.

There really isn’t a difference at all, since scoring is always a good thing no matter when you do it.

But this prompted another question. What about teams that score in the top of the 1st and never again, and teams that score for the first time in the top of the 9th? Here we go:

Teams that score in the Top of 1 and never again win 16% of their games.

Teams that score in the Top of 9 for the first time win 18% of their games.

Again, a marginal difference. This situation happens pretty rarely (2.2% of the time), so the sample size is small, but it’s still an indicator that a team really just needs to score as often as possible. Who knew? :)
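For the curious, the two situations above can be expressed as simple tests on a visiting team's line score. This is a sketch under assumptions: the line score is taken to be a list of runs per inning, the function names are made up, and the four sample games are invented purely to show the mechanics (they are not from the database).

```python
def scored_only_in_first(line):
    """Scored in the top of the 1st and in no later inning."""
    return line[0] > 0 and sum(line[1:]) == 0


def first_scored_in_ninth(line):
    """Held scoreless through eight, then scored in the top of the 9th."""
    return len(line) >= 9 and sum(line[:8]) == 0 and line[8] > 0


def win_pct(games, predicate):
    """Winning percentage (0-100) over games whose line score matches predicate."""
    results = [won for line, won in games if predicate(line)]
    return 100.0 * sum(results) / len(results) if results else None


# Invented (visitor line score, visitor won?) pairs, only to exercise the code.
games = [
    ([1, 0, 0, 0, 0, 0, 0, 0, 0], False),
    ([2, 0, 0, 0, 0, 0, 0, 0, 0], True),
    ([0, 0, 0, 0, 0, 0, 0, 0, 3], False),
    ([1, 0, 0, 2, 0, 0, 0, 0, 0], True),   # matches neither situation
]

print(win_pct(games, scored_only_in_first))
print(win_pct(games, first_scored_in_ninth))
```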

Anyway, I just wanted to throw this stuff out there since I can’t post on the main board. But if anyone wants me to run the numbers a different way, I can certainly do that. Also, one caveat: I may have made a mistake somewhere because I regularly recreate the raw database when new or updated data is available and I could have flubbed up something, but I'm pretty sure this is accurate. If something jumps out at you, let me know and I'll check.
