
Thread: Run Scoring Efficiency Redux

  1. #1
    Member BernieCarbo's Avatar
    Join Date
    May 2014
    Location
    Heaven On Earth
    Posts
    1,650

    Run Scoring Efficiency Redux

    I found one particular thread in the main forum pretty interesting (Would you move Lorenzen to the bullpen), and since I have a wealth of information in a database and a method of extracting it quite easily, I thought I’d build on it and clear up some misconceptions with real data. I won’t mention specific posters and I don’t care about who said what and I don’t care who won an argument. I am in the data analysis business, and I just like to take raw data and let it tell the story.

    Although my data encompasses every major league game (long story, but it combines all the Retrosheet data with data from other sources, restructured quite a bit to make extraction easier), for this exercise I used only games from 1920-2013 (a total of 161,000 games), and excluded tied, rain-shortened, forfeited, and extra-inning games. Of course these games have some meaning, and I also ran a separate analysis that included them, but it barely changed the percentages, and it didn’t make sense to compare the runs scored in the 1st with the runs in the 9th if the 9th didn’t even exist. Extra-inning games are special cases as well, and there were some oddities I didn’t expect when I ran some numbers against them (a discussion for another time).
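A minimal sketch of the game filter described above, in Python. The `Game` record and its field names are hypothetical, since the post does not give the actual database schema; the sketch assumes each game carries per-inning run lists for both teams plus a forfeit flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Game:
    # Hypothetical record layout, not the poster's actual schema.
    year: int
    away_line: List[int]  # visiting team's runs in each inning
    home_line: List[int]  # home team's runs in each inning it batted
    forfeited: bool = False


def qualifies(game: Game) -> bool:
    """Keep decided, full nine-inning games from 1920 through 2013."""
    if not 1920 <= game.year <= 2013:
        return False
    if game.forfeited:
        return False
    # The visitors bat in exactly nine innings: fewer means a shortened
    # game, more means extra innings.
    if len(game.away_line) != 9:
        return False
    # The home team bats eight or nine times (it skips the bottom of the
    # 9th when it is already ahead).
    if len(game.home_line) not in (8, 9):
        return False
    # Drop ties.
    return sum(game.away_line) != sum(game.home_line)
```

Applying `qualifies` to every game before counting anything keeps the 1st-inning and 9th-inning samples drawn from the same set of games.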

    Anyway, here are some numbers from a much larger sample set. If someone wants me to run the numbers differently or use specific circumstances, I could do that.

    Quote:
    What I can give you is a very rough estimate of what the odds are of a team winning that scores 1-4 runs in the first vs. the odds of a team winning that scores 1-4 runs in the 9th.

    Teams that score 1-4 runs in the Top of 1 win 75.1% of their games.
    Teams that score 1-4 runs in the Top of 9 win 87.6% of their games.

    These are very rough, based on just one week of data, which really isn't enough to get the right frequencies of each situation. But it's the best that I can do.

    Actually, this shows the danger of a small sample size. I don’t know if other qualifiers were used in this example, but taken literally as it is stated, the records turn out nearly identical:

    Teams that score 1-4 runs in the Top of 1 win 59.1% of their games (23823-16485).
    Teams that score 1-4 runs in the Top of 9 win 59.6% of their games (21472-14529).

    I’d kind of expect the number of games where a team scores in the 1st to exceed the number where it scores in the 9th, since the best hitters always come up in the 1st (unless a manager is predisposed to putting a scrappy little fireplug banjo hitter in the two hole year after year), but still the percentages are quite similar.
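The percentages and game counts can be checked directly from the won-lost records quoted above. This is just arithmetic on the posted figures; the helper name `win_pct` is made up for the example.

```python
def win_pct(wins: int, losses: int) -> float:
    """Winning percentage on a 0-100 scale."""
    return 100.0 * wins / (wins + losses)


# Won-lost records as posted above.
records = {
    "Top of 1": (23823, 16485),
    "Top of 9": (21472, 14529),
}

for label, (wins, losses) in records.items():
    print(f"{label}: {wins + losses} games, {win_pct(wins, losses):.1f}%")
# Top of 1: 40308 games, 59.1%
# Top of 9: 36001 games, 59.6%
```

The totals also bear out the expectation about frequency: there are about 4,300 more games with 1-4 runs in the top of the 1st than in the top of the 9th.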

    Later in the thread, there were these stats presented:

    Quote:
    Here is a quick summation of each run scored in those situations up to 4 runs. The numbers are rough, because I only used a few examples to make the averages, but you'll see that there's not much room for adjustments with more data.

    Teams that score one run in the Top of 1 win 63% of their games.
    Teams that score 2 runs in the Top of 1 win 77% of their games.
    Teams that score 3 runs in the Top of 1 win 88% of their games.
    Teams that score 4 runs in the Top of 1 win 92% of their games.

    Teams that score one run in the Top of 9 win 81% of their games.
    Teams that score 2 runs in the Top of 9 win 89% of their games.
    Teams that score 3 runs in the Top of 9 win 94% of their games.
    Teams that score 4 runs in the Top of 9 win 97% of their games.

    As teams score more runs, the closer the numbers get, but with just one and two runs scoring, there is a big difference. I can't do an average since I don't know how many times each one occurred in the history of MLB, but we can safely assume that as the number of runs gets larger, the frequency diminishes, which means the difference between the 1-run games will have a stronger force on the average than the bigger-number games. Meaning, even as we add more and more runs-scored examples, the average isn't going to change much for the difference between the 1-run games.

    Here are the actual numbers:

    Teams that score one run in the Top of 1 win 51.8% of their games.
    Teams that score 2 runs in the Top of 1 win 62.8% of their games.
    Teams that score 3 runs in the Top of 1 win 71.5% of their games.
    Teams that score 4 runs in the Top of 1 win 82.1% of their games.

    Teams that score one run in the Top of 9 win 53.0% of their games.
    Teams that score 2 runs in the Top of 9 win 63.2% of their games.
    Teams that score 3 runs in the Top of 9 win 73.6% of their games.
    Teams that score 4 runs in the Top of 9 win 82.5% of their games.

    There is a very marginal difference when scoring in the 9th, but that has other factors as well.
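A sketch of how a breakdown like the one above could be tallied from line scores, in Python. The post doesn't show the actual query, so the function name and data layout here are assumptions, and the toy line scores are made up purely to exercise the code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LineScore = List[int]


def record_by_runs(
    games: Iterable[Tuple[LineScore, LineScore]], inning: int
) -> Dict[int, Tuple[int, int]]:
    """Map runs scored by the visitors in the top of `inning` to (wins, losses)."""
    tally = defaultdict(lambda: [0, 0])
    for away_line, home_line in games:
        runs = away_line[inning - 1]
        if runs == 0:
            continue  # only count innings in which the visitors scored
        away_won = sum(away_line) > sum(home_line)
        tally[runs][0 if away_won else 1] += 1
    return {runs: (wins, losses) for runs, (wins, losses) in tally.items()}


# Made-up line scores (visitors, home), for illustration only.
toy_games = [
    ([1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 2, 0, 0, 0, 0, 0]),
    ([1, 0, 0, 0, 3, 0, 0, 0, 0], [0, 0, 2, 0, 0, 0, 0, 0, 0]),
    ([2, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 2, 0, 0, 0, 0, 0, 0]),
]
print(record_by_runs(toy_games, inning=1))  # {1: (1, 1), 2: (1, 0)}
```

Note that this counts the final result of the whole game, whatever happens in the other innings, which is how the questions in the thread were phrased.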

    Someone else asked a very specific question that I was curious about myself:

    Quote:
    Teams that score in the Top of 1 win ___% of their games.

    Teams that score in the Top of 9 win ___% of their games.

    The results:
    Teams that score in the Top of 1 win 59% of their games.
    Teams that score in the Top of 9 win 59% of their games.

    There really isn’t a difference at all, since scoring is always a good thing no matter when you do it.

    But this prompted another question. What about teams that score in the top of the 1st and never again, and teams that score for the first time in the top of the 9th? Here we go:

    Teams that score in the Top of 1 and never again win 16% of their games.
    Teams that score in the Top of 9 for the first time win 18% of their games.

    Again, a marginal difference. This situation happens pretty rarely (2.2% of the time), so the sample size is small, but it’s still an indicator that a team really just needs to score as often as possible. Who knew?
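In a nine-inning game, both conditions above reduce to the same test: all of the team's runs came in a single inning (the 1st in one case, the 9th in the other). A small Python sketch under that reading; the function names are hypothetical and the line scores are invented.

```python
from typing import Iterable, List, Optional, Tuple


def scored_only_in(line: List[int], inning: int) -> bool:
    """True if the team scored in `inning` and in no other inning."""
    runs = line[inning - 1]
    return runs > 0 and runs == sum(line)


def win_rate(
    games: Iterable[Tuple[List[int], List[int]]], inning: int
) -> Optional[float]:
    """Share of games won by visitors whose only runs came in `inning`."""
    wins = total = 0
    for away_line, home_line in games:
        if scored_only_in(away_line, inning):
            total += 1
            wins += sum(away_line) > sum(home_line)
    return wins / total if total else None


# Invented line scores (visitors, home), for illustration only.
toy_games = [
    ([1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 2]),
    ([2, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0, 0]),
    ([1, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0]),
    ([0, 0, 0, 0, 0, 0, 0, 0, 1], [3, 0, 0, 0, 0, 0, 0, 0]),
]
print(win_rate(toy_games, inning=1))  # 0.5
```

The third toy game is skipped for the 1st-inning question because the visitors scored again in the 3rd.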


    Anyway, I just wanted to throw this stuff out there since I can’t post on the main board. But if anyone wants me to run the numbers a different way, I can certainly do that. Also, one caveat: I may have made a mistake somewhere because I regularly recreate the raw database when new or updated data is available and I could have flubbed up something, but I'm pretty sure this is accurate. If something jumps out at you, let me know and I'll check.

  2. Likes:

    Herzeleid (09-20-2014),RedlegJake (09-20-2014),SteelSD (09-19-2014),texasdave (09-19-2014)


  4. #2
    Member 757690's Avatar
    Join Date
    Mar 2007
    Location
    Venice
    Posts
    33,292

    Re: Run Scoring Efficiency Redux

    I know you trust your database, but this runs counter to what the Win Expectancy charts say. I based my numbers on those charts, which Fangraphs uses on a daily basis. So either you or Fangraphs is wrong on this. The sample size shouldn't matter that much.
    Hoping to change my username to 75769024

  5. #3
    Member 757690's Avatar
    Join Date
    Mar 2007
    Location
    Venice
    Posts
    33,292

    Re: Run Scoring Efficiency Redux

    Here is a chart by Tango that shows that as the game progresses, the leverage of each situation increases:

    http://www.insidethebook.com/li.shtml

  6. #4
    Member 757690's Avatar
    Join Date
    Mar 2007
    Location
    Venice
    Posts
    33,292

    Re: Run Scoring Efficiency Redux

    I want to point to a specific stat of yours that contradicts what Fangraphs says about the win expectancy of a situation.

    You say that your numbers say that a team who scores one run in the top of the first wins 51.8% of the time.

    Fangraphs says that a team that scores a run in the top of the first wins 55.1% of the time. Here is a link to a game in which a team scores a run in the first. It shows that at the end of that first inning, where they scored one run, they won 55.1% of the time.

    http://www.fangraphs.com/plays.aspx?...=0&season=2014

    Another example:

    You say that your numbers say that a team who scores two runs in the top of the first wins 62.8% of the time.

    Fangraphs says that a team that scores two runs in the top of the first wins 70.6% of the time. Here is a link to a game in which a team scores two runs in the first. It shows that at the end of that first inning, where they scored two runs, they won 70.6% of the time.

    http://www.fangraphs.com/plays.aspx?...=0&season=2014

    Clearly, your numbers are wrong, or Fangraphs' numbers are wrong.
    Last edited by 757690; 09-19-2014 at 09:41 PM.

  7. #5
    Moderator RedlegJake's Avatar
    Join Date
    Dec 2004
    Location
    Saint Joseph, Mo
    Posts
    9,731

    Re: Run Scoring Efficiency Redux

    LOL...even when someone shows you in black and white you just can't accept it. Just agree that you disagree and move on.
    99% of all numbers only tell 33% of the story so when looking at the numbers remember that numbers is plural...

  8. #6
    Member
    Join Date
    Sep 2014
    Posts
    75

    Re: Run Scoring Efficiency Redux

    Nice work!

  9. #7
    Member 757690's Avatar
    Join Date
    Mar 2007
    Location
    Venice
    Posts
    33,292

    Re: Run Scoring Efficiency Redux

    Quote Originally Posted by RedlegJake View Post
    LOL...even when someone shows you in black and white you just can't accept it. Just agree that you disagree and move on.
    I clearly showed in black and white that his work is wrong. His numbers contradict the numbers that are in black and white from Fangraphs and Tom Tango, the two leading experts on sabermetrics. He is contradicting the stats provided by the two leading experts on this subject. And yet, you believe him over me. Wow.

    This guy shows up out of nowhere and says that he personally has kept track of every MLB game and has the ability to sort out how each team did after every scoring event in the history of baseball since the 1930's. Who has that? And why do his personal numbers contradict the numbers that Fangraphs and Tom Tango put up?

    It really is amazing and befuddling, that anyone would believe this guy, with almost no history here on Redszone, who is presenting numbers with no link, nothing to back them up, that he claims he has personally kept for decades, and not believe me, when I provide links to the stats that I used, links to Fangraphs and Tom Tango that 100% back up my claim.

    I am going to ask you and everyone else, why do you believe this guy, who has provided zero proof of his numbers, and not believe me, who has provided proof of my numbers?
    Last edited by 757690; 09-20-2014 at 11:36 AM.

  10. #8
    Member 757690's Avatar
    Join Date
    Mar 2007
    Location
    Venice
    Posts
    33,292

    Re: Run Scoring Efficiency Redux

    Quote Originally Posted by Herzeleid View Post
    Nice work!
    Too bad it's wrong, and likely made up.

  11. #9
    Member BernieCarbo's Avatar
    Join Date
    May 2014
    Location
    Heaven On Earth
    Posts
    1,650

    Re: Run Scoring Efficiency Redux

    Quote Originally Posted by 757690 View Post
    Too bad it's wrong, and likely made up.
    It's not made up, and I'm not sure why you would say that. I'm running some verifications of the data to make sure I didn't flub up something in the parser that loads the database. It looks good so far, although I found a couple of very minor discrepancies in earlier years between Retrosheet and Baseball Reference. This is expected as each one admits that some game data is built from various newspaper accounts, and even today some things like RBI totals are modified (see Hack Wilson). I'll run some more cross-checks before I post further data.
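A cross-check of this kind can be sketched in a few lines of Python. This assumes each source has already been reduced to a mapping from a shared game ID to its final score; the ID format and the sample values are invented for the example and are not taken from either site.

```python
from typing import Dict, List, Tuple

Score = Tuple[int, int]  # (visitor runs, home runs)


def find_discrepancies(
    source_a: Dict[str, Score], source_b: Dict[str, Score]
) -> List[str]:
    """Game IDs whose scores differ, or that appear in only one source."""
    all_ids = source_a.keys() | source_b.keys()
    return sorted(g for g in all_ids if source_a.get(g) != source_b.get(g))


# Invented sample data, for illustration only.
source_a = {"1921-05-01-X": (3, 2), "1921-05-02-X": (0, 1), "1921-05-03-X": (4, 4)}
source_b = {"1921-05-01-X": (3, 2), "1921-05-02-X": (1, 1)}

print(find_discrepancies(source_a, source_b))
# ['1921-05-02-X', '1921-05-03-X']
```

Games missing from one source are reported along with mismatched scores, since either problem would shift the won-lost totals.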

  12. #10
    Member 757690's Avatar
    Join Date
    Mar 2007
    Location
    Venice
    Posts
    33,292

    Re: Run Scoring Efficiency Redux

    Quote Originally Posted by BernieCarbo View Post
    It's not made up, and I'm not sure why you would say that. I'm running some verifications of the data to make sure I didn't flub up something in the parser that loads the database. It looks good so far, although I found a couple of very minor discrepancies in earlier years between Retrosheet and Baseball Reference. This is expected as each one admits that some game data is built from various newspaper accounts, and even today some things like RBI totals are modified (see Hack Wilson). I'll run some more cross-checks before I post further data.
    I said your numbers are likely made up because they contradict the numbers that Fangraphs and Tom Tango have presented.

    Please explain why your numbers contradict the numbers on Fangraphs.

    Lol, I can say that I have a magic database, and present any kind of numbers I want from them. Anyone can.

    I have presented numbers taken directly from Fangraphs, with links to Fangraphs, and also presented a link to a chart from Tom Tango that also backs up my stats.

    Before anyone can believe you, you have to explain why your numbers contradict Fangraphs and Tom Tango.

  13. #11
    Member BernieCarbo's Avatar
    Join Date
    May 2014
    Location
    Heaven On Earth
    Posts
    1,650

    Re: Run Scoring Efficiency Redux

    Quote Originally Posted by 757690 View Post
    This guy shows up out of nowhere and says that he personally has kept track of every MLB game and has the ability to sort out how each team did after every scoring event in the history of baseball since the 1930's. Who has that? And why do his personal numbers contradict the numbers that Fangraphs and Tom Tango put up?

    It really is amazing and befuddling, that anyone would believe this guy, with almost no history here on Redszone, who is presenting numbers with no link, nothing to back them up, that he claims he has personally kept for decades, and not believe me, when I provide links to the stats that I used, links to Fangraphs and Tom Tango that 100% back up my claim.

    I am going to ask you and everyone else, why do you believe this guy, who has provided zero proof of his numbers, and not believe me, who has provided proof of my numbers?
    I don't have a link because I created the numbers from actual data. The raw data is readily available on the net (every line score and most play-by-play data), and anyone with a basic understanding of programming can write a program that grabs these downloads and sticks them in a database. It isn't difficult. You could download SQL Server for free and even use Excel to construct an import script. Personally, I'm amazed that you think having this data is a big deal.

    I came out of nowhere because we all start from zero, right? I just found that particular topic interesting and wanted to offer some input. No worries.
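To give a sense of how little code a load-and-query step needs, here is a minimal sketch. It swaps in SQLite through Python's built-in `sqlite3` module in place of SQL Server, and it assumes a simplified, hypothetical CSV layout with one row per game; real downloads carry many more columns and would need their own parsing. The sample rows are invented.

```python
import csv
import io
import sqlite3

# Hypothetical, simplified layout: one row per game. Sample rows are
# invented for illustration.
SAMPLE_CSV = """game_id,year,away_runs_1st,away_total,home_total
G1,1954,2,5,3
G2,1954,0,1,4
G3,1954,2,2,6
"""


def load_games(conn: sqlite3.Connection, csv_text: str) -> int:
    """Create the games table if needed and insert every CSV row."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS games (
               game_id TEXT PRIMARY KEY,
               year INTEGER,
               away_runs_1st INTEGER,
               away_total INTEGER,
               home_total INTEGER)"""
    )
    rows = [
        (r["game_id"], int(r["year"]), int(r["away_runs_1st"]),
         int(r["away_total"]), int(r["home_total"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def record_when_scoring_in_first(conn: sqlite3.Connection, runs: int):
    """(wins, losses) for visitors that scored exactly `runs` in the top of the 1st."""
    return conn.execute(
        """SELECT COALESCE(SUM(away_total > home_total), 0),
                  COALESCE(SUM(away_total < home_total), 0)
           FROM games WHERE away_runs_1st = ?""",
        (runs,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
load_games(conn, SAMPLE_CSV)
print(record_when_scoring_in_first(conn, 2))  # (1, 1) for the sample rows
```

The comparisons inside `SUM` evaluate to 0 or 1 in SQLite, so each sum is a count of wins or losses; `COALESCE` covers the case where no game matches.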

  14. #12
    Member 757690's Avatar
    Join Date
    Mar 2007
    Location
    Venice
    Posts
    33,292

    Re: Run Scoring Efficiency Redux

    Quote Originally Posted by BernieCarbo View Post
    I don't have a link because I created the numbers from actual data. The raw data is readily available on the net (every line score and most play-by-play data), and anyone with a basic understanding of programming can write a program that grabs these downloads and sticks them in a database. It isn't difficult. You could download SQL Server for free and even use Excel to construct an import script. Personally, I'm amazed that you think having this data is a big deal.

    I came out of nowhere because we all start from zero, right? I just found that particular topic interesting and wanted to offer some input. No worries.
    You haven't answered my question. Why do your numbers contradict the numbers on Fangraphs?

  15. #13
    Member BernieCarbo's Avatar
    Join Date
    May 2014
    Location
    Heaven On Earth
    Posts
    1,650

    Re: Run Scoring Efficiency Redux

    They are using different constraints. I only use the constraints I defined above (1920-present, complete nine-inning games), and answered the questions as presented in the thread. If you know their constraints and qualifiers, I'll run the same algorithm and see how the numbers compare. But they will be the same because I am sure we all use the same data. I don't make stuff up.

  16. #14
    Member 757690's Avatar
    Join Date
    Mar 2007
    Location
    Venice
    Posts
    33,292

    Re: Run Scoring Efficiency Redux

    Quote Originally Posted by BernieCarbo View Post
    They are using different constraints. I only use the constraints I defined above (1920-present, complete nine-inning games), and answered the questions as presented in the thread. If you know their constraints and qualifiers, I'll run the same algorithm and see how the numbers compare. But they will be the same because I am sure we all use the same data. I don't make stuff up.
    Nope, sorry. Same constraints and qualifiers.

    I compared Fangraphs' numbers when a team scores a single run in the top of the first inning, and when a team scores exactly two runs in the top of the first inning. The numbers they have for this exact situation are different from the numbers you produced for the exact same situations. And by a significant margin.

    You seem like a nice guy, so I'll stop saying you likely made them up. But if you didn't make them up, then you made a mistake in your calculations. Or Fangraphs did. This is not a debatable point. They are the numbers for the exact same situation, based on historical data. They have to be identical. If they are not, someone made a mistake.

  17. #15
    Member BernieCarbo's Avatar
    Join Date
    May 2014
    Location
    Heaven On Earth
    Posts
    1,650

    Re: Run Scoring Efficiency Redux

    Can you tell me what Fangraphs says for, say, 1954? What is the record for teams that score 2 runs in the top of the first? I'll compare it to mine and see where the mistake is.
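A season-by-season breakdown is one way to narrow down where two sets of numbers diverge. A rough Python sketch, assuming games are available as (year, visitor line score, home line score) tuples; the function name is hypothetical and the sample games are invented, not real 1954 results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def record_by_season(
    games: Iterable[Tuple[int, List[int], List[int]]], inning: int, runs: int
) -> Dict[int, Tuple[int, int]]:
    """Per-year (wins, losses) for visitors scoring exactly `runs` in the top of `inning`."""
    tally = defaultdict(lambda: [0, 0])
    for year, away_line, home_line in games:
        if away_line[inning - 1] != runs:
            continue
        away_won = sum(away_line) > sum(home_line)
        tally[year][0 if away_won else 1] += 1
    return {year: (w, l) for year, (w, l) in sorted(tally.items())}


# Invented games, for illustration only.
toy_games = [
    (1954, [2, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0]),
    (1954, [2, 0, 0, 0, 0, 0, 0, 0, 0], [0, 3, 0, 0, 0, 0, 0, 0]),
    (1955, [2, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0]),
    (1955, [1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0]),
]
print(record_by_season(toy_games, inning=1, runs=2))
# {1954: (1, 1), 1955: (1, 0)}
```

Comparing a single season's record against another source keeps the check small enough to verify by hand.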


