JinAZ
03-06-2006, 10:45 PM
I just ran across this article (http://www.hardballtimes.com/main/article/constructing-lineups/) over at the Hardball Times, which linked to an earlier article (http://www.beyondtheboxscore.com/story/2006/2/12/133645/296) by Cyril Morong at Beyond the Boxscore. Cyril did a regression analysis in which he looked at the effect of OBA and SLG on team runs for each position in the batting order. Based on his analysis, other bloggers have created various scripts and tools that take a set of nine hitters and spit out the best possible lineup. A nice tool, written by David Pinto at Baseball Musings, can be found here (http://www.baseballmusings.com/cgi-bin/LineupAnalysis.py?). It lets you quickly input your favorite team's starting nine and see what Morong's regression analysis would recommend for the lineup order.
My understanding is that it all works like this: for each position in the lineup, Morong's regression analysis created a coefficient for OBA and a coefficient for SLG. Pinto's tool on his blog has you input nine batters, their OBA, and their SLG. First, the tool calculates the expected runs per game for the lineup you entered by multiplying each player's OBA and SLG by the corresponding coefficients from Morong's analysis and adding up the results. Next, it tries every other possible ordering of the nine players (9! = 362,880 orderings in all) and sorts them by most runs produced.
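For anyone who wants to see the mechanics, here is a minimal Python sketch of the procedure as described above. It is not Pinto's actual code. The coefficients in COEFFS are placeholders rather than Morong's values, the players P1 through P9 and their numbers are made up, and any constant term in the regression is left out because it would not change the ordering.

```python
import heapq
from itertools import permutations

# PLACEHOLDER coefficients, NOT Morong's actual values: one (OBA, SLG) pair
# per batting-order slot, slot 1 first. Any regression constant is omitted,
# so scores are only meaningful relative to each other.
COEFFS = [
    (2.6, 0.9), (2.3, 1.4), (2.0, 1.0),
    (1.9, 1.5), (2.2, 1.1), (1.7, 1.3),
    (1.8, 1.2), (1.2, 0.7), (2.5, 0.8),
]

# Made-up hitters: (name, OBA, SLG). Not real player statistics.
PLAYERS = [
    ("P1", 0.390, 0.540), ("P2", 0.370, 0.580), ("P3", 0.350, 0.480),
    ("P4", 0.340, 0.450), ("P5", 0.330, 0.500), ("P6", 0.320, 0.430),
    ("P7", 0.370, 0.370), ("P8", 0.300, 0.490), ("P9", 0.100, 0.080),
]

def expected_runs(lineup):
    """Score one batting order: sum of coefficient * stat over the nine slots."""
    return sum(
        c_oba * oba + c_slg * slg
        for (c_oba, c_slg), (_, oba, slg) in zip(COEFFS, lineup)
    )

def rank_lineups(players, top=30):
    """Score all 9! orderings and return the `top` best as (score, order) pairs."""
    scored = ((expected_runs(order), order) for order in permutations(players))
    return heapq.nlargest(top, scored, key=lambda pair: pair[0])

best = rank_lineups(PLAYERS, top=3)
for runs, order in best:
    print(round(runs, 3), [name for name, _, _ in order])
```

Brute force is fine here: 362,880 orderings with nine multiplications each finishes in a second or so, which is presumably why a web tool can afford to try them all.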
To try it out, I inputted the lineup that I would recommend for the Reds at the start of the season (Harang is used as a pitcher to help understand how a horrible hitting pitcher should be placed--his OPS last year was 0.054!):
1. Freel
2. Lopez
3. Griffey
4. Dunn
5. Kearns
6. LaRue
7. Pena
8. Encarnacion
9. Harang
Pinto's tool indicated that this lineup would produce, on average, 4.541 runs/game. Not bad. But what were the best lineups? Pinto's tool spits out the top 30. Below, for each position, I list the players selected, followed by the frequency of their occurrences in those 30 lineups. The estimated runs per game of these 30 lineups hardly differed at all, ranging from 4.924 to 4.942. Over a 162 game season, Morong's analysis predicts that if these optimized lineups were used instead of mine, we could see an increase in run production from 736 runs to 798-801 runs--a difference of 62 to 65 runs per season (roughly 6-7 wins)! It's also interesting to note that these same players, put into the worst possible lineup order, could produce only 3.943 runs per game (639/season, 162 runs different from the maximum…or roughly 16 wins). So lineup order does matter!
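The season totals above are just the per-game figures scaled to 162 games. A quick Python check of that arithmetic follows. The 10-runs-per-win conversion is the usual rule of thumb and an assumption on my part; the tool itself only reports runs per game.

```python
GAMES = 162
RUNS_PER_WIN = 10  # rule-of-thumb conversion; an assumption, not from the tool

def season_runs(runs_per_game):
    """Scale a runs-per-game estimate to a full season."""
    return runs_per_game * GAMES

mine = season_runs(4.541)        # the lineup entered by hand
best_low = season_runs(4.924)    # 30th-best optimized lineup
best_high = season_runs(4.942)   # best optimized lineup
worst = season_runs(3.943)       # worst possible ordering

print(round(mine), round(best_low), round(best_high), round(worst))
# -> 736 798 801 639

gain_low, gain_high = best_low - mine, best_high - mine
print(round(gain_low), round(gain_high))                # -> 62 65
print(round(best_high - worst))                         # -> 162
print(round((best_high - worst) / RUNS_PER_WIN, 1))     # -> 16.2
```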
Pinto's Tool's Picks:
1. Dunn (18), LaRue (8), Lopez (4)
2. Dunn (9), Lopez (8), Griffey (6), LaRue (4), Kearns (3)
3. Kearns (12), LaRue (10), Lopez (4), Encarnacion (4)
4. Griffey (24), Lopez (3), Pena (2), Kearns (1)
5. Lopez (11), LaRue (8), Kearns (8), Dunn (3)
6. Pena (18), Encarnacion (11), Kearns (1)
7. Encarnacion (15), Pena (10), Kearns (5)
8. Harang (30)
9. Freel (30)
Whoa! Dunn leading off? Harang batting 8th? My boy Freel hitting 9th?? What is going on here?!?
To decipher this, I turned to this article (http://www.beyondtheboxscore.com/story/2006/2/25/21329/9401) by Dan Scotto, which offered some nice guidance on what Morong's analysis found and helped me interpret why the tool is doing what it is doing. After reading Scotto's descriptions and staring at the coefficients for a while, here is my take on what the analysis recommends for each position and why it's choosing the players it is choosing (numbers in parentheses are the ranks from a descending sort of the OBA and SLG coefficients, so #1 is the largest; better ranking = larger coefficient = higher rewards (runs) for this attribute at this lineup position):
1. (#1 ranked OBA, #7 ranked SLG) This one is all about on-base percentage. The more likely this player is to get on base, the more likely it is that the players behind him will drive him in. This is why Dunn does so well here; he gets on base more reliably than anyone else on the team (it should be noted that the simulator knows nothing about speed or strikeouts).
2. (#3 OBA, #2 SLG) The #2 hitter is the best, most balanced hitter. Last year, Dunn, Lopez, and Griffey all had relatively high OBA (0.352+) and SLG (0.480+), and thus they all fit in fairly well in this spot (though Griffey's high SLG makes him a better candidate for a position that really emphasizes SLG...see below).
3. (#5 OBA, #6 SLG) The conventional wisdom is that your best pure hitter should go here -- Sean Casey in his prime, for example. Or perhaps your best power hitter who you want to make sure hits in the first inning, like McGwire in his prime. But, in fact, the analysis indicates that a relatively average player should go here, perhaps in order to spread around your less productive outs. I had a hard time believing this, as it seems like you'd want to at least have a high OBA here. Nevertheless, the relatively low coefficients indicate that total runs scored changed relatively little when OBA or SLG varied at this lineup position, compared to other positions. Very surprising. Based on last year's statistics, Kearns and LaRue fit this bill the best. A stronger season from Kearns and a weaker season by LaRue would probably place LaRue here and Kearns in more valuable lineup positions (see ZiPS work below).
4. (#7 OBA, #1 SLG) The complete opposite of the leadoff hitter. This position rewards SLG above all else, so you want your biggest bopper here. Griffey was the clear choice; if another very high OBP guy was in the lineup, however, I wouldn't be surprised to see Adam Dunn be placed here as well due to his high SLG.
5. (#4 OBA, #5 SLG) This is another spot that demands balance, like the #2 hole. Rewards aren't quite as good from this position as #2, but the regression's recommendation contrasts with the more traditional high power, low OBP guys that I've always heard belonged in the 5 hole. Still, it's very surprising that this player should be a better hitter in terms of both OBA and SLG than the #3 guy. Anyway, Lopez, LaRue, and Kearns all fit in well here with relatively good balance.
6. (#8 OBA, #3 SLG) Following the balanced player in #5 is the guy I usually think of for a #5 hitter. Poor on base average, but a guy who can knock the heck out of the ball. Essentially, this is the same type of player you'd put in the 4 hole, it's just that he's not as good. Wily Mo Pena is the obvious choice.
7. (#7 OBA, #4 SLG) This guy is fairly similar to the #6 hitter, but doesn't quite have the power and has a bit better ability to get on base. EdE fits in well here, as he can get on base better than Pena, yet still does have good power (9 HR in 211 AB's last year).
8. (#9 OBA, #9 SLG) Surprise! This should be your absolute worst hitter, which for just about any NL team will be the pitcher. Increasing OBA or SLG results in fewer runs gained here than at any other position in the lineup. Now all NL teams (except for a brief experiment by La Russa back in McGwire's prime) bat the pitcher 9th, because they want to minimize the number of at bats this player receives and postpone, for as long as possible, the need to pinch hit for him later in the game. But as we'll see, the 9th hitter can be a very productive player:
9. (#2 OBA, #8 SLG) The OBA coefficient for the #9 hitter (2.55) was more than twice as large as that for the #8 hitter (1.188). This means that an increase in OBA in the #8 hole would result in less than half as many additional runs as would the same increase in the #9 hole. Why this dramatic discrepancy? Because the #9 hitter will be on base for your best hitters - the guys in the #1, #2, and #4 holes. The #9 hitter's SLG matters very little, however, because few people are likely to be on base when he comes up to bat (especially with the pitcher hitting in front of him!). Ryan Freel, a high OBA, low SLG player, is the prototypical guy for this spot. Another example might be someone like Frank Menechino.
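To put the #8 versus #9 gap in concrete terms, here is a small Python calculation using the two OBA coefficients quoted above. It assumes the coefficients are in runs per game per full unit of OBA, which fits the way the tool is described as working. The 20-point OBA bump is just a hypothetical.

```python
GAMES = 162
OBA_COEFF_SLOT_8 = 1.188  # quoted above for the #8 hitter
OBA_COEFF_SLOT_9 = 2.55   # quoted above for the #9 hitter

def extra_runs_per_season(oba_coeff, oba_gain):
    """Season runs added by raising OBA by `oba_gain` at one lineup slot.

    Assumes the coefficient is runs per game per unit of OBA.
    """
    return oba_coeff * oba_gain * GAMES

oba_gain = 0.020  # hypothetical 20-point improvement in OBA
slot8 = extra_runs_per_season(OBA_COEFF_SLOT_8, oba_gain)
slot9 = extra_runs_per_season(OBA_COEFF_SLOT_9, oba_gain)

print(round(slot8, 2), round(slot9, 2))  # -> 3.85 8.26
print(round(slot9 / slot8, 2))           # -> 2.15
```

So under that assumption, the same 20 points of OBA is worth about 4 runs a year in the #8 hole and about 8 in the #9 hole.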
Now I think a lot of people would predict some differences from last year's performance among our players. I'm expecting Edwin Encarnacion and Austin Kearns to be quite a bit more productive than they were last year, and I would not be surprised (unfortunately) to see Jason LaRue drop off in his production a bit. Therefore, I did a second run based on Baseball Think Factory's 2006 ZiPS Projections (http://www.baseballthinkfactory.org/files/oracle/discussion/2006_zips_projections_cincinnati_reds/). Here are the results:
1. Dunn (22), Kearns (6), Lopez (1), LaRue (1)
2. Griffey (10), Kearns (10), Dunn (7), Lopez (1), Encarnacion (1)
3. LaRue (18), Lopez (9), Encarnacion (2), Kearns (1)
4. Griffey (16), Pena (13), Kearns (1)
5. Kearns (9), Lopez (9), Griffey (5), LaRue (4), Encarnacion (3)
6. Pena (15), Encarnacion (11), Lopez (2), LaRue (1), Kearns (1)
7. Encarnacion (13), Lopez (8), LaRue (5), Kearns (2), Pena (2)
8. Harang (30)
9. Freel (30)
There are a few differences in who wins out at the fiercely contended #2 and #5 spots (balanced players), as well as the #3 spot (the leftover player), but #'s 1, 4, 6, 7, 8, and 9 are all the same. In fact, the only major difference is LaRue's "dominance" in the #3 hole with these projections, caused no doubt by his predicted return to mediocrity and Kearns' predicted improvement. Nevertheless, the lineup recommendations are remarkably static, indicating that each spot in the lineup really does have an optimal role that matches up with particular players' strengths.
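The per-position counts in the two lists above (how often each player shows up at each spot across the top 30 lineups) can be tallied mechanically. Here is a rough Python sketch using a toy set of three-man "lineups"; the function name and the toy data are made up for illustration.

```python
from collections import Counter

def tally_by_slot(lineups):
    """Count how often each player fills each batting-order slot."""
    counts = [Counter() for _ in range(len(lineups[0]))]
    for lineup in lineups:
        for slot, player in enumerate(lineup):
            counts[slot][player] += 1
    return counts

# Toy input: three orderings of three made-up players.
toy_lineups = [
    ("A", "B", "C"),
    ("A", "C", "B"),
    ("B", "A", "C"),
]

counts = tally_by_slot(toy_lineups)
for slot, counter in enumerate(counts, start=1):
    # most_common() lists players from most to least frequent at this slot
    print(slot, counter.most_common())
```

Feeding it the 30 lineups the tool returns, each as a tuple of nine names, would give a table in the same "Dunn (18), LaRue (8), Lopez (4)" shape.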
Of course, these predictions are inferences based on looking at variances in performance at each lineup position from '98 to '02, not actual experimental data. The best evidence for these claims would come from actually having a team try these ideas out, which unfortunately is unlikely to ever happen. I may try to do some additional toying with Pinto's tool, or maybe even some simulations, at a later date. For now, however, it's an interesting thought exercise.
A few other quick notes:
* Using ZiPS Projections, replacing Encarnacion with Aurilia at 3B drops the maximum optimized run production from 4.912 to 4.834 runs per game (12.6 runs total in a season, or ~1 win). Not a huge difference, but given Encarnacion's upside at this point in his career, it seems the obvious move.
* Again using ZiPS Projections, replacing Freel with Womack at 2B drops the maximum optimized run production from 4.912 to a dreadful 4.670 runs per game (39.2 runs total difference over the season! – perhaps 4 wins!). Both Freel and Womack are always placed in the #9 hole in the top 30 lineups; what we're seeing is the benefit of high OBA from that lineup position.
-JinAZ