New Stat Perspectives on Reds Lineup Construction

03-06-2006, 11:45 PM
I just ran across this article (http://www.hardballtimes.com/main/article/constructing-lineups/) at the Hardball Times, which linked to an earlier article (http://www.beyondtheboxscore.com/story/2006/2/12/133645/296) by Cyril Morong at Beyond the Boxscore. Cyril ran a regression analysis of how OBA and SLG at each position in the batting order affect team runs. Based on his analysis, other bloggers have created various scripts and tools that take a set of nine hitters and spit out the best possible lineup. A nice tool, written by David Pinto at Baseball Musings, can be found here (http://www.baseballmusings.com/cgi-bin/LineupAnalysis.py?). It lets you quickly enter your favorite team's starting nine and see what lineup order Morong's regression analysis would recommend.

My understanding is that it works like this: for each position in the lineup, Morong's regression analysis produced a coefficient for OBA and a coefficient for SLG. Pinto's tool has you enter nine batters along with their OBA and SLG. First, the tool calculates the expected runs per game for the lineup you entered by multiplying each player's OBA and SLG by the corresponding coefficients for his lineup position and summing the results. Next, it tries every other possible ordering of the nine players and sorts the lineups by expected runs.
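As a rough illustration of the procedure described above, here is a minimal Python sketch. The coefficient values, the zero intercept, the hitters, and the function names are all placeholders made up for the example; they are not Morong's published numbers or Pinto's actual code. The example uses three hitters so the arithmetic is easy to check by hand.

```python
from itertools import permutations

def expected_runs(lineup, oba_coef, slg_coef, intercept=0.0):
    """Linear estimate: intercept + sum over slots of coef * stat."""
    total = intercept
    for slot, (_name, oba, slg) in enumerate(lineup):
        total += oba_coef[slot] * oba + slg_coef[slot] * slg
    return total

def rank_lineups(hitters, oba_coef, slg_coef, top=30):
    """Score every batting order and return the best `top`, highest first."""
    scored = [
        (expected_runs(order, oba_coef, slg_coef), [h[0] for h in order])
        for order in permutations(hitters)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]

# HYPOTHETICAL coefficients (one per lineup slot) and hitters (name, OBA, SLG).
OBA_COEF = [3.0, 2.0, 1.0]
SLG_COEF = [1.0, 2.0, 3.0]
HITTERS = [("A", 0.400, 0.400), ("B", 0.350, 0.550), ("C", 0.300, 0.350)]

best = rank_lineups(HITTERS, OBA_COEF, SLG_COEF)
for runs, order in best:
    print(f"{runs:.3f}  {order}")
```

With nine hitters the same loop covers 9! = 362,880 orders, which is still small enough to brute-force.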

To try it out, I inputted the lineup that I would recommend for the Reds at the start of the season (Harang is used as a pitcher to help understand how a horrible hitting pitcher should be placed--his OPS last year was 0.054!):

1. Freel
2. Lopez
3. Griffey
4. Dunn
5. Kearns
6. LaRue
7. Pena
8. Encarnacion
9. Harang

Pinto's tool indicated that this lineup would produce, on average, 4.541 runs/game. Not bad. But what were the best lineups? Pinto's tool spits out the top 30. Below, for each position, I list the players selected, followed by the frequency of their occurrences in those 30 lineups. The estimated runs per game of these 30 lineups hardly differed at all, ranging from 4.942 down to 4.924. Over a 162 game season, Morong's analysis predicts that if these optimized lineups were used instead of mine, we could see an increase in run production from 736 runs to 798-801 runs, a difference of 62 to 65 runs per season (roughly 6-7 wins)! It's also interesting to note that these same players, put into the worst possible lineup order, could produce only 3.943 runs per game (639/season, 162 runs fewer than the maximum, or roughly 16 wins). So lineup order does matter!
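The season totals above are just the per-game figures scaled to 162 games. A small sketch of that arithmetic follows; the 10-runs-per-win conversion is an assumed rule of thumb that roughly matches the win estimates in the post, not something taken from Morong's analysis.

```python
GAMES = 162
RUNS_PER_WIN = 10  # ASSUMPTION: rule-of-thumb conversion, not from the regression

def season_runs(runs_per_game, games=GAMES):
    """Scale a runs-per-game estimate to a full season, rounded to whole runs."""
    return round(runs_per_game * games)

mine = season_runs(4.541)
best_low, best_high = season_runs(4.924), season_runs(4.942)
worst = season_runs(3.943)

print("my lineup:", mine)
print("top 30 lineups:", best_low, "to", best_high)
print("gain over my lineup:", best_low - mine, "to", best_high - mine, "runs")
print("best vs. worst:", best_high - worst, "runs, about",
      round((best_high - worst) / RUNS_PER_WIN), "wins")
```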

Pinto's Tool's Picks:
1. Dunn (18), LaRue (8), Lopez (4)
2. Dunn (9), Lopez (8), Griffey (6), LaRue (4), Kearns (3)
3. Kearns (12), LaRue (10), Lopez (4), Encarnacion (4)
4. Griffey (24), Lopez (3), Pena (2), Kearns (1)
5. Lopez (11), LaRue (8), Kearns (8), Dunn (3)
6. Pena (18), Encarnacion (11), Kearns (1)
7. Encarnacion (15), Pena (10), Kearns (5)
8. Harang (30)
9. Freel (30)
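For anyone who wants to reproduce this kind of per-position tally from a list of lineups, a minimal sketch is below. The three sample lineups are invented for illustration and are not actual output from Pinto's tool.

```python
from collections import Counter

def tally_by_slot(lineups):
    """Return one Counter per batting slot: player -> number of lineups."""
    slots = len(lineups[0])
    counts = [Counter() for _ in range(slots)]
    for lineup in lineups:
        for slot, player in enumerate(lineup):
            counts[slot][player] += 1
    return counts

# HYPOTHETICAL sample of three shortened lineups, not real tool output.
sample = [
    ["Dunn", "Lopez", "Kearns"],
    ["Dunn", "Griffey", "LaRue"],
    ["LaRue", "Dunn", "Kearns"],
]

for slot, counter in enumerate(tally_by_slot(sample), start=1):
    line = ", ".join(f"{player} ({n})" for player, n in counter.most_common())
    print(f"{slot}. {line}")
```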

Whoa! Dunn leading off? Harang batting 8th? My boy Freel hitting 9th?? What is going on here?!?

To decipher this, I turned to this article (http://www.beyondtheboxscore.com/story/2006/2/25/21329/9401) by Dan Scotto, which offered some nice guidance on what Morong's analysis found and helps explain why the tool does what it does. After reading Scotto's descriptions and staring at the coefficients for a while, here is my take on what the analysis recommends for each position and why it's choosing the players it is choosing (numbers in parentheses are the ranks from a descending sort of the OBA and SLG coefficients; better ranking = larger coefficient = higher rewards (runs) for this attribute at this lineup position):

1. (#1 ranked OBA, #7 ranked SLG) This one is all about on-base percentage. The more likely this player is to get on base, the more likely it is that the players behind him will drive him in. This is why Dunn does so well here; he gets on base more reliably than anyone else on the team (it should be noted that the simulator knows nothing about speed or strikeouts).

2. (#3 OBA, #2 SLG) The #2 hitter is the best, most balanced hitter. Last year, Dunn, Lopez, and Griffey all had relatively high OBA (0.352+) and SLG (0.480+), and thus they all fit in fairly well in this spot (though Griffey's high SLG makes him a better candidate for a position that really emphasizes SLG...see below).

3. (#5 OBA, #6 SLG) The conventional wisdom is that your best pure hitter should go here -- Sean Casey in his prime, for example. Or perhaps your best power hitter who you want to make sure hits in the first inning, like McGwire in his prime. But, in fact, the analysis indicates that a relatively average player should go here, perhaps in order to spread around your less productive outs. I had a hard time believing this, as it seems like you'd want to at least have a high OBA here. Nevertheless, the relatively low coefficients indicate that varying OBA or SLG at this lineup position made relatively little difference in total runs scored compared with other positions. Very surprising. Based on last year's statistics, Kearns and LaRue fit this bill the best. A stronger season from Kearns and a weaker season by LaRue would probably place LaRue here and Kearns in more valuable lineup positions (see ZiPS work below).

4. (#7 OBA, #1 SLG) The complete opposite of the leadoff hitter. This position rewards SLG above all else, so you want your biggest bopper here. Griffey was the clear choice; if another very high OBP guy was in the lineup, however, I wouldn't be surprised to see Adam Dunn be placed here as well due to his high SLG.

5. (#4 OBA, #5 SLG) This is another spot that demands balance, like the #2 hole. Rewards aren't quite as good from this position as #2, but the regression's recommendation contrasts with the more traditional high power, low OBP guys that I've always heard belonged in the 5 hole. Still, it's very surprising that this player should be a better hitter in terms of both OBA and SLG than the #3 guy. Anyway, Lopez, LaRue, and Kearns all fit in well here with relatively good balance.

6. (#8 OBA, #3 SLG) Following the balanced player in #5 is the guy I usually think of for a #5 hitter. Poor on base average, but a guy who can knock the heck out of the ball. Essentially, this is the same type of player you'd put in the 4 hole, it's just that he's not as good. Wily Mo Pena is the obvious choice.

7. (#7 OBA, #4 SLG) This guy is fairly similar to the #6 hitter, but doesn't quite have the power and has a bit better ability to get on base. EdE fits in well here, as he can get on base better than Pena, yet still does have good power (9 HR in 211 AB's last year).

8. (#9 OBA, #9 SLG) Surprise! This should be your absolute worst hitter, which for just about any NL team will be the pitcher. Increasing OBA or SLG results in fewer runs gained here than any other position in the lineup. Now all NL teams (except for a brief experiment by LaRussa back in McGwire's prime) bat the pitcher 9th, because they want to minimize the number of at bats this player receives and postpone, for as long as possible, the need to pinch hit for him later in the game. But as we'll see, the 9th hitter can be a very productive player:

9. (#2 OBA, #8 SLG) The OBA coefficient for the #9 hitter (2.55) was more than twice as large as that for the #8 hitter (1.188). This means that an increase in OBA in the #8 hole would result in less than half as many additional runs as would the same increase in the #9 hole. Why this dramatic discrepancy? Because the #9 hitter will be on base for your best hitters - the guys in the #1, #2, and #4 holes. The #9 hitter's SLG matters very little, however, because few runners are likely to be on base when he comes up to bat (especially with the pitcher hitting in front of him!). Ryan Freel, a high OBA, low SLG player, is the prototypical guy for this spot. Another example might be someone like Frank Menechino.
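To make the #8 versus #9 comparison concrete, here is a short sketch using the two OBA coefficients quoted above. It assumes the coefficients are in runs per game per unit of OBA (consistent with how the tool turns them into runs per game), and the 20-point OBA improvement is an arbitrary example value.

```python
GAMES = 162
OBA_COEF_8 = 1.188  # #8 slot OBA coefficient quoted in the post
OBA_COEF_9 = 2.55   # #9 slot OBA coefficient quoted in the post

def extra_runs(coef, oba_gain, games=GAMES):
    """Season runs added by raising OBA by `oba_gain` at a slot with this coefficient."""
    return coef * oba_gain * games

gain = 0.020  # HYPOTHETICAL 20-point OBA improvement
r8 = extra_runs(OBA_COEF_8, gain)
r9 = extra_runs(OBA_COEF_9, gain)
print(f"#8 slot: {r8:.1f} extra runs per season")
print(f"#9 slot: {r9:.1f} extra runs per season")
print(f"ratio: {r9 / r8:.2f}")
```

Under those assumptions the same OBA improvement is worth a bit more than twice as many runs in the #9 hole as in the #8 hole.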

Now I think a lot of people would predict some differences from last year's performance among our players. I'm expecting Edwin Encarnacion and Austin Kearns to be quite a bit more productive than they were last year, and I would not be surprised (unfortunately) to see Jason LaRue drop off in his production a bit. Therefore, I did a second run based on Baseball Think Factory's 2006 ZiPS Projections (http://www.baseballthinkfactory.org/files/oracle/discussion/2006_zips_projections_cincinnati_reds/). Here are the results:

1. Dunn (22), Kearns (6), Lopez (1), LaRue (1)
2. Griffey (10), Kearns (10), Dunn (7), Lopez (1), Encarnacion (1)
3. LaRue (18), Lopez (9), Encarnacion (2), Kearns (1)
4. Griffey (16), Pena (13), Kearns (1)
5. Kearns (9), Lopez (9), Griffey (5), LaRue (4), Encarnacion (3)
6. Pena (15), Encarnacion (11), Lopez (2), LaRue (1), Kearns (1)
7. Encarnacion (13), Lopez (8), LaRue (5), Kearns (2), Pena (2)
8. Harang (30)
9. Freel (30)

A few differences in who wins out at the fiercely contended #2 and #5 spots (balanced players), as well as the #3 spot (the leftover player), but #'s 1, 4, 6, 7, 8, and 9 are all the same. In fact, the only major difference is LaRue's "dominance" in the #3 hole with these projections, caused no doubt by his predicted return to mediocrity and Kearns' predicted improvement. Nevertheless, the lineup recommendations are remarkably static, indicating that each spot in the lineup really does have an optimal role that matches up with particular players' strengths.

Of course, these predictions are inferences based on looking at variances in performance at each lineup position from '98 to '02, not actual experimental data. The best evidence for these claims would come from actually having a team try these ideas out, which unfortunately is unlikely to ever happen. I may try to do some additional toying with Pinto's tool, or maybe even some simulations, at a later date. For now, however, it's an interesting thought exercise.

A few other quick notes:
* Using ZiPS Projections, replacing Encarnacion with Aurilia at 3B drops the maximum optimized run production from 4.912 to 4.834 runs per game (12.6 runs total in a season, or ~1 win). Not a huge difference, but given Encarnacion's upside at this point in his career, it seems the obvious move.
* Again using ZiPS Projections, replacing Freel with Womack at 2B drops the maximum optimized run production from 4.912 to a dreadful 4.670 runs per game (39.2 runs total difference over the season! – perhaps 4 wins!). Both Freel and Womack are always placed in the #9 hole in the top 30 lineups; what we're seeing is the benefit of high OBA from that lineup position.

03-07-2006, 07:45 AM
Tony LaRussa says hi.

03-07-2006, 12:26 PM
Yeah, I thought he was nuts when he pulled that stunt back in '98. Now I'm wondering if he was actually on to something. -JinAZ

03-07-2006, 01:44 PM
The only problem with this is ABs. The higher up the order you are, the more ABs you get. With the #3 hitter being just average, I'd rather not give him a bunch more ABs than the others. Same for the pitcher at the 8th spot. Someone posted a thread once about how many more ABs a player will get if he is in the top 3 or 4, and over the course of a season they add up (if I'm remembering right). Good article though.

03-07-2006, 02:18 PM
Yeah, that's something I'd thought about as well, and may be something to investigate further with simulations and such. The regression does indicate that, in terms of runs, you're best served by putting your most productive players at 1, 2, 4, and 5, but it also stands to reason that if you give your better hitters more at bats they'll get you more runs. The regression knows nothing about total number of at bats, although it should show a greater effect on run production at the top of the order if at bats really are important. ...

I'm sure more research will have to be done in order to clarify this finding.
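To put a rough number on the at-bat concern raised above, here is a small sketch of how plate appearances fall off by lineup slot. It assumes, purely for illustration, that the team's total plate appearances in a game are equally likely to be anything from 36 to 44, so the game is equally likely to end at any slot; real distributions will differ.

```python
GAMES = 162

def pa_by_slot(team_pa, slots=9):
    """PA for each batting slot in one game with `team_pa` total plate appearances."""
    full_turns, extra = divmod(team_pa, slots)
    return [full_turns + (1 if slot < extra else 0) for slot in range(slots)]

# ASSUMPTION: team PA per game is uniform over 36..44 (one value per remainder mod 9).
game_totals = range(36, 45)
per_game = [
    sum(pa_by_slot(total)[slot] for total in game_totals) / len(game_totals)
    for slot in range(9)
]
season = [round(pa * GAMES) for pa in per_game]

for slot, pa in enumerate(season, start=1):
    print(f"#{slot}: about {pa} PA per season")
print("leadoff minus #9:", season[0] - season[8])
```

The absolute totals depend on the assumed range; the part that carries over is the gap of about 1/9 of a plate appearance per game, or roughly 18 per season, between adjacent slots.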

03-11-2006, 02:24 PM
So basically this is a nice, statistical way to say Womack should be released and Freel should start @ 2B?

03-11-2006, 04:37 PM
Among other things. :) Unfortunately, with spring training performances overemphasized like they are, it looks like Womack's going to win that job. :bang:

Some additional points on the surprisingly poor impact of the #3 hitter, based on this article (http://www.hardballtimes.com/main/article/constructing-lineups/) from Hardball Times, which cites a different source that also finds that the #3 hitter should not be your best hitter (it recommends fifth-best):

* When all your good hitters are clustered together, you essentially restrict the times you score to those innings in which those players hit in the same inning. If you distribute your better hitters around the lineup a bit more, you should score more runs because you take advantage of those times when the weaker hitters do get on base.
* The 2nd inning tends to be the lowest scoring inning across teams; in fact, even though the 1st inning is the highest scoring, the combined average scoring of the 1st and 2nd innings is lower than any other pair of innings. One possible reason for this is that the #3 hitter is usually a high OBA guy, and he is the least likely individual to lead off the 2nd inning.
* A study found that the #3 hitter is the most likely position to come up with two outs and nobody on base among the 1-5 hitters. Therefore, hits from this position result in fewer runs than those other lineup positions. [I'm not sure about how important this one is--I think this might be the result of managers putting high speed, low OBA guys (Womack) or good contact, low OBA guys (Aurilia) at the top of the lineup].

Anyway, I'm still not completely comfortable with these ideas, but I'm giving them a go in a baseball simulator game I like to play (out of the park baseball (http://www.ootpdevelopments.com/ootp/)). Playing through some games, these lineups have actually worked out fairly well. I really like having a decent hitter in the #9 slot, far more than I'm bothered by having the pitcher hit 8th. And the average hitter in the #3 slot hasn't caused many problems... I do like how this helps you have good hitters available in most innings, which gives a more consistent chance to score runs. Later on, I'm going to do some full-season simulations using traditional vs. new lineups and see if that does result in a difference in runs scored. It's probably not as good a simulator as something like Diamond Mind, but it's reasonably sophisticated and should at least be interesting to try out.

03-11-2006, 05:39 PM
Krono just our OOTP league, Pioneers of the Diamond. You can check us out at www.potd.simleaguecentral.com . Cyclone792 is the Asst. Commish. I'll be running some tests with that lineup tool to see how effective it can be as well with v6.5!

06-01-2006, 11:43 AM
I'm giving this thread a bump because of a new thread arguing that Junior should no longer be batting third in the lineup.

06-01-2006, 11:50 AM
I can't wait to see results on this, especially with the different "possibilities." I'm too busy to play with it right now. Who's going to do the dirty work?

06-01-2006, 02:21 PM
Well the lineup that I put in the system was:

The website says that this lineup will average 4.793 runs per game (776 runs for the season). The new more productive lineup the site suggests is:

Hatteberg (30)
Kearns (9) (Dunn 8, Phillips 5, Encarnacion 8)
Phillips (8) (Encarnacion 7, LaRue 8, Lopez 5, Kearns 2)
Dunn (20) (Kearns 9, Encarnacion 1)
Encarnacion (11) (Kearns 7, Phillips 10, Dunn 2)
Griffey (23) (LaRue 6, Phillips 1)
LaRue (11) (Phillips 6, Griffey 7, Encarnacion 3, Kearns 3)
Arroyo (30)
Lopez (25) (LaRue 5)

It says that this lineup will produce 5.15 runs per game (834 runs total) so 58 more runs than the original lineup I used.
There are two models on the website and the second model has a much different lineup than the first:


The site says that this lineup will average 5.07 runs per game (821 total).

I don't know how much stock I put into this, but it is interesting. It would be fun to see a manager use this model to make his lineup for the year just to see if it works....I'm not sure if I'd want it to be Narron, but it would definitely be cause for talk.