Markov on the Reds' Lineups



JinAZ
04-13-2008, 03:22 AM
Hi folks,

First, just wanted to thank BaseClogger for nominating me for ORG, and everyone else who helped vote me in. I'm pleased to have a chance to interact with you folks without having to have somebody post for me by proxy! :)

I had a thread over at SunDeck about a lineup project that I've been undertaking using John Beamer's Markov Chains model. That thread can be found here (http://www.redszone.com/forums/showthread.php?t=66651).

I've now released the results of my initial foray into this stuff on my blog (http://jinaz-reds.blogspot.com/2008/04/markov-dusty-bakers-lineups-arent-half.html) (I'd cross-post here, but I don't think the tables would translate well). The results were pretty surprising to me, given my prior understanding of lineup construction.

Anyway, I can't post to my old thread anymore given my status change here, so I thought I'd open this thread in case anyone had comments. In some ways, I'm not sure I completely believe the results myself, but even so they're intriguing.
-j

cincyinco
04-13-2008, 03:53 AM
Crazy good stuff and welcome to org! I'll be putting some of this to the test in my sim league. :)

Redhook
04-13-2008, 07:15 AM
Great work! That was fun to read.

RFS62
04-13-2008, 08:17 AM
Very interesting read.

Welcome aboard.

:beerme:

Highlifeman21
04-13-2008, 09:27 AM
IIRC, didn't PECOTA give some unrealistically high projections for Bruce, which would have therefore skewed the Markov results for the "best" lineups?

I think down the road Bruce should definitely be included on any "best" lineup, but for the time being, IMO he needs more time in AAA.

2009, however, is a completely different story.

JinAZ
04-13-2008, 09:59 AM
IIRC, didn't PECOTA give some unrealistically high projections for Bruce, which would have therefore skewed the Markov results for the "best" lineups?

I think down the road Bruce should definitely be included on any "best" lineup, but for the time being, IMO he needs more time in AAA.

2009, however, is a completely different story.

PECOTA does like Bruce quite a bit, with an 0.850ish OPS. Whether folks think that's too high is open to debate. If it is too high, then yeah, it's going to change the performance of the lineups that include him. Not all of the top organizational lineups include him--that was up to those who submitted lineups.

From the perspective of trying to understand lineups (and not just the Reds' lineup), though, it doesn't really matter. Substituting a PECOTA Bruce for a PECOTA Patterson just represents a case where a better player is substituted for a poorer player, so we can see the effects of that on lineup performance.
-j

RedsManRick
04-13-2008, 10:19 AM
Wow. I sucked. Time to bite my tongue about lineups I guess...

RedlegJake
04-13-2008, 10:31 AM
Welcome aboard, JinAZ! I like your blog and your post on SunDeck; it'll be nice having you in the mix here.

Now my comments on the thread topic. So basically this validates Baker's use of the players he has available to him, which I think is fair. Also, the Bruce for CP stuff has died down a bit and if PECOTA is accurate, Bruce for CP would be a small uptick in team performance. What I wish I saw was Bruce in place of Griffey although the PECOTA numbers for Junior may be high, too, given his age and his start.

Also how about plugging in some variants with players available from the RH side when facing lefties - a look at how Dusty might construct a lineup against lefties. Can the Markov model isolate performance vs. L-R? I really don't know, Jinaz. Can you answer that?

As much as it is getting old to hear, the fact is it takes a good offense to leave so many guys on base, and key hits will start falling in.

GAC
04-13-2008, 10:36 AM
Welcome aboard and thanks for the read. Good stuff.

This immediately caught my eye....


Think about that. Baker's lineup violates one of the biggest "rules" for lineup construction that us stat people harp on--his leadoff hitter is projected to have a miserable 0.307 OBP this season. And yet, the interactions between players in his lineup are such that his lineup results in more wins per season than most other variants...at least, according to Markov. My own lineups, which I designed based largely on the lineup chapter in The Book, rated as a fairly middle-of-the-pack lineup, and came out a good 17 runs (~1.5 wins) behind Baker's model. And some of the user-submitted lineups, which look very reasonable to my eye, came out more than 30 runs per season behind Baker's. Again, "wow."

This team is getting on base. The Reds are 4th in MLB in OB% (.347). Where they are really getting hurt is SLG% (.399 - 17th). With RISP, they carry a B/A of .248, and SLG% of .386.

My biggest concern is where Dunn is batting. I just don't think you're maximizing the guy's ability/performance batting him 5th. He is at the top of MLB (or was) in drawing walks. He is struggling right now; but could that be because of where he is hitting and seeing who is behind him?

This team got 14 hits last night, yet over the last two games have stranded 23.

I think they are just suffering through an early season slump, some tough luck, and where some guys are starting to press.

pahster
04-13-2008, 10:45 AM
IIRC, didn't PECOTA give some unrealistically high projections for Bruce, which would have therefore skewed the Markov results for the "best" lineups?

You're thinking of Bill James' projection for Bruce. It was something like a .990 OPS.

westofyou
04-13-2008, 11:17 AM
You're thinking of Bill James' projection for Bruce. It was something like a .990 OPS.



PECOTA - .269/.336/.512 - 584 ab's

Bill James - .308/.363/.602 - 143 ab's

It's nice to have a player whose age comps are Kaline and Ott, but dialing it back a bit would be my approach.

Mario-Rijo
04-13-2008, 06:12 PM
Welcome aboard and thanks for the read. Good stuff.

This immediately caught my eye....



This team is getting on base. The Reds are 4th in MLB in OB% (.347). Where they are really getting hurt is SLG% (.399 - 17th). With RISP, they carry a B/A of .248, and SLG% of .386.

My biggest concern is where Dunn is batting. I just don't think you're maximizing the guy's ability/performance batting him 5th. He is at the top of MLB (or was) in drawing walks. He is struggling right now; but could that be because of where he is hitting and seeing who is behind him?

This team got 14 hits last night, yet over the last two games have stranded 23.

I think they are just suffering through an early season slump, some tough luck, and where some guys are starting to press.


I don't think there is any doubt about this being an issue. IMO the lineup should have Dunn batting ahead of guys who are nearly as dangerous when they do swing the bat but who are more likely to swing the bat. The problem is Jr won't be moved down in the lineup due to politics. So really the only other option would be to flip-flop Phillips and Dunn, which could also cause a bit of a stir. I also think this could take some of the pressure off of EE: he and Dunn are both struggling and hitting back to back, which puts more pressure on EE because he feels like the last line of defense. If you put BP between them, it protects Dunn, which should help him, and if BP doesn't take it as a demotion, it should help take the pressure off EE. Hope that made some sense.

I think the lineup(s) should look more like this (Justin please feel free to use these in the model).

(Using the OD starters this would have been my OD lineup, surprisingly no one used this lineup)

Kepp
Hatteberg
Dunn
Phillips
Jr
Encarnacion
Patterson
Valentin

(Using all MLB ava. players)

Hopper
Kepp
Dunn
Phillips
Jr
Encarnacion
Votto
Valentin

(If Bruce were ava. to me)

Kepp
Votto
Dunn
Phillips
Jr
Encarnacion
Bruce
Valentin

Oh Yeah and welcome aboard JinAz!

JinAZ
04-13-2008, 06:27 PM
Now my comments on the thread topic. So basically this validates Baker's use of the players he has available to him, which I think is fair. Also, the Bruce for CP stuff has died down a bit and if PECOTA is accurate, Bruce for CP would be a small uptick in team performance. What I wish I saw was Bruce in place of Griffey although the PECOTA numbers for Junior may be high, too, given his age and his start.


One thing that isn't identified in a lot of evaluations of Griffey is his impact via baserunning and throwing. His speed is such that his "arm" ratings (i.e. ability to prevent baserunner advancement) are terrible, as is his ability to advance around the bases on hits. FWIW, MGL was forecasting Griffey as a below-replacement level player this past offseason based on those numbers and his miserable range. I'm not sure I agree (I had him at 15-25 runs above replacement if I remember right, depending on what fielding and arm stats you use), but it probably is something worth tracking more carefully in his case.



Also how about plugging in some variants with players available from the RH side when facing lefties - a look at how Dusty might construct a lineup against lefties. Can the Markov model isolate performance vs. L-R? I really don't know, Jinaz. Can you answer that?


In terms of the model itself, it should be fairly "easy" to do left vs. right lineups with this model, as you'd just use different input data--projected left/right splits for each counting stat for each player.

Coming up with good input data could be tricky, though, as none of the major projection engines provide left/right splits. We could probably Marcel it and assume that all players of each handedness have the same left/right splits (which is usually pretty safe, especially for right-handers)...but that isn't really something that I can do right now time-wise.
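
As a rough illustration of what "different input data" could mean here, the sketch below scales one hitter's projected per-plate-appearance rates by a shared split factor for his handedness. Everything in it is hypothetical: the function name, the stat names, and especially the numbers, which are placeholders rather than real projections or real platoon splits. It also skips the step of keeping the two splits consistent with the overall projection.

```python
def apply_split(projection, factors):
    """Scale each projected rate by a handedness-specific factor.

    Stats without a factor are left unchanged.
    """
    return {stat: rate * factors.get(stat, 1.0) for stat, rate in projection.items()}


# Placeholder per-plate-appearance rates for one made-up left-handed hitter.
overall = {"walk": 0.10, "single": 0.15, "homer": 0.04}

# Placeholder factors shared by ALL left-handed hitters facing a lefty pitcher.
LHB_VS_LHP = {"walk": 0.90, "single": 0.95, "homer": 0.85}

vs_lefty = apply_split(overall, LHB_VS_LHP)
print(vs_lefty)
```

The adjusted rates would then be fed to the lineup model in place of the overall projection when the opposing starter is left-handed.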
-j

Mario-Rijo
04-13-2008, 06:47 PM
One thing that isn't identified in a lot of evaluations of Griffey is his impact via baserunning and throwing. His speed is such that his "arm" ratings (i.e. ability to prevent baserunner advancement) are terrible, as is his ability to advance around the bases on hits. FWIW, MGL was forecasting Griffey as a below-replacement level player this past offseason based on those numbers and his miserable range. I'm not sure I agree (I had him at 15-25 runs above replacement if I remember right, depending on what fielding and arm stats you use), but it probably is something worth tracking more carefully in his case.

I also forecasted that, only just by the naked eye. I just could tell he was slipping in so many facets. It hasn't necessarily come to fruition yet; he has done OK to this point (and way better than at this point last year), but I suspect we will be screaming vehemently for Bruce to RF before the end of the season.

JinAZ
04-13-2008, 06:48 PM
Kepp
Hatteberg
Dunn
Phillips
Jr
Encarnacion
Patterson
Valentin


Markov says: 4.91 R/G, 796 Runs/season, +6.9 runs above Baker.
Innings each spot led off:


Spot:       1     2     3     4     5     6     7     8     9
Innings: 1.80  0.77  0.78  1.07  1.01  0.83  1.13  0.83  0.72

We have a new champion.. :)


(Using all MLB ava. players)

Hopper
Kepp
Dunn
Phillips
Jr
Encarnacion
Votto
Valentin

Markov says: 4.85 R/G, 785.7 R/season, 3.4 runs below Baker.


Spot:       1     2     3     4     5     6     7     8     9
Innings: 1.77  0.76  0.78  0.95  1.04  0.85  0.88  0.85  1.06



(If Bruce were ava. to me)

Kepp
Votto
Dunn
Phillips
Jr
Encarnacion
Bruce
Valentin

Markov says: 5.07 R/G, 820.6 R/season, +31.5 runs above Baker.
Innings led off:


Spot:       1     2     3     4     5     6     7     8     9
Innings: 1.80  0.77  0.76  1.07  1.02  0.84  1.14  0.84  0.72

That's about 9 runs better than the next-best lineup submitted thus far. :)

One thing I might be seeing in your lineups is that when Dunn hits third, it might tend to result in the #4 hitter leading off fewer innings (thanks to his OBP?)...the latter is a major feature of your lineups, and might be a big part of the reason that they've been so successful in the model.

Of course, that's not always the case--Degenerate's OD lineup also had Dunn hitting third, and yet his lineup tied for the most innings led off by the #4 hitter among all the lineups I've run (and furthermore did poorly overall).


Oh Yeah and welcome aboard JinAz!

Thanks, both to you and to all the other welcome messages!
-Justin

*BaseClogger*
04-13-2008, 08:37 PM
Alright, I'll give this a try if you've got the time:

OD:
Keppinger
Dunn
Griffey
Encarnacion
Phillips
Hatteberg
Patterson
Valentin

Now:
Votto
Keppinger
Dunn
Griffey
Encarnacion
Phillips
Patterson
Bako

Soon:
Votto
Keppinger
Dunn
Griffey
Encarnacion
Phillips
Bruce
Ross

Glad to have ya JinAZ :thumbup:

Highlifeman21
04-13-2008, 08:40 PM
You're thinking of Bill James' projection for Bruce. It was something like a .990 OPS.

I knew James' projection was pure fantasy, but as WOY posted, PECOTA's projection was also rather optimistic.

Maybe James and PECOTA mixed up the years. 2009 Bruce might sniff those numbers, but he sure won't in 2008. Unless they both meant for Louisville.

mth123
04-13-2008, 09:58 PM
I'm generally for moving Dunn ahead of Phillips, but one thought.

Dunn behind Phillips allows Phillips to see better pitches, which somewhat limits the negative of Phillips' free-swinging ways. If Phillips was being thrown junk all day, I wonder if he'd swing at a lot of it and get himself out even more often. With Dunn behind Phillips, it's possible that the junk he sees is lessened and his free swinging makes more solid contact than it otherwise would. Dunn, on the other hand, is selective enough to avoid getting himself out by swinging at junk. I think Dunn hitting in front of Phillips would improve Dunn's production a bit, but I'm guessing it would be more than offset by the drop in Phillips' production. I wonder if Phillips ahead of Dunn doesn't provide the best combined production even if Dunn is impacted negatively.

Mario-Rijo
04-13-2008, 11:16 PM
I'm generally for moving Dunn ahead of Phillips, but one thought.

Dunn behind Phillips allows Phillips to see better pitches, which somewhat limits the negative of Phillips' free-swinging ways. If Phillips was being thrown junk all day, I wonder if he'd swing at a lot of it and get himself out even more often. With Dunn behind Phillips, it's possible that the junk he sees is lessened and his free swinging makes more solid contact than it otherwise would. Dunn, on the other hand, is selective enough to avoid getting himself out by swinging at junk. I think Dunn hitting in front of Phillips would improve Dunn's production a bit, but I'm guessing it would be more than offset by the drop in Phillips' production. I wonder if Phillips ahead of Dunn doesn't provide the best combined production even if Dunn is impacted negatively.

I disagree, or at least see it differently. Phillips is more free swinging, true, but his ability to make more consistent contact than Dunn helps him IMO. Sure, he still needs to improve his selectivity, but IMO his bat speed allows him more time than Dunn to "catch up," and thus he'd be harder to strike out if he gets more selective and lays off the stuff off the plate.

Jpup
04-13-2008, 11:28 PM
vs. R

Keppinger
Votto
Dunn
Jr.
Encarnacion
Phillips
Valentin
Patterson

vs. L

Hopper
Keppinger
Phillips
Dunn
Encarnacion
Jr.
Votto
Bako

*BaseClogger*
04-13-2008, 11:30 PM
vs. R

Keppinger
Votto
Dunn
Jr.
Encarnacion
Phillips
Valentin
Patterson

Curious--why unnecessarily bat three lefties in a row when you could flip Votto and Keppinger?

Jpup
04-13-2008, 11:32 PM
Curious--why unnecessarily bat three lefties in a row when you could flip Votto and Keppinger?

Because I believe Keppinger is the better hitter and gets on base more right now, and I would rather get him more at bats. I also think Votto is much more prone to hitting doubles than Kepp, which would give the Reds a chance for more runs. I also think the notion of splitting up the lefties is silly.

*BaseClogger*
04-13-2008, 11:33 PM
Because I believe Keppinger is the better hitter and gets on base more right now, and I would rather get him more at bats. I also think Votto is much more prone to hitting doubles than Kepp, which would give the Reds a chance for more runs. I also think the notion of splitting up the lefties is silly.

:thumbup:

JinAZ
04-14-2008, 12:31 AM
OD:
Keppinger
Dunn
Griffey
Encarnacion
Phillips
Hatteberg
Patterson
Valentin


4.84 R/G, 783.5 R/season, 5.6 runs below Baker.



Now:
Votto
Keppinger
Dunn
Griffey
Encarnacion
Phillips
Patterson
Bako


4.49 R/G, 727.9 R/season, 61.2 runs below Baker.

Keep in mind that Bako has an absolutely horrendous PECOTA projection: 0.219/0.303/0.285. Hard for the model to project many runs scored with a hitter like that in the lineup.



Soon:
Votto
Keppinger
Dunn
Griffey
Encarnacion
Phillips
Bruce
Ross


4.78 R/G, 774.5 R/season, 14.6 runs below Baker.

I have no idea why this lineup didn't perform better.



vs. R

Keppinger
Votto
Dunn
Jr.
Encarnacion
Phillips
Valentin
Patterson


4.85 R/G, 786 R/season, 3.1 runs below Baker.



vs. L

Hopper
Keppinger
Phillips
Dunn
Encarnacion
Jr.
Votto
Bako


4.61 R/G, 746.7 R/season, 42.4 runs below Baker.

Again we see the Bako effect. Hopper's PECOTA projection is pretty low too. Also, this model has no clue about lefty/righty matchups, so there's a way in which this kind of lineup isn't tested very well by the model.
-j

reds44
04-14-2008, 12:40 AM
Try this one:
Keppinger
Encarnacion
Dunn
Griffey
Phillips
Votto
Patterson
Ross

JinAZ
04-14-2008, 12:41 AM
MGL is skeptical about the usefulness of the Markov model I'm using.

He offered to run (and I gladly accepted) some of the lineups in his own personally-designed simulator, which includes some additional information about baserunning and handedness. He also is using his own projections, rather than PECOTA's. Here's his post...lots of interesting points in there, so I'm reposting it here:


Using my sim, I ran each lineup 100,000 times at home (neutral stats adjusted for home field advantage) in a neutral park against a neutral, league-average (neither RH nor LH) pitcher, using my projections for each player. My sim includes baserunning, GIDP, etc., so it is pretty much all encompassing. The standard deviation of runs per game for one team in 100,000 games is .009. So these numbers are plus or minus .018 runs at 2 sigma (with a 95% confidence interval).

Baker: 4.560
Baker with Votto rather than Hatteberg: 4.646 rpg +13.9
Your #1: 4.604 +7.1
Your #2: 4.618 +9.4
Your #3: 4.615 +8.9
Your worst: 4.626 +10.7
Your 3rd worst: 4.620 +9.7
Your 4th worst: 4.549 +1.8
Jinaz-OD: 4.588 +4.5
Jinaz-OD-exploit: 4.579 +3.1

I re-ran each lineup at home at GABP, rather than a neutral park:

Baker: 4.640
Baker with Votto rather than Hatteberg: 4.763 +19.9
Your #1: 4.706 +10.7
Your #2: 4.736 +15.6
Your #3: 4.758 +19.1
Your worst: 4.729 +14.4
Your 3rd worst: 4.731 +14.7
Your 4th worst: 4.672 +5.2
Jinaz-OD: 4.679 +6.3
Jinaz-OD-exploit: 4.690 +8.1
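
The run totals next to each lineup appear to be the per-game difference from Baker's lineup scaled to a full season. A quick sketch of that arithmetic, assuming a 162-game season (the helper name is made up; the inputs are the GABP figures listed above):

```python
GAMES = 162  # assumption: a standard MLB regular season


def runs_vs_baseline(rpg, baseline_rpg, games=GAMES):
    """Season-long run difference implied by two runs-per-game figures."""
    return (rpg - baseline_rpg) * games


baker = 4.640                # "Baker" at GABP
votto_for_hatteberg = 4.763  # "Baker with Votto rather than Hatteberg" at GABP

print(round(runs_vs_baseline(votto_for_hatteberg, baker), 1))  # 19.9

# The quoted 2-sigma margin of .018 runs per game, scaled the same way.
print(round(0.018 * GAMES, 1))  # 2.9
```

So a single lineup's season total carries a margin of roughly 3 runs from simulation noise alone, and the gap between two simulated lineups would carry somewhat more.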

Let me say a couple of things: One, Dusty’s lineup is one of the worst you can put out there, as you can see from the above, based on my projections and my sim. You really have to make an effort to do as badly as Dusty.

I have much more confidence in a comprehensive sim than a “dry Markov chain.” In fact, I think that using a Markov chain that does not include handedness, baserunning, etc., is a waste of time for evaluating lineups.

Two, the Reds have a roughly average first-string lineup, despite what you often hear about them having a very good one or even a great one. And of course, the defense is awful, as long as Griff, Dunn, and Phillips are out there.

Three, can we stop saying that Griffey is a “great hitter.” He is not anymore. Not even close. He is a below-average hitting corner outfielder. With his terrible defense and baserunning, he is near replacement level. One of the worst overall players in baseball. Possibly the worst full-time player. Has been for a few years.

Four, Baker’s (or whoever makes those decisions) biggest mistake is playing Hatteberg over Votto. I don’t know about their defense, but Votto is almost a win and a half better with the bat than Hatty. If Hatty is a better defender, it probably is not more than a win, unless Votto is a DH-like entity, awful with the glove. And of course Hatty cannot run the bases a lick. I don’t know about Votto.

Five, the Reds lineup is quite balanced, as compared to many or even most, so that it does not make that much difference who you put where, as you can see from the above. As long as Keppy, Griffey, Dunn and Encarnacion are near the top or middle of the lineup, you are fine. And no one is that bad that they can’t pretty much bat anywhere, although Valentin, being the worst and the slowest, should probably bat last in any lineup.

Six, just eyeballing the above Zips projections, my projections are quite a bit different. I have, in a neutral setting, something like, in wOBA, Dunn, .386, Encarnacion, .368, Keppinger, .353, Griffey, .348, Hatty, .338, Phillips, .338, Patterson, .338, Valentin, .323.

Here's my response:


Hi MGL,

Thanks for the quick work!

It is a bit unnerving to see how different your results are. You absolutely could be right about the importance of the handedness and baserunning details that your simulator takes into account. I do wonder how much your final point about the differences between your projections and the PECOTAs I used might also be coming into play though. Looks like the big differences were on Keppinger, Hatteberg, and especially Patterson. As you say, the Reds' lineup is fairly balanced, so differences like that could result in big differences in the rank order of lineups. Of course, the Patterson difference should just help Dusty's case, and he clearly got creamed in your sim.

As for your other points, I generally agree (though I still think that UZR must be missing low with Phillips given how he does with the Fans, PMR, and RZR...but we've had that conversation already!). A point I've made a few times is that if the Reds are going to contend, they're going to need surprises from both their offense and defense. And the only way they'll get surprises from their offense is if they play high-upside players like Jay Bruce and Joey Votto over known quantities like Patterson and Hatteberg.

FWIW, Votto does have pretty good speed for a first-baseman (perhaps average overall?), though reviews of his glove have been a bit mixed. I'm just assuming that he's an average defender for now. Hatteberg's been all over the place from year to year defensively, but I think he's at least not terrible.

So there you have it. :)
-Justin

JinAZ
04-14-2008, 12:45 AM
Try this one:
Keppinger
Encarnacion
Dunn
Griffey
Phillips
Votto
Patterson
Ross

Ok, last one for the night! :)

4.89 r/g, 791.6 r/season, +2.5 runs above Baker

Ok, I gotta go work on my lecture for Wednesday night now. :)
-j

reds44
04-14-2008, 12:51 AM
Ok, last one for the night! :)

4.89 r/g, 791.6 r/season, +2.5 runs above Baker

Ok, I gotta go work on my lecture for Wednesday night now. :)
-j
Winner!

Far East
04-14-2008, 09:12 AM
Baseballmusings.com figures that Dusty's opening day lineup would average 4.353 runs per game.

Using those same players, the website found its best lineup to be the following, which should score 4.789 runs per game:

Keppinger, Dunn, Valentin, Griffey, Edwin, Patterson, Phillips, Harang, Hatteberg.

Have you tried any lineups that use the pitcher at #8?

JinAZ
04-14-2008, 10:54 AM
Baseballmusings.com figures that Dusty's opening day lineup would average 4.353 runs per game.

The problem with the baseballmusings system is that it is based on regression coefficients for OBP and SLG at each lineup position, and therefore cannot account for interactions among lineup spots like the Markov will. I just don't put much stock into that lineup tool, unfortunately. I can try to run that lineup tonight, though.

There are some lineups in the sample that have the pitcher hitting 8th. None did particularly well, though that's probably for other reasons. The Book indicated that this strategy is worth ~2 runs per season on average IIRC, so it's not that big of a deal. It works best when the new #9 hitter gets most of his offensive value from his OBP, like in the case of Jason Kendall with the Brewers. Our #8/#9 hitters tend to be guys with more power than on-base ability (e.g. Valentin, Patterson, Gonzalez, etc), and the variations I've tried with them indicate that we're often best served having them in the 8-hole to leverage that power with men on base.
-j

RedsManRick
04-14-2008, 11:08 AM
The problem with the baseballmusings system is that it is based on regression coefficients for OBP and SLG at each lineup position, and therefore cannot account for interactions among lineup spots like the Markov will. I just don't put much stock into that lineup tool, unfortunately. I can try to run that lineup tonight, though.

There are some lineups in the sample that have the pitcher hitting 8th. None did particularly well, though that's probably for other reasons. The Book indicated that this strategy is worth ~2 runs per season on average IIRC, so it's not that big of a deal. It works best when the new #9 hitter gets most of his offensive value from his OBP, like in the case of Jason Kendall with the Brewers. Our #8/#9 hitters tend to be guys with more power than on-base ability (e.g. Valentin, Patterson, Gonzalez, etc), and the variations I've tried with them indicate that we're often best served having them in the 8-hole to leverage that power with men on base.
-j

I was one of the ones who suggested batting the pitcher 8th. This was with Hopper or Freel batting 9th and with the assumption that Bruce wasn't on the roster.

nate
04-14-2008, 11:34 AM
Interesting. Although I can't claim I fully understand all of the numbers behind the system.

Can the simulation generate the highest scoring lineup or is it simply a matter of feeding different lineups into it until one is crowned "king"?

CaiGuy
04-14-2008, 01:14 PM
I have a new one:

Dunn
Encarnacion
Griff
Keppinger
Votto
Phillips
Patterson
Ross (or catcher of the day)
P

While it is mostly for fun, I think that it could perform well.

And it has a L/R/L/R... split! :)

JinAZ
04-14-2008, 02:26 PM
Interesting. Although I can't claim I fully understand all of the numbers behind the system.

You mean the numbers in the results, or just how it works?

The results are just a) the expected number of runs scored per game (or season), and b) how many innings per game (on average) the batter in each lineup spot leads off. So 1.0 means that, on average, the batter in that spot leads off exactly one inning per game.

How it works? It's also fairly simple...basically, you start with the first batter and figure out all the different things that could happen with that batter--out, single, double, homer, walk, etc. Each of those becomes a different path, with a probability attached to it based on the data you input for the batter. Then, going down each path, what are all possible events that might occur with the next batter (given where the first batter ended up): GDP, out without advancement, out with advancement, single, double, etc. Each of those becomes a new path as well. Eventually, each path (or chain) will end in a 27th out (or they become so improbable that they no longer matter to the results--e.g. 10 consecutive home runs, or 30 consecutive singles), and most of them will result in some number of runs scored.

The Markov then basically just figures out how many runs were scored down each path. It then computes the average number of runs scored across all paths, weighted by the probability of each path occurring.
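
For readers who want to see the idea in code, here is a minimal sketch of that path-weighting calculation in Python. It is not the model used for the numbers in this thread: it ignores double plays, steals, and outs that advance runners, and it assumes every runner advances exactly as far as the batter on a hit. The function names and the sample probabilities are made up for illustration.

```python
from collections import defaultdict

# Bases gained by the batter (and, as a simplification, by every runner).
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def advance(bases, event):
    """Return (new_bases, runs) for a non-out event; bases = (1st, 2nd, 3rd)."""
    first, second, third = bases
    if event == "walk":  # runners move up only when forced
        runs = 1 if (first and second and third) else 0
        return (True, second or first, third or (first and second)), runs
    n = HIT_BASES[event]
    runners = [i + 1 for i, on in enumerate(bases) if on] + [0]  # 0 = batter
    moved = [r + n for r in runners]
    runs = sum(1 for r in moved if r >= 4)
    return tuple((b in moved) for b in (1, 2, 3)), runs


def half_inning(lineup, leadoff, eps=1e-12):
    """Expected runs in one inning, plus odds of who leads off the next one."""
    n = len(lineup)
    states = {(0, (False, False, False), leadoff): 1.0}
    exp_runs, next_leadoff = 0.0, [0.0] * n
    while sum(states.values()) > eps:  # drop paths too improbable to matter
        new_states = defaultdict(float)
        for (outs, bases, b), p in states.items():
            probs, nxt = lineup[b], (b + 1) % n
            p_out = 1.0 - sum(probs.values())
            if outs == 2:
                next_leadoff[nxt] += p * p_out  # third out ends the inning
            else:
                new_states[(outs + 1, bases, nxt)] += p * p_out
            for event, q in probs.items():
                new_bases, runs = advance(bases, event)
                exp_runs += p * q * runs  # runs weighted by path probability
                new_states[(outs, new_bases, nxt)] += p * q
        states = new_states
    return exp_runs, next_leadoff


def game(lineup, innings=9):
    """Expected runs per game and expected innings led off by each spot."""
    n = len(lineup)
    per_leadoff = [half_inning(lineup, b) for b in range(n)]
    dist = [1.0] + [0.0] * (n - 1)  # spot 1 always leads off the first inning
    total_runs, innings_led = 0.0, [0.0] * n
    for _ in range(innings):
        new_dist = [0.0] * n
        for b, p in enumerate(dist):
            runs, nxt = per_leadoff[b]
            total_runs += p * runs
            innings_led[b] += p
            for b2, q in enumerate(nxt):
                new_dist[b2] += p * q
        dist = new_dist
    return total_runs, innings_led


# Made-up per-plate-appearance probabilities; whatever is left over is an out.
AVERAGE = {"walk": 0.08, "single": 0.15, "double": 0.05, "triple": 0.005, "homer": 0.03}
WEAK = {"walk": 0.05, "single": 0.12, "double": 0.03, "triple": 0.002, "homer": 0.01}
lineup = [AVERAGE] * 8 + [WEAK]
runs, led = game(lineup)
print(round(runs, 2), [round(x, 2) for x in led])
```

`game` returns the expected runs per nine innings and, for each lineup spot, the expected number of innings led off, which is the same kind of output as the innings-led-off rows earlier in the thread.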

Anyway, I hope that helps.


Can the simulation generate the highest scoring lineup or is it simply a matter of feeding different lineups into it until one is crowned "king"?

It is probably possible to create something that will walk through all 300,000+ possible batting orders for a given set of 9 players (9! = 362,880) and spit out The Best lineup. But as it is, you just feed it lineups one at a time.
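
A brute-force search along those lines could be sketched as below. The `score` argument stands in for whatever evaluates a batting order (the Markov model, in this case); the toy scoring function and the four made-up hitters are only there so the example runs quickly, since nine hitters means 362,880 orderings to evaluate.

```python
import itertools
import math


def best_lineup(players, score):
    """Try every batting order and return the one with the highest score."""
    return max(itertools.permutations(players), key=score)


# Number of distinct batting orders for nine hitters.
print(math.factorial(9))  # 362880

# Hypothetical on-base percentages for four made-up hitters.
OBP = {"A": 0.300, "B": 0.350, "C": 0.400, "D": 0.320}


def toy_score(order):
    """Placeholder score: earlier spots get more weight, so OBP up top wins."""
    return sum(OBP[p] * (len(order) - i) for i, p in enumerate(order))


best = best_lineup(list(OBP), toy_score)
print(best)  # ('C', 'B', 'D', 'A')
```

With a real model as `score`, the cost is one model run per ordering, so the search is only practical if a single evaluation is fast.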
-j

REDREAD
04-14-2008, 04:35 PM
MGL is skeptical about the usefulness of the Markov model I'm using.


The beauty of doing a Markov chain is that it is excellent for relative comparisons under the same assumptions. For example, Patterson has an average OBP of X across all different pitching. With a Markov chain, you can calculate the results of what the expected value would be over an infinite number of attempts. I don't know how the tool you used works specifically, but I remember cranking out Markov chains in grad school.

Simulation based results put you at the mercy of the computer's random number generator. IMO, it's just not as precise.

Regardless, it looks like Dusty does quite well in his sim when Votto is substituted for Hat. Dusty looks bad largely because he uses Hat.
Now, this is probably because the sim predicts Votto to be much better than Hat.. which of course is an unknown as the prediction for both players is a guess/estimate. If you truly want to criticize Baker's lineup skills, it's much more fair to restrict yourself to the same players (or sub Votto for Hat). Because it's easy to make Dusty look like an idiot if Votto's projection is much greater than Hat's projection. In fairness, the guy that did the sims did display this data, which I appreciate. I'm not accusing him of doing anything deceptive, just making a point.

If you are a real geek about this math, do a Google search about the stability of the early Aloha computer network. Markov chains proved that it was an unstable network protocol, and this information was key when they designed the Ethernet Collision Avoidance/Detection and rebroadcast algorithms for network systems.

JinAZ
04-14-2008, 04:51 PM
The beauty of doing a Markov chain is that it is excellent for relative comparisons under the same assumptions. For example, Patterson has an average OBP of X across all different pitching. With a Markov chain, you can calculate the results of what the expected value would be over an infinite number of attempts. I don't know how the tool you used works specifically, but I remember cranking out Markov chains in grad school.

Simulation based results put you at the mercy of the computer's random number generator. IMO, it's just not as precise.

I agree that methodologically, the Markov seems like the better approach for the reasons you describe. However, it may be that MGL's sim is better in this case because it uses additional information. MGL's system includes hitter-adjusted differences in baserunning, as well as differences in handedness. The Markov model I'm using includes detailed baserunning information, but aside from stolen bases the rates of advancement, etc, are generalized across all players (I think). It also has no idea about handedness, which is an important thing.

Also, FWIW, the fact that MGL repeated each of his sims 100,000 times and reported the average result gets around the problems of the random number sampling (assuming no bias in his generator).
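
A toy example of that point, assuming a made-up inning in which every plate appearance is either a home run or an out. That case has a known exact answer, so the sketch can compare it with the simulated average as the number of repetitions grows. The function names and the 10% home run rate are illustrative only, not anything from MGL's simulator.

```python
import random


def simulate_inning(p_homer, rng):
    """Toy half-inning: each plate appearance is a home run or an out."""
    outs = runs = 0
    while outs < 3:
        if rng.random() < p_homer:
            runs += 1
        else:
            outs += 1
    return runs


def monte_carlo(p_homer, trials, seed=0):
    """Average runs per inning over `trials` simulated innings."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(simulate_inning(p_homer, rng) for _ in range(trials)) / trials


p = 0.1
# Exact expectation: mean number of successes before the third failure.
exact = 3 * p / (1 - p)

for trials in (100, 10_000, 100_000):
    print(trials, round(monte_carlo(p, trials), 4), round(exact, 4))
```

The simulated average should settle toward the exact value as the trial count rises, which is the sense in which 100,000 repetitions largely takes the random number generator out of the picture.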


Regardless, it looks like Dusty does quite well in his sim when Votto is substituted for Hat. Dusty looks bad largely because he uses Hat.
Now, this is probably because the sim predicts Votto to be much better than Hat.. which of course is an unknown as the prediction for both players is a guess/estimate. If you truly want to criticize Baker's lineup skills, it's much more fair to restrict yourself to the same players (or sub Votto for Hat).


I think that generally, that's what he did. It's also why I dedicated the first part of my write-up to only investigating variation in the opening day lineups, relegating the differences in personnel to the second section. The only time MGL included Votto is in the "Baker with Votto rather than Hatteberg" sims. All the other lineups were restricted to using Baker's opening day players.



If you are a real geek about this math, do a Google search about the stability of the early Aloha computer network. Markov chains proved that it was an unstable network protocol, and this information was key when they designed the Ethernet Collision Avoidance/Detection and rebroadcast algorithms for network systems.

Cool, I'll have to look that up at some point when I have a chance. :)
-j

REDREAD
04-14-2008, 04:58 PM
I agree that methodologically, the Markov seems like the better approach for the reasons you describe. However, it may be that MGL's sim is better in this case because it uses additional information. MGL's system includes hitter-adjusted differences in baserunning, as well as differences in handedness. The Markov model I'm using includes detailed baserunning information, but aside from stolen bases the rates of advancement, etc, are generalized across all players (I think). It also has no idea about handedness, which is an important thing.


Ok, I see. However, with more work, you could incorporate additional states to take this into account. Naturally, this would be quite a bit more work. But you could add a state for sac bunt, stolen base, picked off, etc.






Also, FWIW, the fact that MGL repeated each of his sims 100,000 times and reported the average result gets around the problems of the random number sampling (assuming no bias in his generator).


Ok, I missed that as I read too fast. That certainly lessens the effect of the simulation method.




The only time MGL included Votto is in the "Baker with Votto rather than Hatteberg" sims. All the other lineups were restricted to using Baker's opening day players.


Didn't realize that. I assumed that at least some of the non-Baker lineups had Votto in them.

Obviously, the big weakness in both of these methods is that it's impossible to know exactly how these players will perform. Guys like Votto and EdE are hard to project, IMO. Guys like Dunn and Castro are much easier to project.

JinAZ
04-14-2008, 07:38 PM
Ok, I see. However, with more work, you could incorporate additional states to take this into account. Naturally, this would be quite a bit more work. But you could add a state for sac bunt, stolen base, picked off, etc.


It actually does have these states. The big thing that MGL's sim has that the Markov doesn't is consideration of lefty vs. righty batters, and how those come into play vs. pitcher usage (especially the lefty relievers). It also doesn't allow the rate at which runners advance around the bases to vary among different baserunners (except via the stolen base). I'm not sure how big of a deal those things are, to be honest...
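For anyone curious what "states" means here, below is a minimal sketch of a half-inning Markov chain. To be clear, this is not John Beamer's model: it is a toy with one generic hitter, made-up event rates, and a crude advancement rule (every runner moves exactly as many bases as the hit is worth). The function names and the `probs` dictionary are hypothetical. Each state is (outs, runners on base), and the expected runs from each state are found by repeatedly sweeping the chain until the values settle.

```python
# Toy Markov chain for one half-inning. NOT John Beamer's model: the event
# rates and advancement rules are illustrative assumptions only.
# Bases are a bitmask: 1 = runner on first, 2 = second, 4 = third.

HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}

def advance_on_hit(bases, n):
    """Move the batter and every runner n bases; return (new_bases, runs)."""
    runs, new_bases = 0, 0
    for start in (0, 1, 2, 3):  # 0 is the batter at home plate
        if start == 0 or bases & (1 << (start - 1)):
            end = start + n
            if end >= 4:
                runs += 1
            else:
                new_bases |= 1 << (end - 1)
    return new_bases, runs

def advance_on_walk(bases):
    """Batter takes first; only forced runners move."""
    if not bases & 1:
        return bases | 1, 0
    if not bases & 2:
        return bases | 2, 0
    if not bases & 4:
        return bases | 4, 0
    return 7, 1  # bases loaded: runner on third is forced home

def expected_runs(probs, sweeps=300):
    """Expected runs to the end of the half-inning from every (outs, bases)."""
    value = {(outs, bases): 0.0 for outs in range(3) for bases in range(8)}
    for _ in range(sweeps):
        for outs, bases in value:
            total = 0.0
            for event, p in probs.items():
                nxt_outs, nxt_bases, runs = outs, bases, 0
                if event == "out":
                    nxt_outs = outs + 1
                elif event == "walk":
                    nxt_bases, runs = advance_on_walk(bases)
                else:
                    nxt_bases, runs = advance_on_hit(bases, HIT_BASES[event])
                # Three outs is the absorbing state: no more runs can score.
                future = value[(nxt_outs, nxt_bases)] if nxt_outs < 3 else 0.0
                total += p * (runs + future)
            value[(outs, bases)] = total
    return value

if __name__ == "__main__":
    # Made-up rates for a generic hitter; they sum to 1.
    probs = {"out": 0.68, "walk": 0.08, "single": 0.15,
             "double": 0.05, "triple": 0.01, "homer": 0.03}
    print(expected_runs(probs)[(0, 0)])  # expected runs from a fresh inning
```

A real lineup model would track which of the nine batters is up and give each his own `probs`, which is what makes the batting order matter. Handedness and per-runner advancement rates would need still more states or per-player transition tables.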

Eventually, John Beamer might modify his model to include those kinds of things. But I'm certainly not going to try, at least not now! :)


Obviously, the big weakness in both of these methods is that it's impossible to know exactly how these players will perform. Guys like Votto and EdE are hard to project, IMO. Guys like Dunn and Castro are much easier to project.

Well, all I can say to that is that the PECOTA projections I'm using have routinely been among the best performing (i.e. most accurate, by a variety of measures) projections in the projection round-ups over the last several years. ZiPS and CHONE also do well, but PECOTA's more consistently near the top. But yeah, the projections on EdE and (especially) Votto have more uncertainty attached to them than those for folks with longer track records at the MLB level. So from the standpoint of saying what is the best lineup for the 2008 Cincinnati Reds, you're right that any estimate has error bars attached to it.

Still, there's a sense in which that doesn't matter from the standpoint of trying to figure out something about lineup construction in general. These just represent a certain set of players with which we can futz. I could just as easily have used 2007 Cincinnati Reds data, or 2007 averages at each lineup position for the NL, or 1997-2007 averages at each lineup position for the NL. Different datasets might require different lineup styles to be optimal, but we're just trying to lay some foundation work here. ;)
-j

REDREAD
04-14-2008, 11:03 PM
I agree with you, JinAZ, that it's a good tool to use, especially when rearranging players.
If you are off by a little bit on EdE for example, that's not too much of an impact if EdE is in every lineup.

I certainly wasn't implying that there's a better way to project the numbers. I'm just saying it was hard. If someone could accurately project guys like Votto and EdE 100% of the time, obviously they'd be very rich.