There has been a conversation going on in the Jaime Garcia MRI thread regarding the validity of pitcher WAR. Most of this was posted over there, but I felt it probably deserved its own thread so that more people would see it and follow-up comments could be made without taking that one further off-topic.
Many of us are inclined toward using FIP when discussing pitcher performance. But poster cincinnati chili correctly pointed out that linking WAR to FIP can be tough to swallow when we have a guy's runs allowed so readily available. Run prevention has a lot to do with the pitcher's strikeouts, walks and homers, but it's hardly the whole picture. Pitchers can attack hitters in ways that encourage double plays. They can pick off runners. They can ramp up the velocity a bit or increase their focus. Defense matters too. And all of the things players control aside, there's some degree of plain old timing "luck": if that triple had come before the fly out to deep RF instead of after, it would have produced a run.
So how do we measure pitcher performance at the level of runs (WAR)? There are two popular methods. Fangraphs has traditionally used a FIP-based pitching WAR. In part this was because defense was already being accounted for in position players' WAR, so they didn't want to double count. Baseball-Reference, on the other hand, uses a WAR based on actual runs allowed, RA9-WAR.
FIP-WAR plays it on the extreme "safe" side, only crediting the pitcher for what we know he's responsible for and assuming average influences from everything else. RA9 goes to the other extreme of giving the pitcher credit for everything that results in actual runs being allowed, including his team's defense and luck.
The reality of pitcher responsibility lies somewhere in the middle; it's a trade-off either way. Sabermetrics guru TangoTiger's rule of thumb is to take the average of the FIP-WAR and RA9-WAR. But recently, Fangraphs rolled out a few stats to help us better navigate this middle ground. Instead of just having their FIP-WAR, they added RA9-WAR, and they added stats that use BABIP and LOB% to help explain the difference between the FIP and RA9 WAR totals.
The conceptual framework: RA9-WAR = FIP-WAR + BIP-Wins + LOB-Wins.
FIP-WAR: Pitcher WAR using FIP-projected runs allowed
RA9-WAR: Pitcher WAR using actual runs allowed
BIP-Wins: The difference between RA9-WAR & FIP-WAR explained by BABIP.
LOB-Wins: The difference between RA9-WAR & FIP-WAR explained by LOB%.
They also show FDP-Wins, which is just the two new pieces added together (the difference between RA9-WAR and FIP-WAR, if you prefer).
Let's take Johnny Cueto, who has been a poster boy for this issue around here, as an example:
For his career, he's put up 12.8 WAR using FIP. But using RA9, he's put up 18.4. That's an entire win per season! The new stats help us understand that the difference between his FIP & RA9 can be explained by equal parts BABIP "luck" and LOB% "luck". Interestingly, though, some years it's been more one than the other.

Season  Team  RA9-WAR  BIP-Wins  LOB-Wins  FDP-Wins  FIP-WAR
2008    Reds      1.1      -0.1       0.0      -0.1      1.2
2009    Reds      1.7       0.2       0.3       0.4      1.3
2010    Reds      3.6       0.2       0.8       1.0      2.6
2011    Reds      4.3       1.8       0.0       1.7      2.6
2012    Reds      6.2      -0.1       1.7       1.6      4.5
2013    Reds      1.6       0.8       0.1       0.9      0.6
Total            18.4       2.8       2.8       5.6     12.8
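A quick way to sanity-check the framework against the Cueto numbers is to confirm that each season's FIP-WAR plus BIP-Wins plus LOB-Wins lands within rounding of RA9-WAR. The sketch below is a hypothetical Python check (the names `CUETO` and `check_decomposition` are made up for illustration); the 0.15 tolerance assumes each of the three published values was rounded to one decimal.

```python
# Rows from the Cueto table: (season, RA9-WAR, BIP-Wins, LOB-Wins, FDP-Wins, FIP-WAR)
CUETO = [
    (2008, 1.1, -0.1, 0.0, -0.1, 1.2),
    (2009, 1.7, 0.2, 0.3, 0.4, 1.3),
    (2010, 3.6, 0.2, 0.8, 1.0, 2.6),
    (2011, 4.3, 1.8, 0.0, 1.7, 2.6),
    (2012, 6.2, -0.1, 1.7, 1.6, 4.5),
    (2013, 1.6, 0.8, 0.1, 0.9, 0.6),
]


def check_decomposition(rows, tol=0.15):
    """Return seasons where RA9-WAR != FIP-WAR + BIP-Wins + LOB-Wins
    (or FDP-Wins != BIP-Wins + LOB-Wins) by more than the rounding tolerance."""
    mismatches = []
    for season, ra9, bip, lob, fdp, fip in rows:
        if abs(ra9 - (fip + bip + lob)) > tol or abs(fdp - (bip + lob)) > tol:
            mismatches.append(season)
    return mismatches


# Career totals as published. They were presumably computed from unrounded data,
# so they can differ by a tenth from a sum of the rounded season rows.
CAREER_RA9, CAREER_FIP, SEASONS = 18.4, 12.8, 6
gap_per_season = (CAREER_RA9 - CAREER_FIP) / SEASONS  # roughly 0.93 wins per season

print(check_decomposition(CUETO))
print(round(gap_per_season, 2))
```

Every season passes, so the identity holds up to rounding, and the career gap works out to a bit under one win per season.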
This is interesting. But it doesn't do anything to answer the question: "Which is more right?"
I thought that maybe a closer look at the data would help. I looked at all qualified SPs from 2011 to 2013, 139 pitchers. They all had at least 300 IP, but ranged from 300 to over 700. So that I could compare them fairly, I normalized the stats to "per 200 IP".
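Normalizing to 200 IP is just a rescaling, but one practical wrinkle is that box scores write innings in thirds: "300.2" means 300 and two-thirds innings, not 300.2. The post doesn't say how this was handled, so the helpers below (hypothetical names `innings_from_notation` and `per_200_ip`) are only a sketch of one reasonable way to do it.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings ("300.2" = 300 and 2/3 innings) to a true decimal."""
    whole, _, outs = ip.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + extra_outs / 3


def per_200_ip(stat: float, ip: str) -> float:
    """Scale a counting stat (e.g. LOB-Wins) to a per-200-inning rate."""
    return stat * 200 / innings_from_notation(ip)


# Hypothetical pitcher: 4.5 LOB-Wins over 450.1 innings.
print(round(per_200_ip(4.5, "450.1"), 2))
```

Reading "450.1" as 450.1 instead of 450 1/3 only moves a rate slightly, but the conversion keeps the scaling honest for every pitcher in the sample.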
(A quick but important aside: we have to keep in mind that this is not a representative sample of all pitchers in professional baseball. This is a sample of the best of the best professional baseball has had to offer. So if I say that pitchers can't control their LOB%, I don't mean that you or I would have a 75% LOB% if we stepped on a major league mound. I mean that pitchers who are good enough to keep a job as a major league SP don't differ much in their ability to sustain a LOB% above or below league average. This is important because it's where our intuitions are prone to get confused.)
On to the data! I first looked at the distribution of BIP-Wins, LOB-Wins, and those two combined as FDP-Wins to get a lay of the land.
           Min   Max   Avg    SD
BIP-Wins  -1.9   2.2   0.1   0.8
LOB-Wins  -2.0   1.6  -0.2   0.6
FDP-Wins  -2.7   2.6  -0.1   1.0
That's pretty straightforward. My groupings are somewhat arbitrary, so don't read too much into a particular value; it's the basic shape that matters. As we could infer from the table above, about 80% of the guys are within 1 win on the two components and all but a few outliers are within 2 wins. BIP has a bit more variation than LOB, but they're pretty similar.
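If we treat the two distributions as roughly normal with the means and SDs from the summary table (an approximation; the real histograms are lumpier), we can check the "about 80% within 1 win" eyeball estimate. This Python sketch uses the standard library's `statistics.NormalDist`; `share_within` is a made-up helper name.

```python
from statistics import NormalDist

# Mean and SD per 200 IP from the summary table (139 qualified SPs, 2011-2013).
SUMMARY = {"BIP-Wins": (0.1, 0.8), "LOB-Wins": (-0.2, 0.6)}


def share_within(mean: float, sd: float, width: float) -> float:
    """Fraction of a normal(mean, sd) distribution between -width and +width."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(width) - dist.cdf(-width)


for name, (mean, sd) in SUMMARY.items():
    print(f"{name}: within 1 win {share_within(mean, sd, 1):.0%}, "
          f"within 2 wins {share_within(mean, sd, 2):.0%}")
```

Under that assumption it comes out close to 80% for BIP-Wins and closer to 90% for LOB-Wins within 1 win, and roughly 99% for both within 2 wins, which lines up with what the histograms show.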
BIP-Wins and LOB-Wins are virtually uncorrelated (R-sq of .01). This actually surprised me a bit, because I figured that LOB% was largely a function of getting more or fewer outs than expected, which is what a low BABIP measures. But I guess the timing "luck" tends to trump a few more or fewer outs overall, which makes sense. What this means in practice is that a guy at one extreme on BIP or LOB is just as likely to be in the middle of the other distribution as anybody else (the 2.2 BIP-Wins guy is probably a close-to-average LOB-Wins guy). This is why the distribution for the combined FDP-Wins is only a little bit wider than the other two instead of twice as wide.
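The "only a little bit wider" point follows from how standard deviations combine. For a sum of two variables, SD = sqrt(SD_A^2 + SD_B^2 + 2 * r * SD_A * SD_B). Plugging in the table's SDs (0.8 for BIP-Wins, 0.6 for LOB-Wins) with r near zero gives exactly the 1.0 SD reported for FDP-Wins, while perfect correlation would push it to 1.4. The sketch below (hypothetical helper `sd_of_sum`) just evaluates that formula; an R-sq of .01 corresponds to an r of about 0.1 in magnitude, with the sign not reported.

```python
import math


def sd_of_sum(sd_a: float, sd_b: float, r: float = 0.0) -> float:
    """Standard deviation of A + B, given the SDs of A and B and their correlation r."""
    return math.sqrt(sd_a ** 2 + sd_b ** 2 + 2 * r * sd_a * sd_b)


BIP_SD, LOB_SD = 0.8, 0.6   # from the summary table
for r in (0.0, 0.1, 1.0):   # uncorrelated, |r| implied by R-sq .01, perfectly correlated
    print(f"r = {r:.1f}: SD of FDP-Wins ~ {sd_of_sum(BIP_SD, LOB_SD, r):.2f}")
```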
So, does this tell us conclusively whether or not this variation is due to the player or to teammates/luck? No. However, a random process will tend to produce a normal distribution like we see here. Roll two dice and you get a lot more 6s, 7s, and 8s (44%) than 2s and 12s (6%). By contrast, measures of elite talent tend to produce distributions that skew in one direction (among MLB regulars, the average is 2 WAR, but the elite go up to 10). That's a big hint for us. If BABIP & LOB% were mostly skill and had a reliable influence on pitcher effectiveness, we'd expect to see some outliers on the "really" good side and then a bunching up toward the bottom.
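The two-dice numbers are easy to verify by brute force, and the "skew" idea can be made concrete with a sample skewness statistic: near zero for a symmetric distribution like the dice totals, positive when a few values stretch out a long right tail the way elite WAR does. Both helpers below are illustrative Python, and the `toy_war` list is invented data, not real player values.

```python
from collections import Counter
from itertools import product


def two_dice_distribution():
    """Probability of each total when rolling two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: n / 36 for total, n in sorted(counts.items())}


def skewness(values):
    """Population (Fisher-Pearson) skewness: ~0 if symmetric, > 0 for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


dist = two_dice_distribution()
middle = dist[6] + dist[7] + dist[8]   # 16/36, about 44%
extremes = dist[2] + dist[12]          # 2/36, about 6%

# Every equally likely two-dice outcome, listed once per way of rolling it.
all_totals = [a + b for a, b in product(range(1, 7), repeat=2)]
# Invented, WAR-like numbers: most players bunched low, a few stars far to the right.
toy_war = [0, 0, 1, 1, 1, 2, 2, 3, 5, 10]

print(f"6-8: {middle:.0%}, 2 or 12: {extremes:.0%}")
print(f"skew of dice totals: {skewness(all_totals):.2f}, skew of toy WAR: {skewness(toy_war):.2f}")
```

A histogram of BIP-Wins or LOB-Wins that looks like the dice (skew near zero) rather than like the toy WAR list (clearly positive skew) is the "big hint" described above.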
But let's assume for a second that these things are skills. What would these data tell us? Even if these variances are all due to skill, no regular MLB SP prevents or allows more than about 3 wins' worth of runs per season due to non-FIP skills, and no more than about 2 wins' worth on either BABIP or LOB alone. And most guys are pretty average.
In the other thread, this came up in the context of Cardinals SP Joe Kelly. Over the last two seasons as a part-time SP, Kelly has a 3.08 ERA (2.69 last year). But his FIP is a pedestrian 4.00. In his brief major league career, he has put up:
FIP-WAR/200 IP: 1.0
BIP-Wins/200 IP: -0.2
LOB-Wins/200 IP: 2.3
RA9-WAR/200 IP: 3.0
If Shelby Miller is healthy, Kelly is going to be back in the bullpen. The question is: are the Cards getting 3 WAR of run prevention from a 1 WAR starter, or are they sending a 3 WAR starter to the bullpen for no good reason?
Let me first say, as I explained with Cueto, I don't think FIP is entirely right. Pitchers can and do sustainably prevent/allow runs in ways not captured by FIP. However, I know that both defense and "luck" play non-trivial roles in run prevention. So if I'm crediting a pitcher for his contribution to the team or trying to project his future performance, I can't take his RA9 at face value either.
Kelly wasn't in my sample because his IP were too low, but his 2.3 LOB-Wins would have outpaced the field by far. Only 4 SPs in the sample were north of 1.0: Tyler Chatwood (1.6), Kris Medlen (1.4), Ivan Nova (1.1) and Vance Worley (1.1). Not coincidentally, 3 of those 4 guys had among the lowest IP totals in the sample, where there is less time for the "lucky" aspect of things to average out.
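To put "by far" in numbers: against the sample's LOB-Wins mean (-0.2) and SD (0.6), Kelly's 2.3 sits a bit over 4 standard deviations out, versus 3 for Chatwood's 1.6. The sketch below converts that to a normal tail probability. Treat it loosely: it assumes LOB-Wins is normally distributed, and Kelly's per-200 rate comes from far fewer innings than anyone in the sample, which is exactly why it is noisier. `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

# Sample summary for LOB-Wins per 200 IP (139 qualified SPs, 2011-2013).
LOB_MEAN, LOB_SD, SAMPLE_SIZE = -0.2, 0.6, 139


def z_score(value, mean=LOB_MEAN, sd=LOB_SD):
    """How many sample SDs a value sits above the sample mean."""
    return (value - mean) / sd


z_kelly = z_score(2.3)      # Joe Kelly (not in the sample): about 4.2
z_chatwood = z_score(1.6)   # Tyler Chatwood, the sample's leader: 3.0

# One-sided tail probability if LOB-Wins were normally distributed (a big assumption).
tail = 1 - NormalDist().cdf(z_kelly)
expected_count = tail * SAMPLE_SIZE   # how many of 139 SPs we'd expect at or above Kelly's level

print(f"Kelly z = {z_kelly:.1f}, Chatwood z = {z_chatwood:.1f}")
print(f"normal tail = {tail:.1e}, expected count in sample = {expected_count:.3f}")
```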
On one hand, it's possible that Joe Kelly is the best SP in baseball at stranding base-runners, twice as good as everybody not named Chatwood or Medlen. But it's much more likely that Kelly (and Chatwood, Medlen and company) have been quite lucky. That's not an either/or proposition; guys with extreme performances are those that are good AND lucky.
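The "good AND lucky" point is a selection effect, and a toy simulation shows it. Model each pitcher's observed LOB-Wins as a true skill plus random luck, pick the top 1% by what we actually observe, and look at what they're made of. The skill and luck SDs (0.3 and 0.5) and the function name `good_and_lucky` are invented for illustration, not estimated from the data above.

```python
import random
from statistics import mean


def good_and_lucky(n_pitchers=10_000, skill_sd=0.3, luck_sd=0.5, top_frac=0.01, seed=1):
    """Toy model: observed = skill + luck, both normal with mean 0.
    Returns (average skill, average luck) among the top observed performers."""
    rng = random.Random(seed)
    pitchers = [(rng.gauss(0, skill_sd), rng.gauss(0, luck_sd)) for _ in range(n_pitchers)]
    # Rank by what we actually get to see: skill and luck added together.
    pitchers.sort(key=lambda p: p[0] + p[1], reverse=True)
    top = pitchers[: max(1, int(n_pitchers * top_frac))]
    return mean(s for s, _ in top), mean(luck for _, luck in top)


avg_skill, avg_luck = good_and_lucky()
print(f"top 1%: average skill {avg_skill:.2f}, average luck {avg_luck:.2f}")
```

With these made-up settings, both components come out positive for the leaders, but luck accounts for more of their edge than skill: they really are better than average, just not by as much as their raw numbers say.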
But given what I've seen in the data and what we know about the Cardinals defense, if his FIP stays in the 4.00 range, I'd put good money on his RA9-WAR moving forward being much closer to his FIP-WAR than it has been so far.
For the record, I'm somewhat concerned about Cueto's RA9 too, because he's probably been good and lucky. But I'm more optimistic with Cueto than Kelly because he pitches in front of a stellar defense and controls the running game as well as any pitcher in baseball, which leads to more outs than his FIP suggests. Unlike Kelly's, Cueto's true production probably is a good bit higher than his FIP-WAR suggests (though still shy of his RA9).
And because I was curious and others might be as well, I looked at a few other correlations.
                        R  R-Sq
FIP-WAR & RA9-WAR    0.80  0.64
FIP-WAR & BIP-Wins   0.12  0.01
FIP-WAR & LOB-Wins  -0.04  0.00
RA9-WAR & BIP-Wins   0.55  0.31
RA9-WAR & LOB-Wins   0.33  0.11

This tells us a few pretty simple things:
1. FIP explains about 2/3 of the variation in RA9-WAR (R-sq of .64). BABIP & LOB can explain a lot of the rest, with BIP having a bigger role.
2. FIP is basically not correlated with BIP or LOB. If those things are skills, they are unrelated to whatever ability a pitcher has to strike guys out, not walk them, and keep the ball in the yard.
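For anyone who wants to reproduce correlations like these from their own spreadsheet export, Pearson's r is simple to compute, and with a single predictor R-sq is just r squared (0.80 squared is 0.64, which is where the "about 2/3" comes from). The function below is a plain-Python sketch with a hypothetical name; Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# With a single predictor, R-sq is r squared. Squaring the rounded r values from the
# table can differ from the printed R-sq by 0.01 (0.55 ** 2 rounds to 0.30, not 0.31).
for label, r in [("FIP-WAR & RA9-WAR", 0.80), ("RA9-WAR & BIP-Wins", 0.55), ("RA9-WAR & LOB-Wins", 0.33)]:
    print(f"{label}: R-sq = {r * r:.2f}")
```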
If you want to come up with your own adjusted "ERA" that sits somewhere between FIP and RA9, I'd suggest taking TangoTiger's advice: split the difference. And if you're particularly suspicious of the guy's BABIP and/or LOB, split the difference with FIP again.
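Here is that rule of thumb as a tiny function. It's a sketch with a hypothetical name that works on either scale (WAR or runs per nine) as long as both inputs are on the same one; the second averaging step weights the result 75% FIP and 25% RA9.

```python
def split_the_difference(fip_based: float, ra9_based: float, extra_skeptical: bool = False) -> float:
    """Average a FIP-based and an RA9-based number (same scale: WAR or runs per 9).
    If extra_skeptical, average the result with the FIP-based number again (75% FIP, 25% RA9)."""
    blend = (fip_based + ra9_based) / 2
    if extra_skeptical:
        blend = (fip_based + blend) / 2
    return blend


print(split_the_difference(12.8, 18.4))                      # Cueto career WAR: about 15.6
print(split_the_difference(1.0, 3.0, extra_skeptical=True))  # Kelly per 200 IP: 1.5
```

Using the figures from this thread, Cueto lands around 15.6 career WAR with a single split (about 14.2 if you split again), and Kelly's 1.0 FIP-WAR and 3.0 RA9-WAR per 200 IP become 2.0, or 1.5 for the extra-skeptical version.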