FIP vs. RA: Which for pitcher WAR? Why not both?

**RedsManRick** · 03-01-2014, 06:16 PM

There has been a conversation going on in the Jaime Garcia MRI thread regarding the validity of pitcher WAR. Most of this was posted over there, but I felt it probably deserved its own thread so that more people would see it and follow-up comments could be made without taking that one further off-topic.

Many of us are inclined toward using FIP when discussing pitcher performance. But poster cincinnati chili correctly pointed out that linking WAR to FIP can be tough to swallow when we have a guy's runs allowed so readily available. Run prevention has a lot to do with the pitcher's strikeouts, walks and homers, but it's hardly the whole picture. Pitchers can attack hitters in ways that encourage double plays. They can pick off runners. They can ramp up the velocity a bit or increase their focus. Defense matters too. And all of the things players control aside, there's some degree of plain old timing "luck" -- if that triple had come before the fly out to deep RF instead of after, it would have produced a run.

So how do we measure pitcher performance at the level of runs (WAR)? There are two popular methods. Fangraphs has traditionally used a FIP-based pitching WAR. In part this was because defense was already being accounted for in position players' WAR, so they didn't want to double count. Baseball-Reference, on the other hand, uses a WAR based on actual runs allowed, RA9-WAR.

FIP-WAR plays it on the extreme "safe" side, only crediting the pitcher for what we know he's responsible for and assuming average influences from everything else. RA9 goes to the other extreme of giving the pitcher credit for everything that results in actual runs being allowed, including his team's defense and luck.

The reality of pitcher responsibility lies somewhere in the middle; it's a trade-off either way. Sabermetrics guru TangoTiger's rule of thumb is to take the average of the FIP-WAR and RA9-WAR. But recently, Fangraphs rolled out a few stats to help us better navigate this middle ground. Instead of just having their FIP-WAR, they added RA9-WAR and they added stats that use BABIP and LOB% to help explain the difference between the FIP and RA9 WAR totals.

The conceptual framework: RA9-WAR = FIP-WAR + BIP-Wins + LOB-Wins.

FIP-WAR: Pitcher WAR using FIP-projected runs allowed
RA9-WAR: Pitcher WAR using actual runs allowed
BIP-Wins: The difference between RA9-WAR & FIP-WAR explained by BABIP.
LOB-Wins: The difference between RA9-WAR & FIP-WAR explained by LOB%.

They also show FDP-Wins, which is just the two new pieces added together (the difference between RA9-WAR and FIP-WAR, if you prefer).

Let's take Johnny Cueto as an example, who has been a poster boy for this issue around here:

Code:

Season  Team   RA9-WAR  BIP-Wins  LOB-Wins  FDP-Wins   WAR
2008    Reds       1.1      -0.1       0.0      -0.1   1.2
2009    Reds       1.7       0.2       0.3       0.4   1.3
2010    Reds       3.6       0.2       0.8       1.0   2.6
2011    Reds       4.3       1.8       0.0       1.7   2.6
2012    Reds       6.2      -0.1       1.7       1.6   4.5
2013    Reds       1.6       0.8       0.1       0.9   0.6
Total   - - -     18.4       2.8       2.8       5.6  12.8

For his career, he's put up 12.8 WAR using FIP. But using RA9, he's put up 18.4. That's nearly an entire win per season (5.6 wins over six seasons)! The new stats help us understand that the difference between his FIP & RA9 can be explained by equal parts BABIP "luck" and LOB% "luck". Interestingly though, some years it's been more one than the other.
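As a rough sanity check on the framework, here is a minimal Python sketch that recomputes FDP-Wins as RA9-WAR minus FIP-WAR from the Cueto rows above. The names (`CUETO`, `fdp_from_wars`) are my own, not anything Fangraphs publishes, and since the posted figures are rounded to one decimal, any single row can be off by about 0.1.

```python
# Cueto's rows as posted above:
# (season, RA9-WAR, BIP-Wins, LOB-Wins, FDP-Wins, FIP-WAR)
CUETO = [
    (2008, 1.1, -0.1, 0.0, -0.1, 1.2),
    (2009, 1.7, 0.2, 0.3, 0.4, 1.3),
    (2010, 3.6, 0.2, 0.8, 1.0, 2.6),
    (2011, 4.3, 1.8, 0.0, 1.7, 2.6),
    (2012, 6.2, -0.1, 1.7, 1.6, 4.5),
    (2013, 1.6, 0.8, 0.1, 0.9, 0.6),
]


def fdp_from_wars(ra9_war, fip_war):
    """FDP-Wins implied by the framework: RA9-WAR minus FIP-WAR."""
    return ra9_war - fip_war


# Largest disagreement between the implied and the posted FDP-Wins.
max_gap = max(abs(fdp_from_wars(row[1], row[5]) - row[4]) for row in CUETO)

# Average posted FDP-Wins per season.
avg_fdp_per_season = sum(row[4] for row in CUETO) / len(CUETO)

print(f"largest rounding gap: {max_gap:.2f}")
print(f"average FDP-Wins per season: {avg_fdp_per_season:.2f}")
```

Every season reconciles to within the rounding of the posted numbers, which is all the framework promises.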

This is interesting. But it doesn't do anything to answer the question, "which is more right"?

I thought that maybe a closer look at the data would help. I looked at all qualified SPs from 2011 to 2013, 139 pitchers. They all had at least 300 IP, but ranged from 300 to over 700. So that I could compare them fairly, I normalized the stats to "per 200 IP".
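The "per 200 IP" normalization is just a rate conversion. A minimal sketch, assuming simple linear scaling (the function name `per_200_ip` and the example totals are hypothetical, not from the data set):

```python
def per_200_ip(total, innings_pitched):
    """Scale a counting stat (e.g. WAR) to a 200-inning season."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return total * 200.0 / innings_pitched


# Hypothetical pitchers: same rate, very different workloads.
workhorse = per_200_ip(6.0, 600)   # 6.0 WAR over 600 IP
part_timer = per_200_ip(3.0, 300)  # 3.0 WAR over 300 IP
print(workhorse, part_timer)
```

Both come out to the same per-200-IP figure, which is the point: the guy with 700 IP and the guy with 300 IP can be compared on one scale.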

(A quick but important aside: we have to keep in mind this is not a representative sample of all pitchers in professional baseball. This is a sample of the best of the best professional baseball has had to offer. So if I say that pitchers can't control their LOB%, I don't mean that you or I would have a 75% LOB% if we stepped on a major league mound. I mean that pitchers who are good enough to keep a job as a major league SP don't differ much in their ability to sustain a LOB% above or below league average. This is important because it's where our intuitions are prone to get confused.)

On to the data! I first looked at the distribution of BIP-Wins, LOB-Wins, and those two combined as FDP-Wins to get a lay of the land.

Code:

           Min   Max   Avg   SD
BIP-Wins  -1.9   2.2   0.1   0.8
LOB-Wins  -2.0   1.6  -0.2   0.6
FDP-Wins  -2.7   2.6  -0.1   1.0

That's pretty straightforward. My groupings are somewhat arbitrary, so don't read too much into a particular value; it's the basic shape that matters. As we could infer from the table above, about 80% of the guys are within 1 win on the two components and all but a few outliers are within 2 wins. BIP has a bit more variation than LOB, but they're pretty similar.

BIP-Wins and LOB-Wins are virtually uncorrelated (R-sq of .01). This actually surprised me a bit because I figured that LOB% was largely a function of getting more or fewer outs than expected, which is what a low BABIP measures. But I guess the timing "luck" tends to trump a few more or fewer outs overall. I guess that makes sense. What this means in practice is that a guy at one extreme on BIP or LOB is just as likely to be in the middle of the other distribution as anybody else (the 2.2 BIP-Wins guy is probably a close-to-average LOB-Wins guy). This is why the distribution for the combined FDP-Wins is only a little bit wider than the other two instead of twice as wide.
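The "only a little bit wider" point follows from how spreads combine: for two uncorrelated components, variances add, not standard deviations. A minimal sketch using the SDs from the table above (`combined_sd` is my own helper name; the correlation argument `r` is included only to show the contrast):

```python
import math


def combined_sd(sd_a, sd_b, r=0.0):
    """SD of the sum of two variables with correlation r."""
    return math.sqrt(sd_a**2 + sd_b**2 + 2 * r * sd_a * sd_b)


# SDs of BIP-Wins and LOB-Wins per 200 IP, from the table.
uncorrelated = combined_sd(0.8, 0.6)        # r = 0
in_lockstep = combined_sd(0.8, 0.6, r=1.0)  # what perfect correlation would give

print(f"uncorrelated: {uncorrelated:.2f}, perfectly correlated: {in_lockstep:.2f}")
```

With no correlation the combined SD works out to about 1.0, which matches the FDP-Wins row; had the two moved in lockstep it would have been about 1.4.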

So, does this tell us conclusively whether or not this variation is due to the player or teammates/luck? No. However, a random process will tend to produce a normal distribution like we see here. Roll two dice and you get a lot more 6s, 7s, and 8s (44%) than 2s and 12s (6%). By contrast, measures of elite talent tend to produce distributions that skew in one direction (among MLB regulars, the average is 2 WAR, but the elite go up to 10). That's a big hint for us. If BABIP & LOB% were mostly skill and had a reliable influence on pitcher effectiveness, we'd expect to see some outliers on the "really" good side and then a bunching up toward the bottom.
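The dice percentages are easy to verify by brute force. A minimal sketch, assuming two fair six-sided dice (so totals run from 2 to 12); the helper name `prob` is mine:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how often each total appears across all 36 equally likely rolls.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))


def prob(*wanted):
    """Exact probability that the total is one of the wanted values."""
    return Fraction(sum(totals[t] for t in wanted), 36)


middle = prob(6, 7, 8)   # the bunched-up center
extremes = prob(2, 12)   # the two tails

print(f"6, 7 or 8: {float(middle):.1%}")
print(f"2 or 12:   {float(extremes):.1%}")
```

The center is 16 of 36 rolls and the tails are 2 of 36, which is where the roughly 44% and 6% come from.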

But let's assume for a second that these things are skills. What would these data tell us? Even if these variances are all due to skill, no regular MLB SP prevents or allows more than about 3 wins worth of runs per season due to non-FIP skills and no more than about 2 wins worth on either BABIP or LOB alone -- and most guys are pretty average.

In the other thread, this came up in the context of Cardinals SP Joe Kelly. Over the last two seasons as a part-time SP, Kelly has a 3.08 ERA (2.69 last year). But his FIP is a pedestrian 4.00. In his brief major league career, he has put up:
FIP-WAR/200 IP: 1.0
BIP-Wins/200 IP: -0.2
LOB-Wins/200 IP: 2.3
RA9-WAR/200 IP: 3.0

If Shelby Miller is healthy, Kelly is going to be back in the bullpen. The question is, were the Cards fortunate to get 3 WAR of run prevention from a 1 WAR starter, or are they sending a 3 WAR starter to the bullpen for no good reason?

Let me first say, as I explained with Cueto, I don't think FIP is entirely right. Pitchers can and do sustainably prevent/allow runs in ways not captured by FIP. However, I know that both defense and "luck" play non-trivial roles in run prevention. So if I'm crediting a pitcher for his contribution to the team or trying to project his future performance

Kelly wasn't in my sample because his IP were too low, but his 2.3 LOB-Wins would have outpaced the field by far. Only 4 SP in the sample were north of 1.0: Tyler Chatwood (1.6), Kris Medlen (1.4), Ivan Nova (1.1) and Vance Worley (1.1). Not coincidentally, 3 of those 4 guys had among the lowest IP totals in the sample, where there is less time for the "lucky" aspect of things to average out.

On one hand, it's possible that Joe Kelly is the best SP in baseball at stranding base-runners, twice as good as everybody not named Chatwood or Medlen. But it's much more likely that Kelly (and Chatwood, Medlen and company) have been quite lucky. That's not an either/or proposition; guys with extreme performances are those that are good AND lucky.

But given what I've seen in the data and what we know about the Cardinals defense, if his FIP stays in the 4.00 range, I'd bet good money that his RA9-WAR moving forward is going to be much closer to his FIP-WAR than it has been so far.

For the record, I'm somewhat concerned about Cueto's RA9 too, because he's probably been good and lucky. But I'm more optimistic with Cueto than Kelly because he pitches in front of a stellar defense and controls the running game as well as any pitcher in baseball, which leads to more outs than his FIP suggests. Unlike Kelly, Cueto's true production probably is a good bit higher than his FIP-WAR suggests (though still shy of his RA9).

And because I was curious and others might be as well, I looked at a few other correlations.

Code:

                        R      R-Sq
FIP-WAR & RA9-WAR      0.80    0.64
FIP-WAR & BIP-Wins     0.12    0.01
FIP-WAR & LOB-Wins    -0.04    0.00
RA9-WAR & BIP-Wins     0.55    0.31
RA9-WAR & LOB-Wins     0.33    0.11

This tells us a few pretty simple things:
1. FIP explains about 2/3 of RA9-WAR. BABIP & LOB can explain a lot of the rest, with BIP having a bigger role.
2. FIP is basically not correlated to BIP or LOB. If those things are skills, they have nothing to do with whatever ability a pitcher has to strike guys out, not walk them, and keep the ball in the yard.
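For anyone wondering where "about 2/3" comes from: R-Sq is just R squared, and it reads as the share of variance in one stat that lines up with the other. A minimal sketch using the R column from the table (the dictionary name is mine; the posted R values are already rounded, so a recomputed R-Sq can differ from the posted one in the last digit, as the RA9/BIP row does):

```python
# Correlations (R) as posted in the table above.
R_VALUES = {
    "FIP-WAR & RA9-WAR": 0.80,
    "FIP-WAR & BIP-Wins": 0.12,
    "FIP-WAR & LOB-Wins": -0.04,
    "RA9-WAR & BIP-Wins": 0.55,
    "RA9-WAR & LOB-Wins": 0.33,
}

# R-squared: share of variance "explained" by the linear relationship.
r_squared = {pair: round(r**2, 2) for pair, r in R_VALUES.items()}

for pair, value in r_squared.items():
    print(f"{pair:<20} R-Sq = {value:.2f}")
```

So a correlation of 0.80 between FIP-WAR and RA9-WAR means roughly 64% shared variance, hence "about 2/3".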

If you want to come up with your own adjusted "ERA" that sits somewhere between FIP and RA9, I'd suggest taking TangoTiger's advice: split the difference. And if you're particularly suspicious of the guy's BABIP and/or LOB, split the difference with FIP again.
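Here is a minimal sketch of that split-the-difference rule, applied to Cueto's career WAR totals from earlier in the post. The `blend` function and its `fip_weight` parameter are my own illustration, not an official formula; splitting the difference with FIP a second time works out to a 75/25 weighting.

```python
def blend(fip_value, ra9_value, fip_weight=0.5):
    """Weighted average of a FIP-based and an RA9-based figure."""
    if not 0.0 <= fip_weight <= 1.0:
        raise ValueError("fip_weight must be between 0 and 1")
    return fip_weight * fip_value + (1.0 - fip_weight) * ra9_value


# Cueto's career totals from the table: 12.8 FIP-WAR, 18.4 RA9-WAR.
split_once = blend(12.8, 18.4)             # straight average
split_twice = blend(12.8, split_once)      # average that with FIP again
same_as_75_25 = blend(12.8, 18.4, 0.75)    # equivalent single weighting

print(f"split once:  {split_once:.1f}")
print(f"split twice: {split_twice:.1f}")
```

The same function works on an ERA-scale number or on WAR, as long as both inputs are on the same scale.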

**tomnuetten** · 03-01-2014, 07:15 PM

Thanks! Your comments/posts are a must read. You must spend lots of time writing and researching those posts, and you have a great writing style that even the German can understand most of it

great work

**Roush's socks** · 03-01-2014, 10:04 PM

Count me in the runs allowed camp. Once you get into a large enough sample size, there is no such thing as luck. If a pitcher is allowing fewer runs than their peripherals suggest, and they sustain that over a period of time, then I call that skill.

**Roush's socks** · 03-01-2014, 10:29 PM

It's kind of the same question as evaluating hitters on RC vs RBI/RS. Runs created is based on potential, but RBI's and runs scored are what actually happened.

**dougdirt** · 03-01-2014, 11:11 PM

I don't like pitcher WAR in general, either version. They both have their issues. I will stick with my gut based on Innings, walks, strikeouts and ERA+.

**Roush's socks** · 03-01-2014, 11:49 PM

I think you have to distinguish whether you are giving credit for past performance, or trying to predict future outcomes. You might watch a pitcher all year and get the feeling they are flirting with disaster, getting away with it. You have to give them credit for their success, while recognizing that they might not be able to sustain it in the future.

Just like I think Phillips deserves credit for his RBI production last year, but I don't think he will necessarily replicate it this year.

**jojo** · 03-02-2014, 10:26 AM

I don't really get the argument over FIP or RA for WAR. I tend to ignore WAR a lot when evaluating pitchers because it's more informative to get very granular with pitchers. Why use WAR at all for pitchers? Here are some old posts from the archives that largely explain why I marginalize pitcher WAR:

This kind of sets the context for evaluating pitchers based upon the "pitcher trifecta":

http://www.redszone.com/forums/showp...&postcount=137

This one pretty much explains how to dissect a pitcher and see him naked without wearing his defense or randomness:

If you look at a pitcher's peripherals (K%, BB%, K/BB, GB%, HR/FB%, LOB%, and BABIP) in conjunction with a superior metric like FIP you really can get a very complete picture of why a pitcher performed the way he did and a really good feel for what you should expect he'll do in the future.

http://www.redszone.com/forums/showp...8&postcount=21

**cincinnati chili** · 03-03-2014, 12:45 AM

Originally Posted by Roush's socks

It's kind of the same question as evaluating hitters on RC vs RBI/RS. Runs created is based on potential, but RBI's and runs scored are what actually happened.

I like your prior post a lot but disagree with this one. A pitcher has a substantial effect on whether runs are allowed by his team. 50%? 75%? 90% of the equation? Because there are eight other men in the lineup, assigning value to a batter based solely on runs scored/RBIs usually ends up giving credit/blame to the wrong players. Does anybody seriously believe that Brandon Phillips deserves more credit than Joey Votto for the Reds' above-average run total last year?
