An alternative method for determining defensive WAR
According to FanGraphs, Darwin Barney was an excellent defensive player in 2012. Barney, the everyday second baseman for the Chicago Cubs, was worth about 13 runs, or 1.3 wins above replacement, due to his defensive contributions at the pivot. This is a very good number.
According to Baseball-Reference, Darwin Barney was the best defensive player in baseball in 2012, along with Brendan Ryan. In terms of wins above replacement, B-R had Barney at about three and a half defensive WAR. This is a tremendous amount of value ascribed to a single player -- especially one who does not play shortstop. It can mean the difference between a good starter and an All-Star level of performance.
These two valuations are just a bit different.
This is a symptom of a greater problem in sabermetric circles these days: can we trust single-season WAR values, when the defensive component of WAR is, let's just say "debatable"? For a while now, there's been a lot of discussion about how useful the defensive metrics that go into WAR are, especially on a year-by-year basis.
The most common refrain I hear is that the sample sizes used to judge a fielder's performance in a given year are simply too small to ascribe much real weight to these numbers. Nevertheless, a good (or bad) defensive season can mean a gigantic swing in wins above replacement. While the WAR metrics aren't used as catch-alls by everybody, when a player has a dramatic swing in perceived defensive value, it can show up in plenty of articles and analyses about how valuable a player is, and how useful their performance has been or can be.
Perhaps what we need to do is *deep breath* rely less on the UZR / DRS / TZ / FRAA numbers, and more on a scouting perspective.
Re: An alternative method for determining defensive WAR
The one thing I've never understood is why the variance in WAR from year to year is almost always treated as evidence that it is unreliable as a measure of performance for that year.
Sure, the small sample issue means it is not a good measure of talent. But talent and performance are not synonymous. If we looked at offensive stats after 200 PA, we'd see many more odd peaks and valleys. But we wouldn't say they're an unreliable measure of offensive production over those 200 PA.
Now, I realize a big part of the issue with Barney is the difference between UZR and DRS -- I'm very curious about that too. But how prevalent are those types of gaps? I decided to check it out:
A total of 123 players had enough innings to "qualify" per FanGraphs. The distribution of the variance between UZR and DRS looks like this:
(UZR minus DRS)
+10 and up: 6
+5 to +10: 12
+2 to +5: 24
-2 to +2: 34
-5 to -2: 22
-10 to -5: 18
-10 and down: 7
I find that interesting -- it's a pretty normal distribution. That suggests that whatever the difference is, it cuts equally both ways. I also looked by position, and it doesn't seem to be biased towards or against a certain position. The R-squared between the two for that population is 0.69 -- a fairly strong positive relationship, but nothing like you'd hope for given that they're supposed to be measuring the same thing.
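For anyone who wants to reproduce this kind of check, the bucketing and R-squared calculation can be sketched in a few lines of Python. The (UZR, DRS) pairs below are made-up illustrative numbers, not the actual qualified-player data:

```python
# Bucket the UZR-minus-DRS gap and compute R-squared between the two metrics.
# The pairs here are hypothetical (uzr, drs) player-seasons for illustration only.
pairs = [(12.0, 3.0), (5.5, 4.0), (-1.0, 0.5), (-8.0, -2.0),
         (3.0, 3.5), (-4.0, 2.5), (7.0, -4.0), (0.0, -0.5)]

BUCKET_NAMES = ["+10 and up", "+5 to +10", "+2 to +5", "-2 to +2",
                "-5 to -2", "-10 to -5", "-10 and down"]

def bucket(diff):
    """Assign a UZR-minus-DRS gap to one of the buckets used in the post."""
    if diff >= 10:
        return "+10 and up"
    if diff >= 5:
        return "+5 to +10"
    if diff > 2:
        return "+2 to +5"
    if diff >= -2:
        return "-2 to +2"
    if diff > -5:
        return "-5 to -2"
    if diff > -10:
        return "-10 to -5"
    return "-10 and down"

counts = {name: 0 for name in BUCKET_NAMES}
for uzr, drs in pairs:
    counts[bucket(uzr - drs)] += 1

# Pearson correlation between UZR and DRS, squared to get R-squared.
n = len(pairs)
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sx = sum((x - mx) ** 2 for x in xs) ** 0.5
sy = sum((y - my) ** 2 for y in ys) ** 0.5
r = cov / (sx * sy)
r_squared = r ** 2
```

With the real data you would pull each qualified player's UZR and DRS for the season and feed those pairs in; the bucket boundaries are just the ones used in the table above.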
Looks like there's definitely a reason to be skeptical about the numbers -- but rather than simply averaging them, I'd prefer an actual explanation of how the two systems coded and valued plays differently, so we could choose the approach that seems more correct.