Anybody read the article in the back by Gary Huckabay regarding the limits of statistical analysis?
Excellent essay and interview. I've been waiting to read another piece like that ever since Bill James swiftly reprimanded the statistical analysis wannabes a few years ago while guest-writing for Neyer. It was so welcome, in fact, that I didn't even begrudge Huckabay for getting those thoughts into print before I could.
Travis Hafner with the flu on April 8th is a different player than Travis Hafner in perfect health on May 14th. Using seasonal data, you get an aggregation of all the Hafners that played in 2005; that's not necessarily a bad thing, but any analysis or decision-making based on that information should include contextual information, if possible. That can't be done with statistical analysis.
As a group, the statheads were no help. You didn't understand that parks change every year, and not just in little ways. Instead of understanding park effects better because we're taking all these little adjustments into account, now we know less than we did before, and we had to pay for the privilege.
The somewhat problematic issue is that of embedded assumptions -- we don't generally have enough data to know the specific impact of a player's physical environment on that player's performance, so we end up using data aggregations and averages, which leaves us without a good tool to explain why a particular park has certain effects.
Kevin Goldstein, Baseball America:
The amazing thing to me is that people don't question that a college football athlete can have great statistics, win the Heisman Trophy, and still have no future in the NFL because professional football is a different beast. Yet those same people can't accept that a player can be a great collegiate baseball player but have no chance of ever making a contribution in pro baseball. College baseball is a different game from professional baseball, just as the NFL game is different from major college football.
It's not about using information or not using information. It's about identifying and gathering the right information on which to base decisions.
Gary Huckabay: So the stats guys have simply failed to make their case?
Baseball Executive: I think so. And it's because most of the stat guys that have been hired are the wrong guys. They're amateur mathematicians, really. They don't have training or experience in persuading people. What I've seen and heard, both personally and secondhand, is that if their mathematical case isn't the one that ends up determining a roster spot or contract, they repeat the same information, only louder, and decry the lack of understanding of the other people in the process.
Gary Huckabay: Are you worried about losing ground to other organizations that are investing more money and time in their analytical capabilities?
Baseball Executive: Not really, no.
Gary Huckabay: There are clubs doing some pretty cool stuff.
Baseball Executive: But they're not using it particularly well. One thing I've learned is that this isn't an area where clubs can actually generate an advantage. It's more of a place where you can lose ground if you do nothing, rather than one where you can gain something through action.
Gary Huckabay: But don't you lose ground to an org that has a dedicated person, or a top-flight consultant like an Eddie Epstein?
Baseball Executive: If so, it's not a huge advantage they're getting. It's not like every team has a squad of Keith Woolners on staff with a team of analysts and programmers at his beck and call. And, the dirty little secret of your industry is its lack of opinion divergence [bold and italics are mine]....
Gary Huckabay: What do you mean by lack of opinion divergence?
Baseball Executive: You guys generally don't have a dime's worth of difference between you when it comes to players. You like durable guys with high on-base percentages who hit for power and play great defense. On the mound, you like guys that strike people out as often as possible, don't walk people, and keep the ball on the ground. Gee, no sh!t. Us dumb-arse scouts would never have thought of that. Do you think it's possible that maybe you'll be against wife beating and passing out meth to kids, too?
Baseball Executive: The biggest problem with statistical analysis is that it's always retrospective. That's the biggest difference between stats and traditional scouting, and it's an unconquerable strength of scouting. I've seen perhaps 15-20 prediction methods using statistical data; you have PECOTA and Vlad, Rotowire, perhaps a dozen systems done by individuals, and all of them basically work off the same information. None of them are particularly interesting, really, and their primary benefit is that someone else has done the tedious work of writing them all down. I'd like to see a prediction system that worked well, and had some actual knobs that can be tuned, in terms of underlying assumptions.
Gary Huckabay: What's next for statistical analysis in front offices?
Baseball Executive: Like most movements, its best ideas will be co-opted by the brightest people on the "other side." It doesn't take very long to teach the core pieces of serious baseball analysis to scouts and old-line baseball men. The reality is that there's never been that much difference between the guys that the scouts like, and the guys that the statheads like. It's really just a question of degree.
Baseball Executive: Seriously, though, it's much easier for an ex-player to learn what he needs in terms of analysis than it is for a true data star to learn what he needs in terms of observation, people, and management skills. Just the way it is.
Gary Huckabay: The baseball analysis "community" lacks standards; people self-publish their work and feel confident that they're qualified to offer advice on multi-million dollar transactions. Many of these people don't have formal training in statistical methods or research design, nor exposure to all of the constraints facing decision-makers in front offices. They occupy a nexus between academia and fandom....
There is excessive attention paid to the "academic" race, refining a model to another 1% of precision, without regard to its utility for making decisions that will actually help a ballclub, or the enormous error bars inherent in the entire exercise. All of these things work against the widespread adoption, much less embrace, of data-driven management.