RedsZone.com - Cincinnati Reds Fans' Home for Baseball Discussion Better descriptive stats?

07-14-2013, 02:46 PM #1 RedsManRick Stat Wanker Hodiernus Join Date: Dec 2004 Location: Chicago, IL Posts: 15,450

Better descriptive stats?

Ever since I was a kid, I had an intuition that bugged me about baseball stats. What are we trying to measure? Ultimately, I've come to realize that we ended up crafting statistics without fully recognizing that they were of varying use for the two basic kinds of questions we ask: What happened? What's likely to happen next? We realize that these are highly correlated, but also that there are nuances, exceptions, etc. In short, we have to grapple with the difference between "descriptive" and "inferential" statistics. And this constantly frustrates us. We want the elegance and simplicity of descriptive statistics with the real-world value of the inferential.

Case in point: "Batting average". It's a simple measure of fact, descriptive, right? It's just "hits" per "at bat". Two numbers. Of course, as we all know, it's hardly that simple. We realize that sometimes when a player bats the ball and reaches first base, it's not because he hit the ball well but because the fielder screwed up. So we created a special stat called a "hit", in which we subjectively decide whether the action of "batter hits ball and reaches base" is really earned or not. If the fielder "should have" made the play, we don't give the hitter credit for getting a "hit".

We also realize that sometimes the batter doesn't get a fair chance to hit the ball. So we take all the times he walks up to hit and subtract out the times when he gets walked (be it earned via 4 balls, HBP or catcher's interference). It wouldn't be fair to count those plate appearances against him as if he failed to hit, right? And of course, sometimes the batter still did something good by advancing a runner even though he didn't reach base himself, so we should subtract out those sacrifices too, right? You get the idea.
We took something that has the appearance of a simple record of how often an event happens and we layered a bunch of conditions onto it so that it would be (supposedly) more meaningful -- it would tell us more about the player and about his contribution to the event, if maybe a little less about the event itself. If we just want to show what happened in the past, why are we making it so convoluted? And if we want to measure "how good" the player was, shouldn't we be accounting for a lot more than that? It always seemed like we had something that didn't actually do a good job of telling us much of anything -- other than creating an artificial sense of what it meant to be "good" at hitting.

So, as I walked my dog this morning, I got to thinking. I know how to take the inferential stuff to a more useful place (e.g. wOBA), but would there be more value in just getting a clearer picture of "what happened"? Instead of the still complicated slash line, can we make it simpler?

So I pulled together these little tables. Imagine if this is what showed up on the TV screen instead of AVG/HR/RBI. Firstly, note I use percentages instead of counts -- there's a reason we currently show AVG instead of hits -- the same logic should apply to any outcome shown in such a context. The second thing to note is that I only went to two digits. What's the purpose of the 3rd digit other than the appearance of meaning? Does knowing a guy gets a walk 12.3% of the time give us more information than knowing he walked 11.9%? Or if you prefer, do you know any more about a .275 hitter than a .281 one? I'd argue it just creates the appearance of knowledge.
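For anyone who wants to build the same kind of line from raw counts, here is a minimal sketch of the arithmetic in Python. The function name `outcome_line`, the `OutcomeLine` class, and the sample counts are all made up for illustration. It also assumes that every plate appearance that isn't a hit or a walk counts as an out, and that the walk count already includes HBP and the like, which may not match exactly how the tables in this post were built.

```python
from dataclasses import dataclass


@dataclass
class OutcomeLine:
    """Share of plate appearances ending in each outcome, as whole percents."""
    hit_pct: int
    walk_pct: int
    out_pct: int


def outcome_line(pa: int, hits: int, walks: int) -> OutcomeLine:
    """Split plate appearances into hit / walk / out shares.

    Assumption (not from the post): anything that is not a hit or a
    walk is treated as an out, and `walks` already includes HBP etc.
    """
    if pa <= 0:
        raise ValueError("pa must be positive")
    if hits < 0 or walks < 0 or hits + walks > pa:
        raise ValueError("hits and walks must be non-negative and sum to at most pa")
    outs = pa - hits - walks
    # Each share is rounded on its own, so the three can total 99 or 101.
    return OutcomeLine(
        hit_pct=round(100 * hits / pa),
        walk_pct=round(100 * walks / pa),
        out_pct=round(100 * outs / pa),
    )


# Made-up counts for illustration only, not a real player's line.
line = outcome_line(pa=400, hits=100, walks=40)
print(f"Hit {line.hit_pct}%  Walk {line.walk_pct}%  Out {line.out_pct}%")
```

Rounding each share separately keeps every column honest on its own, at the cost of a row occasionally adding up to 99% or 101% instead of 100%.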
Code:
```
Name               PA  Hit%  Walk%  Out%
Shin-Soo Choo     430   23%    20%   57%
Joey Votto        424   26%    20%   54%
Jay Bruce         407   25%     8%   67%
Brandon Phillips  383   24%     9%   67%
Zack Cozart       375   21%     4%   74%
Todd Frazier      348   21%    13%   67%
Devin Mesoraco    191   20%    12%   68%
Xavier Paul       184   22%    13%   65%
Ryan Hanigan      168   17%    16%   67%
Derrick Robinson  147   23%    11%   66%
Chris Heisey      113   19%     5%   75%
Jack Hannahan     101   20%    11%   69%
Cesar Izturis      90   19%    10%   71%
Donald Lutz        59   24%     2%   75%
```
But that might not be quite enough info, so what if we broke each of those down into two meaningful pieces (sorted in descending order of value):

XBH: Extra-Base Hit
Sng: Single
eBB: Earned Walk
uBB: Unearned Walk
PO: Productive Out (Sacrifice)
uPO: Unproductive Out

Code:
```
Name               PA  Hit%  Walk%  Out%  XBH%  Sng%  eBB%  uBB%  PO%  uPO%
Shin-Soo Choo     430   23%    20%   57%    8%   15%   14%    6%   1%   57%
Joey Votto        424   26%    20%   54%    8%   18%   16%    4%   1%   53%
Jay Bruce         407   25%     8%   67%   11%   14%    7%    1%   1%   66%
Brandon Phillips  383   24%     9%   67%    7%   17%    7%    3%   2%   65%
Zack Cozart       375   21%     4%   74%    8%   14%    4%    1%   5%   70%
Todd Frazier      348   21%    13%   67%    8%   13%   10%    3%   1%   66%
Devin Mesoraco    191   20%    12%   68%    6%   15%   10%    2%   2%   65%
Xavier Paul       184   22%    13%   65%    8%   14%   11%    2%   0%   65%
Ryan Hanigan      168   17%    16%   67%    5%   12%   11%    5%   1%   66%
Derrick Robinson  147   23%    11%   66%    5%   18%   10%    1%   1%   65%
Chris Heisey      113   19%     5%   75%   11%    9%    4%    2%   4%   71%
Jack Hannahan     101   20%    11%   69%    5%   15%    9%    2%   1%   68%
Cesar Izturis      90   19%    10%   71%    4%   14%    8%    2%   1%   70%
Donald Lutz        59   24%     2%   75%    3%   20%    2%    0%   0%   75%
```
So I'm not proposing anything. I'm not doing any analysis. I'm just posing the question: would looking at the data this way add value -- particularly in a broadcast context in which high quality analysis and interpretation of nuance is unreasonable to expect?
__________________
Games are won on run differential -- scoring more than your opponent. Runs are runs, scored or prevented they all count the same.
Worry about scoring more and allowing fewer, not which positions contribute to which side of the equation or how "consistent" you are at your current level of performance. Last edited by RedsManRick; 07-14-2013 at 03:13 PM.
