The Greeks, Bill James and the Beauty of Baseball Stats

01-18-2007, 07:21 PM
From Baseball Analysts. An article on Willy James and our ol' buddy Pythagoras.


The Greeks, Bill James and the Beauty of Baseball Stats
By Dave Studeman

You've heard of Pythagoras, right? If you're a fan of baseball stats, you might associate Pythagoras with Bill James's Pythagorean Formula, RS^2/(RS^2+RA^2), which calculates a team's expected winning percentage. It's a sublime formula, really. It captures critical information in a simple way and expresses the relationship between runs scored, runs allowed and winning just so.
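For readers who want to try the formula, here is a minimal Python sketch of RS^2/(RS^2+RA^2) as quoted above. The function name and the 800/700 team are made up for illustration; they are not from the article.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float) -> float:
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs_sq = runs_scored ** 2
    ra_sq = runs_allowed ** 2
    return rs_sq / (rs_sq + ra_sq)


# Hypothetical team (not real data): 800 runs scored, 700 runs allowed.
pct = pythagorean_pct(800, 700)
expected_wins = pct * 162  # assumes a 162-game schedule
print(f"{pct:.3f} expected winning percentage, about {expected_wins:.0f} wins")
```

A team that scores exactly as many runs as it allows comes out at .500, which is a quick sanity check on the formula.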
If you're not a baseball analyst, you probably associate Pythagoras with right triangles, as in A^2+B^2=C^2, where C is the length of the hypotenuse. It's another beautiful formula. From what I've read, Pythagoras didn't exactly invent it, but he did popularize it. Still, it wasn't Pythagoras's greatest contribution to mankind.
Pythagoras actually invented the musical scale we use today. If you place your finger exactly halfway up a guitar string, the note of the string is an octave higher. Put your finger on a spot that leaves two-thirds of the string's length to vibrate, and you get a perfect fifth. It's said that Pythagoras discovered this, and he found that the simplest ratios of string length created the most harmonious notes.
Reportedly, this was a huge revelation to the Greek. He felt that he had discovered a fundamental Truth, something that uncovered the deepest meanings of the universe. In a way, he had.
Pythagoras had discovered the power and beauty of ratios. He became convinced that mathematical ratios were the foundation of all beauty in the universe. He conceived of the music of the spheres, in which all planets orbit the earth in a circle, set in a specific ratio from the earth, which emits its own tone throughout the universe.
Pythagoras took his findings seriously. He developed a following - a cult, really - that believed that universal truths could be found in numbers. His disciples considered him a kind of god and followed him loyally.
I don't know anyone who thinks of Bill James as a kind of god, but there are many of us who feel that our eyes were opened by his Abstracts. He didn't just discuss baseball and its numbers, he uncovered the beauty in its numbers. Take that Pythagorean Formula...
James found that you can reasonably predict a team's winning percentage from its runs scored and allowed. He also found that the relationship is geometric; runs aren't just doubled in the formula, they're squared.
The power of two is everywhere in life. E=MC squared, after all. When you move closer to a light, cutting the distance in half, the light doesn't become twice as bright. It becomes four times as bright, because brightness varies with the square of the distance. When you double the sides of a square, its area doesn't just double, it quadruples, because area goes as the side squared.
So when Bill James discovered that the nature of runs to winning is squared, it seemed as though something essential and fundamental had been discovered. And he didn't stop there.
Take any league in modern baseball history and multiply its On-Base Percentage by its total bases. Know what you'll usually get? A number that is very, very close to the total number of runs scored in that league. I mean, how amazing is that?
League  Year   OBP      TB  OBP*TB   Runs  Diff  %Diff
NL      1968  .300  18,737    5621   5577    44     1%
NL      1954  .335  17,106    5731   5624   107     2%
NL      1925  .348  17,751    6177   6195   -18     0%
AL      1997  .340  33,495   11388  11164   224     2%
AL      1977  .330  31,307   10331  10247    84     1%
AL      1959  .323  16,118    5206   5391  -185    -3%
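The table can be rechecked with a few lines of Python. This is only a sketch: the league figures are copied from the table above, while the function and variable names are my own.

```python
# (league, year, OBP, total bases, actual runs), copied from the table above.
rows = [
    ("NL", 1968, 0.300, 18737, 5577),
    ("NL", 1954, 0.335, 17106, 5624),
    ("NL", 1925, 0.348, 17751, 6195),
    ("AL", 1997, 0.340, 33495, 11164),
    ("AL", 1977, 0.330, 31307, 10247),
    ("AL", 1959, 0.323, 16118, 5391),
]


def estimate_runs(obp: float, total_bases: int) -> float:
    """Estimate league runs as on-base percentage times total bases."""
    return obp * total_bases


estimates = []
pct_diffs = []
for league, year, obp, total_bases, runs in rows:
    estimate = round(estimate_runs(obp, total_bases))
    diff = estimate - runs
    estimates.append(estimate)
    pct_diffs.append(diff / runs)
    print(f"{league} {year}: estimate {estimate}, actual {runs}, "
          f"diff {diff} ({diff / runs:+.1%})")
```

Every estimate lands within about three and a half percent of the actual run total, which is the point the table is making.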
I don't know if Bill James is the person who discovered this relationship but, like Pythagoras and his theorem, he will forever be associated with it because it was the basis of the very first Runs Created formula: (A*B)/C, where A is times on base, B is total bases and C is plate appearances.
Once again, James had found a simple formula and ratio, multiplicative in nature, that expressed the fundamental nature of baseball.
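As a rough sketch, the original Runs Created idea (times on base multiplied by total bases, divided by plate appearances) looks like this in Python. The function name and the sample line are hypothetical, not real team data, and later versions of the formula add many more terms.

```python
def runs_created(times_on_base: float, total_bases: float,
                 plate_appearances: float) -> float:
    """Basic Runs Created: (A * B) / C."""
    return times_on_base * total_bases / plate_appearances


# Hypothetical team season (made-up numbers for illustration only).
times_on_base = 2000
total_bases = 2200
plate_appearances = 6200

rc = runs_created(times_on_base, total_bases, plate_appearances)

# Same thing written as OBP * TB, since OBP = times on base / plate appearances.
obp = times_on_base / plate_appearances
rc_from_obp = obp * total_bases

print(f"Runs Created: {rc:.1f} (via OBP*TB: {rc_from_obp:.1f})")
```

Writing it both ways shows why multiplying a league's OBP by its total bases is the same calculation as the A, B, C version.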
Of course, James created other metrics, too. He created Game Scores, Defensive Efficiency Record, Secondary Average and Isolated Power. He developed points systems for Hall of Fame and award eligibility. He created his own ways to project player careers (the Brock system), major league performance from minor league performance (MLE's) and the Favorite Toy.
James's findings were simple and beautiful. They were something new in the baseball firmament and they created a new kind of baseball fan, a bit like Pythagoras's cult. But, as with Pythagoras, questions began to undermine the beauty of the numbers.
One of Pythagoras's followers, an unfortunate man named Hippasus, discovered that some numbers are irrational. That is, some numbers can't be written as a ratio of whole numbers, and their digits continue forever without repeating, like Pi (3.14159...) or the square root of two (1.41421...). Hippasus developed a proof showing that irrational numbers exist. Pythagoras considered this sacrilege, and reportedly had him drowned.
But the truth couldn't be held back, and the logic of Hippasus's finding was eventually recognized. Thousands of years later, a guy named Copernicus came along and established, once and for all, that the planets don't revolve around earth. They revolve around the sun. Pythagoras's music of the spheres doesn't really exist at all.
Pythagoras's math wasn't wrong, really. The trouble was that, for all of its beauty, it wasn't fundamentally sound enough to take future mathematicians where they needed to go. Newton and Einstein could never have conceived of calculus and relativity (respectively) if they had stuck to Pythagoras's mathematical ideals. Sometimes, progress requires a revision of the fundamentals.
Early in his career, Bill James really wasn't interested in creating the most precise statistics. He was interested in the framework, in the insights that would lead to revolutionary thinking about baseball and its players. So he didn't include counting stats like stolen bases and sacrifice hits in Runs Created. Like Pythagoras, he was most interested in the beauty and insight.
As time moved on, however, he became more interested in accuracy, and his formulas became more complex. He eventually added stolen bases, situational hitting and lots of other things to Runs Created. In fact, the current Runs Created formula is virtually unrecognizable compared to its original version, even though it still follows the (A*B)/C format.
The Pythagorean Formula has changed too. James recognized that squaring runs scored and allowed wasn't quite accurate enough, and changed the formula's exponent from 2 to 1.83. I remember my disappointment when he did that, thinking that Pythagoras wouldn't approve.
Subsequent researchers have gone further, and found that the correct factor is dependent on the overall run environment. In other words, the impact of runs scored and allowed changes according to the average number of runs scored in each league each year.
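To see what changing the exponent does, here is a small Python sketch. The 2 and 1.83 exponents come from the article. The run-environment version is an assumption on my part: it uses one commonly cited form (often called "Pythagenpat"), where the exponent is runs per game raised to the 0.287 power, which the article does not name. The team numbers are made up.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float,
                    exponent: float = 2.0) -> float:
    """Expected winning percentage with an adjustable exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def environment_exponent(runs_scored: float, runs_allowed: float,
                         games: int) -> float:
    """Exponent tied to the run environment.

    Assumption: the 'Pythagenpat' form, (total runs per game) ** 0.287.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** 0.287


# Hypothetical team (not real data): 800 scored, 700 allowed, 162 games.
rs, ra, games = 800, 700, 162
env_exp = environment_exponent(rs, ra, games)

print(f"exponent 2.00: {pythagorean_pct(rs, ra, 2.0):.4f}")
print(f"exponent 1.83: {pythagorean_pct(rs, ra, 1.83):.4f}")
print(f"exponent {env_exp:.2f}: {pythagorean_pct(rs, ra, env_exp):.4f}")
```

A smaller exponent pulls the expected winning percentage back toward .500, so the same run differential is worth a bit less than the squared version suggests.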
Just think how Pythagoras would have responded to that.
Many years ago, Pete Palmer built his own runs estimator formula called Linear Weights, in which each offensive event (singles, home runs, walks, outs, etc.) is weighted by a specific amount. James didn't like Linear Weights. He once criticized Palmer's system because the weights of each event were computed after the end of the year (and he also doesn't like stats that use averages as a baseline).
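A linear weights estimator is just a weighted sum of events. The sketch below shows the shape of the idea in Python. The weights are roughly the values often quoted for Palmer's system and should be treated as illustrative placeholders, not as figures from the article; the batter's line is made up.

```python
# Illustrative run values per event (assumed, approximate; not from the article).
WEIGHTS = {
    "single": 0.46,
    "double": 0.80,
    "triple": 1.02,
    "home_run": 1.40,
    "walk": 0.33,
    "out": -0.25,
}


def linear_weights_runs(events: dict) -> float:
    """Sum each event count times its run weight."""
    return sum(WEIGHTS[event] * count for event, count in events.items())


# Hypothetical batter season (made-up numbers).
batter = {
    "single": 100,
    "double": 30,
    "triple": 5,
    "home_run": 20,
    "walk": 60,
    "out": 400,
}

batting_runs = linear_weights_runs(batter)
print(f"Batting runs relative to average: {batting_runs:.1f}")
```

Because outs carry a negative weight, the total is measured against an average baseline, which is the feature of the system James objected to.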
However, Tangotiger showed, in a persuasive article called "How Runs are Really Created (http://www.tangotiger.net/runscreated.html)" a few years ago, that context really does matter. You can't really know the impact of each type of batting event unless you know how many times every event occurred.
In fact, Tango went one step further and showed that the format of James's original Runs Created formula wasn't quite right. He advocates the use of a formula developed by David Smyth called Base Runs (http://mb9.scout.com/fbaseballfrm8.showMessageRange?start=1&stop=20&topicID=1045.topic). And if you take some time to think about it, you have to agree with him.
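For comparison with Runs Created, here is a rough Python sketch of the Base Runs structure: baserunners times the share of them that score, plus home runs. The overall form A*B/(B+C)+D is the core of the method; the specific coefficients in the B term are one commonly circulated variant and are an assumption here, not something taken from the article. The team line is hypothetical.

```python
def base_runs(at_bats: int, hits: int, total_bases: int,
              home_runs: int, walks: int) -> float:
    """Base Runs: A * B / (B + C) + D."""
    a = hits + walks - home_runs  # baserunners, excluding home runs
    # Advancement factor; these coefficients are an assumed, commonly quoted variant.
    b = (1.4 * total_bases - 0.6 * hits - 3 * home_runs + 0.1 * walks) * 1.02
    c = at_bats - hits            # outs
    d = home_runs                 # home runs always score
    return a * b / (b + c) + d


# Hypothetical team season (made-up numbers for illustration only).
team_runs = base_runs(at_bats=5500, hits=1450, total_bases=2300,
                      home_runs=170, walks=550)
print(f"Estimated runs: {team_runs:.0f}")

# If every hit is a home run and nobody walks, the estimate is just the home runs.
only_homers = base_runs(at_bats=100, hits=10, total_bases=40,
                        home_runs=10, walks=0)
print(f"Ten solo home runs: {only_homers:.0f} runs")
```

The second call shows the structural difference from (A*B)/C: a home run is always worth at least one run, no matter what else happens around it.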
When you look at things in more detail, sometimes the fundamental structures that have gotten you so far have to change. That's what Hippasus meant to Pythagoras, and that's what has happened to James's original formulas, too.
Baseball writers like Rich and me aren't really researchers. We're communicators. We want to reach out to fans who are curious about the game of baseball and describe to them how the "inner game" of baseball statistics works. We are truly following in the footsteps of James, who is a fantastic writer, and we want to express the same joy at the beauty of baseball stats.
On the other hand, hardcore researchers are finding new ways of describing the game's statistics, and we want to share that with general baseball fans too. So we're in a curious bind. We want to continue to talk about the music of the spheres, but we also want to acknowledge the Copernican solar system.
At Baseball Graphs (http://baseballgraphs.com/main/index.php/site/) and the Hardball Times (http://www.hardballtimes.com/), I've helped keep Bill James's Win Shares in the public's eye. At the same time, however, I've conducted my own research and tried to improve his system. Some researchers have told me that trying to correct Win Shares isn't possible, that the framework is too flawed. But there is much I like about Win Shares, so I soldier on.
In the end, my quest may be quixotic, but as long as I help a few fans see a bit more in the numbers, and help a few researchers get a little more visibility for their efforts, I'll be happy. At least, hopefully, no one will try to drown me.
Dave Studeman is a writer at the Hardball Times (http://www.hardballtimes.com/), and also the manager of the Baseball Graphs (http://www.baseballgraphs.com/) website.

01-18-2007, 08:17 PM
Initially, I thought there was a spelling error in that title, but after reading it, I get it.;)

01-19-2007, 04:15 AM
So, let me get this straight. Raisor is greek?

01-19-2007, 08:33 AM
So, let me get this straight. Raisor is greek?

Phil Raisoropolis?

01-19-2007, 10:24 AM
Phileous Raisoropolis?

name corrected.