I follow everything except the multivariate regression. How is that performed?
Covariance. The players who get on base a lot tend to be the guys who SLG a lot. That is to say that they overlap. When you look at just one of them, say by running a correlation between OBP and Runs, the effect of the other one is still in there, but you aren't giving it any credit.
Because OBP and SLG do correlate with each other (teams that get on base a lot tend to SLG more, or less, than teams that don't), you are still capturing a lot of the other one's effect even though you aren't measuring it directly. So if you want to look at them at the same time, you have to get rid of the part that overlaps, or you'd count it twice.
For example, say A and B are two stats and C is the thing you're trying to predict:
- A vs C have a correlation of .7
- B vs C have a correlation of .5
If you look at A & B together, your correlation isn't going to be 1.2 (that's impossible!). Instead, you are going to have to divide up that overlap.
Let's say that you re-run your analysis, looking at A and B at the same time to try and predict C.
- A and B together vs C has a correlation of .8
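To see why the combined number comes out below 1.2, here is a minimal sketch using the standard two-predictor formula for the multiple correlation. The A vs B correlation isn't stated in this example, so the 0.17 below is an assumed value picked for illustration, and `multiple_r` is just a made-up helper name.

```python
import math

def multiple_r(r_ac, r_bc, r_ab):
    """Correlation of C with A and B together, from the three pairwise correlations."""
    # The 2 * r_ac * r_bc * r_ab term is the overlap being removed,
    # so shared information is not counted twice.
    r_squared = (r_ac**2 + r_bc**2 - 2 * r_ac * r_bc * r_ab) / (1 - r_ab**2)
    return math.sqrt(r_squared)

# r_ab = 0.17 is an assumption, not a number from the example above.
combined = multiple_r(0.7, 0.5, 0.17)
print(round(combined, 2))  # 0.8
```

Even if A and B didn't overlap at all (`r_ab = 0`), the formula gives about .86, not 1.2, because it is the squared correlations that add up, not the correlations themselves.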
However, when you run a multivariate regression, you also get regression coefficients. Essentially, these tell you the component pieces of that overall model correlation. The process of running the regression tells you where the overlap is and gets rid of it so that the pieces fit together squarely.
So say your new overall correlation of .8 is explained by taking 3 parts of A to every 1 part of B. That's not .6 correlation and .2 correlation, mind you. With A as OBP and B as SLG, it means that a change in OBP has 3 times the effect on run scoring as a similarly scaled change in SLG.
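If it helps to see it run, here is a rough sketch with made-up data (not real baseball numbers). It assumes NumPy is available, builds A and B so they overlap, bakes a 3:1 weighting into C, and then checks that a regression on both at once recovers roughly that ratio. All the variable names and noise levels are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 5000

# Made-up data: A and B share a common component, so they overlap.
shared = rng.normal(size=n)
a = shared + rng.normal(size=n)
b = shared + rng.normal(size=n)

# C is built with a true 3:1 weighting of A to B, plus noise.
c = 3.0 * a + 1.0 * b + rng.normal(scale=2.0, size=n)

# One-at-a-time correlations: each quietly includes some of the other's effect.
r_ac = np.corrcoef(a, c)[0, 1]
r_bc = np.corrcoef(b, c)[0, 1]

# Multivariate regression: fit C on A and B together (plus an intercept).
X = np.column_stack([np.ones(n), a, b])
coefs, *_ = np.linalg.lstsq(X, c, rcond=None)
intercept, coef_a, coef_b = coefs

# Overall model correlation: correlation between fitted and actual C.
r_model = np.corrcoef(X @ coefs, c)[0, 1]

print(f"A vs C: {r_ac:.2f}, B vs C: {r_bc:.2f}, together vs C: {r_model:.2f}")
print(f"coefficients: A = {coef_a:.2f}, B = {coef_b:.2f}, ratio = {coef_a / coef_b:.1f}")
```

A and B are generated on the same scale here, which is what makes the coefficients directly comparable as "a similarly scaled change". With real stats you would standardize them first.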
I don't know if that's actually the right ratio for OBP and slugging (I thought it was more like 1.5:1), but that's covariance in a nutshell.