The WAR Crutch, or 'Why it's OK to like sabermetrics, yet dislike WAR'

Bob Levey

Note: All citations, unless otherwise noted, come from Dave Cameron's (excellent) Win Values Explained pieces from the fall of 2008. To my knowledge, the methodology for Fangraphs WAR hasn't changed much since then. Link

In this FanPost, I would like to share my two main issues with the calculation of WAR. As BYB has become more sabermetric-focused over time, I've noticed a number of people vilified for refusing to accept WAR as an accurate measure of player skill or contribution to a team. They are often cast as out of touch and encouraged to educate themselves on the subject before dismissing it.

Well, as it happens, I don't like WAR as a measurement much at all. The linear weight system suffers from bias as the game changes subtly every year. Defensive metrics are unstable at best, and downright dangerous at worst. I'd like to take a minute to expand upon those thoughts.

These aren't necessarily my only issues with WAR, but they're the largest two related directly to the calculation of a value.

Part 1- Linear Weights

The offensive (Batting) component of WAR is very straightforward. It's derived from wOBA, a strong statistic that attempts to corral all contributions of an offensive player and present them on the same scale as OBP. It does this by weighting the possible outcomes of a plate appearance with linear weights, calculated over an incredible amount of baseball, that estimate each outcome's contribution to run scoring.

For the curious, these weights are as follows: HR 1.70, 3B 1.37, 2B 1.08, 1B 0.77, NIBB (non-intentional walk) 0.62.
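To make the weighting concrete, here is a rough Python sketch of how those weights combine into a single rate. The weights are the ones quoted above. The function name, the stat line, and the choice to divide by plate appearances are assumptions made for illustration. The published wOBA also covers events not listed here and is scaled to look like OBP, so treat the output as an unscaled stand-in, not as Fangraphs' number.

```python
# Weights quoted above from the Win Values Explained series.
WEIGHTS = {"HR": 1.70, "3B": 1.37, "2B": 1.08, "1B": 0.77, "NIBB": 0.62}


def weighted_rate(events, plate_appearances, weights=WEIGHTS):
    """Sum each event count times its weight, then divide by plate appearances.

    Assumption: dividing by PA with no further scaling is a simplification
    of the published wOBA formula.
    """
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    total = sum(weights[name] * count for name, count in events.items())
    return total / plate_appearances


# Hypothetical stat line, not a real player.
example = {"HR": 30, "3B": 2, "2B": 35, "1B": 100, "NIBB": 60}
print(round(weighted_rate(example, 650), 3))
```

Passing a different weights dictionary as the third argument gives a quick feel for how much the result moves when the weights do.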

Linear weights have also been used to assign positional adjustments:

Catcher: +12.5 runs (all are per 162 defensive games)
First Base: -12.5 runs
Second Base: +2.5 runs
Third Base: +2.5 runs
Shortstop: +7.5 runs
Left Field: -7.5 runs
Center Field: +2.5 runs
Right Field: -7.5 runs
Designated Hitter: -17.5 runs

And park factors: Link
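Since the positional adjustments listed above are quoted per 162 defensive games, a player's adjustment depends on how many games he spends at each spot. The sketch below assumes simple linear proration by games played, which is my reading of "per 162 defensive games" and not a confirmed description of the Fangraphs method. The position abbreviations and the example player are hypothetical.

```python
# Positional adjustments in runs per 162 defensive games, as listed above.
POSITION_RUNS_PER_162 = {
    "C": 12.5, "1B": -12.5, "2B": 2.5, "3B": 2.5, "SS": 7.5,
    "LF": -7.5, "CF": 2.5, "RF": -7.5, "DH": -17.5,
}


def positional_adjustment(games_by_position):
    """Prorate each position's adjustment by games played there, then sum."""
    return sum(
        POSITION_RUNS_PER_162[pos] * games / 162
        for pos, games in games_by_position.items()
    )


# Hypothetical player: 81 games at shortstop and 81 at first base.
print(positional_adjustment({"SS": 81, "1B": 81}))  # prints -2.5
```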

Ironically, it's the sheer volume of research that weakens this stat. I've often heard the statement, "If you don't like the way the stat is calculated, use your own weights." I believe that is a necessity. Baseball is not played in a vacuum; it is constantly evolving. Every year the game changes slightly, and calculating off previous years' results will introduce bias into the calculation. Fangraphs has begun to correct for this by calculating park factors off five-year averages. Unfortunately, the offensive weights and player values have not, to my knowledge, taken that step (Link), and anyone who thinks the relative value of a home run hasn't changed between the height of the steroid era and 2013 hasn't been paying enough attention.

I understand that these differentials are most likely small (I can't invest anywhere near the man-hours it would take to confirm that, but it seems a safe enough assumption). Nonetheless, bias will exist in the data set, reducing the accuracy of WAR's offensive component, its positional adjustments, and, to a lesser extent, its park factors.
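For anyone who wants to see what multi-year averaging does, here is a minimal Python sketch of a trailing five-year mean of the kind mentioned above for park factors. The yearly values are invented for illustration, and the helper is a generic moving average, not Fangraphs' actual calculation.

```python
def trailing_average(values, window=5):
    """Mean of the most recent `window` values, for each year that has enough history."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical single-season park factors (100 = neutral), invented for illustration.
yearly = [104, 98, 101, 107, 95, 100]
print(trailing_average(yearly))
```

The averaged series swings much less than the single-season numbers, which is the point of the correction, but it also reacts slowly when something about the game really does change.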

That being said, the batting section is still the most reliable of the major components of WAR. If baseball's best are consistently at the top of the wOBA ranks, and the dregs feed at the bottom, the stat does a good job of measuring player performance objectively, even if the values may be slightly biased.

There are some legitimate questions on park factors: are stats the result of the parks, or are the parks merely the victims of the players in them? What influence does employing Miguel Cabrera and Prince Fielder have on Comerica's power ratings? That being said, the trends hold fairly well, and are probably not worth the time to fuss over. Unlike defensive statistics.

Part 2- Fielding

I could post an incredibly long rant here about the inaccuracies of defensive metrics, but I think Dave covers them more eloquently than I could. Here he is discussing the UZR component of WAR:

Essentially, it’s the best fielding metric publicly available, and while it’s not perfect (I generally give it an error range of five runs in either direction, meaning that a +10 could be anything between a +5 and +15), it’s a big step forward in defensive evaluations.

UZR is not a perfect statistic. It's not even particularly close. It stabilizes decently with a large enough sample, and given a few seasons of data, the statistical noise flattens out and it provides a pretty decent view of a player's defensive contributions. However, even in 30-game samples it's still ridiculously flawed. Raise your hand if you really think Dexter Fowler has been worth nearly a full 1.0 WAR more than Miguel Cabrera so far this season (2.1 to 1.3 as of this writing). At times, even full seasons can be complete outliers (Jhonny Peralta was most likely not the 3rd best defensive shortstop in either 2011 or 2012, but UZR loves him).
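To put the five-run error range Dave describes next to the Fowler and Cabrera figures, here is a hedged Python sketch. The plus-or-minus five runs comes from the quote above. The conversion of roughly 10 runs per win is a common rule of thumb that I am assuming here; it is not stated in this post. Treating fielding as the only source of error, and applying a full-season error band to a partial season, are further simplifications. All function names are hypothetical.

```python
def fielding_range(uzr_runs, error_runs=5.0):
    """Return the (low, high) run range implied by a +/- error band."""
    return (uzr_runs - error_runs, uzr_runs + error_runs)


def runs_to_wins(runs, runs_per_win=10.0):
    """Convert runs to wins; runs_per_win=10 is an assumed rule of thumb."""
    return runs / runs_per_win


def war_ranges_overlap(war_a, war_b, error_runs=5.0, runs_per_win=10.0):
    """True if the two WAR values sit within each other's fielding error band."""
    error_wins = runs_to_wins(error_runs, runs_per_win)
    return abs(war_a - war_b) <= 2 * error_wins


print(fielding_range(10))            # (5.0, 15.0)
print(war_ranges_overlap(2.1, 1.3))  # True
```

Under those assumptions, a 0.8 WAR gap falls inside the combined error bands, so the fielding component alone could account for it.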

Often, these discrepancies are too easily shaken off as "not WAR's fault". Flaws in defensive statistics are inherently flaws in WAR, as the latter is dependent on the former. As an analogy, let's say you're about to cross a bridge. The engineer informs you that 4/5ths of the bridge construction is perfectly sound, but that they have no good way of predicting what it might do when the wind starts blowing. It happens to be a very windy day. Would you want to cross the bridge?

Final Thoughts

This is the issue with relying on flawed defensive metrics to come to a conclusion. Most sabermetricians you run into freely admit the flaws of defensive metrics, but still insist that the WAR statistic is not unduly tarnished as a result. The Mike Trout vs Miguel Cabrera debate is a prime example of how people who preferred not to consider evaluations based on WAR were painted as out of touch with sabermetrics and unwilling to accept numbers. However, in the case of defensive metrics, and to some extent linear weighting, there is a very strong case against their validity that does significantly undermine the usefulness of WAR.

In the end, I think it's important to keep in mind Dave's words from nearly 4.5 years ago:

Make no mistake – I think these are the best single value metric for evaluating a player on the internet today. I’d use a player’s Win Value number to describe his total performance before I used anything else. But we’re not saying they’re perfect or that they can’t be improved upon. We’ll keep working on getting better data, figuring things out, and making them even more accurate in the future. Right now, they’re great. Hopefully, by this time next year, they’re even better.

A lot of very intelligent people are working on WAR, and it is a great statistic. But it is flawed. No one should rely on it as a crutch; there are an incredible number of other metrics that can tell us a lot about a player, and each is subject to its own inherent flaws. WAR isn't perfect; nothing is. I for one am tired of its use as a crutch throughout the blogosphere, and hope that with this we can stop pretending all opponents of WAR are Jerry Green clones.

This is a FanPost and does not necessarily reflect the views of the Bless You Boys writing staff.