One of my favorite lines in the movie Moneyball is when a scout named Grady tries to convince general manager Billy Beane that league veteran David Justice is not worth hiring:
“He's going to really help our season tickets at the beginning of the year…But by June he's not going to be hitting his weight.”
For those less familiar with baseball, the joke here is that batting averages (the share of at-bats that result in a hit) are quoted as three-digit numbers, much like weights in pounds: if your batting average is .250 (i.e., one out of four at-bats leads to a hit) and you weigh 260 lbs, then you don’t quite hit your weight (250 < 260). Since batting averages typically sit in the mid .200s while weights are in the high 100s, failing to hit one’s weight typically implies significant underperformance.
But just how much of an underperformance is it to not hit one’s weight? Has it become harder or easier over time? And do we have examples of “good” players who have failed to hit their weight? In this post, I conduct an in-depth analysis of the Moneyball insult with a view toward answering these questions.
Stylized Facts
To begin, it is useful to look at the history of batting averages and player weights in the MLB. I construct both measures using the excellent Lahman baseball database and take averages across the MLB weighted by plate appearances, giving more influence to players with more plate appearances. I focus on the modern era (post-1960), when data on player weights are both more complete and more trustworthy.
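A minimal sketch of the plate-appearance weighting follows. It assumes per-season rows carrying Lahman-style counting stats (`AB`, `H`, `BB`, `HBP`, `SH`, `SF` from the Batting table and `weight` from the People table). The `PlayerSeason` class and the function name are hypothetical. Plate appearances are approximated as AB + BB + HBP + SH + SF, because the post does not say exactly how PA was computed.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    """One player-season; fields mirror Lahman Batting columns (lowercased) plus People.weight."""
    year: int
    ab: int        # at-bats (AB)
    h: int         # hits (H)
    bb: int        # walks (BB)
    hbp: int       # hit by pitch (HBP)
    sh: int        # sacrifice hits (SH)
    sf: int        # sacrifice flies (SF)
    weight: float  # listed weight in pounds

    @property
    def pa(self) -> int:
        # Approximate plate appearances; ignores rare events such as catcher's interference.
        return self.ab + self.bb + self.hbp + self.sh + self.sf


def pa_weighted_season_means(rows, year):
    """PA-weighted league averages for one season: (batting average x 1000, weight in lbs)."""
    season = [r for r in rows if r.year == year and r.ab > 0]
    total_pa = sum(r.pa for r in season)
    ba = sum(r.pa * r.h / r.ab for r in season) / total_pa
    weight = sum(r.pa * r.weight for r in season) / total_pa
    return 1000 * ba, weight


# Two made-up player-seasons, for illustration only.
rows = [
    PlayerSeason(2001, 500, 140, 60, 5, 0, 5, 200.0),
    PlayerSeason(2001, 200, 45, 15, 2, 3, 0, 180.0),
]
print(pa_weighted_season_means(rows, 2001))
```

Because the first player has more plate appearances, his .280 average and 200 lbs count for more than the second player's .225 and 180 lbs.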
Fig. 1 plots the time series of both averages: batting average (in decimals × 1000) and player weight (in pounds). There are three striking patterns in the data. The first is that the league-average batting average (red) lies well above the average weight (blue). Indeed, this is the basic gist of Grady’s insult: in 2001 (the year the Oakland A’s traded for David Justice), the league-average BA was 0.264, while the league-average weight was 194 lbs. The “average” player could afford to gain 70 pounds, or to get a hit in 7 percentage points fewer of his at-bats (a .194 average instead of .264), and still manage to “hit his weight”.
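The arithmetic behind the 2001 comparison fits in two small helpers, sketched below. The names are hypothetical and are not taken from the author's code. They make the unit conversion explicit: a .264 average is read as 264, so the slack is 70 pounds, or equivalently 7 percentage points of batting average.

```python
def hits_weight(ba: float, weight_lbs: float) -> bool:
    """True if the batting average, read in thousandths, is at least the weight in pounds."""
    return 1000 * ba >= weight_lbs


def hitting_weight_slack(ba: float, weight_lbs: float) -> tuple[float, float]:
    """Margin before a player stops hitting his weight.

    Returns (pounds he could gain, percentage points of batting average he could lose).
    """
    pounds = 1000 * ba - weight_lbs
    points = 100 * (ba - weight_lbs / 1000)
    return pounds, points


print(hitting_weight_slack(0.264, 194))  # roughly (70, 7), up to floating-point rounding
```

In relative terms, losing 7 points from a .264 average means about 26.5% fewer hits per at-bat. That is why "percentage points" is the right unit here.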
The second pattern is that player weights started increasing dramatically in the early 1990s. From 1960 to 1980, the average weight across seasons held remarkably steady at 183 lbs; since 2010, it has been 211 lbs. In other words, players have become heavier (and taller) over time.
Finally, and slightly less easy to see, batting averages have declined in recent years. In the early 2000s, the league-wide batting average was close to 0.280; in the most recent season, it was 0.242 (roughly the 25th percentile of player batting averages in 2000). This decline is a well-studied trend that reflects a confluence of factors, including increased emphasis on power hitting, the use of defensive shifts, and improved pitching techniques and specialization.
From Averages to Probabilities
On average, MLB batting averages significantly exceed player weights. But we are less interested in the average batting average versus the average weight than in the relative frequency of players who manage to hit their weight. For example, from Fig. 1 alone, it is difficult to tell precisely how common it is to fail to hit one’s weight. And while the recent trends of increasing weights and decreasing batting averages suggest that this phenomenon might have become more common over time, that conclusion does not follow from the averages alone without some additional assumptions.
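To see why, consider the case where batting averages (scaled by 1,000) and player weights are jointly normally distributed. Then the difference BA - Weight is also normal, with mean μ_B - μ_W and variance σ_B^2 + σ_W^2 - 2σ_BW. The fraction of players who fail to hit their weight, P(BA<Weight), is therefore given by:

\[ P(BA < Weight) = \Phi\left(\frac{\mu_W - \mu_B}{\sqrt{\sigma_B^2 + \sigma_W^2 - 2\sigma_{BW}}}\right) \tag{1} \]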
where Φ is the standard normal CDF; μ_B and μ_W are the means of batting average and player weight, respectively; σ_BW is the covariance between batting average and weight; and σ_B^2 and σ_W^2 are the variances of batting average and weight.
Since Φ is an increasing function, the fraction of players who underhit their weight is higher when the average weight is high relative to the average batting average (μ_W > μ_B). This is intuitive: a higher average weight makes it harder to outhit.
However, the variances and covariances can matter as well. As in the data, suppose that the average BA exceeds the average weight, so that the numerator μ_W - μ_B is negative. Then higher variance in either component leads to a higher fraction of players who fail to hit their weight (intuitively: when data are noisy, it is more likely to underhit one’s weight just by chance). And when player weights and batting averages are negatively correlated, more players fail to hit their weight (intuitively: if heavy players are bad hitters, failing to hit one’s weight will be much more common).
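The normal approximation and the comparative statics above can be checked numerically. The sketch below uses only the Python standard library. `prob_underhit` evaluates Φ((μ_W - μ_B)/sd) with sd = √(σ_B² + σ_W² - 2σ_BW). `simulate_underhit` draws correlated normal pairs to confirm that result. Both function names and the parameter values in the usage are hypothetical and illustrative; they are not estimates from the Lahman data.

```python
import math
import random


def prob_underhit(mu_b, mu_w, sd_b, sd_w, cov_bw):
    """Normal approximation to P(BA < Weight), with BA scaled by 1000.

    BA - Weight ~ N(mu_b - mu_w, sd_b**2 + sd_w**2 - 2*cov_bw), so the
    probability is Phi((mu_w - mu_b) / standard deviation of the difference).
    """
    sd_diff = math.sqrt(sd_b**2 + sd_w**2 - 2.0 * cov_bw)
    z = (mu_w - mu_b) / sd_diff
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def simulate_underhit(mu_b, mu_w, sd_b, sd_w, cov_bw, n=100_000, seed=0):
    """Monte Carlo check: share of bivariate-normal draws with BA below weight."""
    rng = random.Random(seed)
    rho = cov_bw / (sd_b * sd_w)
    fails = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ba = mu_b + sd_b * z1
        # Two-variable Cholesky step: gives weight correlation rho with BA.
        weight = mu_w + sd_w * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
        fails += ba < weight
    return fails / n


base = prob_underhit(264, 194, 30, 20, 0)
noisier = prob_underhit(264, 194, 60, 20, 0)       # larger sigma_B
negative = prob_underhit(264, 194, 30, 20, -300)   # correlation of -0.5
print(base, noisier, negative, simulate_underhit(264, 194, 30, 20, -300))
```

With μ_B above μ_W, doubling σ_B or making the covariance negative both raise the probability, as argued above. The simulated share should land close to the closed-form value.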
It turns out that weights and batting averages are not quite normally distributed, but they are close enough to normal for the approximation to allow a useful calibration. In particular, take as the unit of observation post-1960 player-seasons with at least 100 at-bats. Plugging the empirical counterparts of these moments into (1) implies that roughly 6% of players fail to hit their weight in a given season. The true percentage over 1960-2021 is 6.66%, suggesting the approximation is quite accurate! Note also that the unconditional covariance, σ_BW, is negative (-8.89). In other words, heavier players tend to be slightly worse hitters, but the implied correlation (-0.01) is so small as to be negligible.
Heterogeneity over Time
The previous calibration used empirical estimates over the full 1960-2021 period. But as we saw in Fig. 1, these are not stable parameters: weights have been increasing, batting averages decreasing. Under certain assumptions, this suggests that failing to hit one’s weight has become more common.
Figure 2 shows that this phenomenon has indeed become significantly more common, rising from less than 5% before 2000 to nearly 20% in the most recent season. (Ironically, the Moneyball quote was from 2001, the last year before the rapid increase.)
It is tempting to conclude that this rise is entirely due to changes in the average weight and batting average, but we can test this empirically. To do so, I separately consider the role of each parameter (μ_B, μ_W, σ_B, σ_W, σ_BW) in the evolution of the probability of underhitting one’s weight. For each parameter, I ask: “If this parameter were held at a baseline level (say, its average over 1960-1980) while all other parameters were updated each year, how far would the implied probability, using our normal approximation, be from the observed probability?” If fixing a parameter makes the implied probability deviate substantially from the observed one, then that parameter matters a lot; equivalently, letting it update is important for matching the observed data.
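The counterfactual exercise can be sketched in a few lines of Python. The sketch assumes one dictionary of moments (μ_B, μ_W, σ_B, σ_W, σ_BW) per season. All function and key names here are hypothetical. Summarizing each path by its mean absolute gap from the observed share is one simple choice, not necessarily the one behind Figure 3.

```python
import math

PARAMS = ("mu_b", "mu_w", "sd_b", "sd_w", "cov_bw")


def implied_prob(m):
    """Normal-approximation P(BA < Weight) from one season's moments (BA scaled by 1000)."""
    sd_diff = math.sqrt(m["sd_b"] ** 2 + m["sd_w"] ** 2 - 2.0 * m["cov_bw"])
    z = (m["mu_w"] - m["mu_b"]) / sd_diff
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def baseline_moments(by_year, start=1960, end=1980):
    """Average each parameter over the baseline window."""
    years = [y for y in by_year if start <= y <= end]
    return {k: sum(by_year[y][k] for y in years) / len(years) for k in PARAMS}


def frozen_parameter_paths(by_year, base):
    """For each parameter: implied probability by year with only that parameter frozen at baseline."""
    return {
        k: {y: implied_prob({**m, k: base[k]}) for y, m in sorted(by_year.items())}
        for k in PARAMS
    }


def gaps_from_observed(paths, observed):
    """Mean absolute gap between each frozen-parameter path and the observed share."""
    return {
        k: sum(abs(path[y] - observed[y]) for y in observed) / len(observed)
        for k, path in paths.items()
    }


# Made-up moments for two seasons; only the means change between them.
by_year = {
    1970: {"mu_b": 255, "mu_w": 183, "sd_b": 30, "sd_w": 15, "cov_bw": -5},
    2021: {"mu_b": 245, "mu_w": 211, "sd_b": 30, "sd_w": 15, "cov_bw": -5},
}
observed = {y: implied_prob(m) for y, m in by_year.items()}  # stand-in for the data
gaps = gaps_from_observed(frozen_parameter_paths(by_year, baseline_moments(by_year)), observed)
print(max(gaps, key=gaps.get), gaps)
```

In this toy example, freezing μ_W produces the largest gap. Freezing a parameter that never moved (such as σ_B) produces no gap at all.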
Figure 3 shows the results of this analysis. The thick black line shows the true proportion of underhitting, replicating the line from Figure 2. The remaining (colored) lines show the impact of holding each parameter constant at its baseline level. The changing average weight, μ_W, accounts for most of the difference. The changing mean batting average (light blue) has a smaller effect, but was relatively important in the 1990s. Changes in the variances and covariance have a negligible effect once the other parameters are allowed to change.
Player Analysis
That nearly one in five players failed to hit their weight in 2021 raises the question of whether any well-known players have failed to do so. The short answer is yes. This is partly because so many players are heavy, and partly because players can be good without achieving a high batting average.
To begin our player-level analysis, consider another way of visualizing the probability of underhitting one’s weight. In Figure 4, I show a scatterplot of batting averages against player weights, where each point is a player-season. As before, I restrict to post-1960 seasons with at least 100 at-bats.
I also plot the “hit one’s weight” line, BA = weight/1000, in black. All points below and to the right of this line are player-seasons in which the player failed to hit his weight.
One interesting takeaway from this visualization is that underhitting does not seem concentrated in any particular range of batting averages or weights. At the high end, Dmitri Young managed to hit 0.283 in 2002 without hitting his weight (295 lbs). At the low end, the lightest player to fail to hit his weight in a season was current Washington Nationals manager Dave Martinez: in 1986, he hit only 0.138 in 108 at-bats while weighing 150 lbs.
What about at the career level? Re-aggregating to the player level and filtering to players with at least 600 career at-bats, I show the results in Figure 5, with some notable players highlighted. Aaron Judge, the 2022 AL MVP, has not hit his weight (282 lbs) over his career! Nor have players like Rhys Hoskins, Ji-Man Choi, and Franmil Reyes.
An alternative question to ask (and a great baseball trivia question) is: who is the heaviest player to have hit his weight over his career? The answer, of course, is Prince Fielder, who achieved a career BA of 0.283 with a weight of 275 lbs. Below, I show the 10 heaviest players who hit their weight, as well as the best hitters (by batting average) who did not.
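The career-level re-aggregation and the two rankings can be sketched as follows. The sketch assumes season rows of a hypothetical form (player_id, at_bats, hits, weight_lbs). Career BA is total hits over total career at-bats, not an average of seasonal BAs. "Hitting one's weight" means 1000 × BA ≥ weight. Function names are illustrative.

```python
from collections import defaultdict


def career_ba_and_weight(seasons, min_career_ab=600):
    """Aggregate (player_id, at_bats, hits, weight_lbs) season rows into careers.

    Career BA = total hits / total at-bats. Weight is the player's listed weight,
    which Lahman stores once per player (People table).
    """
    ab, hits, weight = defaultdict(int), defaultdict(int), {}
    for player, s_ab, s_h, w in seasons:
        ab[player] += s_ab
        hits[player] += s_h
        weight[player] = w
    return {p: (hits[p] / ab[p], weight[p]) for p in ab if ab[p] >= min_career_ab}


def heaviest_who_hit_weight(careers, top=10):
    """Players whose career BA x 1000 is at least their weight, heaviest first."""
    rows = [(w, p, ba) for p, (ba, w) in careers.items() if 1000 * ba >= w]
    return sorted(rows, key=lambda r: r[0], reverse=True)[:top]


def best_hitters_below_weight(careers, top=10):
    """Players whose career BA x 1000 is below their weight, highest BA first."""
    rows = [(ba, p, w) for p, (ba, w) in careers.items() if 1000 * ba < w]
    return sorted(rows, key=lambda r: r[0], reverse=True)[:top]


# Made-up season rows; player "c" falls short of 600 career at-bats.
seasons = [
    ("a", 400, 120, 280), ("a", 300, 80, 280),
    ("b", 500, 140, 250), ("b", 200, 50, 250),
    ("c", 300, 90, 200),
    ("d", 600, 150, 260),
]
careers = career_ba_and_weight(seasons)
print(heaviest_who_hit_weight(careers))
print(best_hitters_below_weight(careers))
```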
Final Thoughts
When the events of Moneyball took place, failing to “hit one’s weight” was indeed a mark of underperformance: a mere 6% of players fell short. But in today’s game the insult has less bite. For one, nearly 20% of players today fail to hit their weight, and among them are literal MVPs like Aaron Judge. These time series trends reflect a combination of increasing player weights, decreasing batting averages, and a tendency to prize offensive output beyond batting average.
Typically I end blog posts with some concluding thoughts or issues that arose in the analysis. While this post was less serious than some of my others, I’ll continue the tradition.
With all said and done, did David Justice hit his weight? It turns out he did. His 2002 BA for the Oakland A’s was 0.266, slightly below his career BA of 0.279, and well above his listed weight of either 195 lbs (Baseball Prospectus) or 215 lbs (MLB.com). In fact, David Justice never had a single season in which he failed to hit his weight!
Has there ever been a case where a player failed to “OBP his weight” (i.e., posted an on-base percentage below his weight)? Recall that on-base percentages are almost always higher than batting averages, since they also count walks and hit-by-pitches. Shockingly, the answer is yes, at least at the season level. Excluding pitchers and requiring a minimum of 100 plate appearances, there are 79 players since 1960 who have failed to OBP their weight in a season. They include Josh Bell in 2010, Jonathan Schoop in 2014, and my childhood favorite, Robinson Cano, in 2022.
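On-base percentage uses the official definition OBP = (H + BB + HBP) / (AB + BB + HBP + SF). The short Python sketch below (function names hypothetical) implements the "OBP one's weight" check. It also shows why OBP is only almost always above BA: sacrifice flies enter the denominator but not the numerator. A player with no walks or hit-by-pitches can therefore post an OBP below his BA.

```python
def on_base_pct(ab, h, bb, hbp, sf):
    """Official on-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def fails_to_obp_weight(ab, h, bb, hbp, sf, weight_lbs):
    """True if OBP, read in thousandths, is below the player's weight in pounds."""
    return 1000 * on_base_pct(ab, h, bb, hbp, sf) < weight_lbs


# 1 hit in 4 at-bats plus 1 sacrifice fly: BA = .250 but OBP = .200.
print(1 / 4, on_base_pct(4, 1, 0, 0, 1))
```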
While the correlation between batting average and weight is pretty small (-0.06 to 0.06 over the full sample), this does not imply the absence of a relationship between weight and hitting ability more generally. One way to see this is using this helpful decomposition of batting average, courtesy of Jim Albert’s “Baseball with R” blog:
\(BA = \frac{H}{AB} = (1 - \text{SO.Rate})\,(\text{HR.Rate} + (1 - \text{HR.Rate})\,\text{BABIP})\), where:
SO.Rate = SO/AB is the strikeout rate;
HR.Rate = HR/(AB-SO) is the rate of home runs per ball put in play (at-bats that do not end in a strikeout);
BABIP = (H-HR)/(AB-SO-HR) is the batting average on balls in play (excluding home runs).
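The decomposition is an exact identity. The first factor is 1 - SO.Rate = (AB-SO)/AB. The bracketed term collapses to HR/(AB-SO) + (H-HR)/(AB-SO) = H/(AB-SO). Their product is therefore H/AB. The Python sketch below (function name hypothetical) computes the three rates and rebuilds BA to confirm this. Note that this BABIP leaves sacrifice flies out of the denominator, whereas many published BABIP figures add them.

```python
import math


def ba_components(ab, h, hr, so):
    """Split batting average into the three rates above and rebuild it."""
    so_rate = so / ab                  # SO.Rate: strikeouts per at-bat
    hr_rate = hr / (ab - so)           # HR.Rate: home runs per ball put in play
    babip = (h - hr) / (ab - so - hr)  # BABIP: non-HR hits per non-HR ball in play
    rebuilt = (1 - so_rate) * (hr_rate + (1 - hr_rate) * babip)
    return so_rate, hr_rate, babip, rebuilt


# Illustrative line: 150 hits, 30 home runs, 120 strikeouts in 550 at-bats.
so_rate, hr_rate, babip, rebuilt = ba_components(550, 150, 30, 120)
print(rebuilt, 150 / 550, math.isclose(rebuilt, 150 / 550))
```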
In words, this decomposition says that players with high batting averages must have some combination of a low strikeout rate and either a high home run rate or a high batting average on balls in play.
While Corr(Weight, BA) is small, the correlation of weight with the constituent components of batting average can be quite significant, as I show in Figure 6 below. Heavier players hit more home runs but also strike out more. These effects weigh against each other, leading to a low correlation with the aggregate batting average. (There doesn’t seem to be much relationship between weight and BABIP.)
While the scout’s line about hitting one’s weight is one of my favorites from the movie, it is not my absolute favorite. As a Yale alum who studies economics and worked in baseball, my favorite line has to be this one.