Which Advanced Stat Is More Important? RE24 or Non-Contextual Stats Like wOBA or Runs Created?
As you probably know by now, I like baseball stats. I also like trying to solve mysteries. Delving into the stats can be intriguing, because we frequently want to understand which particular statistic is more useful in comparing and analyzing players and teams.
We like advanced stats, but they may or may not include context (such as context-dependent win probability stats). Run expectancy stats are contextual, taking into account the sequencing and timing of hits and outs. However, we usually rely on rate stats which are based on the league average run impacts of hits and outs over a season. Examples include wOBA, wRC+, and wRAA. These are linear weight statistics which place a value on singles, doubles, triples, and home runs based on the average throughout the season. The assumption is that “context” adds more noise than explanation. And the player’s average wOBA or wRAA is rolled into the player’s fWAR or bWAR valuation. However, if we choose to utilize a contextual stat, like Run Expectancy (RE24), the player’s value as measured by WAR might be considerably higher or lower.
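To make the linear weights idea concrete, here is a minimal Python sketch of how a wOBA-style rate and wRAA can be computed. The weights, the league wOBA, and the wOBA scale below are illustrative assumptions (real values are recalculated every season), and the denominator is simplified to plate appearances.

```python
# Illustrative linear weights (assumed values; real weights change each season).
WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.88, "2B": 1.25, "3B": 1.58, "HR": 2.03}


def woba(events, pa):
    """Weighted on-base average: sum of (weight * count) divided by PA.

    Simplification: uses plate appearances as the denominator.
    """
    return sum(WEIGHTS[event] * count for event, count in events.items()) / pa


def wraa(player_woba, league_woba, woba_scale, pa):
    """Runs above average implied by a wOBA, ignoring base-out context."""
    return (player_woba - league_woba) / woba_scale * pa


# Hypothetical season line for a hypothetical hitter.
season = {"BB": 60, "1B": 100, "2B": 30, "3B": 2, "HR": 25}
player_woba = woba(season, pa=600)
# League wOBA of .310 and a scale of 1.25 are assumptions for illustration.
player_wraa = wraa(player_woba, league_woba=0.310, woba_scale=1.25, pa=600)
print(round(player_woba, 3), round(player_wraa, 1))
```

Every single is worth the same here regardless of who was on base or how many outs there were, which is exactly the "average" assumption that a contextual stat drops.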
I like RE24, which measures the value of a hit or out in each of the 24 base-out states. RE24 is measured for both pitchers and hitters, but this article is concerned with offense. Unlike Win Probability Added (WPA), RE24 does not measure timing or leverage (i.e., inning and score), but relies on the base-out situation. Because RE24 is situational and considers factors like productive outs and base running, conceptually it is more informative of skill and repeatability than WPA. There are many base-out situations in which the batter may adjust his approach at the plate. At the team level, RE24 may tell us which teams are better at converting hits and outs into runs. The team’s RE24 may implicate characteristics like lineup construction, as well as individual player performance.
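As a rough illustration of the RE24 bookkeeping, the sketch below credits each play with the change in run expectancy between the starting and ending base-out states, plus any runs that scored. The run expectancy values are made-up placeholders for a handful of the 24 states, not a real league table.

```python
# Hypothetical run expectancy for a few of the 24 base-out states.
# Key: (runners as a string for 1st/2nd/3rd, outs). Values are assumptions.
RUN_EXPECTANCY = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("1--", 1): 0.53,
    ("-2-", 1): 0.70,
    ("---", 2): 0.10,
}


def re24_play(start, end, runs_scored=0):
    """RE24 credit for one play: RE(end) - RE(start) + runs scored.

    Pass end=None when the play records the third out (run expectancy is 0).
    """
    re_end = 0.0 if end is None else RUN_EXPECTANCY[end]
    return re_end - RUN_EXPECTANCY[start] + runs_scored


# Four outcomes that start from comparable situations.
leadoff_single = re24_play(("---", 0), ("1--", 0))
strikeout = re24_play(("1--", 0), ("1--", 1))
advancing_groundout = re24_play(("1--", 0), ("-2-", 1))
double_play = re24_play(("1--", 0), ("---", 2))
print(leadoff_single, strikeout, advancing_groundout, double_play)
```

With these placeholder numbers, the three outs all cost runs, but the groundout that moves the runner costs the least and the double play costs the most. A linear weights stat would treat the strikeout and the advancing groundout identically.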
Productivity of Outs
Fangraphs’ Ben Clemens recently published two articles (here and here) investigating the run expectancy of various kinds of outs in base/out situations. This has some similarity to RE24, but focuses only on a slice of events (namely outs) during each base/out situation. His methodology “calculated the difference in run scoring expectation between the average out and a specific type of out (strikeout, air out, non-GIDP groundout, double play) for each base/out state,” and gives it the fancy name “out advancement runs.” Air outs and non-GIDP groundouts are better than the average out, from the standpoint of productive outs, and strike outs and GIDPs are worse than the average out. GIDPs are much worse, bordering on disastrous.
His methodology views Corbin Carroll and Aaron Judge as polar opposites on this metric. Based on the type of outs they produce in critical situations, Carroll gains almost one win above average and Judge is almost one win worse than average. If their 2024 WAR accounted for this difference, Carroll would increase by more than 1 win above his 4 WAR and Judge’s 11 WAR would decline by more than 1 win. To a significant extent this is based on Judge’s proclivity for GIDPs and Carroll’s ability to avoid the DP. fWAR and bWAR are based on “an out is an out,” but out advancement runs is based on the type of out in various base-out states.
At the team level, the Detroit Tigers are the best on this metric, followed closely by Baltimore and Arizona. Those three teams each gain more than 1 win above their wins that are based on average outs. The Astros were 18th in out advancement runs in 2024, with -3.1 runs below average. (The ranking of teams is here.) It is unclear how much of the difference is randomness vs. repeatable skills and approach.
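The out advancement runs idea discussed in this section can be sketched in a few lines of Python. This is only my reading of the quoted description, not the Fangraphs implementation: for a single base-out state, each out type is compared with the frequency-weighted average out. The run values and league shares are hypothetical numbers chosen for illustration.

```python
# One base-out state only (say, runner on first with no outs).
# Hypothetical change in run expectancy for each out type (assumed values).
OUT_VALUE = {
    "strikeout": -0.40,
    "air_out": -0.36,
    "groundout": -0.25,   # non-GIDP groundout
    "double_play": -0.80,
}
# Hypothetical league share of each out type in this state (sums to 1).
OUT_SHARE = {
    "strikeout": 0.35,
    "air_out": 0.30,
    "groundout": 0.25,
    "double_play": 0.10,
}


def average_out_value():
    """Run value of the frequency-weighted 'average out' in this state."""
    return sum(OUT_VALUE[kind] * OUT_SHARE[kind] for kind in OUT_VALUE)


def out_advancement_runs(out_counts):
    """Runs gained or lost relative to making the same number of average outs."""
    avg = average_out_value()
    return sum(n * (OUT_VALUE[kind] - avg) for kind, n in out_counts.items())


# Two hypothetical hitters with 15 outs each in this state.
dp_prone = out_advancement_runs({"double_play": 5, "strikeout": 10})
contact_hitter = out_advancement_runs({"groundout": 10, "air_out": 5})
print(round(dp_prone, 2), round(contact_hitter, 2))
```

A full version would repeat this for every base-out state and sum the results. Even in this toy version, a handful of double plays dominates the total.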
RE24 vs. wRAA
RE24 is a more comprehensive evaluation of run expectancy because it includes walks, hits, and home runs during each base-out state, as well as outs. I made a comparison of wRAA (runs above average) and RE24. This should be somewhat comparable to the ranking above (except including more than just outs). The difference between wRAA and RE24 should tell us whether teams’ actions in the 24 base-out states under- or over-perform the metrics based on average run production. The Astros’ offense as measured by wRAA is ranked 7th, but the ranking declines to 10th based on RE24.
The difference between wRAA and RE24 suggests that the Astros’ “average runs” overstated the runs as measured on a base-out metric. The Astros’ RE24 difference is -10.9 runs or more than 1 win worse than the average linear weights measure for the Astros. The Astros’ differential is ranked 15th. This differential is similar in direction to the -3.9 runs for productive outs, but becomes a larger negative run value when hits and walks for the base-out states are included.
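The arithmetic behind the runs-to-wins statement can be written out as a short Python sketch. The 10 runs per win conversion is a common rule of thumb that I am assuming here, and the individual RE24 and wRAA totals are hypothetical placeholders picked only so that the gap equals the -10.9 runs cited above.

```python
RUNS_PER_WIN = 10.0  # assumption: common rule-of-thumb conversion


def re24_differential(re24, wraa):
    """Contextual runs minus context-neutral runs (negative = underperformed)."""
    return re24 - wraa


def runs_to_wins(runs, runs_per_win=RUNS_PER_WIN):
    """Convert a run total to wins using a flat runs-per-win rate."""
    return runs / runs_per_win


# Hypothetical team totals; only the -10.9 gap comes from the article.
diff = re24_differential(re24=39.1, wraa=50.0)
print(round(diff, 1), round(runs_to_wins(diff), 2))
```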
The comparison of RE24 and wRAA is useful in a descriptive sense. But does it tell us whether over- or under-performing the linear weights stat is repeatable? We don’t have a good answer. For instance, the Fangraphs article mentions that the Tigers and Yankees do not have favorable strikeout factors overall, but that both teams reduce their strikeouts and make more contact in critical situations. Is this randomness or do the teams change their approach in leverage situations? You could make an argument for either side of this debate. (The Astros’ K rate increased in high leverage situations, by the way.) As for RE24 vs. wRAA, my guess is that the causes of the difference fall somewhere in between those views. Perhaps correlation with other variables can illuminate the issue.
Therefore, I examined the R-squared statistic for four other variables which could have some explanatory power regarding the variation in the team differences between wRAA and RE24 in 2024. I compared speed and baserunning with men on base, ISO with men on base, BABIP with men on base, and BB/K with men on base to the differential between wRAA and RE24.
- The R-squared for team speed or base running is approximately .05, which implies that team speed has only a minor explanatory effect. This is consistent with the .075 R-squared relationship between base running skill and productive outs found by the Fangraphs article.
- Power, as measured by ISO, has very little effect (R-squared of .02). Perhaps run expectancy with men on base is more about making contact than extra base power.
- BABIP with men on base has an R-squared of 0.15, suggesting that variation in BABIP may explain 15% of the variation. Since BABIP is notoriously volatile, this may point in the direction of random variation.
- I sometimes use the ratio of walks to strikeouts as a measure of plate discipline. BB/K with men on base has a more notable R-squared of 0.26, which may indicate that plate discipline can explain over 25% of the wRAA under- or over-performance. That makes sense. If a team’s players take a more patient approach with men on base, higher run expectancy could be the result.
This isn’t a definitive analysis—after all, we are examining only one season—but it gives us something to think about.
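For readers who want to try this kind of check themselves, here is a minimal Python sketch of the R-squared calculation for a single explanatory variable (the squared Pearson correlation). The six team values are invented for illustration and are not the 2024 data used above.

```python
def r_squared(x, y):
    """Share of the variance in y explained by a straight-line fit on x.

    For one explanatory variable this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical teams: BB/K with men on base vs. RE24-minus-wRAA differential.
bb_k_men_on = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
differential = [-12.0, 5.0, -3.0, 8.0, -1.0, 10.0]
print(round(r_squared(bb_k_men_on, differential), 2))
```

An R-squared near 0 means the variable tells us almost nothing about the differential, and a value near 1 means it explains nearly all of it. With only 30 teams and one season, even a value like 0.26 carries a wide margin of uncertainty.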
A Look at Individual Astros’ Players
Double plays are the worst kind of out. Yainer Diaz had the most DPs (22) of Astros’ hitters. However, a ratio based on plate appearances with men on base is a more appropriate comparison, since GIDPs depend on the opportunities to hit into a DP.
- A ranking of notable higher than average GIDP percentages: (1) Dezenzo .09; (2) Diaz .08; (3) Meyers .07; (4) Altuve .066. A ranking of notable below average GIDP percentages: (1) Singleton .01; (2) Tucker .03; (3) Dubon .03; (4) Whitcomb .05.
As discussed above, the BB/K ratio with runners on base may explain why some teams have better run expectancy in critical situations.
- A ranking of notable higher than average BB/K ratios with runners on base (per PA): (1) Alvarez 1.54; (2) Tucker 1.36; (3) Singleton 0.54; (4) Bregman 0.45; (5) Altuve 0.44. Notable worse than average BB/K ratios per PA with runners on base: (1) Dezenzo .07; (2) Dubon .16; (3) McCormick .18; (4) Diaz .24.
- I didn’t include Jose Abreu on these lists, but if I had, he would have been the second worst for GIDP percentage and the worst at BB/K ratio with men on base. There may be a reason why the Astros’ run scoring picked up after Abreu left the team.
- Both Whitcomb and Dezenzo are small sample size players, but I included them on the lists because it is possible that one or both young players will play some LF in 2025. Given the sample size, maybe it doesn’t tell us much about 2025. But it would appear that Dezenzo may need to work on his plate approach in critical situations, since he is worst in both GIDP percentage and BB/K ratio with men on base.
- Not surprisingly, the Astros will miss Kyle Tucker’s superior approach with men on base. Hopefully Christian Walker and Isaac Paredes can make up for it.
This is a lot to digest…and that’s all for today.