Random Thoughts from a Nonlinear Mind

The Evolution of the Baseball Player: Discovering Offensive Archetypes, 1954-2025

Baseball has changed enormously since the middle of the twentieth century.

Strikeouts have increased. Home-run rates have risen and fallen. Stolen bases have moved in and out of fashion. Relief pitching has become more specialized. Defensive positions have acquired different offensive expectations.

It is tempting, therefore, to divide baseball history into a sequence of distinct player types. The contact hitter belongs to one era. The base stealer belongs to another. The modern game belongs to the power hitter who walks frequently and strikes out even more frequently.

But is that really what happened?

Did one type of hitter replace another, or did the same basic offensive archetypes persist across changing statistical environments?

To explore that question, I used the Lahman baseball database to examine 9,218 qualified non-pitcher seasons from 1954 through 2025. I standardized each player against other qualified hitters in his own season, reduced the statistical profiles using principal component analysis, and then used clustering to identify recurring offensive archetypes.

The results reveal six recognizable types of offensive players:

Patient-contact hitters
Low-impact contact hitters
Elite power-and-patience hitters
Power-and-strikeout hitters
Aggressive free-swingers
Speed-and-contact hitters

Perhaps most importantly, the results suggest that baseball’s statistical environment has changed much more dramatically than its underlying distribution of player types.

The modern hitter may look different in the raw statistics. Relative to his contemporaries, however, he often occupies a role that has existed for generations.

Building the Historical Sample

I began with five Lahman tables:

Batting
People
Appearances
Fielding
Teams

Players who appeared for multiple teams during one season were combined into a single player-season record. I used the appearances data to assign each player a primary defensive position, defined as the position at which he appeared most frequently during that season.

Pitchers were excluded.

To account for seasons of different lengths, I defined a qualified season using the familiar standard of 3.1 plate appearances per scheduled team game.

Because teams occasionally played slightly different numbers of games, I used the median number of team games during each season:

PA_{i,y} \geq 3.1 \widetilde{G}_{y}

where:

\widetilde{G}_{y} = \operatorname{median}\left(G_y\right)

Here, $\widetilde{G}_{y}$ represents the median number of team games played in season $y$.

This method adjusts the qualification threshold for 154-game seasons, 162-game seasons, strike-shortened seasons, and the 60-game 2020 season.
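The threshold is simple to express in code. The following is a minimal sketch, not the script used for the study; the function names and the example game totals are hypothetical.

```python
from statistics import median

PA_PER_GAME = 3.1  # qualification standard described above


def qualification_threshold(team_games):
    """Minimum plate appearances for a qualified season.

    team_games: games played by each team in a single season.
    """
    return PA_PER_GAME * median(team_games)


def is_qualified(plate_appearances, team_games):
    """True when a player-season meets the PA threshold."""
    return plate_appearances >= qualification_threshold(team_games)


# Hypothetical seasons (illustrative game totals, not Lahman data).
full_season = [162] * 29 + [161]   # one team fell a game short
short_season = [60] * 30           # a 60-game schedule

print(round(qualification_threshold(full_season), 1))   # about 502 PA
print(round(qualification_threshold(short_season), 1))  # about 186 PA
```

Because the median is used, a single team with a missing game does not move the threshold for everyone else.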

The study begins in 1954 because the variables needed for the full model are not consistently complete before that date. Strikeouts become sufficiently complete before then, but caught stealing and sacrifice flies create additional limitations. Beginning in 1954 allows the same five-variable model to be used across the entire study period.

Figure 1. The number of qualified player-seasons generally increased as Major League Baseball expanded.

The increase in qualified seasons primarily reflects league expansion. The early portion of the study contains fewer than 100 qualified hitters in many seasons. By the late 1990s and early 2000s, the total frequently exceeded 150.

Measuring an Offensive Profile

I wanted to describe how a hitter produced offense, not merely how much offense he produced.

I therefore selected five variables:

On-base percentage
Isolated power
Walks per plate appearance
Strikeouts per plate appearance
Net stolen bases per plate appearance

Plate appearances were calculated as:

PA = AB + BB + HBP + SF + SH

On-base percentage was calculated as:

\mathrm{OBP} = \frac{ H + BB + HBP }{ AB + BB + HBP + SF }

Isolated power measures extra-base power beyond batting average:

\mathrm{ISO} = \mathrm{SLG} - \mathrm{AVG}

The walk and strikeout rates were:

\mathrm{BB/PA} = \frac{BB}{PA}

\mathrm{SO/PA} = \frac{SO}{PA}

Finally, I defined net stolen-base production as:

\mathrm{NetSB/PA} = \frac{ SB - CS }{ PA }

This final measure rewards successful steals while penalizing caught-stealing events.
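Taken together, the five measures can be computed from a single line of counting statistics. The sketch below is illustrative only: the function name, argument names, and the sample stat line are hypothetical, and it uses the identity SLG - AVG = (TB - H) / AB to obtain ISO directly.

```python
def offensive_profile(ab, h, doubles, triples, hr, bb, so, hbp, sf, sh, sb, cs):
    """Return the five rate measures for one player-season."""
    pa = ab + bb + hbp + sf + sh
    # SLG - AVG = (TB - H) / AB, which is extra bases per at-bat.
    extra_bases = doubles + 2 * triples + 3 * hr
    return {
        "OBP": (h + bb + hbp) / (ab + bb + hbp + sf),
        "ISO": extra_bases / ab,
        "BB/PA": bb / pa,
        "SO/PA": so / pa,
        "NetSB/PA": (sb - cs) / pa,
    }


# Hypothetical stat line, for illustration only.
profile = offensive_profile(ab=500, h=150, doubles=30, triples=5, hr=20,
                            bb=60, so=100, hbp=5, sf=5, sh=0, sb=20, cs=5)
for name, value in profile.items():
    print(f"{name}: {value:.3f}")
```

Note that OBP and the per-PA rates use slightly different denominators, because sacrifice hits count toward plate appearances but not toward the OBP denominator.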

I did not include runs or RBI in the clustering model. Both statistics are strongly affected by batting order, teammates, and opportunity. They tell us something about the results of a player’s season, but less about the underlying style with which he produced those results.

Home runs were also not included as a separate rate because ISO already captures power production. Adding both ISO and home runs per plate appearance would have given power disproportionate weight in the clustering.

Baseball’s Changing Offensive Environment

The raw statistics immediately demonstrate how much the offensive environment changed.

Figure 2. Mean offensive rates among qualified hitters, 1954-2025.

In 1954, the average qualified hitter in the sample had approximately:

\mathrm{OBP} = 0.355

\mathrm{ISO} = 0.151

\mathrm{SO/PA} = 0.088

By 2025, the corresponding values were approximately:

\mathrm{OBP} = 0.330

\mathrm{ISO} = 0.178

\mathrm{SO/PA} = 0.204

The average qualified hitter’s strikeout rate more than doubled.

Power generally increased, particularly during the offensive surge of the late 1990s and again during the home-run-heavy seasons of the late 2010s. Walk rates changed much less. On-base percentage remained within a fairly narrow historical range, although it rose noticeably around 2000 before declining.

Stolen-base production followed a different pattern. It increased during the 1970s and 1980s, declined during the power-oriented environment that followed, and has recently begun to rise again.

These changes create a serious comparison problem. A 20 percent strikeout rate would have been extraordinary during much of the twentieth century. In the modern game, it may be close to ordinary.

A player cannot be classified historically based solely on raw statistics.

Adjusting Every Player for His Era

To make the seasons comparable, I standardized each variable within its own season.

For player $i$, season $y$, and offensive measure $m$:

z_{i,y,m} = \frac{ x_{i,y,m} - \overline{x}_{y,m} }{ s_{y,m} }

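Here, $\overline{x}_{y,m}$ and $s_{y,m}$ are the mean and standard deviation of measure $m$ among the qualified hitters of season $y$.

A value of $z = 0$ indicates that the player was equal to the seasonal average.

A value of $z = 1$ indicates that he was one standard deviation above the seasonal average.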

This adjustment changes the meaning of the analysis. I am not asking whether a hitter had a high strikeout rate in absolute terms. I am asking whether he struck out frequently compared with the other qualified hitters of his own season.

A player from 1965 and a player from 2025 can therefore belong to the same archetype even though their raw statistics differ substantially.

They occupied the same relative position within their respective baseball environments.
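A small sketch of the within-season standardization, under stated assumptions: the row layout and function name are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the choice between the sample and population forms is not specified here. The strikeout rates are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_season(rows, measure):
    """Return one z-score per row, computed against the row's own season."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row[measure])
    # Seasonal mean and sample standard deviation (assumed form).
    stats = {season: (mean(values), stdev(values))
             for season, values in by_season.items()}
    z_scores = []
    for row in rows:
        season_mean, season_sd = stats[row["season"]]
        z_scores.append((row[measure] - season_mean) / season_sd)
    return z_scores


# Made-up strikeout rates for two seasons with very different baselines.
rows = [
    {"season": 1965, "so_pa": 0.08},
    {"season": 1965, "so_pa": 0.12},
    {"season": 1965, "so_pa": 0.16},
    {"season": 2025, "so_pa": 0.18},
    {"season": 2025, "so_pa": 0.22},
    {"season": 2025, "so_pa": 0.26},
]
z = standardize_within_season(rows, "so_pa")
print([round(value, 2) for value in z])
```

In this toy data, a 16 percent strikeout rate in 1965 and a 26 percent rate in 2025 receive the same standardized score, which is the sense in which two hitters can occupy the same relative position.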

Reducing the Offensive Dimensions

The five standardized variables remain related to one another. High-OBP players often walk frequently. Power hitters may also strike out frequently. Speed-oriented players tend to have different power and contact profiles.

I used principal-component analysis to summarize these relationships.

The first principal component explained 42.5% of the total variation.

The second explained 25.0%. Together, the first two components explained 42.5% + 25.0% = 67.5%.

The first three components explained approximately 86.2% of the total variation.

The first component primarily represents overall offensive force. OBP, ISO, and walk rate all load positively on this dimension. Players far to the right of the PCA plot tend to reach base, hit for power, and draw walks at rates well above their seasonal environments.

The second component separates high-power, high-strikeout hitters from lower-strikeout players with stronger contact or speed characteristics.
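Each explained-variation percentage is a component's eigenvalue divided by the sum of all eigenvalues. The sketch below illustrates that calculation for the first component on a two-variable toy data set, using power iteration so that it needs only the standard library. It is not the procedure used for the study (a library PCA routine would be the usual choice), and the function names and data are hypothetical.

```python
from math import sqrt


def correlation_matrix(columns):
    """Correlation matrix for a list of equal-length variable columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        m = sum(col) / n
        sd = sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        standardized.append([(x - m) / sd for x in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1)
             for cj in standardized] for ci in standardized]


def leading_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix."""
    p = len(matrix)
    v = [float(i + 1) for i in range(p)]  # arbitrary, deliberately uneven start
    value = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        value = sqrt(sum(x * x for x in w))  # converges to the top eigenvalue
        v = [x / value for x in w]
    return value


def first_component_share(columns):
    """Share of total variation carried by the first principal component."""
    corr = correlation_matrix(columns)
    total = sum(corr[i][i] for i in range(len(corr)))  # trace = number of variables
    return leading_eigenvalue(corr) / total


# Toy data: two variables with a correlation of 0.8.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
share = first_component_share([x, y])
print(round(share, 3))
```

With two variables correlated at 0.8, the eigenvalues are 1.8 and 0.2, so the first component carries 90 percent of the variation. The five-variable case works the same way, with the eigenvalues summing to roughly five.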

How Many Archetypes Are There?

Clustering always requires a choice about how much detail to preserve.

A model with too few clusters combines meaningfully different players. A model with too many clusters produces distinctions that may be statistically fragile or difficult to interpret.

The k-means procedure attempts to minimize the total squared distance between each player and the center of his assigned cluster:

\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{i \in C_k} \left\lVert \mathbf{z}_i - \boldsymbol{\mu}_k \right\rVert^2

where:

C_k=\mathrm{cluster}\ k

\mathbf{z}_i = \mathrm{standardized\ profile\ of\ player\!-\!season}\ i

\boldsymbol{\mu}_{k} = \mathrm{center\ of\ cluster}\ k
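To make the objective concrete, here is a minimal k-means sketch in plain Python (Lloyd's algorithm with fixed starting centers). It illustrates the WCSS calculation on toy two-dimensional points; it is not the implementation used for the study, which would normally rely on a library routine with several random starts. All names and data are hypothetical.

```python
from math import dist


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points]


def update(points, labels, centers):
    """Move each center to the mean of its assigned points."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:  # keep an empty cluster's center in place
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(axis) / len(members)
                                 for axis in zip(*members)))
    return new_centers


def wcss(points, labels, centers):
    """Total squared distance from each point to its assigned center."""
    return sum(dist(p, centers[k]) ** 2 for p, k in zip(points, labels))


def kmeans(points, centers, iterations=20):
    """Alternate assignment and update steps a fixed number of times."""
    for _ in range(iterations):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
    return assign(points, centers), centers


# Toy data: two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels, centers = kmeans(points, centers=[(0, 0), (5, 5)])
print(labels, round(wcss(points, labels, centers), 3))
```

Each assignment and update step can only lower or hold the WCSS, which is why the procedure settles on a local minimum rather than a guaranteed global one.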

I tested solutions containing four through nine clusters.

Figure 3. The silhouette score is highest for the four-cluster model, although the six-cluster model retains useful baseball distinctions.

The silhouette statistic compares $a(i)$, the average distance between a player-season and the other members of its own cluster, with $b(i)$, the average distance between that player-season and the members of the nearest alternative cluster:

s(i) = \frac{ b(i)-a(i) }{ \max\left\{a(i),b(i)\right\} }
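As a hedged illustration of the silhouette calculation for a single observation, the sketch below takes a(i) as the mean distance to the other members of the observation's own cluster and b(i) as the mean distance to the nearest other cluster. The names and toy data are hypothetical, it assumes at least two clusters, and returning 0 for a one-member cluster is a common convention adopted here as an assumption.

```python
from math import dist


def silhouette(i, points, labels):
    """Silhouette value s(i) for the observation at index i."""
    own = labels[i]
    distances = {}  # cluster label -> distances from point i to its members
    for j, (p, label) in enumerate(zip(points, labels)):
        if j != i:
            distances.setdefault(label, []).append(dist(points[i], p))
    if own not in distances:
        return 0.0  # point i is alone in its cluster (assumed convention)
    a = sum(distances[own]) / len(distances[own])
    b = min(sum(d) / len(d)
            for label, d in distances.items() if label != own)
    return (b - a) / max(a, b)


# Toy data: two tight pairs that sit far apart.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
labels = [0, 0, 1, 1]
scores = [silhouette(i, points, labels) for i in range(len(points))]
print([round(s, 3) for s in scores])
```

Values near 1 indicate an observation that sits well inside its own cluster, while values near 0 indicate one close to a boundary. The overall score for a solution is the average over all observations.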

The four-cluster solution produced the strongest formal separation. However, it merged several historically meaningful offensive styles.

In particular, it tended to combine elite power-and-patience hitters with less complete power hitters, and it reduced distinctions between contact-oriented players.

I therefore selected the six-cluster solution.

This is an interpretive decision. The six clusters are not six perfectly isolated biological species. Baseball players exist along continuous statistical dimensions. The purpose of the clusters is to provide a useful map of that continuum.

Figure 4. The 9,218 player-seasons displayed in the space formed by the first two principal components.

The overlap in Figure 4 is important. The clusters are recognizable, but their boundaries are not absolute. A player near a boundary may resemble members of two neighboring archetypes.

The Six Offensive Archetypes

Figure 5 shows the standardized center of each cluster.

Figure 5. Each value represents the number of standard deviations above or below the seasonal mean.

1. Patient-Contact Hitters

Player-seasons: 2,190
Share of sample: 23.8 percent

The patient-contact group combines:

Above-average OBP
Above-average walk rates
Low strikeout rates
Approximately average power
Below-average stolen-base production

Its cluster center has an OBP score of:

z_{\mathrm{OBP}} = +0.55

and a strikeout-rate score of:

z_{\mathrm{SO/PA}} = -0.50

These hitters generally controlled the strike zone and put the ball in play. They were not necessarily powerless, but power was not the defining feature of the group.

The most statistically central examples include:

Justin Turner, 2022
Eric Hosmer, 2015
Edgardo Alfonzo, 1999

These are not necessarily the greatest seasons in the cluster. They are the seasons located closest to its statistical center.
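Finding the most central seasons amounts to sorting by distance from the cluster center. The following is a minimal sketch, assuming Euclidean distance in the standardized space (the distance measure is not stated above) and using made-up season labels and only two of the five variables for brevity.

```python
from math import dist


def most_central(seasons, center, n=3):
    """Return the n (label, z_vector) pairs closest to the cluster center."""
    return sorted(seasons, key=lambda season: dist(season[1], center))[:n]


# OBP and SO/PA scores of the patient-contact center, as reported above.
center = (0.55, -0.50)

# Hypothetical seasons, for illustration only.
seasons = [
    ("Season A", (0.60, -0.45)),
    ("Season B", (2.10, -1.80)),  # a stronger season, but far from the center
    ("Season C", (0.50, -0.60)),
    ("Season D", (0.10, 0.20)),
]
closest = [label for label, _ in most_central(seasons, center, n=2)]
print(closest)
```

Season B illustrates the point made above: an outstanding season can belong to the cluster without being anywhere near its center.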

2. Low-Impact Contact Hitters

Player-seasons: 1,925
Share of sample: 20.9 percent

This group also struck out infrequently, but without the OBP, walks, or power of the patient-contact cluster.

Its center was:

z_{\mathrm{OBP}} = -0.74

z_{\mathrm{ISO}} = -0.94

z_{\mathrm{BB/PA}} = -0.81

z_{\mathrm{SO/PA}} = -0.84

These hitters made contact, but much of that contact produced limited offensive value. Their low strikeout rates should not automatically be interpreted as evidence of superior hitting.

This distinction is important. Avoiding strikeouts is valuable only when the resulting balls in play produce enough hits, power, or advancement to compensate for the lost walks and extra-base production.

Central examples include:

Danny Bautista, 2004
Marlon Anderson, 2002
Melky Cabrera, 2007

3. Elite Power-and-Patience Hitters

Player-seasons: 851
Share of sample: 9.2 percent

This is the smallest cluster and the most offensively dominant.

Its center was approximately:

z_{\mathrm{OBP}} = +1.74

z_{\mathrm{ISO}} = +1.25

z_{\mathrm{BB/PA}} = +1.74

The cluster’s strikeout rate was almost exactly average relative to each season:

z_{\mathrm{SO/PA}} = +0.05

These hitters combined elite on-base ability with elite power and patience. Unlike the power-and-strikeout group, they did not require an exceptionally high strikeout rate to produce their power.

Central examples include:

Al Kaline, 1966
Kris Bryant, 2017
Ben Zobrist, 2009

The presence of players from widely separated eras is exactly what the season adjustment was designed to reveal. Their raw strikeout totals and league environments differed, but their relative offensive structures were similar.

4. Power-and-Strikeout Hitters

Player-seasons: 1,363
Share of sample: 14.8 percent

This group most closely resembles the familiar three-true-outcomes hitter.

Its defining characteristics were:

z_{\mathrm{ISO}} = +1.00

z_{\mathrm{BB/PA}} = +0.62

z_{\mathrm{SO/PA}} = +1.25

The group produced power and drew walks, but also struck out much more frequently than its seasonal peers.

Its OBP remained modestly above average:

z_{\mathrm{OBP}} = +0.25

Central examples include:

Jack Clark, 1979
Andruw Jones, 2002
Dale Murphy, 1986

This cluster existed long before the recent explosion in league-wide strikeouts. The modern environment made the raw statistical profile more common, but the relative archetype was already present.

5. Aggressive Free-Swingers

Player-seasons: 1,946
Share of sample: 21.1 percent

The aggressive free-swinging group had:

Below-average OBP
Below-average walk rates
Above-average strikeout rates
Approximately average power
Little baserunning contribution

Its center included:

z_{\mathrm{OBP}} = -0.83

z_{\mathrm{BB/PA}} = -0.66

z_{\mathrm{SO/PA}} = +0.66

Power was only slightly above the seasonal average:

z_{\mathrm{ISO}} = +0.06

This is an important contrast with the power-and-strikeout cluster. Both groups struck out frequently, but the aggressive free-swingers did not receive the same compensating power, walks, or OBP.

Central examples include:

Ollie Brown, 1969
Ryan Ludwick, 2009
Matt Williams, 1998

6. Speed-and-Contact Hitters

Player-seasons: 943
Share of sample: 10.2 percent

This was the most specialized cluster.

Its net stolen-base score was:

z_{\mathrm{NetSB/PA}} = +2.20

No other cluster approached that level.

The group also had:

z_{\mathrm{SO/PA}} = -0.32

and:

z_{\mathrm{ISO}} = -0.69

These players produced value through speed, contact, and mobility rather than power.

Central examples include:

Delino DeShields, 2000
Stan Javier, 1995
Marquis Grissom, 1994

Their average OBP was almost exactly equal to the seasonal mean. The cluster was not defined by superior hitting in the narrow sense. A distinctive combination of speed, contact, and limited power defined it.

Did the Archetypes Change Over Time?

This was the study’s central question.

The raw offensive environment changed dramatically. Figure 2 shows a major increase in strikeout rates, higher power levels, and shifting patterns of stolen bases.

The relative distribution of player archetypes was surprisingly stable.

Figure 6. Five-year moving percentages of qualified player-seasons assigned to each cluster.
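A moving share of this kind can be sketched as follows. Two details are assumptions rather than facts about Figure 6: the window here is trailing (the current season plus the four before it), and all player-seasons in the window are pooled before dividing. The function name and sample data are hypothetical.

```python
from collections import Counter


def moving_cluster_share(assignments, cluster, window=5):
    """Trailing-window share of player-seasons assigned to one cluster.

    assignments: iterable of (season, cluster_label) pairs.
    Returns {season: share}, pooling every player-season in the window.
    """
    assignments = list(assignments)
    totals = Counter(season for season, _ in assignments)
    hits = Counter(season for season, label in assignments if label == cluster)
    shares = {}
    for season in sorted(totals):
        years = range(season - window + 1, season + 1)
        pooled = sum(totals[y] for y in years)  # missing years count as 0
        shares[season] = sum(hits[y] for y in years) / pooled
    return shares


# Made-up assignments with a two-season window to keep the example small.
assignments = [
    (2001, "A"), (2001, "B"),
    (2002, "A"), (2002, "A"),
    (2003, "B"), (2003, "B"),
]
shares = moving_cluster_share(assignments, cluster="A", window=2)
print(shares)
```

Pooling weights each season by its number of qualified hitters. Averaging the yearly percentages instead would give the smaller early seasons the same weight as the larger later ones.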

Patient-contact hitters generally represented between approximately 20 and 28 percent of qualified seasons.

Low-impact contact hitters reached their highest levels during the 1960s and 1970s, then gradually declined. Their five-year moving share peaked near 26 percent in the mid-1970s and stood near 19 percent by 2025.

Power-and-strikeout hitters became somewhat more prevalent, reaching approximately 18 percent around 2012. Yet they never displaced the other archetypes.

Speed-and-contact hitters remained surprisingly consistent. Their moving share generally remained between approximately 8 and 12 percent, even though league-wide stolen-base environments changed considerably.

The elite power-and-patience group remained rare throughout the entire study. It accounted for approximately 8 to 11 percent of qualified seasons in most periods.

The 2020s did not produce a completely new distribution of player types. Compared with the partial 1950s sample, the modern distribution contains somewhat fewer low-impact contact hitters and a modestly larger share of power-and-strikeout hitters.

The basic architecture, however, remains recognizable.

This suggests that baseball evolution has operated on at least two levels.

At the first level, the statistical baseline changes. Strikeouts become more common. Power becomes more valuable. Stolen-base strategies change.

At the second level, players continue to occupy recurring roles relative to that baseline. Every era still contains:

Patient hitters
Free swingers
Power hitters
Speed specialists
Low-impact contact hitters
Rare players who combine several elite skills

The numbers change. The ecological niches persist.

Offensive Archetypes and Defensive Positions

The clusters were also closely connected to defensive position.

Figure 7. Primary defensive positions represented within each offensive archetype.

The elite power-and-patience cluster was concentrated at traditional offensive positions:

28.2 percent first basemen
17.7 percent right fielders
14.1 percent left fielders
12.5 percent third basemen

Shortstops represented only 2.1 percent of the cluster.

The power-and-strikeout group followed a similar pattern. First base, right field, and left field accounted for a large portion of those seasons.

The speed-and-contact cluster looked completely different:

32.1 percent center fielders
20.0 percent shortstops
19.7 percent second basemen
15.0 percent left fielders

Catchers and first basemen were almost absent.

The low-impact contact group was strongly concentrated in the middle infield:

27.9 percent shortstops
23.9 percent second basemen

This pattern reflects the interaction between offense and defensive value. A shortstop could remain in a lineup with limited power because his defensive position carried different offensive expectations. A first baseman generally needed much greater offensive production.

The archetypes are therefore not purely hitting categories. They also reflect the way teams distribute offensive and defensive responsibilities across the field.
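The positional percentages reported in this section are simple within-cluster shares. A minimal sketch of the tabulation, with a hypothetical function name and made-up rows:

```python
from collections import Counter


def position_shares(rows, cluster):
    """Percentage of a cluster's player-seasons at each primary position.

    rows: iterable of (cluster_label, primary_position) pairs.
    """
    positions = [position for label, position in rows if label == cluster]
    counts = Counter(positions)
    return {position: 100 * count / len(positions)
            for position, count in counts.most_common()}


# Made-up rows, for illustration only.
rows = [
    ("speed", "CF"), ("speed", "CF"), ("speed", "SS"), ("speed", "2B"),
    ("power", "1B"), ("power", "RF"),
]
speed_shares = position_shares(rows, "speed")
print(speed_shares)
```

These shares describe the makeup of each cluster. The reverse question, what share of all shortstops fall into a given cluster, would divide by the position total instead.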

What the PCA Map Really Shows

The PCA figure is not simply a picture of six boxes.

Instead, it shows a continuous offensive landscape.

The elite power-and-patience hitters occupy the high end of the first principal component because they combine OBP, power, and walks.

The power-and-strikeout hitters move upward on the second component because of their combination of ISO and strikeout rate.

The speed-and-contact hitters move in the opposite direction because their offensive identities are dominated by baserunning and lower power.

The patient-contact and low-impact contact groups overlap along the contact dimension, but separate sharply through OBP and walk rate.

This provides a useful reminder about classification. A player’s archetype is not his complete identity. It is a summary of how his season relates to thousands of other seasons in baseball history.

Some players sit close to a cluster center. Others occupy transitional areas between types.

Limitations

This study has several important limitations.

First, the model uses traditional Lahman statistics. It does not include park-adjusted measures such as wRC+, nor does it include Statcast measures such as exit velocity, barrel rate, launch angle, or sprint speed.

Second, the model treats each player-season as an independent observation. A player who qualified in 15 seasons appears 15 times. This is appropriate for studying the distribution of seasonal styles, but it gives durable players more influence than short-career players.

Third, the cluster labels are interpretations. The algorithm identifies groups of seasons that are statistically similar. It does not name those groups.

Fourth, the sample includes only qualified hitters. Part-time players, platoon specialists, defensive replacements, and many late-career seasons are excluded.

Fifth, seasonal standardization intentionally removes changes in the league-wide baseline. This is the correct approach for identifying relative archetypes, but it means the clusters should not be interpreted as absolute comparisons of offensive production.

A power-and-strikeout hitter from 1960 did not necessarily strike out as frequently as a member of the same cluster in 2025. He struck out frequently relative to the hitters around him.

Conclusion

I began this study expecting to find a succession of offensive types.

I expected contact hitters to dominate the early years, speed players to expand during the 1970s and 1980s, and power-and-strikeout hitters to overwhelm the modern period.

Some of those movements are visible, but the broader result is more interesting.

Baseball’s offensive environment changed enormously. Its fundamental player archetypes changed much less.

The low-strikeout hitter did not disappear. His raw strikeout rate simply rose with the league.

The power-and-strikeout hitter did not suddenly appear in the twenty-first century. Earlier versions existed within lower-strikeout environments.

The speed-and-contact player did not vanish during the power era. His share remained more stable than the raw stolen-base totals might suggest.

Perhaps the evolution of the baseball player is not a story of one species replacing another.

Perhaps it is a story of persistent roles adapting to a changing environment.

The game changes its equilibrium. The players reorganize themselves around it. Yet the same broad offensive strategies continue to reappear, season after season and generation after generation.

That continuity may be one of the most striking features of baseball history.

The Geography of Power: Home Runs by Primary Position

Introduction

Home runs are not evenly distributed across the diamond.

That is obvious at one level. First basemen and corner outfielders have historically been expected to hit for power. Middle infielders and catchers have often been evaluated based on a broader mix of defense, contact, arm strength, range, game-calling, and positional scarcity. Designated hitters exist almost entirely because of the bat.

But the obvious pattern is still worth measuring.

This chapter asks a simple question:

How have home runs historically been distributed by primary position?

To answer that, I built a player-season dataset from the Lahman-style batting and appearance files already used in this project. Each player-season was assigned to the position where the player appeared most often. Then I compared home run totals across primary positions.

The main version of the study uses regular player-seasons:

PA \geq 300

That cutoff matters because the full set of player-seasons includes bench players, call-ups, pitchers, defensive replacements, and partial seasons. Those observations are part of baseball history, but they compress the distribution heavily toward zero. The 300-plate-appearance version gives a clearer view of regular players.

The result is consistent with baseball intuition, but the details are interesting.

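Among regular player seasons, the highest median home run totals come from designated hitters (18), first basemen (12), right fielders (11), and left fielders (10).

The lower median positions are shortstop (4), second base (5), and catcher and center field (7 each).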

Third base sits where we might expect it to sit: not quite a pure slugger position like first base, right field, left field, or designated hitter, but clearly more power-oriented than second base or shortstop.

That makes third base a bridge position. It carries defensive responsibility, but it has also historically demanded more power than the middle infield.

Data and Method

The unit of analysis is the player-season.

For each player-season, I summed the player’s batting record across all stints. Home runs came from the batting file:

HR_{i,y} = \sum_{t} HR_{i,y,t}

Where:

i = \text{player}

y = \text{season}

t = \text{team or stint}

Plate appearances were estimated from the available batting columns as:

PA = AB + BB + HBP + SF + SH

This is the same practical plate-appearance construction used elsewhere in the project.
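The stint aggregation and the plate-appearance estimate can be sketched together. This is an illustration under stated assumptions, not the project's code: the rows are plain dictionaries keyed by Lahman-style column names, blank values are treated as zero, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict

COUNTING_COLUMNS = ("AB", "BB", "HBP", "SF", "SH", "HR")


def combine_stints(stint_rows):
    """Sum each player's stints into one record per (player, season)."""
    totals = defaultdict(lambda: dict.fromkeys(COUNTING_COLUMNS, 0))
    for row in stint_rows:
        key = (row["playerID"], row["yearID"])
        for column in COUNTING_COLUMNS:
            # Blank or missing values count as 0 (an assumption).
            totals[key][column] += row.get(column) or 0
    seasons = {}
    for key, t in totals.items():
        pa = t["AB"] + t["BB"] + t["HBP"] + t["SF"] + t["SH"]
        seasons[key] = {"HR": t["HR"], "PA": pa}
    return seasons


# Made-up player who changed teams in mid-season.
stints = [
    {"playerID": "player01", "yearID": 2000, "stint": 1,
     "AB": 200, "BB": 20, "HBP": 2, "SF": 1, "SH": 0, "HR": 8},
    {"playerID": "player01", "yearID": 2000, "stint": 2,
     "AB": 150, "BB": 15, "HBP": None, "SF": 2, "SH": 1, "HR": 6},
]
season = combine_stints(stints)[("player01", 2000)]
print(season, season["PA"] >= 300)
```

Summing before applying the cutoff matters: neither stint in this example reaches 300 plate appearances on its own, but the combined season does.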

Each player-season was then assigned a primary position using the appearances file. The primary position was the position at which the player appeared in the most games:

\mathrm{PrimaryPosition}_{i,y} = \operatorname*{arg\,max}_{p} \left( G_{i,y,p} \right)

Where:

G_{i,y,p} = \text{games played by player } i \text{ at position } p \text{ in season } y
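The arg-max rule translates directly into code. In this sketch the games-by-position column names follow the Lahman Appearances layout as I understand it, the function name is hypothetical, and ties go to whichever position is listed first, which is an assumption because no tie-breaking rule is stated above.

```python
# Position label -> games column (assumed Lahman-style names).
POSITION_COLUMNS = {
    "P": "G_p", "C": "G_c", "1B": "G_1b", "2B": "G_2b", "3B": "G_3b",
    "SS": "G_ss", "LF": "G_lf", "CF": "G_cf", "RF": "G_rf", "DH": "G_dh",
}


def primary_position(appearances):
    """Position with the most games in one player-season record."""
    games = {position: appearances.get(column) or 0
             for position, column in POSITION_COLUMNS.items()}
    # max() returns the first position listed when totals are tied.
    return max(games, key=games.get)


# Made-up season: mostly third base, with some shortstop and DH.
example = {"G_3b": 120, "G_ss": 30, "G_dh": 5}
print(primary_position(example))
```

For a player who changed teams, the games columns would need to be summed across teams before applying the rule. A record with no games at any position would also need separate handling, since this sketch would return the first position listed.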

The positions included were C, 1B, 2B, 3B, SS, LF, CF, RF, DH, and P.

For the main batting-position figures, I excluded pitchers because pitcher seasons have a very different distribution and can compress the plot.

The main regular-player sample uses:

PA \geq 300

This produced the following number of regular player-seasons by batting position:

C: 2,650

1B: 3,273

2B: 3,273

3B: 3,213

SS: 3,149

LF: 3,262

CF: 3,331

RF: 3,259

DH: 646

The designated hitter sample is smaller because the position has existed only since 1973, and for most of that time only in the American League.

Why Use Box Plots?

Home run totals are skewed.

Most players do not hit 40 home runs. Many regulars hit fewer than 10. A smaller number of players produce the spectacular seasons that shape memory and record books.

A box plot is useful because it shows several parts of the distribution at once:

median

middle 50 percent

upper and lower spread

outlier seasons

The median is the central line in the box.

The box itself shows the interquartile range:

IQR = Q_3 - Q_1

Where:

Q_1 = \text{25th percentile}

Q_3 = \text{75th percentile}

The outliers show the extreme home run seasons. Those outliers matter because home run history is partly a history of extremes.

The box plot, therefore, helps separate typical power from exceptional power.
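The quantities a box plot displays are easy to compute directly. The sketch below is illustrative: the home run totals are made up, the quartiles use the inclusive interpolation method, and the outlier rule is the conventional 1.5 x IQR fence, which is an assumption because the whisker rule behind the figures is not stated.

```python
from statistics import mean, quantiles


def box_plot_summary(hr_totals, whisker=1.5):
    """Quartiles, IQR, mean, and outliers for a list of home run totals."""
    q1, med, q3 = quantiles(hr_totals, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "mean": mean(hr_totals),
        "outliers": [x for x in hr_totals if x < low_fence or x > high_fence],
    }


# Made-up home run totals, not data from the study.
sample = [2, 4, 5, 7, 9, 10, 12, 15, 40]
summary = box_plot_summary(sample)
print(summary)
```

In this toy sample the single 40-homer season is flagged as an outlier and pulls the mean above the median, which is the same right-skewed pattern the positional distributions show.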

Figure 1: Regular Player-Seasons by Primary Position

Figure 1. Home runs by primary position, regular player-seasons.

The first figure is the main result.

It includes batting positions only and uses the regular-player cutoff:

PA \geq 300

The median home run totals are:

C: 7

1B: 12

2B: 5

3B: 9

SS: 4

LF: 10

CF: 7

RF: 11

DH: 18

The pattern is clear.

Designated hitters have the highest median:

\widetilde{HR}_{DH} = 18

First basemen follow:

\widetilde{HR}_{1B} = 12

Right fielders and left fielders are next:

\widetilde{HR}_{RF} = 11

\widetilde{HR}_{LF} = 10

Third basemen sit just behind the corner outfielders:

\widetilde{HR}_{3B} = 9

The middle infield positions are lower:

\widetilde{HR}_{2B} = 5

\widetilde{HR}_{SS} = 4

This is the historical power spectrum in compact form.

The bat-first end of the defensive spectrum contains more home runs. The middle of the diamond contains fewer.

The Third-Base Position in Context

The third-base result is especially relevant to the larger project.

Third base has a median of 9 home runs among regular player-seasons:

\widetilde{HR}_{3B} = 9

Its mean is:

\overline{HR}_{3B} = 11.24

Its 90th percentile is:

P_{90,3B} = 26

Its 95th percentile is:

P_{95,3B} = 32

That means a 26-home-run season by a regular third baseman sits around the 90th percentile historically, while a 32-home-run season sits around the 95th percentile.

This helps frame the third-base offensive studies. A third baseman does not need to hit like a first baseman or designated hitter to be power-relevant at the position. But third base has historically demanded more power than second base or shortstop.

In that sense, third base is not a pure defensive position and not a pure slugger position. It sits between worlds.

Summary Table: Regular Player-Seasons

The main regular-player summary is:

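| Position | Mean HR | Median HR | 90th Percentile | Max |
|----------|---------|-----------|-----------------|-----|
| C        | 9.20    | 7         | 20              | 60  |
| 1B       | 14.68   | 12        | 31              | 70  |
| 2B       | 6.89    | 5         | 16              | 45  |
| 3B       | 11.24   | 9         | 26              | 54  |
| SS       | 6.66    | 4         | 17              | 57  |
| LF       | 12.47   | 10        | 27              | 73  |
| CF       | 10.07   | 7         | 24              | 62  |
| RF       | 13.19   | 11        | 28              | 66  |
| DH       | 19.48   | 18        | 33              | 56  |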

The mean is higher than the median at every position. That reflects the right-skewed nature of home run totals:

\overline{HR}_p > \widetilde{HR}_p

Where:

\overline{HR}_p = \text{mean home runs at position } p

\widetilde{HR}_p = \text{median home runs at position } p

This skew is exactly why box plots are useful. The median shows the typical regular. The outliers show the historical power peaks.

Figure 2: All Player-Seasons

Figure 2. Home runs by primary position, all player-seasons.

The second figure includes all player-seasons, regardless of plate appearances.

This plot answers a different question.

Instead of asking what regular players do, it asks what the full population of player-seasons looks like.

The result is much more compressed toward zero.

That is expected.

All player-seasons include players who appeared briefly, bench players, injury-shortened seasons, late-season call-ups, defensive specialists, and players with very few batting opportunities. Many of these seasons have few or no home runs.

This is why the regular-player cutoff is important.

The all-player plot is useful because it shows the structure of the full record. But the regular-player plot is better for comparing positional expectations.

The difference between Figure 1 and Figure 2 is a methodological lesson:

Home run distributions depend heavily on playing-time filters.

That is not a flaw. It is part of the phenomenon.

Figure 3: Including Pitchers

Figure 3. Home runs by primary position, regular player-seasons including pitchers.

The third figure adds pitchers to the regular-player sample.

There are only 49 pitcher seasons with at least 300 estimated plate appearances in this dataset:

n_P = 49

That is a tiny sample compared with the other positions.

Pitchers also have a very low home run distribution. The pitcher box is compressed near the bottom of the plot.

This is why pitchers are excluded from the main batting-position comparison. Pitchers are not just another defensive position in this context. Historically, they had a very different offensive role.

Including them is still useful as a reminder of how specialized baseball’s offensive expectations have become.

The pitcher comparison also shows why the designated hitter changed the structure of the game. The DH role separated the hitting function from the pitching function.

Figure 4: Before 1994

Figure 4. Home runs by primary position, regular player-seasons before 1994.

The fourth figure examines regular player seasons before 1994.

This view helps separate the long historical baseline from the more recent high-power period.

Before 1994, the positional pattern is still visible:

1B, LF, RF, and DH are higher-power positions.

2B and SS are lower-power positions.

3B sits in the middle-to-upper range.

However, the distributions are generally lower than in the modern period.

That is especially visible in the medians and upper tails. The tops of the boxes and whiskers are lower for most positions.

This matters because the modern home run environment can distort our intuition. If we only look at recent baseball, we may overestimate what a typical power season historically looked like.

The pre-1994 plot keeps the longer history visible.

Figure 5: 1994–2025

Figure 5. Home runs by primary position, regular player-seasons, 1994–2025.

The fifth figure focuses on the modern high-power period from 1994 through 2025.

Here the distributions shift upward.

The median home run totals are visibly higher at most positions than in the pre-1994 plot.

This is particularly clear at positions that historically carried lower power expectations. Second base, shortstop, catcher, and center field all show more power in the modern period than in the older baseline.

The modern plot also shows how the distinction between positions has narrowed in some ways. Shortstops and second basemen now have more home run upside than they did historically, though they still remain below first base and designated hitter in the central distribution.

Third base remains a power-relevant position. In the 1994–2025 period, the third-base distribution shifts upward, reflecting the broader increase in home run production across baseball.

Maximum Home Run Seasons by Primary Position

The maximum regular-player home run seasons by primary position are:

C: Cal Raleigh, 2025, 60 HR

1B: Mark McGwire, 1998, 70 HR

2B: Marcus Semien, 2021, 45 HR

3B: Alex Rodriguez, 2007, 54 HR

SS: Alex Rodriguez, 2002, 57 HR

LF: Barry Bonds, 2001, 73 HR

CF: Aaron Judge, 2022, 62 HR

RF: Sammy Sosa, 1998, 66 HR

DH: Kyle Schwarber, 2025, 56 HR

The maximum values show how different the upper tail can be from the median.

For third base:

\max(HR_{3B}) = 54

The median regular third-base season is:

\widetilde{HR}_{3B} = 9

So the highest third-base season is six times the median:

\frac{ \max(\mathrm{HR}_{\mathrm{3B}}) }{ \widetilde{\mathrm{HR}}_{\mathrm{3B}} } = \frac{54}{9} = 6

This is the shape of home run history. The typical distribution matters, but the record book is made in the tails.
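The same ratio can be computed for every position from the maxima listed above and the regular-player medians reported in this chapter's conclusion. The following is a minimal Python sketch; the dictionary and function names are illustrative and are not part of the original analysis.

```python
# Maximum regular-player HR seasons and median HR by primary position,
# copied from the values quoted in this chapter.
max_hr = {"C": 60, "1B": 70, "2B": 45, "3B": 54, "SS": 57,
          "LF": 73, "CF": 62, "RF": 66, "DH": 56}
median_hr = {"C": 7, "1B": 12, "2B": 5, "3B": 9, "SS": 4,
             "LF": 10, "CF": 7, "RF": 11, "DH": 18}

def max_to_median(position):
    """Ratio of the record season to the typical (median) season."""
    return max_hr[position] / median_hr[position]

ratios = {pos: round(max_to_median(pos), 2) for pos in max_hr}

# Print positions from the most tail-heavy to the least.
for pos, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pos}: {ratio}")
```

On these numbers, shortstop has the largest ratio (57/4 = 14.25) and designated hitter the smallest (56/18, about 3.1).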

Positional Power Spectrum

The regular-player medians produce a rough power spectrum:

DH > 1B > RF > LF > 3B > C/CF > 2B > SS

This is not a law. It is an empirical summary of the historical data.

It shows that home run expectations follow the defensive spectrum, but imperfectly.

First base and corner outfield positions carry major power expectations. Middle infield positions carry lower typical power expectations. Third base sits between them.

That makes sense. Third base is a reaction position, an arm-strength position, and historically a position where teams have often expected more offense than at shortstop or second base.

Why Third Base Is Interesting

This chapter began as a positional home run study, but it also helps explain why third base is such a useful position for the larger project.

Third base is not first base. It is not a position where the bat alone defines most of the historical expectation.

But it is also not shortstop or second base. The position has carried real offensive expectations for a long time.

This makes third base analytically rich.

A third baseman can be great through power.
A third baseman can be great through defense.
A third baseman can be great through a two-way profile.
A third baseman can be average in one dimension and exceptional in another.

That is why the earlier z-score, WAR, and wRC+ studies were useful. Third base demands a multidimensional approach.

The home run box plots confirm the same idea from a different angle.

Third base lives in the middle of the power spectrum.

Limitations

This study uses primary position by games played. That is practical and transparent, but it simplifies players who split time across multiple positions.

A player-season is assigned to only one position, even if the player spent substantial time elsewhere.

The plate appearance total is estimated from the available batting columns:

PA = AB + BB + HBP + SF + SH

That is a reasonable construction, but it may differ slightly from official plate appearance totals in some historical contexts.
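As a rough sketch of how this estimate and the regular-player cutoff could be applied to one batting row: the column names follow the Lahman Batting table, while the function names and the choice to treat missing values as zero are assumptions of this example, not documented steps of the analysis.

```python
def estimate_pa(row):
    """Estimate plate appearances as AB + BB + HBP + SF + SH.

    `row` is a dict of batting columns. Missing or empty values are
    treated as zero, which is an assumption of this sketch.
    """
    columns = ("AB", "BB", "HBP", "SF", "SH")
    return sum(int(row.get(col) or 0) for col in columns)

def is_regular(row, min_pa=300):
    """Apply the regular-player cutoff used in this chapter."""
    return estimate_pa(row) >= min_pa

# Hypothetical player-season, for illustration only.
season = {"AB": 280, "BB": 15, "HBP": 3, "SF": None, "SH": 2}
print(estimate_pa(season), is_regular(season))
```

Passing a different `min_pa`, such as 250 or 400, is one way to check how sensitive the sample is to the cutoff.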

The 300-PA cutoff is also a choice. A different cutoff, such as 250 or 400 PA, would change the sample slightly. The broad pattern should remain, but the exact medians and percentiles would move.

Finally, the designated hitter is not available across the full historical period. It should be interpreted separately from long-running defensive positions.

Conclusion

Home runs have never been distributed evenly across the diamond.

The regular-player box plots show a clear positional structure. Designated hitters, first basemen, right fielders, left fielders, and third basemen occupy the higher-power part of the spectrum. Second basemen and shortstops occupy the lower-power part. Catchers and center fielders sit in between, with major outlier seasons but lower typical medians.

For regular player-seasons:

DH: 18 median HR

1B: 12 median HR

RF: 11 median HR

LF: 10 median HR

3B: 9 median HR

C: 7 median HR

CF: 7 median HR

2B: 5 median HR

SS: 4 median HR

Third base sits in a revealing place.

It is not the highest-power position. It is not the lowest. It is a position where power matters, but not alone.

That is why the position works so well for this larger study.

Third base is a bridge between offensive and defensive expectations.

The home run distributions show that clearly.

Postscript: Where the Home Runs Came From

The box plots in this chapter show the distribution of home run seasons by position. They answer a question about typical and exceptional seasons:

What does a home run season usually look like at each position?

But there is another question worth asking:

Where did the historical home run volume come from?

To answer that, I summed all home runs by primary position.

\mathrm{TotalHR}_{p} = \sum_{i,y} HR_{i,y,p}

Where:

i = \text{player}

y = \text{season}

p = \text{primary position}
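The sum can be sketched in a few lines of Python. The field names `primary_pos` and `HR` and the sample rows are hypothetical; the real input would be the merged player-season table.

```python
from collections import defaultdict

def total_hr_by_position(player_seasons):
    """Sum HR over all player-seasons within each primary position."""
    totals = defaultdict(int)
    for season in player_seasons:
        totals[season["primary_pos"]] += season["HR"]
    return dict(totals)

# Hypothetical rows, for illustration only.
rows = [
    {"player": "a", "year": 1990, "primary_pos": "1B", "HR": 30},
    {"player": "a", "year": 1991, "primary_pos": "1B", "HR": 25},
    {"player": "b", "year": 1990, "primary_pos": "3B", "HR": 12},
]
print(total_hr_by_position(rows))
```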

The first postscript figure uses all player-seasons from 1871 through 2025.

Postscript Figure 1. Total home runs by primary position, all player-seasons, 1871–2025.

The largest total comes from first base:

\mathrm{TotalHR}_{1B} = 56{,}082

Right field and left field follow closely:

\mathrm{TotalHR}_{RF} = 50{,}858

\mathrm{TotalHR}_{LF} = 49{,}898

Third base ranks fourth:

\mathrm{TotalHR}_{3B} = 42{,}140

This reinforces the larger pattern. The historical home run supply has come disproportionately from the corners: first base, corner outfield, and third base.

The second postscript figure uses only regular player-seasons:

PA \geq 300

Postscript Figure 2. Total home runs by primary position, regular player-seasons, PA ≥ 300, 1871–2025.

The regular-player version tells the same basic story. First base remains first:

\mathrm{TotalHR}_{1B} = 48{,}064

Right field remains second:

\mathrm{TotalHR}_{RF} = 42{,}977

Left field remains third:

\mathrm{TotalHR}_{LF} = 40{,}663

Third base remains fourth:

\mathrm{TotalHR}_{3B} = 36{,}127

The designated hitter is the important caution. DH has the highest median home run total among regular player-seasons, but it does not rank near the top in total historical home runs because it is not present across the full 1871–2025 record.

Keep that distinction in mind.

The box plots show the typical season and its distribution.
The bar charts show accumulated historical volume.

They are related, but they are not the same thing.

A position can have a high typical power profile but a smaller historical total if it has fewer seasons in the record. That is exactly what happens with designated hitters.

Third base, by contrast, has both a long historical presence and a meaningful power profile. It does not lead the home run record, but it sits clearly among the power-producing positions.

That is the key postscript result:

First base produced the most historical home run volume.

Corner outfield followed.

Third base remained a major source of home runs.

Designated hitter had the strongest power profile but a shorter historical runway.

Three Measures of Third-Base Greatness: Z-Scores, WAR, and wRC+

Introduction

At this point in the third-base study, we have three different ways of measuring greatness.

The first is our own z-score framework. It asks how far a player separated from other third basemen in the same season and across a career.

The second is WAR. It asks how much total value a player produced, including offense, defense, baserunning, position, replacement level, and playing time.

The third is wRC+. It asks how strong a hitter was after adjusting for league and park context, with 100 set as league average.

Each measure is useful.

Each measure answers a different question.

That is why this comparison matters.

A player can dominate by z-score because he separates from his third-base peers. A player can dominate by WAR because he accumulates value across many seasons. A player can dominate by wRC+ because his offensive rate quality is extraordinary.

The goal of this chapter is not to declare that one metric is correct and the others are wrong.

The goal is to compare the stories they tell.

For Study 1, I focused on regular third basemen with at least five qualified third-base seasons and matched values for career WAR, career wRC+, and our career z-score measures. That produced a working sample of 239 third basemen.

The central question is:

Which third basemen remain elite when judged by z-scores, WAR, and wRC+ together?

The answer begins with Mike Schmidt.

Across the three-metric composite, Schmidt is the clear anchor of the study. He ranks first in combined z-score, first in WAR, and second in wRC+. He is the player who survives every test.

But the rest of the list is more interesting than a simple ranking.

Eddie Mathews, Chipper Jones, Wade Boggs, George Brett, Home Run Baker, Alex Rodriguez, Ron Santo, Scott Rolen, and Jose Ramirez also emerge as strong cross-metric performers.

At the same time, several players reveal the tension between the metrics.

Brooks Robinson ranks extremely high by combined z-score and WAR, but much lower by wRC+. Dick Allen ranks first by wRC+, but much lower by combined z-score and WAR. Adrian Beltre ranks third by WAR but much lower by wRC+.

Those differences are not problems.

They are the point of the chapter.

The Three Metrics

This study compares three broad dimensions: combined career z-score, career WAR, and career wRC+.

The combined z-score is internal to this project. WAR and wRC+ are external validation measures.

The three metrics are not interchangeable.

They measure different things.

The Combined Z-Score

The combined z-score is based on two career components: the Model C offensive career score and the traditional defensive career score.

Each career score is standardized across the third-base regular sample.

The standardized offensive score is:

z_{\mathrm{Offense},i} = \frac{ \mathrm{Offense}_{i} - \overline{\mathrm{Offense}} }{ s_{\mathrm{Offense}} }

The standardized defensive score is:

z_{\mathrm{Defense},i} = \frac{ \mathrm{Defense}_{i} - \overline{\mathrm{Defense}} }{ s_{\mathrm{Defense}} }

The combined z-score is:

\mathrm{Combined\ Z}_{i} = z_{\mathrm{Offense},i} + z_{\mathrm{Defense},i}

This score rewards players who separate from other third basemen in both offensive and traditional defensive dimensions.
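A minimal sketch of the calculation, assuming the career scores are already available as plain lists. The sample standard deviation and the example values are assumptions of this sketch; the Model C and defensive scores themselves are not reproduced here.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize values against their own mean and standard deviation."""
    center = mean(values)
    spread = stdev(values)  # sample SD (n - 1); an assumption of this sketch
    return [(v - center) / spread for v in values]

def combined_z(offense, defense):
    """Add the standardized offensive and defensive career scores."""
    return [zo + zd for zo, zd in zip(zscores(offense), zscores(defense))]

# Hypothetical career scores for three players, for illustration only.
offense = [10.0, 20.0, 30.0]
defense = [5.0, 1.0, 3.0]
print(combined_z(offense, defense))
```

Because the two components are added with equal weight, a player one standard deviation above the sample on offense and one below on defense lands at zero.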

It is not the same thing as WAR.

It does not directly assign run values. It does not use replacement level. It does not use park factors in the way WAR or wRC+ does. It is a peer-separation measure.

That is its strength.

It asks:

How far did this third baseman stand from the position?

WAR

WAR is a broader value metric.

In this chapter, WAR is used as a career value measure for the player’s qualified third-base seasons in our merged dataset. WAR includes offense, defense, baserunning, positional value, replacement value, and playing time.

For the purpose of this chapter, we can think of WAR abstractly as:

\begin{aligned} \mathrm{WAR}_i &= \mathrm{Offense}_i + \mathrm{Defense}_i + \mathrm{Baserunning}_i \\ &\quad+ \mathrm{Position}_i + \mathrm{Replacement}_i \end{aligned}

That is not intended as a full WAR formula. It is a conceptual summary.

WAR asks a different question from the z-score model.

It asks:

How much total value did this player produce?

That is why Adrian Beltre, Wade Boggs, Brooks Robinson, Scott Rolen, Graig Nettles, and Buddy Bell can look stronger by WAR than they do by wRC+ alone.

WAR values more than hitting.

wRC+

wRC+ is an offensive rate measure.

It is scaled so that 100 is league average:

wRC^+ = 100

A hitter with a 120 wRC+ is roughly 20 percent better than league average offensively:

wRC^+ = 120

A hitter with an 80 wRC+ is roughly 20 percent below league average offensively:

wRC^+ = 80

For this study, wRC+ answers a narrower question:

How good was the hitter?

It does not measure third-base defense. It does not measure total career value. It does not reward playing third base well. It isolates offensive rate quality.

That is why Dick Allen can rank first in wRC+ without ranking first in the other systems.

Why a Composite Score Is Useful

Because the three measures use different scales, we cannot simply add raw z-score, WAR, and wRC+.

Instead, I converted each metric into a percentile rank.

For each player:

P_{\mathrm{Combined\ Z},i} = \mathrm{PercentileRank}(\mathrm{Combined\ Z}_i)

P_{\mathrm{WAR},i} = \mathrm{PercentileRank}(\mathrm{WAR}_i)

P_{wRC^+,i} = \mathrm{PercentileRank}(wRC^+_i)

Then I calculated a three-metric composite percentile:

\mathrm{Composite}_{i} = \frac{ P_{\mathrm{Combined\ Z},i} + P_{\mathrm{WAR},i} + P_{wRC^+,i} }{3}

Higher values indicate players who rank well across all three systems.

This composite is not meant to replace the individual metrics. It is a summary tool.

It rewards broad agreement.

A player who ranks high in all three metrics will rise. A player who is exceptional in one metric but weaker in the others will still be visible, but not necessarily at the top of the composite.
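The composite can be sketched as follows. The percentile convention used here (share of the sample at or below a value) is an assumption, since several conventions exist, and the input values are hypothetical.

```python
def percentile_rank(values):
    """Percentile rank of each value: share of the sample at or below it, times 100.

    The 'at or below' convention is an assumption of this sketch.
    """
    n = len(values)
    return [100.0 * sum(other <= v for other in values) / n for v in values]

def composite(combined_z, war, wrc_plus):
    """Average the three percentile ranks for each player."""
    columns = [percentile_rank(m) for m in (combined_z, war, wrc_plus)]
    return [sum(ps) / 3 for ps in zip(*columns)]

# Hypothetical values for four players, for illustration only.
combined_z = [3.0, 1.0, 0.5, -1.0]
war = [60.0, 40.0, 45.0, 10.0]
wrc_plus = [140, 150, 110, 95]
print(composite(combined_z, war, wrc_plus))
```

In this toy sample the second player leads in wRC+ but trails the first player in the composite, which is the pattern the chapter describes for metric-specific greatness.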

That is why this study is useful.

It separates all-around consensus from metric-specific greatness.

Figure 1: The Top 25 Composite Performers

Figure 1. Top third basemen across combined z-score, WAR, and wRC+.

The top 25 composite chart gives the broadest view of the results.

The top ten are:

1. Mike Schmidt
2. Eddie Mathews
3. Chipper Jones
4. Wade Boggs
5. George Brett
6. Home Run Baker
7. Alex Rodriguez
8. Ron Santo
9. Scott Rolen
10. Jose Ramirez

It is not simply an offensive list. Brooks Robinson does not reach the top ten because wRC+ pulls him down, but the list still includes two-way and value-based players such as Wade Boggs, Ron Santo, and Scott Rolen.

It is not simply a WAR list either. Adrian Beltre ranks third by WAR, but thirteenth by the composite because his wRC+ rank is lower than his WAR and z-score ranks.

It is not simply a wRC+ list. Dick Allen ranks first by wRC+, but he does not land near the top of the composite because his combined z-score and WAR ranks are lower.

The top composite list rewards players who remain strong across the different definitions of greatness.

That is why Schmidt is first.

He is not merely great by one method. He is great by all three.

The Top Players by Each Metric

The top players change depending on the question.

By combined z-score, the top five are:

1. Mike Schmidt
2. Brooks Robinson
3. Nolan Arenado
4. Scott Rolen
5. Wade Boggs

This list rewards two-dimensional separation. Robinson and Arenado rise because traditional defense is included.

By WAR, the top five are:

1. Mike Schmidt
2. Eddie Mathews
3. Adrian Beltre
4. Wade Boggs
5. Brooks Robinson

This list rewards total value and career accumulation.

By wRC+, the top five are:

1. Dick Allen
2. Mike Schmidt
3. Eddie Mathews
4. Harmon Killebrew
5. John McGraw

This list rewards offensive rate quality.

These are three different lists because they are answering three different questions.

The question is not which list is correct.

The question is what each list reveals.

Figure 2: Rank Movement Across the Three Systems

Figure 2. How the top composite third basemen rank by combined z-score, WAR, and wRC+.

The rank-comparison figure shows how players move across the three measures.

Mike Schmidt barely moves. That is the signature of a consensus number one. His profile is not dependent on one definition of value.

Eddie Mathews is similarly strong. He ranks high in WAR and wRC+, and still remains strong in the combined z-score system.

Chipper Jones is also stable. His defensive score is not strong, but his offensive value is so high that he remains near the top.

The movement becomes more interesting with players like Adrian Beltre, Nolan Arenado, and Scott Rolen.

Beltre ranks extremely high by WAR but much lower by wRC+. That makes sense. His case is not purely about offensive rate. It is about durability, defense, and total value.

Arenado ranks very high by combined z-score but much lower by wRC+. Again, that makes sense. His profile is two-dimensional and defense-forward.

Rolen is a balanced case. He ranks very high by combined z-score and WAR but lower by wRC+. That reflects his two-way value.

This figure shows that the metrics are not redundant.

They overlap, but they do not tell the same story.

Figure 3: Combined Z-Score Versus WAR

Figure 3. Combined career z-score versus career WAR among third-base regulars.

The combined z-score and WAR relationship is strong.

The fitted line is:

\mathrm{WAR} = 15.98 + 6.69(\mathrm{Combined\ Z})

The model fit is:

R^2 = 0.782

That means the combined z-score explains a large share of the variation in career WAR among third-base regulars.
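For readers who want to see how a fitted line, its R-squared, and its residuals relate, here is a minimal ordinary least squares sketch in plain Python. The data pairs are hypothetical and will not reproduce the coefficients above; the post does not say which software produced the fit.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def residuals(x, y, intercept, slope):
    """Actual minus predicted value for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical (Combined Z, WAR) pairs, for illustration only.
z = [-1.0, 0.0, 1.0, 2.0, 4.0]
war = [8.0, 17.0, 21.0, 31.0, 42.0]
b0, b1 = fit_line(z, war)
print(round(b0, 2), round(b1, 2), round(r_squared(z, war, b0, b1), 3))
```

A positive residual marks a player with more WAR than the line predicts from his combined z-score; a negative residual marks the reverse.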

This is important.

It tells us that our z-score framework is not just an internal ranking system. It aligns strongly with a major external value metric.

But the scatterplot also shows meaningful differences.

Mike Schmidt sits at the upper-right extreme. His combined z-score and WAR both identify him as historically exceptional.

Brooks Robinson sits high in combined z-score and WAR, but his shape is different. His combined z-score is powered by traditional defense rather than offensive dominance.

Adrian Beltre sits higher in WAR than his combined z-score alone would predict. That suggests his total value, longevity, and broader WAR components are stronger than the simplified z-score model fully captures.

Nolan Arenado sits high in combined z-score but lower in WAR relative to the line. That may reflect career length, active-career status, or differences between traditional defensive separation and WAR’s defensive valuation.

The relationship is strong, but the residuals still matter.

They show where the systems disagree.

Figure 4: Offensive Z-Score Rate Versus wRC+

Figure 4. Average offensive z-score per qualified third-base season versus career wRC+.

The relationship between average offensive z-score and wRC+ is also strong.

The fitted line is:

wRC^+ = 100.89 + 5.41(\mathrm{Average\ Offensive\ Z})

The fit is:

R^2 = 0.740

This confirms the earlier wRC+ validation result.

Average offensive z-score is a strong predictor of wRC+ because both are measuring offensive quality, though in different ways.

The equation says that each additional point of average offensive z-score corresponds to about 5.41 additional points of career wRC+:

\beta_1 = 5.41
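As a worked illustration of this slope (the input values are chosen for illustration and are not taken from the data), an average offensive z-score of 1 corresponds to a predicted career wRC+ of:

wRC^+ = 100.89 + 5.41(1) = 106.30

An average offensive z-score of 2 corresponds to:

wRC^+ = 100.89 + 5.41(2) = 111.71

The difference between the two predictions is the slope, 5.41.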

This is why the offensive names rise in this figure.

Dick Allen, Mike Schmidt, Eddie Mathews, Chipper Jones, Alex Rodriguez, George Brett, Home Run Baker, Wade Boggs, Al Rosen, and David Wright all appear as strong offensive profiles.

Brooks Robinson, by contrast, is much closer to the middle of the wRC+ distribution. That is not a criticism. It simply reflects that Robinson’s greatness is not primarily a wRC+ case.

That is exactly why this comparison matters.

Figure 5: Cross-Metric Rank Disagreements

Figure 5. Largest cross-metric rank disagreements among notable third basemen.

The disagreement chart is one of the most useful figures in the study.

It identifies players whose rankings differ sharply across combined z-score, WAR, and wRC+.

The rank spread is:

\begin{aligned} \mathrm{RankSpread}_{i} &= \max\left( r_{\mathrm{CombinedZ},i}, r_{\mathrm{WAR},i}, r_{\mathrm{wRC}^{+},i} \right) \\ &\quad- \min\left( r_{\mathrm{CombinedZ},i}, r_{\mathrm{WAR},i}, r_{\mathrm{wRC}^{+},i} \right) \end{aligned}

A large spread means the player looks very different depending on the metric.
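The spread is simple to compute from the ranks quoted in this chapter. In this short Python sketch the rank triples come from the text, while the dictionary layout and function name are illustrative.

```python
# Ranks quoted in this chapter: (combined z-score, WAR, wRC+).
ranks = {
    "Mike Schmidt": (1, 1, 2),
    "Eddie Mathews": (6, 2, 3),
    "Wade Boggs": (5, 4, 14),
    "Scott Rolen": (4, 10, 34),
    "Adrian Beltre": (9, 3, 58),
    "Nolan Arenado": (3, 15, 57),
    "Brooks Robinson": (2, 5, 114),
    "Dick Allen": (87, 51, 1),
}

def rank_spread(player_ranks):
    """Worst rank minus best rank across the three systems."""
    return max(player_ranks) - min(player_ranks)

spreads = {name: rank_spread(r) for name, r in ranks.items()}
for name, spread in sorted(spreads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {spread}")
```

Among these eight players, Brooks Robinson has the widest spread (112) and Mike Schmidt the narrowest (1).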

Some of the most interesting disagreement cases are: Brooks Robinson, Dick Allen, Adrian Beltre, Nolan Arenado, Willie Kamm, Gary Gaetti, Harmon Killebrew, Edwin Encarnacion, Jim Ray Hart, and Deacon White.

These players are not mistakes in the data.

They are interpretive opportunities.

Brooks Robinson is a defensive and WAR giant, but not a wRC+ giant.

Dick Allen is an offensive-rate giant, but not a top combined z-score or WAR third-base regular in this framework.

Adrian Beltre is a WAR giant, but wRC+ does not fully capture his case.

Willie Kamm is extremely strong by the combined z-score framework because of traditional defense, but he is not similarly high by wRC+.

Edwin Encarnacion is much stronger by wRC+ than by third-base z-score or WAR within the third-base framework, partly because his career offensive identity extends beyond a long regular third-base profile.

The disagreement chart shows why a single number is not enough.

The Schmidt Result

Mike Schmidt is the central result of Study 1.

He ranks:

Combined z-score rank: 1
WAR rank: 1
wRC+ rank: 2
Composite rank: 1

This is almost the perfect cross-metric profile.

Schmidt is not merely the best by our internal model. He is also the best by WAR and nearly the best by wRC+.

That matters because it means his result is robust.

He is not a product of one method.

He is the player who remains elite when the question changes.

If the question is peer separation, Schmidt wins.

If the question is total value, Schmidt wins.

If the question is offensive rate quality, Schmidt is still almost at the top.

That is the strongest possible case.

Eddie Mathews, Chipper Jones, and the Offensive Greatness Group

Eddie Mathews ranks second by the composite.

He ranks:

Combined z-score rank: 6
WAR rank: 2
wRC+ rank: 3
Composite rank: 2

That is a very strong cross-metric profile. Mathews does not have Schmidt’s complete separation, but he remains elite everywhere.

Chipper Jones ranks third by the composite:

Combined z-score rank: 8
WAR rank: 6
wRC+ rank: 6
Composite rank: 3

Chipper’s case is offense-forward. His traditional defensive component is not strong, but his offensive quality is so high that he remains elite across the systems.

George Brett and Home Run Baker also belong in this broad offensive greatness group. They are strong by wRC+, strong by WAR, and strong enough by combined z-score to remain near the top.

This group shows that offensive greatness can carry a third-base profile a long way.

Wade Boggs and the On-Base Profile

Wade Boggs ranks fourth by the composite:

Combined z-score rank: 5
WAR rank: 4
wRC+ rank: 14
Composite rank: 4

Boggs is a fascinating case because he is not a home-run power archetype. His greatness is built around contact, on-base skill, batting average, plate discipline, and sustained offensive quality.

The fact that he ranks so highly in the composite confirms that the model is not simply rewarding slugging power.

Boggs was a different kind of offensive star, and the metrics recognize it.

Rolen, Beltre, Arenado, and Two-Way Value

Scott Rolen, Adrian Beltre, and Nolan Arenado show why WAR and combined z-score are necessary companions to wRC+.

Rolen ranks:

Combined z-score rank: 4
WAR rank: 10
wRC+ rank: 34
Composite rank: 9

Beltre ranks:

Combined z-score rank: 9
WAR rank: 3
wRC+ rank: 58
Composite rank: 13

Arenado ranks:

Combined z-score rank: 3
WAR rank: 15
wRC+ rank: 57
Composite rank: 15

These are not weak wRC+ players. But their all-time third-base cases are not primarily wRC+ cases.

They are two-way cases.

Rolen is balanced. Beltre is a total-value and longevity case. Arenado is a defense-forward combined z-score case.

If this study used only wRC+, these players would be underrated.

If it used only WAR, their offensive shape would be less visible.

If it used only z-score, the relationship to broader value would be less clear.

The three-metric comparison gives the fuller picture.

Brooks Robinson and the Limits of wRC+

Brooks Robinson is the clearest example of a player whose greatness is not offensive-rate greatness.

He ranks:

Combined z-score rank: 2
WAR rank: 5
wRC+ rank: 114

That is a huge split.

It makes perfect sense.

Robinson’s historical case is not based on being one of the greatest offensive third basemen. It is based on defense, durability, and total value.

The combined z-score model sees him because traditional defense is included. WAR sees him because total value includes defense. wRC+ does not see him in the same way because wRC+ is an offensive metric.

That is not a flaw in wRC+. It is a reminder that wRC+ answers a narrower question.

Dick Allen and the Limits of Third-Base Accumulation

Dick Allen is the opposite case.

He ranks:

wRC+ rank: 1
Combined z-score rank: 87
WAR rank: 51

Allen’s offensive rate quality is extraordinary. But within this third-base regular framework, he does not accumulate the same kind of third-base-specific z-score or WAR profile as Schmidt, Mathews, Chipper, Boggs, or Brett.

This shows the difference between a great hitter who played third base and a great third baseman across the entire profile. That is an important distinction.

Allen is not diminished by this result. The study simply clarifies what kind of greatness he represents.

He is a wRC+ giant. He is not the top all-around third-base regular by the three-metric composite.

Why This Study Is Interesting

The value of Study 1 is that it prevents the project from becoming metric-dependent.

If Schmidt ranked first only by our z-score model, the conclusion would be interesting but narrower.

But Schmidt also ranks first by WAR and second by wRC+. That makes the conclusion much stronger.

At the same time, the disagreements prevent the chapter from becoming too simple.

Brooks Robinson, Dick Allen, Adrian Beltre, Nolan Arenado, Scott Rolen, Wade Boggs, and others show that greatness has different forms.

The study therefore supports two conclusions at once:

1. Mike Schmidt is the clearest cross-metric third-base anchor.
2. Different metrics reveal different kinds of third-base greatness.

Both points are important.

Limitations

This chapter uses regular third basemen with at least five qualified third-base seasons and matched values across the three systems. That makes the comparison cleaner, but it also means the study is focused on third-base regulars, not every player who ever appeared at third base.

The combined z-score uses this project’s offensive Model C and traditional defensive model. It does not include modern defensive metrics, park factors, or full run-value modeling.

WAR includes many components that the z-score model does not.

wRC+ is a rate statistic and should not be treated as an accumulated career value measure. That is why the study includes average offensive z-score per qualified season when comparing to wRC+.

The composite percentile score is a summary tool. It is not a new definitive metric. It is best used to identify players who remain strong across multiple systems.

Conclusion

Study 1 compares three ways of measuring third-base greatness: combined z-score, WAR, and wRC+.

The main result is clear.

Mike Schmidt is the strongest cross-metric third baseman in the study.

He ranks first by combined z-score, first by WAR, second by wRC+, and first by the three-metric composite.

Eddie Mathews, Chipper Jones, Wade Boggs, George Brett, Home Run Baker, Alex Rodriguez, Ron Santo, Scott Rolen, and Jose Ramirez also emerge as strong cross-metric performers.

But the disagreements are just as important.

Brooks Robinson shows that wRC+ cannot capture defensive greatness.

Dick Allen shows that offensive rate greatness is not the same as all-around third-base accumulation.

Adrian Beltre, Nolan Arenado, and Scott Rolen show the importance of two-way value.

The larger conclusion is this:

Third-base greatness is not one-dimensional.

Z-scores show peer separation.
WAR shows total value.
wRC+ shows offensive quality.

The best third basemen are the ones who remain visible when the lens changes.

By that standard, Mike Schmidt stands at the center of the argument.

The Shape of a Pitching Staff: A Dendrogram of MLB Team Pitching, 2001-2025

The earlier chapter asked a ranking question: which MLB organizations built the best pitching staffs from 2001 to 2025?

This post asks a slightly different question.

Which teams pitched alike?

That is not the same thing. Two teams can both be good without being similar. One team might dominate through strikeouts and fielding-independent indicators. Another might prevent runs through contact control, ground balls, park fit, or bullpen management. A third might have a mixed profile, with decent run prevention but weaker underlying indicators. Ranking tells us who was best. Clustering tells us who had the same shape.

That is why a dendrogram is useful. It does not begin with a leaderboard. It begins with resemblance.

The question becomes: if we describe each franchise by its long-term pitching profile, which franchises naturally group together?

The method

Each franchise was described using season-normalized pitching variables from 2001 through 2025. That step is important because pitching environments changed dramatically during this period. A 4.00 ERA in 2001 does not mean exactly the same thing as a 4.00 ERA in 2025.

So each team-season was first compared to its own season.

z_{i,y,m} = \frac{ X_{i,y,m} - \overline{X}_{y,m} }{ s_{y,m} }

Here, $X_{i,y,m}$ is team $i$’s value for metric $m$ in season $y$, $\overline{X}_{y,m}$ is the league average for that metric in that season, and $s_{y,m}$ is the season standard deviation.

For lower-is-better metrics, such as ERA-, FIP-, xFIP-, SIERA, BB%, HR/9, HR/FB, and Hard%, I reversed the sign so that higher values always mean a more favorable pitching profile.
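The season normalization and sign reversal can be sketched together. The metric set comes from the list above; the function name, the use of the sample standard deviation, and the team labels and values are assumptions made for illustration.

```python
from statistics import mean, stdev

# Metrics where a lower raw value is better, as listed in the text.
LOWER_IS_BETTER = {"ERA-", "FIP-", "xFIP-", "SIERA", "BB%", "HR/9", "HR/FB", "Hard%"}

def season_z(values, metric):
    """Z-score one season's team values for one metric, oriented so higher is better.

    `values` maps team -> raw value for a single season. Using the sample
    standard deviation is an assumption of this sketch.
    """
    raw = list(values.values())
    center = mean(raw)
    spread = stdev(raw)
    sign = -1.0 if metric in LOWER_IS_BETTER else 1.0
    return {team: sign * (v - center) / spread for team, v in values.items()}

# Hypothetical ERA- values for three teams in one season.
era_minus = {"AAA": 90.0, "BBB": 100.0, "CCC": 110.0}
print(season_z(era_minus, "ERA-"))
```

The team with the lowest ERA- ends up with the highest score, which is the orientation the clustering step relies on.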

The clustering used these long-term franchise traits:

Category	Variables
Value	WAR
Dominance	K-BB%
Command	BB% prevention
Run prevention	ERA-
Fielding-independent skill	FIP-, xFIP-, SIERA
Home-run control	HR/9, HR/FB
Contact profile	GB%, Hard% prevention
Starter usage	Quality-start rate

After calculating each franchise’s average profile, I standardized the franchise-level variables and used Ward hierarchical clustering. The distance between teams is based on how far apart their standardized pitching profiles are.

d(i,j) = \sqrt{ \sum_{m=1}^{p} \left( z_{i,m} - z_{j,m} \right)^2 }

The dendrogram then links the most similar teams first and gradually joins them into larger groups.
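The distance formula and the first linking step can be sketched in plain Python. The profiles below are hypothetical three-variable examples; a full Ward tree would normally be built with a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"`, and the post does not say which tool was used.

```python
from itertools import combinations
from math import sqrt

def profile_distance(a, b):
    """Euclidean distance between two standardized pitching profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_pair(profiles):
    """Return the two teams with the smallest profile distance.

    With every team starting as its own cluster, this is the first
    pair an agglomerative method such as Ward's would join.
    """
    return min(
        combinations(sorted(profiles), 2),
        key=lambda pair: profile_distance(profiles[pair[0]], profiles[pair[1]]),
    )

# Hypothetical three-variable profiles, for illustration only.
profiles = {
    "AAA": [1.2, 0.8, 1.0],
    "BBB": [1.0, 0.9, 1.1],
    "CCC": [-1.0, -0.5, -1.2],
}
print(closest_pair(profiles))
```

Later merges depend on the linkage rule, so this sketch covers only the first join rather than the whole dendrogram.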

Figure 1. MLB team pitching identity dendrogram, 2001-2025

The first thing to notice is that the dendrogram is not only a quality ranking. It does separate many of the best pitching organizations, but it also captures style.

The Dodgers, Yankees, Astros, Guardians, Phillies, Cubs, and Red Sox form a major cluster. That makes sense. These organizations score well across the strongest modern pitching indicators: WAR, K-BB%, FIP-, xFIP-, and SIERA. This is the elite skill-and-value cluster.

But the Braves, Giants, and Cardinals form a different group. They are not grouped with the Dodgers and Yankees, even though they include strong pitching organizations. Their similarities lie more in run prevention, home-run suppression, ground-ball tendency, and contact management. In other words, their profile is not merely “good pitching.” It is a particular kind of good pitching.

The Rays, Brewers, Padres, Blue Jays, and Diamondbacks form another interesting group. This cluster is more modern and peripheral-driven. These teams tend to show some strength in strikeout-minus-walk skill and xFIP/SIERA-style indicators, but they are not as dominant in long-term value or run prevention as the elite group.

At the other end, the Rockies, Royals, Orioles, Reds, and Rangers group together as long-term struggling pitching profiles. That does not mean each franchise was bad every year. It means that across the full 2001-2025 window, their average profile shares several weaknesses: lower WAR, weaker K-BB%, weaker fielding-independent indicators, and poorer home-run prevention.

The six main clusters

The dendrogram produced six useful interpretive groups:

Cluster	Teams	Interpretation
Elite skill and value staffs	BOS, CHC, CLE, HOU, LAD, NYY, PHI	Strongest overall skill profile, especially WAR, K-BB%, FIP-, xFIP-, and SIERA
Contact-control run preventers	ATL, SFG, STL	Strong run prevention, home-run control, ground-ball tendency, and starter length
Modern peripheral builders	ARI, MIL, SDP, TBR, TOR	Better in K-BB%, xFIP, and SIERA than in long-term WAR or ERA dominance
Mixed middle profiles	ATH, CHW, DET, LAA, MIN, NYM, SEA, WSN	No single shared identity as strong as the other groups, generally middle-range profiles
Low-dominance HR suppressors	MIA, PIT	Weak dominance indicators but relatively better home-run suppression
Long-term struggling profiles	BAL, CIN, COL, KCR, TEX	Broadly weak long-term pitching profile across value, dominance, and fielding-independent metrics

Why the clusters formed

Figure 2 explains the dendrogram. It shows the average profile of each cluster.

The elite skill-and-value group is strong almost everywhere that modern pitching analysis would expect. The group is especially strong in K-BB%, FIP-, xFIP-, SIERA, and WAR. This is the clearest “modern excellence” cluster.

The contact-control group is different. Atlanta, San Francisco, and St. Louis are not grouped primarily because of overwhelming strikeout dominance. They cluster because of run prevention, home-run prevention, ground-ball tendency, and starter length. This is a more traditional-looking run-prevention cluster.

The modern peripheral group is subtle. Arizona, Milwaukee, San Diego, Tampa Bay, and Toronto do not dominate the long-term value category. Still, they show enough similarity in strikeout-minus-walk skill and advanced indicators to cluster together. This feels like a group of organizations that, at different times, leaned into modern pitching design without producing the same full-period dominance as the Dodgers or Yankees.

The struggling group is also clear. Baltimore, Cincinnati, Colorado, Kansas City, and Texas sit below average in most of the key categories. Colorado’s presence is not surprising because pitching in Denver creates a unique environmental problem. But the dendrogram is not only about park effects. The broader cluster reflects weaker long-term skill indicators too.

Why the Dodgers separate

The Dodgers are inside the elite skill-and-value cluster, but they remain visually distinctive. They join the group later than several other teams inside that cluster. That is important.

It suggests that the Dodgers are similar to other elite pitching organizations but are still somewhat their own thing. Their long-term profile is so strong across so many categories that they are not simply interchangeable with the Yankees, Astros, Guardians, Phillies, Cubs, or Red Sox.

That matches the earlier chapter. The Dodgers were the top pitching organization by average normalized score from 2001 to 2025. The dendrogram confirms the same story from a different angle. They are not only highly ranked. They have a recognizable organizational profile.
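The idea of a team "joining later" can be made concrete with a small agglomerative-clustering sketch. This is only an illustration: it assumes average linkage on Euclidean distance between standardized profiles, which may not be the linkage behind the dendrogram, and the team names and profile values are hypothetical.

```python
from itertools import combinations
from math import dist

def average_linkage(profiles):
    """profiles: {name: tuple of standardized metrics}.
    Returns the merge history as (members_a, members_b, height) tuples."""
    clusters = [(name,) for name in profiles]
    merges = []

    def gap(pair):
        a, b = pair
        # Average pairwise Euclidean distance between two clusters.
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two closest clusters and record the height of the merge.
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((a, b, gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical two-metric profiles: T1 and T2 are near twins, T3 is strong
# but more extreme, and T4 is weak on both metrics.
toy_profiles = {
    "T1": (1.0, 1.0),
    "T2": (1.1, 0.9),
    "T3": (2.0, 2.0),
    "T4": (-1.0, -1.0),
}
merges = average_linkage(toy_profiles)
for members_a, members_b, height in merges:
    print(members_a, "+", members_b, "at height", round(height, 3))
```

In this toy example T3 ends up in the same family as T1 and T2, but it joins at a greater height than the T1-T2 merge. That is the pattern described above for the Dodgers: same family, later join.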

Houston and Cleveland as development stories

Houston and Cleveland are also revealing.

The Astros cluster with the elite organizations, but their full-period story contains more transformation than the Dodgers’ story. Houston’s early-2010s pitching collapse and late-2010s pitching rise are both part of the same 25-year average. Even with those bad seasons included, Houston still clusters with the strongest pitching organizations. That says something about how powerful the later Houston pitching model became.

Cleveland’s placement also makes sense. The Guardians are not always discussed like the Dodgers or Yankees because they operate with a different market profile, but the data places them in the same broad pitching family. Their identity is built around development, strike-zone control, and repeatable pitching skill.

Tampa Bay, Milwaukee, and the modern middle

The Rays and Brewers are especially interesting because they appear in the modern peripheral group rather than the elite skill-and-value group.

That is not an insult. It may actually be the more interesting result.

Tampa Bay and Milwaukee often represent modern pitching creativity: bullpen flexibility, role adaptation, pitcher development, and tactical staff construction. But over the full 25-year period, the dendrogram places them closer to San Diego, Toronto, and Arizona than to the Dodgers or Yankees.

That suggests a distinction between modernity and dominance. A team can be tactically modern without producing the same long-term value profile as the very top organizations.

What the dendrogram adds

The dendrogram adds something that a leaderboard cannot.

A leaderboard says:

The Dodgers were first.
The Yankees were second.
The Astros, Guardians, Cubs, Red Sox, Phillies, and Braves followed.

The dendrogram says something different:

The Dodgers, Yankees, Astros, Guardians, Cubs, Red Sox, and Phillies belong to a shared skill-and-value family.

The Braves, Giants, and Cardinals form a separate contact-control and run-prevention family.

The Rays, Brewers, Padres, Blue Jays, and Diamondbacks form a modern peripheral family.

The Rockies, Royals, Orioles, Reds, and Rangers share a long-term struggling profile.

That is the value of clustering. It changes the question from “who was better?” to “who was built alike?”

Conclusion

The central lesson is that pitching identity has shape.

Some organizations built strong staffs across nearly every modern indicator. Some built staffs that prevented runs through contact management and home-run control. Some had modern peripheral strengths without the same long-term value profile. Some struggled across nearly every measure.

The dendrogram does not replace the chapter’s rankings. It deepens them.

From 2001 to 2025, the best pitching organizations were not merely collecting arms. They were building systems. The Dodgers built the most consistent system. The Yankees, Astros, Guardians, Phillies, Cubs, and Red Sox clustered near them because they also produced strong long-term skill profiles. The Braves, Giants, and Cardinals showed a different path, one built more around run prevention and contact control.

The larger point is simple: team pitching is not just performance. It is identity.

And the dendrogram shows us how identity takes shape.

Pitching in the Age of the Strikeout: MLB Team Pitching, 2001-2025

Pitching is one of the most difficult parts of baseball to measure because it sits at the intersection of many things. A pitcher controls the ball, but not everything that happens after contact. A defense turns balls in play into outs, or fails to. A park changes the meaning of a fly ball. A league environment changes the meaning of a 4.00 ERA. A bullpen changes the way we understand a starter. A front office changes the way we understand a pitching staff.

That is why a long-term team pitching study has to be deliberate and careful. If we simply rank every team from 2001 to 2025 by ERA, we are mixing together very different run environments. A 3.70 team ERA in one season does not mean exactly the same thing as a 3.70 team ERA in another. The offensive environment changes. The baseball changes. Strikeout rates change. Bullpen usage changes. Even the definition of a normal starting pitcher changes.

So the goal of this chapter is not merely to ask which team had the lowest ERA. The better question is this: which organizations consistently produced strong pitching staffs relative to their own era?

That is a critical distinction. A team does not pitch in the abstract. It pitches in a particular season, against a particular league, with a particular baseball, inside a particular tactical environment. The 2001 Diamondbacks, the 2011 Phillies, the 2017 Guardians, the 2018 Astros, and the 2024 Braves all belong to the same broad story, but they do not belong to the same pitching world.

The data in this study covers MLB team pitching from 2001 through 2025. The core variables include ERA, ERA-, FIP, FIP-, xFIP, xFIP-, SIERA, WAR, K%, BB%, K-BB%, HR/9, HR/FB, complete games, quality starts, and several contact-profile measures. For 2001, xFIP, SIERA, and detailed contact data are incomplete, so the 2001 season is included in the main study, with its composite score averaged over only the components that are available.

The central finding is straightforward: from 2001 to 2025, team pitching moved decisively toward strikeout-based run prevention. The best organizations were not simply the ones that prevented runs in a given season. They were the ones who repeatedly built staffs with strong strikeout-minus-walk rates, strong fielding-independent indicators, and enough depth to remain competitive across changing offensive environments.

The Dodgers stand out most clearly. Over the full 25-year period, they were the strongest pitching organization by average normalized pitching score. The Yankees, Astros, Guardians, Cubs, Red Sox, Phillies, and Braves also appear near the top. But the Dodgers are the outlier, not because of one spectacular season, but because of repeated organizational excellence.

Method: comparing teams within seasons

The first methodological problem is that pitching statistics are unstable over time. A league-average pitching staff in 2001 did not look like a league-average pitching staff in 2025. Strikeouts increased. Complete games declined. Velocity rose. Bullpen usage expanded. Home-run rates surged and retreated. If we compare raw numbers across all years, we risk confusing historical context with team quality.

To solve this, each team-season was compared only to the other teams from the same season. In other words, the 2011 Phillies were compared with the 2011 league, the 2018 Astros with the 2018 league, and the 2024 Braves with the 2024 league. This allows us to ask which staffs were exceptional relative to the environment in which they actually pitched.

The basic within-season z-score is:

z_y(X_{i,y}) = \frac{ X_{i,y} - \overline{X}_y }{ s_y(X) }

Here, $X_{i,y}$ is a statistic for team $i$ in season $y$, $\overline{X}_y$ is the league average for that statistic in that season, and $s_y(X)$ is the standard deviation across teams in that season.

For statistics where higher is better, such as WAR and K-BB%, the z-score is used directly. For statistics where lower is better, such as ERA-, FIP-, xFIP-, and SIERA, the sign is reversed. This keeps the interpretation consistent. A higher score always means better pitching.

q_{i,y,m} = \begin{cases} z_y(m_{i,y}), & \text{if higher values are better} \\ -z_y(m_{i,y}), & \text{if lower values are better} \end{cases}

The overall pitching score is then the average of the available component scores:

\text{Pitching Score}_{i,y} = \frac{1}{|M_{i,y}|} \sum_{m \in M_{i,y}} q_{i,y,m}

The full metric set, of which $M_{i,y}$ is the subset available for team $i$ in season $y$, is:

M = \left\{ \text{WAR}, \text{K-BB\%}, \text{ERA-}, \text{FIP-}, \text{xFIP-}, \text{SIERA} \right\}

This score is not meant to be the only possible definition of pitching quality. It is a deliberately balanced measure. It includes actual run prevention through ERA-, fielding-independent and estimator-based performance through FIP-, xFIP-, and SIERA, skill-based dominance through K-BB%, and overall value through WAR.
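The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the exact code behind the study: it assumes the sample standard deviation (the text does not say which form was used), and the team labels and values are hypothetical.

```python
from statistics import mean, stdev

# +1 if higher values are better, -1 if lower values are better.
DIRECTION = {"WAR": 1, "K-BB%": 1, "ERA-": -1, "FIP-": -1, "xFIP-": -1, "SIERA": -1}

def pitching_scores(season_rows):
    """season_rows: {team: {metric: value or None}} for ONE season.
    Returns {team: composite score}, averaging only the available components."""
    components = {team: [] for team in season_rows}
    for metric, sign in DIRECTION.items():
        # Keep only the teams that have this metric in this season.
        observed = {team: row[metric] for team, row in season_rows.items()
                    if row.get(metric) is not None}
        if len(observed) < 2:
            continue  # a metric with fewer than two values cannot be standardized
        mu = mean(observed.values())
        sd = stdev(observed.values())  # assumption: sample standard deviation
        if sd == 0:
            continue
        for team, value in observed.items():
            components[team].append(sign * (value - mu) / sd)
    return {team: mean(qs) for team, qs in components.items() if qs}

# Hypothetical three-team season with xFIP- and SIERA missing, as in 2001.
toy_season = {
    "AAA": {"WAR": 25.0, "K-BB%": 16.0, "ERA-": 85.0, "FIP-": 88.0, "xFIP-": None, "SIERA": None},
    "BBB": {"WAR": 15.0, "K-BB%": 12.0, "ERA-": 100.0, "FIP-": 100.0, "xFIP-": None, "SIERA": None},
    "CCC": {"WAR": 5.0, "K-BB%": 8.0, "ERA-": 115.0, "FIP-": 112.0, "xFIP-": None, "SIERA": None},
}
scores = pitching_scores(toy_season)
print(scores)
```

Because the missing metrics are skipped rather than filled in, a season with incomplete data is scored on four components instead of six, and a higher score still always means better pitching.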

K-BB% is especially important because it captures the two plate appearance outcomes most directly controlled by the pitcher: strikeouts and walks.

\text{K-BB\%} = \text{K\%} - \text{BB\%}

FIP also deserves special attention because it attempts to isolate the events most directly connected to the pitcher: home runs, walks, hit batters, and strikeouts.

\text{FIP} = \frac{ 13 \cdot \text{HR} + 3 \cdot (\text{BB} + \text{HBP}) - 2 \cdot \text{K} }{ \text{IP} } + c_{\text{FIP}}

The constant $c_{\text{FIP}}$ places FIP on an ERA-like scale. Because the run environment changes by season, FIP- and other indexed measures help compare teams more fairly.
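As a rough sketch, the FIP formula translates directly into code. The constant is computed here with the common convention of setting league FIP equal to league ERA, which is an assumption on my part rather than something stated above. All totals are hypothetical, and innings are assumed to be true decimal innings rather than box-score ".1/.2" notation.

```python
def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """League constant that puts FIP on the ERA scale (assumed convention)."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip

def fip(hr, bb, hbp, k, ip, c_fip):
    """Fielding Independent Pitching, following the formula in the text."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + c_fip

# Hypothetical league totals and a hypothetical team line.
c_fip = fip_constant(lg_era=4.20, lg_hr=5000, lg_bb=15000,
                     lg_hbp=2000, lg_k=40000, lg_ip=43000)
league_fip = fip(5000, 15000, 2000, 40000, 43000, c_fip)
team_fip = fip(hr=150, bb=450, hbp=60, k=1500, ip=1440, c_fip=c_fip)
print(round(c_fip, 3), round(league_fip, 2), round(team_fip, 2))
```

With the constant defined this way, league FIP matches league ERA by construction, so a team FIP can be read on the same scale as ERA for that season.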

The league changes: strikeouts become the center of pitching

The first figure shows the most important league-wide transformation: the rise of strikeouts.

Figure 1. K%, BB%, and K-BB% trends, 2001-2025

In 2001, the league strikeout rate was about 17.3%. By 2025, it was about 22.2%. That is not a small tactical adjustment. It is a structural change in how pitching works. The modern pitching staff is built around missing bats in a way that the early 2000s staff was not.

Walk rate did not change nearly as dramatically. In 2001, the league walk rate was about 8.4%. In 2025, it was about 8.4% again. There were fluctuations in between, but the broad pattern is clear: strikeouts rose much more than walks did.

That means K-BB% increased substantially. In 2001, league K-BB% was about 8.9%. In 2025, it was about 13.8%. That is the heart of the modern pitching revolution. The best staffs are not just striking out more hitters. They are increasing the gap between strikeouts and walks.
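As a quick check on the rounded figures above, the K-BB% values follow directly from the definition $\text{K-BB\%} = \text{K\%} - \text{BB\%}$:

\text{K-BB\%}_{2001} \approx 17.3\% - 8.4\% = 8.9\%, \qquad \text{K-BB\%}_{2025} \approx 22.2\% - 8.4\% = 13.8\%

On these rounded numbers, the 4.9-point gain in K-BB% matches the 4.9-point rise in strikeout rate, because the walk rate ends where it started.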

This is why K-BB% belongs near the center of the composite score. It is simple, but powerful. It strips pitching down to a basic contest: can the staff create strikeouts without giving back too many free baserunners?

The answer, increasingly, is yes. But not all teams answered equally well.

Run prevention, FIP, and the changing meaning of ERA

The second figure compares ERA, FIP, xFIP, and SIERA over time. Note that some of the series nearly coincide, so their lines overlap in the figure.

Figure 2. ERA, FIP, xFIP, and SIERA trends, 2001-2025

One of the striking features of the long-term data is that ERA does not move in one simple direction. The league ERA was about 4.42 in 2001. It fell to about 3.74 in 2014, rose to about 4.51 in 2019, and then settled around 4.16 in 2025.

That pattern matters because it reminds us that pitching quality cannot be evaluated solely by raw ERA. A team can have a lower ERA because it is genuinely better, but it can also have a lower ERA because the entire league is scoring less. Likewise, a higher ERA in a high-offense environment may not be as bad as it looks.

The 2014 season is a useful example. League run prevention was strong. ERA, FIP, xFIP, and SIERA all sat at relatively low levels. A good pitching staff in 2014 had to be judged against that lower-scoring context. The opposite problem appears in 2019, when the home-run environment pushed run scoring upward. A team that survived 2019 with strong FIP-based indicators deserves credit, given the more difficult environment.

That is why indexed statistics such as ERA- and FIP- are valuable. They tell us how a staff performed relative to league average, where lower is better.

\text{ERA-} = 100 \cdot \frac{ \text{Team ERA} }{ \text{League ERA} } \quad \text{adjusted for context}

A team with an ERA- of 90 was roughly 10% better than league average by that measure. A team with an ERA- of 110 was roughly 10% worse. The same logic applies to FIP-, except the foundation is fielding-independent pitching rather than actual runs allowed.
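A small sketch shows why the index matters. It uses only the unadjusted ratio and deliberately ignores the park and context adjustments, so the outputs are illustrative rather than true ERA- values. The league ERAs are the 2014 and 2019 figures quoted above, and the 3.70 team ERA is a hypothetical.

```python
def era_minus(team_era, league_era):
    """Unadjusted index: 100 is league average, lower is better.
    Assumption: no park or context adjustment is applied here."""
    return 100 * team_era / league_era

same_era = 3.70  # hypothetical team ERA, identical in both seasons
in_2014 = era_minus(same_era, 3.74)  # roughly 98.9, about league average
in_2019 = era_minus(same_era, 4.51)  # roughly 82.0, well above average
print(round(in_2014, 1), round(in_2019, 1))
```

The same raw ERA is close to league average in the low-scoring 2014 environment and far better than average in the high-scoring 2019 environment.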

This distinction becomes crucial when comparing staffs across 25 seasons. The 2011 Phillies and the 2018 Astros both appear as historically great staffs, but they do not look great in exactly the same way. The Phillies represent a more traditional elite rotation model. The Astros represent the modern strikeout, command, and run-prevention model.

The home-run environment

The third figure shows the home-run environment.

Figure 3. HR/9 and HR/FB trends, 2001-2025

Home runs are one of the most important pressure points in modern pitching analysis because they connect individual pitcher skill, batted-ball profile, park context, and league environment. In 2001, league HR/9 was about 1.14. It dipped below 1.00 in several seasons, including 2010 and 2014, then spiked dramatically in 2019, reaching about 1.41 HR/9.

The 2019 season stands out immediately. It was not merely a season with more scoring. It was a season in which the relationship between contact and damage changed. HR/FB also rose sharply, reaching about 15.3% in 2019. That created a very different environment for pitchers.

This is one reason xFIP can be useful. FIP uses actual home runs allowed. xFIP estimates performance by normalizing home-run rate relative to fly balls. Neither statistic is perfect. FIP gives pitchers the actual cost of the home runs they allowed. xFIP asks whether that home-run rate was likely to persist.

For a team-level study, the difference between FIP and xFIP can be revealing. A team with a much better FIP than xFIP may have suppressed home runs unusually well, perhaps through park effects, pitcher skill, batted-ball management, or some combination of these. A team with a much worse FIP than xFIP may have been punished by an elevated home-run rate.
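The FIP-versus-xFIP gap can be sketched as follows. The text does not give an xFIP formula, so this uses the widely cited version that swaps actual home runs for fly balls multiplied by the league HR/FB rate. Treat that formula, the constant, and every number below as assumptions for illustration.

```python
def fip(hr, bb, hbp, k, ip, c_fip):
    """FIP as defined earlier in the chapter."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + c_fip

def xfip(fly_balls, bb, hbp, k, ip, lg_hr_per_fb, c_fip):
    """xFIP sketch: expected home runs replace actual home runs (assumed formula)."""
    expected_hr = fly_balls * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + c_fip

# Hypothetical staff: 150 HR allowed on 1,500 fly balls in a 12% HR/FB league.
C_FIP = 3.20  # hypothetical constant
team_fip = fip(hr=150, bb=450, hbp=60, k=1500, ip=1440, c_fip=C_FIP)
team_xfip = xfip(fly_balls=1500, bb=450, hbp=60, k=1500, ip=1440,
                 lg_hr_per_fb=0.12, c_fip=C_FIP)
gap = team_fip - team_xfip  # negative: fewer home runs allowed than expected
print(round(team_fip, 2), round(team_xfip, 2), round(gap, 2))
```

The gap reduces to 13 times the difference between actual and expected home runs, divided by innings, so a negative gap flags a staff that allowed fewer home runs than its fly-ball total would predict.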

The home-run environment also helps explain why season-normalization is necessary. A team pitching in 2019 faced a different kind of run-prevention problem than a team pitching in 2014. The raw numbers alone cannot tell us whether a staff was good. They have to be interpreted against the league context.

The disappearance of the complete game

The fourth figure captures one of the clearest tactical changes in baseball.

Figure 4. Complete games and quality starts, 2001-2025

In 2001, MLB teams combined for 199 complete games. In 2002, that number was 214. By 2025, it had fallen to 29.

This is not a gradual stylistic preference. It is a transformation in pitcher usage. The complete game went from a normal, if still special, part of pitching to a rare event. The starting pitcher’s job changed. The bullpen’s job changed. The manager’s job changed. The entire architecture of run prevention changed.

Quality starts also declined. In 2001, teams combined for 2,342 quality starts. In 2025, that number was 1,676. Unlike complete games, quality starts did not nearly vanish; they simply became less central to how team pitching is organized.

This matters because a traditional pitching staff was often understood through the front of the rotation. The ace mattered. The number two starter mattered. The innings-eater mattered. In the modern game, those categories still matter, but they are less complete descriptions of team pitching quality. A staff can be excellent because it has dominant starters, but it can also be excellent because it has a deep bullpen, matchup flexibility, velocity, strikeout depth, and player-development infrastructure.

This is one reason team-level pitching analysis is valuable. It captures the staff as an organization, not merely as a list of starting pitchers.

The organizational scoreboard

The franchise-level results show which organizations repeatedly built strong pitching staffs across the full period.

Figure 5. Franchise average pitching score, 2001-2025

The top organizations by average normalized pitching score were:

Rank	Franchise	Avg Score	Avg Rank	Top-5 Seasons
1	LAD	1.043	5.80	16
2	NYY	0.785	8.12	10
3	HOU	0.470	11.20	8
4	CLE	0.447	12.16	9
5	CHC	0.363	12.80	6
6	BOS	0.361	12.44	6
7	PHI	0.356	11.88	6
8	ATL	0.344	12.60	7

The Dodgers are the clear leader. Their average rank was 5.80 across 25 seasons, and they finished in the top five 16 times. That is a remarkable level of consistency.
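The three summary columns in the table can be derived from season-level scores along these lines. This is a sketch with hypothetical data. It assumes teams are ranked within each season by pitching score, highest first, with ties broken arbitrarily.

```python
from collections import defaultdict
from statistics import mean

def franchise_summary(scores, top_n=5):
    """scores: {(season, team): pitching score}.
    Returns {team: {"avg_score", "avg_rank", "top_n_seasons"}}."""
    by_season = defaultdict(dict)
    team_scores = defaultdict(list)
    for (season, team), score in scores.items():
        by_season[season][team] = score
        team_scores[team].append(score)

    ranks = defaultdict(list)
    for season_scores in by_season.values():
        # Rank 1 is the best score in that season.
        ordered = sorted(season_scores, key=season_scores.get, reverse=True)
        for position, team in enumerate(ordered, start=1):
            ranks[team].append(position)

    return {
        team: {
            "avg_score": mean(team_scores[team]),
            "avg_rank": mean(ranks[team]),
            "top_n_seasons": sum(r <= top_n for r in ranks[team]),
        }
        for team in ranks
    }

# Hypothetical two-season, three-team league; top_n=1 keeps the toy readable.
toy = {
    (2001, "AAA"): 1.2, (2001, "BBB"): 0.1, (2001, "CCC"): -1.3,
    (2002, "AAA"): 0.4, (2002, "BBB"): 0.9, (2002, "CCC"): -1.3,
}
summary = franchise_summary(toy, top_n=1)
print(summary["AAA"])
```

Average score and average rank answer slightly different questions: the first rewards the size of the margin over the league, while the second only rewards finishing ahead of other teams.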

The Yankees also stand out. They were not as dominant as the Dodgers by average score, but they were consistently strong. Their average rank was 8.12, and they had 10 top-five seasons.

Houston’s position is interesting because the Astros’ 25-year period includes both very bad years and elite years. Their full-period average ranks third, but that average hides a sharp organizational transformation. The 2013 Astros appear among the worst pitching seasons in the dataset. The 2018 and 2019 Astros appear among the strongest. That makes Houston one of the most dramatic before-and-after stories in the study.

Cleveland also deserves attention. The Guardians were not merely good in one season. They produced nine top-five seasons across the full period, including the remarkable 2017 staff and the shortened-season 2020 staff. Cleveland’s results point toward a consistent ability to develop or acquire pitching skill, especially strikeout and command skill.

The Phillies are different. Their full-period average is strong, but their story is anchored by the 2011 staff, the top single-season result in the study. The Phillies’ score is not just about consistency. It is about peak excellence.

The best team pitching seasons

The best individual team pitching seasons in the study were:

Figure 6. Best team pitching seasons, 2001-2025

Rank	Season	Team	Score	WAR	ERA-	FIP-	K-BB%
1	2011	PHI	2.472	29.45	78.58	82.88	14.75%
2	2017	CLE	2.464	30.35	72.49	75.44	20.59%
3	2018	HOU	2.442	28.63	75.89	78.23	21.17%
4	2013	DET	2.089	26.26	89.31	82.13	15.79%
5	2024	ATL	2.059	23.62	84.13	86.61	18.47%

The top three are especially revealing because they show three different versions of elite pitching.

The 2011 Phillies represent the great traditional staff. Their rotation was the center of the story. Their ERA- was 78.58, meaning they were far better than league average at preventing runs. Their FIP- was also excellent at 82.88. They were not merely outperforming their peripherals. They were genuinely strong across the major indicators.

The 2017 Guardians look like a bridge between traditional excellence and modern dominance. Their ERA- was 72.49, the best among the top five listed here, and their FIP- was 75.44. Their K-BB% was 20.59%, which is extraordinary. This is a staff that combined run prevention, fielding-independent strength, and strikeout-minus-walk dominance.

The 2018 Astros are the modern model. Their K-BB% was 21.17%, the highest among these top five. Their ERA- and FIP- were both outstanding. They did not merely prevent runs. They controlled the plate appearance.

That phrase may be the key to the whole chapter: the modern elite staff controls the plate appearance. It wins by turning fewer balls into uncertain events. More strikeouts. Fewer walks. Better home-run control. Better matchup deployment. Better depth.

The 2013 Tigers are also fascinating. Their FIP- was much stronger than their ERA-, which suggests a staff whose fielding-independent indicators were better than its actual run prevention. That kind of gap is analytically useful because it may point toward defense, sequencing, bullpen leakage, park effects, or simple variation.

The 2024 Braves round out the top five, showing that the modern model remains alive. Strong WAR, strong ERA-, strong FIP-, and excellent K-BB% place them among the best team pitching seasons of the last 25 years.

The heat map view: organizational memory

The heat map shows how pitching strength is distributed over time.

Figure 7. Team pitching score heat map, 2001-2025

A heat map is useful because it shows continuity. A table gives us leaders. A heat map gives us memory.

The Dodgers’ consistency becomes visible immediately. They do not merely spike and disappear. They remain strong across many different league environments. This suggests that their pitching success is not just the product of one rotation or one era. It is organizational.

The Yankees also show long-term strength, although with a different shape. They remain regularly above average, but the Dodgers’ top-end consistency is stronger.

Houston’s pattern is more dramatic. The Astros transitioned from poor pitching during the rebuilding years to elite pitching in the late 2010s and beyond. This makes them one of the best examples of organizational reinvention in the dataset.

Cleveland’s pattern is also compelling. The Guardians do not always (understatement) have the resources of the largest-market teams, but the pitching results are consistently strong enough to suggest a real developmental identity. Cleveland’s peak seasons are not accidents.

Tampa Bay deserves separate attention as well. The Rays do not rank at the very top over the full period, but their modern pitching identity is clear. They are one of the organizations most associated with bullpen creativity, opener usage, and flexible staff construction. A team-level study captures some of that, although a starter-reliever split would make the story even sharper.

While the heat map shows the full league, a smaller set of franchise trajectories makes the organizational story easier to see. Figure 8 follows several teams that help define the period: the Dodgers, Yankees, Astros, Guardians, Rays, Phillies, and Braves. The Dodgers show sustained excellence. Houston shows dramatic organizational reinvention. Cleveland and Tampa Bay show the value of pitching development and tactical adaptation. Philadelphia shows the difference between peak rotation dominance and long-term consistency.

Figure 8. Selected franchises, 2001-2025

Three eras of team pitching

Breaking the study into periods helps clarify the historical movement.

From 2001 to 2009, the leading organizations were:

Period	Rank	Franchise	Avg Score	Avg Rank	Top-5 Seasons
2001-2009	1	CHC	0.980	6.89	4
2001-2009	2	LAD	0.840	7.33	5
2001-2009	3	ARI	0.805	8.78	5
2001-2009	4	BOS	0.791	8.22	4
2001-2009	5	NYY	0.733	9.00	4

The early period is more rotation-centered. The Cubs, Dodgers, Diamondbacks, Red Sox, and Yankees all had strong stretches. This era still belongs partly to the older model of staff construction. Starting pitching carries more of the symbolic weight. Complete games are declining, but they have not yet collapsed to modern levels.

From 2010 to 2019, the leaders were:

Period	Rank	Franchise	Avg Score	Avg Rank	Top-5 Seasons
2010-2019	1	LAD	1.202	4.40	7
2010-2019	2	NYY	0.925	6.70	4
2010-2019	3	WSN	0.733	9.20	3
2010-2019	4	CLE	0.709	10.20	6
2010-2019	5	TBR	0.687	8.60	2

This is the period when the modern pitching environment becomes much clearer. Strikeouts rise. Velocity rises. Bullpen roles become more specialized. Cleveland, Tampa Bay, Washington, and Los Angeles all become central parts of the story.

From 2020 to 2025, the leaders were:

Period	Rank	Franchise	Avg Score	Avg Rank	Top-5 Seasons
2020-2025	1	LAD	1.082	5.83	4
2020-2025	2	PHI	0.941	5.67	3
2020-2025	3	MIL	0.808	8.00	2
2020-2025	4	TBR	0.808	6.83	2
2020-2025	5	ATL	0.658	11.33	2

The modern period is especially interesting because it includes the shortened 2020 season, the post-2020 workload reset, and the continuing dominance of strikeout-based staff construction. The Dodgers remain first. The Phillies rise. The Brewers and Rays become central examples of modern pitching development and staff management. The Braves also emerge strongly, especially with the 2024 season.

The worst seasons and the cost of weak pitching infrastructure

The worst team pitching seasons are just as revealing as the best ones.

At the bottom of the dataset are seasons such as the 2025 Rockies, 2006 Royals, 2023 Athletics, 2024 Rockies, and 2013 Astros. These seasons combine weak WAR, poor run prevention, poor FIP-based indicators, and low K-BB%.

The 2025 Rockies had a pitching score of -2.757, with a 125.19 ERA-, 119.75 FIP-, and only an 8.47% K-BB%. The 2006 Royals were similarly poor, with a 124.43 ERA-, 118.20 FIP-, and a 4.15% K-BB%. The 2023 Athletics had a 132.94 ERA-, 122.15 FIP-, and 9.49% K-BB%.

These are not merely bad ERAs. They are broad staff failures. When a team is poor in both run prevention and fielding-independent indicators, the problem is deeper than sequencing or defense. It suggests that the staff is not controlling the strike zone, not limiting damaging contact enough, and not producing enough value.

The 2013 Astros are especially important because they later became one of the strongest pitching organizations in the study. That contrast gives us a natural case study in organizational transformation. Bad pitching staffs do not have to remain bad forever. But the transformation requires more than one good pitcher. It requires a system.

What the study suggests

This first pass suggests several conclusions.

First, team pitching from 2001 to 2025 became increasingly strikeout-centered. The rise in K% and K-BB% is the central statistical movement of the period. It changed what good pitching looks like.

Second, raw ERA is not enough for a long-term study. ERA remains important because runs allowed are real. But ERA must be placed next to FIP, xFIP, SIERA, K-BB%, and indexed measures such as ERA- and FIP-. Otherwise, we risk mistaking league environment for team quality.

Third, complete games and traditional starter workload declined dramatically. This changes how we should think about team pitching. A great staff is no longer just a great rotation. It is a complete run-prevention system.

Fourth, the Dodgers are the strongest pitching organization of the 2001-2025 period. Their dominance is not just peak dominance. It is consistency. They averaged a top-six pitching rank across 25 seasons and finished in the top five 16 times.

Fifth, several organizations deserve deeper case studies. The Astros show organizational reinvention. The Guardians show player-development strength. The Rays show tactical creativity. The Phillies show the power of peak rotation excellence. The Braves show modern staff strength. The Yankees show long-term high-level stability.

Conclusion: pitching as organizational identity

The most important lesson from this study is that pitching is no longer best understood as a collection of individual arms. At the team level, pitching has become an organizational identity.

The best teams do not merely find pitchers. They shape pitching environments. They develop velocity. They manage workloads. They build bullpens. They optimize matchups. They control the strike zone. They use data to turn raw stuff into repeatable advantage.

That is why the Dodgers’ long-term record matters. It is not just that they had good pitchers. Many teams have good pitchers for a year or two. The Dodgers repeatedly built strong pitching staffs across different run environments, tactical eras, and roster cycles.

The same broader lesson applies to Houston, Cleveland, Tampa Bay, Milwaukee, Atlanta, Philadelphia, and New York. The details differ, but the underlying pattern is the same. Modern pitching excellence is systemic.

From 2001 to 2025, baseball shifted toward a game in which the best staffs increasingly controlled plate appearances. Strikeouts rose. Walks became more costly. Home runs reshaped risk. Complete games disappeared. Bullpens expanded. The old image of pitching as one starter carrying a game into the ninth inning gave way to something more distributed, more specialized, and more organizational.

The great pitching staffs of this period are therefore not just statistical outliers. They are historical markers. They show how the game changed, and how the smartest organizations changed with it.

The Shape of Defense: What MLB Fielding Metrics Tell Us So Far This Season

Defense is the hardest part of baseball to measure cleanly. I thought it might be interesting to study all 30 MLB teams through the end of June 2026.

Hitting leaves a visible trail. A batter walks, strikes out, singles, doubles, homers, or makes an out. Pitching is more complicated, but it still has a fairly direct statistical language. Strikeouts, walks, home runs, velocity, chase rate, and contact quality all point in recognizable directions. Defense is different. Good defense often appears as absence. The ball that does not fall. The extra base that is not taken. The throw that does not need to be dramatic because the fielder got to the ball early enough.

That makes defensive analysis both frustrating and interesting. One number rarely tells the whole story. Fielding percentage tells us whether a team usually completes the plays it reaches, but it says little about how many plays it reaches in the first place. Errors measure visible mistakes, but not invisible range. Modern metrics try to correct for that. Defensive Runs Saved, Outs Above Average, Fielding Run Value, FanGraphs Def, framing value, arm value, and range value each capture a different part of the defensive picture.

For this study, I used three FanGraphs team defensive leaderboards. The goal was not simply to rank teams. The better question is how the different defensive systems agree, where they disagree, and what kind of defense each team is actually playing.

The main conclusion is clear: so far this season, the Cubs are the strongest defensive team in baseball by a wide margin. But the deeper conclusion is more interesting. OAA, FRV, and FanGraphs Def are telling very similar stories. DRS is related to those measures, but it is not identical. Traditional fielding percentage has some relationship to defensive value, but not nearly enough to stand on its own.

Defense, in other words, is not one thing.

Building a Composite Defensive Score

To compare teams across multiple defensive systems, I created a composite z-score using four broad measures:

Defensive Runs Saved, or DRS
Outs Above Average, or OAA
Fielding Run Value, or FRV
FanGraphs Def

Each metric was standardized across the 30 MLB teams. The z-score for team $i$ on metric $m$ is:

z_{i,m} = \frac{x_{i,m} - \mu_m}{\sigma_m}

where $x_{i,m}$ is team $i$’s value on metric $m$,

$\mu_m$ is the league average for that metric,

and $\sigma_m$ is the standard deviation across teams.

The composite defensive score is then:

D_i = \frac{ z_{i,\mathrm{DRS}} + z_{i,\mathrm{OAA}} + z_{i,\mathrm{FRV}} + z_{i,\mathrm{Def}} }{4}

This score does not claim that all four metrics are perfect, or that they rest on the same assumptions. It is simply a way to ask a practical question: which teams look good across several major defensive metrics simultaneously?
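The standardize-then-average step can be sketched in a few lines of Python. The four-team league and its numbers below are hypothetical, and the use of the population standard deviation is an assumption; the sample version would rescale every z-score by the same factor without changing the order of the teams.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD: an assumption, not confirmed by the source
    return [(v - mu) / sigma for v in values]

def composite_scores(table):
    """table maps metric name -> list of team values, all in the same team order.
    Returns each team's average z-score across the metrics."""
    per_metric = [z_scores(vals) for vals in table.values()]
    return [mean(team_z) for team_z in zip(*per_metric)]

# Hypothetical four-team league, for illustration only (not the real leaderboard).
teams = ["A", "B", "C", "D"]
toy = {
    "DRS": [20, 5, -5, -20],
    "OAA": [12, 4, -6, -10],
    "FRV": [10, 6, -4, -12],
    "Def": [11.0, 3.5, -2.5, -12.0],
}

for team, d in zip(teams, composite_scores(toy)):
    print(f"{team}: {d:+.2f}")
```

Because every metric is put on the same scale before averaging, a metric with a wide raw spread such as DRS cannot dominate the composite just by having larger numbers.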

The top of the list is not subtle.

Rank	Team	Composite Z	DRS	OAA	FRV	Def
1	CHC	2.41	57	38	34	34.06
2	LAD	1.73	61	23	20	21.77
3	BOS	1.32	42	18	17	19.20
4	ARI	1.30	25	23	23	19.08
5	SDP	1.08	17	16	22	19.61
6	STL	0.89	18	16	20	10.75

The Cubs are not merely first. They are first by a lot. Their composite z-score is 2.41, meaning that across the four measures they sit, on average, about 2.4 standard deviations above the league mean. The Dodgers are also excellent, but they are closer to the next group than they are to Chicago.

At the other end of the chart, the weakest defensive teams are also fairly clear.

Team	Composite Z	DRS	OAA	FRV	Def
MIN	-1.50	-32	-19	-19	-17.35
SEA	-1.36	-2	-27	-22	-18.83
LAA	-1.24	-3	-18	-23	-19.03
PHI	-1.08	-29	-18	-9	-7.46
ATH	-1.08	-4	-19	-16	-15.51

Minnesota rates last by the composite score. Seattle and the Angels are also deep in negative territory, though they arrive there in slightly different ways. Minnesota is hurt badly by DRS and modern range-based measures. Seattle is particularly poor by OAA and FRV standards. The Angels are near the bottom in FRV and FanGraphs Def.

The first lesson is that the defensive standings have a clear shape. Chicago is alone at the top. Los Angeles leads the next tier, which also includes Boston, Arizona, San Diego, and St. Louis. At the bottom, Minnesota, Seattle, the Angels, Philadelphia, and the Athletics form the weakest group.

But rankings are only the beginning.

OAA and FRV Mostly Agree

The tightest relationship in the study is between Outs Above Average and Fielding Run Value.

The regression equation is:

\mathrm{FRV}_i = 0.83 \cdot \mathrm{OAA}_i + 0.77

with:

R^2 = 0.819

That is a strong relationship. This means that about 82% of the variation in team FRV is explained by team OAA in this dataset.

That makes intuitive sense. OAA and FRV are closely related modern defensive concepts. Both are trying to move beyond errors and fielding percentage. Both are interested in actual plays made relative to expected plays. Both reward teams that turn difficult batted balls into outs.

The Cubs sit in the upper-right corner of the chart. They are not just good by one metric. They are extreme by both. Chicago has 38 OAA and 34 FRV. Arizona, Los Angeles, San Diego, Boston, and St. Louis also occupy the positive area of the chart. At the bottom, Seattle, the Angels, Minnesota, and the Athletics cluster in negative territory.

This is useful because it gives confidence. If two modern metrics with related but not identical constructions point in the same direction, the result is more persuasive. The Cubs are not a leaderboard accident. Their defensive advantage appears in multiple systems.
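To see how the fitted line behaves for individual teams, the Python sketch below plugs each listed team's OAA into the published equation and reports the gap between actual and predicted FRV. The coefficients and the (OAA, FRV) pairs come from the text and tables above; only the eleven teams shown there are included, and the function names are illustrative.

```python
# Published fit from the text: FRV = 0.83 * OAA + 0.77
SLOPE, INTERCEPT = 0.83, 0.77

# (OAA, FRV) pairs copied from the top and bottom tables above.
teams = {
    "CHC": (38, 34), "LAD": (23, 20), "BOS": (18, 17), "ARI": (23, 23),
    "SDP": (16, 22), "STL": (16, 20), "MIN": (-19, -19), "SEA": (-27, -22),
    "LAA": (-18, -23), "PHI": (-18, -9), "ATH": (-19, -16),
}

def predicted_frv(oaa):
    """FRV implied by the fitted line for a given OAA."""
    return SLOPE * oaa + INTERCEPT

def residuals(data):
    """Actual FRV minus predicted FRV for each team."""
    return {team: frv - predicted_frv(oaa) for team, (oaa, frv) in data.items()}

for team, resid in sorted(residuals(teams).items(), key=lambda kv: kv[1]):
    oaa, frv = teams[team]
    print(f"{team}: OAA {oaa:+d}, FRV {frv:+d}, "
          f"predicted {predicted_frv(oaa):+.2f}, residual {resid:+.2f}")
```

A positive residual means a team's FRV is higher than its OAA alone would suggest. For Chicago, the line predicts about 32.3 FRV against a published 34, so even the league's most extreme team sits close to the fit.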

DRS Tells a Related but Different Story

DRS also matters, but it does not align with FRV as tightly as OAA does.

The regression equation is:

\mathrm{FRV}_i = 0.43 \cdot \mathrm{DRS}_i - 3.98

with:

R^2 = 0.418

That is still a meaningful relationship, but it is much weaker than the OAA-FRV relationship. DRS and FRV are clearly not measuring the same thing in the same way.

This is where the study becomes more interesting. The Dodgers have the highest DRS total in the dataset, with 61, but they trail the Cubs in FRV, OAA, and Def. The Cubs have slightly lower DRS than Los Angeles, but they dominate in OAA and FRV. Tampa Bay is another example of disagreement. The Rays have a positive DRS total, but their FRV is negative. Philadelphia is negative in both, but much worse in DRS than FRV.

The correlation table reinforces the point. The correlations among the major modern measures are:

Pair	Correlation
OAA and FRV	0.90
FRV and Def	0.97
OAA and Def	0.94
DRS and OAA	0.65
DRS and FRV	0.65
DRS and Def	0.66

The formula for correlation is:

r_{XY} = \frac{ \sum_i (x_i - \bar{x})(y_i - \bar{y}) }{ \sqrt{\sum_i (x_i - \bar{x})^2} \sqrt{\sum_i (y_i - \bar{y})^2} }

This tells us that DRS belongs in the conversation, but it should not be treated as interchangeable with OAA or FRV. When DRS disagrees with the Statcast-style measures, that disagreement is not a nuisance. It is evidence that defensive measurement still depends on the assumptions built into each system.
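The correlation formula translates directly into Python. The sketch below is a plain implementation of that formula, followed by a consistency check: in a regression with a single predictor, $R^2$ equals the square of the correlation, so the rounded correlations in the table should roughly reproduce the $R^2$ values reported earlier. The function name and the small test vectors are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation, term for term as in the formula above."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Sanity check on made-up vectors: a perfect linear relationship.
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))

# Squared table correlations versus the reported R^2 values.
for pair, r, reported_r2 in [("OAA-FRV", 0.90, 0.819), ("DRS-FRV", 0.65, 0.418)]:
    print(f"{pair}: r^2 = {r ** 2:.3f}, reported R^2 = {reported_r2}")
```

The small gaps between the squared correlations and the reported $R^2$ values are what one would expect from rounding the correlations to two decimals.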

The Cubs Are Winning with Range

The component data explains why Chicago is separated from the rest of the league.

The Cubs have:

Component	Value
Range	31
Arm	7
Framing	-3
Blocking	0
Throwing	-2
FRV	34

That is the key to the whole study. Chicago is not leading because of framing. It is not leading because of blocking. It is leading because of range.

Range is the largest component in the dataset. The Cubs have a Range value of 31. The next highest teams are Arizona at 19, the Dodgers at 18, Boston at 16, and San Diego at 13. That gap is enormous. It suggests that Chicago is turning a large number of balls in play into outs that an average defense might not convert.

A simple component model is:

\mathrm{FRV}_i = \mathrm{Range}_i + \mathrm{Arm}_i + \mathrm{Framing}_i + \mathrm{Blocking}_i + \mathrm{Throwing}_i + \epsilon_i

The additional term at the end of the equation, $\epsilon_i$, is included because published component totals may not always sum exactly to the displayed total, whether because of rounding, classification, or metric construction. But the practical interpretation is still clear. In this dataset, Range is the dominant separating variable.
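The residual term is easy to check for Chicago. This short Python sketch sums the Cubs' published components from the table above and compares the result with the published FRV; the function name is illustrative.

```python
# Cubs component values and FRV total, copied from the table above.
cubs = {"Range": 31, "Arm": 7, "Framing": -3, "Blocking": 0, "Throwing": -2}
published_frv = 34

def component_residual(components, total):
    """Published total minus the sum of the listed components."""
    return total - sum(components.values())

component_sum = sum(cubs.values())
print("Sum of components:", component_sum)
print("Residual:", component_residual(cubs, published_frv))
print(f"Range as a share of FRV: {cubs['Range'] / published_frv:.0%}")
```

The components sum to 33 against a published total of 34, so the leftover is a single run, while Range alone accounts for roughly nine-tenths of the total.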

Arizona is similar to Chicago in shape, though not in magnitude. The Diamondbacks have 19 Range and 23 FRV. The Dodgers combine 18 Range with 6 Arm. Boston combines 16 Range, 3 Arm, and 2 Blocking. San Diego has a more balanced profile, with 13 Range, 4 Arm, and positive throwing value.

Toronto is the most interesting contrast. The Blue Jays have a strong FRV total of 12, but they do not get it from range. Their Range value is -2. Their Framing value is 13. Toronto is a reminder that not all good defenses are built the same way. Some teams prevent runs by reaching more balls. Others gain value through catcher receiving. A single defensive ranking can hide those differences.

Traditional Fielding Stats Still Miss the Heart of Defense

Fielding percentage remains familiar, but it is not enough.

The regression between fielding percentage and FanGraphs Def is:

\mathrm{Def}_i = 1958.5 \cdot \mathrm{FP}_i - 1930.6

with:

R^2 = 0.227

That means that fielding percentage explains only about 23% of the variation in FanGraphs Def in this dataset.

This should not be surprising. Fielding percentage is calculated as:

\mathrm{FP} = \frac{\mathrm{PO} + \mathrm{A}} {\mathrm{PO} + \mathrm{A} + \mathrm{E}}

Putouts and assists are important, as are errors. But this formula has a blind spot. It only evaluates plays that become official chances. It does not ask how many difficult balls were reached. It does not ask whether a fielder’s first step turned a hit into an out. It does not measure the value of positioning. It does not capture the difference between a clean single and a ball that a better defense would have converted into an out.
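The blind spot shows up clearly in a small example. In the Python sketch below, two hypothetical teams have identical fielding percentages even though one of them converted 70 more chances into assists; all of the totals are invented for illustration.

```python
def fielding_pct(po, a, e):
    """Fielding percentage: (PO + A) / (PO + A + E)."""
    chances = po + a + e
    if chances == 0:
        raise ValueError("no chances recorded")
    return (po + a) / chances

# Hypothetical season totals (not real teams).
team_x = fielding_pct(po=2000, a=800, e=40)  # fewer balls reached
team_y = fielding_pct(po=2000, a=870, e=41)  # 70 more assists, one more error

print(f"Team X: {team_x:.4f}")
print(f"Team Y: {team_y:.4f}")
```

Both teams come out at the same fielding percentage, so the formula cannot tell them apart. Whatever extra plays Team Y made are invisible to it.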

This is why a team can have a decent fielding percentage and still grade poorly by modern defensive metrics. Avoiding errors is not the same thing as preventing hits. Completing routine plays is not the same thing as creating outs.

The traditional fielding formula is useful, but incomplete. It measures reliability on the chances a fielder actually reaches. Modern defensive metrics try to measure territory, difficulty, and run value.

What the Rankings Mean

The strongest defensive teams so far fall into three groups.

The first group is Chicago alone. The Cubs are the best team in the study because they combine elite OAA, elite FRV, elite FanGraphs Def, and excellent DRS. Their defense is driven by range, and the size of that range advantage is the most important finding in the data.

The second group includes the Dodgers, Red Sox, Diamondbacks, Padres, and Cardinals. These teams are all meaningfully above average. The Dodgers have the strongest DRS figure and excellent marks across the board. Arizona and San Diego are especially strong in FRV. Boston is balanced and rates highly in all major measures. St. Louis is not quite as strong by FanGraphs Def, but its OAA and FRV remain impressive.

The third group is more complicated. Atlanta, Cleveland, Toronto, Texas, Kansas City, and the Yankees all have positive composite scores, but each has a different profile. Toronto is especially noteworthy because its positive value comes primarily from framing rather than range. Texas has positive OAA but neutral FRV, which keeps its composite score closer to the middle.

At the bottom, Minnesota is the weakest team overall. Seattle and the Angels are also poor by modern defensive value. Philadelphia is unusual because DRS dislikes it more than FRV does, while the Athletics are consistently weak across several measures.

Why This Is Important

Defense affects the interpretation of everything else.

A pitcher on a great defensive team is not working in the same environment as a pitcher on a poor defensive team. A staff backed by Chicago’s range advantage may see more balls converted into outs. A staff backed by a weak range defense may see more balls fall in, even when contact quality is similar. That matters when evaluating ERA, run prevention, pitcher luck, and even team overperformance.

The same is true for team analysis. A club with strong pitching numbers may be getting help from its defense. A club with disappointing run prevention may have a fielding problem hidden underneath the pitching line. Defense is not just a separate category. It is part of the context in which pitching statistics are produced.

This is especially important because modern baseball produces so many batted-ball measurements. We can now ask not just whether a ball was caught, but whether it should have been caught. We can compare actual outs to expected outs. We can separate the routine from the exceptional. That changes the language of defense.

Errors used to dominate the conversation because they were visible. Range is harder to see, but it is often more important.

Conclusion

So far this season, the defensive story is clear at the top. The Cubs have been the best defensive team in baseball by the combined evidence of DRS, OAA, FRV, and FanGraphs Def. Their advantage is not cosmetic. It is large, broad, and especially driven by range.

The Dodgers are also excellent, and they lead in DRS. Boston, Arizona, San Diego, and St. Louis form a strong second tier. At the bottom, Minnesota, Seattle, the Angels, Philadelphia, and the Athletics have been the weakest defensive teams by the composite measure.

But the broader lesson is methodological. Defense should not be reduced to one statistic. OAA, FRV, and FanGraphs Def are closely aligned. DRS is related but more independent. Fielding percentage captures a small part of the picture, but it misses much of what makes modern defense valuable.

The old defensive question was: did the fielder make an error?

The better question is: how many outs did the defense create that an average defense would not have created?

That is where the Cubs separate themselves. Not merely by being clean. By getting to the baseball.

Zeno’s Paradox: The Infinite Hidden Inside a Single Step

At first glance, Zeno’s paradox seems ridiculous.

Of course, Achilles catches the tortoise. Of course, an arrow moves through the air. Of course, I can walk across a room. Well, duh!

We know these things before anyone begins arguing. Motion is one of the most ordinary facts of experience. Every thrown ball, every running child, every falling leaf, every car moving down a road seems to refute Zeno before he even begins.

And yet the paradox remains.

That is what makes Zeno interesting. His argument does not endure because it convinces us that motion is impossible. It survives because it reveals something strange about the way we explain motion. Zeno takes an everyday event and slows it down until the ordinary becomes puzzling. He asks us to look not at the fact that something moves, but at what must be true for motion to be intelligible.

Before I can cross a room, I must first cross half the room. Before I can cross the remaining distance, I must cross half of that. Then half again. Then half again. The distances become smaller and smaller, but the number of required divisions seems to grow without end.

The paradox begins with a simple observation: A finite distance can be divided into infinitely many parts.

That is the unsettling idea at the heart of Zeno’s paradox. The problem is not that the room is too large. The problem is that even a small room appears to contain an infinite structure.

The question becomes: how can a person complete an infinite number of tasks in a finite amount of time?

The Dichotomy Paradox

One of Zeno’s most famous arguments is often called the Dichotomy Paradox. The word “dichotomy” means a division into two parts. In this paradox, every journey must be divided in half.

Suppose I want to walk from one side of a room to the other. To reach the far wall, I first need to reach the halfway point. Once I reach the halfway point, I still need to reach the halfway point of the remaining distance. Then I need to reach the next halfway point. And so on.

The sequence looks like this:

\frac{1}{2},\ \frac{1}{4},\ \frac{1}{8},\ \frac{1}{16},\ \frac{1}{32},\ldots

Each distance is smaller than the one before it. But there is no final term. No matter how many halfway points I cross, another halfway point remains.

That is the apparent trap. If every motion requires completing infinitely many sub-motions, then motion seems impossible. Before I can finish the journey, I must finish an infinite sequence of smaller journeys.

Yet I do finish the journey.

That tension is the paradox.

Figure 1. Divided Finite Distance.

Mathematically, the total distance can be written as an infinite series:

\frac{1}{2}+\frac{1}{4}+\frac{1}{8}+\frac{1}{16}+\cdots

At first, this looks like an endless accumulation. But modern mathematics gives us a clear answer:

\frac{1}{2}+\frac{1}{4}+\frac{1}{8}+\frac{1}{16}+\cdots = 1

More formally:

\sum_{n=1}^{\infty}\left(\frac{1}{2}\right)^n = 1

The infinite series has a finite sum.

This is the key mathematical insight. An infinite number of terms does not necessarily mean an infinite total. The terms can shrink quickly enough that their sum approaches a finite limit.

That is why the walker reaches the wall. The distances get smaller, and the times required to cross them also get smaller. The infinite sequence does not require infinite time.
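The convergence can be watched happening. This Python sketch computes the partial sums exactly, using fractions rather than floating point, and prints how much of the room remains after each number of steps; the function name and the chosen step counts are illustrative.

```python
from fractions import Fraction

def partial_sum(n):
    """Exact value of 1/2 + 1/4 + ... + 1/2**n."""
    return sum(Fraction(1, 2 ** k) for k in range(1, n + 1))

for n in (1, 2, 3, 10, 20):
    s = partial_sum(n)
    remaining = 1 - s
    print(f"after {n:2d} steps: covered {s}, remaining {remaining}")
```

After $n$ steps the distance remaining is exactly $1/2^n$, which is never zero but can be made smaller than any positive number by taking enough steps. That is what it means for the sum to equal 1.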

Still, this answer should not make us dismiss Zeno too quickly. The modern solution is powerful, but it also shows why the paradox mattered in the first place. Zeno forced later thinkers to clarify the relationship between infinity, space, time, and motion.

He did not merely ask a trick question. He discovered a pressure point.

Achilles and the Tortoise

The most famous version of Zeno’s argument is Achilles and the tortoise.

Imagine Achilles, the great runner, racing against a tortoise. Since Achilles is much faster, the tortoise receives a head start. Once the race begins, Achilles quickly reaches the place where the tortoise started. But by that time, the tortoise has moved a little farther ahead.

Achilles then reaches that new position. But again, the tortoise has moved forward.

Achilles reaches the next position. The tortoise has moved again.

This continues indefinitely.

The distances shrink. The tortoise’s lead becomes smaller and smaller. But in Zeno’s framing, Achilles must first reach every previous position occupied by the tortoise. Since there are infinitely many such positions, it seems Achilles can never catch up.

Again, common sense rebels.

Of course Achilles catches the tortoise.

But Zeno is not really betting on the tortoise. He is asking whether motion can be explained if every interval contains infinitely many smaller intervals.

Figure 2. Race Diagram.

Let the tortoise begin with a head start of distance $d$. Let Achilles run at velocity $v_A$, and let the tortoise move at velocity $v_T$. If Achilles is faster, then:

v_A > v_T

The time it takes Achilles to catch the tortoise is:

t_{\text{catch}} = \frac{d}{v_A - v_T}

This equation gives a finite answer. Achilles catches the tortoise when the initial head start has been eliminated by the difference between their speeds.

For example, suppose the tortoise starts 10 meters ahead. Achilles runs at 10 meters per second. The tortoise moves at 1 meter per second. Then:

t_{\text{catch}} = \frac{10}{10 - 1}

t_{\text{catch}} = \frac{10}{9}

So Achilles catches the tortoise in about 1.11 seconds.

t_{\text{catch}} \approx 1.11\ \text{seconds}

The paradox dissolves mathematically. But it does not disappear philosophically. Zeno’s description of the race is not false in the ordinary sense. Achilles really does pass through the tortoise’s earlier positions. There really are infinitely many possible subdivisions of the race. What Zeno gets wrong is the assumption that infinitely many subdivisions require infinitely much time.
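Zeno's staging of the race can be simulated directly. The Python sketch below uses the numbers from the example (a 10-meter head start, 10 meters per second against 1 meter per second) and adds up the time Achilles needs for each stage; the function name and the number of stages are illustrative choices.

```python
def zeno_stage_times(head_start, v_achilles, v_tortoise, stages):
    """Time Achilles needs for each Zeno stage, where a stage means
    running to the spot the tortoise occupied when the stage began."""
    times = []
    gap = head_start
    for _ in range(stages):
        t = gap / v_achilles   # time for Achilles to close the current gap
        times.append(t)
        gap = v_tortoise * t   # new lead the tortoise opened during that time
    return times

times = zeno_stage_times(head_start=10.0, v_achilles=10.0, v_tortoise=1.0, stages=12)
closed_form = 10.0 / (10.0 - 1.0)

running_total = 0.0
for stage, t in enumerate(times, start=1):
    running_total += t
    print(f"stage {stage:2d}: {t:.12f} s, total so far {running_total:.12f} s")

print(f"closed-form catch time: {closed_form:.12f} s")
```

Each stage takes one-tenth as long as the stage before it, so the running total climbs toward 10/9 of a second and never passes it. The infinitely many stages fit inside the finite catch time given by the formula.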

The modern answer depends on the idea of convergence.

The partial sums of a shrinking series approach a limit. For example:

S_n = \sum_{k=1}^{n}\left(\frac{1}{2}\right)^k

As $n$ increases, $S_n$ gets closer and closer to 1.

\lim_{n\to\infty} S_n = 1

This is the heart of the mathematical solution. The sequence has infinitely many steps, but the total distance is finite. The total time is finite too, assuming the motion is continuous and the speed remains well-behaved.

Figure 3. Infinite Steps

The Arrow Paradox

Zeno’s Arrow paradox attacks motion from another direction.

Imagine an arrow flying through the air. At any single instant, the arrow occupies a particular position. At that instant, it is exactly where it is. It is not yet at the next position, nor is it at the previous one.

So, Zeno asks, where is the motion?

If time is made of instants, and if the arrow is motionless at each instant, then how can motion arise from a collection of motionless moments?

This paradox is different from the Dichotomy and Achilles arguments. It is not mainly about an infinite sequence of distances. It is about time itself. If time is composed of indivisible instants, then motion becomes difficult to locate. At a single frozen instant, nothing appears to move.

A photograph captures this problem nicely. A photograph of a moving car does not show motion itself. It shows a car at a position. Motion appears only when we understand the position as part of a sequence.

Modern physics and calculus answer this by treating velocity not as a visible change inside a single instant, but as an instantaneous rate of change.

Average velocity is easy to understand:

v_{\text{avg}} = \frac{\Delta x}{\Delta t}

This says that average velocity equals change in position divided by change in time.

Instantaneous velocity is more subtle. It is defined as the limit of average velocity as the time interval becomes arbitrarily small:

v(t) = \lim_{\Delta t\to 0}\frac{x(t+\Delta t)-x(t)}{\Delta t}

The arrow does not need to move “inside” a frozen instant. Its motion is represented by the way its position changes over time. Velocity belongs to the structure of the function, not to a single isolated snapshot.
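The limit can be approximated numerically. In the Python sketch below, the position function $x(t) = 5t^2$ is a hypothetical choice made only because its exact velocity, $10t$, is easy to verify; the code computes average velocity over shorter and shorter intervals starting at $t = 2$.

```python
def position(t):
    """Hypothetical position in meters at time t seconds: x(t) = 5 t^2."""
    return 5.0 * t ** 2

def average_velocity(x, t, dt):
    """Change in position divided by change in time over [t, t + dt]."""
    return (x(t + dt) - x(t)) / dt

for dt in (1.0, 0.1, 0.01, 0.001):
    print(f"dt = {dt:<6} average velocity = {average_velocity(position, 2.0, dt):.4f} m/s")
```

For this function the average velocity over an interval of length $\Delta t$ works out to $20 + 5\,\Delta t$, so the values settle toward 20 meters per second as the interval shrinks. No single snapshot contains that number; it comes from how nearby positions relate to one another.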

That is a powerful mathematical response. But again, Zeno has forced us to become more precise. He makes us distinguish between position and motion, between an instant and an interval, between a snapshot and a process.

The arrow paradox is not silly. It is a warning about confusing the parts of a description with the whole of reality.

Infinity as the Real Subject

The reason Zeno’s paradoxes endure is that they are not really about tortoises, arrows, or people crossing rooms. They are about infinity.

There are at least two kinds of infinity at work here.

First, there is the infinity of division. A line segment can be divided in half, then half again, and so on. There is no obvious stopping point. This suggests that space may be infinitely divisible.

Second, there is the infinity of sequence. Once we begin listing the required steps, the list seems endless. First half the distance. Then half the remainder. Then half again.

Zeno’s genius was to combine these two ideas and turn them against motion.

If every finite act contains infinitely many parts, then how can any finite act be completed?

The modern answer is that infinitely many parts can form a finite whole. That answer now seems familiar because infinite series are part of standard mathematics. But the idea is far from obvious. It is one of the great achievements of mathematical thought.

A simple geometric series shows the point:

a + ar + ar^2 + ar^3 + \cdots = \frac{a}{1-r}

provided that:

|r| < 1

In the Dichotomy paradox, the first term is:

a = \frac{1}{2}

and the common ratio is:

r = \frac{1}{2}

So:

\frac{a}{1-r} = \frac{\frac{1}{2}}{1-\frac{1}{2}} = \frac{\frac{1}{2}}{\frac{1}{2}} = 1

The infinite sum equals the finite distance.

This is why Zeno’s argument fails mathematically. But it fails in a revealing way. It shows that common sense alone is not enough. We needed a theory of limits to explain what everyday experience already knew.

The Difference Between Solving and Dismissing

It is tempting to say that calculus solved Zeno’s paradox and leave it there.

In one sense, that is true. The mathematics of limits gives a clean answer to the problem of infinite subdivision. Achilles catches the tortoise. The walker crosses the room. The arrow moves.

But there is a difference between solving a paradox and dismissing it.

A bad paradox depends on a cheap trick. Once the trick is exposed, nothing remains.

Zeno’s paradox is different. Even after the mathematical answer is given, the original problem remains intellectually productive. It continues to ask useful questions.

What is continuity?

What is an instant?

Is space made of points, or are points abstractions we impose on space?

Is time a flowing reality, or a coordinate in a mathematical model?

Does mathematics describe the world directly, or does it provide a structure that predicts the world?

These are not dead questions. They return in different forms in philosophy, physics, and mathematics. Zeno’s paradox survives because it sits near the boundary between lived experience and formal explanation.

We live in motion. But to explain motion, we must translate it into distance, time, velocity, sequence, and limit. Each translation clarifies something. Each translation also changes the problem.

The Paradox as a Lesson in Explanation

There is a deeper lesson here.

Zeno shows that an explanation can fail even when the reality being explained is obvious.

Motion happens. No serious person doubts that. But saying “motion happens” is not the same as explaining how motion is possible within a particular theory of space and time.

That distinction matters far beyond ancient philosophy.

In science, statistics, and history, we often begin with facts that seem obvious. A species changes. A river cuts a valley. A baseball player declines with age. A market rises or falls. A civilization expands. A population migrates.

But explanation requires structure. We need a model. We need assumptions. We need a way to connect observations to causes.

Zeno’s paradox reminds us that the structure of explanation can become unstable. Sometimes the model makes the obvious seem impossible. When that happens, the answer is not to reject experience immediately. It is to examine the assumptions inside the model.

That may be the real value of the paradox.

Zeno slows us down. He makes us ask what we mean by motion, distance, time, and completion. He takes a simple act and reveals the hidden machinery of thought inside it.

A single step across a room becomes a philosophical event.

Why the Paradox Is Still Discussed

Zeno was wrong if his goal was to prove that motion is impossible.

But he was right that motion is stranger than it appears.

The paradox matters because it teaches humility. We should be careful when we assume that ordinary experience is simple. The simplest events often contain the deepest assumptions.

Walking across a room feels immediate. But when analyzed mathematically, it opens into infinity.

A runner passing a tortoise feels obvious. But when divided into successive positions, it becomes a puzzle about convergence.

An arrow flying through the air feels undeniable. But when frozen into instants, it becomes a question about time.

In each case, Zeno forces us to notice that reality and explanation are not identical. Reality happens. Explanation tries to account for how it happens. The gap between the two is where paradox lives.

The modern mathematical answer is beautiful:

\sum_{n=1}^{\infty}\left(\frac{1}{2}\right)^n = 1

An infinite process can have a finite limit.

But the philosophical lesson is just as important:

The world may move easily, but our concepts do not always move with it.

Conclusion: The Infinite in the Ordinary

Zeno’s paradox begins with common sense and ends with infinity.

That is why it remains powerful. It does not take us away from ordinary life. It takes ordinary life more seriously than we usually do.

A walk across the room becomes a question about infinite division. A race becomes a question about convergence. An arrow becomes a question about time, instants, and change.

The paradox is not really asking whether motion exists. It is asking whether our account of motion is coherent.

That is a much better question.

Achilles catches the tortoise. The arrow reaches the target. I cross the room.

But after Zeno, none of these things seems quite as simple as it did before.

The world still moves.

The mystery is that we can explain it at all.

Season-Level Validation: Do Third-Base Offensive Z-Scores Predict wRC+?

Introduction

The first wRC+ validation study used a career-level FanGraphs export.

That study was useful. It showed that, among regular third basemen, average Model C offensive score per qualified season strongly predicted career wRC+. It also showed that traditional defense did not predict wRC+, which was exactly what we wanted from a negative-control test.

But the career-level study had one limitation.

wRC+ is fundamentally a season-level offensive rate statistic. Our offensive z-score system is also built season by season. So the cleanest validation test is not career score against career wRC+.

The cleanest test is:

Does a third baseman’s season-level offensive z-score predict his season-level wRC+?

This chapter answers that question.

The answer is yes.

Using the season-level FanGraphs export, the Model C offensive season score explains about 69 percent of the variation in season wRC+ among qualified third-base seasons.

R^2 = 0.692

The fitted model is:

wRC^+ = 101.47 + 5.86(\text{Model C Offensive Season Score})

That is a strong result.

Just as important, the traditional defensive score does not predict wRC+:

R^2 = 0.002

This is exactly the pattern the project needed.

Offensive z-scores predict offense.

Traditional defensive z-scores do not.

That means the Model C offensive score is not merely identifying generally good players. It is measuring offensive quality.

Data Used in the Season-Level Study

The FanGraphs season-level export included:

9,152 player-season rows through 2025, with the following fields:
Season
Name
Team
PA
wRC+
PlayerId
MLBAMID

The broader third-base season dataset included:

3,188 qualified third-base seasons
Season range: 1880–2025

The merge was very strong:

Matched seasons: 3,163
Unmatched seasons: 25
Match rate: 99.2%

The remaining unmatched seasons were mostly older Negro Leagues or historical ID cases. The modern and post-integration major-league seasons matched very well.

This makes the season-level validation much cleaner than the first career-level wRC+ test.

Why Season-Level Validation Matters

The career-level wRC+ test asked whether accumulated third-base offensive separation was related to career offensive quality.

The season-level test is more direct.

It asks:

In a given season, does the offensive z-score model identify the same kind of offensive performance that wRC+ identifies?

This is a better test because both measures are season-specific.

The z-score model compares a third baseman to other third basemen in the same season. wRC+ compares a hitter’s offensive production to the league and park context of that season.

They are not the same statistic.

But they should be related.

If Model C is measuring offensive quality, high Model C scores should correspond to high wRC+ values.

That is what the data show.

The Model C Offensive Score

The Model C offensive score uses seven components:

OBP
ISO
BB/PA
SO/PA, inverted
Net SB/PA
R/PA
RBI/PA

Each component is converted into a same-position, same-season z-score.

The basic z-score formula is:

z = \frac{x - \mu}{\sigma}

Where:

x = \text{the player's value}

\mu = \text{the same-position, same-season peer-group mean}

\sigma = \text{the same-position, same-season peer-group standard deviation}

This is the central idea of the study.

Raw numbers ask how large a number is. Z-scores ask how far a player separated from his peer group.

Offensive Component Equations

On-base percentage is:

OBP = \frac{H + BB + HBP}{AB + BB + HBP + SF}

Slugging percentage is:

SLG = \frac{TB}{AB}

Isolated power is:

ISO = SLG - AVG

Walk rate is:

BB/PA = \frac{BB}{PA}

Strikeout rate is:

SO/PA = \frac{SO}{PA}

Net stolen bases are:

NetSB = SB - CS

Net stolen-base rate is:

NetSB/PA = \frac{SB - CS}{PA}

Run rate is:

R/PA = \frac{R}{PA}

RBI rate is:

RBI/PA = \frac{RBI}{PA}

The strikeout component is inverted because lower strikeout rates are better:

z_{\text{Low SO/PA}} = -\left( \frac{ (SO/PA)_i - \overline{(SO/PA)}_{\text{peer}} }{ s_{SO/PA,\text{peer}} } \right)

The full Model C offensive season score is:

\begin{aligned} \text{Season Score} &= z_{\text{OBP}} + z_{\text{ISO}} + z_{\text{BB/PA}} + z_{\text{Low SO/PA}} \\ &\quad + z_{\text{NetSB/PA}} + z_{\text{R/PA}} + z_{\text{RBI/PA}} \end{aligned}

This score measures offensive separation from same-season third-base peers.
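The pieces above can be assembled into a short Python sketch: compute the seven rates from counting stats, standardize each within the peer group, flip the sign on strikeouts, and add. The three players and their totals are hypothetical, a real peer group would contain every qualified third baseman in the season, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical peer group: three third basemen in one season (made-up totals).
PEERS = {
    "Player A": dict(PA=650, AB=560, H=175, BB=75, HBP=5, SF=6, TB=320,
                     SO=80, SB=15, CS=3, R=105, RBI=110),
    "Player B": dict(PA=620, AB=560, H=150, BB=50, HBP=4, SF=5, TB=240,
                     SO=110, SB=8, CS=4, R=78, RBI=75),
    "Player C": dict(PA=600, AB=555, H=135, BB=35, HBP=3, SF=5, TB=195,
                     SO=140, SB=2, CS=3, R=60, RBI=55),
}

def component_rates(s):
    """The seven Model C inputs, computed from one player's season totals."""
    avg = s["H"] / s["AB"]
    slg = s["TB"] / s["AB"]
    return {
        "OBP": (s["H"] + s["BB"] + s["HBP"])
               / (s["AB"] + s["BB"] + s["HBP"] + s["SF"]),
        "ISO": slg - avg,
        "BB/PA": s["BB"] / s["PA"],
        "SO/PA": s["SO"] / s["PA"],
        "NetSB/PA": (s["SB"] - s["CS"]) / s["PA"],
        "R/PA": s["R"] / s["PA"],
        "RBI/PA": s["RBI"] / s["PA"],
    }

def season_scores(peers):
    """Sum of peer-group z-scores per player, with SO/PA inverted."""
    names = list(peers)
    rates = {name: component_rates(peers[name]) for name in names}
    scores = dict.fromkeys(names, 0.0)
    for comp in rates[names[0]]:
        values = [rates[name][comp] for name in names]
        mu, sigma = mean(values), pstdev(values)  # population SD: an assumption
        sign = -1.0 if comp == "SO/PA" else 1.0   # fewer strikeouts is better
        for name in names:
            z = 0.0 if sigma == 0 else (rates[name][comp] - mu) / sigma
            scores[name] += sign * z
    return scores

for name, score in season_scores(PEERS).items():
    print(f"{name}: {score:+.2f}")
```

Because each component is centered on the peer-group mean, the scores across any peer group sum to zero. A positive score means a player separated from his peers in the favorable direction on balance.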

Regression Framework

The main validation model is:

wRC^+_s = \alpha + \beta_1(\text{Model C Offensive Season Score}_s) + \varepsilon_s

Where:

wRC^+_s = \text{FanGraphs wRC+ for season } s

\alpha = \text{intercept}

\beta_1 = \text{slope for the offensive z-score}

\varepsilon_s = \text{residual error}

The coefficient of determination is:

R^2 = 1 - \frac{ \sum_s \left( wRC^+_s - \widehat{wRC^+}_s \right)^2 }{ \sum_s \left( wRC^+_s - \overline{wRC^+} \right)^2 }

A higher value of $R^2$ means the model explains more of the variation in wRC+.
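The definition of $R^2$ can be written as a small Python function. In the sketch below, the intercept and slope are the fitted values reported in this chapter, while the five (score, wRC+) pairs are hypothetical seasons invented to exercise the function; they are not drawn from the dataset.

```python
INTERCEPT, SLOPE = 101.47, 5.86  # fitted values reported in this chapter

def project_wrc_plus(score):
    """wRC+ implied by the fitted line for a Model C offensive season score."""
    return INTERCEPT + SLOPE * score

def r_squared(observed, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

# Hypothetical seasons: (Model C score, observed wRC+). Illustration only.
seasons = [(-4.0, 80), (-1.0, 95), (0.0, 104), (3.0, 115), (8.0, 150)]

observed = [wrc for _, wrc in seasons]
predicted = [project_wrc_plus(score) for score, _ in seasons]
print(f"R^2 on the toy data: {r_squared(observed, predicted):.3f}")
```

A value near 1 means the line's predictions leave little of the spread in wRC+ unexplained, and a value near 0 means the line does no better than guessing the average for everyone.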

Main Season-Level Result

The fitted offense-only model is:

wRC^+ = 101.47 + 5.86(\text{Model C Offensive Season Score})

The result is:

R^2 = 0.692

This means the Model C offensive season score explains about 69.2 percent of the variation in season-level wRC+ among matched qualified third-base seasons.

That is a strong validation result.

The slope is also meaningful:

\beta_1 = 5.86

Each additional point of Model C offensive season score corresponds to about 5.86 additional points of wRC+.

For example, a player with an offensive score of 0 projects as:

wRC^+ = 101.47 + 5.86(0)

wRC^+ = 101.47

A player with an offensive score of 5 projects as:

wRC^+ = 101.47 + 5.86(5)

wRC^+ = 130.77

A player with an offensive score of 10 projects as:

wRC^+ = 101.47 + 5.86(10)

wRC^+ = 160.07

This is exactly the pattern expected if Model C is capturing offensive dominance.

Figure 1: Model Comparison

Figure 1. How well season-level third-base metrics predict wRC+.

The first figure compares several models.

The offensive z-score model performs well:

R^2_{\text{Offensive z-score}} = 0.692

The traditional defensive score has almost no predictive power:

R^2_{\text{Traditional Defense}} = 0.002

Adding traditional defense to offense does not meaningfully improve the result:

R^2_{\text{Offense + Defense}} = 0.692

Adding plate appearances produces only a tiny improvement:

R^2_{\text{Offense + PA}} = 0.695

The WAR_off benchmark is higher:

R^2_{\mathrm{WAR}_{\mathrm{off}}} = 0.846

That is expected. WAR_off is already a sophisticated offensive value measure. It is included only as a benchmark, not as a competing z-score model.

The important comparison is offense versus defense.

The offensive z-score predicts wRC+ strongly. The defensive score does not.

Figure 2: Offensive Z-Score Versus wRC+

Figure 2. Season wRC+ versus Model C offensive season score among third basemen.

This figure shows the main relationship directly.

The x-axis is:

\text{Model C Offensive Season Score}

The y-axis is:

wRC^+

The fitted line is:

wRC^+ = 101.47 + 5.86x

R^2 = 0.692

The pattern is clear.

High offensive z-score seasons generally produce high wRC+ seasons. Miguel Cabrera’s 2013 season, Chipper Jones’s 1999 season, Mike Schmidt’s 1980 and 1981 seasons, George Brett’s 1985 season, and Alex Rodriguez’s 2007 season all sit in the upper-right region.

That is exactly where they should be.

The plot also shows interesting residual cases. Some seasons have high wRC+ relative to their Model C score. Others have lower wRC+ than the z-score model predicts.

Those differences are not necessarily errors. They show that Model C and wRC+ measure offense from different angles.

Figure 3: Actual Versus Predicted wRC+

Figure 3. Actual versus predicted season wRC+ using the offensive z-score model.

The prediction equation is:

\widehat{wRC^+}_s = 101.47 + 5.86(\text{Model C Offensive Season Score}_s)

The residual is:

\text{Residual}_s = wRC^+_s - \widehat{wRC^+}_s

Players near the diagonal are well predicted. Players above the diagonal have higher wRC+ than the z-score model predicts. Players below the diagonal have lower wRC+ than the z-score model predicts.

The figure shows that most seasons fall around the diagonal, which is why the model produces a strong $R^2$.

It also shows the value of residual analysis. The most interesting seasons are often the ones that do not land exactly where the model expects.
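The residual calculation can be sketched as follows, using the fitted season-level coefficients quoted in this chapter. The column names and the three labeled rows are hypothetical placeholders, not real player seasons.

```python
import pandas as pd

def add_residuals(df: pd.DataFrame, intercept: float = 101.47, slope: float = 5.86) -> pd.DataFrame:
    """Attach predicted wRC+ and residual (actual minus predicted), largest residual first."""
    out = df.copy()
    out["predicted"] = intercept + slope * out["score"]
    out["residual"] = out["wrc_plus"] - out["predicted"]
    return out.sort_values("residual", ascending=False)

# Placeholder rows: hypothetical season labels, scores, and wRC+ values.
seasons = pd.DataFrame({
    "season_label": ["A", "B", "C"],
    "score": [0.0, 5.0, 2.0],
    "wrc_plus": [110.0, 125.0, 113.0],
})
ranked = add_residuals(seasons)
print(ranked[["season_label", "predicted", "residual"]])
```

Sorting by the residual puts the seasons that wRC+ rates most generously relative to the z-score model at the top and the seasons it rates least generously at the bottom.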

Figure 4: The Defensive Negative Control

Figure 4. Traditional defensive score does not predict season wRC+.

The negative-control model is:

wRC^+ = \alpha + \beta_1(\text{Traditional Defensive Season Score}) + \varepsilon

The fitted result is:

wRC^+ = 102.46 + 0.51(\text{Traditional Defensive Season Score})

R^2 = 0.002

This is one of the most important results in the chapter.

The traditional defensive score explains almost none of the variation in wRC+.

That is exactly what should happen.

wRC+ is an offensive metric. A traditional defensive score should not meaningfully predict it. The fact that it does not strengthens the validation.

It shows that the Model C offensive score is measuring offense specifically, not simply general player quality.

Figure 5: Residuals

Figure 5. Largest season-level wRC+ residuals from the offensive z-score model.

The residual equation is:

\text{Residual}_s = wRC^+_s - \widehat{wRC^+}_s

Positive residuals mean the season had a higher wRC+ than predicted by the z-score model.

Negative residuals mean the season had a lower wRC+ than predicted.

The largest positive residuals include:

Matt Williams 1995
Jim Finigan 1954
Jack Gleason 1884
Sean Berry 1995
Ron Cey 1981
George Scott 1970
Mike Schmidt 1981
Bill Joyce 1894

The largest negative residuals include:

Art Devlin 1905
Chone Figgins 2011
Jerry Royster 1977
Pie Traynor 1922
Chuck Harmon 1954
Bubba Phillips 1960
Charlie Hayes 1999
Maikel Garcia 2024

These residuals are worth studying because they show where the z-score model and wRC+ disagree most.

Interpreting Positive Residuals

A positive residual means wRC+ sees more offensive value than the z-score model predicts.

There are several possible reasons.

First, wRC+ is built from run values and is park- and league-adjusted. Model C is built from peer separation in selected categories. The two systems overlap strongly, but they are not identical.

Second, Model C includes runs and RBI rates. Those are useful for describing offensive dominance, but they can also be influenced by lineup context. wRC+ is more directly centered on offensive production independent of team context.

Third, partial seasons can create interesting differences. Matt Williams 1995, for example, had a very high wRC+ in fewer plate appearances than a full season. The z-score model includes playing-time weighting, so a shorter season can be pulled downward relative to a rate statistic.

That does not mean either measure is wrong.

It means they are answering slightly different questions.

Model C asks:

How much offensive separation did this third baseman produce in this season?

wRC+ asks:

How strong was this hitter's offensive production after league and park adjustment?

Those are related questions, not identical questions.

Interpreting Negative Residuals

A negative residual means the z-score model predicted a higher wRC+ than the player actually had.

This can happen when a player scores well in the Model C components but not as well in wRC+.

For example, a player may separate from third-base peers in runs, RBI, baserunning, or contact profile without producing the same level of park- and league-adjusted offensive value.

Art Devlin 1905 is the largest negative residual in this run. Pie Traynor 1922, Ossie Vitt 1915, and several other early-era or context-sensitive seasons also appear in the negative tail.

This is not surprising.

The farther back the data go, the more differences we expect between a transparent peer-z-score model and a modern run-value metric such as wRC+.

The residuals are not a failure of the model. They are a useful diagnostic tool.

Why This Season-Level Result Matters

This season-level validation is probably the cleanest offensive test in the project.

The WAR validation showed that the combined offense-defense model predicts total value.

The career wRC+ validation showed that average offensive z-score predicts career offensive quality.

But this season-level wRC+ validation is even more direct.

It compares:

\text{Season Offensive Z-Score}

to:

\text{Season } wRC^+

The result is strong:

R^2 = 0.692

That means Model C captures a substantial share of the same offensive signal captured by wRC+.

The defensive negative control confirms the interpretation:

R^2_{\text{Defense Only}} = 0.002

That is almost zero.

Offensive z-scores predict offense. Traditional defensive z-scores do not.

That is exactly the validation pattern we wanted.

How This Fits With the Earlier Validation Studies

The validation sequence now has three layers.

First, the WAR study showed that offense and traditional defense together predict total value:

R^2_{\text{Career WAR, Offense + Defense}} = 0.814

Second, the career-level wRC+ study showed that average offensive z-score predicts career offensive quality:

R^2_{\text{Career wRC+}} = 0.740

Third, this chapter shows that season-level offensive z-score predicts season-level wRC+:

R^2_{\text{Season wRC+}} = 0.692

Together, these results give the project a strong methodological foundation.

The z-score model is not WAR.

It is not wRC+.

It is a simpler and more transparent peer-separation model.

But it clearly captures real value-related information.

Limitations

This chapter should still be read carefully.

The FanGraphs season-level file matched almost all qualified third-base seasons, but not every season. The unmatched cases were mostly older Negro Leagues or historical ID records.

The Model C offensive score is not park-adjusted in the same way as wRC+. It is same-position and same-season adjusted through z-scores, but that is not identical to league and park adjustment.

Model C also includes runs and RBI rates, which are not purely individual batter skill measures. They can reflect lineup and team context.

Finally, wRC+ is itself a model. It is extremely useful, but it is not a perfect measure of all offensive contribution. It does not treat baserunning the same way Model C does, and it does not ask the same positional-peer question.

So the correct conclusion is not:

Model C is the same as wRC+.

The correct conclusion is:

Model C strongly predicts wRC+, while preserving a different interpretive question.

That is exactly what we want from a validation study.

Conclusion

The season-level wRC+ validation gives the clearest offensive support for the third-base z-score project.

The main model is:

wRC^+ = 101.47 + 5.86(\text{Model C Offensive Season Score})

The result is:

R^2 = 0.692

That means the offensive z-score model explains about 69 percent of the variation in FanGraphs season-level wRC+ among matched qualified third-base seasons.

The traditional defensive score explains almost none:

R^2 = 0.002

That negative-control result is crucial.

The offensive model predicts offense.

The defensive model does not.

The broader implication is clear.

The z-score system is not just an internal ranking device. It aligns strongly with established external value metrics.

WAR validates the two-dimensional model.

wRC+ validates the offensive model.

And the season-level wRC+ study confirms that Model C captures a real offensive signal year by year.

Do Third-Base Offensive Z-Scores Predict wRC+?

Introduction

The WAR validation chapter tested the full two-dimensional model.

It asked whether our third-base z-score framework could predict total player value. The answer was yes. Offensive z-scores predicted WAR. Traditional defensive z-scores added substantial explanatory power. The combined model performed especially well at the career level.

But WAR is broad.

WAR includes offense, defense, baserunning, positional adjustment, replacement level, and playing time. That makes it useful, but it also makes it complex. If the question is whether our offensive z-score model really measures offensive quality, WAR is not the cleanest validation target.

For that, we need an offense-only benchmark.

That is where wRC+ becomes useful.

FanGraphs wRC+ is designed to measure offensive production relative to league and park context, with 100 as league average. A 120 wRC+ means a hitter was about 20 percent better than league average. An 80 wRC+ means about 20 percent below league average.

So the validation question becomes simple:

Does our Model C offensive z-score predict FanGraphs wRC+?

The answer is yes.

Among third-base regulars with at least five qualified third-base seasons, the average Model C offensive score per qualified season explains a large share of career wRC+ variation:

wRC^+ = 100.89 + 5.41(\text{Model C Offensive Score per Qualified Season})

R^2 = 0.740

That is a strong relationship.

Just as important, the traditional defensive score does not meaningfully predict wRC+:

R^2 = 0.022

That negative-control result matters. It tells us that the offensive z-score model is not simply measuring general player quality. It is measuring offense.

Why wRC+ Is the Right Validation Target

The earlier WAR validation was a broad test.

It asked:

Do our offense-defense scores predict total value?

This chapter asks something narrower:

Does our offensive z-score predict an established offensive metric?

That is a cleaner test of Model C.

The offensive z-score model was built from same-position, same-season peer comparisons. It was not designed to reproduce wRC+. It does not directly use the same run-value formula. It does not include park adjustments in the same way. It includes runs and RBI, which wRC+ does not treat as independent batter skills in the same way. It includes baserunning through net stolen bases, while wRC+ is focused on hitting.

Even so, the relationship is strong.

That is useful validation.

It means Model C is not just producing interesting internal rankings. It is also aligned with an external offensive measure.

Data Used in the Study

The FanGraphs file used for this chapter was a career batting leaderboard export. Because the file was career-level rather than season-level, this first wRC+ validation is a career-level study.

The merge was very successful.

The third-base career dataset included 897 third-base players.

The FanGraphs wRC+ merge matched 786 of those 897 players.

Among regular third basemen, defined as players with at least five qualified third-base seasons, the merge matched 239 of 240 players.

That gives us a strong sample for the validation test.

The main analysis focuses on the regulars because wRC+ is a rate statistic, and very short careers can create noisy results. A five-qualified-season cutoff helps identify players with enough third-base playing time to make the comparison meaningful.
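For readers who want to reproduce this kind of match count, here is a minimal sketch of a left merge with a match indicator. The join key `player_id`, the column `wrc_plus`, and the three toy rows are assumptions; the text does not say which identifier was used to link the two files.

```python
import pandas as pd

def match_rate(players: pd.DataFrame, fangraphs: pd.DataFrame, key: str = "player_id"):
    """Left-merge the FanGraphs file onto the player list and count how many players matched."""
    merged = players.merge(
        fangraphs[[key, "wrc_plus"]],
        on=key,
        how="left",
        indicator=True,          # adds a _merge column: "both" or "left_only"
        validate="one_to_one",   # raises if either side has duplicate keys
    )
    matched = int((merged["_merge"] == "both").sum())
    return matched, len(merged), matched / len(merged)

# Toy inputs with hypothetical IDs.
players = pd.DataFrame({"player_id": ["p1", "p2", "p3"]})
fangraphs = pd.DataFrame({"player_id": ["p1", "p3", "p9"], "wrc_plus": [120, 95, 101]})
matched, total, rate = match_rate(players, fangraphs)
print(matched, total, round(rate, 3))
```

By the same arithmetic, 786 of 897 is a match rate of about 87.6 percent, and 239 of 240 is about 99.6 percent.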

The Model C Offensive Score

The offensive score used in this validation is the same Model C score used throughout the third-base study.

Model C uses seven offensive components:

OBP
ISO
BB/PA
SO/PA, inverted
Net SB/PA
R/PA
RBI/PA

The basic z-score formula is:

z = \frac{x - \mu}{\sigma}

Where:

x = \text{the player's value}

\mu = \text{the same-position, same-season peer-group mean}

\sigma = \text{the same-position, same-season peer-group standard deviation}

This equation asks a simple question:

How far above or below the third-base peer group was this player?

That is the core of the whole study.

Offensive Component Equations

On-base percentage is:

OBP = \frac{H + BB + HBP}{AB + BB + HBP + SF}

Slugging percentage is:

SLG = \frac{TB}{AB}

Isolated power is:

ISO = SLG - AVG

Walk rate is:

BB/PA = \frac{BB}{PA}

Strikeout rate is:

SO/PA = \frac{SO}{PA}

Net stolen bases are:

NetSB = SB - CS

Net stolen-base rate is:

NetSB/PA = \frac{SB - CS}{PA}

Run rate is:

R/PA = \frac{R}{PA}

RBI rate is:

RBI/PA = \frac{RBI}{PA}
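The component formulas above translate directly into code. The sketch below is illustrative: the function name and the stat line are made up, and plate appearances are passed in as an input because the text does not spell out how PA was derived.

```python
def offensive_components(h, bb, hbp, ab, sf, tb, so, sb, cs, r, rbi, pa):
    """Compute the rate components defined above from one season's counting stats."""
    avg = h / ab
    slg = tb / ab
    return {
        "OBP": (h + bb + hbp) / (ab + bb + hbp + sf),
        "SLG": slg,
        "ISO": slg - avg,
        "BB_PA": bb / pa,
        "SO_PA": so / pa,
        "NetSB_PA": (sb - cs) / pa,
        "R_PA": r / pa,
        "RBI_PA": rbi / pa,
    }

# Made-up stat line, for illustration only.
line = offensive_components(
    h=180, bb=80, hbp=5, ab=550, sf=5, tb=320,
    so=100, sb=12, cs=4, r=100, rbi=110, pa=640,
)
for name, value in line.items():
    print(f"{name}: {value:.4f}")
```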

The strikeout component is inverted because fewer strikeouts are better:

z_{\text{Low SO/PA}} = -\left( \frac{ (SO/PA)_i - \overline{(SO/PA)}_{\text{peer}} }{ s_{SO/PA,\text{peer}} } \right)

The full Model C offensive season score is:

\begin{aligned} \text{Season Score} &= z_{\text{OBP}} + z_{\text{ISO}} + z_{\text{BB/PA}} + z_{\text{Low SO/PA}} \\ &\quad + z_{\text{NetSB/PA}} + z_{\text{R/PA}} + z_{\text{RBI/PA}} \end{aligned}

This produces one offensive score for each qualified third-base season.

Playing-Time Weighting

The broader study uses a playing-time weight so that a partial season does not count the same as a full season.

The weight is:

w = \min\left(1, \frac{PA}{600}\right)

The weighted season score is:

\text{Weighted Offensive Season Score} = \text{Model C Offensive Season Score} \times w

The career offensive score is:

\text{Career Offensive Score} = \sum_{s=1}^{n} \text{Weighted Offensive Season Score}_s

This career score is cumulative. It rewards repeated separation from third-base peers.

But wRC+ is not cumulative. It is a rate-style offensive measure. That creates an important methodological issue.
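The weighting and the cumulative sum can be sketched like this. The function names and the three example seasons are hypothetical; only the formula $w = \min(1, PA/600)$ comes from the text.

```python
def playing_time_weight(pa: float, full_season_pa: float = 600.0) -> float:
    """w = min(1, PA / 600): partial seasons are scaled down, full seasons are capped at 1."""
    return min(1.0, pa / full_season_pa)

def career_offensive_score(seasons) -> float:
    """Sum of weighted season scores; `seasons` is an iterable of (season_score, pa) pairs."""
    return sum(score * playing_time_weight(pa) for score, pa in seasons)

# Hypothetical career: a full season, a half season, and a below-average full season.
example_seasons = [(4.0, 650), (2.0, 300), (-1.0, 600)]
career_total = career_offensive_score(example_seasons)
print(career_total)  # 4.0*1.0 + 2.0*0.5 + (-1.0)*1.0 = 4.0
```

Note that the cap at 1 means a 650-PA season counts the same as a 600-PA season, while a 300-PA season contributes half of its raw score.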

Why We Use Average Offensive Score per Qualified Season

Because wRC+ is rate-based, the best predictor is not simply total career offensive score.

A player with many seasons can accumulate a large career score even if his average season was not historically great. Another player with fewer seasons can have a higher offensive level but a lower accumulated score.

So for this validation, the primary predictor is:

\text{Average Offensive Score} = \frac{ \text{Career Offensive Score} }{ \text{Qualified Third-Base Seasons} }

Or:

\text{Average Offensive Score} = \frac{ \sum_{s=1}^{n} \text{Weighted Offensive Season Score}_s }{ n }

Where:

n = \text{number of qualified third-base seasons}

This gives us an offensive quality measure rather than a pure accumulation measure.

That distinction matters.

The cumulative career offensive score still predicts wRC+, but not as well as the average score.

For third-base regulars:

Average offensive score per qualified season:
R² = 0.740

Cumulative career offensive score:
R² = 0.661

The average score is a better validation measure because it matches the rate-like nature of wRC+.
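A small made-up example shows why the two measures can rank players differently. Both careers below are hypothetical, and each list is assumed to hold already-weighted season scores.

```python
def average_offensive_score(weighted_season_scores) -> float:
    """Career offensive score divided by the number of qualified seasons."""
    n = len(weighted_season_scores)
    if n == 0:
        raise ValueError("at least one qualified season is required")
    return sum(weighted_season_scores) / n

# Hypothetical careers: long and steady versus short and excellent.
long_career = [2.0] * 12
short_career = [4.0] * 5

print(sum(long_career), average_offensive_score(long_career))    # cumulative 24.0, average 2.0
print(sum(short_career), average_offensive_score(short_career))  # cumulative 20.0, average 4.0
```

The long career wins on the cumulative score, while the short career wins on the average. A rate metric such as wRC+ behaves like the second comparison, which is why the average score is the more natural predictor here.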

Regression Framework

The basic validation model is:

wRC^+_i = \alpha + \beta_1(\text{Average Offensive Score}_i) + \varepsilon_i

Where:

wRC^+_i = \text{FanGraphs career wRC+ for player } i

\alpha = \text{intercept}

\beta_1 = \text{slope for the average offensive z-score}

\varepsilon_i = \text{residual error}

The fitted model for third-base regulars is:

wRC^+ = 100.89 + 5.41(\text{Average Offensive Score})

R^2 = 0.740

This means that each additional point of average Model C offensive score is associated with about 5.41 points of career wRC+.

A player with an average offensive score of 0 projects near league average:

wRC^+ = 100.89 + 5.41(0)

wRC^+ = 100.89

A player with an average offensive score of 3 projects as:

wRC^+ = 100.89 + 5.41(3)

wRC^+ = 117.12

A player with an average offensive score of 6 projects as:

wRC^+ = 100.89 + 5.41(6)

wRC^+ = 133.35

This is exactly the kind of relationship we hoped to see.
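The worked projections above can be reproduced with a one-line function. The function name is an illustrative choice; the intercept and slope are the fitted values quoted in this chapter.

```python
def project_career_wrc_plus(average_score: float,
                            intercept: float = 100.89,
                            slope: float = 5.41) -> float:
    """Projected career wRC+ from the fitted line for third-base regulars."""
    return intercept + slope * average_score

for score in (0, 3, 6):
    print(score, round(project_career_wrc_plus(score), 2))
# 0 100.89
# 3 117.12
# 6 133.35
```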

Figure 1: Model Comparison

Figure 1. How well third-base z-scores predict FanGraphs wRC+.

The first figure compares the validation models.

The most important result is:

R^2 = 0.740

for the average offensive score model among regular third basemen.

The cumulative offensive score also performs well:

R^2 = 0.661

But the average score is better because wRC+ is a rate metric.

The traditional defensive score performs very poorly as a wRC+ predictor:

R^2 = 0.022

That is not a problem. It is exactly what we want.

Defense should not predict wRC+ very well. If it did, that would suggest either a hidden confounding problem or a model that was mixing offensive and defensive signals.

The offense-plus-defense model is nearly identical to the offense-only model:

R^2 = 0.741

That small difference tells us that traditional defense adds almost nothing to the prediction of wRC+. Again, this strengthens the interpretation.

The offensive model predicts offense. The defensive model does not.
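One caution when reading the move from 0.740 to 0.741: with ordinary least squares, adding a predictor can never lower the in-sample $R^2$, so a tiny increase is expected even from an unrelated variable. The sketch below illustrates this with simulated data. Every number in it is synthetic and chosen only for illustration; none of it is the study's data.

```python
import numpy as np

def r_squared(predictors, y) -> float:
    """In-sample R^2 for an OLS fit with an intercept and the given predictor arrays."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2))

# Synthetic data: "defense" is generated independently of both offense and wRC+.
rng = np.random.default_rng(0)
offense = rng.normal(0.0, 3.0, 500)
defense = rng.normal(0.0, 3.0, 500)
wrc = 100.0 + 5.0 * offense + rng.normal(0.0, 10.0, 500)

r2_off = r_squared([offense], wrc)
r2_def = r_squared([defense], wrc)
r2_both = r_squared([offense, defense], wrc)
print(f"offense only: {r2_off:.3f}")
print(f"defense only: {r2_def:.3f}")
print(f"offense + defense: {r2_both:.3f}")
```

In a simulation like this, the defense-only fit sits near zero and the combined fit is at least as high as the offense-only fit, usually by a negligible margin. That is the same qualitative pattern reported above for the real data.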

Figure 2: Average Offensive Z-Score Versus wRC+

Figure 2. Career wRC+ versus average offensive z-score among third-base regulars.

The second figure shows the main relationship directly.

The x-axis is:

\text{Model C Offensive Score per Qualified Third-Base Season}

The y-axis is:

wRC^+

The fitted line is:

wRC^+ = 100.89 + 5.41x

R^2 = 0.740

The upward trend is clear.

Players with high average offensive z-scores tend to have high career wRC+ values. Mike Schmidt, Chipper Jones, Eddie Mathews, George Brett, Wade Boggs, Dick Allen, and Al Rosen all sit in the upper-right region. Players with lower offensive z-score averages tend to have lower wRC+ values.

This is a strong validation of Model C.

The z-score model is not simply rewarding raw counting totals. It is recovering a meaningful offensive signal that corresponds closely to an established offensive metric.

Figure 3: Actual Versus Predicted wRC+

Figure 3. Actual versus predicted career wRC+ using the offense-only model.

The actual-versus-predicted plot shows how well the model estimates wRC+.

The prediction equation is:

\widehat{wRC^+} = 100.89 + 5.41(\text{Average Offensive Score})

The residual is:

\text{Residual}_i = wRC^+_i - \widehat{wRC^+}_i

Players near the diagonal are well predicted. Players above the diagonal have higher wRC+ than the model predicts. Players below the diagonal have lower wRC+ than the model predicts.

This figure shows that the model captures the broad structure very well, but it also shows useful outliers.

That is important.

The purpose of validation is not only to confirm that the model works. It is also to identify where it differs from an established metric.

Figure 4: The Defensive Negative Control

Figure 4. The traditional defensive score does not meaningfully predict wRC+.

The negative-control model is:

wRC^+ = \alpha + \beta_1(\text{Traditional Defensive Score per Qualified Season}) + \varepsilon

The fitted equation is:

wRC^+ = 105.34 - 1.49(\text{Traditional Defensive Score per Qualified Season})

R^2 = 0.022

This means traditional defense explains only about 2.2 percent of the variation in career wRC+ among regular third basemen.

That is a very small relationship.

This is one of the most important findings in the chapter. It shows that the validation is specific. Offensive z-scores predict offensive value. Traditional defensive z-scores do not.

The negative-control test strengthens the model.

It tells us that Model C is not simply identifying famous players or good players in general. It is identifying an offensive quality.

Figure 5: Residuals

Figure 5. Largest wRC+ residuals from the offensive z-score model.

The residual equation is:

\text{Residual}_i = wRC^+_i - \widehat{wRC^+}_i

Positive residuals mean the player’s FanGraphs wRC+ is higher than the z-score model predicts.

Negative residuals mean the player’s FanGraphs wRC+ is lower than the z-score model predicts.

The largest positive residuals include:

Edwin Encarnacion
David Freese
Dick Allen
Cal Ripken Jr.
Deacon White
Joe Torre
Larry Parrish
Wade Boggs

These players had higher wRC+ values than the offensive z-score model predicted.

The largest negative residuals include:

Ossie Vitt
Art Devlin
Jim Gilliam
Jose Ramirez
Billy Werber
Bob Jones
Chone Figgins
Hans Lobert

These players had lower wRC+ values than the model predicted.

The residuals are not merely mistakes. They show where the two systems differ.

Interpreting the Positive Residuals

Positive residuals are especially interesting because they identify players whose wRC+ is better than our average offensive z-score model expects.

For example, Edwin Encarnacion has a large positive residual. His career wRC+ is much stronger than his average third-base z-score profile suggests. This may reflect the fact that much of his offensive identity was formed outside a long traditional third-base career. Since our model focuses on qualified third-base seasons, while FanGraphs career wRC+ reflects his broader batting career, the comparison can produce differences.

David Freese also appears as a positive residual. His wRC+ is higher than expected from the third-base z-score model.

Dick Allen is another important case. He had enormous offensive quality, and his wRC+ remains higher than the model predicts, even though the model already rates him strongly.

Wade Boggs is also above prediction. That may reflect the way wRC+ values his on-base skill and batting quality more directly than a model that also includes runs, RBI, power, and baserunning components.

Interpreting the Negative Residuals

Negative residuals tell the opposite story.

Ossie Vitt is much lower in wRC+ than the offensive z-score model predicts. Art Devlin, Jim Gilliam, Jose Ramirez, Billy Werber, Bob Jones, Chone Figgins, and Hans Lobert also fall below prediction.

These cases require careful interpretation.

Some players may be rewarded in our Model C framework because they separated from their third-base peers in components that do not translate as strongly into wRC+. Runs, RBI, stolen-base value, and contact profile can influence the z-score model differently than wRC+.

Jose Ramirez is especially interesting. The model predicts a higher wRC+ than his current FanGraphs career mark. That may reflect his strong same-position separation across multiple components, including power, walks, baserunning, runs, and RBI. It may also reflect the fact that his career is still active.

A negative residual does not mean the z-score model is wrong. It means the z-score model and wRC+ are measuring offense from different angles.

That difference is analytically useful.

What the wRC+ Validation Shows

The wRC+ validation supports the offensive model in three ways.

First, the relationship is strong:

R^2 = 0.740

Second, the slope is meaningful:

\beta_1 = 5.41

That means each additional average offensive z-score point corresponds to about 5.41 points of wRC+.

Third, the negative-control test works:

R^2_{\text{Defense Only}} = 0.022

Traditional defense does not predict wRC+.

That is exactly what should happen if the model is behaving properly.

Why This Complements the WAR Validation

The WAR validation and wRC+ validation answer different questions.

The WAR validation asked:

Do offense and traditional defense together predict total value?

The answer was yes.

The career-level offense-plus-defense model for regular third basemen had:

R^2 = 0.814

The wRC+ validation asks:

Does the offensive z-score model predict offensive quality?

The answer is also yes.

The average offensive score model has:

R^2 = 0.740

Together, these two validation studies are stronger than either one alone.

WAR validates the broader two-dimensional structure.

wRC+ validates the offensive dimension specifically.

The negative control confirms that the defensive dimension is not pretending to be offense.

This gives the project a stronger methodological foundation.

What the Study Does Not Prove

This chapter should not be overread.

It does not prove that Model C is better than wRC+. It does not prove that wRC+ is perfect. It does not prove that every residual is meaningful. It does not prove that the z-score model captures park effects, full run values, league quality, or all contextual differences.

The FanGraphs file used here is at the career level. That means this chapter does not yet test season-by-season wRC+ against season-by-season z-scores.

A season-level wRC+ study would be even cleaner because it would compare:

\text{Season Offensive Z-Score}

directly against:

\text{Season } wRC^+

That should be the next step if we obtain a season-level FanGraphs export.

For now, this chapter provides strong career-level validation.

Conclusion

The wRC+ validation study answers a direct question:

Do third-base offensive z-scores predict an established offensive metric?

Yes.

Among third-base regulars, the average Model C offensive score per qualified season strongly predicts FanGraphs career wRC+:

wRC^+ = 100.89 + 5.41(\text{Average Offensive Score})

R^2 = 0.740

The cumulative offensive score also predicts wRC+, but less strongly:

R^2 = 0.661

Traditional defense does not meaningfully predict wRC+:

R^2 = 0.022

That is exactly the pattern we wanted.

The offensive model predicts offense.

The defensive model does not.

The combined validation framework now has both breadth and specificity.

The WAR study showed that offense plus defense predicts total value.

The wRC+ study shows that the offensive z-score model predicts offensive quality.

That is a major validation result for the third-base project.

Squam Lake (Flash Fiction)

Kellen was dead, and that was a good thing. She felt safe, as safe as a young woman prancing around the middle of Reverse Vampire territory could. She thought she knew what was what (after all, she was a woman of the world, right?). Lucky for her, I’ve got her back.

Behold all who hear me; I am a modern-day Van Helsing. And, yes, I am talking about THAT Van Helsing.

Author’s Note: Not that I need to brag, but I am a direct descendant of the great Van Helsing. Yeah, howdy, little old me, the man nearly everyone calls Hillbilly Jedediah, carries the DNA of the greatest monster hunter that ever lived. What does your DNA look like once it is untangled and exposed?

My tale won’t take long to tell. I am working on a memoir, but I need to live several hundred more years before any publisher worth their salt will give me a sit-down. So, here it is (such as it is).

It was a day like any other at Squam Lake, androids were dreaming of electric sheep, and the U.S. dollar was in a deadly tug of war with the Japanese Yen. All seemed to be right with the world. Of course, I didn’t sleep; how could I when all h-e-double-hockey-sticks was breaking loose everywhere I looked? I can’t save everyone; that’s impossible; I have to pick and choose. On this day, for reasons beyond my capacity to understand, I decided to give her my attention. Usually, I would say that if someone is foolish enough to go to Reverse Vampire Central (during an RV convention, no less), they deserve whatever they get.

How did I find him out? It’s just one of those things, some real inexplicable nonsense. It was the kind of lapse that can be made 1000 times and never get you into trouble. Maybe it is just lousy RV karma. Maybe he “just ain’t living right,” as every evangelical will tell you is the reason for everything bad that happens to any poor son of a biscuit that happens to zig when they should have zagged. Yeah, it finally happened; I was able to expose him, to show him for what he truly is. I exposed him, I directed a bright light on his deepest colors.

It was a simple e-mail…short, nothing more than a few words. I intercepted it the way I usually do; a simple keylogger sent the message directly to me. “They are tricksy rabbits.” That is all he had to write. What happened next will make your toes curl.

After I received the message, I called her in two seconds. “Get the heck out of there, dagnabbit; he is the one I have been looking for. Evan is the Reverse Vampire! I am sure of it; run as fast as you can.”

She made it two steps before her left hamstring was ripped from her leg. I didn’t want to think about what I knew he would do with the fresh, human meat. One thing is sure: he didn’t like it at room temperature.

I could immediately sense it; I felt her pain. What else could I do? I gathered up my resolve, opened a portal, and headed east. You know, I didn’t have to save her; it wasn’t my job. Looking back, I guess I kind of felt sorry for her. Who knows, maybe I even liked her. I have since given it lots of thought, and I still don’t know why I risked my life that day.

The incantation complete, the portal opened up only a few feet from Evan.

“Put her down, now!”

Evan looked back at me; he was half-crazed, licking the blood off the detached muscle. I could tell he was silently cursing in his feeble little mind, a half-sized brain with only enough room inside for murder and carnage.

So, I did it; I used The Device. It does take a heck of a toll on me, but, like I said, I guess maybe I like her. As it stands, she is fine (I sent her back to a time just before the trip to Squam Lake), Evan is a fetus (best I could do), and I really need a beer. On second thought, my cousin, Naomi Crump, makes the vilest moonshine I have ever experienced, and I could use a week-long bender.