BASEBALL – Page 2 – Random Thoughts from a Nonlinear Mind

More on AL First Basemen 4 23 26

Here we are, a day later and (hopefully) a little wiser. I have more to say about yesterday’s post on those pesky first basemen. I can now tell you exactly how lucky or unlucky all of them have been so far this season.

BABIP, batting average on balls in play, is the metric most often used to gauge the role of luck in a player’s offensive output. Here is the simple equation:

BABIP = (H - HR) / (AB - K - HR + SF)

Where:

H = Hits
HR = Home Runs
AB = At-bats
K = Strikeouts
SF = Sacrifice flies

This equation shows what happens after contact is made and a ball is put in play. League-wide, and this has been true for a long time, players hover around .300. A BABIP of .300 is the de facto gravitational center for all players.
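For readers who like to check the arithmetic, here is a minimal Python sketch of the standard BABIP calculation, (H - HR) / (AB - K - HR + SF), using the five inputs defined above. The function name and the sample stat line are my own inventions for illustration; they are not real player data.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play, BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line, invented for illustration only.
example = babip(hits=25, home_runs=5, at_bats=90, strikeouts=20, sac_flies=1)
print(f"{example:.3f}")
```

Home runs come out of both the numerator and the denominator because the defense never gets a chance to field them, and strikeouts come out of the denominator because no ball was put in play.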

Take a look at this:

This is how we want to read the data. Guerrero (.378), Rice (.378), and Kurtz (.364) have been extremely lucky so far this season. I do not think they can maintain BABIPs that strong much longer; they will almost certainly regress toward the league mean of ~.300.

On the other hand, Naylor (.213) and Pasquantino (.169) have been very unlucky. Those line drives are being hit right at fielders, and the hard-hit ground balls are not finding any holes. Both men should see their offensive production increase as their BABIPs work their way toward .300.

What about our man, Kyle Manzardo? At .317, he has not been unlucky at all. Not only has he not been unlucky, but he has likely benefited from slightly favorable outcomes on balls he has put in play. This means that balls are finding gaps at a slightly elevated rate, defensive positioning or variance is working in his favor, and there is no immediate signal of suppressed results due to bad luck. Once again, I was a bit surprised by this.
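To put the six BABIPs quoted in this post side by side, here is a small Python sketch that measures each one against the ~.300 league norm. Treating the gap from .300 as a measure of "luck" is a crude first pass and an assumption on my part; individual hitters can have true-talent BABIPs somewhat above or below that mark.

```python
LEAGUE_BABIP = 0.300  # approximate long-run league mark cited in this post

# BABIP values quoted in this post.
babip_by_player = {
    "Guerrero": 0.378,
    "Rice": 0.378,
    "Kurtz": 0.364,
    "Manzardo": 0.317,
    "Naylor": 0.213,
    "Pasquantino": 0.169,
}


def luck_gap(babip, league=LEAGUE_BABIP):
    """Difference between a player's BABIP and the league norm."""
    return round(babip - league, 3)


gaps = {name: luck_gap(value) for name, value in babip_by_player.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    side = "above" if gap > 0 else "below"
    print(f"{name:12s} {gap:+.3f} ({side} the league norm)")
```

Manzardo's gap is small and positive, which is the whole point of this post: his slow start cannot be pinned on bad luck with balls in play.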

Manzardo’s BABIP implies that there should be downward pressure on his offensive production. Remember the graph from the last post? This does not bode well for a player whose output has been last in the league at his position. I am curious to see how this plays out.

AL First Basemen Thru 4 22 26

I got up early today and decided to take a look at what has been happening in the American League with all the first basemen. I was inspired after I heard that Kyle Manzardo was having a very unlucky season at the plate. It happens, and there are very good metrics out there to measure things like luck.

I want to take a glance at what is going on about 25 games into the 2026 season. The first figure shows which players have the most similar production. Notice that Manzardo’s offensive output is most closely related to that of his old teammate, Josh Naylor. Both are off to very slow starts.

The players on the right-hand side of the figure should come as no surprise. What might give you pause is the next figure. Instead of clustering players based on similar offensive numbers, I decided to analyze them only by the categories that help their teams win. In other words, I eliminated things like strikeouts and double plays that the player might have grounded into.

Our man Manzardo is all alone at the bottom of the list. I believe he is too good a player to remain there. The same is true for Naylor, and all the players with low rankings will probably increase their production as the weather warms.

If you study the chart, you will see that Ben Rice has been, by a wide margin, the most productive offensive first baseman in the American League so far this year. I was a bit surprised by this. I will keep an eye on things and report back throughout the season. Depending on the will of the Muses in control of The Boys of Summer, I might expand my horizons to every position in both leagues.

A Few Thoughts on MLB Batting Averages and Scoring

The folks with a serious interest in baseball have been meticulously recording the numbers the game generates since the 19th century, giving us one of the longest continuous statistical datasets in professional sports. Using MLB league totals from 1871 through 2025, I have traced the story of offense through a single, elegant metric: runs per game per team (R/G).

The chart below (based on raw data graciously provided by baseball-reference.com) visualizes the average runs scored per game per team by decade, beginning in the 1920s, an era often considered the dawn of modern baseball. I view 1920 as the beginning of the modern era, mainly due to the standardization of the balls used in the games. Before this date, the balls were haphazardly procured; there were no standards imposed, and none were implied. One game might finish with a score of 43-36, and the next might be 2-1. This was a result of the baseball (and yes, I mean a single ball) used in the game.
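A by-decade chart like this one boils down to grouping seasons into decades and averaging. Here is a minimal Python sketch of that step. It assumes each season is supplied as a (year, runs per game) pair and that every season counts equally within its decade; the sample values are placeholders, not real league data.

```python
from collections import defaultdict


def runs_per_game_by_decade(seasons):
    """seasons: iterable of (year, runs_per_game). Returns {decade: mean R/G}."""
    buckets = defaultdict(list)
    for year, rpg in seasons:
        decade = year // 10 * 10  # 1927 -> 1920, 1931 -> 1930
        buckets[decade].append(rpg)
    return {decade: sum(vals) / len(vals) for decade, vals in sorted(buckets.items())}


# Placeholder values for illustration only, NOT real league data.
sample = [(1920, 4.0), (1925, 5.0), (1931, 5.0), (1938, 5.5)]
print(runs_per_game_by_decade(sample))
```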

The figure tells an interesting story:

1930s: Offensive explosion. The live-ball era fully matured, and league scoring topped 5 runs per game.
1960s: The “Pitcher’s Decade.” Offense collapsed, bottoming out at 3.7 R/G in 1968—the “Year of the Pitcher.”
1990s: The power surge. League scoring rebounded to nearly 5 runs per game, driven by expansion, smaller parks, and the home-run boom. Surely, there are no other explanations, right? Cough, cough, hack, hack…
2020s: The analytics paradox (but not really). Despite smarter lineups and stronger hitters, offense has fallen again, down to 4.4 R/G in recent seasons. More on this later…

BATTING AVERAGES

While run scoring has fluctuated wildly, the league batting average has remained remarkably stable. From 1920 onward, the overall mean is .262, almost identical to the all-time mark of .260 since 1871.

The highest batting averages came during the explosive decades of the 1920s and 1930s, while today’s hitters hover around .245, the lowest sustained level since the Dead Ball Era (1900-1920).

ANALYTICS

The offensive (and defensive) landscape of MLB can’t be understood without the analytics revolution, which ushered in a seismic shift in how teams interpret performance. It is, without doubt, the most transformative movement in the history of the game.

Baseball’s analytics revolution unfolded in three waves. The first began in the late 1970s, when writer Bill James published his Baseball Abstracts and coined the term “sabermetrics,” introducing a generation of fans and front offices to the idea that baseball could be studied scientifically. The second wave arrived around 2000, when the Oakland Athletics—immortalized in Moneyball—used data-driven roster construction to compete on a small budget. Their success sparked a league-wide shift toward on-base percentage, run efficiency, and market inefficiency analysis. The third and most mind-bending stage came in 2015 with the introduction of Statcast, a tracking technology that measures exit velocity, launch angle, spin rate, and player movement in real time. Together, these eras changed baseball from a sport of intuition to one of precision, where every swing, pitch, and sprint is quantified and optimized.

The following chart overlays those analytical milestones onto league scoring trends. Note how the average runs per game increased steadily until mathematics started to play a central role in baseball strategy.

🟠 2000 – Moneyball / Analytics Era: Teams begin valuing on-base skills and cost efficiency.
🔴 2015 – Statcast Era: Tracking technology transforms player evaluation and biomechanics.

Interestingly, runs per game spiked during the early pre-Moneyball years (late 1990s) but declined sharply once every team adopted similar analytical models. The advantage disappeared as the playing field leveled and pitchers harnessed data to exploit hitters’ weaknesses. League-wide defense also vastly improved; the players had a much better idea of where to position themselves batter by batter and pitch by pitch.

THE APPARENT DATA PARADOX

Baseball-flavored analytics were initially designed to optimize offense, yet their full integration has arguably optimized defense and pitching instead. By 2025, batting averages and runs per game are both at their lowest sustained levels in decades—even as individual player performance is measured with unprecedented precision.

The result is a kind of equilibrium: fewer balls in play, more strikeouts and home runs, and an ongoing debate about whether efficiency has made the game better or simply duller.

And yes, there is a strong parallel between what has happened in baseball and what the 3-point shot has brought to the NBA. Just as basketball front offices realized that a 3-point shot is worth 50% more than a regular 2-point shot, baseball players were strongly advised that a home run is worth a lot more than a single or walk.

Take a moment to look over the following table. I am struck by the downward trend in batting average. It sure seems like the table is calling out for a similar study using on-base and slugging percentages. I will address this issue in a future post.

Metric	1920–2000	2010s	2020s
Avg. Batting Avg. (BA)	.264	.254	.245
Avg. Runs per Game (R/G)	~4.5	4.38	4.45

The 2010s and 2020s mark the first back-to-back decades of declining batting average since the 1960s. Despite this, run scoring remains relatively stable. Interesting, isn’t it? Even though there is only one batter and nine defenders, the offense-minded have concluded that home runs, even with the resultant declines in batting average and on-base percentage, are much more desirable than any other alternatives. This is a big reason why batting averages have gone down, defense and pitching have improved, and average runs per game have stayed consistent.

CONCLUSION

The numbers reveal something profound: baseball’s statistical evolution mirrors its cultural one, suggesting a fundamental constancy in its design. Each new wave of data, whether Bill James’ notebooks or Statcast’s terabytes of data, has changed how players are valued and how teams win. Yet through all of it, the sport’s core equilibrium remains intact. The league batting average, while steadily going down, still results in scoring of about 4½ runs per game—just as it did a hundred years ago. In the end, baseball adapts, but it rarely strays too far from its mathematical mean. I find that very intriguing.

The next post builds on the themes touched on in this short essay. I want to know where all the .300 hitters have gone, and I have decided to write about it. The next post will build on the work of Stephen Jay Gould, one of the most influential and essential evolutionary biologists of the last century. Perhaps most importantly, he was a big baseball fan who used his considerable talents to write about the sport he loved.

Analyzing Max Exit Velocity (2020)

In baseball analytics, exit velocity—specifically, the maximum exit velocity—is a critical metric. It measures the speed at which a ball leaves the bat, providing insights into a player’s power and potential impact. I am looking at max exit velocity data from the 2020 season. This visualization offers a clear and detailed view of how max exit velocities are distributed among players and a smoothed density estimate to reveal underlying trends. My first observation is amazement at how hard these balls are being hit. It is truly astonishing.

Forget batting average; this metric is more diagnostic than many others that are typically (especially historically) referenced. If you are putting a team together, you want players who hit the ball hard. And yes, the harder the better. This line of reasoning is all about a player’s ceiling; it has nothing to do with the dribbling groundballs that find a spot between defenders. Such “seeing eye” base hits are of little predictive value.

In 2020, exit velocity data’s importance escalated as teams began using it for more refined scouting and player development decisions. This season saw an exceptionally high interest in advanced metrics, partly because of the pandemic-shortened season. This led teams and analysts to seek more data-driven insights into player performance.

I used a histogram with an overlaid density curve to visualize max exit velocity data. Here’s what each part of this plot conveys:

Histogram: The histogram separates the exit velocity data into intervals (bins) and shows how many players achieved max exit velocities within each range. Each bar represents a specific range of velocities and provides a quick overview of where most data points (player exit velocities) lie.
Density Curve: The smoothed density curve overlaid on the histogram estimates the data’s distribution, offering insights into how the data might spread beyond discrete bins. This curve helps us visualize peaks and concentration points without the rigidity of bin divisions.
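The two pieces described above can be sketched in a few lines of plain Python. This is an illustration of the general technique (fixed-width bins plus a Gaussian kernel density estimate), not the code behind the original figure. The exit velocities, the bin edges, and the bandwidth are all made-up assumptions.

```python
import math


def histogram_counts(values, start, width, n_bins):
    """Count values in half-open bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def gaussian_kde(values, bandwidth):
    """Return a function that estimates density at x as an average of Gaussian bumps."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


# Made-up max exit velocities (mph), for illustration only.
max_ev = [104.2, 106.8, 107.5, 108.1, 108.9, 109.4, 110.0, 110.7, 112.3, 116.5]

counts = histogram_counts(max_ev, start=102, width=3, n_bins=6)
print(counts)  # one count per 3-mph bin starting at 102 mph

density = gaussian_kde(max_ev, bandwidth=1.5)
print(density(109.0) > density(116.5))  # the middle of the pack is denser than the tail
```

The bandwidth plays the same role for the curve that bin width plays for the bars: a smaller value follows the data more closely, and a larger value smooths more aggressively.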

Key Insights from the 2020 Max Exit Velocity Data

Concentration Around the Mean: The density curve reveals a central concentration of exit velocities in the range of approximately 105-111 mph. This concentration suggests that most players in the 2020 season achieved max exit velocities within this range, indicating a consistent performance level among players regarding hitting power.
Distribution Shape: The distribution is roughly symmetric, with a slight skew towards higher velocities. This near-symmetry is typical in sports metrics, where most players fall near the average performance level while a few outliers achieve exceptional numbers.
High-End Outliers: The density curve and histogram both suggest that a few players in 2020 achieved exceptionally high max exit velocities, reaching up to 118 mph. These outliers represent some of the league’s top power hitters, whose performances exceed the average exit velocities and pose a significant offensive threat to opposing teams. And in case you were wondering, Pete Alonso of the New York Mets hit a ball at 118.4 mph to lead the league. If facing such a batter, I would point to first base and take my chances with the next guy. If first were occupied, I certainly wouldn’t put anything over the plate. I wouldn’t even see the line drive coming back at me.

Why This Visualization Matters

A histogram with a density curve provides a quantitative view of max exit velocity data. This visualization helps scouts, coaches, and analysts quickly assess the distribution of max exit velocities across players. The density curve also offers a smooth, continuous view of the data, making it easier to observe trends and concentrations without the constraints of bin width.

Closing Thoughts

This histogram with a density overlay captures a snapshot of the league’s hitting power, revealing the typical max exit velocities and highlighting exceptional outliers.

This exemplifies how data analytics can deepen our understanding of baseball. By looking beyond averages and focusing on distribution, we gain a richer perspective on the league’s players. Whether you’re a data enthusiast or a baseball fan, this analysis offers a powerful glimpse into the metrics driving modern baseball.

Exploring Arm Strength in MLB (2020-2024): A Positional Comparison

Introduction

When I think about baseball, arm strength is one of the first things that comes to mind—especially when comparing players across different positions. Whether it’s a third baseman making a quick throw across the diamond (Brooks Robinson, anyone?) or an outfielder firing a rocket from the warning track (Roberto Clemente was awesome), a strong and accurate arm can make all the difference. Recently, I dove into some data from Major League Baseball covering the years 2020 to 2024 to better understand how arm strength varies by position, and I’d like to share what I found.

Comparing Average Arm Strength Across Positions

I started by looking at the average arm strength for each position. Unsurprisingly, outfielders—particularly those in right field—have the strongest arms, while positions like first base require less power behind the throw.

This bar chart shows the average arm strength for each position (excluding catcher) in miles per hour. Outfielders (RF, CF, LF) clearly lead the way, with center fielders and right fielders consistently throwing the hardest. It makes sense: outfielders must make long throws back into the infield, often in critical situations where arm strength is key.

Are you surprised? I might have thought that shortstops would have edged out left fielders and maybe even center fielders. That said, it is close.

As always, box plots allow us to get a more granular view of the raw data. Here is what I found.

Notice the outliers among first basemen. Lots of them get very little on their throws. That is unsurprising; many players are positioned there for their offense, with defense being an afterthought.

As readers of this blog know, I have a special relationship with violin plots. Here is the same data in that form.

Once again, the poor arms of a select group of first basemen are highlighted. I consider that fact to be a big takeaway from this plot.

Infield vs. Outfield: A Clear Difference

Next, I wanted to break things down further and compare infielders’ arm strength versus outfielders. Unsurprisingly, outfielders, who cover more ground and make longer throws, generally have stronger arms.

The box plot below shows the distribution of arm strength between infield and outfield players. Outfielders not only have higher average arm strength, but their range of arm strength is wider, too. Some outfielders, particularly those in right field, can really get after it when a runner is rounding second.

I would like to tell you something interesting about this plot. Over 35 years ago, I was taught a trick (more properly, a heuristic) at Harvard University. If there is a space between the bodies of the box plots, then the data set is worthy of further exploration. If you look closely, you can see a thin space between the boxes, so I decided to investigate further to see if the differences in arm strength are statistically significant. We will get to that in a bit.
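The box-gap heuristic is easy to express in code: compute the first and third quartiles for each group and check whether the two boxes overlap. The sketch below is my own rendering of the idea. The arm-strength values are invented, and Python's statistics.quantiles uses its default "exclusive" method here, so the quartiles may differ slightly from what a plotting library draws.

```python
import statistics


def iqr_box(values):
    """Return (Q1, Q3), the lower and upper edges of the box in a box plot."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q1, q3


def boxes_separated(a, b):
    """True when there is visible space between the two boxes."""
    a_q1, a_q3 = iqr_box(a)
    b_q1, b_q3 = iqr_box(b)
    return a_q3 < b_q1 or b_q3 < a_q1


# Invented arm strengths (mph), for illustration only.
infield = [78, 80, 82, 83, 85, 86, 88]
outfield = [87, 89, 90, 91, 92, 94, 95]

print(iqr_box(infield), iqr_box(outfield))
print(boxes_separated(infield, outfield))
```

A gap between the boxes is only a prompt to look closer; it is not a significance test by itself, which is why the formal statistics follow.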

Looking for Patterns: Correlations Between Positions

Before we get to the hard-core statistics, I wanted to explore whether there is a relationship between arm strength at different positions. For instance, do shortstops tend to have arm strength similar to that of second basemen or third basemen? To find out, I ran a correlation analysis.

This heatmap shows how arm strength at one position correlates with another. There are some interesting patterns here—positions like second base (2B) and shortstop (SS) show a strong correlation, likely because they both require quick, strong throws in the infield. The outfield positions also show high correlations with each other, which makes sense given the similar demands placed on their arms.
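Each cell in a heatmap like this is typically a Pearson correlation coefficient between two columns of numbers. Here is a minimal Python sketch of that calculation. How the values are paired (by team, by season, or otherwise) is an assumption on my part, and the sample numbers are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented paired values (mph), for illustration only.
arm_2b = [79.5, 80.2, 81.0, 82.4, 83.1]
arm_ss = [84.0, 85.1, 85.0, 87.2, 88.0]
print(round(pearson_r(arm_2b, arm_ss), 3))
```

Values near +1 mean the two columns rise and fall together, values near -1 mean they move in opposite directions, and values near 0 mean there is little linear relationship.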

Here are the Statistics

The results of the one-way ANOVA test (which compares group means by analyzing the variance between and within groups) indicate the following:

F-statistic: 261.67

Since the p-value is extremely small (well below the typical, and totally arbitrary, significance threshold of 0.05), we can reject the null hypothesis. This suggests statistically significant differences in arm strength across the different positions. In other words, the differences in arm strength are very unlikely to be the product of chance alone.
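For anyone curious where an F-statistic comes from, here is a hand-rolled Python sketch of the one-way ANOVA calculation: the variance between group means divided by the variance within groups. This is a teaching sketch on toy numbers, not the code or data that produced the 261.67 above, and it stops short of computing the p-value, which requires the F distribution.

```python
def mean(values):
    return sum(values) / len(values)


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (positions)
    n = len(all_values)    # total number of observations

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Toy groups, invented for illustration only.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f"{one_way_anova_f(toy_groups):.2f}")  # 13.00
```

A large F means the group averages sit far apart relative to the scatter inside each group, which is exactly what the arm-strength data shows.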

I have never done this before in my blog, but I decided to take an even deeper dive into this data set. I view this blog as more or less an introduction to what I find interesting. I don’t want to get into the weeds; many blogs and websites do that. Today, though, is different. Early this morning, I ran my 4 miles despite not wanting to get out of bed. My hip, which needs to be replaced, barked the entire time. I guess I am in a mood… Here is what I did next.

group1	group2	meandiff	p-adj	lower	upper	reject
arm_1b	arm_2b	4.0267	0	2.74	5.30	TRUE
arm_1b	arm_3b	8.4252	0	7.1	9.75	TRUE
arm_1b	arm_cf	12.6281	0	11.3	13.92	TRUE
arm_1b	arm_lf	11.1761	0	9.93	12.42	TRUE
arm_1b	arm_rf	13.3679	0	12.1	14.63	TRUE
arm_1b	arm_ss	8.977	0	7.6	10.32	TRUE
arm_2b	arm_3b	4.3985	0	3.2	5.57	TRUE
arm_2b	arm_cf	8.6014	0	7.46	9.738	TRUE
arm_2b	arm_lf	7.1494	0	6.067	8.23	TRUE
arm_2b	arm_rf	9.3412	0	8.2	10.45	TRUE
arm_2b	arm_ss	4.9503	0	3.75	6.14	TRUE
arm_3b	arm_cf	4.2029	0	3.02	5.38	TRUE
arm_3b	arm_lf	2.7509	0	1.61	3.88	TRUE
arm_3b	arm_rf	4.9427	0	3.78	6.10	TRUE
arm_3b	arm_ss	0.5518	0.84	-0.69	1.79	FALSE
arm_cf	arm_lf	-1.4519	0.02	-2.54	-0.35	TRUE
arm_cf	arm_rf	0.7398	0.45	-0.38	1.86	FALSE
arm_cf	arm_ss	-3.6511	0	-4.86	-2.43	TRUE
arm_lf	arm_rf	2.1918	0	1.12	3.26	TRUE
arm_lf	arm_ss	-2.1991	0	-3.36	-1.037	TRUE
arm_rf	arm_ss	-4.3909	0	-5.57	-3.202	TRUE

These are the results of Tukey’s HSD (Honestly Significant Difference) test, which provides pairwise comparisons between arm strengths for different positions. Yeah, I know your eyes are glazing over, but bear with me. Here’s how to interpret the key columns:

Group1 and Group2: These columns represent the two positions being compared. For example, “arm_1b” vs. “arm_2b” compares the arm strength of first basemen with second basemen.
Meandiff: This column shows the difference in the average arm strength between the two groups, calculated as Group2 minus Group1. A positive number means the arm strength of the second group (Group2) is higher than that of the first group (Group1).
- For example, the mean difference between first basemen (arm_1b) and second basemen (arm_2b) is 4.03 mph, meaning first basemen tend to have lower arm strength compared to second basemen.
p-adj: This is the adjusted p-value, which tests the statistical significance of the difference. If this value is below 0.05, it indicates that the difference is statistically significant.
- For most comparisons, the p-values are extremely low (0.0), indicating strong evidence that arm strength significantly differs between these positions.
Lower and Upper: These are the bounds of the confidence interval for the mean difference. They give a range within which the actual mean difference will likely fall, at a 95% confidence level.
- For example, the confidence interval for the difference between arm_1b and arm_2b runs from 2.74 to 5.30 mph, suggesting that the actual difference lies within this range.
Reject: This column tells whether the difference between the two groups is statistically significant. If it says “True,” the test rejects the null hypothesis, meaning the difference between the two positions is significant.
- In this case, “True” appears in many rows, indicating that the arm strengths differ significantly between most pairs of positions.
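Here is a short Python sketch that reads a handful of rows from the table the way the column guide describes. It rests on two assumptions that appear consistent with every row in the table: meandiff is Group2 minus Group1, and a pair is flagged as significant exactly when its confidence interval excludes zero. The describe function is my own helper, written for illustration.

```python
# (group1, group2, meandiff, lower, upper), copied from the table in this post.
rows = [
    ("arm_1b", "arm_2b", 4.0267, 2.74, 5.30),
    ("arm_1b", "arm_rf", 13.3679, 12.1, 14.63),
    ("arm_3b", "arm_ss", 0.5518, -0.69, 1.79),
    ("arm_cf", "arm_lf", -1.4519, -2.54, -0.35),
    ("arm_cf", "arm_rf", 0.7398, -0.38, 1.86),
]


def describe(row):
    """Turn one table row into a plain-English reading."""
    group1, group2, diff, lower, upper = row
    significant = lower > 0 or upper < 0  # interval does not contain zero
    if not significant:
        return f"{group1} vs {group2}: no significant difference"
    # Assumption: meandiff = group2 - group1, so a positive value favors group2.
    stronger, weaker = (group2, group1) if diff > 0 else (group1, group2)
    return f"{stronger} stronger than {weaker} by {abs(diff):.2f} mph"


for row in rows:
    print(describe(row))
```

Read this way, the table sorts the positions from weakest to strongest average arm: 1B, 2B, 3B, SS, LF, CF, RF, with 3B/SS and CF/RF too close to separate.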

Key Insights

Significant differences: Almost all pairwise comparisons show statistically significant differences. For example:
- Outfielders (CF, RF, LF) generally have higher arm strength compared to infielders (1B, 2B, 3B, SS).
- Third basemen (arm_3b) also tend to have higher arm strength than first basemen (arm_1b), as shown by an 8.43 mph difference.
Largest differences: The biggest differences are between infield positions like first base and outfield positions like right field (arm_rf), where the arm strength difference can be over 13 mph.

Even though my hip is killing me, I feel very good about the results of this study.

Wrapping Up

So, what did I learn from all this? First, outfielders—especially those in right and center field—are in a league of their own regarding arm strength. Conversely, infielders don’t need the same power, but positions like third base and shortstop still require strong arms for those quick, long throws.

Running the ANOVA and Tukey’s test confirmed that these differences in arm strength are not random results due to the vagaries of sampling. Understanding these variations can be crucial for teams looking to optimize their defensive lineups or scout new talent.

Examining the data and seeing how arm strength varies across MLB positions was fascinating. I hope you enjoyed it. I am going to grab a beer and contemplate the disappointment of my team, the Cleveland Guardians, disastrously ending another year. Meh, what else is new?

Even More Catcher Info: 2023 Blocking Data

Catcher defense, especially the ability to block pitches, can often go unnoticed but significantly impact the game. Preventing wild pitches and passed balls can save crucial runs and give pitchers confidence to throw in the dirt when necessary. In 2023, several catchers distinguished themselves as exceptional blockers. Let’s take a look at some of the data.

This analysis uses metrics like “blocks above average,” passed balls/wild pitches (PBWP), and more to examine the best catchers at blocking pitches during the season. Below, I break down the data to highlight the elite performers.

1. Top 10 Catchers by Blocks Above Average

“Blocks above average” is a critical statistic that tells us how much better (or worse) a catcher is compared to the league average at blocking pitches. Here’s a look at the top 10 catchers based on this metric:

As shown, Sean Murphy from the Atlanta Braves leads the way with 16 blocks above average, followed closely by Alejandro Kirk and Nick Fortes. These catchers were above average in keeping pitches in front of them, saving runs for their teams.

2. Actual vs. Expected PBWP

Next, take a look at the actual vs. expected number of passed balls and wild pitches (PBWP). The scatter plot below visualizes this comparison:

Catchers whose actual PBWP is lower than expected (below the red line) performed better than average. Catchers like Sean Murphy and J.T. Realmuto are among those outperforming expectations, while others are closer to the expected values. Note that the majority of catchers were about average.
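The comparison in the scatter plot comes down to one subtraction per catcher. Here is a minimal Python sketch, assuming the red line marks the points where actual equals expected. The catcher labels and totals are invented placeholders, not 2023 data.

```python
def pbwp_vs_expected(actual, expected):
    """Negative result: fewer passed balls/wild pitches than expected (good)."""
    return actual - expected


def verdict(diff):
    if diff < 0:
        return "better than expected"
    if diff > 0:
        return "worse than expected"
    return "as expected"


# Hypothetical catchers and totals (actual, expected), for illustration only.
catchers = {"Catcher A": (30, 41), "Catcher B": (38, 37), "Catcher C": (45, 45)}

for name, (actual, expected) in catchers.items():
    diff = pbwp_vs_expected(actual, expected)
    print(f"{name}: {diff:+d} PBWP, {verdict(diff)}")
```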

3. Blocks Above Average Per Game

Another critical metric is the rate catchers accumulate blocks above average per game. This accounts for differences in playing time and offers a normalized view of performance. Here’s a look at the top 10 catchers:

The usual suspects are once again prominent. Notice that Yainer Diaz ranked number one in the league in this critical category.

4. Comprehensive Heatmap

To better understand each catcher’s performance, I’ve compiled several blocking metrics into a heatmap. This chart includes statistics such as catcher blocking runs, blocks above average, actual vs. expected PBWP, and blocks above average per game:

The heatmap above gives a comprehensive view of the top 10 catchers. The varying shades show how these catchers compare across multiple metrics, with Sean Murphy, Alejandro Kirk, and Nick Fortes again emerging as the top performers. This heatmap allows us to see the nuances in their blocking ability: some excel at reducing passed balls, while others are better at accumulating blocks above average on a per-game basis.

Conclusion

Nuance and subtlety are the operative words here. Asking who was the best defensive catcher in 2023 has a complex and interesting answer. What should we value in a catcher’s defense? Which metric is more important to winning than the others? Can you settle for a below-average pop time if your catcher is brilliant at framing pitches? Lots of great questions that require thoughtful answers. Stay tuned; I will continue posting my analyses. And yes, I do intend to publish some (hopefully) thoughtful conclusions.

Pop Time: A Critical Metric for Catchers

In baseball, a catcher’s Pop Time can be the difference between catching a base-stealer and letting them slide in safely. Pop Time measures how quickly a catcher transfers the ball from their mitt to second base, factoring in the catcher’s footwork, exchange, and arm strength. This metric provides a more comprehensive assessment of a catcher’s defensive capabilities than arm strength alone, making it crucial in evaluating how effectively a catcher can control the running game.

This post explores the distribution of pop times among various MLB catchers, with visualizations such as a histogram, Kernel Density Estimate (KDE) plot, violin plot, and box plot. We’ll also examine some key summary statistics and update the analysis with the best pop times recorded during the 2023 season.

What is Pop Time?

Pop Time is the time it takes for a catcher to throw the ball to second base during a steal or pickoff attempt. It measures the time elapsed from when the pitch hits the catcher’s mitt to when the throw reaches the center of the base. MLB’s average pop time for a throw to second base is 2.01 seconds, but elite catchers are significantly faster.

Pop Time considers three main factors:

Footwork: The catcher’s ability to quickly get into a throwing position.
Exchange: How fast the catcher transfers the ball from the glove to the throwing hand.
Arm Strength: The velocity of the throw.

Catchers with exceptional Pop Times obviously offer a much higher probability of recording an out.
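To see how the three factors add up, here is a back-of-the-envelope Python sketch that splits Pop Time into the time needed to get the throw off (footwork plus exchange) and the time the ball spends in the air. It assumes the throw covers the straight-line distance from home plate to second base and that the velocity supplied is the average over the whole flight, which will be lower than the speed at release. The function names and sample inputs are my own.

```python
import math

BASE_PATH_FT = 90.0
# Home plate to second base is the diagonal of the 90-foot square, about 127.3 ft.
HOME_TO_SECOND_FT = BASE_PATH_FT * math.sqrt(2)


def flight_time(avg_throw_mph, distance_ft=HOME_TO_SECOND_FT):
    """Seconds in the air, given the throw's AVERAGE speed over the distance."""
    feet_per_second = avg_throw_mph * 5280 / 3600
    return distance_ft / feet_per_second


def pop_time(release_sec, avg_throw_mph):
    """release_sec: footwork plus exchange, from pitch caught to ball released."""
    return release_sec + flight_time(avg_throw_mph)


# Hypothetical inputs, for illustration only.
for release_sec, mph in [(0.70, 75), (0.70, 80), (0.65, 80)]:
    print(f"release {release_sec:.2f}s, {mph} mph -> {pop_time(release_sec, mph):.2f}s")
```

The sketch shows why arm strength alone is not the whole story: shaving a few hundredths of a second off the release has the same effect as adding several miles per hour to the throw.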

Best Pop Times from 2023

Below are the best average Pop Times to second base on stolen-base attempts (minimum 15 SB attempts) from the 2023 MLB season:

J.T. Realmuto: 1.90 seconds
Yan Gomes: 1.93 seconds
Jorge Alfaro: 1.94 seconds
Austin Hedges: 1.94 seconds
Manny Piña: 1.94 seconds
Gary Sánchez: 1.94 seconds

These elite catchers consistently post Pop Times well below the league average, making them highly effective at throwing out would-be base stealers. J.T. Realmuto, whose reputation precedes him, leads the pack with an impressive 1.90 seconds.

Pop Time Distribution: A Closer Look

To better understand how Pop Times vary among catchers, I visualized the distribution using a histogram:

The histogram shows that most catchers’ Pop Times cluster around 1.95–2.0 seconds, with very few recording times below 1.90 seconds. The majority of catchers are near the league average of 2.01 seconds, but the elite catchers separate themselves by consistently being faster than this threshold.

Kernel Density Estimate (KDE) Plot

A Kernel Density Estimate (KDE) plot smooths out the distribution to provide a clearer picture of the underlying trends:

The KDE plot highlights the peak of Pop Times around 1.95 seconds, confirming that most catchers perform near this time. The data skews slightly to the right, indicating that a few catchers have slower pop times exceeding 2.0 seconds, but most fall below this threshold.

Violin Plot: Visualizing Distribution and Density

I also created a violin plot, which combines the features of a KDE and a box plot to visualize both the distribution and the density of pop times:

The violin plot shows that most catchers fall within a narrow range of 1.90 to 2.00 seconds. The distribution is dense around 1.95 seconds, with fewer catchers having significantly faster or slower times. This plot also highlights that catchers like J.T. Realmuto are outliers, excelling well beyond the typical range.

Box Plot: Highlighting Key Statistics

The box plot below offers a simple yet informative view of the data, focusing on the central tendency and spread of Pop Times:

Key points from the box plot:

Median Pop Time: 1.97 seconds
Interquartile Range (IQR): The middle 50% of pop times fall between 1.93 and 1.99 seconds.
Outliers: A few catchers have slower times above 2.0 seconds, but these are rare.
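Box plots usually flag outliers with the 1.5 x IQR rule, and the quartiles quoted above are enough to work out where those fences sit. Here is a small Python sketch. It assumes the plot used that common default, which the post does not state, so treat the cutoffs as illustrative.

```python
Q1, Q3 = 1.93, 1.99  # quartiles quoted in this post (seconds)


def outlier_fences(q1, q3, k=1.5):
    """Return (low, high); points outside this range are drawn as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1=Q1, q3=Q3):
    low, high = outlier_fences(q1, q3)
    return value < low or value > high


low, high = outlier_fences(Q1, Q3)
print(f"fences: {low:.2f} to {high:.2f} seconds")  # fences: 1.84 to 2.08 seconds

for t in (1.83, 1.90, 2.09):
    print(t, "outlier" if is_outlier(t) else "inside the whiskers")
```

Under that rule, an outlier can sit on either side: an exceptionally fast time is flagged just as readily as an exceptionally slow one.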

Summary Statistics

The summary statistics for Pop Times further illustrate how closely clustered most catchers are around the league average:

Mean Pop Time: 1.96 seconds
Standard Deviation: 0.051 seconds (indicating low variability)
Minimum Pop Time: 1.83 seconds
Maximum Pop Time: 2.09 seconds
25th Percentile: 1.93 seconds
50th Percentile (Median): 1.97 seconds
75th Percentile: 1.99 seconds

These statistics show that most catchers perform within a narrow band, with the elite catchers falling below 1.90 seconds.
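One quick way to read these numbers is to convert a Pop Time into a z-score, the number of standard deviations it sits from the mean. The Python sketch below uses the mean and standard deviation listed above; the choice of z-scores is my own framing, not part of the original analysis.

```python
MEAN_POP = 1.96   # seconds, from the summary statistics above
SD_POP = 0.051    # seconds, from the summary statistics above


def z_score(pop_time, mean=MEAN_POP, sd=SD_POP):
    """Standard deviations from the mean; negative means faster than average."""
    return (pop_time - mean) / sd


for label, t in [("minimum", 1.83), ("Realmuto", 1.90), ("maximum", 2.09)]:
    print(f"{label:9s} {t:.2f}s  z = {z_score(t):+.2f}")
```

With a standard deviation of only about five hundredths of a second, a 1.90 is already more than one standard deviation faster than the mean.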

Conclusion

Pop Time is a critical metric for evaluating a catcher’s ability to control the running game. While arm strength is important, Pop Time provides a fuller picture by incorporating footwork and exchange speed. This type of analysis also lets us ignore the pitcher and focus exclusively on the catcher’s skills.

Our analysis of Pop Times using visual tools like histograms, KDE plots, violin plots, and box plots shows that most catchers fall within a narrow range of 1.95 to 2.0 seconds, with a few standout performers excelling beyond this. The data from the 2023 season illustrates how slight differences in Pop Time can significantly impact a catcher’s effectiveness at throwing out base stealers.

For catchers, a fast Pop Time can be the difference between a successful defensive play and allowing the opposing team to gain momentum on the bases. I hope you are enjoying this deep dive into the nuances of catching; I certainly am. It is fascinating, isn’t it?

Whiff Percentages in Baseball: A Little EDA Goes A Long Way

In baseball analytics, understanding a player’s whiff percentage—the rate at which they miss the ball when swinging—can offer key insights into their performance. A higher whiff percentage suggests a tendency to miss pitches, while a lower percentage indicates better contact with the ball.
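The definition above boils down to one division. Here is a small sketch; the function name `whiff_pct` and the counts in the example are hypothetical, not taken from any player's record.

```python
def whiff_pct(swings, misses):
    """Whiff percentage: swings that miss the ball, as a share of all swings."""
    if swings <= 0:
        raise ValueError("swings must be positive")
    if not 0 <= misses <= swings:
        raise ValueError("misses must be between 0 and swings")
    return 100.0 * misses / swings


# Hypothetical hitter: 400 swings, 100 of them misses.
print(whiff_pct(400, 100))
```

Note that the denominator is swings, not pitches seen, so a patient hitter who rarely swings can still post a high whiff percentage.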

In this post, I explore whiff percentages from both leagues across several years using three different visualization techniques: box plots, violin plots, and a line plot of medians. Each method offers a unique perspective on the data, and together, they help paint a comprehensive picture of trends in whiff percentages from 2015 to 2023. All players with approximately 200 plate appearances in a given year are included in the study.

1. Box Plot: Visualizing the Distribution by Year

A box plot is a simple yet powerful tool to summarize the distribution of whiff percentages each year. It shows the median (the line within each box), the interquartile range (the box itself), and any outliers (the dots outside the whiskers).

This box plot gives us several insights:

Consistency: In certain years, the boxes are compact, indicating less variation in whiff percentages (e.g., 2015).
Outliers: Some years have extreme values, shown as dots, which highlight players who either significantly outperformed or underperformed compared to the rest.
Year-to-Year Comparison: The height of the boxes gives a sense of how spread out the whiff percentages were for each year, helping to identify years with more variability in player performance.

Why use a box plot? Box plots are ideal when you want to compare distributions without being distracted by individual data points. They provide a clean, uncluttered view of how overall performance fluctuated from year to year, and they highlight outliers effectively.

2. Violin Plot: Adding Depth to Distribution Analysis

A violin plot enhances the box plot by providing additional information about the shape of the distribution. It combines aspects of a box plot with a kernel density estimate, which helps visualize the probability distribution of the data. I will mention once again that I invented these plots, much to the chagrin of my peers, many decades ago. See my “A Crush, A Data Viz, and a Book Long Postponed” post for that tragic tale.

This violin plot offers some extra depth:

Distribution Shape: You can see how the whiff percentages are spread out within each year. Some years have narrow violins, suggesting that most players had similar whiff percentages, while others are more spread out, indicating more variability.
Density: The wider sections of the violin show where most data points are concentrated, allowing us to see not just the range but also the density of players’ performances in each year.

Why use a violin plot? Violin plots are particularly useful when you want a more nuanced understanding of the data distribution. While box plots are excellent for a high-level summary, violin plots allow us to see the underlying density, which can reveal patterns not visible in box plots alone.
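The width of a violin at any height is a kernel density estimate evaluated at that value. The sketch below shows the idea with a Gaussian kernel in plain Python. The whiff percentages are hypothetical, and the bandwidth of 2.0 is an arbitrary choice for illustration; plotting libraries normally pick the bandwidth automatically.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at any point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each sample contributes a small bell curve centered on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical whiff percentages for one season (illustration only).
hypothetical_whiff = [18.0, 21.5, 22.0, 23.5, 24.0, 24.5, 25.0, 27.0, 31.0, 38.0]
density = gaussian_kde(hypothetical_whiff, bandwidth=2.0)

# Evaluate on a grid from 10% to 50% in steps of 0.5.
grid = [10 + 0.5 * i for i in range(81)]
widths = [density(x) for x in grid]
peak = grid[widths.index(max(widths))]
print(f"violin is widest near {peak:.1f}%")
```

A smaller bandwidth gives a bumpier violin and a larger one gives a smoother violin, which is worth keeping in mind before reading too much into small wiggles.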

3. Line Plot of Medians: Tracking Trends Over Time

Finally, to understand the overall trend in whiff percentages, I created a line plot of the median whiff percentage for each year. The median is a robust measure of central tendency, making it ideal for highlighting general shifts without being overly influenced by outliers.

This plot shows us:

Overall Trend: The line plot helps reveal whether the median whiff percentage is increasing, decreasing, or remaining stable over time. If the line rises, it suggests that players are missing more swings as the years progress, while a falling line indicates better contact rates.
Key Years: Significant upward or downward trends in specific years are easily spotted. These could prompt further investigation into why such changes occurred, whether due to rule changes, player performance shifts, or other factors.

Why use a line plot? A line plot of medians is the best way to capture the long-term trend. It smooths out individual variations and provides a clear picture of how the “middle” of the data is changing over time.
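Here is a rough sketch of the calculation behind such a line plot: group the player values by year, then take the median of each group. The `(year, whiff)` records are invented for illustration, including one deliberately extreme value, and `median_by_year` is a helper name of my own.

```python
from collections import defaultdict
from statistics import mean, median


def median_by_year(records):
    """Map each year to the median of its whiff percentages."""
    by_year = defaultdict(list)
    for year, whiff in records:
        by_year[year].append(whiff)
    return {year: median(values) for year, values in sorted(by_year.items())}


# Hypothetical (year, whiff %) records; 45.0 is a deliberate outlier.
hypothetical_records = [
    (2015, 20.0), (2015, 22.0), (2015, 45.0),
    (2016, 21.0), (2016, 23.0), (2016, 24.0), (2016, 26.0),
]

medians = median_by_year(hypothetical_records)
mean_2015 = mean(w for y, w in hypothetical_records if y == 2015)
print(medians)
print(f"2015 mean for comparison: {mean_2015:.1f}")
```

The outlier drags the 2015 mean well above the typical player, while the median stays with the middle of the pack, which is the robustness described above.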

Conclusion: Insights from Multiple Perspectives

By using these three visualizations—box plots, violin plots, and line plots—we gain a multi-dimensional understanding of whiff percentages in baseball:

The box plot provides a clean, high-level comparison of distributions across years, highlighting outliers and general performance variability.
The violin plot offers a deeper look at how player performances are distributed within each year, revealing the shape and density of the data.
The line plot of medians shows the overall trend, capturing how the middle of the distribution shifts over time.

Each plot tells a part of the story, and when combined, they provide a comprehensive view of player performance over the years. Whether you’re a data enthusiast, baseball analyst, or interested bystander, these tools can help unlock valuable insights into the game. And yes, I find the trend reversal after the 2020 season curious. The great thing about Exploratory Data Analysis (EDA) is that it can strongly suggest what questions must be asked in subsequent stages of analysis. That is certainly what happened here.

Frame This: MLB Catchers (2023)

I took a deeper dive into MLB Catchers for the year 2023. I found lots of interesting stuff. Let’s get to it.

In this post, I decided to focus on catcher framing. Some catchers are better than others at fooling umpires into calling a ball a strike. That is what catcher framing is all about. This may surprise some of you, but all this data is now readily available. Every pitch is tracked with impressive accuracy, with terabytes of data generated for each game played.

I created this figure to illustrate the standardized zones used for pitches thrown to home plate. The following is taken from the perspective of the catcher and home plate umpire.

Take Zone 11, for example. The reams of data tell us the percentage of pitches in that area that are taken and called strikes. In 2023, 19.2% of all pitches thrown into that zone were called strikes. Austin Hedges, then a catcher for the world-champion Texas Rangers, managed to get 27.6% of those pitches called strikes by the sweaty man crouching behind him. Get the idea? Hedges’ strike rate for that zone led all of MLB.

Hedges’ work in Zone 13 was even more impressive. The league average for pitches thrown up and away to right-handed batters was 23.6%. Hedges managed to get strike calls on 42.2% of those pitches. Extraordinary.
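To put numbers on "how much better than average," one simple approach is to subtract the league called-strike rate from the catcher's rate, zone by zone. The sketch below uses only the four percentages quoted above. The function names are mine, and this is a simplification for illustration, not the method behind any official framing metric.

```python
def called_strike_rate(called_strikes, taken_pitches):
    """Percentage of taken pitches in a zone that were called strikes."""
    if taken_pitches <= 0:
        raise ValueError("taken_pitches must be positive")
    return 100.0 * called_strikes / taken_pitches


def framing_delta(catcher_rates, league_rates):
    """Catcher rate minus league rate, in percentage points, per zone."""
    return {
        zone: round(catcher_rates[zone] - league_rates[zone], 1)
        for zone in catcher_rates
    }


# Rates quoted in the post for 2023 (zone number -> called-strike %).
league_rates = {11: 19.2, 13: 23.6}
hedges_rates = {11: 27.6, 13: 42.2}

deltas = framing_delta(hedges_rates, league_rates)
for zone, delta in deltas.items():
    print(f"Zone {zone}: {delta:+.1f} percentage points vs. league")
```

A table of these differences, with catchers as rows and zones as columns, is also the natural input for a heatmap.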

I ran a Cluster Analysis of all the framing data across all the zones to identify the top ten catchers in MLB in 2023. Hedges and Patrick Bailey of the San Francisco Giants stand apart based on their superior performance.

And, yes, what is a top ten list without a bottom ten list? There might be a name or two on there that will surprise you.

In a previous post, I had identified J.T. Realmuto as having an outstanding defensive season in 2023. Regarding pitch framing, he ranked a ridiculous 63rd. I admit, I found that unexpected.

Now, we can move on to something very cool. I have known what heatmaps are for a long time, but I have never needed to create one. It simply never came up. Guess what is next; go ahead.

I want to point out one aspect of this map: Hedges was well below the league average regarding framing pitches in Zone 14. I must admit, that is curious. I do not know why he would be so bad in that area and excel in all the other zones. I have no explanation for that anomalous chunk of data.

And, yes, I also generated a heatmap for the bottom ten catchers in 2023.

Another strange fact is that Martin Maldonado was very good at getting strike calls in Zone 11 but well below average in all the others. Does that have something to do with the pitchers on the Houston Astros in 2023? That line of reasoning might lead to a possible explanation.

I thought that was the end of this post, but I decided to test the new AI release that ChatGPT just dropped. I asked it for recommendations on how it would display this data. It offered up something very cool. Here are Radar Plots of the top 5 and bottom 5 catchers for pitch framing for the 2023 season.

Note that Hedges in Zone 13 and Miguel Amaya in Zone 17 stand out.

These plots are beautiful, but I haven’t decided on their utility. Are they diagnostic enough to merit their use? We will look more into that question in future posts.

At least for now, the takeaway is that determining the best defensive catcher in 2023 is much more subtle and nuanced than one might have imagined. Stay tuned; there is more to come.

Baseball Has a Strange Math Issue

My last post was about the defensive capabilities of MLB catchers in 2023. I mentioned that there was more to come. As I was researching the follow-up post, I came across something bizarre. As soon as I stop violently shaking my head back and forth, I will show you what I found.

This post was supposed to be about framing pitches. Some catchers are very good at fooling umpires into calling strikes on pitches that are actually balls. There is lots of excellent data to quantify the ability of any catcher to do this. As you might guess, this is a precious skill that any team would want to have in their catcher.

As I reviewed the data and put together a strategy to analyze and visualize it for the post, I realized that I needed to draw pictures of home base, more commonly called home plate. Why home base, then? That is what it is called in the official baseball rule book. How did I end up on a web page showing those rules? That is an excellent question.

I searched for the dimensions of home plate; it wasn’t something I had committed to memory. Trust me, I know the numbers now, and I doubt I will ever forget. Here’s why…

The following paragraph is taken from Official Baseball Rules, 2024 edition, published by the Office of the Commissioner of Baseball.

2.02 Home Base. Home base shall be marked by a five-sided slab of whitened rubber. It shall be a 17-inch square with two of the corners removed so that one edge is 17 inches long, two adjacent sides are 8½ inches and the remaining two sides are 12 inches and set at an angle to make a point.

So, what’s the big deal? The rule book describes an impossible figure. The shape described does not, and cannot, exist. Unbelievable, isn’t it? Look at the drawing I conjured up.

Figure 1. Home plate as it should be and home plate as described in the rule book.

I suppose a lawyer could litigate this. It seems that the intent was for the angle formed at the point to be 90 degrees, which it clearly is not when following the description from the rule book. Each slanted side would need to be slightly more than 12 inches, about 12.02, to meet the requirements of Pythagoras and his ubiquitous theorem. Is Major League Baseball concerned about this? Apparently not. Am I concerned that they have fudged a famous geometry theorem? I’ll crank up some Mozart and mull it over for a bit. My guess is I won’t lose much sleep.
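For anyone who wants to check the arithmetic, here is a short sketch. It assumes the two slanted sides meet at the center line of the plate, so each one spans half of the 17-inch width (8.5 inches) horizontally. For a 90-degree point inside a 17-inch square, each also has to drop 8.5 inches. The variable names are mine.

```python
import math

HALF_WIDTH = 8.5  # inches: half of the 17-inch edge

# Side length needed for a true right angle at the point (legs of 8.5 and 8.5).
required_side = math.hypot(HALF_WIDTH, HALF_WIDTH)

# Angle at the point if the slanted sides are exactly 12 inches, as the rule says.
angle_with_12_inch_sides = math.degrees(2 * math.asin(HALF_WIDTH / 12.0))

print(f"side needed for a 90-degree point: {required_side:.4f} in")
print(f"angle with 12-inch sides: {angle_with_12_inch_sides:.2f} degrees")
```

The gap is about two hundredths of an inch per side, or roughly two tenths of a degree at the point. Small, but not zero.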