Pop Time: A Critical Metric for Catchers

In baseball, a catcher’s Pop Time can be the difference between catching a base-stealer and letting them slide in safely. Pop Time measures how quickly a catcher transfers the ball from their mitt to second base, factoring in the catcher’s footwork, exchange, and arm strength. This metric provides a more comprehensive assessment of a catcher’s defensive capabilities than arm strength alone, making it crucial in evaluating how effectively a catcher can control the running game.

This post explores the distribution of pop times among various MLB catchers, with visualizations such as a histogram, Kernel Density Estimate (KDE) plot, violin plot, and box plot. We’ll also examine some key summary statistics and update the analysis with the best pop times recorded during the 2023 season.


What is Pop Time?

Pop Time is the time it takes for a catcher to throw the ball to second base during a steal or pickoff attempt. It measures the time elapsed from when the pitch hits the catcher’s mitt to when the throw reaches the center of the base. MLB’s average pop time for a throw to second base is 2.01 seconds, but elite catchers are significantly faster.

Pop Time considers three main factors:

  • Footwork: The catcher’s ability to quickly get into a throwing position.
  • Exchange: How fast the catcher transfers the ball from the glove to the throwing hand.
  • Arm Strength: The velocity of the throw.

Catchers with exceptional Pop Times give their team a noticeably better chance of recording an out.


Best Pop Times from 2023

Below are the best average Pop Times to second base on stolen-base attempts (minimum 15 SB attempts) from the 2023 MLB season:

  • J.T. Realmuto: 1.90 seconds
  • Yan Gomes: 1.93 seconds
  • Jorge Alfaro: 1.94 seconds
  • Austin Hedges: 1.94 seconds
  • Manny Piña: 1.94 seconds
  • Gary Sánchez: 1.94 seconds

These elite catchers consistently post Pop Times well below the league average, making them highly effective at throwing out would-be base stealers. J.T. Realmuto, whose reputation precedes him, leads the pack with an impressive 1.90 seconds.


Pop Time Distribution: A Closer Look

To better understand how Pop Times vary among catchers, I visualized the distribution using a histogram:

The histogram shows that most catchers’ Pop Times cluster around 1.95–2.0 seconds, with very few recording times below 1.90 seconds. The majority of catchers are near the league average of 2.01 seconds, but the elite catchers separate themselves by consistently being faster than this threshold.


Kernel Density Estimate (KDE) Plot

A Kernel Density Estimate (KDE) plot smooths out the distribution to provide a clearer picture of the underlying trends:

The KDE plot highlights the peak of Pop Times around 1.95 seconds, confirming that most catchers perform near this time. The data skews slightly to the right, indicating that a few catchers have slower pop times exceeding 2.0 seconds, but most fall below this threshold.
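If you want to see what the smoothing actually does, here is a minimal sketch of a Gaussian KDE in plain Python. It is not the code behind the plot in this post. The 0.02-second bandwidth is an assumption picked by hand, and the input is only the six 2023 leader averages listed earlier, not the full data set.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian "bump" centered on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# The six 2023 leader averages quoted in this post (seconds).
leader_times = [1.90, 1.93, 1.94, 1.94, 1.94, 1.94]

# Assumption: 0.02 s bandwidth, chosen for illustration only.
kde = gaussian_kde(leader_times, bandwidth=0.02)

# Evaluate on a grid from 1.80 to 2.10 in 0.01 s steps.
grid = [1.80 + 0.01 * i for i in range(31)]
peak_x = max(grid, key=kde)
print(f"density peaks near {peak_x:.2f} s")
```

A wider bandwidth flattens the curve and a narrower one makes it spikier, so the apparent peak and skew of any KDE plot depend partly on that choice.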


Violin Plot: Visualizing Distribution and Density

I also created a violin plot, which combines the features of a KDE and a box plot to visualize both the distribution and the density of pop times:

The violin plot shows that most catchers fall within a narrow range of 1.90 to 2.00 seconds. The distribution is dense around 1.95 seconds, with fewer catchers having significantly faster or slower times. This plot also highlights that catchers like J.T. Realmuto are outliers, excelling well beyond the typical range.


Box Plot: Highlighting Key Statistics

The box plot below offers a simple yet informative view of the data, focusing on the central tendency and spread of Pop Times:

Key points from the box plot:

  • Median Pop Time: 1.97 seconds
  • Interquartile Range (IQR): The middle 50% of pop times fall between 1.93 and 1.99 seconds.
  • Outliers: A few catchers have slower times above 2.0 seconds, but these are rare.

Summary Statistics

The summary statistics for Pop Times further illustrate how closely clustered most catchers are around the league average:

  • Mean Pop Time: 1.96 seconds
  • Standard Deviation: 0.051 seconds (indicating low variability)
  • Minimum Pop Time: 1.83 seconds
  • Maximum Pop Time: 2.09 seconds
  • 25th Percentile: 1.93 seconds
  • 50th Percentile (Median): 1.97 seconds
  • 75th Percentile: 1.99 seconds

These statistics show that most catchers perform within a narrow band, with the elite catchers at or below roughly 1.90 seconds.
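The quartiles and extremes above are enough to sanity-check the outlier picture. The sketch below assumes the conventional 1.5 × IQR whisker rule, which is a common plotting default but not necessarily the setting used for the box plot in this post.

```python
# Quartiles and extremes quoted in this post (seconds).
q1, median, q3 = 1.93, 1.97, 1.99
minimum, maximum = 1.83, 2.09

iqr = q3 - q1

# Assumption: the conventional 1.5 * IQR rule for whisker limits.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR: {iqr:.2f} s")
print(f"fences: {lower_fence:.2f} s to {upper_fence:.2f} s")
print("minimum flagged as outlier:", minimum < lower_fence)
print("maximum flagged as outlier:", maximum > upper_fence)
```

Under that assumption the fences land at roughly 1.84 and 2.08 seconds, so both the 1.83 minimum and the 2.09 maximum would be drawn as outliers.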


Conclusion

Pop Time is a critical metric for evaluating a catcher’s ability to control the running game. While arm strength is important, Pop Time provides a fuller picture by incorporating footwork and exchange speed. This type of analysis also lets us ignore the pitcher and focus exclusively on the catcher’s skills.

Our analysis of Pop Times using visual tools like histograms, KDE plots, violin plots, and box plots shows that most catchers fall within a narrow range of 1.95 to 2.0 seconds, with a few standout performers excelling beyond this. The data from the 2023 season illustrates how slight differences in Pop Time can significantly impact a catcher’s effectiveness at throwing out base stealers.

For catchers, a fast Pop Time can be the difference between a successful defensive play and allowing the opposing team to gain momentum on the bases. I hope you are enjoying this deep dive into the nuances of catching; I certainly am. It is fascinating, isn’t it?

Whiff Percentages in Baseball: A Little EDA Goes A Long Way

In baseball analytics, understanding a player’s whiff percentage—the rate at which they miss the ball when swinging—can offer key insights into their performance. A higher whiff percentage suggests a tendency to miss pitches, while a lower percentage indicates better contact with the ball.
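As a quick illustration of the definition, here is a small sketch of the calculation. The function name and the example counts are hypothetical; they are only there to show the arithmetic.

```python
def whiff_pct(swings, misses):
    """Whiff percentage: the share of swings that miss the ball entirely."""
    if swings <= 0:
        raise ValueError("swings must be positive")
    if not 0 <= misses <= swings:
        raise ValueError("misses must be between 0 and swings")
    return 100.0 * misses / swings

# Hypothetical hitter: 1,000 swings, 250 of them misses.
example = whiff_pct(swings=1000, misses=250)
print(f"{example:.1f}%")
```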

In this post, I explore whiff percentages from both leagues across several years using three different visualization techniques: box plots, violin plots, and a line plot of medians. Each method offers a unique perspective on the data, and together, they help paint a comprehensive picture of trends in whiff percentages from 2015 to 2023. All players with approximately 200 plate appearances in a given year are included in the study.


1. Box Plot: Visualizing the Distribution by Year

A box plot is a simple yet powerful tool to summarize the distribution of whiff percentages each year. It shows the median (the line within each box), the interquartile range (the box itself), and any outliers (the dots outside the whiskers).

This box plot gives us several insights:

  • Consistency: In certain years, the boxes are tightly grouped, indicating less variation in whiff percentages (e.g., 2015).
  • Outliers: Some years have extreme values, shown as dots, which highlight players who either significantly outperformed or underperformed compared to the rest.
  • Year-to-Year Comparison: The height of the boxes gives a sense of how spread out the whiff percentages were for each year, helping to identify years with more variability in player performance.

Why use a box plot? Box plots are ideal when you want to compare distributions without being distracted by individual data points. They provide a clean, uncluttered view of how overall performance fluctuated from year to year, and they highlight outliers effectively.


2. Violin Plot: Adding Depth to Distribution Analysis

A violin plot enhances the box plot by providing additional information about the shape of the distribution. It combines aspects of a box plot with a kernel density estimate, which helps visualize the probability distribution of the data. I will mention once again that I invented these plots, much to the chagrin of my peers, many decades ago. See my “A Crush, A Data Viz, and a Book Long Postponed” post for that tragic tale.

This violin plot offers some extra depth:

  • Distribution Shape: You can see how the whiff percentages are spread out within each year. Some years have narrow violins, suggesting that most players had similar whiff percentages, while others are more spread out, indicating more variability.
  • Density: The wider sections of the violin show where most data points are concentrated, allowing us to see not just the range but also the density of players’ performances in each year.

Why use a violin plot? Violin plots are particularly useful when you want a more nuanced understanding of the data distribution. While box plots are excellent for a high-level summary, violin plots allow us to see the underlying density, which can reveal patterns not visible in box plots alone.


3. Line Plot of Medians: Tracking Trends Over Time

Finally, to understand the overall trend in whiff percentages, I created a line plot of the median whiff percentage for each year. The median is a robust measure of central tendency, making it ideal for highlighting general shifts without being overly influenced by outliers.

This plot shows us:

  • Overall Trend: The line plot helps reveal whether the median whiff percentage is increasing, decreasing, or remaining stable over time. If the line rises, it suggests that players are missing more swings as the years progress, while a falling line indicates better contact rates.
  • Key Years: Significant upward or downward trends in specific years are easily spotted. These could prompt further investigation into why such changes occurred, whether due to rule changes, player performance shifts, or other factors.

Why use a line plot? A line plot of medians is the best way to capture the long-term trend. It smooths out individual variations and provides a clear picture of how the “middle” of the data is changing over time.
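To make the point about robustness concrete, here is a small sketch that computes the median and mean per year. The numbers are hypothetical placeholders, not the whiff data from this study, and one extreme value is planted in the second year on purpose.

```python
from statistics import mean, median

# Hypothetical whiff percentages, NOT real player data.
# The 2016 list includes one extreme value on purpose.
whiff_by_year = {
    2015: [18.0, 21.5, 23.0, 24.5, 27.0],
    2016: [19.0, 22.0, 24.0, 25.0, 48.0],
}

medians = {year: median(vals) for year, vals in whiff_by_year.items()}
means = {year: mean(vals) for year, vals in whiff_by_year.items()}

for year in sorted(whiff_by_year):
    print(year, f"median={medians[year]:.1f}", f"mean={means[year]:.1f}")
```

In this toy example the single 48.0 pulls the 2016 mean well above the median, while the median itself moves by only one point between the two years.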


Conclusion: Insights from Multiple Perspectives

By using these three visualizations—box plots, violin plots, and line plots—we gain a multi-dimensional understanding of whiff percentages in baseball:

  • The box plot provides a clean, high-level comparison of distributions across years, highlighting outliers and general performance variability.
  • The violin plot offers a deeper look at how player performances are distributed within each year, revealing the shape and density of the data.
  • The line plot of medians shows the overall trend, capturing how the middle of the distribution shifts over time.

Each plot tells a part of the story, and when combined, they provide a comprehensive view of player performance over the years. Whether you’re a data enthusiast, baseball analyst, or interested bystander, these tools can help unlock valuable insights into the game. And yes, I find the trend reversal after the 2020 season curious. The great thing about Exploratory Data Analysis (EDA) is that it can strongly suggest what questions must be asked in subsequent stages of analysis. That is certainly what happened here.

 

Frame This: MLB Catchers (2023)

I took a deeper dive into MLB Catchers for the year 2023. I found lots of interesting stuff. Let’s get to it.

In this post, I decided to focus on catcher framing. Some catchers are better than others at fooling umpires into calling a ball a strike. That is what catcher framing is all about. This may surprise some of you, but all this data is now readily available. Every pitch is tracked with impressive accuracy, with terabytes of data generated for each game played.

I created this figure to illustrate the standardized zones used for pitches thrown to home plate. The following is taken from the perspective of the catcher and home plate umpire.

Take Zone 11, for example. The reams of data tell us the percentage of pitches in that area that are taken and called strikes. In 2023, 19.2% of all pitches thrown into that zone were called strikes. Austin Hedges, then a catcher for the world-champion Texas Rangers, managed to get 27.6% of those pitches called strikes by the sweaty man crouching behind him. Get the idea? Hedges’ strike rate for that zone led all of MLB.

Hedges’ work in Zone 13 was even more impressive. The league average for pitches thrown up and away to right-handed batters was 23.6%. Hedges managed to get strike calls on 42.2% of those pitches. Extraordinary.
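To put those two zones on the same footing, the sketch below compares Hedges to the league rate both as a gap in percentage points and as a multiple. It uses only the four rates quoted above; the variable names are just for illustration.

```python
# Called-strike rates (percent) quoted above for 2023.
zones = {
    11: {"league": 19.2, "hedges": 27.6},
    13: {"league": 23.6, "hedges": 42.2},
}

summary = {}
for zone, rates in zones.items():
    diff = rates["hedges"] - rates["league"]    # gap in percentage points
    ratio = rates["hedges"] / rates["league"]   # multiple of the league rate
    summary[zone] = (diff, ratio)
    print(f"Zone {zone}: +{diff:.1f} points, {ratio:.2f}x the league rate")
```

That works out to roughly 1.4 times the league rate in Zone 11 and about 1.8 times in Zone 13.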

I ran a Cluster Analysis of all the framing data across all the zones to recognize the top ten catchers in MLB in 2023. Hedges and Patrick Bailey of the San Francisco Giants stand apart based on their superior performance.

And, yes, what is a top ten list without a bottom ten list? There might be a name or two on there that will surprise you.

In a previous post, I had identified J.T. Realmuto as having an outstanding defensive season in 2023. Regarding pitch framing, he ranked a ridiculous 63rd. I admit, I found that unexpected.

Now, we can move on to something very cool. I have known what heatmaps are for a long time, but I have never needed to create one. It simply never came up. Guess what is next; go ahead.

I want to point out one aspect of this map: Hedges was well below the league average regarding framing pitches in Zone 14. I must admit, that is curious. I do not know why he would be so bad in that area and excel in all the other zones. I have no explanation for that anomalous chunk of data.

And, yes, I also generated a heatmap for the bottom ten catchers in 2023.

Another strange fact is that Martin Maldonado was very good at getting strike calls in Zone 11 but well below average in all the others. Does that have something to do with the pitchers on the Houston Astros in 2023? That line of reasoning might lead to a possible explanation.

I thought that was the end of this post, but I decided to test the new AI release that ChatGPT just dropped. I asked it for recommendations on how it would display this data. It offered up something very cool. Here are Radar Plots of the top 5 and bottom 5 catchers for pitch framing for the 2023 season.

Note that Hedges in Zone 13 and Miguel Amaya in Zone 17 stand out.

These plots are beautiful, but I haven’t decided on their utility. Are they diagnostic enough to merit their use? We will look more into that question in future posts.

At least for now, the takeaway is that determining the best defensive catcher in 2023 is much more subtle and nuanced than one might have imagined. Stay tuned; there is more to come.

 

Baseball Has a Strange Math Issue

My last post was about the defensive capabilities of MLB catchers in 2023. I mentioned that there was more to come. As I was researching the follow-up post, I came across something bizarre. As soon as I stop violently shaking my head back and forth, I will show you what I found.

This post was supposed to be about framing pitches. Some catchers are very good at fooling umpires into calling strikes on pitches that are actually balls. There is lots of excellent data to quantify the ability of any catcher to do this. As you might guess, this is a precious skill that any team would want to have in their catcher.

As I reviewed the data and put together a strategy to analyze and visualize it for the post, I realized that I needed to draw pictures of home base, more commonly called home plate. Why home base, then? That is what it is called in the official baseball rule book. How did I end up on a web page showing those rules? That is an excellent question.

I searched for the dimensions of home plate; it wasn’t something I had committed to memory. Trust me, I know the numbers now, and I doubt I will ever forget. Here’s why…

The following paragraph is taken from Official Baseball Rules, 2024 edition, published by the Office of the Commissioner of Baseball.

2.02 Home Base. Home base shall be marked by a five-sided slab of whitened rubber. It shall be a 17-inch square with two of the corners removed so that one edge is 17 inches long, two adjacent sides are 8½ inches and the remaining two sides are 12 inches and set at an angle to make a point.

So, what’s the big deal? The rule book describes an impossible figure. The shape described does not, and cannot, exist. Unbelievable, isn’t it? Look at the drawing I conjured up.

 

Figure 1. Home plate as it should be and home plate as described in the rule book.

 

I suppose a lawyer could litigate this. It seems that the intent was for the angle formed at the point to be 90 degrees, which it clearly is not when following the description from the rule book. It takes slightly more than 12 inches to meet the requirements of Pythagoras and his ubiquitous theorem. Is Major League Baseball concerned about this? Apparently not. Am I concerned that they have fudged a famous geometry theorem? I’ll crank up some Mozart and mull it over for a bit. My guess is I won’t lose much sleep.
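For anyone who wants to check the arithmetic, here is a short sketch using only the dimensions in rule 2.02. It assumes the two 8½-inch sides are perpendicular to the 17-inch edge, which is what "a 17-inch square with two of the corners removed" implies.

```python
import math

width = 17.0        # long edge, inches
short_side = 8.5    # the two sides adjacent to the long edge
rule_slant = 12.0   # slanted sides as stated in rule 2.02

# If the point is a right angle, the slanted sides are the legs of a
# right isosceles triangle whose hypotenuse is the 17-inch width.
slant_for_right_angle = width / math.sqrt(2)

# If the slanted sides are exactly 12 inches, each one spans half the
# width, so the full angle at the point is 2 * asin(8.5 / 12).
angle_with_rule_slant = 2 * math.degrees(math.asin((width / 2) / rule_slant))

# Front-to-back depth of the plate when the slanted sides are 12 inches.
depth_with_rule_slant = short_side + math.sqrt(rule_slant**2 - (width / 2) ** 2)

print(f"slant needed for 90 degrees: {slant_for_right_angle:.3f} in")
print(f"angle with 12-inch slants:   {angle_with_rule_slant:.2f} degrees")
print(f"depth with 12-inch slants:   {depth_with_rule_slant:.3f} in")
```

With exact 12-inch sides the point opens to about 90.2 degrees and the plate comes up roughly 0.03 inches short of 17 inches front to back. A true right angle needs slanted sides of about 12.02 inches.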

 

Scales of Unusualness: 2023 MLB Catchers (Defense)

The hierarchical cluster tree, or dendrogram, visualizes the relationships among 2023 MLB catchers based on their defensive statistics. As always, players who are joined by low branches have similar defensive profiles, meaning their statistics in categories like putouts, assists, errors, and caught-stealing percentage are more alike. The height of the horizontal line (distance) at which two players or groups join indicates how similar or dissimilar they are: the lower the line, the more similar their defensive performance. Position along the x-axis is only a rough guide, because neighboring players can still join high up in the tree.

The visualization highlights individual performance and helps teams or analysts compare players across a wide range of defensive metrics. For example, catchers clustered together likely share similar defensive styles or capabilities, making it easier to compare catchers in terms of their effectiveness behind the plate. Furthermore, the dendrogram’s structure shows which players stand out as outliers due to superior or weaker performance compared to their peers, giving teams valuable insights for recruitment, strategy, or training decisions.
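To show where the branch heights come from, here is a bare-bones sketch of agglomerative clustering in plain Python. It assumes single linkage and Euclidean distance on already-standardized features, which may not match the settings behind the dendrogram in this post. The catcher names and numbers are hypothetical placeholders, not real 2023 statistics.

```python
import math
from itertools import combinations

# Hypothetical, already-standardized defensive profiles (two features each).
# Names and numbers are placeholders, not real 2023 statistics.
profiles = {
    "Catcher A": (0.1, 0.2),
    "Catcher B": (0.2, 0.1),
    "Catcher C": (1.0, 1.1),
    "Catcher D": (1.1, 1.0),
    "Catcher E": (4.0, 4.0),   # deliberately far from everyone else
}

def single_linkage(c1, c2):
    """Distance between clusters = distance of their closest pair of members."""
    return min(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)

clusters = [frozenset([name]) for name in profiles]
merges = []  # (merge height, members of the newly formed cluster)

while len(clusters) > 1:
    # Find the two closest clusters; their distance is the merge height.
    c1, c2 = min(combinations(clusters, 2), key=lambda pair: single_linkage(*pair))
    height = single_linkage(c1, c2)
    clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    merges.append((height, sorted(c1 | c2)))

for height, members in merges:
    print(f"{height:.2f}  {members}")
```

The placeholder "Catcher E" joins last and at the greatest height, which is the pattern to look for when spotting a player who sits off by himself.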

Note that J.T. Realmuto is off by himself. Despite not receiving a Gold Glove Award, his defensive performance in 2023 was ostensibly exceptional. In a future post, I will drill down into the advanced metrics to see why he was overlooked. Don’t be surprised if the dendrogram I created in this post is deemed suspect in a few days or so.

 

 

Pitching Is (or Was) More Important than Hitting? Who Knew?

 

This analysis examines the relationship between a team’s On-base Plus Slugging (OPS) and their total wins in Major League Baseball (MLB) over a five-year period from 2004 to 2008. OPS is a key statistic in baseball that combines on-base percentage and slugging percentage, providing a comprehensive measure of a player’s (or team’s) ability to get on base and hit for power. The scatterplot visualizes this relationship, with each point representing a team’s OPS and corresponding number of wins for a particular season. The data points are colored by year, allowing us to observe any patterns or trends across the seasons. Coloring by year proved not to be very useful.

A linear regression model was applied to determine if there is a significant correlation between OPS and team wins. The analysis revealed an R-squared value of 0.196. The R-squared value indicates that approximately 19.6% of the variance in team wins can be explained by their OPS, suggesting a moderate correlation. While OPS is a useful statistic, the relatively low R-squared value implies that other factors, such as pitching, defense, and managerial decisions, also play a significant role in determining a team’s success over a season.
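For reference, here is a minimal sketch of how a trend line and its R-squared can be computed by ordinary least squares. The function name is made up for this example, and the (OPS, wins) pairs are hypothetical placeholders, not the 2004 to 2008 data, so the printed numbers will not match the 0.196 reported above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical team seasons (OPS, wins); placeholders only.
ops = [0.700, 0.720, 0.740, 0.760, 0.780, 0.800]
wins = [70, 85, 74, 90, 79, 95]

slope, intercept, r_squared = fit_line(ops, wins)
print(f"wins = {slope:.1f} * OPS + {intercept:.1f}, R^2 = {r_squared:.3f}")
```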

The analysis covers data from five consecutive MLB seasons, providing a broad overview of the relationship between OPS and wins over multiple years. The consistency of the trend line and equation across the years indicates that the OPS-wins relationship is relatively stable during this time period. However, given the moderate R-squared value, this analysis suggests that while OPS is an important metric for assessing team performance, it should be considered alongside other variables for a more comprehensive understanding of what drives a team’s success.

In a recent post, I demonstrated that WHIP is much more predictive of a team’s record than OPS, at least in the mid-2000s. I don’t think anyone will be surprised to learn that pitching is more important than hitting if you want to win baseball games. There will be more on that and related topics coming soon.

 

Now, Isn’t This Interesting?

 

This scatterplot visualizes the relationship between a baseball team’s WHIP (Walks plus Hits per Inning Pitched) and the number of wins they achieved during the seasons from 2004 to 2008. I included both the AL and NL in this analysis. Each point on the graph represents a team in a specific year, with the color indicating the corresponding season. The WHIP is plotted on the x-axis, while the number of wins is plotted on the y-axis. This visualization allows us to observe if there is a pattern or trend between these two variables across different years.

A trendline, represented by a solid red line, has been added to the scatterplot, which provides a general indication of the relationship between WHIP and wins. The slope of the line suggests that as WHIP increases, the number of wins tends to decrease. The strength of this relationship is indicated by the R-squared value of 0.49, meaning that WHIP accounts for approximately 49% of the variability in the number of wins. This moderate R-squared value suggests a fairly significant correlation between the two variables.

In summary, the scatterplot illustrates a moderate negative correlation between WHIP and team wins, indicating that WHIP is a meaningful factor in a team’s success, though not the sole determinant. Including both leagues from 2004 to 2008 allows for an interesting, if limited, analysis over multiple seasons, with the trendline and R-squared value providing insights into the overall pattern between these two metrics. This plot highlights the importance of WHIP in predicting team performance while suggesting that other factors certainly contribute to a team’s total wins.

Here is a scatterplot illustrating wins in terms of team ERA (earned run average). When I was a kid, I didn’t think ERA was very valuable, and the following plot shows that it has less explanatory value than WHIP.

As we saw in a previous post, payroll differences explained approximately 19 percent of the variability in win totals. Team ERA explains about 44 percent of the variability, while the WHIP metric has more explanatory value (49 percent) when determining what leads to wins in major league baseball. I will keep posting more information as my research progresses.
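One way to compare these figures is to convert each R-squared into the size of the underlying correlation. The sketch below relies on the fact that, for a straight-line fit with a single predictor, the absolute correlation is the square root of R-squared. The inputs are the values quoted in these posts; the direction of each relationship is ignored.

```python
import math

# R-squared values quoted in these posts (share of win variance explained).
r_squared = {"payroll": 0.19, "OPS": 0.196, "ERA": 0.44, "WHIP": 0.49}

# For a one-predictor linear fit, |r| is the square root of R-squared.
abs_r = {name: math.sqrt(value) for name, value in r_squared.items()}

for name, value in sorted(abs_r.items(), key=lambda item: item[1]):
    print(f"{name:8s} R^2 = {r_squared[name]:.3f}   |r| = {value:.2f}")
```

On that scale WHIP comes out near 0.70, ERA near 0.66, and both OPS and payroll near 0.44.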

 

2018 AL WAR vs OPS

 

The scatterplot titled “2018 AL WAR vs OPS (Colored by Position)” visually explores the relationship between Wins Above Replacement (WAR) and On-base Plus Slugging (OPS) for players in the American League during the 2018 season. Each point on the plot represents a player, with OPS on the x-axis and WAR on the y-axis, and the points are colored according to the player’s position. This allows us to observe how players across different positions performed in terms of their offensive output and overall contribution to their teams.

Notably, the plot highlights standout players such as Mookie Betts and Mike Trout, who are positioned in the upper right corner, indicating their exceptional performance. Betts, then an outfielder for the Boston Red Sox, and Trout, still a center fielder for the Los Angeles Angels, both had extremely high OPS and WAR values. Their positions in the plot underscore their status as two of the most valuable players in the league during the 2018 season.

In contrast, Chris Davis, a first baseman for the Baltimore Orioles, is positioned in the lower-left corner of the plot. Davis had one of the lowest OPS and WAR values in 2018, indicating his struggles. The spread of points across the plot also reveals how different positions cluster in certain areas, with players like Davis standing out as outliers in underperformance. At the same time, Betts and Trout exemplify top-tier performance. This is a pretty cool visualization of this type of data. I find scatterplots useful.

 

Here’s a Little 3D For You

 

How is this for a different perspective? The 3D Cluster Analysis of 2023 National League (NL) shortstops visually represents player performance using an extra dimension, highlighting their key differences and similarities. Using a sophisticated technique called Principal Component Analysis (PCA), the high-dimensional performance metrics of the shortstops were reduced to three principal components, which encapsulate most of the variance in the data. This dimensionality reduction (or expansion, if you prefer) allows for a clear visualization in three-dimensional space, where each player’s position reflects their overall performance. The players are grouped into three distinct clusters, each represented by a different color, providing insights into how these athletes compare to one another based on their statistics.

The clusters were determined using the K-means clustering algorithm (much more of that down the line), which groups players with similar performance metrics into the same cluster. As noted above, the plot reveals three main clusters: Cluster 1 in blue, Cluster 2 in green, and Cluster 3 in red. Each cluster represents a subset of players with comparable performance profiles. For instance, the player in Cluster 3 (Mookie Betts), shown in red, exhibits stronger or more consistent performance in certain areas, distinguishing him from those in the other clusters.
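For the curious, here is a stripped-down sketch of the K-means idea (Lloyd's iteration) in plain Python. It is not the code behind the plot. The 3-D points are hypothetical stand-ins for principal-component scores, and the starting centroids are picked by hand so the run is repeatable, whereas library implementations typically choose them randomly.

```python
import math

# Hypothetical 3-D points standing in for principal-component scores.
# Placeholders only; these are not the 2023 NL shortstop data.
points = [
    (-1.0, -0.8, 0.1), (-1.2, -1.0, 0.0), (-0.9, -1.1, -0.1),
    (1.0, 0.9, 0.2), (1.1, 1.2, 0.0), (0.8, 1.0, -0.2),
    (5.0, 4.5, 3.0),   # one point far from everyone else
]

def centroid(group):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(group) for axis in zip(*group))

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            new_centroids.append(centroid(members) if members else old)
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids

# Assumption: starting centroids picked by hand so the run is repeatable.
labels, centroids = k_means(points, [points[0], points[3], points[6]])
print(labels)
```

In this toy data the distant point ends up as a cluster of one, which is the same kind of picture the red cluster gives in the plot.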

Unsurprisingly, Betts is once again highlighted in the analysis. Notice that he is off by himself in red, focusing our attention. This emphasis allows for a closer examination of where Betts stands relative to his peers in the 2023 NL shortstop group. While I do believe that the two-dimensional plot from the last post is more diagnostic, no one can deny how cool the 3D plot looks. And that is why I published this post.

 

Scales of Unusualness: Offensive Production of NL Shortstops in 2023

To the surprise of no one, Mookie Betts was, by far, the most unusual offensive performer last year among NL shortstops. If you study the plot, you can follow the line connecting Betts to the other players.

 

Betts is a cluster of one. His offensive production was so far above all the other shortstops that no one could cluster with him. And that, I must say, is highly unusual.