2018 AL WAR vs OPS

 

The scatterplot titled “2018 AL WAR vs OPS (Colored by Position)” visually explores the relationship between Wins Above Replacement (WAR) and On-base Plus Slugging (OPS) for players in the American League during the 2018 season. Each point on the plot represents a player, with OPS on the x-axis and WAR on the y-axis, and the points are colored according to the player’s position. This allows us to observe how players across different positions performed in terms of their offensive output and overall contribution to their teams.

Notably, the plot highlights standout players such as Mookie Betts and Mike Trout, who are positioned in the upper right corner, indicating their exceptional performance. Betts, then an outfielder for the Boston Red Sox, and Trout, still a center fielder for the Los Angeles Angels, both had extremely high OPS and WAR values. Their positions in the plot underscore their status as two of the most valuable players in the league during the 2018 season.

In contrast, Chris Davis, a first baseman for the Baltimore Orioles, is positioned in the lower-left corner of the plot. Davis had one of the lowest OPS and WAR values in 2018, indicating his struggles. The spread of points across the plot also reveals how different positions cluster in certain areas, with players like Davis standing out as outliers in underperformance. At the same time, Betts and Trout exemplify top-tier performance. This is a pretty cool visualization of this type of data. I find scatterplots useful.

 

Here’s a Little 3D For You

 

How is this for a different perspective? The 3D Cluster Analysis of 2023 National League (NL) shortstops visually represents player performance using an extra dimension, highlighting the players’ key differences and similarities. Using a sophisticated technique called Principal Component Analysis (PCA), the high-dimensional performance metrics of the shortstops were reduced to three principal components, which capture most of the variance in the data. This dimensionality reduction allows for a clear visualization in three-dimensional space, where each player’s position summarizes his overall performance profile. The players are grouped into three distinct clusters, each represented by a different color, providing insights into how these athletes compare to one another based on their statistics.
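
For readers who want to see the mechanics, here is a minimal sketch of how a first principal component can be found, using only the Python standard library. The numbers are made-up, already-standardized metrics for five unnamed players, not the 2023 shortstop data, and the power-iteration approach is my choice for illustration rather than a description of the software behind the plot. It also finds only the first component; the plot uses three.

```python
import math

def center(rows):
    """Subtract each column's mean so every metric averages zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by the covariance matrix and rescale.
    Assumes the starting vector is not orthogonal to the top direction."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical standardized metrics (three columns) for five players.
rows = [(-0.9, -1.0, -0.4), (-0.3, -0.2, 0.5), (0.0, 0.1, -0.6),
        (0.2, 0.0, 0.8), (1.0, 1.1, -0.3)]

centered = center(rows)
cov = covariance(centered)
pc1 = first_component(cov)

# Each player's coordinate along the first component.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]

# Share of total variance captured by that one component.
variance_pc1 = sum(pc1[i] * cov[i][j] * pc1[j] for i in range(3) for j in range(3))
share = variance_pc1 / sum(cov[i][i] for i in range(3))
```

Repeating the idea for the second and third components would give each player the three coordinates needed for a 3D plot.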

The clusters were determined using the K-means clustering algorithm (much more of that down the line), which groups players with similar performance metrics into the same cluster. As noted above, the plot reveals three main clusters: Cluster 1 in blue, Cluster 2 in green, and Cluster 3 in red. Each cluster represents a subset of players with comparable performance profiles. For instance, the lone player in Cluster 3 (Mookie Betts), shown in red, exhibits stronger performance in certain areas, distinguishing him from those in the other clusters.
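
As a rough illustration of what K-means does, here is a bare-bones version of the standard assign-then-update loop (Lloyd's algorithm). The coordinates are invented stand-ins for players' three principal-component scores, and the starting centroids are hand-picked; real software normally chooses starting points automatically and may differ in other details.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]

def kmeans(points, centroids, iterations=20):
    """Alternate assignment and centroid updates until nothing moves."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

# Hypothetical 3D coordinates: two tight groups plus one player far from everyone.
points = [(-1.0, -0.5, 0.1), (-0.8, -0.4, 0.0), (-1.1, -0.6, 0.2),
          (0.5, 0.6, -0.1), (0.6, 0.4, 0.0), (0.4, 0.5, 0.1),
          (4.0, 2.5, 1.0)]

labels, centroids = kmeans(points, [points[0], points[3], points[6]])
```

With data shaped like this, the distant point ends up alone in its own cluster, which is the same kind of pattern the plot shows for Betts.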

Unsurprisingly, Betts is once again highlighted in the analysis. Notice that he is off by himself in red, focusing our attention. This emphasis allows for a closer examination of where Betts stands relative to his peers in the 2023 NL shortstop group. While I do believe that the two-dimensional plot from the last post is more diagnostic, no one can deny how cool the 3D plot looks. And that is why I published this post.

 

Scales of Unusualness: Offensive Production of NL Shortstops in 2023

To the surprise of no one, Mookie Betts was, by far, the most unusual offensive performer last year among NL shortstops. If you study the plot, you can follow the line connecting Betts to the other players.

 

Betts is a cluster of one. His offensive production was so far above all the other shortstops that no one could cluster with him. And that, I must say, is highly unusual.

 

Scales of Unusualness: Offensive Production of AL Second Basemen in 2009

The title of this post is unusual (you see what I did). Why? Historically, I have chosen titles inspired by the band Arctic Monkeys. They had a propensity for overly long song titles that may or may not have anything to do with the actual song. For the longest time, they were my favorite band. The last two CDs, though, have given me pause. I listened to both of them hundreds of times, and I must admit that I just don’t get it. Maybe on some deep (nearly subconscious) level, I have given this post a most unlikely title as a mild form of protest. I am dubious of my potential impact.

This short essay is about the offensive production of American League second basemen in 2009. I will view each player’s statistics through an Exploratory Data Analysis lens, specifically by creating different Scales of Unusualness.

Here is a table of some of the data used for the study. Of course, the variables would be very different if we used players from last year. In 2009, no exit velocity or launch angle data were available. Even with this partial data set, many advanced metrics were eliminated from the table to make it more legible. Variations of these variables (and the others) were used to create the plot, which will appear shortly.

Table 1. AL Second Basemen, 2009

If we were just considering batting average (BA), it would be easy to rank the players. In fact, the players happen to be ranked in descending order based on that column, with Cano leading the way at .320. What happens if we want to consider all the variables together? Human brains aren’t very good at that task, but computers have no problem.

Next comes the scale that was referenced earlier. We can take each column and standardize the data by giving each value a z-score. A z-score measures how far a value sits from the mean of its data sample, in units of standard deviations. The standardized columns are the ones with a “Z” prefix. Cano had the highest batting average, and you can see that his z-score for “Z_BA” is 1.68, which is more than double the next highest number in the column. That means his batting average for that year was highly unusual compared to other AL second basemen.
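
The calculation itself is short. Here is a sketch in Python using a handful of batting averages. Only Cano’s .320 comes from the table; the other four values and the “Player B” style names are placeholders I made up, so the resulting z-scores will not match Table 2. I also assume the sample standard deviation; software that uses the population version will give slightly different numbers.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return (value - mean) / standard deviation for every value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]

# Cano's average is from the post; the rest are illustrative placeholders.
batting_avg = {
    "Cano": 0.320,
    "Player B": 0.290,
    "Player C": 0.280,
    "Player D": 0.270,
    "Player E": 0.240,
}

z_ba = dict(zip(batting_avg, z_scores(list(batting_avg.values()))))

for name, z in z_ba.items():
    print(f"{name:10s} Z_BA = {z:+.2f}")
```

A positive z-score means the player was above the group average for that statistic, a negative one means below, and the size tells you how unusual the value was.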

Table 2. AL Second Basemen Z-Scores, 2009

One interesting note. Look at the table and see if you can determine the two most unusual players when all the data is considered. I don’t think it can be done. There are eight variables, and that is six or seven too many. Fortunately, a technique called Cluster Analysis quickly solves the problem. Below is a Cluster Tree, or Dendrogram, of the computer’s analysis.

 

Figure 1. Dendrogram

The plot shows two large categories, those with high and those with relatively low offensive productivity. Among the top performers, the software identified Ben Zobrist as the most unusual. That means that Zobrist had the best offensive season of any AL second baseman that year. If you study the plot, you will see that Alberto Callaspo finishes a close second.

I would like to point out a couple more things. The plot shows that Maicer Izturis and Howie Kendrick had the most similar seasons. Their z-score profiles were nearly alike, so they join each other in the tree before anyone else does. Who knew?
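
To make the tree-building idea concrete, here is a small sketch of agglomerative (bottom-up) clustering: start with every player alone, then repeatedly merge the two closest groups. The four players “A” through “D” and their two-variable z-score vectors are hypothetical, and average linkage with Euclidean distance is my assumption; I have not confirmed which linkage the software behind the dendrogram used.

```python
import math
from itertools import combinations

def agglomerate(vectors):
    """Average-linkage agglomerative clustering; returns the merges in order."""
    clusters = {name: [name] for name in vectors}  # cluster id -> member names
    merges = []

    def linkage(a, b):
        # Average distance over every cross-cluster pair of players.
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(math.dist(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        merges.append((tuple(clusters[a]), tuple(clusters[b]), linkage(a, b)))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical z-score vectors (two variables each) for four made-up players.
vectors = {
    "A": (0.10, 0.20),
    "B": (0.15, 0.25),
    "C": (1.50, 1.40),
    "D": (-2.00, -1.80),
}

merges = agglomerate(vectors)
for left, right, height in merges:
    print(left, "+", right, f"at height {height:.2f}")
```

The first merge is the most similar pair, and the player who joins last, at the greatest height, is the most unusual. Reading a dendrogram comes down to reading those merge heights.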

So, as you might have guessed, there is a payoff to this post. A Scale of Unusualness doesn’t just identify the best or most productive offensive player; it works equally well on both ends of the scale. The most unusual offensive second baseman in the AL in 2009 wasn’t Zobrist; it was the unfortunate Nick Punto (with Chris Getz closing fast). Punto was much more unusually bad than Zobrist was unusually good. My guess is that when I include defensive metrics, Punto will more than redeem himself. You can’t play from 2001 – 2014 (and win a World Series) without being a big-time player. This one-year snapshot does not do him justice. Maybe I will post the defensive analysis next. Perhaps I will include offense and defense together in a more comprehensive study. Now that I think of it, I should take a break and give those Arctic Monkeys’ CDs another 300 listens.

 

19 Percent…huh?

I spent a lot of time putting together the lone figure in this post. My forthcoming baseball book will be filled with plots like the one that follows. I have known many people whose eyes glaze over when presented with figures or graphs (including professors who should know better). Pay a little attention to this one; you will be rewarded.

Between 2004 and 2008, there was a growing disparity in the payrolls of clubs in Major League Baseball. Much was written about the unfairness of this. I agree with those who thought it outrageous that one team could spend 8 or 9 times what others could afford to pay their players. Consequently, every season began with plenty of fan bases lamenting the stone-cold truth that their teams had no chance to compete for a title or make the playoffs.

Growing up as a fan of the team then known as the Cleveland Indians, I knew that as soon as a young player started to excel, he was on his way out of town. It was a simple fact that other larger market clubs could easily outbid us for a young star’s services. Such was life in the big city.

Every year, big-money teams seemed to crush the less fortunate, and no one seemed to care. The fact that always got me going was that if a team (think Yankees) signed a player to a big contract and that guy floundered, all they did was treat the signing as a sunk cost and go about their business. Clubs like Cleveland, on the other hand, could be crippled by one bad signing. That is a statement of fact.

So, let’s see if we can gain some insight. One of the great things about a scientific mindset is that we can cut through the narratives and what people think is true, and get at the mathematical heart of the issue at hand. The following figure does just that.

I plotted payroll data from 2004 through 2008 against the winning percentage of all MLB teams. I colored the data points using a playoffs variable to simplify the plot. I think it makes it more interesting and easier to read.

Figure 1. 2004 – 2008 MLB Payroll versus Win Percentage Data.

The scatterplot is basically a blob (yes, the Yankees are in the upper right corner). That means only a weak relationship exists between a team’s payroll and that team’s record, at least for these 5 years. Note the equation in the lower right of the plot. It says that payroll explains only 19 percent of the variation in MLB teams’ records. In other words, knowing a team’s payroll told you very little about how many games that team would win.
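
If you are wondering where a number like 19 percent comes from, here is a sketch of the usual calculation, assuming the figure on the plot is the R-squared of a straight-line fit of win percentage on payroll. The six payroll and win-percentage pairs are invented for illustration, so the result will not be 0.19.

```python
def r_squared(x, y):
    """Share of the variation in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the line has done its best.
    residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - residual / syy

# Hypothetical team seasons: payroll in millions of dollars, and win percentage.
payroll = [40, 60, 75, 90, 120, 200]
win_pct = [0.520, 0.450, 0.560, 0.480, 0.510, 0.580]

r2 = r_squared(payroll, win_pct)
print(f"Payroll explains {r2:.0%} of the variation in win percentage")
```

An R-squared of 1.0 would mean payroll predicts record perfectly, and 0.0 would mean it tells you nothing. Whatever the line cannot account for is the leftover share that something other than payroll has to explain.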

Surprised? Well…if payroll is not a strong predictor of a team’s success, what is? If payroll accounts for about 19 percent of the outcome, what explains the other 81 percent? I will be looking into that in my book. Perhaps I will find that left-handed middle relievers are the key to success (I doubt it), or maybe if you are putting together a team, you need batters with high exit velocities, pitchers with exceptional strikeout rates, and outfielders who can run like the wind. I am going to try my best to find out.