Scales of Unusualness: Offensive Production of AL Second Basemen in 2009

The title of this post is unusual (you see what I did). Why? Historically, I have chosen titles inspired by the band Arctic Monkeys. They had a propensity for overly long song titles that may or may not have anything to do with the actual song. For the longest time, they were my favorite band. The last two CDs, though, have given me pause. I listened to both of them hundreds of times, and I must admit that I just don’t get it. Maybe on some deep (nearly subconscious) level, I have given this post a most unlikely title as a mild form of protest. I am dubious of my potential impact.

This short essay is about the offensive production of American League second basemen in 2009. I will view each player’s statistics through an Exploratory Data Analysis lens, specifically by creating different Scales of Unusualness.

Here is a table of some of the data used for the study. Of course, the variables would be very different if we used players from last year. In 2009, no exit velocity or launch angle data were available. Even with this partial data set, many advanced metrics were eliminated from the table to make it more legible. Variations of these variables (and the others) were used to create the plot, which will appear shortly.

Table 1. AL Second Basemen, 2009

If we were just considering batting average (BA), it would be easy to rank the players. In fact, the players happen to be ranked in descending order based on that column, with Cano leading the way at .320. What happens if we want to consider all the variables together? Human brains aren’t very good at that task, but computers have no problem.

Next comes the scale that was referenced earlier. We can take each column and standardize the data by converting each value to a z-score. A z-score measures how far a value lies from the mean of its column, expressed in standard deviations. In the table below, the standardized columns are the ones with a “Z” prefix. Cano had the highest batting average, and you can see that his z-score for “Z_BA” is 1.68, which is more than double the next highest number in the column. That means his batting average for that year was highly unusual compared to other AL second basemen.
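For readers who want to see the arithmetic, here is a minimal sketch of the standardization step. The batting averages are made-up placeholder values, not the figures from the table, and the function name `z_scores` is my own. I am also assuming the sample standard deviation; software that uses the population version will give slightly different numbers.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert each value to its distance from the column mean,
    measured in (sample) standard deviations."""
    column_mean = mean(values)
    column_sd = stdev(values)  # assumption: sample standard deviation
    return [(v - column_mean) / column_sd for v in values]


# Placeholder batting averages for five hypothetical players (NOT real data).
batting_avgs = [0.320, 0.290, 0.280, 0.270, 0.250]
z_ba = z_scores(batting_avgs)

for ba, z in zip(batting_avgs, z_ba):
    print(f"BA {ba:.3f} -> Z_BA {z:+.2f}")
```

A positive z-score means the player was above the group average, a negative one means below, and the farther from zero, the more unusual the value.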

Table 2. AL Second Basemen Z-Scores, 2009

One interesting note: look at the table and see if you can determine the two most unusual players when all the data are considered. I don’t think it can be done. There are eight variables, and that is six or seven too many. Fortunately, a technique called Cluster Analysis quickly solves the problem. Below is a Cluster Tree, or Dendrogram, of the computer’s analysis.

 

Figure 1. Dendrogram

The plot shows two large categories, those with high and those with relatively low offensive productivity. Among the top performers, the software identified Ben Zobrist as the most unusual. That means that Zobrist had the best offensive season of any AL second baseman that year. If you study the plot, you will see that Alberto Callaspo finishes a close second.

I would like to point out a couple more things. The plot shows that Maicer Izturis and Howie Kendrick had the most similar seasons. Across all the variables, their standardized statistics sit closer to each other than those of any other pair of players. Who knew?
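To make the idea behind a cluster tree concrete, here is a rough sketch of how one can be built from z-scores. This is not the software or the settings used for the figure. It assumes Euclidean distance and single linkage (the gap between two groups is the gap between their closest members), and the players and numbers are hypothetical placeholders.

```python
from itertools import combinations
from math import dist


def cluster_gap(rows, a, b):
    """Single-linkage gap: smallest distance between any member of
    cluster a and any member of cluster b."""
    return min(dist(rows[x], rows[y]) for x in a for y in b)


def single_linkage(rows):
    """Repeatedly merge the two closest clusters and record each merge.
    The recorded merges are what a dendrogram draws, bottom to top."""
    clusters = [(name,) for name in rows]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_gap(rows, *pair))
        merges.append((a, b, cluster_gap(rows, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical z-scores on three variables (NOT the values from Table 2).
z_rows = {
    "Player A": (1.5, 1.2, 1.4),
    "Player B": (1.3, 1.0, 1.1),
    "Player C": (0.1, -0.2, 0.0),
    "Player D": (0.0, -0.1, 0.1),
    "Player E": (-1.6, -1.4, -1.8),
}

merges = single_linkage(z_rows)
for a, b, gap in merges:
    print(f"merge {a} + {b} at distance {gap:.2f}")
```

The first merge pairs the two most similar players, and a player who only joins the tree at the very end, at a large distance, is the most unusual of the group. In practice a library does this work; SciPy, for example, provides `scipy.cluster.hierarchy.linkage` and `dendrogram`.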

So, as you might have guessed, there is a payoff to this post. A Scale of Unusualness doesn’t just identify the best or most productive offensive player; it works equally well on both ends of the scale. The most unusual offensive second baseman in the AL in 2009 wasn’t Zobrist; it was the unfortunate Nick Punto (with Chris Getz closing fast). Punto was much more unusually bad than Zobrist was unusually good. My guess is that when I include defensive metrics, Punto will more than redeem himself. You can’t play from 2001 – 2014 (and win a World Series) without being a big-time player. This one-year snapshot does not do him justice. Maybe I will post the defensive analysis next. Perhaps I will include offense and defense together in a more comprehensive study. Now that I think of it, I should take a break and give those Arctic Monkeys’ CDs another 300 listens.

 

19 Percent…huh?

I spent a lot of time putting together the lone figure in this post. My forthcoming baseball book will be filled with plots like the one that follows. I have known many people whose eyes glaze over when presented with figures or graphs (including professors who should know better). Pay a little attention to this one; you will be rewarded.

Between 2004 and 2008, there was a growing disparity in the payrolls of clubs in Major League Baseball. Much was written about the unfairness of this. I agree with those who thought it outrageous that one team could spend eight or nine times what others could afford to pay their players. Consequently, every season began with plenty of fan bases lamenting the stone-cold truth that their teams had no chance to compete for a title or make the playoffs.

Growing up as a fan of the team then known as the Cleveland Indians, I knew that as soon as a young player started to excel, he was on his way out of town. It was a simple fact that other larger market clubs could easily outbid us for a young star’s services. Such was life in the big city.

Every year, big-money teams seemed to crush the less fortunate, and no one seemed to care. The fact that always got me going was that if a team (think Yankees) signed a player to a big contract and that guy floundered, all they did was treat the signing as a sunk cost and go about their business. Clubs like Cleveland, on the other hand, could be crippled by one bad signing. That is a statement of fact.

So, let’s see if we can gain some insight. One of the great things about a scientific mindset is that we can cut through the narratives and what people think is true, and get at the mathematical heart of the issue at hand. The following figure does just that.

I plotted payroll data from 2004 through 2008 against the winning percentage of all MLB teams. I colored the data points using a playoffs variable to simplify the plot. I think it makes it more interesting and easier to read.

Figure 1. 2004 – 2008 MLB Payroll versus Win Percentage Data.

The scatterplot is basically a blob (yes, the Yankees are in the upper right corner). That means a minimal relationship exists between a team’s payroll and that team’s record, at least for these five years. Note the equation in the lower right of the plot. It indicates that payroll explains only about 19 percent of the variation in MLB teams’ winning percentages. In other words, knowing a team’s payroll does very little to help predict how many games that team will win.
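If you are wondering where a figure like "19 percent explained" comes from, here is a minimal sketch. It assumes the number is the R-squared of a straight-line fit of winning percentage on payroll, which is my reading of the plot's equation, not something spelled out above. The payroll and winning-percentage values are invented placeholders, not the 2004 – 2008 data, and the function names are my own.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def r_squared(x, y):
    """Share of the variation in y that the fitted line accounts for."""
    slope, intercept = fit_line(x, y)
    my = mean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Placeholder payrolls (millions of dollars) and win percentages (NOT real data).
payroll = [210, 120, 95, 70, 60, 45]
win_pct = [0.58, 0.50, 0.55, 0.47, 0.52, 0.44]

r2 = r_squared(payroll, win_pct)
print(f"Payroll explains {r2:.0%} of the variation in win percentage")
```

An R-squared of 1 would mean payroll predicts record perfectly, and 0 would mean it tells you nothing. Whatever is left over after subtracting R-squared from 1 is the share of the variation that the line does not account for.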

Surprised? Well…if payroll is not a predictor of a team’s success, what is? If payroll accounts for about 19 percent of the outcome, what explains the other 81 percent? I will be looking into that in my book. Perhaps I will find that left-handed middle relievers are the key to success (I doubt it), or maybe if you are putting together a team, you need batters with high exit velocities, pitchers with exceptional strikeout rates, and outfielders who can run like the wind. I am going to try my best to find out.