Clustering hitters on swing characteristics of balls in play


This is a long piece, so if you don’t really care about why or how I approached this, feel free to skip to Results and Analysis.


I remember an instance during a high school baseball practice several years ago when one of my coaches was talking about how the best power hitters in baseball hit most of their home runs to center field. “If you watch Albert Pujols, almost all of his home runs are close to dead center” was roughly the quote.

Being the skeptical person I am, I made sure to look up Pujols’ spray chart when I got home and found something like this:

It appeared that coach was making it up.

In retrospect, I don’t think the biggest problem was a coach demonstrating that it’s okay for someone given a difficult task (help young men become better baseball players) to act like they know more than they do. I think the bigger problem is thinking along the lines of, “This player is one of the best in baseball and he does this. Therefore, you should do this because it’s how one of the best does it.” In short, I don’t think my coach’s point would have been any better if Pujols really did hit all his home runs to dead center.

This type of thinking about hitting is way too simplified. Many variables determine whether someone becomes a good hitter, so trying to use a single aspect of one good hitter to create a blanket framework for all hitters is silly. Albert Pujols has built a Hall of Fame career on driving home runs to left field, but that doesn’t mean you should necessarily emulate Pujols’ swing or try to hit home runs to left field (though they both might be decent ideas).

Before I allow skepticism to get out of hand, I should clarify something. My point isn’t that we shouldn’t learn from specific big leaguers since we don’t have their skills (which is another stupid argument I heard as a bad high school baseball player… somehow there are coaches who simultaneously believe that AND that we should hit like Pujols). The issue is asking questions in the wrong order. If we ask the question, “Who are the best hitters in baseball and what do they swing like?”, we’re bound to be confused by great hitters finding success in different ways. For example, Giancarlo Stanton succeeds as a power hitter despite having a low average launch angle because he hits the piss out of the ball. Joe Mauer built a stellar career slapping line drives the other way. Jose Bautista had a successful prime by hitting pull-side homers. If we’re looking for broad direction on what swing characteristics we should value, I think we can ask a better, slightly different question.

We should ask, “What are the different ways hitters swing, and which ways tend to lead to the best results?” This doesn’t allow us the ease of saying we should all just copy Mike Trout’s swing since he’s the best. It forces us to look at swings with different characteristics to see which ones tend to get the best results, and thus, understand which characteristics are common among the best hitters instead of using a single hitter as an end-all example of the perfect swing.

Introduction- Clustering Hitters on Swing Characteristics

If we want to compare how swings with different characteristics fare, we can analyze them with a technique called clustering. Clustering algorithms can be run in R across a data set to see which natural groups arise among the data based on the variables in the data set. I’ll try to explain clustering in a way that’s not too technical, but still gets the point across.

For example, if we were marketers and wanted to segment our customers based on their attitudes towards our product, we’d want to issue a survey asking questions on a numerical scale and put each customer’s answers in a row in the data set. Then, we’d run a clustering algorithm (consisting of a function in R run from a package such as mclust) which separates customers into different groups which gave similar responses to each question while maximizing the distance between each group. Summarized, the clustering algorithm used in this analysis will attempt to maximize intra-cluster similarity while minimizing inter-cluster similarity.

In the case of clustering hitters, I wanted to select variables which represent a player’s swing. Selecting the variables to cluster on is somewhat subjective, but they should describe whatever is being clustered, which here is the swing. So, I decided to use as many as I could gather from Statcast data of the variables that Jason Ochart, Phillies Minor League Hitting Coordinator and Director of Hitting at Driveline Baseball, uses in hitter assessments. Ochart gave an overview of the hitter assessment process at Driveline Baseball in a talk at the ABCA conference. The talk is available to subscribers of Driveline Plus, which I strongly recommend to any player, coach, or sabermetrics nerd looking to learn more about player development.

Some of the data Ochart tracks in his hitter assessments include:

  • Attack angle (vertical bat path)
  • Launch angle in different areas of the zone (dividing the strike zone into thirds vertically and horizontally) and to different parts of the field
  • Exit velocity to different areas of the field
  • Standard deviation of launch angle
  • Standard deviation of exit velocity (comparing max and average exit velocity)

I was able to find some version of all the information listed above by manipulating Statcast data and using some shortcuts to calculate approximate attack angle which I’ll detail below. When evaluating hitters, Ochart also assesses contact depth using Blast Motion sensors, but unfortunately, I don’t know of a way to approximate that metric from Statcast data.

Once I knew which variables I wanted to cluster on, I started working in R to extract the right Statcast data for clustering.

Preparing for Clustering

Deriving the right information from Statcast data was the most difficult and time intensive part of this analysis. Finding the average launch angle and exit velocity is simple, but finding the values in different thirds of the zone (e.g. inner, middle, outer and lower, middle, upper) required filtering by location for each player. There was no perfect way to derive attack angle, since Statcast does not directly measure it.
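To make the filtering by location concrete, here is a minimal sketch in Python (the analysis itself was done in R, so this is not the original code). It assumes Statcast-style fields named plate_x, plate_z, sz_bot, sz_top, and stand, and it assumes negative plate_x is the inner side for a right-handed batter. The function names are hypothetical.

```python
# Assumed Statcast-style fields: plate_x and plate_z (feet, catcher's view),
# sz_bot and sz_top (feet), stand ("L" or "R").
HALF_PLATE_FT = 17 / 12 / 2  # home plate is 17 inches wide


def third(value, low, high, labels):
    """Return the label for the third of [low, high] holding value, else None."""
    if not low <= value <= high:
        return None
    width = (high - low) / 3
    index = min(int((value - low) / width), 2)  # value == high lands in the last third
    return labels[index]


def zone_thirds(pitch):
    """Label a pitch by horizontal and vertical third of the strike zone."""
    # Assumption: negative plate_x is inside to a right-handed batter,
    # so the sign is flipped for left-handed batters.
    x = pitch["plate_x"] if pitch["stand"] == "R" else -pitch["plate_x"]
    horizontal = third(x, -HALF_PLATE_FT, HALF_PLATE_FT, ("inner", "middle", "outer"))
    vertical = third(pitch["plate_z"], pitch["sz_bot"], pitch["sz_top"],
                     ("lower", "middle", "upper"))
    return horizontal, vertical
```

Averaging launch angle or exit velocity within each label then gives one variable per third. Pitches outside the zone come back as None here; whether to keep them is a judgment call this sketch doesn't settle.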

A proxy for calculating attack angle was outlined by Anthony Shattell (@soxmoneyball on Twitter) which consists of calculating the average launch angle of the hardest-hit 10% of a hitter’s batted balls. This workaround assumes the balls a batter hits hardest will be most in line with his swing plane and be flush on contact, which means the launch angle of those balls can be a good proxy for the hitter’s attack angle. A slightly more complicated and potentially more accurate way of approximating attack angle is outlined on Matt Reiland’s blog Power Alley Analytics, where he uses a combination of batted ball spin and launch angle, since balls hit flush will tend to have minimal/knuckling spin. I went with Anthony Shattell’s hack since it is simpler to derive using Statcast data. In addition to launch angle and exit velocity, I created variables for a hitter’s attack angle in different thirds of the strike zone.
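The hardest-hit-10% proxy is simple enough to sketch. This is a Python illustration rather than the R code used for the analysis, and the function name and input shape (a list of exit velocity and launch angle pairs) are assumptions.

```python
from statistics import mean


def approx_attack_angle(batted_balls, top_fraction=0.10):
    """Average launch angle of the hardest-hit share of batted balls.

    batted_balls: list of (exit_velocity_mph, launch_angle_deg) tuples.
    """
    if not batted_balls:
        raise ValueError("need at least one batted ball")
    hardest_first = sorted(batted_balls, key=lambda ball: ball[0], reverse=True)
    # Keep at least one ball so small samples still return a value.
    keep = max(1, round(len(hardest_first) * top_fraction))
    return mean(angle for _, angle in hardest_first[:keep])
```

Running the same function on only the batted balls from one third of the zone gives the per-third attack angle variables. With few balls in play the top 10% can be a single batted ball, so small samples will be noisy.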

Variables were also created for average launch angle and average exit velocity to different areas of the field (pull side, center field, and opposite field for each hitter). I was able to do this by filtering based on spray angle of each batted ball. I calculated spray angle by using an equation put forth by Bill Petti in a Hardball Times article published a few years ago. Variables were also created for standard deviation of exit velocity and launch angle to each area of the field.  
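For reference, the spray angle calculation can be sketched as below in Python. The constants are the ones commonly quoted from Petti's write-up (home plate at roughly 125.42, 198.27 in Statcast's hc_x/hc_y coordinates, with a 0.75 scaling factor); treat them as assumptions and check them against the article. The 11.25 degree cutoff for splitting the field into thirds is a hypothetical choice, not necessarily the one used here.

```python
import math

# Assumed constants, as commonly quoted from Petti's article.
HOME_X, HOME_Y, SCALE = 125.42, 198.27, 0.75


def spray_angle(hc_x, hc_y, stand):
    """Approximate spray angle in degrees; negative means the batter's pull side."""
    # atan2 matches atan(dx / dy) for balls hit forward and avoids dividing by zero.
    raw = math.degrees(math.atan2(hc_x - HOME_X, HOME_Y - hc_y)) * SCALE
    # The raw angle is negative toward left field, which is the pull side
    # for right-handed batters, so flip the sign for left-handed batters.
    return raw if stand == "R" else -raw


def field_area(angle, cutoff=11.25):
    """Bucket a spray angle into pull, center, or opposite (hypothetical cutoff)."""
    if angle < -cutoff:
        return "pull"
    if angle > cutoff:
        return "opposite"
    return "center"
```

Once each batted ball has a field area, the average and standard deviation of launch angle and exit velocity can be computed per area for each hitter.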

The last three variables I included were barrel percentages for each hitter to their pull side, up the middle, and to the opposite field.

Taken directly from the R console, here is a complete list of the variables I clustered on:

In addition to breaking out exit velocity and launch angle to different parts of the field, I also included statistics for each batter like max exit velocity, attack angle, etc. This is where some subjectivity in clustering appears. If I were concerned about having no overlap between any of these variables, I could run a factor analysis on the variables to see which ones could be collapsed together and reduce the number of variables. However, this would cause some information to be lost when combining the variables, which would result in fewer clusters. While it might be interesting to see which variables covary and reduce some overlap among the variables, I’ll live with the overlap over reducing the number of variables.

I was able to collect all this information for each player after writing a few functions in R. I used 2017 and 2018 Statcast data and each row was grouped by player, year, and handedness (switch hitters have two rows each year). Once all the Statcast data was gathered, I paired it with a splits table from FanGraphs for each player grouped by year and handedness with the necessary inputs to calculate wOBA.

Before clustering on the variables in the picture above, I ran a function to normalize each variable by calculating the z-score for each data point. Normalizing data before clustering allows variables of different magnitudes to be compared. Once the data was normalized, I used the Mclust function from the mclust package in R to cluster the rows. Mclust fits Gaussian mixture models with the Expectation-Maximization algorithm over a range of cluster counts you specify, then uses the Bayesian Information Criterion (BIC) to compare them. The BIC rewards a model for how likely it makes the observed data and penalizes it for each additional parameter, so extra clusters have to earn their keep. In simpler terms, the BIC is one method to determine the optimal number of clusters in the data set: it favors groupings where batters within each cluster are similar and the clusters are distinct from one another, without adding more clusters than the data supports.
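As a quick illustration of the normalization step, here is a minimal z-score sketch in Python (again, the analysis itself used R). The function names are hypothetical, and this version divides by the population standard deviation; R's scale function uses the sample standard deviation, which changes the values slightly but not the idea.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one column to mean 0 and standard deviation 1."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - center) / spread for value in values]


def normalize_columns(rows):
    """Apply z_scores to every column of a list of equal-length rows."""
    columns = [z_scores(list(column)) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

After this step an exit velocity around 90 mph and a barrel rate around .05 sit on the same scale, so neither dominates the distance between hitters just because of its units.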

I ran the mclust function across 11 candidate numbers of clusters and found that a 9-cluster VEE model maximized the BIC.
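The model comparison boils down to a small formula. In mclust's sign convention the BIC is twice the log-likelihood minus the number of parameters times the log of the number of observations, so larger is better. The sketch below is Python, and the log-likelihoods and parameter counts in the usage note are made-up numbers for illustration, not values from this analysis.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """BIC in the mclust sign convention, where larger is better."""
    return 2 * log_likelihood - n_parameters * math.log(n_observations)


def best_model(candidates, n_observations):
    """Pick the candidate label with the highest BIC.

    candidates: dict mapping a label to (log_likelihood, n_parameters).
    """
    return max(candidates, key=lambda label: bic(*candidates[label], n_observations))
```

With hypothetical fits such as `{"8": (-1000, 50), "9": (-950, 60), "10": (-945, 70)}` on 500 rows, `best_model` returns "9": the tenth cluster improves the likelihood a little, but not enough to pay for its extra parameters.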

Results and Analysis

wOBA Rules of Thumb from

Cluster Overview

As you can see in the first table above, the nine clusters have a fairly large range of wOBA. wOBA wasn’t one of the inputs while clustering, so we can see the clustering algorithm was able to separate swings into unique groups which fare differently at the Major League level. I calculated wOBA for each cluster using season statistics from FanGraphs for each hitter within the cluster, so while some hitters within the cluster may have a small sample size of at-bats, the wOBA value is reflective of all at-bats within the cluster rather than taking the average of each player’s wOBA in the cluster.
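The difference between pooling and averaging is easy to show with a sketch. This Python example uses the standard wOBA structure (weighted events over AB + BB - IBB + SF + HBP), but the weights are illustrative placeholders rather than any season's official values, and the function names and stat-line keys are hypothetical.

```python
# Illustrative linear weights only; real wOBA weights change every season.
EXAMPLE_WEIGHTS = {"uBB": 0.7, "HBP": 0.7, "1B": 0.9, "2B": 1.25, "3B": 1.6, "HR": 2.0}


def woba_parts(line, weights=EXAMPLE_WEIGHTS):
    """Return (numerator, denominator) of wOBA for one stat line."""
    counts = dict(line, uBB=line["BB"] - line["IBB"])  # unintentional walks
    numerator = sum(weight * counts[event] for event, weight in weights.items())
    denominator = line["AB"] + line["BB"] - line["IBB"] + line["SF"] + line["HBP"]
    return numerator, denominator


def pooled_woba(lines, weights=EXAMPLE_WEIGHTS):
    """wOBA over all stat lines combined, so each plate appearance counts equally."""
    parts = [woba_parts(line, weights) for line in lines]
    return sum(num for num, _ in parts) / sum(den for _, den in parts)
```

Pooling means a hitter with 10 hitless at-bats barely moves the cluster's number, whereas a simple average of player wOBAs would let that .000 count as much as a full-time regular's season.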

It should be noted that the bottom two clusters consisted mostly of pitchers. I’ll give examples of players in each cluster later.

Cluster Swing Characteristics

In these tables, we begin to see the different swing characteristics of each cluster (again, these tables are sorted with the best performing clusters at the top and the poor performing clusters at the bottom). In the table above, we can see the best performing clusters have both high maximum exit velocity and average exit velocity. We also see these top clusters have higher overall attack angles than the bottom clusters. It might make sense for hitters with high exit velocity to swing with a higher attack angle (up to a point), but it’s interesting to see that these types of hitters really do end up performing better.

For pitches on the inner third of the plate to the batter, the top performing clusters tend to have slightly higher attack angles than the poor performing clusters. As we move from inside pitches to outside pitches, we see an increasingly large difference between the attack angles of the top performing clusters and the bottom performing clusters.

Even larger differences in attack angles are found when breaking the strike zone into vertical thirds. Top performing clusters have higher attack angles on low pitches (though, as you can see, attack angle doesn’t get consistently lower as we go from best to worst performing cluster). As we move from the bottom of the strike zone to the top, we see the differences between top and poor performing clusters increase with top performing clusters swinging at high pitches with a higher attack angle. While a player’s optimal attack angle likely increases as their ability to generate bat speed/exit velocity increases, it’s safe to generalize the top performing clusters of swings have higher attack angles and greater exit velocity.

Another interesting finding is that only the worst players have negative attack angles on any pitch in any area of the zone. Coaches who claim hitters should swing down to the ball at contact are wrong. They see that every hitter’s bat starts above the shoulders and is lower at contact than where it started, and conclude the bat must be moving down at contact. They fail to consider that swing paths are not simple two-dimensional lines.

Now surely some Hall of Fame hitter has claimed that good hitters swing down on the ball, but the swings of good hitters (or really any Major League hitter) are not traveling downward at impact. A negative attack angle at contact reduces a hitter’s chances of both driving the ball and making contact. Pitchers pitch off mounds and release the ball at their shoulders, so the ball is traveling downward as it reaches the plate, and a negative attack angle shortens the window in which the bat is on plane with the pitch’s trajectory. Ted Williams knew this when he wrote his book The Science of Hitting, so it’s not as if people haven’t known this to be correct for years. The difference is that we now have the data to show anyone saying we should swing down on the ball is full of crap for several reasons, regardless of how good they were as a player. In addition to reducing the window for contact, any additional backspin created by chopping down at the ball means the ball was not hit flush, so it doesn’t stand a chance of being a deep fly ball anyway (plus, Alan Nathan has shown adding RPMs to a baseball above ~2500 RPM doesn’t add much to carry).

Moving on to the launch angle chart.

Cluster Launch Angle

It’s interesting to see nearly every group of hitters has an average pull side launch angle close to 0 degrees. Excluding the two worst performing clusters, there doesn’t appear to be much of a pattern between high performing clusters’ and lower performing clusters’ average pull side launch angle. It might be the case that when players hit ground balls, they tend to be pulled which is why we don’t see much difference between the best and worst players. I might need to revisit this.

Moving on to balls hit up the middle and to the opposite field, we again see differences between the best and worst performing clusters. The most striking jump is comparing average launch angles on balls hit to center field to opposite field launch angle for all clusters. Besides the bottom two clusters (comprised mostly of pitchers), the average launch angle to the opposite field is over 20 degrees, which means a lot of flyballs. Once we look at the barrel percentages for each cluster below, it’ll be easy to see the top clusters with the highest exit velocity are much better at turning these opposite field flyballs into high value balls in play than clusters with lower exit velocities, who probably end up flying out to the opposite field rather than driving it over the outfielder’s head.

For the standard deviation of launch angle to each area of the field, we see a few differences. Pull side differences appear to be small (a difference of roughly one degree between the top and bottom half of the clusters, excluding the last cluster, where n=15). The variance in launch angle to center and opposite field seems to be a couple degrees smaller in the best performing clusters. It’s hard to say what is responsible for these differences. It might be that better performing hitters do a better job of selecting which pitches to hit the other way, or there may be a more complicated mechanical reason.

Cluster Exit Velocity

There are large differences between the swing clusters when comparing exit velocity (all these categories went into forming the clusters, so it’s not too surprising). To all areas of the field, the best performing clusters hit the ball harder. Not too shocking to see hitting the ball hard is good.

Differences in standard deviation of exit velocity aren’t large among hitters. The clusters made up of pitchers look like they have more variance in exit velocity, but there isn’t much difference among the clusters containing mostly hitters.

Cluster Barrel Percentage to Area of Field (as decimals, e.g. .107 means 10.7% of balls to that area of the field)

This isn’t too surprising either. Better performing clusters, consisting of players who tend to hit the ball hard in the air, have more batted balls Statcast classifies as barrels. When hitters in Cluster A hit balls to the opposite field, they appear to result in barrels at a much larger rate than any other cluster. Surprisingly, this is despite hitters in Cluster A having higher exit velocities on balls up the middle. This difference might be caused by hitters in Cluster A being more selective about which balls they hit the other way, or the fact all clusters tend to hit more flyballs to the opposite field.

Examining the Hitters in Each Cluster

What fun would this article be if we didn’t look at several hitters in each cluster? One great thing about model-based clustering with mclust is you get an uncertainty estimate for each member of a cluster. The lower the uncertainty, the more likely it is that hitter really belongs to the cluster. For each cluster, I’ll go over the 10 “most certain” members along with any other hitters I think are interesting to discuss.
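For anyone curious how that uncertainty is defined: in mclust it is one minus the largest cluster membership probability for a row. The Python sketch below shows the idea and how a "most certain members" list could be pulled from it. The function names and the input shape (a dict of name to membership probabilities) are assumptions, not the code behind the tables here.

```python
def uncertainty(membership_probabilities):
    """One minus the largest cluster membership probability."""
    return 1.0 - max(membership_probabilities)


def most_certain(members, cluster, count=10):
    """Names of hitters assigned to cluster, lowest uncertainty first.

    members: dict mapping a name to a dict of cluster label -> probability.
    """
    assigned = [
        (uncertainty(probs.values()), name)
        for name, probs in members.items()
        if max(probs, key=probs.get) == cluster  # assigned to its most likely cluster
    ]
    return [name for _, name in sorted(assigned)[:count]]
```

A hitter with a .99 probability of belonging to Cluster A has an uncertainty of .01, while one split .60/.40 between two clusters has an uncertainty of .40 and would show up in the "uncertain" tables.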

Cluster A- .379 wOBA

The ten most certain members of Cluster A are shown above. Not too many surprises here, other than Cameron Rupp. Rupp’s 2017 season is the only player season in this top ten which finished with a below average wOBA (.319). It’s easy to see which areas he struggled in compared to other players. His average exit velocity was lower than the rest of the hitters in the table. There was a larger spread between his attack angle on high and low pitches than the other hitters as well. He also struggled to pull the ball hard. Still, a .319 wOBA is not bad at all for a catcher (the wOBA calculation used is from the FanGraphs library, so different weights will show Rupp’s wOBA as being slightly lower on his FanGraphs player page). The Giants picked him up on a minor league deal this winter after bouncing around with a few teams at AAA last year, so maybe he’ll find his way back up by addressing some of these issues.

Other than Rupp, none of these players are too surprising to be confidently labeled as belonging to the highest performing cluster. O’Hearn was on fire when he was called up by the Royals this year. Gallo makes less contact than the rest of these hitters, but we only looked at metrics on balls in play when clustering, so contact metrics weren’t factored in while clustering.

Khris Davis looks like the best match for a single player who represents Cluster A. Across the board, his numbers were close to the Cluster’s averages. It looks like Stanton made it into this cluster, despite having low attack angles in each part of the zone. According to the way I approximated attack angle (which is likely flawed), Stanton might benefit from raising his attack angle to better match the approach angle of pitches and make contact more often.

Let’s look at the five most uncertain members of this cluster. These players were classified as being in Cluster A, though the level of certainty that they really belong in Cluster A is lower than the players in the table above.

These five are still good hitters, but we can see their attack angles vary quite a bit compared to the cluster above. Christian Yelich might be the most interesting hitter in this table. His season consisted of a solid first half and a scorching second half. Let’s see how his numbers compare from 2017 to 2018 and before and after the All-Star break in 2018.

This makes it clear why Yelich was an uncertain member of Cluster A. In the first half of the season, his swing characteristics on balls in play closely matched his numbers in 2017 when he was placed in Cluster C. In the second half of 2018, Yelich’s attack angle at different areas of the zone started matching Cluster A much more closely. Yelich was a solid hitter when he was in Cluster C, but making contact at low attack angles looks like it was preventing him from turning into MVP Yelich. He had the requisite exit velocity to be classified in Cluster A, but his attack angles were below the Cluster’s averages. It’ll be interesting to see if Yelich is scored as a more confident member of Cluster A in 2019, or if he reverts to Cluster C by making contact at lower attack angles. Regardless, Yelich’s 2018 provides an interesting example of a player who managed to go from solid to scorching hot mid-season by changing his approach to hit more flyballs.

Similar analyses could be done for the other players in the uncertain table, but this piece is already running long so I’ll move on to the next Cluster (if you want me to go in depth on any of the players above, leave a comment and I may write a brief post about the player).

Cluster B- .353 wOBA

The closest matches for Cluster B are still good hitters, but it wouldn’t be controversial to say they’re a step below the closest matches in Cluster A. Cluster B hitters tend to have slightly lower average and peak exit velocity compared to Cluster A hitters. At different heights within the zone, Cluster A seems to have a higher variance of attack angles compared to Cluster B. Cluster B averages slightly higher launch angles, though the range of attack angles looks similar to Cluster A. Hitters in Cluster A average similar exit velocity on pulled and opposite field balls in play, while Cluster B’s average is over a mph lower when going the other way.

It’s hard to know what to conclude from these differences. Does the greater difference in attack angles at different heights benefit Cluster A or do those hitters succeed despite it since they hit the ball harder? If a player adds bat speed and starts hitting the ball harder, should they tweak their attack angle to be a little higher than the average approach angle of a pitch to hit more flyballs or just try to align their swing with pitch plane to put the ball in play as often as possible?

These questions are difficult to answer from this analysis since we’re not looking at year-to-year changes in each player’s metrics and finding the difference in performance. In a case like 2017 Yelich where his attack angles weren’t closely aligned with pitch plane, it’s clearer an adjustment should be made to raise his attack angle and give himself a better chance to put the ball in play. One way this analysis could be used to identify players who could break out by adjusting their attack angles in different parts of the zone is to find players who have attack angles less than the typical approach angle of pitches (roughly 5-15 degrees). The further a hitter’s attack angle is below the typical plane of a pitch, the easier it is to make a case for them to increase attack angle to maximize their contact window.
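That screening idea could be sketched roughly as follows in Python. The 5 degree floor comes from the low end of the approach-angle range quoted above; the function name and the zone labels are hypothetical.

```python
def below_pitch_plane(attack_angles, floor=5.0):
    """Zone areas where attack angle sits under the floor, largest gap first.

    attack_angles: dict mapping a zone label to an attack angle in degrees.
    Returns a list of (zone label, degrees below the floor) pairs.
    """
    gaps = {
        area: floor - angle
        for area, angle in attack_angles.items()
        if angle < floor
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)
```

For a hitter with attack angles of 1 degree up, 8 in the middle, and -2 down, this flags the lower third first (7 degrees short) and the upper third second (4 degrees short). It only points at where to look; it says nothing about whether a change would actually help.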

Now let’s look at the 5 least certain members of Cluster B.

Joey Gallo went from being an uncertain member of Cluster B in 2017 to being confidently assigned to Cluster A in 2018. His biggest changes appeared to be a reduction in attack angle in 2018 and harder hit balls to the opposite field. Interestingly, according to his FanGraphs page, Gallo’s wOBA dropped from .364 to .343 from 2017 to 2018. Again, it’s not easy to conclude whether he should raise his attack angle back to what it was in 2017 or lower it further to better match the approach angle of the ball.

Based on exit velocity, I’d expect to see Machado in Cluster A. Machado ended up in Cluster B in 2017 and Cluster C in 2018. Let’s see how he changed from 2017-2018 and where he might differ from players in Cluster A.

Despite moving to a cluster with a lower wOBA in 2018, Machado performed much better at the plate in 2018, raising his wOBA from .328 to .377 according to his FanGraphs player page. This is one example of how the goal with an analysis like this shouldn’t necessarily be to target a specific cluster a player should try to move to, but to figure out which areas the hitter may be struggling with (e.g. low exit velocity to a particular side of the field or suboptimal attack angle in an area of the strike zone) and look for ways to improve those areas. Machado’s 2018 success appears to be attributable in part to his increased exit velocity and attack angle to the pull side. Machado also managed to increase his attack angle in the mid and upper parts of the zone. The biggest differences between Machado’s numbers and the Cluster A averages are his low horizontal middle third attack angle in both 2017 and 2018 and a low attack angle on pitches in the lower third of the zone in 2018.

Before moving on to Cluster C, let’s look at one more player who fell into Cluster B in 2018: Mike Trout.

It’s easy to see why Trout wouldn’t easily fall into Cluster A despite being perhaps the best hitter in baseball. Measures of plate discipline and approach weren’t factored in when clustering, and Trout has fantastic discipline which allows him to draw a lot of walks. Trout excels at pulling the ball and hitting it to center but has a low average exit velocity to the opposite field. However, only about one in five balls Trout puts in play go to the opposite field, so he likely either fouls off pitches he would have to hit the other way or doesn’t swing at them until late in the count.

This makes Trout a good example of a player who wasn’t put in the top cluster by swing characteristics, but still manages to be an exceptional hitter because he’s built his approach around his strengths. He still shares elite peak and average exit velocity with Cluster A and can make up for low average opposite field exit velocity by hitting fewer balls the other way.

Moving on to Cluster C.

Cluster C- .349 wOBA

There are quite a few good hitters in Cluster C as well. Cluster C is a little closer to Cluster A than Cluster B when comparing exit velocity, but Cluster C hitters tend to have lower attack angles. The overall difference in wOBA between Clusters B and C isn’t large at all.  

It’s easier to see where the most certain members of Cluster C may have holes in different parts of the zone. Devers looks like he could do a better job of elevating low pitches and pitches on the outer third. Yunel Escobar may want to either avoid pulling the ball or learn to hit it harder to the pull side. Juan Soto and Ryan Braun had low attack angles on low pitches.

I think Hosmer might be the most interesting player here to dive into. Hosmer in 2017 was a productive hitter, but he may benefit from bumping his attack angle closer to 10 degrees across the board instead of the lower end of the typical range of approach angles around 5 degrees. Of course, we know he had a disappointing 2018 when his wOBA dropped from .376 to .309. What changed?

Two things stick out and they’re likely related. First, we see Hosmer’s average exit velocity went down a tick in 2018, and two ticks on balls to center field—the area he hit balls hardest in 2017. Second, we see Hosmer’s attack angle on pitches high in the zone and on the inner third change by roughly 4-5 degrees each. Most importantly, the drop in attack angle on high pitches brought him off plane with the ball which made it harder to square up and do damage on high pitches. From Brooks Baseball, it’s easy to see how his ISO changed from 2017 to 2018:

Eric Hosmer Isolated Power 2017-2018, from

Unlike 2017, Hosmer did not do damage on pitches up in 2018. He may want to consider adjusting his attack angle in different areas of the zone to get on plane with the approach angle of the baseball (about 5-15 degrees) and do more damage on flyballs.

Now moving on to the uncertain members of Cluster C.

Several good hitters in this group. Bryant had a solid year in 2017, though he seemed to take a step back in 2018 when his wOBA dropped 40 points. Bryant has been on record saying he’s on a mission to hit flyballs, so it’s no surprise he has high attack angles across the board. Let’s see what changed from 2017 to 2018 for Bryant.

Bryant’s attack angles fluctuated a little in different areas of the zone, but the biggest change from 2017 to 2018 was in his exit velocity. This was likely caused by a shoulder issue that kept him out part of the season in 2018. If Bryant gets his shoulder healthy and his exit velocity back up, I bet we see him return to form in 2019.

Before moving on, let’s look at another player who was in Cluster C in 2018: Bryce Harper.

I only ran clustering on the 2017 and 2018 seasons, but I figured it’d be interesting to see what Harper’s numbers looked like dating back to his MVP season in 2015. Harper’s overall exit velocity hasn’t changed much since, but his exit velocity to different areas of the field has changed. Back in 2015, Harper’s exit velocity to different areas of the field was like Trout’s, who hits the ball hard to the pull side and up the middle but not so much the other way. After 2015, Harper’s attack angles in different areas of the zone changed. In 2015, Harper’s attack angles seemed to all fall within 5-15 degrees in different areas of the zone. In 2016, his attack angle became much steeper on high pitches and pitches on the inner third while becoming flatter on mid-height pitches and pitches down the middle of the plate. This likely contributed to his wOBA falling from .461 in his MVP season to .343 in 2016. Since then, he’s struggled to get his attack angles in different areas of the zone back to form. He’s now managing to hit the ball the other way harder, but he’s struggled to keep his attack angles within a consistent range (again, this could be for a multitude of reasons, potentially not driven by conscious swing changes). Harper in 2017 (.416 wOBA on FanGraphs) seemed to get closest to his old form, sans attack angle on inner pitches. Then, in 2018 his swing looks like it might’ve been a little too steep on contact again when he posted a .376 wOBA.

Based on this, I think Harper could return to 2015 form if he manages to get his attack angles within an ideal range. The Phillies, who are one of the front runners for Harper, hired Jason Ochart to be their Minor League Hitting Coordinator this offseason, so the Phillies might be able to help Harper out. Harper has a tremendous amount of talent and seems to work hard, so I bet we see him get back to MVP form eventually.

Cluster D- .338 wOBA

Starting with Cluster D, I’ll just give quick overviews of the most confident members of the remaining clusters and save any more detailed player analysis for future posts.

Still some great hitters in Cluster D, but it’s easy to see the cluster has lower exit velocity to different areas of the field. For these players, finding a way to improve bat speed might be the best way to improve, though some players with higher attack angles and lower exit velocity may want to consider adjustments as well. I should also note the 2017 Jose Ramirez listed here reflects only his balls in play as a left-handed hitter.

Cluster E- .335 wOBA

Cluster E’s exit velocities are a little higher than Cluster D’s, but attack angles seem to be lower in different parts of the zone.

And I lied about not digging into any more players in depth… let’s go over Mookie Betts.

Betts hit the ball harder and higher in 2018, raising his attack angle in every area of the zone. Paired with an increase in exit velocity, this helped raise his wOBA from .339 to .449. It’ll be interesting to see if he sustains it going forward.

Cluster F- .315 wOBA

Coincidentally, Cluster F is the first cluster with a below-average wOBA. Attack angles still tend to fall around the 5-15 degree range, but exit velocity is much lower than it is for the best hitters.

Cluster G- .290 wOBA

Hitters in Cluster G appear to be like hitters in Cluster F, though they have slightly lower exit velocity and a wider range of attack angles on pitches at different heights. These hitters also have fewer balls in play, so we start to see some more extreme values for individual players because of sample size issues.

Cluster H- .191 wOBA

The last two clusters are made up mostly of pitchers. These pitchers still seem to have enough rotational power to occasionally hit the ball 100+ mph, but their average exit velocities and attack angles are much lower than hitters’.

Cluster I- .138 wOBA

These guys are paid to pitch.

A few conclusions

This analysis didn’t reveal anything revolutionary. Smart hitting coaches, players, and analysts have been pointing out the value of hitting hard flyballs for years. What it did show is that grouping hitters with similar swing characteristics reveals that different swings tend to lead to different results. It’s still fun to look at individual players for anecdotes on what allows certain hitters to succeed, but clustering allows an abstraction of ideal swing characteristics to be formed: Cluster A. Surely different hitters in Cluster A have swings that look dissimilar to each other, but they tend to share similar exit velocities and attack angles on balls in play. It’s also important to note that not being in Cluster A doesn’t prevent hitters like Mike Trout and Mookie Betts from being some of the best hitters in baseball. They optimize their approach around their strengths.

It’s also interesting to see very few professional hitters with negative attack angles in ANY part of the zone. This alone should be enough to refute any hitting coach saying the best are swinging down at contact, but I’m guessing this won’t stop the anti-data crowd from making stupid claims and doing bad analysis in the future anyway.

It’s difficult to know exactly what the takeaways should be for coaches and players at lower levels, but it’s clear the ability to create bat speed and high exit velocity is crucial for playing at higher levels. Every single cluster (including the two with mostly pitchers!) averaged peak exit velocities over 100 mph. It’s a little tougher to make broad recommendations about attack angle since colleges tend to have smaller field dimensions, but aiming for a 5-15 degree attack angle to give the batter the best chance to be on plane with the pitch is probably a good place to start.

Limitations of this analysis

It’s worth pointing out there are a few potential issues with this piece. Attack angles were calculated with an approximation since we don’t have access to Blast Motion data for any hitter. Data collected with a device such as Blast Motion would allow us to measure attack angles directly, but without it, the attack angle estimates are likely at least somewhat inaccurate.

It’s also worth pointing out the variables selected for clustering could have been different. A few thousand words into writing this piece I realized that, for no good reason, I didn’t break exit velocity into different thirds of the strike zone like I did with attack angle. It might be worth running another iteration with those variables included, along with adding 2015 and 2016 data to clustering.
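Here’s a rough sketch of what splitting exit velocity by thirds of the zone could look like. It only handles pitch height, and it assumes each batted ball is a dict with Statcast-style fields `plate_z`, `sz_bot`, `sz_top`, and `launch_speed`. The helper names and the sample numbers are hypothetical, and this isn’t necessarily how I’d build the final features.

```python
from collections import defaultdict


def zone_third(plate_z, sz_bot, sz_top):
    """Label a pitch height as the 'low', 'mid', or 'high' third of the zone.

    Pitches below or above the zone are lumped into 'low' or 'high'.
    """
    third = (sz_top - sz_bot) / 3.0
    if plate_z < sz_bot + third:
        return "low"
    if plate_z < sz_bot + 2 * third:
        return "mid"
    return "high"


def mean_exit_velo_by_third(batted_balls):
    """Average launch_speed within each height third of the strike zone."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ball in batted_balls:
        label = zone_third(ball["plate_z"], ball["sz_bot"], ball["sz_top"])
        totals[label] += ball["launch_speed"]
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


# Made-up batted balls for illustration only.
balls = [
    {"plate_z": 1.8, "sz_bot": 1.5, "sz_top": 3.5, "launch_speed": 90.0},
    {"plate_z": 2.0, "sz_bot": 1.5, "sz_top": 3.5, "launch_speed": 100.0},
    {"plate_z": 2.5, "sz_bot": 1.5, "sz_top": 3.5, "launch_speed": 95.0},
    {"plate_z": 3.2, "sz_bot": 1.5, "sz_top": 3.5, "launch_speed": 88.0},
]
velo_by_third = mean_exit_velo_by_third(balls)
```

The same idea would work for inner, middle, and outer thirds using horizontal pitch location, though batter handedness would need to be accounted for there.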

The variables used for clustering are best thought of as outcomes rather than kinematic factors which drive attack angle and exit velocity. It’s possible players are achieving high exit velocity and attack angles in different ways mechanically, but the data available from Statcast can only provide a high-level view of what a player is doing. At the very least we can begin to see what optimal outcomes look like regardless of the timing and speed of the legs, hips, torso, and arms which caused the outcome.

One way an analysis like this could be used by a major league front office is to identify players who may have fixable flaws or sub-optimal attack angles and consider acquiring that player with the hope a data-driven coach could help them eliminate the flaw. Traditionally, we’ve seen front offices using advanced analytics to find players who are undervalued or guide on-field decision making for coaches. The emergence of new hitting and pitching technologies is creating an opportunity for front offices to bring data-savvy coaches into evaluation discussions, as they may be able to quantitatively identify flaws they can fix in free agent and trade targets.
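As a toy version of that kind of screen, the sketch below flags hitters who already hit the ball hard but have an attack angle outside the 5-15 degree range in at least one part of the zone. The 90 mph cutoff, the data layout, and the function names are all assumptions I made up for illustration, and the hitters are fake.

```python
def zones_outside_range(attack_angles, low=5.0, high=15.0):
    """Return the zone labels whose attack angle falls outside [low, high]."""
    return sorted(
        zone for zone, angle in attack_angles.items()
        if not low <= angle <= high
    )


def flag_targets(players, min_exit_velo=90.0):
    """Flag hard hitters with at least one attack angle outside the range.

    min_exit_velo is an arbitrary, hypothetical cutoff in mph.
    """
    targets = {}
    for name, profile in players.items():
        off_zones = zones_outside_range(profile["attack_angles"])
        if profile["avg_exit_velo"] >= min_exit_velo and off_zones:
            targets[name] = off_zones
    return targets


# Fake hitters for illustration only.
players = {
    "Hitter A": {"avg_exit_velo": 92.0,
                 "attack_angles": {"high": 22.0, "mid": 9.0, "low": 12.0}},
    "Hitter B": {"avg_exit_velo": 91.0,
                 "attack_angles": {"high": 8.0, "mid": 10.0, "low": 14.0}},
    "Hitter C": {"avg_exit_velo": 85.0,
                 "attack_angles": {"high": 3.0, "mid": 2.0, "low": 18.0}},
}
targets = flag_targets(players)
```

Only Hitter A gets flagged here: Hitter B is already inside the range everywhere, and Hitter C doesn’t clear the exit velocity cutoff. A real screen would also need to account for sample size and the error in the attack angle estimates.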

Let me know in the comments if you have any questions about specific players or if you have any feedback. Thanks for reading!

Data from,, and
