Covering All The Bases: How MLB StatCast is Changing the Game with Data

by Ryan Getchell, on November 1, 2016

Hi, my name is Ryan and I'm a baseball addict.

Years before my addiction, I saw baseball as a long, drawn-out game. The idea of sitting for 3 hours just to watch 5 minutes of action only seemed palatable if you were actually at the ballpark. However, a lot changed about 5 years ago when I started playing fantasy sports. Baseball quickly became my passion.

While all sports have stats, baseball seems to have an unlimited supply of them. When I first started, there were some obvious stats that could tell you the performance of a player. David Ortiz has hit 37 home runs so far this year; that sounds pretty good! But how do you figure out who the next David Ortiz will be BEFORE he hits 37 home runs? That's where the data comes in, and MLB has a new system that allows fans and teams alike to evaluate talent in a whole new way: MLB StatCast.


MLB StatCast debuted in 2015 and has rocketed to the forefront of the baseball lexicon. MLB Advanced Media (MLBAM) is behind the StatCast service and employs a number of technologies to capture the data points necessary to calculate the next frontier of MLB stats. Radar from a company called Trackman is used to measure pitch speed and the exit velocity of batted balls. Fun fact: Trackman's radar is based on their missile defense technology and can track the speed of a baseball's seams at 40,000 frames per second! MLBAM also employs another technology, ChyronHego, to track player movement on the field. Between these 2 technologies, StatCast can capture dozens of new metrics, including:

  1. Spin rate of a pitch
  2. Exit velocity of a batted ball
  3. HR distance
  4. Launch angle of a batted ball
  5. Elapsed time before a player takes their first step
  6. Max speed of a player running

You can click through the 2016 leaderboard here.

Now, all these stats have data points behind them that are captured throughout every game. To store this massive amount of data, StatCast uses none other than our favorite cloud web services provider, Amazon Web Services, and its S3 cloud storage service. A single game generates about 7 terabytes of uncompressed data, or 80 gigabytes compressed. If you do the math out to an entire MLB season of 2,430 games, you are looking at about 17 petabytes of raw data...and that's EVERY season!
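For anyone who wants to check that math, here is a quick sketch in Python. The per-game figures and the game count come straight from the paragraph above; the decimal unit conversion (1 PB = 1,000 TB, 1 TB = 1,000 GB) and the function names are my own assumptions for illustration, not anything published by MLB or AWS.

```python
# Figures quoted in the post
TB_PER_GAME_RAW = 7           # terabytes per game, uncompressed
GB_PER_GAME_COMPRESSED = 80   # gigabytes per game, compressed
GAMES_PER_SEASON = 2430       # 30 teams * 162 games / 2 teams per game

# Assumption: decimal units (1 PB = 1,000 TB and 1 TB = 1,000 GB)
UNITS_PER_STEP = 1000


def season_raw_petabytes(games=GAMES_PER_SEASON, tb_per_game=TB_PER_GAME_RAW):
    """Raw (uncompressed) data for a full season, in petabytes."""
    return games * tb_per_game / UNITS_PER_STEP


def season_compressed_terabytes(games=GAMES_PER_SEASON,
                                gb_per_game=GB_PER_GAME_COMPRESSED):
    """Compressed data for a full season, in terabytes."""
    return games * gb_per_game / UNITS_PER_STEP


def compression_ratio(tb_raw=TB_PER_GAME_RAW, gb_compressed=GB_PER_GAME_COMPRESSED):
    """How many times smaller one game is after compression."""
    return tb_raw * UNITS_PER_STEP / gb_compressed


if __name__ == "__main__":
    print(f"Raw per season:        {season_raw_petabytes():.2f} PB")
    print(f"Compressed per season: {season_compressed_terabytes():.1f} TB")
    print(f"Compression ratio:     {compression_ratio():.1f}x")
```

Under those assumptions, the raw total lands right around 17 petabytes, while the compressed total is a far more manageable couple of hundred terabytes.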

Now, all this tech is great to nerd out about, but how is this data driving change in MLB? Take, for example, Arizona Diamondbacks third baseman Jake Lamb.

Lamb's 2015 rookie season wasn't too impressive. He hit .263 with 6 home runs in 107 games. Better than you and I, but nowhere near David Ortiz level. In the offseason he overhauled his approach at the plate by adding a leg kick and lowering his hands. These are minor changes that can have a major impact, but through May of the 2016 season, the results were not coming and Lamb began to question their effectiveness. You see, there is a lot of luck involved in baseball, but it is believed that over a 162-game season, that luck, good or bad, will even out and your true talent will show through. So for Lamb, a couple of months of subpar play didn't necessarily indicate his demise, especially once you looked at his exit velocity, a metric often used to predict which players are more likely to have a breakout power season.

Lamb had a 93.7-mph average exit velocity, which is higher than that of Bryce Harper, Miguel Cabrera, and Lamb's teammate, Paul Goldschmidt, all of whom are perennial All-Stars who rank among the top in home runs every year. Realizing he must be doing something right, Lamb decided not to ditch his approach, and as I'm writing this, Lamb has the following stat line for 2016:


2016 Stats: 513 AB, 80 R, 128 H, 29 HR, 91 RBI, 6 SB, .250 AVG, .332 OBP, .846 OPS


Note that he is only 1 HR shy of 30, a stud-level achievement for anyone in the major leagues!
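If you want to see where a number like .250 comes from, here is a small Python sketch. It assumes the first and third numbers in the stat line above are at-bats (513) and hits (128), which is the usual layout of a stat line but is my reading rather than something the line spells out. The function names are made up for illustration.

```python
def batting_average(hits, at_bats):
    """Batting average is simply hits divided by at-bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def format_average(value):
    """Format the baseball way: three decimals, no leading zero."""
    return f"{value:.3f}".lstrip("0")


if __name__ == "__main__":
    # Assumption: 128 hits in 513 at-bats, read off the stat line above
    avg = batting_average(hits=128, at_bats=513)
    print(format_average(avg))
```

The result rounds to the .250 shown in the stat line, which is a decent sanity check that those two columns are what they appear to be.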

Using data to drive action is nothing new to Arkatechture...we build data-driven businesses. Are you looking for your "Jake Lamb"? Let us help you find it!
