Analyzing Hockey by Someone Who Doesn’t Know Hockey

What would it be like for an aspiring Major League Baseball (MLB) analyst to work with data from a sport they know little to nothing about? 

I moved to Raleigh to attend the IAA, and the city’s only professional sports team is the National Hockey League’s (NHL) Carolina Hurricanes. With hockey season just beginning, I decided to do an analysis on hockey. Knowing only snippets about professional hockey, I chose to compare the NHL and MLB draft systems and find out which one sends a larger share of its drafted players to its league.

In my study, I found that a drafted hockey player is more likely to reach the NHL than a drafted baseball player is to reach the Major Leagues.

For a data scientist, one of the most important parts of any project is understanding the data inside and out. For this project, I researched the draft systems of the two leagues. In both sports, players can be drafted at a young age. In the NHL, however, the signing period is longer: an agreement does not need to be reached until the player turns 21. In contrast, the MLB signing period lasts only a few months, and a player who goes unsigned is eligible to be redrafted by a different team after three years of college.

Here is a simplified version of the typical player development pipeline in each league:

steps for making it to MLB or NHL
Figure 1: Draft to League Pipelines

All baseball players who hope to make the league must run the gauntlet of levels within Minor League Baseball. The NHL draft-to-league progression is more complicated, but a player has fewer steps to go through to make the league.

Once I understood the data, I still had to collect and manipulate it, and that turned up some unexpected findings that needed more research. Two facts about the NHL stood out while I was pulling the data for this project from the NHL API:

  1. Franchise relocations complicate the data. For example, the original Winnipeg Jets moved to Phoenix and became the Coyotes (later renamed the Arizona Coyotes), and the Atlanta Thrashers later relocated to become the new Winnipeg Jets.
  2. A lockout canceled the 2004-2005 season, so no Stanley Cup was awarded.

Working around these data points was difficult because I lacked the background a hockey fan or analyst would have, so I made errors they would have known to account for.
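One way to guard against both issues is to normalize team names before grouping and to drop the canceled season explicitly. The snippet below is only a minimal sketch, not the code used in this project: the function names, the franchise labels, and the season string format are all assumptions made for illustration, and it does not reflect the actual field names returned by the NHL API.

```python
# Hypothetical helpers; names, labels, and formats are illustrative assumptions.

# The new Winnipeg Jets (the former Atlanta Thrashers) began drafting in 2011.
NEW_JETS_FIRST_DRAFT_YEAR = 2011

# Season canceled by the lockout; the string format is an assumption.
LOCKOUT_SEASON = "2004-2005"


def franchise_label(team_name: str, draft_year: int) -> str:
    """Map a team name and draft year to a single franchise label."""
    if team_name == "Winnipeg Jets":
        # The same name refers to two different franchises, so the year decides.
        if draft_year < NEW_JETS_FIRST_DRAFT_YEAR:
            return "Jets/Coyotes franchise"
        return "Thrashers/Jets franchise"
    if team_name in {"Phoenix Coyotes", "Arizona Coyotes"}:
        return "Jets/Coyotes franchise"
    if team_name == "Atlanta Thrashers":
        return "Thrashers/Jets franchise"
    # Teams without a relocation in this sketch keep their own name.
    return team_name


def played_seasons(seasons: list[str]) -> list[str]:
    """Drop the canceled 2004-2005 season from a list of season labels."""
    return [s for s in seasons if s != LOCKOUT_SEASON]
```

Keying on both the name and the year matters because "Winnipeg Jets" alone is ambiguous: a lookup on the team name only would merge two unrelated franchises.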

My study found that, across the last 20 years, a higher percentage of players selected in the NHL draft reached the league than players selected in the MLB draft.

Even when looking only at the top five rounds of each draft across the 20 years, there was a statistically significant difference between the proportions of players who made their respective leagues: 47.22% of NHL draftees reached the NHL, but only 44.64% of MLB draftees reached the Major Leagues. With 95% confidence, a player drafted in the top five rounds was about 0.3 to 4.9 percentage points less likely to reach MLB than the NHL.
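For readers curious how an interval like this can be computed, here is a minimal sketch of a standard large-sample (Wald) confidence interval for the difference between two proportions. It is not necessarily the exact procedure used in this study, the function name is made up for illustration, and the counts in the usage lines are hypothetical placeholders rather than the study's data.

```python
import math


def diff_of_proportions_ci(made_a, drafted_a, made_b, drafted_b, z=1.96):
    """Wald interval for p_a - p_b; z=1.96 corresponds to about 95% confidence."""
    p_a = made_a / drafted_a
    p_b = made_b / drafted_b
    diff = p_a - p_b
    # Standard error of the difference, using unpooled sample proportions.
    se = math.sqrt(p_a * (1 - p_a) / drafted_a + p_b * (1 - p_b) / drafted_b)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts for illustration only (NOT the study's data).
diff, low, high = diff_of_proportions_ci(450, 1000, 400, 1000)
print(f"difference: {diff:.4f}, 95% CI: ({low:.4f}, {high:.4f})")
```

If the whole interval sits above zero, as the 0.3 to 4.9 range reported above does, the difference is statistically significant at the 5% level.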

This can be seen in Figure 2, which shows the percentage of players from each draft year who went on to make the league. Both leagues show a similar lag: it takes about six years for drafted players to reach the league.

Line graph
Figure 2: Percentage to Make League from Past 20 Draft Years

This project allowed me to expand my knowledge of hockey while applying some of the techniques I’ve learned at the IAA, such as thoroughly understanding the data set before running statistical tests, displaying data visually when possible, and testing statistical significance to support and strengthen claims.

Columnist: Zach Houghtaling