Background
Just two weeks into my internship at Sports Reference, I had the opportunity to own a project from start to finish. When looking through a list of ideas for improvements to the sites, I saw one for updating scoring log columns on Hockey Reference. Scoring logs are created for each skater1 for every season they play in the NHL and contain entries on each point they’re credited with, whether goal or assist. The specific task was swapping the “description” column for a more detailed breakdown of goal scorer, primary assist, and secondary assist. This was promptly done.
Hockey is my favorite sport, and I follow the NHL closely, so I saw the opportunity for more from the perspective of a fan who uses the site. The scoring logs were pretty bare-bones, so I was determined to add more columns to provide greater context. I developed queries to pull the score of the game at the time of the scoring log entry, and the goalie each goal and its assists were recorded against.
I enjoyed putting them together, and it sparked my curiosity. Though scoring logs are filled with important information, they can be dense and long. How could I make them even clearer for our users and share some of the cool insights that come to me when I’m working with all this data?
So, then came the idea of breakdowns.
Skating Forward
I’d seen some tables on Baseball Reference that carried information about a batter’s season data and their RBI opportunities.2 I figured I could base my new proposal on those. It was important to me not to reinvent the wheel, because I was working in a programming language (Perl) that I had been introduced to only weeks prior. I also didn’t want to stray too far outside the scope of the bet sheet3 I was completing, which was really just splitting up the scoring description.
So, I quickly put together a proof of concept inspired by the baseball tables. It had a couple of rows about a skater’s season stats with total points, goals, and assists, and a breakdown of how many points they had by period (first, second, third, and overtime). I presented it, along with my suggestions, to my manager, who approved! I had the freedom to make this project my next assignment. My manager also connected me with our hockey SME4 and members of the product team. Since it was going to be an entirely new feature on the site, frontend and backend, it required more eyes.
Breaking It Down
Once I’d gotten the go-ahead, I started to come up with some interesting SQL queries. At Sports Reference, the backbone of what we do is our databases. Our sites are the greatest collection of encyclopaedic sports statistics for our subject matter, freely available on the internet. Our tools are built from the ground up, so my tables also had to be.
I came up with adding points scored by situation (even strength, power play, or short-handed), which opposing teams and goalies a player had scored the most against, and lists of which players had assisted the most on the specific skater’s goals and which players this specific skater had assisted the most. I also asked everyone else involved in the project for category ideas. This led to adding which arenas a skater had scored the most at.
Here are some of the cool insights that come from these categories, as modeled by Sidney Crosby:
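To give a rough idea of the kind of grouping query behind these categories, here is a minimal sketch. It is written in Python with the standard-library sqlite3 module rather than the Perl and production database used on the site, and the scoring_log table, its columns, the points_by helper, and the sample rows are all hypothetical, invented only to show the shape of a points-by-situation or points-by-goalie breakdown.

```python
import sqlite3

# Hypothetical schema and rows; the real Sports Reference tables are not shown in this post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scoring_log ("
    " skater TEXT, point_type TEXT, situation TEXT, opp_goalie TEXT)"
)
conn.executemany(
    "INSERT INTO scoring_log VALUES (?, ?, ?, ?)",
    [
        ("skater_a", "goal", "EV", "goalie_x"),
        ("skater_a", "assist", "PP", "goalie_x"),
        ("skater_a", "assist", "EV", "goalie_y"),
        ("skater_a", "goal", "PP", "goalie_x"),
        ("skater_b", "goal", "SH", "goalie_y"),
    ],
)

def points_by(column, skater):
    """Count a skater's points grouped by one allowed column."""
    # Column names cannot be bound as parameters, so check against a fixed list.
    if column not in ("situation", "opp_goalie"):
        raise ValueError(f"unsupported breakdown column: {column}")
    query = (
        f"SELECT {column}, COUNT(*) AS points FROM scoring_log"
        f" WHERE skater = ? GROUP BY {column}"
        f" ORDER BY points DESC, {column}"
    )
    return conn.execute(query, (skater,)).fetchall()

by_situation = points_by("situation", "skater_a")
by_goalie = points_by("opp_goalie", "skater_a")
```

Each category in the list is the same count grouped by a different column, which is why one small helper can cover several of them in this sketch.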
- points scored by situation -> is Sid the Kid “clutch”?
- points by opposing teams -> prove that he is a Flyers “killer”!
- points by arenas -> show off his road production
- points by goalies -> compare his effectiveness against teams vs. specific goalies5
- goals by assister -> who gives him great passes?
- assists by goal scorer -> whose play does he propel?6
I built it out on my development server, seeking feedback about the layout and page-specific considerations. At first, empty net points were a separate listing, but they were folded into the other sections because empty net points are essentially at even strength (same number of skaters for each team). I had also laid the tables out in two rows, which became one. The titling of the sections was also refined with guidance from product.
Puck Drop
After about a week, everything was in place for launch! Here’s a link to what’s live on Hockey Reference.
The approved final design has two breakdown tables: one for the regular season and one for the playoffs. Different layouts and columns display depending on the scoring log. For example, there’s no need to show each category if a player had no points during the regular season or playoffs. Some columns appear once a player’s hit a certain threshold of points scored during the period.
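The display rules above could be sketched roughly like this, in Python rather than the site's Perl. The cutoff value, the section names, and the visible_sections helper are all assumptions for illustration; the post does not state the real thresholds or which sections they gate.

```python
# Hypothetical threshold; the real cutoffs are not given in this post.
MIN_POINTS_FOR_DETAIL = 10

def visible_sections(total_points):
    """Pick which breakdown sections to render for one season or playoff run."""
    if total_points == 0:
        return []  # no points, so there is nothing to break down
    sections = ["By Period", "By Situation"]
    # Assumed rule: finer-grained sections only appear past the threshold.
    if total_points >= MIN_POINTS_FOR_DETAIL:
        sections += ["By Goalie", "By Arena"]
    return sections
```

Keeping the rule in one small function means the cutoff can be tuned without touching the code that renders each section.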
These breakdowns appear for every skater for every season they’ve ever played in the NHL - 7,677 forwards and defensemen7 dating all the way back to the 1917-18 NHL season. I’m really happy with the end result and to see it used8 by so many people to further their love of the game.
I was able to present my work on these breakdowns to the company as a whole at our engineering team iteration boundary meeting,9 which was amazing. I’m very grateful to everyone who gave me guidance as I tackled this project. Thank you so much!
Technical Considerations
When I put together all of the Perl subroutines10 as well as the SQL queries, I had reusability and maintainability in mind.
So, the same function generates breakdowns for both the regular season and the playoffs, with an optional parameter denoting playoffs. I wrote my queries as prepared statements with bound parameters, which helps prevent SQL injection and keeps a single source of truth for each query. Each query has its own subroutine, which the main function delegates to when populating data.
This allows the columns in the breakdown to be flexibly swapped, and also helps when making changes or fixing bugs because the code is neatly divided.
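Here is a minimal sketch of that structure, assuming a Python stand-in (stdlib sqlite3) for the Perl subroutines and prepared statements described above. The table, the column names, the SECTIONS list, and every function name are hypothetical; only the shape (one builder, an optional playoffs flag, one function per query, bound parameters) comes from the description.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scoring_log ("
    " skater TEXT, is_playoffs INTEGER, period TEXT, opp_team TEXT)"
)
conn.executemany(
    "INSERT INTO scoring_log VALUES (?, ?, ?, ?)",
    [
        ("skater_a", 0, "1", "team_x"),
        ("skater_a", 0, "3", "team_x"),
        ("skater_a", 0, "3", "team_y"),
        ("skater_a", 1, "OT", "team_y"),
    ],
)

def points_by_period(skater, is_playoffs):
    # Values are bound with ? placeholders, never spliced into the SQL string.
    return conn.execute(
        "SELECT period, COUNT(*) FROM scoring_log"
        " WHERE skater = ? AND is_playoffs = ?"
        " GROUP BY period ORDER BY period",
        (skater, is_playoffs),
    ).fetchall()

def points_by_team(skater, is_playoffs):
    return conn.execute(
        "SELECT opp_team, COUNT(*) FROM scoring_log"
        " WHERE skater = ? AND is_playoffs = ?"
        " GROUP BY opp_team ORDER BY opp_team",
        (skater, is_playoffs),
    ).fetchall()

# One entry per breakdown section; swapping sections means editing this list.
SECTIONS = [("By Period", points_by_period), ("By Opponent", points_by_team)]

def build_breakdown(skater, playoffs=False):
    """Build every section for the regular season (default) or the playoffs."""
    flag = 1 if playoffs else 0
    return {title: query(skater, flag) for title, query in SECTIONS}

regular = build_breakdown("skater_a")
playoffs = build_breakdown("skater_a", playoffs=True)
```

Because each query lives in its own function and the builder only walks a list, a bug in one section can be fixed without touching the others.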
Footnotes
1. non-goalie, aka forward or defenseman
2. like for Shohei Ohtani
3. task definition
4. subject matter expert
5. the nuance here is that teams regularly use two goalies, and they can be traded through a season
6. including the Mark Donks and Buzz Flibbets of the world
7. Wayne Gretzky’s insane count of assists to Jari Kurri
8. when compiling this number, I discovered that there has never been a skater or goalie with a last name starting with Q
9. think end of a sprint demos
10. the term for functions