IV. Case Studies – Refactoring at Scale

Part IV. Case Studies

Before I dive into our case studies, let me set the stage by telling you a little bit about Slack: the history of the product, the company, and its early influences.

Slack was developed as an internal tool at Tiny Speck, a small gaming company based in Vancouver. The team, a mash-up of engineers, designers, and product people from Flickr, sought to build a fantastical, massively multiplayer online game focused on community building. They called it Glitch.

Because everyone was distributed across North America, Tiny Speck began to rely heavily on Internet Relay Chat (IRC) to communicate. Before long, the team realized that it needed something a bit more powerful: a tool that let its members keep in touch asynchronously, search through message history, and send files. So they set out to build it.

The game ultimately shut down in 2012, and the company laid off most of its employees, but Tiny Speck had one final trick up its sleeve. In an unlikely pivot, the few remaining employees chose to commercialize their internal communications tool. They polished the experience and branded it Slack: a Searchable Log of All Conversation and Knowledge.

The Tiny Speck crew contacted friends and past colleagues to test out its new tool. With each new batch of users, the team collected feedback, fixed bugs, and built new functionality. By May 2013, the product was ready for a preview release, available to a select few who requested invitations. Just nine months later, Slack launched publicly.

Usage skyrocketed. Within a year, the tool went from having just under 15,000 daily active users to 500,000. By the time the product hit its two-year anniversary, more than 2.3 million users were using Slack every day. In late 2019, nearly six years from launch, that number exceeded 12 million, with more than 1 billion messages sent every week.

Many of Slack’s early technology and design decisions were informed by the founders’ experience building Flickr and Glitch. The choice of PHP and MySQL, for instance, was a logical one, given their experience building the photo-sharing website in 2004. In fact, much of Slack’s basic server functionality has its roots in Flamework, a PHP web-application framework born out of the processes and house style developed at Flickr; you can find it on GitHub. Much of the real-time messaging infrastructure was derived directly from Tiny Speck’s IRC-like internal tool.

In early 2016, Slack began to look at alternatives to the Zend Engine II interpreter for PHP. There were two main contenders: upgrade to PHP 7 and use Zend Engine III, or try Facebook’s HipHop Virtual Machine (HHVM). After some deliberation, leadership decided to roll out the HHVM runtime to Slack’s web servers. Once the rollout proved successful, the engineering team began to adopt the Hack programming language, a gradually typed dialect of PHP developed to run atop HHVM. At the time of publication, the portion of Slack’s codebase once written in PHP is now written in Hack.

Both of the case studies in this section focus on large refactoring efforts carried out on the portion of the codebase written in PHP and, later, Hack. To convey the nature of each problem as clearly as possible, the code samples in these chapters are in Hack. But don’t worry! While the snippets provide small, concrete examples of the problem we were tackling, they are not the focus of the story. Refactoring at scale is primarily about the process and the people involved rather than the code itself, and I hope that these case studies help illustrate exactly that. If you’re still concerned about being able to parse the code samples, let me reassure you that at the time, Hack code still looked quite a bit like PHP. For those who aren’t comfortable with either Hack or PHP, we’ll walk through each snippet in detail so that you can get your bearings.

I’d like to draw attention to one final observation before we move on. At the time of publication, Slack has been publicly available for only six years. The code, the product, and the company are all relatively young. The code has had to scale rapidly to handle increasing customer usage as well as a growing number of engineers developing the product. Many of the large refactoring efforts begun throughout the company over the years have been responses to hypergrowth, both external, driven by rapid adoption, and internal, driven by hiring.