Process kick - Amy Phillips

Amy on Twitter @ItJustBroke - Amy’s blog testingthemind.wordpress.com

Songkick - Songkick developer blog devblog.songkick.com

Timeline: February 2011 - August 2014

Songkick was founded in 2007 to help music fans find out when their favourite artists are coming to town and get tickets. With over 10 million unique users per month, Songkick is currently the second most used live music service in the world, after Ticketmaster.

Songkick currently employs 30 staff in London and Portland, Oregon. It exists on the web for fans (Songkick.com) and for artists (Tourbox.songkick.com), on mobile (iOS, Android), and as a Facebook app. The Songkick API is used to integrate concert listings into numerous third-party applications including Spotify, YouTube, and Foursquare.

The decision to change your release process should never be taken lightly. Releases are risky and require co-ordination throughout the team, and any successful process change requires a change in culture. But attempting to change either process or team culture is a huge undertaking, and creating a pipeline to automatically release code to Production sounds like a very challenging task.

Back in 2011 Songkick made the decision to take on this challenge.

Needing a change

At the time we were a small development team with big ideas. Our team consisted of 7 developers, 1 frontend designer, 1 User Experience (UX) designer, and 1 Product Manager (PM), plus myself as the sole tester. We were working to create and maintain the Songkick.com website, which provides live music alerts to fans worldwide, and the Tourbox website, which allows artists to manage their tour dates and display them on third-party sites such as YouTube and Spotify. We also had an iOS app, an Android app, and a Facebook app.

At Songkick we have always believed that new features should be shipped to the user as soon as they are ready. A new feature only starts providing benefit when it is being used by real users, and we’re a startup; trying to find market fit involves quickly validating ideas. We want to understand who our users are and what problems they want solving. No matter how closely the Songkick staff represent real users - and they are definitely all real users - nothing compares to seeing a new feature being used across the world by millions of fans and artists.

At the time our process involved the PM defining a new feature, such as a new page footer to encourage signups or a feature allowing users to import their listening history from Spotify. The Design and UX team would prototype some options and create visual designs. A developer would implement the feature and then we would do some testing. Once the feature was live we could monitor usage and identify feature improvements.

This approach succeeded for some time. The team worked well together, and by involving all the different roles we were able to design, implement, test, and ship new features within several days. However, the problems started as Songkick scaled up. With more developers we were able to build more features, and slowly we realised that our release process was a serious bottleneck.

Each release required both regression testing (particularly around frontend code) and feature testing to ensure PM and Design team expectations were met. We ran test suites of unit and integration tests over multiple EC2 instances to keep our total build time under an hour, the downside being flaky builds which often required re-runs. Once a build was on our Staging environment, it was accepted that we would need at least one bug fix, and that bug fix would necessitate a re-build and more testing to look for new problems caused by the fix.

Like many development teams, we did not explicitly design our build and release process. We had a Git repository and Jenkins managing Continuous Integration but mostly we responded to problems:

  • When a release accidentally went out without the final copy we decided all releases needed a copy sign-off.
  • When a release didn’t meet design expectations we decided all releases needed a design sign-off.
  • Problematic releases which contained multiple features caused extra stress - which change had caused the problem? Were other features in the release delayed because of this one thing? Often the cause of delay would be a trivial bug fix implemented as a personal project, which was particularly frustrating and difficult to predict for the non-technical team. Eventually, we decided that each release should only contain code from a single story.

The Songkick release process, including average build and test durations

On and on it went until we reached the point where each release was pretty predictable but we were severely limiting our freedom to make a release. Queuing up releases causes obvious delays in getting code out to users but it also impacts the development team’s ability to improve code. At the same time as our releases dropped off, with even trivial features taking 1-2 weeks to release due to queueing, we were hiring to expand our development team. It was incredibly frustrating to have to admit that we had reached our release limit - adding new developers was not going to help us move faster, because we couldn’t get completed features out to users.

Something had to change.

Up to that point we had released features as soon as we could. We knew that we needed to get features out to users as soon as possible but we delayed ourselves by aiming for perfection. Infrequent releasing encourages this behaviour by making it difficult to predict when the next release, and potential bug fix, will go out.

We needed to decide between scheduled releases, which would give us control and predictability, and a continuous release approach with less control. Scheduled releases sound simpler; everyone knows exactly when the release is expected and there is a clear shift in focus as you move from feature development into code-freeze and testing. However, it was clear that regression testing was already a bottleneck. We knew that scheduled releases would tie up all testers until the release date, while the developers would likely be starting to code new features ready for the next release. It felt like we would be creating an unhealthy divide between the developers and testers.

Trusting that ‘bringing the pain forwards’ would teach us how to release code painlessly, and hoping that being able to release when we needed to would help us overcome our fixation with perfection, we decided that a careful adoption of Continuous Deployment could overcome most of our difficulties. Our goal was to unblock our release pipeline in the simplest possible way.

Making the change

After several months of researching release approaches, and talking to other companies who had made similar shifts to Continuous Deployment, we were able to begin our transformation. Most of the team were on board to try the change; we had reached a stage where releases were so difficult and so infrequent that it was clear we needed to do something. Almost everyone was fully committed to taking the leap to Continuous Deployment rather than just CI. We knew this would require additional monitoring and make risk harder to manage, so we decided to focus on small steps towards Continuous Deployment.

First up was removing the integration queue, and for that we had to solve our unreliable builds. When the development team was small, only one developer was integrating code at any one time, and flaky builds were easy to spot. However, as the team grew, broken builds became a serious problem, with developers checking in code on top of broken builds and unrelated incomplete features causing regression issues in one another. To encourage a cultural shift we used annoying (but effective) sound effects to make it known to the whole office when a build was broken. From there we were able to analyse breakages and make changes to improve stability.
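A broken-build alarm can be as simple as a script polling the CI server. The sketch below assumes Jenkins' standard JSON API; the server URL, job name, and sound file are placeholders rather than Songkick's actual setup.

```ruby
#!/usr/bin/env ruby
# Minimal sketch: poll Jenkins and play a sound when the last build is not green.
require 'net/http'
require 'json'

JENKINS_URL = 'http://jenkins.example.com'   # placeholder CI server
JOB_NAME    = 'songkick-web'                 # hypothetical job name
SOUND_FILE  = 'klaxon.wav'                   # any suitably annoying sound

def last_build_result
  uri  = URI("#{JENKINS_URL}/job/#{JOB_NAME}/lastCompletedBuild/api/json")
  json = JSON.parse(Net::HTTP.get(uri))
  json['result'] # "SUCCESS", "FAILURE", "UNSTABLE", ...
end

loop do
  if last_build_result != 'SUCCESS'
    # Make the breakage audible to the whole office (afplay is macOS; use aplay on Linux).
    system('afplay', SOUND_FILE)
  end
  sleep 60
end
```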

Seeing progress

After a few weeks we were in a great position - we had moved the bottleneck, a strong indicator that you have succeeded in making changes. Instead of having code queued up waiting to be committed we now had too many builds ready for testing. Each release into our Staging environment still required design and copy sign-off as well as feature testing and regression testing. New features were still suffering from inevitable bugs and tweaks which blocked releases and wasted time by duplicating regression testing. Testing took so much longer than our build process that we had multiple green builds sitting on the shelf waiting for testing.

We decided to try and shift as much testing as possible to the front of the pipeline. By bringing the developer environments in line with the Staging environment we were able to perform the majority of feature testing upfront. Moving design and copy sign-off to before code commit meant that stakeholders did not need to be present for a release.

Focusing on building small pieces of functionality, either as iterative development or as simplified features, taught us the strength of treating feature launches as separate from code releases. Feature Flippers gave us the tools we needed to make committed code safe to go into Production. If delays were to happen, either because of quality issues or availability problems, we were only delaying a single developer and a single feature rather than blocking the entire company’s release pipeline.
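The chapter does not describe Songkick's flipper implementation, so the sketch below is only a minimal illustration of the idea: a named flag (the :spotify_import flag and the user.staff? check are invented for the example) decides at runtime whether committed code is exposed to a given user.

```ruby
# Minimal feature flipper sketch: code ships to Production dark, the flag turns it on.
module FeatureFlipper
  # Flags could live in a database or config file; a constant keeps the sketch self-contained.
  FLAGS = {
    spotify_import: { enabled: false, staff_only: true }
  }

  def self.enabled?(flag, user)
    config = FLAGS.fetch(flag, { enabled: false })
    return true if config[:staff_only] && user.staff?  # staff see the feature early
    config[:enabled]
  end
end

# In a controller or view, the unfinished feature is only rendered when the
# flipper is on for that user:
#
#   if FeatureFlipper.enabled?(:spotify_import, current_user)
#     render 'spotify_import'
#   end
```

Because the flag, not the deploy, controls the launch, an unfinished feature can sit safely in Production until design, copy, and testing are all happy with it.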

Seeing the benefits of this process change motivated the developers to make improvements to our automated test suites. Previously we were using Cucumber and had an extensive set of high-level tests. Touching multiple layers of the stack, and in particular the database, caused the tests to be slow. Once we realised how difficult it was for developers to run these slow automated test suites locally, we prioritised the work to speed the tests up.

From talking to other teams who were successfully using Continuous Deployment we knew that our test coverage was above average. Just as with our features, we were trying to cover every eventuality and falling into the trap of trying to test everything. Despite this we still relied on manual regression testing, and still missed bugs which could have been detected with automated checks.

Analysing exactly what we were checking, and comparing this against the bugs we found in Staging and, more importantly, the bugs which users contacted us about, allowed us to understand the gaps in our coverage. By considering the risk of different parts of the website, looking at the impact on the end-user, and using this to focus testing on the high and medium risk areas, we were able to simplify and speed up our test suites. Many of the checks were rewritten as lower-level unit tests, which meant a focus upon small chunks of code rather than entire user journeys. Our slower-running Cucumber tests became focused on journeys we, and our users, actually cared about. Finally we seemed to have reached a place where Continuous Deployment might be a possibility.
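To illustrate what pushing a check down the stack can look like, the sketch below replaces a hypothetical browser-driven check of event page wording with a plain RSpec unit test; the EventPresenter class and its interface are invented for the example, not taken from Songkick's codebase.

```ruby
# Minimal sketch: the kind of view logic that used to be exercised through a
# full Cucumber journey, now checked directly with a fast unit test.
require 'rspec/autorun'
require 'date'

class EventPresenter
  def initialize(artist:, venue:, date:)
    @artist, @venue, @date = artist, venue, date
  end

  def headline
    "#{@artist} at #{@venue}, #{@date.strftime('%-d %B %Y')}"
  end
end

RSpec.describe EventPresenter do
  it 'formats the event headline without touching the database or a browser' do
    presenter = EventPresenter.new(artist: 'Arcade Fire',
                                   venue: 'Brixton Academy',
                                   date: Date.new(2011, 2, 14))
    expect(presenter.headline).to eq('Arcade Fire at Brixton Academy, 14 February 2011')
  end
end
```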

Dealing with uncertainty

The non-technical members of the Songkick team were understandably concerned about Continuous Deployment. We were used to working in a way which allowed for plenty of manual checking. Our designers would craft beautiful frontend designs, we would try to give users the best possible user journey, and each word of our copy was precisely chosen. Now it seemed we were proposing a radical approach of simply writing code and pushing it live. One way we addressed these concerns was with a test suite of business-defined automated user journeys, and we agreed we would never release if any of these journeys was broken. To make the checks as realistic as possible we used Selenium and Capybara to execute them, via a web browser, against our Staging environment. This gave us enough confidence in a build to consider releasing to Production without extensive manual testing. Suddenly, after months of improvements, we had reached Continuous Deployment.
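To give a feel for what one of these business-defined journey checks might look like, here is a minimal Capybara/Selenium sketch; the Staging URL, field labels, and the journey itself are placeholders rather than Songkick's real suite.

```ruby
# Minimal sketch of a journey check run through a real browser against Staging.
require 'capybara/rspec'
require 'rspec/autorun'

Capybara.default_driver = :selenium
Capybara.run_server     = false
Capybara.app_host       = 'https://staging.songkick.example' # placeholder Staging URL

RSpec.describe 'Fan tracks an artist', type: :feature do
  it 'lets a fan search for an artist and track them' do
    visit '/'
    fill_in 'Search for artists', with: 'Radiohead'   # placeholder field label
    click_button 'Search'
    click_link 'Radiohead'
    click_button 'Track artist'                        # placeholder button label
    expect(page).to have_content("You're tracking Radiohead")
  end
end
```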

Releasing with Continuous Deployment

One of the challenges of Continuous Deployment is handling the cases where you need more testing than is feasible on a developer environment. Often this is caused by environmental limitations, particularly for performance or security testing. To avoid stopping our release pipeline, we tested high-risk changes in Production. Feature flippers mean we can run the code on live servers, using live data, but test it as if we were working on a test environment.

Keeping the peace

One benefit of having a fast release process is the ease with which issues can be fixed in Production. To help us detect issues as quickly as possible we significantly increased our Production monitoring, including adding alerts which would fire if the number of errors on the Production servers increased significantly.
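The chapter does not detail the monitoring stack, so the sketch below only illustrates the shape of such an alert: compare a recent error count against a longer-running baseline and shout when the ratio gets too large. The error_count helper is a stub standing in for whatever log or metrics query is actually used.

```ruby
# Minimal sketch of an "errors increased significantly" check.
ALERT_MULTIPLIER = 3.0   # alert when errors run at 3x the recent baseline
MINIMUM_ERRORS   = 10    # ignore tiny absolute numbers

# Stub: in reality this would query the Production error logs or a metrics store.
def error_count(window_minutes)
  window_minutes == 5 ? 42 : 1440
end

def error_spike?
  current     = error_count(5)                   # errors in the last five minutes
  daily_total = error_count(24 * 60)             # errors over the last day
  baseline    = daily_total / (24 * 60 / 5.0)    # average errors per five-minute slot
  current > [baseline * ALERT_MULTIPLIER, MINIMUM_ERRORS].max
end

puts 'Alert the team: Production errors have spiked' if error_spike?
```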

Each day the development team receives an email summarising all Production errors from the previous day; our goal is to reach an empty email.

Using methods like this to visualise blockers or smells is an effective way to keep each other accountable. Together we share the goal of investigating and fixing issues.

Did it work?

Our original goal was to unblock our release cycle. After several months of dedicated work to improve our code, tests, environments, and culture we reached a point where developers no longer faced commit or release delays. We had advanced to a place where we could release code even if a key person was unavailable at the time of the release, we knew how to recognise and respond to issues, and our process openly acknowledged and considered the level of risk.

At Songkick a quality feature is the right feature. We need to understand who our users are and we need to build the right feature for them. The speed and ease of Continuous Deployment provided the unexpected benefit of allowing us to embrace experimentation. We can quickly build a ‘good enough’ version of a feature, release it, and then within a day or two we have enough data to decide if the feature should be implemented for real. Every experiment must have a defined end date; nothing can live on indefinitely until we see the results we would like to see, and the experimental code must be deleted and, if the feature is kept, re-implemented as live, maintainable code. This clear distinction helps us define the amount of design and testing these experiments need.

At the beginning of 2011 we were in a difficult place, struggling with an unwieldy process and codebase. By reviewing what we were doing, and really considering why, we were able to make small incremental steps towards a better place. Keeping our end goal in mind, no matter how ambitious it seemed at the time, was the key to working through difficulties while guaranteeing that we arrived at the right destination. Most significantly of all, we learnt how to identify and implement improvements in our process and our products.

About the contributor