9. Making the Refactor Stick – Refactoring at Scale

Chapter 9. Making the Refactor Stick

A little over a year ago, a friend of mine named Tim decided to stop consuming sugar altogether to help him shed a few pesky pounds and regain more energy. The first week was tough; he felt lethargic and craved anything sweet, but by the end of the third week, the sugar withdrawal had abated and he began to feel peppy again. Shortly afterward, the benefits of the new diet began to creep in: he felt more alert throughout the workday, and he lost a few pounds.

After that, sticking to the diet was his biggest challenge. Tim had seen his friends try and fail to stick to a diet, so he knew that he needed to set realistic expectations for himself. To eliminate the temptation, he banished any sweet food from his apartment. He kept a regular food journal to keep himself accountable, but allowed himself the occasional treat when meeting up with friends. Two months into his journey, his partner joined him on the sugar-free journey, and together they were able to better support and encourage one another. Today, Tim is in much better health and his energy levels are only rivaled by his puppy.

Refactoring is a bit like taking up a new diet and sticking with it. Although it might seem like the greatest challenge is figuring out the change to make and implementing it, equally significant effort is required to ensure that the change lasts. In this chapter, we’ll look at a variety of tools and practices we can adopt to ensure that the improvements we made with our at-scale refactor are as long-lasting as possible. We’ll examine how to encourage engineers across the organization to embrace the patterns established by the refactor and how to use continuous integration to continue to promote their adoption. We’ll talk about the importance of educating fellow engineers by doing a post-refactor roadshow. Finally, we’ll touch on how to integrate incremental improvement into the engineering culture so that, hopefully, fewer large, at-scale refactors are needed in your near future.

Fostering Adoption

Quite often, a large number of engineers will need to interact with your refactor. You need these engineers’ support for the refactor and the patterns it established for two reasons.

The first is to ensure that the changes it introduced persist long-term. Expansive refactors can be polarizing; frequently, within any company of more than just a few individuals, there are both avid supporters and opponents of the chosen design. If the opponents of the design refuse to adopt the new patterns when writing new code, they’ll find ways to work around them and generate new cruft at the boundary between the changes made by your team and their own code. Ultimately, this build-up could render nearly all the benefits of the refactor meaningless.

Even if you plan and execute a quality refactor, not everyone will understand or agree with your vision. For newcomers to the engineering team, the problems the refactor attempted to solve may not be abundantly clear. When fellow engineers do not have the necessary context to properly appreciate the outcome of a refactor, they may struggle when working at its perimeter. They risk incorrectly implementing the new patterns it introduces, or failing to use them at all in situations when the code would greatly benefit from them.

The second reason you need engineers’ support is to enable the further permeation of the patterns established by the refactor throughout the codebase. You not only want the changes you introduced to remain, you also want them to inform future decisions made by engineers working in the codebase for months, perhaps years to come. Consider a simple analogy: a refactor is just like weeding an overrun vegetable garden, turning over the soil, and planting a few scallions. Maintaining the scallions would be our first goal, and encouraging our family members to plant other vegetables of their own into the newly replenished soil would be our secondary goal.

For example, consider a team refactoring the primary logging library used throughout its extensive codebase. After more than a few mishaps with engineers accidentally leaking personally identifiable information (PII) into their data processing pipelines, the team rewrote the library’s primary interface to refuse arbitrary strings. If developers wanted to log a new field or create a new log type, they now had to register it in the logging library and then use it accordingly. Instead of replacing each individual callsite of the existing logging library, the team decided to scope down the refactor and simply modify the logic of the existing library to call into the new one.
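To make the idea of an interface that refuses arbitrary strings more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the EventLogger class, the LogField type, the register_event_type and log_event_type methods, and the checkout_completed event are names invented for illustration, not the API of any real logging library. The sketch assumes that each event type is registered up front along with the fields it may carry, and that anything unregistered is rejected.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class LogField:
    """A registered field: a name plus a validator for its values."""
    name: str
    validate: Callable[[Any], bool]


class EventLogger:
    """Hypothetical logger that accepts only registered event types and fields."""

    def __init__(self) -> None:
        self._event_types: Dict[str, Dict[str, LogField]] = {}
        self.records: list = []  # stand-in for the real log sink

    def register_event_type(self, event_type: str, *fields: LogField) -> None:
        self._event_types[event_type] = {f.name: f for f in fields}

    def log_event_type(self, event_type: str, **values: Any) -> None:
        if event_type not in self._event_types:
            raise ValueError(f"unregistered event type: {event_type!r}")
        fields = self._event_types[event_type]
        for name, value in values.items():
            if name not in fields:
                raise ValueError(f"unregistered field {name!r} for {event_type!r}")
            if not fields[name].validate(value):
                raise ValueError(f"invalid value for field {name!r}")
        self.records.append({"event": event_type, **values})


# Registering a new event type is a one-time, explicit step.
logger = EventLogger()
logger.register_event_type(
    "checkout_completed",
    LogField("item_count", lambda v: isinstance(v, int) and v >= 0),
    LogField("currency", lambda v: v in {"USD", "EUR", "CAD"}),
)
logger.log_event_type("checkout_completed", item_count=3, currency="USD")
```

With a design along these lines, an attempt to log an unregistered field such as an email address fails loudly at the callsite instead of quietly reaching the data processing pipeline.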

Some engineers at the company were reluctant to lose the flexibility that comes with being able to log arbitrary strings. Engineers coming from previous companies with more flexible logging might also be confused about why a new logging framework would purposefully introduce these limitations. Without properly communicating your motivations to these engineers, and working with them to address their frustrations, you risk them finding inventive ways of working around the safeguards built into the new logging library, thus increasing the risk that PII will be leaked into your data processing pipelines once more.

Even if engineers accept the changes brought about by the refactor, they may not be in favor of actively converting existing callsites to use the new library directly. They may also be apathetic about adding new log fields and types to the new library, choosing instead to use existing fields and types for a broader range of logs, thereby diminishing their specificity. By making it extremely easy to extend the logging library, and then teaching engineers how to do so, you’ll ease their transition and, hopefully, increase overall usage of the new library throughout the codebase.

While there are a number of ways we can encourage adoption of the refactor across our engineering organizations, the following methods are the ones that work best in my experience. The first is to build ergonomic interfaces for engineers to use when interacting with the newly refactored code. These interfaces should be defined early in the project’s execution and be further refined throughout development. You should be gathering feedback from both your teammates and trusted peers across the engineering organization on how the boundary between the refactor and the remainder of the codebase could be made more ergonomic. If you’ve wrapped up the refactor and haven’t sufficiently vetted your interfaces with their future users, set up a workshop with a few engineers from distinct product areas and work with them to iterate on the interfaces.

The methods we’ll look at more closely in this chapter are most effective post-refactor. These include teaching engineers about the refactor using the documentation you’ve crafted, and finally, carefully reinforcing usage of any new patterns introduced by the refactor to encourage continued adoption.

Education

There are two primary methods of educating others about your refactor. The first is active; this includes planning and leading workshops or similar training to engage actively with engineers. The second is passive; this includes step-by-step tutorials engineers can walk through on their own, or short online courses through your company’s learning platform.

Active Education

An active educational component is most important when the refactor affects a critical portion of the codebase that is used frequently by other engineers from a range of teams. Engineers who are accustomed to an existing set of patterns will need to familiarize themselves with a whole new way of doing things.

Workshops

One of the best ways to ensure that engineers can work effectively with the refactored code is to engage with them in a forum that requires them to work interactively through code samples and ask questions as they learn how to interface with the refactor. A significant advantage of holding workshops is that it encourages busy engineers to deliberately set aside time to get up to speed; some of us are involved in so many different tasks that we would otherwise never manage to prioritize informing ourselves about the refactor.

The best time to educate engineers actively about how to interface with the refactor is shortly after it’s been completed. You don’t want engineers coming in to learn new code and patterns when there’s a risk that it might still be in flux or it hasn’t yet been fully cleaned up and prepared for use by individuals who aren’t intimately familiar with the details of the refactor. Take the time to verify that everything is in order before scheduling your first workshop. Better yet, do a dry run of the workshop with your team to iron out any kinks before opening it up to your peers.

These sessions shouldn’t be held in perpetuity. Ideally, within a few months, the majority of the engineers most significantly affected by the refactor should be well acquainted with it. At that point, the refactored code becomes the new normal, and demand for help understanding it should dramatically decrease. Consider holding just two or three workshops, and keep an eye on the interest level and subsequent attendance. Live trainings, as engaging as they might be, are incredibly time-consuming for your team and should be held only a handful of times. If demand continues after more than just a handful of sessions, you may want to invest in improving your documentation and leaning on it more heavily.

In practice, because just about every engineer uses logging in their regular workflow, our previous example would be a perfect candidate for a training session. Here’s how it could be structured:

  1. Give a quick overview of the goals of the refactor. To communicate its impact effectively and excite your coworkers to take advantage of it, talk through the most compelling examples. With the logging library, for instance, you might show a few misleading log statements responsible for leaking PII over the past few months; then, demonstrate how to use the new logging library to prevent this information from being leaked altogether.

  2. Next, to cement these concepts, pair up the attendees and ask them to migrate the same simple log statement to use the new library. Answer any questions as they arise. There may be more than one solution here; if there is, have the pairs explain their distinct solutions.

  3. Finally, have the pairs choose a more complex log statement to migrate, ideally one that requires extending the log library (by either adding a new log type or field type). Check in with each group and answer any questions they might have.

Office hours

Office hours can be an equally helpful forum for actively educating your colleagues. They give engineers an open opportunity to drop by and ask you and your team questions about the refactor and its adoption in their specific use cases. Not everyone who will interact with your refactor will have time (or interest) to attend a workshop; having office hours when they can have your team’s undivided attention will make them more likely to have a positive experience adopting the changes implemented by the refactor. Furthermore, previous workshop attendees can drop by and get additional guidance if necessary.

One of the advantages of hosting office hours is that it enables your team to time-box the amount of time they spend answering questions pertaining to the refactor. Your team may start to get bombarded with requests from colleagues across the company as soon as the refactor wraps up. If you aren’t judicious with your time, these questions could easily monopolize your attention (not to mention disrupt your day with frequent context-switching). By diverting all nonurgent requests to your office hours, you are protecting your team’s time and focus.

Keep track of the questions and concerns your team addresses during these office hours and use these to write an FAQ. This document will help save your team valuable time repeatedly answering the same questions both during office hours and beyond.

Engineering gatherings

Many engineering groups host regular open forums (e.g., Thursday afternoon Drinks and Demos, or bi-weekly Lunch and Learn) where engineers can present about the work they’re spearheading. Large refactoring projects often come with a number of interesting stories: the mind-boggling, load-bearing bug the team uncovered, the terrifying encounter with code last modified 15 years ago, the deploy gone wrong. Most of us genuinely enjoy hearing one another’s stories about our experiences in the code we share, and we tend to vividly remember the particularly good ones.

Sign up to give a short talk to your peers about a compelling portion of the refactor to make them aware of the project and curious to learn more about its motivations and how they might benefit from it in their areas of the codebase. Sometimes, a little bit of great storytelling is all the publicity you need to garner the support of your fellow engineers.

Passive Education

In Chapter 7, we discussed the importance of documentation: not only the importance of producing thorough documentation throughout the refactoring process, but also the importance of choosing a medium and organization scheme that works well for your team. Once you’ve reached the final stages of the refactor, your team should prioritize crafting documentation describing the intent of the refactor and how it can benefit fellow engineers working within the same codebase. Per our discussion in Chapter 7, any documentation you or your team produces should be added to your source-of-truth directory.

This documentation can take a number of forms: it can be an FAQ, a short README providing a high-level summary of the project’s goals, or a tutorial. Having documentation you can point curious engineers to helps your team save time answering questions. As previously mentioned in “Office hours”, your team will likely need to answer a multitude of questions from peers throughout the company. Rather than answering everyone individually, your team can point them to prepared documentation.

If you intend to write a how-to guide on navigating the codebase post-refactor, I recommend writing it from a historical perspective; that is, ground it in the story of the refactor, starting from the very beginning and concluding with the current state of the world. By discussing the refactor from such a perspective, you can prevent your documentation from immediately becoming outdated. Whenever possible, add dates to give readers appropriate context (even something as broad as a year may suffice). Let’s illustrate this, using our logging example.

  1. Start by giving readers the insight that you and your team acquired by spending the time understanding why the code had degraded before you sought to improve it (see Chapter 2). In the case of our logging library, begin by giving an overview of the initial design and the decisions that informed that design. Talk about how the authors wanted the library to be lightweight and easy to use, and allow anyone to (carefully) log just about anything conveniently.

  2. Discuss how, as the product became more complex and more engineers joined the team, the risk of leaking PII increased. List recent, serious instances when leaks occurred, demonstrating a growing frequency in recent months.

  3. Describe your solution and how it inhibits PII from being leaked. Compare and contrast the same log statement, using both the previous and new logging libraries. Try to avoid using words like “now,” “currently,” or “today.” Although you may be outlining how the code presently functions from your perspective, there is a strong chance that the code will continue to evolve. By prefacing your explanations with something like “as of September 2020,” instead of “today,” you are future-proofing your documentation.

Reinforcement

Positive reinforcement is a powerful tool. Regardless of proximity to the project, developers across the company will need to be reminded of the patterns established by it (and probably more than once). Here, we have two broader options. You can employ many of the motivational tactics we described in “Motivating individuals” to recognize engineers who are doing a great job of adopting the patterns established in your refactor. Seeing your coworkers being publicly praised for their contributions can lead to a rapid increase in adoption by developers far and wide.

A second option is to automate reinforcement in the development process with continuous integration. With continuous integration, we can kick off a number of processes when an author pushes a new commit, indicates that their code is ready for review, or prepares to merge their changes with the main development branch. A typical setup will verify changes by running a series of tests alongside lints and code analysis tools. We’ll look at both linting and code analyzers and then consider the ways in which you can configure these tools to effectively free your team from needing to actively encourage and monitor adoption.

Progressive Linting

Progressive linting allows you to improve a codebase gradually by only enforcing rules on newly written or modified code. This enables developers to address problems slowly as they arise rather than requiring one or two engineers to patch every instance where the rule would be violated. If your team is replacing one pattern with another, writing a new (progressive) linter rule is an easy way to nudge developers to use the newer pattern and prevent propagation of the deprecated pattern.

For example, as part of the logging library refactor, your team wants to eradicate references to logEvent, which allows for arbitrary strings to be ingested, in favor of logEventType, which only logs specific, non-PII pieces of data. Your team could write a new linter rule that bans any new usage of logEvent, with an error message informing engineers that the function is deprecated and encouraging them to use logEventType instead.

Some engineers are very sensitive about encountering unexpected linter failures. Be certain to adequately communicate the goal of the new linter rule and when it will come into effect so that no one is surprised. Add as much context to the error message as possible so that engineers hitting the error don’t need to pull up any additional documentation to fix it.
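As a rough illustration of such a rule, the sketch below scans source text for calls to logEvent and reports each one with a self-explanatory message. It is deliberately simplistic and makes several assumptions: it uses a regular expression rather than a real parser (so it will also flag calls mentioned in comments or strings), and the Violation type, the check_source function, and the message wording are all hypothetical. A production rule would normally be written as a plugin for whichever linter your language ecosystem provides.

```python
import re
from typing import List, NamedTuple

# Matches calls such as logEvent(...) or analytics.logEvent(...),
# but not logEventType(...).
DEPRECATED_CALL = re.compile(r"\blogEvent\s*\(")

# The message carries enough context to fix the problem without
# consulting any other documentation.
MESSAGE = (
    "logEvent is deprecated because it accepts arbitrary strings, which can "
    "leak PII. Use logEventType with a registered event type instead."
)


class Violation(NamedTuple):
    path: str
    line: int
    message: str


def check_source(path: str, source: str) -> List[Violation]:
    """Return one violation for every line that calls logEvent."""
    violations = []
    for number, text in enumerate(source.splitlines(), start=1):
        if DEPRECATED_CALL.search(text):
            violations.append(Violation(path, number, MESSAGE))
    return violations
```

Calling check_source with a file’s path and contents yields the line numbers to report; the integration job can then fail, or merely warn, whenever the returned list is not empty.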

Note

Not all languages have extensible linters that allow for developers to write custom rules, and even fewer have progressive linting capabilities built in. Some engineering teams invest in building these tools internally (and, in some cases, later open-source their solutions). If you are using an extensible linter, and are able to write custom rules, a quick way to introduce progressive linting is by running the linter either only on modified files in a given commit or only on the code difference itself.
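The second approach, linting only the code difference, can be sketched as follows. The example assumes the change is available as a unified diff (the format produced by tools such as git diff) and reuses the logEvent rule from our logging example; the added_lines and lint_diff function names are hypothetical. Only lines introduced by the change are checked, so existing usages of the deprecated function are left alone until someone touches them.

```python
import re
from typing import Dict, List, Optional

# Hunk headers look like "@@ -10,3 +10,4 @@"; the counts default to 1 if omitted.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")
DEPRECATED_CALL = re.compile(r"\blogEvent\s*\(")


def added_lines(diff_text: str) -> Dict[str, Dict[int, str]]:
    """Map each changed file to its added lines, keyed by new line number."""
    added: Dict[str, Dict[int, str]] = {}
    path: Optional[str] = None
    new_line = old_left = new_left = 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:  # still inside a hunk body
            if line.startswith("+"):
                if path is not None:
                    added.setdefault(path, {})[new_line] = line[1:]
                new_line += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif not line.startswith("\\"):  # unchanged context line
                new_line += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            target = line[4:].split("\t")[0]
            if target == "/dev/null":  # the file was deleted
                path = None
            else:
                path = target[2:] if target.startswith("b/") else target
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_line = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return added


def lint_diff(diff_text: str) -> List[str]:
    """Report deprecated calls that appear on added lines only."""
    problems = []
    for path, lines in added_lines(diff_text).items():
        for number, text in sorted(lines.items()):
            if DEPRECATED_CALL.search(text):
                problems.append(
                    f"{path}:{number}: logEvent is deprecated; use logEventType"
                )
    return problems
```

Restricting the check to added lines is what makes the rule progressive: a file containing dozens of legacy logEvent calls produces no failures unless the change under review introduces another one.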

Code Analysis Tools

Many of the metrics covered in Chapter 3 can be monitored over time, using out-of-the-box code analysis tools triggered at integration time. There is a wide range of both free, open-source solutions and paid ones that will automatically calculate code complexity at different scales (individual functions, classes, files, etc.) and generate test coverage statistics. Many of these solutions are easily extensible so that your team can develop and hook in its own metrics calculations and assert new rules as time goes on.

For example, say your team wants to ensure that no function in the codebase exceeds 500 lines. Your team could configure your chosen code analysis tool to warn or throw an error whenever a change causes a function to cross that threshold. If an engineer comes along and adds a few lines to an existing function, increasing its line count from 490 to 512, they’d be nudged to split up the function into smaller subfunctions before merging their changes.
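Most code analysis tools offer a function-length rule out of the box, but the underlying check is simple enough to sketch by hand. The example below assumes the code under analysis is Python and uses the standard library’s ast module (the end_lineno attribute requires Python 3.8 or later). The function_lengths and newly_over_limit names are hypothetical, and functions are matched by name alone, which is a simplification: two methods with the same name in different classes would be conflated.

```python
import ast
from typing import Dict, List

MAX_FUNCTION_LINES = 500


def function_lengths(source: str) -> Dict[str, int]:
    """Return the line count of every function, from its def line to its last line."""
    lengths = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lengths[node.name] = node.end_lineno - node.lineno + 1
    return lengths


def newly_over_limit(
    before: str, after: str, limit: int = MAX_FUNCTION_LINES
) -> List[str]:
    """Name the functions that a change pushed past the limit.

    Functions that already exceeded the limit before the change are not
    reported, so the check only fires when a change crosses the threshold.
    """
    old = function_lengths(before)
    new = function_lengths(after)
    return sorted(
        name
        for name, length in new.items()
        if length > limit and old.get(name, 0) <= limit
    )
```

Given the contents of a file before and after a change, newly_over_limit would report the function that grew from 490 to 512 lines, while staying silent about any function that was already too long.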

Gates Versus Guardrails

Each verification step configured in our integration flow can either be a gate, preventing the changes from continuing to move forward, or a guardrail, producing a warning for the code author to consider before proceeding.

Too many gates can be detrimental to an engineering organization: they slow down development and can frustrate engineers (especially if they are unexpected). Say your organization has configured 10 blocking test suites. When a developer is ready to put their code up for review, the test suites kick off in parallel. Unfortunately, about half of these suites take just over 10 minutes to run, and a few of them regularly produce flaky results. Engineers are spending valuable time waiting for their code to clear each of these 10 gates.

Now suppose that instead of setting up gates, the organization institutes guardrails; that is, instead of having each of these test suites block progress, the team decides which two or three are truly business-critical premerge, and labels the others as optional. Engineers are now responsible for determining which suites they believe to be most important to their changes, and if the results are flaky, they can choose to ignore them. Of course, opting for more guardrails comes with its own risks, and perhaps more bugs may make it out into production, but by and large, I’m of the opinion that we should be trusting our fellow engineers more.
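The distinction can be expressed in a few lines of code. The following sketch is not modeled on any particular continuous integration system; the Mode, Check, and evaluate names are hypothetical. It assumes each verification step is a callable that returns True on success, and that every step is labeled as either a gate or a guardrail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Mode(Enum):
    GATE = "gate"            # a failure blocks the merge
    GUARDRAIL = "guardrail"  # a failure only produces a warning


@dataclass
class Check:
    name: str
    mode: Mode
    run: Callable[[], bool]  # returns True when the check passes


def evaluate(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check; only failing gates prevent the change from merging."""
    can_merge = True
    messages = []
    for check in checks:
        if check.run():
            continue
        if check.mode is Mode.GATE:
            can_merge = False
            messages.append(f"BLOCKED by {check.name}")
        else:
            messages.append(f"WARNING from {check.name}")
    return can_merge, messages
```

Turning a gate into a guardrail then becomes a one-word configuration change, which makes it cheap to revisit the decision as a test suite becomes more (or less) reliable.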

Integrating Improvement into the Culture

There will always be a need for large-scale refactors, as long as none of us can predict how shifts in technologies or requirements will continue to affect our systems. However, I do believe that some large-scale refactors are avoidable, and that we should do our best to prevent them when possible. As we conclude this chapter, I want to leave you with some thoughts on how to build a culture of continuous improvement. By perpetually pinpointing and taking advantage of opportunities for tangibly improving our code, we can hopefully ward off ambitious, disruptive refactors for a while longer.

First and foremost, one of the best ways to maintain a healthy codebase is simply to continue deliberately refactoring small, well-contained portions of code as you encounter the opportunity. We do not want to become drive-by refactorers (see “Because You Happened to Be Passing By”), but instead focus on incrementally improving areas of the codebase owned and maintained by our own team. There are always plenty of opportunities for us to tidy up in our own neighborhood. When we encounter an opportunity for another team to improve their code, we can reach out, leaning toward asking questions to understand their problems better, rather than immediately proposing a solution. Work together to craft a cleaner implementation.

We should encourage and facilitate design conversations on our team frequently, seeking others’ feedback early rather than forging ahead on our own. Code reviews are not only an opportunity for someone to double-check our work, but also a chance for an open discussion about how we can make our solution just that much better. As code authors, we should consider annotating our code reviews with specific questions for our reviewers. As reviewers, we should be just as analytical when reviewing our peers’ code as we are when we are writing code ourselves.

Finally, hold inclusive design reviews early in the feature development process. This means inviting engineers from all backgrounds to evaluate your designs and ask questions. Your reviewers should span all experience and seniority levels; they should include individuals from a wide range of backgrounds. The more diverse perspectives you are able to gather, the more likely you’ll be able to spot fatal flaws early and, ultimately, the more likely you’ll be able to architect a far superior solution.

Whenever you next sit down to work, think critically about how what you do today might or might not lead to a large-scale refactor later. Sometimes, all we need is a little reminder of the potential long-term consequences of our decisions to steer us back in the right direction.