DevOps in an Enterprise environment - Jan-Joost Bouwman
Jan-Joost on Twitter: @JanJoostBouwman
Timeline: early 2012 to July 2014
This is the story of the journey of ING Bank Netherlands from a process-oriented organisation to a content-driven DevOps organisation.
Something about ING
ING is a global financial institution with its roots and headquarters in the Netherlands. ING, consisting of ING Bank and NN Group, offers banking, investment, life insurance and retirement services. On the banking side, some 63,000 employees serve over 32 million private, corporate and institutional customers in over 40 countries in Europe, the Americas, Asia and Australia. This story is about the IT department of the Dutch Retail Bank and its journey towards DevOps: a department of over 1800 people working in both development and application service management, currently with 180 DevOps teams working on some 2000 application services in several hundred value chains of varying complexity.
Reasons leading to the transition
The reasons that led to the transition from a process-oriented to a content-driven DevOps organisation were those of most established banks in mature markets, especially those that grew through mergers and takeovers at a time when the market was still expanding. In an expanding market with good profit margins there was never any urge to consolidate brands, departments and IT applications. ING (Internationale Nederlanden Groep) was no exception, being the result of a merger between the privately owned NMB Bank Nederland and the state bank Postbank, and later the insurance company Nationale Nederlanden. The first merger was in 1989 and the merger with NN in 1992, but for years the two banking brands, each with its own IT landscape, existed side by side with little interaction between them. When it became clear that the market was levelling off, it was time to look at costs. Both the cost/income ratio and maintenance cost as a percentage of total cost were on the high side. One of the first things to do was to finally merge the brands, not just on the high street but, more importantly, in the data center. A ‘target end landscape’ of applications was chosen, and a massive data migration was started to merge the two datasets. This migration was done per client segment, because different segments use different applications. The final segment, business clients, was migrated in Q3 2014.
Although this data migration programme will greatly reduce cost in the future, in the short term it did not. So alongside it, a decommissioning programme was started in 2010. This programme looks at all applications to see which are no longer used and where functionality overlaps. Applications whose data had been migrated to the equivalent application on the other side of the bank were the obvious candidates for decommissioning, but it also turned out that there were a lot of applications with just a few users, who only used them as a backup for another system. So far 738 applications have been decommissioned, and the target is a total reduction of 50% of the current number of applications by the time the programme concludes in 2016.
These two programmes helped reduce the ratio of maintenance to new functionality by shrinking our IT footprint, but they did not yet solve the problem that any new functionality took months to reach production. For a typical application, the time between the initiation of a project as an idea and its release into production was at least a year. Furthermore, when new functionality was finally released into production it often had quality issues or did not deliver the value the business had hoped for. The complexity of our IT landscape played a major part in this, but our Waterfall way of developing also contributed.
To sum it up: we were facing three challenges:
- the need to reduce cost
- the need to reduce time to market
- the need to improve our quality
We soon learned we were not unique in this.
The process organisation
To fully understand where we were coming from, I need to explain a little about the ‘process organisation’ we used to be. For Service Management, the process organisation really took off when we introduced a new Service Management tool in 2007, including processes based on ITIL V2. It took a long time to implement, and initially it wasn’t hugely successful. But what it did do was replace a number of different tools and process implementations with a single tool and a single process for both the Netherlands and Belgium. For ING at the time this was massive: we never did anything together, and most departments made their own decisions on tooling. There was a Tooling and Methods department, but it wasn’t hard to ignore it.
Then we got a new IT Governance structure. Processes were to be governed by no fewer than four layers. At the top was the Service Management Council, with senior management (domain managers) from all domains involved. Below it were the Process Councils, with representatives from mid-level management (department managers) together with a process expert; there were councils for Incident, Problem, Change and Configuration Management. The third layer was at domain level, where the same mid-level manager chaired a meeting with all process managers from the departments within that domain. At the lowest layer, each process manager held operational meetings with the process coordinators in the teams of his department.
For Change Management at the operational level this meant a departmental CAB meeting for every Service Management department, formally chaired by the Department Manager (or delegated to a team manager or the Change Manager), and a joint CAB meeting for all of DB/CIO chaired by the Head of Service Management, with representatives from every department. In this Change Control Board all high-risk or high-impact changes needed to be approved before deployment. In addition, there were separate process improvement meetings.
An attempt was made to implement the same governance structure on the development side based on the CMMi framework, but because of the complex implementation that was chosen, with a huge number of different processes (not unlike ITIL V3!), it never achieved the same support or adoption rate as the Service Management structure.
When we started preparing for the migration I was a Change Manager in a department of three teams, in a domain of three departments (Channels). By the time we were really up and running we had been reorganised into five of the six (service management) departments of what was going to be IT of Domestic Bank Retail NL (the sixth joined us in mid-2012). By then I was the lead process manager representing the department in the process council, first as the expert, later as the sole representative of my department. One of the first tasks we undertook was to write proper process documentation in the form of a Standard, Rules and Guidelines, specifying what was compulsory, what was good practice and how we wanted to work together. A large part of that document was copied from the way my own team had been working previously. For the first time we had a single description of the process, roles and responsibilities, instead of 10 different ones. The next step was to consolidate the non-functional requirements, first within my own division and later using that standard as a blueprint for the rest of ING.
At the same time, senior management had asked the Change Management community to think about ways to improve collaboration between development and operations. They were also concerned about the number of waivers requested to deviate from the standards. Although a waiver usually meant quicker delivery into production for our customers, it also meant a potential increase in operations cost in the future. This proved to be the momentum we had been looking for to push our non-functional requirements out into the rest of the organisation as Generic Acceptance Criteria. In designing this so-called Tollgates process I collaborated with my CMMi colleagues to avoid checking the same thing twice: once by the project and once by the accepting party on the service management side. Looking back, this was the first step towards Lean/DevOps: we eliminated double checks, and instead each side had its own requirements that it was responsible for. And we were forced to trust each other’s professionalism in delivering the goods!
Starting the journey
Lean and Agile principles get introduced into the game
In 2010 the first development teams started experimenting with Agile/Scrum. That department had a history of using XP (eXtreme Programming) to improve output, and Agile/Scrum seemed the logical next step. The first steps were hard and didn’t seem to deliver a lot, but after a few sprints velocity increased, and other teams saw how happy the people in the Scrum pilot teams were. So it started spreading. By 2012 Agile/Scrum was the official way of working for the development teams, and it became increasingly clear that management was aiming for a merger between the development and the service management teams. Some teams had already started with Lean service management as early as 2009. By that time the operations departments on the business side had quite a bit of experience with Lean, and some of the Black Belts who had been improving the flow of business processes (like ‘opening a new account’) started analysing the flow within the service management departments. Senior management started talking a lot about Agile and Lean, and the first copies of the Continuous Delivery book even started showing up. All managers were sent to a Lean boot camp and were made responsible for implementing Lean Service Management in their own department. The service teams started using Kanban boards to track the progress of their incidents, problems and changes.
Moving from process to content
Around the same time Peter, the manager of the Service Management department, started pushing people to read Continuous Delivery by Humble and Farley. In our Change Control Board I teamed up with him to get people started on deployment scripts, monitoring and automated testing. I had worked closely with him on the Generic Acceptance Criteria before, and we knew we made a good team: Peter as the visionary pushing for change, me as the process man dotting the i’s and making sure everybody followed through on their promises. At the same time he started challenging people to base their change approval decisions on the content of the changes rather than on whether the process steps were followed. For many of my fellow Change Managers this was new, because they had been trained mostly to check whether people followed the process. Not everybody knew a lot about what their department was actually changing. The same went for the process managers of the other processes: many of the Incident Managers were very good at managing the expiration dates of the incidents in their queue but wouldn’t have been able to solve an incident themselves.
In retrospect the next step was obvious, but for me and many of my process management colleagues it still came as a bit of an unpleasant surprise: by September 2011 it was announced that we were getting rid of the process management positions. All employees should be able to work with the ITIL processes by now and needed to take responsibility for their actions; there was no more need for full-time process managers. As it turned out, management was right. An analysis I did of the incident and change records for the transition period from 2012 up to June 2014 showed no significant increase in the number of incidents. Of course the number of incidents on its own isn’t proof enough, but combined with the parallel increase in the number of changes it at least suggests that we have become more successful at making changes. My work as lead Change Manager must have made an impression on senior management, because I was promoted to Process Owner. Now I really was responsible for a smoothly running Change process within DB/CIO, without the day-to-day operational responsibility for Change Management in a department. Moving towards content over process in Service Management as well was a logical step in light of the adoption of Agile/Scrum by development. For some time before the process managers left, senior management kept asking us whether a decision was based on content or on process; whenever the answer was ‘process’, they would emphasise the need to look at the content.
Introduction of DevOps
Still, these steps didn’t solve our original problem completely: we still needed to improve quality, reduce cost and improve our time to market. The next step came in Q3 2012, when it was announced that from May 2013 we would transition from two organisations, Agile/Scrum development plus Lean Service Management, to a virtual DevOps organisation based on Agile/Scrum methodologies, leading up to the ideal of Continuous Delivery using a host of state-of-the-art (preferably Open Source) tools. Continuous Delivery was impossible in the existing organisation because the Service teams could not cope with a faster pace. Development was Agile, but after a number of sprints the result was handed over to a service team for integration, testing and delivery into Production. In effect we were doing a very classic ‘scrum-fall’.
No more! The DevOps teams were going to be responsible as a team for both new functionality and the stability of production; every sprint should lead to shippable (i.e. production-deployable) software. Whether it would actually be deployed was up to the Product Owner, as the representative of the Business. A team would consist of on average 6 DevEngineers and 2-3 OpsEngineers. Because most value chains run across more than one team, a coordinating role of Integrator was also created, together with a Blue Print Expert to provide more of an architectural viewpoint. Dev and Ops still had their own line management, although in theory the teams would be self-organising (as per Scrum) under a Scrum Master. In time, the role of Scrum Master was to be absorbed by the (Dev) Engineers.
All of the 1800 employees of DB/CIO, including team managers, had to apply for one of those positions, and for all positions the emphasis was on technical skills. Only department managers and above remained where they were, along with me and my change managers for all six departments (plus a handful of other support staff). A current lack of technical skills would not be a barrier, as long as people had both the willingness and the ability to close the perceived gap within the next 18 months.
Naturally this announcement made some people nervous, especially in the service teams, where there was limited knowledge of Agile/Scrum and, in some teams, a low level of technical skills. People had a good functional (banking) understanding of their application (which, you could argue, should have been on the business side), and they had a good network to get problems solved. But they did not have in-depth knowledge of the platform it ran on, its architecture, the database it used, the protocols it used to communicate with other applications, etc. Some people asked to join one of the pilot teams that were set up to gain experience; others tried to set up their own pilot teams. Internal Agile/Scrum training days were very popular, as were the introductory presentations on Continuous Delivery. When there was a seminar on DevOps in Amsterdam, about 50% of the participants that day were from ING, including me, my team manager, her manager and a host of other colleagues.
Getting back to the office after that seminar, I thought about my own search for knowledge and that of my colleagues. I decided that what we needed was a community on our newly launched internal social network, where we could talk about the transition and share documents and blog posts from all over the internet. So together with two colleagues I opened our community and started posting links to interesting blogs, copying white papers and sharing relevant presentations from Meetups and DevOpsDays. The community has now been running for over 2 years, and without any real promotion it has become one of the largest active communities, with close to 500 members from not just DB/CIO but all the other departments in the Netherlands and most of the IT departments in other countries.
One of the people we met at that seminar was Kris Buytaert, a DevOps and Open Source evangelist from the very beginning. We really liked his down-to-earth approach to DevOps, which stood out from the other presentations, most of which were given by sales people. When we explained to him what we were planning to do and asked for his help as a consultant, he thought we were joking, especially since the department we wanted him to start with was heavily into Mainframe. DevOps in a bank, on Mainframe, and starting with 150 teams? We certainly picked a challenge! Still, we insisted that he come over to talk to some of our management, and finally we convinced him that our motives were genuine and that we really could do with some help. His help was perhaps subtle. He talked a lot to middle management, convincing them that it could be done. He also helped a few talented engineers set up a rudimentary Continuous Delivery Pipeline, which could be presented to their colleagues in a DevOps training, creating momentum for the transition in the Mortgages department.
Joining the DevOps community
Through Kris we also discovered DevOpsDays. A couple of colleagues went with Kris to DevOpsDays Paris and came back overflowing with ideas. The first was to organise an internal DevOpsDay in June 2013. It had the same set-up as a regular DevOpsDays, but all in one day: some external speakers (including some big names in DevOps who were in town for DevOpsDays Amsterdam), some internal ones, and Open Spaces in the afternoon.
The Open Spaces, with their great open discussions, were a revelation to a lot of people. We liked it so much that after a couple of months we organised another one, but this time with tool demos instead of talks; a few months later we ran a ‘DevOpsDiscovery’ with small workshops where people could actually ‘feel’ some of the tools and get answers to their questions from colleagues who were already using them. In 2014 we held our third DevOpsDay, combined with DevOpsDays Amsterdam, where I gave a talk on our transition. Through the DevOpsDays organisers, a few of my colleagues and I also became regular visitors of the DevOps Meetup group in Amsterdam, and more recently contributors and sponsors.
Ambitions for the future and problems we are facing
We have made tremendous progress! At every conference I visit I am reminded that we really are doing wonderful things. Within DB/CIO we now have 180 DevOps teams running Sprints of 2-3 weeks, resulting in Potentially Shippable Increments. Ideally we ship every Sprint. We have made an enormous step towards Continuous Delivery, with automated builds, testing and deployments, and with monitoring. We are making good progress on automated provisioning in our private cloud. Not all teams have made the same progress, but all teams have taken a huge step. In all fairness, some teams had it a little easier than others. There is still room for improvement, though: not so much on the tooling side as on the organisational side.
Hearing all voices
Our transition was very much a top-down approach. Senior management invested quite a lot of time in convincing employees that the journey we were embarking on was exciting and well worth the effort. However, not everyone is ready to move at the same pace, and many staff had seen a lot of reorganisations over the past years. Why would this transition be any different from the preceding ones, which didn’t really change anything about how we did our work but only replaced management? Change can be hard, and it takes time, more time than we sometimes realise, especially if you have been working on the transition for so long and most of the people you talk to are very enthusiastic about it. Make sure you also hear the people who are not totally convinced, and keep asking for their feedback. They may think it harmful to their career to speak up, because they know that further staff reductions are inevitable.
One message, made relevant in a hundred ways
It is vital that all layers of management carry the same message. The problem you often see - one that is common in large organisations - is that the message gets watered down as it is passed down the hierarchy. It is impossible for a manager of 1800 people to talk to each and every one of them individually on a regular basis. It is even difficult to talk to all of his team managers regularly; if you are not careful, every time the message passes a layer of management it gets digested, interpreted and retold in a slightly different way. If a manager fully understands what we are aiming for, knows what that means for his department and is able to interpret the message in a meaningful way, there is no problem passing it down; luckily most of our managers do understand. But we have to be realistic – just like some employees, some managers may have trouble fully understanding why we are doing what we are doing and what we are aiming for. They are experienced managers, so they still know how to manage a team. But they may not be able to inspire their team to reach its full Agile potential.
The balance between manager and coach
It is important to keep focussing on coaching your teams rather than managing them, which is tough when team leads are selected for their technical knowledge. They need that technical knowledge to be good coaches, but it doesn’t always come with good management skills. Coaching does not mean wading knee-deep into the mud of content and solving your team’s problems for them; instead, you should coach your team to solve their problems themselves, even when that sometimes takes longer than being directive would.
Changing our environment
In Agile/Scrum, teams are meant to be self-organising, and so far the results seem to bear that out: the most successful teams in the transition are those that can organise their own work with minimal interference from management. But at the same time we haven’t changed our entire bank; some of the departments we work with don’t understand Agile, and we still launch 3-year programmes with very ambitious goals. How are we going to get those programmes embedded in the Product Backlogs of all our teams, and in such a way that the teams are not too dependent on each other’s outcomes? How do we stop management from pushing for results on these programmes when we still make management responsible for delivery? We still need to do regular internal audits. Have we updated the rules we have to follow to reflect our new way of working? And if there are audit findings, why do we make a manager responsible for resolving them? Why not the Product Owner of the application? He is the one who can put them on the backlog with the proper priority!
Agile/Scrum teams are focused on delivering quality, working software at a high but sustainable pace. That almost automatically means that retrospectives are inward-facing: how can the team perform better, how can we work together more efficiently, etc. For a large organisation like ours that is not sufficient. Multiple DevOps teams have to work together in one value chain for our clients. There are virtually no teams that can deliver new functionality to their stakeholders without touching other teams. Yet cooperation between teams is at times wanting.
The Scaled Agile Framework may help with delivering larger chunks of functionality (as an Agile replacement for traditional Programme Management). So may organising a Scrum of Scrums. We have even seen ‘super’ Product Owners being appointed to keep an eye on the bigger picture of combining different Product Backlogs in a Value Chain. But does this also work for solving incidents? And what about knowledge sharing? Yes, all teams are using Confluence to keep track of their team knowledge. Some engineers even know how to find information there from other teams, but a lot of teams operate as silos, inventing things themselves.
Through our internal community and through organising events like Meetups and Hackathons we encourage people to share, but I know we are not reaching everybody yet and maybe we never will. It will take a lot more effort and a lot more broadcasting before we can safely say that there really are collaborative communities of engineers, who share knowledge freely, even when it doesn’t directly benefit their Sprint Backlog.
Ambitions for the future
We are now almost 2 years into our journey towards Continuous Delivery. The other IT departments in the Netherlands, including the infrastructure teams, are now also starting the adoption process. For some of our own teams the road to Continuous Delivery has only just begun: they have only recently started using automated deployment and testing. Our ultimate goal is that all teams have a fully functional Continuous Delivery Pipeline, where code is built automatically upon check-in, after validation that it meets the coding standards. The executables will be placed in an artifact repository together with all necessary configuration files. The artifacts will be deployed to the test environment (which may be provisioned on the fly in our private cloud), and tested automatically, both for new functionality and for regressions. After passing regression testing, the artifacts will be promoted to the Acceptance environment, where acceptance testing is done. If acceptance testing is also passed and there are no remaining issues with the Non Functional Requirements, the new version will be released into production, provided the switch for automatic updates is set to ‘yes’.
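The gating logic of such a pipeline can be sketched as a sequence of stages that stops at the first failure and only releases when the automatic-update switch is on. This is a minimal illustration in Python, not ING's actual pipeline: the stage names, `run_pipeline`, `PipelineRun` and the placeholder checks are all hypothetical, and in practice each stage would call a real linter, build tool, artifact repository, test runner or provisioning API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# A stage takes a commit id and returns True on success. These are
# hypothetical placeholders for real tools (linter, build, test runner, ...).
Stage = Callable[[str], bool]


@dataclass
class PipelineRun:
    commit: str
    passed: list[str] = field(default_factory=list)
    failed_at: str | None = None
    released: bool = False


def run_pipeline(commit: str, stages: list[tuple[str, Stage]], auto_release: bool) -> PipelineRun:
    """Run stages in order and stop at the first failure.

    A commit that passes every stage is a shippable increment; it is only
    released to production when the automatic-update switch is on.
    """
    run = PipelineRun(commit)
    for name, stage in stages:
        if not stage(commit):
            run.failed_at = name  # later stages never see a failing build
            return run
        run.passed.append(name)
    run.released = auto_release
    return run


def always_ok(commit: str) -> bool:
    return True


def always_fails(commit: str) -> bool:
    return False


# Hypothetical stage list mirroring the pipeline described in the text.
STAGES: list[tuple[str, Stage]] = [
    ("coding_standards", always_ok),
    ("build_and_store_artifact", always_ok),
    ("deploy_to_test", always_ok),
    ("functional_and_regression_tests", always_ok),
    ("promote_to_acceptance", always_ok),
    ("acceptance_tests", always_ok),
    ("non_functional_requirements", always_ok),
]

if __name__ == "__main__":
    print(run_pipeline("abc123", STAGES, auto_release=True).released)   # True
    print(run_pipeline("abc123", STAGES, auto_release=False).released)  # False
    broken = [(n, always_fails if n == "functional_and_regression_tests" else s)
              for n, s in STAGES]
    print(run_pipeline("def456", broken, auto_release=True).failed_at)
    # functional_and_regression_tests
```

Keeping "passed every stage" separate from "released" mirrors the distinction in the text between a Potentially Shippable Increment and an actual release: the switch, not the pipeline, decides whether a green build goes to production.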
For the Non Functional Requirements we are currently building a tool in which the teams will update the status of each requirement and store all test evidence. This will help us maintain control over all our applications from a Risk Management perspective and help us deliver the required in-control statements to our regulators. At the same time it will automate most of the current Change Management process, helping the teams towards the goal of Continuous Delivery.
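At its core, a tool like this keeps a status and a list of test evidence for each requirement of an application, and derives from that whether a change can proceed without manual Change Management review. The sketch below assumes a very simple model: the `Status` values, the `Application` and `Requirement` classes and the rule that a requirement only counts as met when evidence is attached are all hypothetical, chosen to illustrate the idea rather than to describe the tool ING is building.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):  # hypothetical status values
    OPEN = "open"
    MET = "met"
    NOT_MET = "not met"


@dataclass
class Requirement:
    req_id: str
    description: str
    status: Status = Status.OPEN
    evidence: list[str] = field(default_factory=list)  # e.g. links to test reports


@dataclass
class Application:
    name: str
    requirements: dict[str, Requirement] = field(default_factory=dict)

    def add_requirement(self, req_id: str, description: str) -> None:
        self.requirements[req_id] = Requirement(req_id, description)

    def update(self, req_id: str, status: Status, evidence: str | None = None) -> None:
        """Teams record a new status and, optionally, a piece of test evidence."""
        req = self.requirements[req_id]
        req.status = status
        if evidence is not None:
            req.evidence.append(evidence)

    def open_items(self) -> list[str]:
        """Requirements not yet met, or marked met without any evidence."""
        return sorted(
            r.req_id
            for r in self.requirements.values()
            if r.status is not Status.MET or not r.evidence
        )

    def release_allowed(self) -> bool:
        """Assumed rule: no open items means no manual change review is needed."""
        return not self.open_items()


if __name__ == "__main__":
    app = Application("example-app")  # hypothetical application name
    app.add_requirement("NFR-01", "Application is monitored")
    app.add_requirement("NFR-02", "Deployment is automated")
    app.update("NFR-01", Status.MET, "link-to-monitoring-test-report")
    print(app.open_items())       # ['NFR-02']
    print(app.release_allowed())  # False
```

Treating "met without evidence" as still open reflects the audit angle: a status on its own is not something you can show a regulator, whereas stored test evidence is.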