8. Managing Code – Expert Python Programming

Chapter 8. Managing Code

Working on a software project that involves more than one person is tough. Everything slows down and gets harder, for several reasons. This chapter exposes these reasons and provides some ways to fight them.

This chapter is organized in two parts, which explain:

  • How to work with a version control system

  • How to set up continuous integration

First of all, a code base evolves so much that it is important to track all the changes that are made, even more so when many developers work on it. That is the role of a version control system.

Next, several brains that are not directly wired together can still work on the same project. They have different roles and work on different aspects. Therefore, a lack of global visibility generates a lot of confusion about what is going on, and what is being done by others. This is unavoidable, and some tools have to be used to provide continuous visibility and mitigate the problem. This is done by setting up a series of tools for continuous integration.

Now, we will discuss these two aspects in detail.

Version Control Systems

Version control systems (VCS) provide a way to share, synchronize, and back up any kind of files. They are categorized in two families:

  1. Centralized systems

  2. Distributed systems

Centralized Systems

A centralized version control system is based on a single server that holds the files and lets people check in and check out the changes that are made to those files. The principle is quite simple: Everyone can get a copy of the files on his/her system and work on them. From there, every user can commit his/her changes to the server. They will be applied and the revision number will be raised. Other users will then be able to get those changes by synchronizing their repository copy through an update.

The repository evolves through all the commits, and the system archives all revisions into a database to undo any change, or provide information on what has been done:

Every user in this centralized configuration is responsible for synchronizing his/her local copy with the main repository, in order to get the other users' changes. This means that conflicts can occur when a file that has been changed locally has also been changed and checked in by someone else. In this case, a conflict resolution mechanism is carried out on the user's system, as shown in the following figure:

  1. Joe checks in a change.

  2. Pamela attempts to check in a change on the same file.

  3. The server complains that her copy of the file is out of date.

  4. Pamela updates her local copy. The version control software may or may not be able to merge the two versions seamlessly (that is, without a conflict).

  5. Pamela commits a new version that contains the latest changes made by Joe and her own.
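The five steps above can be sketched as a toy simulation. This is a minimal illustration in plain Python, not the behavior of any real VCS: the `CentralServer` class, the `OutOfDateError` exception, and the "append her line" merge are all assumptions made for the example.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an old revision of the file."""


class CentralServer:
    """Toy central repository holding one text per file name."""

    def __init__(self):
        self.revision = 0
        self.files = {}          # file name -> current content
        self.file_revision = {}  # file name -> revision of its last change

    def checkout(self, name):
        """Return (content, revision) so the client knows its base revision."""
        return self.files.get(name, ""), self.file_revision.get(name, 0)

    def commit(self, name, content, base_revision):
        """Accept the change only if the client saw the latest version."""
        if self.file_revision.get(name, 0) != base_revision:
            raise OutOfDateError(name)
        self.revision += 1
        self.files[name] = content
        self.file_revision[name] = self.revision
        return self.revision


server = CentralServer()
joe_text, joe_base = server.checkout("file.txt")
pam_text, pam_base = server.checkout("file.txt")

# Step 1: Joe checks in a change.
server.commit("file.txt", joe_text + "joe's line\n", joe_base)

try:
    # Step 2: Pamela attempts to check in a change on the same file.
    server.commit("file.txt", pam_text + "pamela's line\n", pam_base)
except OutOfDateError:
    # Step 3: the server complained. Step 4: Pamela updates and merges.
    latest, pam_base = server.checkout("file.txt")
    merged = latest + "pamela's line\n"  # trivial merge: append her change
    # Step 5: Pamela commits a version containing both changes.
    final_revision = server.commit("file.txt", merged, pam_base)
```

The server only compares revision numbers; the merge itself happens on Pamela's side, which is where a real tool would report a conflict if both changes touched the same lines.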

This process is perfectly fine on small-sized projects that involve a few developers and a small number of files, but it becomes problematic for bigger projects. For instance, a complex change involves a lot of files and takes time, and keeping everything local until the whole work is done is impractical:

  • It is dangerous, because the user may keep changes on his/her computer that are not necessarily backed up.

  • It is hard to share the work with others until it is checked in, and checking it in before it is done would leave the repository in an unstable state, so the other users would not want it.

Centralized VCS have resolved this problem by providing "branches" and "merges". It is possible to fork from the main stream of revisions to work on a separated line, and then to get back to the main stream.

In the figure that follows overleaf, Joe starts a new branch from revision 2, to work on a new feature. The revisions are incremented in the main stream and in his branch, every time a change is checked in. At revision 7, Joe has finished his work and commits his changes into trunk (the main branch). This requires, most of the time, some conflict resolution.

But in spite of their advantages, centralized VCS have several pitfalls:

  • Branching and merging is quite hard to deal with. It can become a nightmare.

  • Since the system is centralized, it is impossible to commit changes offline. This can lead to a huge, single commit to the server when the user gets back online.

  • Last of all, it doesn't work very well for projects such as Linux, where many companies permanently maintain their own "branch" of the software, and there is no central repository that everyone has an account on.

For the offline problem, some tools, such as SVK ( http://svk.bestpractical.com/view/HomePage), make it possible to work offline, but the more fundamental problem lies in how centralized VCS work.

Despite these pitfalls, centralized VCS are really popular amongst open-source developers.

In the open-source world, CVS (Concurrent Versions System, see http://cvs.org) has made centralized version control systems very popular in the last fifteen years, and forges such as Sourceforge ( http://sourceforge.net) or Gna! ( http://gna.org) made them available to any public project. Almost all open-source projects use a VCS. Subversion ( http://subversion.tigris.org) is currently the most popular and is used by thousands of projects.

But another kind of VCS has evolved in the last few years, which tries to make things better: Distributed VCS (DVCS).

Distributed Systems

Distributed VCS is the answer to the centralized VCS. It does not rely on a main server that people work with, but on peer-to-peer principles. Everyone can hold and manage his/her own independent repository for a project, and synchronize it with other repositories:

In the last figure, we can see an example of such a system in use:

  1. Bill pulls the files from HAL's repository.

  2. Bill makes some changes on the files.

  3. Amina pulls the files from Bill's repository.

  4. Amina changes the files too.

  5. Amina pushes the changes to HAL.

  6. Kenny pulls the files from HAL.

  7. Kenny makes changes.

  8. Kenny regularly pushes his changes to HAL.

The key concept is that people push and pull the files with other repositories, and this behavior changes according to the way people work and the way the project is managed. Since there is no main repository anymore, the maintainer of the project needs to define a strategy for people to push and pull the changes.

Furthermore, people have to be a bit smarter when they work with several repositories. Since revision numbers are local to each repository, there are no global revision numbers anyone can refer to. Therefore, tags have to be used to make things clearer. They are labels that can be attached to a revision. Last, users are responsible for backing up their own repositories, which is not the case in a centralized infrastructure where the administrator usually sets up backup strategies.
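The push and pull exchanges, and the point about revision numbers being local, can be illustrated with a small simulation. It is only a sketch: the `Repository` class and the `changeset_id` function are hypothetical, merging is not modeled, and the hash used here is not Mercurial's actual algorithm. It follows a variation of the steps listed above, in which Kenny commits before receiving the others' work.

```python
import hashlib


def changeset_id(parent_id, content):
    """Hypothetical global ID: a hash of the parent ID and the content."""
    data = (parent_id + "\n" + content).encode("utf-8")
    return hashlib.sha1(data).hexdigest()[:12]


class Repository:
    def __init__(self, name):
        self.name = name
        self.log = []  # changeset IDs, in local arrival order

    def commit(self, content):
        parent = self.log[-1] if self.log else ""
        node = changeset_id(parent, content)
        self.log.append(node)
        return node

    def pull(self, other):
        """Copy the changesets that `other` has and this repository lacks."""
        for node in other.log:
            if node not in self.log:
                self.log.append(node)

    def push(self, other):
        other.pull(self)

    def local_number(self, node):
        """Position of the changeset in this repository only."""
        return self.log.index(node)


hal, bill, amina, kenny = (
    Repository(name) for name in ("hal", "bill", "amina", "kenny")
)
hal.commit("initial import")

kenny.pull(hal)
kenny_change = kenny.commit("kenny's change")

bill.pull(hal)
bill.commit("bill's change")
amina.pull(bill)
amina.commit("amina's change")
amina.push(hal)

kenny.pull(hal)
kenny.push(hal)

# The same changeset sits at different positions in the two repositories.
print(kenny.local_number(kenny_change), hal.local_number(kenny_change))
```

Both repositories end up with the same set of changesets, but Kenny's change does not have the same local number in each of them, which is why a label such as a tag is easier to communicate than a number.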

Distributed Strategies

A centralized server is, of course, still desirable with a DVCS, if you're working in a company setting with everyone working toward the same goal.

Different approaches can be applied. The simplest one is to set up a server that acts like a regular centralized server, where every member of the project can push his/her changes into a common stream. But this approach is a bit simplistic. It does not take full advantage of the distributed system, since people will use push and pull commands in the same way as they would do with a centralized system.

Another approach consists of providing several repositories on a server with different levels of access:

  • An unstable repository is where everyone can push changes.

  • A stable repository is read-only for all members, except the release managers. They are allowed to pull changes from the unstable repository and decide what should be merged.

  • Various release repositories correspond to the releases and are read-only, as we will see later in the chapter.

This allows people to contribute, and managers to review the changes before they make it to the stable repository.

Other strategies can be made up, since DVCS provides infinite combinations. For instance, the Linux Kernel, which is using Git ( http://git.or.cz), is based on a star model, where Linus Torvalds is maintaining the official repository, and pulls the changes from a set of developers he trusts. In this model, people who wish to push changes to the kernel will try to push them to the trusted developers so that they reach Linus through them, hopefully.

Centralized or Distributed?

Choosing between a centralized and a distributed approach depends a lot on the nature of the project and the way the team works.

For instance, an application that is being developed by an isolated team will not need the features provided by a distributed system. Everything is under control in a development server, and the managers will not deal with outside contributors. There are no worries about backing up the work people do. Developers create branches when needed, and then go back to the trunk as soon as possible. They might have a hard time when they need to merge their changes, or when they are working away from an Internet connection, but they are still happy with what such a system provides. Branching and merging does not occur often in such a context anyway.

That's why most companies that do not deal with a wider community of contributors massively use centralized version control systems for their own employees. Everyone is working together in the same place.

For projects with a broader list of contributors, the centralized approach is a bit rigid, and using a DVCS makes more sense. Many open-source projects are opting for this model nowadays. For instance, adopting a DVCS for Python is currently being discussed, and this will probably occur soon, since it is mainly a matter of setting up a set of good practices and teaching the developers this new way of working with the code.

In this book, we will use a DVCS and explain how it can be used in project management, together with a set of good practices. The chosen software for this is Mercurial.

Mercurial

Mercurial ( http://www.selenic.com/mercurial/wiki) is a DVCS written in Python that provides a simple, yet powerful, command-line utility to work with the code.

To install it, the simplest way is to call easy_install:

$ easy_install mercurial

Note

Under some versions of Windows, the script generated in Python's Scripts directory is wrong and hg is not available at the prompt. In that case, you might want to rename it to hg.py and run it as hg.py in the prompt.

A specific binary installer can be used if you still encounter problems.

See http://mercurial.berkwood.com.

If you are under systems such as Debian or Ubuntu, you can also use the package system provided:

$ apt-get install mercurial

A script called hg is then available at the prompt with an exhaustive list of options (truncated here):

$ hg -h
Mercurial Distributed SCM

list of commands:

 add          add the specified files on the next commit
 ..
 clone        make a copy of an existing repository
 commit       commit the specified files 
 copy         mark files as copied for the next commit
 diff         diff repository (or selected files)
 ...
 incoming     show new changesets found in source
 init         create a new repository in the given directory
 ... 
 pull         pull changes from the specified source
 push         push changes to the specified destination
 ... 
 status       show changed files in the working directory
 ...
 update       update working directory
 
use "hg -v help" to show aliases and global options

Creating a repository is done with the init command in a folder that will contain the repository:

$ cd /tmp/
$ mkdir repo
$ hg init repo

From there, files can be added in the repository with the add command:

$ cd repo/
$ touch file.txt
$ hg add file.txt

The file is not checked in until the commit (ci) command is called:

$ hg status
A file.txt
$ hg commit -m "added file.txt"
No username found, using 'tziade@macziade' instead 
$ hg log
changeset:   0:d557683c40bc
tag:         tip
user:        tziade@macziade
date:        Tue Apr 01 17:56:41 2008 +0200
summary:     added file.txt
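Since this book is about Python, the same init, add, and commit sequence can also be scripted. The following is a minimal sketch that only builds the command lines and prints them; the helper names `first_commit_commands` and `run_commands` are made up for this example, and `run_commands` assumes that hg is installed and that the file already exists in the repository folder.

```python
import subprocess


def first_commit_commands(repo_dir, filename, message):
    """Return the hg invocations shown above, as argument lists."""
    return [
        ["hg", "init", repo_dir],
        # --cwd is hg's global option to change the working directory
        ["hg", "--cwd", repo_dir, "add", filename],
        ["hg", "--cwd", repo_dir, "commit", "-m", message],
    ]


def run_commands(commands):
    """Run each command in order, raising on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = first_commit_commands("/tmp/repo", "file.txt", "added file.txt")
for argv in commands:
    print(" ".join(argv))  # dry run: show what would be executed
```

Passing argument lists rather than a single shell string avoids quoting problems with commit messages that contain spaces.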

The repository is self-contained in the directory it was created in, and can be copied in another directory with the clone command:

$ hg clone . /tmp/repo2
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

This can also be done through SSH to another machine if it has an SSH server and Mercurial installed:

$ hg clone . ssh://tarek@ziade.org/repo
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changeset with 1 change to 1 files

The distant repositories can then be synchronized using the push command:

$ echo more >> file.txt
$ hg diff
diff -r d557683c40bc file.txt
--- a/file.txt Tue Apr 01 17:56:41 2008 +0200
+++ b/file.txt Tue Apr 01 19:32:59 2008 +0200
@@ -0,0 +1,1 @@
+more
$ hg commit -m 'changing a file'
No username found, using 'tziade@macziade' instead
$ hg push ssh://tarek@ziade.org/repo
pushing to ssh://tarek@ziade.org/repo
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

The diff command (di) is used here to display the changes made.

Another nice feature provided by the hg command is serve, which provides a small web server for the current repository:

$ hg serve

From here you can point your browser to http://localhost:8000. You will get a view of the repository, which will be similar to the following:

Beyond this view, hg serve will also provide access to other users who want to call the clone and pull commands:

$ hg clone http://localhost:8000 copy
requesting all changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd copy/
$ ls
file.txt
$ touch file2.txt
$ hg add file2.txt
$ hg ci -m "new file"
No username found, using 'tziade@macziade' instead

The clone command allows copying a repository to start working on it.

hg serve will not allow people to push changes, as this requires setting up a real web server to handle authentication, as we will see in the next section. But it can be useful in some situations where you want to temporarily share a repository for other people to pull from it.

Note

To go deeper in Mercurial, an online book is available for free at http://hgbook.red-bean.com.

Project Management with Mercurial

The simplest way to manage repositories with Mercurial is to use the hgwebdir.cgi script provided with it. It is a CGI (Common Gateway Interface) script that can be used to publish the repository through a web server, and to provide the same features that hg serve provides. Furthermore, it allows push commands to be performed in a safe way, by configuring a password file to restrict this command usage.

Note

CGI is robust and simple to set up, but not the fastest way to publish a repository. Some other solutions based on fastcgi or mod_wsgi are available.

Configuring such a system is not hard, but can rely on platform-specific parts, so a generic installation tutorial is impossible to provide. This section will rather focus on how to set up everything on a Debian Sarge Linux platform running Apache 2, which is quite common.

The steps to install such a server are:

  • Setting up a dedicated folder

  • Configuring hgwebdir

  • Configuring Apache

  • Setting up authorizations

Setting Up a Dedicated Folder

The multiple repository approach we described earlier is quite simple to set up with Mercurial, since one repository corresponds to one system folder. A repositories folder can be created to hold all repositories, and located in a folder dedicated to the project. This project folder can be located in a home folder. A dedicated mercurial user can be used for that purpose.

Let's create a Mercurial environment for Atomisator:

$ sudo adduser mercurial
$ sudo su mercurial
$ cd
$ mkdir atomisator
$ mkdir atomisator/repositories
$ cd atomisator

From there, the stable and unstable repositories can be created with hg:

$ hg init repositories/stable
$ hg init repositories/unstable
$ ls repositories/
unstable stable

Note

Some teams don't use separate repositories, but work on a single repository where they use a named branch to differentiate the stable version from the developments, and do the merges.

See http://www.selenic.com/mercurial/wiki/index.cgi/Branch.

Whenever a release is created, a new repository is added by cloning the stable one. For instance, if version 0.1 is released, it will be done like this:

$ hg clone repositories/stable repositories/release-0.1

Let's add the Atomisator code in the unstable repository, by copying the buildout and the packages folder that we created in the last chapter into the unstable folder, and checking them in. The unstable folder should look like this after it is done:

$ ls repositories/unstable
buildout packages


Note

Releasing will be covered in the next chapter.

Configuring hgwebdir

To serve these repositories, the hgwebdir.cgi file has to be added in the atomisator folder. This script is provided with your installation. If you cannot find it, you can get it on the Mercurial website by downloading a source distribution. But make sure you get the file that strictly corresponds to the installed version:

$ hg --version
Mercurial Distributed SCM (version 0.9.4)
$ locate hgwebdir.cgi
/usr/share/doc/mercurial/examples/hgwebdir.cgi
$ cp /usr/share/doc/mercurial/examples/hgwebdir.cgi .

This script works with a configuration file called hgweb.config, which contains the path to the repositories folder:

[collections]
repositories/ = repositories/
[web]
style = gitweb
push_ssl = false

The collections section provides a generic way to point to a folder that contains several repositories. They are visited iteratively by the script.

The web section can be used to set a few options. In our case we can set two of them:

  • style will set the look and feel of web pages, and gitweb is probably the best default. Notice that Mercurial uses templates to render all pages, and that they are all configurable.

  • If push_ssl is true (its default value), users will not be able to use the push command over plain HTTP; pushing then requires HTTPS.
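To make the structure of this file concrete, here is a short sketch that reads the same content with Python's standard configparser module. Mercurial uses its own configuration reader, so this is only an approximation that happens to work for a simple file like this one.

```python
import configparser

HGWEB_CONFIG = """\
[collections]
repositories/ = repositories/
[web]
style = gitweb
push_ssl = false
"""

# Only "=" separates keys from values here (configparser also accepts ":").
parser = configparser.ConfigParser(delimiters=("=",))
parser.read_string(HGWEB_CONFIG)

collections = dict(parser["collections"])
style = parser["web"]["style"]
# The fallback mirrors the default described above: push_ssl is true
# when the option is absent.
push_ssl = parser["web"].getboolean("push_ssl", fallback=True)

print(collections, style, push_ssl)
```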

Configuring Apache

The next step to configure is the web server layer that will execute the CGI script. The simplest way to do it is to provide a configuration file within the atomisator folder that defines a Directory, a ScriptAliasMatch and an AddHandler directive.

Let's add an apache.conf file with this content:

AddHandler cgi-script .cgi
ScriptAliasMatch        ^/hg(.*)        /home/mercurial/atomisator/hgwebdir.cgi$1

<Directory /home/mercurial/atomisator>
  Options ExecCGI FollowSymLinks
  AllowOverride None
  AuthType Basic
  AuthName "Mercurial"
  AuthUserFile /home/mercurial/atomisator/passwords

  <LimitExcept GET>
    Require valid-user
  </LimitExcept>

</Directory>

Notice that:

  • The AddHandler directive might not be necessary with some distributions, but has to be present in Debian Sarge.

  • The ScriptAliasMatch needs mod_alias to be enabled.

  • For any request method other than GET (for instance a POST, which means the user sends data to the server), authentication is done using a password file.

Note

If you are not familiar with Apache, take a look at http://httpd.apache.org/docs.

The password file is generated with the htpasswd utility in the atomisator folder:

$ htpasswd -c passwords tarek
New password:
Re-type new password:
Adding password for user tarek
$ htpasswd passwords rob
New password:
Re-type new password:
Adding password for user rob

Note

Under Windows, you might need to add the htpasswd location into PATH manually if not available at the prompt.

Every time a user who is allowed to push into a repository needs to be added, this file can be updated with htpasswd.

Lastly, a few steps are required in order to allow the execution of the script, and to make sure that the data is available to the group that is used by the Apache process:

$ sudo chmod +x /home/mercurial/atomisator/hgwebdir.cgi
$ sudo chown -R mercurial:www-data /home/mercurial/atomisator
$ sudo chmod -R g+w /home/mercurial/atomisator

To hook the configuration, the file can be added into the sites-enabled directory visited by Apache:

$ sudo ln -s /home/mercurial/atomisator/apache.conf /etc/apache2/sites-enabled/007-atomisator
$ sudo apache2ctl restart

After Apache is restarted, the page should be reachable at http://localhost/hg, as in the following screenshot:

Note

Notice that each repository comes with an RSS feed that people can use to keep track of the changes. Every time someone pushes a file, a new entry is added in the RSS feed, with a link to the change log. This change log will display a detailed log together with a diff view.

If you need to virtual-host your Mercurial repository, you will need to add a specific rewrite rule that will serve the static files used by hgwebdir, such as the style sheet.

This is the apache.conf file used to publish the book repository (that contains source code from examples) on the Web, which corresponds to http://hg-atomisator.ziade.org/:

<VirtualHost *:80>
  ServerName hg-atomisator.ziade.org
  CustomLog /home/mercurial/atomisator/access_log combined
  ErrorLog  /home/mercurial/atomisator/error.org.log

  AddHandler cgi-script .cgi
  RewriteEngine On

  DocumentRoot /home/mercurial/atomisator
  ScriptAliasMatch ^/(.*) /home/mercurial/atomisator/hgwebdir.cgi/$1

  <Directory /home/mercurial/atomisator>
    Options ExecCGI FollowSymLinks
    AllowOverride None
    AuthType Basic
    AuthName "Mercurial"
    AuthUserFile /home/mercurial/atomisator/passwords
    <LimitExcept GET>
      Require valid-user
    </LimitExcept>
  </Directory>

</VirtualHost>

Each repository can be reached from this front page. To make all pages use the same style, an hgrc file has to be added in each repository, in the .hg configuration directory. This file can define a web section like the main CGI file uses:

$ more repositories/stable/.hg/hgrc
[web]
style = gitweb
description = Stable branch
contact = Tarek <tarek@ziade.org>

The description and contact fields will be used in the web pages as well.

Setting Up Authorizations

We have seen that a global access file filters the people who are allowed to push. This is only a first level of authorization; we still need to define the push policy for each repository. The strategy we defined earlier was:

  • Let all registered developers be allowed to push in the unstable repository.

  • Leave the stable repository in read-only access for everyone, except the release manager.

This can be set with the allow_push parameter in the hgrc file for each repository. If the user tarek is the release manager, the stable hgrc file will look like this:

$ more repositories/stable/.hg/hgrc
[web]
style = gitweb
description = Stable branch
contact = Tarek <tarek@ziade.org>
push_ssl = false
allow_push = tarek

Notice that push_ssl has been added in order to push through HTTP. The hgrc file for the unstable repository will look like this:

$ more repositories/unstable/.hg/hgrc
[web]
style = gitweb
description = Unstable branch
contact = Tarek <tarek@ziade.org>
push_ssl = false
allow_push = *

This means that everyone is allowed to push in this repository, as long as they are added to the password file.
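The effect of allow_push on the two repositories can be summarized by a small function. This is a simplified model of the documented behavior, written for illustration only: the `can_push` name is hypothetical, and related options such as deny_push are ignored.

```python
def can_push(user, allow_push):
    """Return True if `user` may push, given an allow_push value.

    `user` is the name authenticated by the web server, or None.
    """
    allowed = allow_push.replace(",", " ").split()
    if not allowed:
        return False  # empty or unset: nobody may push
    if "*" in allowed:
        return True   # wildcard: filtering is left to the web server
    return user is not None and user in allowed


print(can_push("tarek", "tarek"))  # release manager on stable
print(can_push("rob", "tarek"))    # other developer on stable
print(can_push("rob", "*"))        # any developer on unstable
```

With the wildcard, Mercurial itself no longer filters anyone, so the restriction to registered developers comes entirely from the Require valid-user rule in the Apache configuration.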

Note

In this book, an SSL configuration was not set up for the sake of simplicity, but one should be used on a real server for more secure transactions. For instance, in our configuration, credentials and data travel over plain HTTP and can be sniffed.

Setting Up the Client Side

To avoid authentication prompts, and to provide a human-readable name in the commit logs, a .hgrc file can be added in the HOME directory on the client side:

[ui]
username = Tarek Ziade
[paths]
default = http://tarek:secret@atomisator.ziade.org/hg/unstable
unstable = http://tarek:secret@atomisator.ziade.org/hg/unstable
stable = http://tarek:secret@atomisator.ziade.org/hg/stable

The ui part gives the server the full name of the committer, and the paths part a list of the repository URLs. Notice that here we put the user name and the password in the URL, which prevents prompting every time a push is done. This is not safe at all, and a password prompt would be safer. However, the safest way would be to work with the server through the SSH protocol instead of using a web server.

With this file, pushes can be done like this:

$ hg push # will push to the default repository (unstable)
$ hg push stable # will push to stable
$ hg push unstable # will push to unstable
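The alias lookup behind these commands can be sketched in a few lines of Python. This is an illustration rather than Mercurial's own code: `resolve_destination` and `hide_password` are hypothetical helpers, and the file is read with the standard configparser module as an approximation. The second helper shows one way to avoid displaying the password stored in such URLs.

```python
import configparser
from urllib.parse import urlsplit, urlunsplit

HGRC = """\
[ui]
username = Tarek Ziade
[paths]
default = http://tarek:secret@atomisator.ziade.org/hg/unstable
unstable = http://tarek:secret@atomisator.ziade.org/hg/unstable
stable = http://tarek:secret@atomisator.ziade.org/hg/stable
"""


def resolve_destination(paths, name=None):
    """Map an alias such as 'stable' to its URL; no name means 'default'."""
    return paths[name or "default"]


def hide_password(url):
    """Return the URL with the password removed, for safe display."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    if parts.username:
        host = f"{parts.username}@{host}"
    return urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )


parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.read_string(HGRC)
paths = dict(parser["paths"])

print(hide_password(resolve_destination(paths)))            # hg push
print(hide_password(resolve_destination(paths, "stable")))  # hg push stable
```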

Note

If you need to install it on another platform, the steps will not differ a lot. This page will help you out on platform specifics: http://www.selenic.com/mercurial/wiki/index.cgi/HgWebDirStepByStep.

Continuous Integration

Setting up a repository is the first step towards continuous integration, which is a set of software practices that have emerged from eXtreme Programming (XP). The principles are clearly described on Wikipedia ( http://en.wikipedia.org/wiki/Continuous_integration#The_Practices), and define a way to make sure the software is easy to build, test, and deliver.

Let's summarize these practices in our egg-based application environment, using zc.buildout and Mercurial:

  • Maintain a code repository: This is done by Mercurial.

  • Automate the build: zc.buildout fulfills this need, as we have seen in the previous chapter.

  • Make your build self-testing: zc.buildout provides a way to launch a test campaign over the whole software.

  • Everyone commits every day: Mercurial provides the tool for the developers to commit changes often. But this is more a developer behavior. People should commit as often as possible, as long as it doesn't break the build.

  • Every commit should be built: Every time a change is made, the software should be built again and all tests run to make sure there are no regressions introduced. If such a problem occurs, a mail should be sent to warn the developers. This is not yet covered in this chapter.

  • Keep the build fast: This is not a real problem for Python applications, since the compilation step is not needed most of the time. In any case, when the software is built two times in a row, the second pass should be way faster.

  • Test in a staging environment that is a clone of the production environment: It is important to be able to test the software on all production environments. This is not yet covered in this chapter.

  • Make it easy to get the latest deliverables: zc.buildout provides a simple way to bundle the deliverables in archives.

  • Everyone can see the result of the latest build: The system should provide feedback on builds. This is not yet covered in this chapter.

Using these practices raises the code quality through early discovery of problems, whether those problems are related to the code or are specific to a target platform.

Furthermore, having an automated system to build and re-launch tests makes the developers' lives easier, since they will not have to re-launch an exhaustive set of tests themselves.

Finally, such rules will make the developers more responsible for what they commit. Checking in broken code will generate feedback seen by everyone.

The only parts that are not yet covered in our environment are:

  • Building the system on every commit

  • Building the system on target systems

  • Providing feedback on the latest builds

This can be covered with Buildbot, a piece of software that automates builds.

Buildbot

Buildbot ( http://buildbot.net/trac) is software written in Python that automates the compile and test cycles for any kind of software projects. It is configurable in a way that every change made on a source code repository generates some builds and launches some tests, and then provides some feedback:

This tool is used, for instance, by Python for the core development, and can be seen at http://www.python.org/dev/buildbot/stable/ (don't forget the last "/").

Each column corresponds to a build composed of steps and is associated with some build slaves. The whole system is driven by the build master:

  • The build master centralizes and drives everything.

  • A build is a sequence of steps used to build an application and run tests over it.

  • A step is an atomic command, for example:

    • Check out the files of a project.

    • Build the application.

    • Run tests.

A build slave is a machine that is in charge of running a build. It can be located anywhere as long as it can reach the build master.
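The notions of build and step can be sketched as follows. This is not Buildbot's API, only a minimal illustration of a build as a sequence of commands that stops at the first failure; the `run_build` function is hypothetical, and the steps are placeholder Python one-liners standing in for real checkout, build, and test commands.

```python
import subprocess
import sys


def run_build(steps):
    """Run (name, argv) steps in order; stop at the first failing step.

    Returns a list of (name, succeeded) tuples, one per step that ran.
    """
    results = []
    for name, argv in steps:
        completed = subprocess.run(argv, capture_output=True, text=True)
        succeeded = completed.returncode == 0
        results.append((name, succeeded))
        if not succeeded:
            break
    return results


# Placeholder commands: each step is just a tiny Python process.
steps = [
    ("checkout", [sys.executable, "-c", "print('files checked out')"]),
    ("build", [sys.executable, "-c", "print('application built')"]),
    ("test", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("package", [sys.executable, "-c", "print('never reached')"]),
]
report = run_build(steps)
for name, succeeded in report:
    print(name, "ok" if succeeded else "FAILED")
```

In this picture, the build slave is the machine that executes `run_build`, and the build master is the process that decides when to trigger it and collects the report.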

Installing Buildbot

Buildbot installation is mainly based on installing a series of required software, and on creating a Python script to configure Buildbot. This is described in the User Manual available online at http://buildbot.net/trac/wiki/UserManual.

Another option is to use the collective.buildbot project, which provides a zc.buildout-based configuration tool. In other words, it makes it possible to define a Buildbot in a configuration file, without having to take care of either installing all required software, or writing any Python script.

Let's create such a buildout in our server environment, beside the repositories, in a dedicated folder:

$ cd /home/mercurial/atomisator
$ mkdir buildbot
$ cd buildbot
$ wget http://ziade.org/bootstrap.py

A buildout.cfg file is then added in the buildbot folder with this content:

[buildout]

parts =
    buildmaster
    linux
    atomisator

[buildmaster]
recipe = collective.buildbot:master 

project-name = Atomisator project buildbot
project-url = http://atomisator.ziade.org

port = 8999
wport = 9000
url = http://atomisator.ziade.org/buildbot

slaves = 
    linux       ty54ddf32


[linux]
recipe = collective.buildbot:slave
host = localhost
port = ${buildmaster:port}
password = ty54ddf32

[atomisator]
recipe = collective.buildbot:project
slave-names = linux
repository=http://hg-atomisator.ziade.org/unstable
vcs = hg

build-sequence = 
    ./build

test-sequence = 
    buildout/bin/nosetests  

email-notification-sender = tarek@ziade.org
email-notification-recipient = tarek@ziade.org

[poller]
recipe = collective.buildbot:poller
repository=http://hg-atomisator.ziade.org/unstable
vcs = hg
user=anonymous

This defines a build master, together with a build slave and an Atomisator project. The project defines a build script to be called and a test sequence that runs the test runner located in the project buildout.
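Two details of this file are easy to get wrong: the slave's port is taken from the master through the ${buildmaster:port} substitution, and the slave's password has to match the one declared in the master's slaves option. The following sketch checks both on an excerpt of the file. It relies on the standard configparser module, whose ExtendedInterpolation class happens to use the same ${section:option} syntax; zc.buildout has its own parser, so this is an approximation for illustration only.

```python
import configparser

BUILDOUT_CFG = """\
[buildmaster]
recipe = collective.buildbot:master
port = 8999
wport = 9000
slaves =
    linux       ty54ddf32

[linux]
recipe = collective.buildbot:slave
host = localhost
port = ${buildmaster:port}
password = ty54ddf32
"""

parser = configparser.ConfigParser(
    delimiters=("=",),
    interpolation=configparser.ExtendedInterpolation(),
)
parser.read_string(BUILDOUT_CFG)

# Each non-empty line of "slaves" is "<name> <password>".
slave_lines = parser["buildmaster"]["slaves"].splitlines()
slaves = dict(line.split() for line in slave_lines if line.strip())

slave_port = parser["linux"]["port"]  # resolved from [buildmaster]
password_matches = slaves["linux"] == parser["linux"]["password"]

print(slave_port, password_matches)
```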

Note

Complementary information on options can be found at PyPI:

http://pypi.python.org/pypi/collective.buildbot

The build script referenced in the build-sequence is a script that has to be added in the root of the repository, with this content:

#!/bin/sh
cd buildout
python bootstrap.py
bin/buildout -v

Do not forget to set the execution flag before it is pushed:

$ chmod +x build
$ hg add build
$ hg commit -m "added build script"

From there let's run the buildout:

$ python bootstrap.py
$ bin/buildout -v

Note

bootstrap.py is a small script that makes sure your system meets the requirements to build the buildbot.

You should get two scripts in the bin folder: one that launches the build master and one for the build slave.

They are named after the buildout sections. Let's run them:

$ bin/buildmaster.py start
Following twistd.log until startup finished..
2008-04-03 16:06:49+0200 [-] Log opened.
...
2008-04-03 16:06:50+0200 [-] configuration update complete
The buildmaster appears to have (re)started correctly.
$ bin/linux.py start
Following twistd.log until startup finished..
The buildslave appears to have (re)started correctly.

From there, you should be able to reach the Buildbot in your web browser at http://localhost:9000, and force a build by clicking on the atomisator link to check that everything works.

Note

There's a good Buildbot manual available online here: http://buildbot.net/repos/release/docs/buildbot.html

Hooking Buildbot and Mercurial

There's one more step to finish this setup: that is hooking the repository commit events with the Buildbot, so it is automatically rebuilt every time someone pushes a file. This is done by the hgbuildbot.py script that comes with Buildbot.

To make it available as a command, simply run an easy_install over pbp.buildbotenv. That will install the script and make sure Buildbot and Twisted are installed as well:

$ easy_install pbp.buildbotenv

The hook is added in the unstable hgrc file in the .hg folder at /path/to/unstable/.hg/hgrc:

[web]
style = gitweb
description = Unstable branch
contact = Tarek <tarek@ziade.org>
push_ssl = false
allow_push = *
[hooks]
changegroup.buildbot = python:buildbot.changes.hgbuildbot.hook
[hgbuildbot]
master = atomisator.ziade.org:8999

The hooks section links the hgbuildbot script, and the hgbuildbot section defines where the master server and the slave port are located.

Hooking Apache and Buildbot

From there, a rewrite rule can be added in Apache, to make the Buildbot available without calling the specific 9000 port.

The simplest way is to create a specific virtual host for it and add it into your Apache configuration file collection:

<VirtualHost *:80>
  ServerName atomisator-buildbot.ziade.org
  CustomLog /var/log/apache2/bot-access_log combined
  ErrorLog  /var/log/apache2/bot-error.org.log

  RewriteEngine On
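  # [P] proxies the request internally (requires mod_proxy and mod_proxy_http);
  # without it, clients would be redirected to their own localhost.
  RewriteRule ^/(.*) http://localhost:9000/$1 [P]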
</VirtualHost>

Summary

We have learned the following things in this chapter:

  • The difference between the centralized and distributed version control systems

  • How to use Mercurial, which is a great distributed version control system

  • How to set up and use a multiple repository strategy

  • What continuous integration is

  • How to set up Buildbot together with Mercurial, in order to provide continuous integration

The next chapter will explain how to manage the software life-cycle using an iterative and incremental approach.