Chapter 22. Moving from Python 2 to Python 3 – The Quick Python Book


This chapter covers

  • Using Python 2.6 for preliminary testing
  • Converting using 2to3
  • Testing and common pitfalls
  • Using the same code base for Python 2 and 3

We’ve dealt with only the syntax and features of Python 3.x so far in this book. That was deliberate, because we feel that Python 3.x is a distinct improvement over early versions of Python and it’s where future development should occur. On the other hand, it’s not always possible to make a clean break from the past. For the foreseeable future, many situations will call for dealing with legacy Python 2.x code. This chapter discusses how you can migrate older Python 2.x code to Python 3.x.

22.1. Porting from 2 to 3

Several changes from Python 2.x to 3.x broke compatibility between the two versions. Some of them were obvious and easy to fix, such as the change of the print statement to the print function, or the change of the input function to behave the way the old raw_input function did. Other changes were more subtle and posed sneakier problems, to name a few: strings became Unicode by default, with a new bytes type to hold unencoded byte values; functions like range and a dictionary's keys method return iterators and views instead of lists; exception handling changed; and a set literal syntax was added.

Table 22.1 illustrates some of the representative differences between Python 2.x and 3.x code.

Table 22.1. Python 2.x vs. Python 3.x code

Python 2.x                           Python 3.x
-----------------------------------  ---------------------------------------------
raw_input(prompt)                    input(prompt)
print x (no parentheses)             print(x)
str type and unicode type            bytes type (sequence of bytes) and str type
                                       (all strings are Unicode)
1L (long integer data type)          1 (ints are unbounded; no separate long type)
an_iterator.next()                   next(an_iterator)
try: ... except Exception, e:        try: ... except Exception as e:
StandardError class                  Exception class
1/2 -> 0 (integer division)          1/2 -> 0.5 (true division), 1//2 -> 0
my_dict.keys() (and many others)     my_dict.keys() (and others) return a dynamic
  return lists                         view that changes if the underlying object
                                       changes

Although these differences aren’t in themselves that large, taken together they mean that virtually no legacy Python 2.x code can run on Python 3.x unchanged.
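Several rows of table 22.1 can be checked directly in a Python 3 interpreter. The following is a minimal sketch for Python 3 only. The variable names are illustrative and are not taken from the chapter.

```python
# Python 3 only: each group of assertions checks one row of table 22.1.

# Division: / is true division, // is floor division.
assert 1 / 2 == 0.5
assert 1 // 2 == 0

# There is no separate long type; int handles arbitrarily large values.
big = 2 ** 100
assert type(big) is int

# str holds Unicode text; encoded bytes are a separate type.
text = "caf\u00e9"
raw = text.encode("utf-8")
assert isinstance(text, str) and isinstance(raw, bytes)
assert len(text) == 4 and len(raw) == 5  # the accented letter takes 2 bytes

# Iterators are advanced with the built-in next(), not a .next() method.
an_iterator = iter([10, 20])
assert next(an_iterator) == 10

# dict.keys() returns a view that reflects later changes to the dict.
my_dict = {"a": 1}
keys = my_dict.keys()
my_dict["b"] = 2
assert sorted(keys) == ["a", "b"]

# Exceptions are bound to a name with "as".
try:
    raise ValueError("bad value")
except Exception as e:
    caught = str(e)
assert caught == "bad value"

print("all table 22.1 checks passed")
```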

22.1.1. Steps in porting from Python 2.x to 3.x

To help you make the transition from 2 to 3, the core developers of Python offer a clear set of steps to follow to migrate code to the new version:

  1. Make sure the existing code has adequate test coverage. The change from Python 2.x to Python 3 isn't huge, but there are enough subtle incompatibilities that code will need to be tested to make sure it still behaves as it's supposed to. Complete test coverage isn't just recommended; it's necessary.
  2. Test the code using Python 2.6 and the -3 command-line switch to find and remove obvious compatibility problems. Running code under Python 2.6 (or later) using the -3 switch will point out some obvious issues that should be fixed before going on.
  3. Run code through the 2to3 conversion tool to create a new version of the code with several fixes automatically applied. The 2to3 tool will automatically fix the straightforward incompatibilities, like print and raw_input, and it will flag some others that it can’t fix.
  4. Run the tests on the new code using Python 3.x, and fix the failures. Running the code’s tests is likely to yield failures. The tests may well fail to run at all without fixing both the code and the tests. Fix and test until all tests pass.

In general, the first step, complete test coverage, is the biggest stumbling block for most projects. But the time you spend adding tests will continue to pay off long after the conversion is complete.
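The following sketch shows the kind of test that protects a migration. It uses the standard unittest module to pin down the quarters calculation from the example script later in this chapter. The count_quarters helper is hypothetical; it stands in for that one line of the script.

```python
import unittest


def count_quarters(cents):
    """Hypothetical helper: number of whole quarters in an amount of cents."""
    # // gives the same whole-number result on Python 2 and Python 3;
    # a plain / would return 2.92 for 73 cents on Python 3.
    return cents // 25


class CountQuartersTest(unittest.TestCase):
    def test_whole_quarters(self):
        self.assertEqual(count_quarters(73), 2)

    def test_exact_multiple(self):
        self.assertEqual(count_quarters(100), 4)

    def test_result_is_an_int(self):
        # Fails if someone switches to true division, which returns a float.
        self.assertIsInstance(count_quarters(73), int)
```

A test that checks the type of the result, as well as its value, catches behavior changes that would otherwise go unnoticed, such as the division change. The suite can be run with `python -m unittest` followed by the module name.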

22.2. Testing with Python 2.6 and -3

Python 2.6 and later 2.x versions have a -3 command-line switch that reports obvious incompatibilities in your code. In listing 22.1, we consider a small Python 2.x script with several elements that need to be changed to run correctly on Python 3.

Listing 22.1. File convert2_3.py
""" example of conversion from Python 2.x to Python 3

"""

def write_file(filename, version, data):
    try:
        outfile = open(filename, "wb")
        outfile.write("%s\n" % (version))
        outfile.write(data)
        outfile.close()

    except StandardError, e:
        print e

def read_file(filename):
    infile = open(filename, "rb")
    file_version = infile.readline()
    data = infile.read()
    major_version = int(file_version[0])
    minor_version = int(file_version[2])

    if major_version <> 1 or minor_version > 5:
        raise Exception, "Wrong file version"

    infile.close()
    return file_version, data

if __name__ == "__main__":
    version = "1.1"
    filename = raw_input("Please enter a filename: ")
    write_file(filename, version, "this is test data")
    print "File created, reading data from file"
    new_version, data = read_file(filename)
    cents = 73L
    quarters = cents / 25
    print "%d cents contains %d quarters" % (cents, quarters)

    new_dict = {}
    if not new_dict.has_key(new_version):
        new_dict[new_version] = filename

This code doesn't do much of anything, but it does contain a lot of fairly common bits of code. It gets a string from the user, writes to and reads from a file, checks values read from the file, does integer arithmetic, uses a dictionary, and handles errors. Running this code using Python 2.6 with the -3 switch generates deprecation warnings about the use of <>, classic integer division, and dict.has_key:

doc@mac:~$ python2.6 -3 convert2_3.py
convert2_3.py:22: DeprecationWarning: <> not supported in 3.x; use !=
  if major_version <> 1 or minor_version > 5:
Please enter a filename: testfile
File created, reading data from file
convert2_3.py:35: DeprecationWarning: classic int division
  quarters = cents / 25
73 cents contains 2 quarters
convert2_3.py:39: DeprecationWarning: dict.has_key() not supported in 3.x; use the in operator
  if not new_dict.has_key(new_version):

Obviously, some other fixes need to be made. Both print and raw_input need to be changed, and the exception handling needs to be tweaked, but those issues can be taken care of automatically. Although some of the warnings generated by the -3 switch can also be corrected by the conversion tool, it’s recommended that you fix all of its warnings before taking the next step. In this case, that means changing these lines:

  • In the read_file function, the <> should be replaced by !=.
  • cents = 73L needs to become 73, because Python 3.x has no separate long integer type.
  • cents / 25 needs to be changed to cents // 25 to return an integer number of quarters.
  • if not new_dict.has_key(new_version): needs to be rephrased to if new_version not in new_dict: to eliminate the use of has_key.

After we make those changes, the code should run without warnings.
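For reference, here are the four revised lines from the bullet list, in context. The values assigned to major_version, minor_version, new_version, and filename are stand-ins for what the script reads from the file and from the user; they are added here only so that the snippet runs on its own. Written this way, these lines behave the same on Python 2.6 and later and on Python 3.

```python
# Stand-in values; in listing 22.1 these come from the file and the user.
major_version, minor_version = 1, 1
new_version = "1.1\n"
filename = "testfile"

# <> becomes !=
wrong_version = major_version != 1 or minor_version > 5

# 73L becomes 73: there is no separate long integer type
cents = 73

# / becomes // so the result is a whole number of quarters
quarters = cents // 25

# has_key() becomes the in operator
new_dict = {}
if new_version not in new_dict:
    new_dict[new_version] = filename
```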

22.3. Using 2to3 to convert the code

After you’ve fixed any warnings from running the code with the -3 parameter under Python 2.6 or higher, the next step is to run the 2to3 tool on it to produce a version for Python 3. 2to3 will make a number of automatic fixes, things like changing raw_input to input, print to print(), and so on, but it’s unlikely that the converted code will run without error.

The 2to3 tool takes either a file or a directory to convert and prints a diff, in patch format, of all the changes needed to convert the code to Python 3. If it can't fix a problem automatically, it prints a warning below the diff for that file. By default, 2to3 runs a standard set of fixers, and more can be enabled; running 2to3 -l (that's a lowercase “el”) lists the available fixers. To exclude a particular fixer, use the -x option followed by the fixer name. In the following example, the has_key fixer is turned off:

doc@mac:~$ 2to3 -x has_key convert2_3.py

The -f option turns on fixers explicitly: -f all enables all the default fixers, whereas -f followed by a fixer name enables only that fixer. To enable only the has_key fixer, we use

doc@mac:~$ 2to3 -f has_key convert2_3.py

To enable all the default fixers plus the idioms fixer, which isn't run by default, the command is

doc@mac:~$ 2to3 -f all -f idioms convert2_3.py

Running 2to3 on our sample file gives us the diff in listing 22.2.

Listing 22.2. File convert2_3.diff
--- convert2_3.py (original)
+++ convert2_3.py (refactored)
@@ -9,8 +9,8 @@
         outfile.write(data)
         outfile.close()
 
-    except StandardError, e:
-        print e
+    except Exception as e:
+        print(e)
 
 def read_file(filename):
     infile = open(filename, "rb")
@@ -19,24 +19,24 @@
     major_version = int(file_version[0])
     minor_version = int(file_version[2])
 
-    if major_version <> 1 or minor_version > 5:
-        raise Exception, "Wrong file version"
+    if major_version != 1 or minor_version > 5:
+        raise Exception("Wrong file version")
 
     infile.close()
     return file_version, data
 
 if __name__ == "__main__":
     version = "1.1"
-    filename = raw_input("Please enter a filename: ")
+    filename = input("Please enter a filename: ")
     write_file(filename, version, "this is test data")
-    print "File created, reading data from file"
+    print("File created, reading data from file")
     new_version, data = read_file(filename)
-    cents = 73L
+    cents = 73
     quarters = cents / 25
-    print "%d cents contains %d quarters" % (cents, quarters)
+    print("%d cents contains %d quarters" % (cents, quarters))
 
     new_dict = {}
-    if not new_dict.has_key(new_version):
+    if new_version not in new_dict:
         new_dict[new_version] = filename

We can use a patch tool to apply the changes to our source, or we can use 2to3's -w option to write the changes back to the file automatically while creating a backup copy of the original.

22.4. Testing and common problems

As mentioned previously, there’s little chance that the automatic conversions performed by the 2to3 program will be enough for the code to run correctly. In particular, places where the old code handled strings are likely to have problems, because Python 2.x doesn’t make a distinction between strings and sequences of raw bytes, and it’s next to impossible for any automated conversion tool to consistently make that distinction on its own.

Listing 22.3 is our converted version of listing 22.1, with notes indicating all of the problems that the automatic conversion didn't fix.

Listing 22.3. File convert2_3_converted.py

file_version is a bytes object, so int(file_version[0]) yields 49 (the character code for "1"), not 1. Dividing one int by another returns an int in Python 2.x but a float in Python 3.x, so quarters is now 2.92 instead of 2.
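Both annotated problems are easy to reproduce in isolation. In this sketch, io.BytesIO stands in for a file opened in "rb" mode. That substitution is an assumption made so the example runs without touching the disk. The sketch shows each pitfall and one way to avoid it.

```python
import io

# Stand-in for open(filename, "rb") on a file whose first line is "1.1".
infile = io.BytesIO(b"1.1\nthis is test data")
file_version = infile.readline()  # b'1.1\n', a bytes object

# Pitfall 1: indexing bytes yields an int (the character code), not a str.
assert file_version[0] == 49       # ord("1") == 49
assert int(file_version[0]) == 49  # so the version check misfires

# One fix: decode the header to text before indexing it.
text_version = file_version.decode("ascii")
assert int(text_version[0]) == 1 and int(text_version[2]) == 1

# Pitfall 2: / is true division in Python 3.
cents = 73
assert cents / 25 == 2.92  # a float, not 2
assert cents // 25 == 2    # floor division restores the old result
```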

When we run this file with Python 3, it fails with errors, mostly because of the difference between bytes and strings. As indicated in the annotations, the conversion doesn't catch several errors. Even when these errors are fixed and the code runs, it doesn't behave correctly: both the conversion of a single element of a bytes object and division behave differently in the two versions. If these problems occur in such a small script, you can imagine the possibilities for error in a larger application. The moral is pretty clear: if you're migrating code to Python 3.x, you must have good test coverage, and you must test extensively.
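One possible set of fixes for the bytes/str problems is sketched below for Python 3. It is not the chapter's own corrected listing. The sketch makes these assumptions:

- The version header is ASCII text.
- data is passed in as bytes.
- The trailing newline is stripped from the returned version string.

The try/except from write_file is omitted for brevity. with statements close the files, and a temporary directory stands in for the user-supplied filename.

```python
import os
import tempfile


def write_file(filename, version, data):
    # The file is opened in binary mode, so everything written must be bytes:
    # encode the text header explicitly; data is expected to be bytes already.
    with open(filename, "wb") as outfile:
        outfile.write(("%s\n" % version).encode("ascii"))
        outfile.write(data)


def read_file(filename):
    with open(filename, "rb") as infile:
        # Decode the header line so that indexing it yields characters, not ints.
        file_version = infile.readline().decode("ascii").rstrip("\n")
        data = infile.read()
    major_version = int(file_version[0])
    minor_version = int(file_version[2])
    if major_version != 1 or minor_version > 5:
        raise Exception("Wrong file version")
    return file_version, data


if __name__ == "__main__":
    # A throwaway directory stands in for the user-supplied filename.
    with tempfile.TemporaryDirectory() as tmpdir:
        filename = os.path.join(tmpdir, "testfile")
        write_file(filename, "1.1", b"this is test data")
        print(read_file(filename))  # ('1.1', b'this is test data')
    cents = 73
    quarters = cents // 25  # floor division keeps the Python 2 result
    print("%d cents contains %d quarters" % (cents, quarters))
```

This sketch keeps text and bytes separate at the file boundary. Text is encoded when written and decoded when read, so the rest of the code only ever sees str for the version and bytes for the data.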

22.5. Using the same code for 2 and 3

The core developers don’t expect that the same code base can be used on both Python 2.x and 3.x. Although Python is flexible enough that in some cases it may be possible to create code that can run on both Python 2.x and 3.x, it isn’t recommended.

It's possible to import several features of Python 3, like true division, into Python 2.x by using the __future__ module. But even if you imported every feature that __future__ offers, the differences in library structure, the distinction between strings and Unicode, and so on would all make for code that's hard to maintain and debug.
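The following sketch illustrates the __future__ approach described here. It is an illustration, not a recommendation to share one code base between versions. The code is intended to run unchanged on Python 2.6+ and on Python 3. The raw_input fallback is a common compatibility idiom, not something this chapter prescribes.

```python
from __future__ import division, print_function

# On Python 2, rebind input to raw_input so input() returns a plain string;
# on Python 3, raw_input doesn't exist, so the NameError is simply ignored.
try:
    input = raw_input
except NameError:
    pass

cents = 73
# With "division" imported, / is true division on Python 2 as well.
print(cents / 25)
print("%d cents contains %d quarters" % (cents, cents // 25))
```

Even in this tiny example, each version-specific difference needs its own workaround. That is one reason the core developers advise against this approach for larger programs.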

22.5.1. Using Python 2.5 or earlier

If attempting to use the same code for Python 3.x and Python 2.6 or higher isn't recommended, trying to do the same with any earlier version of Python borders on insanity. The differences between earlier versions of Python and Python 3 are large enough that you'd spend more time making the code run than designing and implementing an application.

22.5.2. Writing for Python 3.x and converting back

Another way of using one code base for both platforms may be to go in the opposite direction and write code for Python 3.x and convert it back to Python 2.x. As this book is being written, a 3to2.py tool is being developed, with the idea that it will be both easier and more reliable to convert back to 2.x, possibly even as part of the install process, if needed. Although this tool isn’t yet complete, it shows great promise for the future.

22.6. Summary

The incompatibility between Python 3 and previous versions is a major consideration when you're dealing with legacy code. Python provides both procedures and tools for moving code from Python 2.x to Python 3: the warnings available in Python 2.6 and higher and the 2to3 conversion tool, followed by ample testing. Migrating the code to Python 3 is probably the ideal long-term solution, but only if the code has adequate test coverage to ensure the new version functions properly.