Chapter 18. Packages – The Quick Python Book

Chapter 18. Packages

This chapter covers

  • Defining a package
  • Creating a simple package
  • Exploring a concrete example
  • Using the __all__ attribute
  • Using packages properly

Modules make reusing small chunks of code easy. The problem comes when the project grows and the code you want to reload outgrows, either physically or logically, what would fit into a single file. If having one giant module file is an unsatisfactory solution, having a host of little unconnected modules isn’t much better. The answer to this problem is to combine related modules into a package.

18.1. What is a package?

A module is a file containing code. A module defines a group of usually related Python functions or other objects. The name of the module is derived from the name of the file.

When you understand modules, packages are easy, because a package is a directory containing code and possibly further subdirectories. A package contains a group of usually related code files (modules). The name of the package is derived from the name of the main package directory.

Packages are a natural extension of the module concept and are designed to handle very large projects. Just as modules group related functions, classes, and variables, packages group related modules.

18.2. A first example

To see how this might work in practice, let’s sketch a design layout for a type of project that by nature is very large—a generalized mathematics package, along the lines of Mathematica, Maple, or MATLAB. Maple, for example, consists of thousands of files, and some sort of hierarchical structure is vital to keeping such a project ordered. We’ll call our project as a whole mathproj.

We can organize such a project in many ways, but a reasonable design splits the project into two parts: ui, consisting of the user interface elements, and comp, the computational elements. Within comp, it may make sense to further segment the computational aspect into symbolic (real and complex symbolic computation, such as high school algebra) and numeric (real and complex numerical computation, such as numerical integration). It may then make sense to have a file in both the symbolic and numeric parts of the project.

The file in the numeric part of the project defines pi as

pi = 3.141592

whereas the file in the symbolic part of the project defines pi as

class PiClass:
def __str__(self):
return "PI"
pi = PiClass()

This means that a name like pi can be used in (and imported from) two different files named, as shown in figure 18.1.

Figure 18.1. Organizing a math package

The symbolic file defines pi as an abstract Python object, the sole instance of the PiClass class. As the system is developed, various operations can be implemented in this class, which return symbolic rather than numeric results.

There is a natural mapping from this design structure to a directory structure. The top-level directory of the project, called mathproj, contains subdirectories ui and comp; comp in turn contains subdirectories symbolic and numeric; and each of symbolic and numeric contains its own file.

Given this directory structure, and assuming that the root mathproj directory is installed somewhere in the Python search path, Python code both inside and outside the mathproj package can access the two variants of pi as mathproj.symbolic.constants.pi and mathproj.numeric.constants.pi. In other words, the Python name for an item in the package is a reflection of the directory pathname to the file containing that item.

That’s what packages are all about. They’re ways of organizing very large collections of Python code into coherent wholes, by allowing the code to be split among different files and directories and imposing a module/submodule naming scheme based on the directory structure of the package files. Unfortunately, all isn’t this simple in practice because details intrude to make their use more complex than their theory. The practical aspects of packages are the basis for the remainder of this chapter.

18.3. A concrete example

The rest of this chapter will use a running example to illustrate the inner workings of the package mechanism (see figure 18.2). Filenames and paths are shown in plain text, to avoid confusion as to whether we’re talking about a file/directory or the module/package defined by that file/directory. The files we’ll be using in our example package are shown in listings 18.1 through 18.6.

Figure 18.2. Example package

Listing 18.1. File mathproj/
print("Hello from mathproj init")
__all__ = ['comp']
version = 1.03
Listing 18.2. File mathproj/comp/
__all__ = ['c1']
print("Hello from mathproj.comp init")

Listing 18.3. File mathproj/comp/
x = 1.00
Listing 18.4. File mathproj/comp/numeric/
print("Hello from numeric init")
Listing 18.5. File mathproj/comp/numeric/
from mathproj import version
from mathproj.comp import c1
from mathproj.comp.numeric.n2 import h
def g():
print("version is", version)
Listing 18.6. File mathproj/comp/numeric/
def h():
return "Called function h in module n2"

For the purposes of the examples in this chapter, we’ll assume that you’ve created these files in a mathproj directory that’s on the Python search path. (It’s sufficient to ensure that the current working directory for Python is the directory containing mathproj when executing these examples.)



In most of the examples in this book, it’s not necessary to start up a new Python shell for each example. You can usually execute them in a Python shell you’ve used for previous examples and still get the results shown. This isn’t true for the examples in this chapter, because the Python namespace must be clean (unmodified by previous import statements) for the examples to work properly. If you do run the examples that follow, please ensure that you run each separate example in its own shell. In IDLE, this requires exiting and restarting the program, not just closing and reopening its Shell window.


18.3.1. Basic use of the mathproj package

Before getting into the details of packages, let’s look at accessing items contained in the mathproj package. Start up a new Python shell, and do the following:

>>> import mathproj
Hello from mathproj init

If all goes well, you should get another input prompt and no error messages. As well, the message "Hello from mathproj init" should be printed to the screen, by code in the mathproj/ file. We’ll talk more about files in a bit; for now, all you need to know is that they’re run automatically whenever a package is first loaded.

The mathproj/ file assigns 1.03 to the variable version. version is in the scope of the mathproj package namespace, and after it’s created, you can see it via mathproj, even from outside the mathproj/ file:

>>> mathproj.version

In use, packages can look a lot like modules; they can provide access to objects defined within them via attributes. This isn’t surprising, because packages are a generalization of modules.

18.3.2. Loading subpackages and submodules

Now, let’s start looking at how the various files defined in the mathproj package interact with one another. We’ll do this by invoking the function g defined in the file mathproj/comp/numeric/ The first obvious question is, has this module been loaded? We have already loaded mathproj, but what about its subpackage? Let’s see if it’s known to Python:

>>> mathproj.comp.numeric.n1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'comp'

In other words, loading the top-level module of a package isn’t enough to load all the submodules. This is in keeping with Python’s philosophy that it shouldn’t do things behind your back. Clarity is more important than conciseness.

This is simple enough to overcome. We import the module of interest and then execute the function g in that module:

>>> import mathproj.comp.numeric.n1
Hello from mathproj.comp init
Hello from numeric init
>>> mathproj.comp.numeric.n1.g()
version is 1.03
Called function h in module n2

Notice, however, that the lines beginning with Hello are printed out as a side effect of loading mathproj.comp.numeric.n1. These two lines are printed out by print statements in the files in mathproj/comp and mathproj/comp/numeric. In other words, before Python can import mathproj.comp.numeric.n1, it first has to import mathproj.comp and then mathproj.comp.numeric. Whenever a package is first imported, its associated file is executed, resulting in the Hello lines. To confirm that both mathproj.comp and mathproj.comp.numeric are imported as part of the process of importing mathproj.comp.numeric.n1, we can check to see that mathproj.comp and mathproj.comp.numeric are now known to the Python session:

>>> mathproj.comp
<module 'mathproj.comp' from 'mathproj/comp/'>
>>> mathproj.comp.numeric
<module 'mathproj.comp.numeric' from 'mathproj/comp/numeric/'>

18.3.3. import statements within packages

Files within a package don’t automatically have access to objects defined in other files in the same package. As in outside modules, you must use import statements to explicitly access objects from other package files. To see how this works in practice, look back at the n1 subpackage. The code contained in is

from mathproj import version
from mathproj.comp import c1
from mathproj.comp.numeric.n2 import h
def g():
print "version is", version
print h()

g makes use of both versions from the top-level mathproj package and the function h from the n2 module; hence, the module containing g must import both version and h to make them accessible. We import version as we would in an import statement from outside the mathproj package, by saying from mathproj import version. In this example, we explicitly import h into the code by saying from mathproj.comp.numeric.n2 import h, and this will work in any file—explicit imports of package files are always allowed. But because is in the same directory as, we can also use a relative import by prepending a single dot to the submodule name. In other words, we can say

from .n2 import h

as the third line in, and it works fine.

You can add more dots to move up more levels in the package hierarchy, and you can add module names. Instead of

from mathproj import version
from mathproj.comp import c1
from mathproj.comp.numeric.n2 import h

we could also have written the imports of as

from ... import version
from .. import c1
from . n2 import h

Relative imports can be handy and quick to type, but be aware that they’re relative to the module’s __name__ property. Therefore any module being executed as the main module, and thus having an __name__ of __main__, can’t use relative imports.

18.3.4. files in packages

You’ll have noticed that all the directories in our package—mathproj, mathproj/comp, and mathproj/numeric—contain a file called An file serves two purposes:

  • It’s automatically executed by Python the first time a package or subpackage is loaded. This permits whatever package initialization you may desire. Python requires that a directory contain an file before it can be recognized as a package. This prevents directories containing miscellaneous Python code from being accidentally imported as if they defined a package.
  • The second point is probably the more important. For many packages, you won’t need to put anything in the package’s file—just make sure an empty file is present.

18.4. The __all__ attribute

If you look back at the various files defined in mathproj, you’ll notice that some of them define an attribute called __all__. This has to do with execution of statements of the form from ... import *, and it requires explanation.

Generally speaking, we would hope that if outside code executed the statement from mathproj import *, it would import all nonprivate names from mathproj. In practice, life is more difficult. The primary problem is that some operating systems have an ambiguous definition of case when it comes to filenames. Microsoft Windows 95/98 is particularly bad in this regard, but it isn’t the only villain. Because objects in packages can be defined by files or directories, this leads to ambiguity as to exactly under what name a subpackage might be imported. If we say from mathproj import *, will comp be imported as comp, Comp, or COMP? If we were to rely only on the name as reported by the operating system, the results might be unpredictable.

There’s no good solution to this. It’s an inherent problem caused by poor OS design. As the best possible fix, the __all__ attribute was introduced. If present in an file, __all__ should give a list of strings, defining those names that are to be imported when a from ... import * is executed on that particular package. If __all__ isn’t present, then from ... import * on the given package does nothing. Because case in a text file is always meaningful, the names under which objects are imported isn’t ambiguous, and if the OS thinks that comp is the same as COMP, that’s its problem.

To see this in action, fire up Python again, and try the following:

>>> from mathproj import *
Hello from mathproj init
Hello from mathproj.comp init

The __all__ attribute in mathproj/ contains a single entry, comp, and the import statement imports only comp. It’s easy enough to check that comp is now known to the Python session:

>>> comp
<module 'mathproj.comp' from 'mathproj/comp/'>

But note that there’s no recursive importing of names with a from ... import * statement. The __all__ attribute for the comp package contains c1, but c1 isn’t magically loaded by our from mathproj import * statement:

>>> c1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'c1' is not defined

To insert names from mathproj.comp we must, again, do an explicit import:

>>> from mathproj.comp import c1
>>> c1
<module 'mathproj.comp.c1' from 'mathproj/comp/'>

18.5. Proper use of packages

Most of your packages shouldn’t be as structurally complex as these examples imply. The package mechanism allows wide latitude in the complexity and nesting of your package design. It’s obvious that very complex packages can be built, but it isn’t obvious that they should be built.

Here are a couple of suggestions that are appropriate in most circumstances:

  • Packages shouldn’t use deeply nested directory structures. Except for absolutely huge collections of code, there should be no need to do so. For most packages, a single top-level directory is all that’s needed. A two-level hierarchy should be able to effectively handle all but a few of the rest. As written in the Zen of Python, by Tim Peters (see the appendix), “Flat is better than nested.”
  • Although you can use the __all__ attribute to hide names from from ... import * by not listing those names, this is probably not a good idea, because it’s inconsistent. If you want to hide names, make them private by prefacing them with an underscore.

18.6. Summary

Packages let you create libraries of code that span multiple files and directories, which allows better organization of large collections of code than single modules would allow. Although using packages is useful, you should be wary of nesting directories in your packages more than one or two levels deep, unless you have a very large and complex library.