
Chapter 2. Syntax Best Practices: Below the Class Level

The ability to write efficient syntax comes naturally with time. If you take a look back at your first program, you will probably agree with this. The right syntax will appear to your eyes as a good-looking piece of code, and the wrong syntax as something disturbing.

Besides the algorithms that are implemented and the architectural thinking behind your program, taking great care over how it is written weighs heavily on how it will evolve. Many programs are ditched and rewritten from scratch because of their obscure syntax, unclear APIs, or unconventional standards.

But Python has evolved a lot in the last few years. So if you were kidnapped for a while by your neighbor (a jealous guy from the local Ruby developers' user group) and kept away from the news, you will probably be astonished by its new features. From the earliest version to the current one (2.6 at this time), a lot of enhancements have been made to make the language clearer, cleaner, and easier to write. Python basics have not changed drastically, but the tools to play with them are now a lot more ergonomic.

This chapter presents the most important elements of modern syntax, and the tips on their usage:

  • List comprehensions

  • Iterators and generators

  • Descriptors and properties

  • Decorators

  • with and contextlib

Note

Code performance tips, such as those for speed improvement or reduced memory usage, are covered in Chapter 12.

If you need a reminder on Python syntax while reading this chapter, you can refer to the official documentation at http://docs.python.org.

List Comprehensions

As you probably know, writing a piece of code such as this is painful:

>>> numbers = range(10)
>>> size = len(numbers)
>>> evens = []
>>> i = 0
>>> while i < size:
...     if i % 2 == 0:
...         evens.append(i)
...     i += 1
... 
>>> evens
[0, 2, 4, 6, 8]

This may work for C, but it actually makes things slower for Python because:

  • It makes the interpreter work on each loop to determine what part of the sequence has to be changed.

  • It makes you keep a counter to track what element has to be treated.

A list comprehension is the correct answer to this pattern. It uses built-in features that automate parts of the previous syntax:

>>> [i for i in range(10) if i % 2 == 0]
[0, 2, 4, 6, 8]

Besides the fact that this writing is more efficient, it is way shorter and involves fewer elements. In a bigger program, this means fewer bugs and code that is easier to read and understand.

Another typical example of a Pythonic syntax is the usage of enumerate. This built-in function provides a convenient way to get an index when a sequence is used in a loop. For example, this piece of code:

>>> i = 0
>>> seq = ["one", "two", "three"]
>>> for element in seq:
...     seq[i] = '%d: %s' % (i, seq[i])
...     i += 1
... 
>>> seq
['0: one', '1: two', '2: three']

can be replaced by the following shorter code:

>>> seq = ["one", "two", "three"]
>>> for i, element in enumerate(seq):
...     seq[i] = '%d: %s' % (i, seq[i])
... 
>>> seq
['0: one', '1: two', '2: three']

and then refactored in a list comprehension like this:

>>> def _treatment(pos, element):
...     return '%d: %s' % (pos, element)
... 
>>> seq = ["one", "two", "three"]
>>> [_treatment(i, el) for i, el in enumerate(seq)]
['0: one', '1: two', '2: three']

This last version also makes it easy to vectorize the code, by sharing small functions that work over a single item of a sequence.

Note

What does a Pythonic syntax mean?

A Pythonic syntax is a syntax that uses the most efficient idioms for the small code patterns. This word can also apply to high-level matters such as libraries. In that case, the library will be considered Pythonic if it plays well with the Pythonic idioms. This term is used sometimes in the community to classify pieces of code, and a tentative definition can be found here: http://faassen.ntree.net/blog/view/weblog/2005/08/06/0.

Note

Every time a loop is run to massage the contents of a sequence, try to replace it with a list comprehension.

Iterators and Generators

An iterator is nothing more than a container object that implements the iterator protocol. It is based on two methods:

  • next, which returns the next item of the container

  • __iter__, which returns the iterator itself

Iterators can be created with a sequence using the iter built-in function, for example:

>>> i = iter('abc')
>>> i.next()
'a'
>>> i.next()
'b'
>>> i.next()
'c'
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

When the sequence is exhausted, a StopIteration exception is raised. It makes iterators compatible with loops since they catch this exception to stop cycling. To create a custom iterator, a class with a next method can be written, as long as it provides the special method __iter__ that returns an instance of the iterator:

>>> class MyIterator(object):
...     def __init__(self, step):
...         self.step = step
...     def next(self):
...         """Returns the next element."""
...         if self.step == 0:
...             raise StopIteration
...         self.step -= 1
...         return self.step
...     def __iter__(self):
...         """Returns the iterator itself."""
...         return self
... 
>>> for el in MyIterator(4):
...     print el
... 
3
2
1
0

Iterators themselves are a low-level feature and concept, and a program can live without them. But they provide the base for a much more interesting feature: generators.

Generators

Since Python 2.2, generators provide an elegant way to write simple and efficient code for functions that return a sequence of elements. Based on the yield statement, they allow you to pause a function and return an intermediate result. The function saves its execution context and can be resumed later if necessary.

For example (this is the example provided in PEP 255, which introduced generators), the Fibonacci series can be written with a generator:

>>> def fibonacci():
...     a, b = 0, 1
...     while True:
...         yield b
...         a, b = b, a + b
... 
>>> fib = fibonacci()
>>> fib.next()
1
>>> fib.next()
1
>>> fib.next()
2
>>> [fib.next() for i in range(10)] 
[3, 5, 8, 13, 21, 34, 55, 89, 144, 233]

This function returns a generator object, a special kind of iterator, which knows how to save the execution context. It can be called indefinitely, yielding the next element of the sequence each time. The syntax is concise, and the infinite nature of the algorithm no longer disturbs the readability of the code: it does not have to provide a way to make the function stoppable. In fact, it looks similar to how the series would be designed in pseudo-code.

Note

A PEP is a Python Enhancement Proposal. It is a document that proposes a change to Python, and a starting point for the community to discuss it. See PEP 1 for further information: http://www.python.org/dev/peps/pep-0001

In the community, generators are not used that often because developers are not used to thinking this way; they have been working with plain functions for years. Generators should be considered every time you deal with a function that returns a sequence or works in a loop. Returning the elements one at a time can improve the overall performance when they are passed to another function for further work.

In that case, the resources needed to work out one element are usually far smaller than the resources needed for the whole process, so they can be kept low, making the program more efficient. For instance, the Fibonacci sequence is infinite, and yet the generator that produces it does not require an infinite amount of memory to provide the values one at a time. A common use case is streaming data buffers with generators: they can be paused, resumed, and stopped by the third-party code that consumes the data, and all the data does not need to be loaded before the process starts.
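Here is a minimal sketch of that idea (the file name, the chunk size, and the handle function are arbitrary placeholders, not taken from the text): a generator streams a file in fixed-size buffers, so the whole content never sits in memory at once.

def read_in_chunks(stream, chunk_size=1024):
    """Yields the content of the stream chunk by chunk."""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield data

def handle(chunk):
    # placeholder treatment: just measure the chunk
    print len(chunk)

# only one chunk lives in memory at a time
for chunk in read_in_chunks(open('/etc/hosts')):
    handle(chunk)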

The tokenize module from the standard library, for instance, generates tokens out of a stream of text, reading it line by line, and returns them through an iterator that can be passed along to some processing:

>>> import tokenize
>>> reader = open('amina.py').next
>>> tokens = tokenize.generate_tokens(reader)
>>> tokens.next()
(1, 'from', (1, 0), (1, 4), 'from amina.quality import similarities\n')
>>> tokens.next()
(1, 'amina', (1, 5), (1, 10), 'from amina.quality import similarities\n')
>>> tokens.next()

Here, open provides the lines of the file one at a time, and generate_tokens processes them in a pipeline, doing additional work.

Generators can also help in reducing the complexity, and raising the efficiency, of data transformation algorithms that are based on several stages. Thinking of each stage as an iterator, and then combining them into a high-level function, is a great way to avoid one big, ugly, and unreadable function. Moreover, this can provide live feedback to the whole processing chain.

In the example below, each function defines a transformation over a sequence. They are then chained and applied. Each call processes one element and returns its result:

>>> def power(values):
...     for value in values:
...         print 'powering %s' % value
...         yield value
... 
>>> def adder(values):
...     for value in values:
...         print 'adding to %s' % value
...         if value % 2 == 0:
...             yield value + 3
...         else:
...             yield value + 2
... 
>>> elements = [1, 4, 7, 9, 12, 19]
>>> res = adder(power(elements))
>>> res.next()
powering 1
adding to 1
3
>>> res.next()
powering 4
adding to 4
7
>>> res.next()
powering 7
adding to 7
9

Note

Keep the code simple, not the data:

It is better to have a lot of simple iterable functions that work over sequences of values than a complex function that computes the result for one value at a time.

The latest feature added to generators in Python is the ability to interact with the code that calls them. yield becomes an expression, and a value can be passed to the generator through a new method called send:

>>> def psychologist():
...     print 'Please tell me your problems'
...     while True:
...         answer = (yield)
...         if answer is not None:
...             if answer.endswith('?'):
...                 print ("Don't ask yourself "
...                        "too many questions")
...             elif 'good' in answer:
...                 print "Ah that's good, go on"
...             elif 'bad' in answer:
...                 print "Don't be so negative"
... 
>>> free = psychologist()
>>> free.next()
Please tell me your problems
>>> free.send('I feel bad')
Don't be so negative
>>> free.send("Why I shouldn't ?")
Don't ask yourself too many questions
>>> free.send("ok then i should find what is good for me")
Ah that's good, go on

send acts like next, but makes yield return the value passed. The function can, therefore, change its behavior depending on the client code. Two other methods were added to complete this behavior: throw and close. They raise an error inside the generator:

  • throw allows the client code to send any kind of exception to be raised.

  • close acts in the same way, but raises a specific exception: GeneratorExit. In that case, the generator function must raise GeneratorExit again, or StopIteration.

Therefore, a typical template for a generator would look like the following:

>>> def my_generator():
...     try:
...         yield 'something' 
...     except ValueError:
...         yield 'dealing with the exception'
...     finally:
...         print "ok let's clean"
... 
>>> gen = my_generator()
>>> gen.next()
'something'
>>> gen.throw(ValueError('mean mean mean'))
'dealing with the exception'
>>> gen.close()
ok let's clean
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The finally clause, which was not allowed inside generators in previous versions, will run on any close or throw call that is not otherwise handled, and is the recommended place to do some cleanup. The GeneratorExit exception must not be swallowed in the generator because it is used by the interpreter to make sure the generator exits cleanly when close is called. If the generator keeps yielding after this exception has been raised, the interpreter raises a RuntimeError.
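To illustrate that rule, here is a small sketch (not taken from the original text) of a generator that wrongly swallows GeneratorExit; calling close on it makes the interpreter complain:

>>> def stubborn():
...     while True:
...         try:
...             yield 'working'
...         except GeneratorExit:
...             # wrong: the exception is swallowed and the
...             # generator keeps yielding values
...             pass
... 
>>> gen = stubborn()
>>> gen.next()
'working'
>>> gen.close()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: generator ignored GeneratorExit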

These three new methods make it possible to use generators to write coroutines.

Coroutines

A coroutine is a function that can be suspended and resumed, and can have multiple entry points. Some languages provide this feature natively such as Io ( http://iolanguage.com) or Lua ( http://www.lua.org). They allow the implementation of cooperative multitasking and pipelines. For example, each coroutine consumes or produces data, then pauses until other data are passed along.

Threading is an alternative way to run an interaction between pieces of code in Python. But threads need to take care of resource locking since they behave in a pre-emptive manner, whereas coroutines don't. Such code can become fairly complex to create and debug. Though generators are almost coroutines, the addition of send, throw, and close was precisely meant to provide a coroutine-like feature to the language.

PEP 342 ( http://www.python.org/dev/peps/pep-0342) that initiated the new behavior of generators also provides a full example on how to create a coroutine scheduler. The pattern is called Trampoline, and can be seen as a mediator between coroutines that produce and consume data. It works with a queue where coroutines are wired together.

The multitask module available at PyPI (install it with easy_install multitask) implements this pattern and can be used straightforwardly:

>>> import multitask
>>> import time
>>> def coroutine_1():
...     for i in range(3):
...         print 'c1'
...         yield i
... 
>>> def coroutine_2():
...     for i in range(3):
...         print 'c2'
...         yield i
... 
>>> multitask.add(coroutine_1())
>>> multitask.add(coroutine_2())
>>> multitask.run()
c1
c2
c1
c2
c1
c2

A classical example of cooperative work between coroutines is a server application that receives queries from multiple clients and, in a threaded design, would delegate each one to a new thread that responds to it. Implementing this pattern with coroutines is a matter of writing a coroutine (the server) that is in charge of receiving queries, and another one (the handler) for treating them. The first coroutine places a new handler call for each request on the trampoline.

The multitask package also provides convenient APIs to work with sockets, and an echo server, for example, can be written straightforwardly with it:

from __future__ import with_statement
from contextlib import closing
import socket
import multitask

def client_handler(sock):
    with closing(sock):
        while True:
            data = (yield multitask.recv(sock, 1024))
            if not data:
                break
            yield multitask.send(sock, data)


def echo_server(hostname, port):
    addrinfo = socket.getaddrinfo(hostname, port,
                                  socket.AF_UNSPEC,
                                  socket.SOCK_STREAM)
    (family, socktype, proto, 
     canonname, sockaddr) = addrinfo[0]

    with closing(socket.socket(family,
                               socktype, 
                               proto)) as sock:
        sock.setsockopt(socket.SOL_SOCKET, 
                        socket.SO_REUSEADDR, 1)
        sock.bind(sockaddr)
        sock.listen(5)
        while True:
            multitask.add(client_handler((
                     yield multitask.accept(sock))[0]))

if __name__ == '__main__':
    import sys

    hostname = None
    port = 1111

    if len(sys.argv) > 1:
        hostname = sys.argv[1]
    if len(sys.argv) > 2:
        port = int(sys.argv[2])

    multitask.add(echo_server(hostname, port))
    try:
        multitask.run()
    except KeyboardInterrupt:
        pass

Note

contextlib is discussed a bit later in this chapter.

Note

Another coroutine implementation:

greenlet ( http://codespeak.net/py/dist/greenlet.html) is another library that provides a good implementation of coroutines for Python, among other features.

Generator Expressions

Python provides a shortcut to write simple generators over a sequence. A syntax similar to list comprehensions can be used instead of a generator function built around yield; parentheses are used instead of brackets:

>>> iter = (x**2 for x in range(10) if x % 2 == 0)
>>> for el in iter:
...     print el
... 
0
4
16
36
64

These kinds of expressions are called generator expressions or genexps. They are used in the same way as list comprehensions to reduce the size of the code, but they yield elements one at a time like regular generators do, so the whole sequence is not computed ahead of time as it is with a list comprehension. They should be used whenever a generator is just a simple loop around a yield expression, or to replace a list comprehension that only needs to behave as an iterator.
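For instance, a generator expression can be passed directly to a function that consumes an iterable, and no intermediate list is built; here it feeds sum with the squares of the even numbers:

>>> sum(x**2 for x in range(10) if x % 2 == 0)
120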

The itertools Module

When iterators were added in Python, a new module was provided to implement common patterns. Since it is written in the C language, it provides the most efficient iterators. itertools covers many patterns, but the most interesting ones are islice, tee, and groupby.

islice: The Window Iterator

islice returns an iterator that works over a subgroup of a sequence. The following example reads lines from standard input, and yields the elements of each line starting from the fifth one, as long as the line has more than four elements:

>>> import itertools
>>> def starting_at_five():
...     value = raw_input().strip()
...     while value != '':
...         for el in itertools.islice(value.split(), 
...                                    4, None):
...             yield el
...         value = raw_input().strip()
... 
>>> iter = starting_at_five()
>>> iter.next()
one two three four five six
'five'
>>> iter.next()
'six'
>>> iter.next()
one two
one two three four five six
'five'
>>> iter.next()
'six'
>>> iter.next()
one
one two three four five six seven eight
'five'
>>> iter.next()
'six'
>>> iter.next()
'seven'
>>> iter.next()
'eight'

islice can be used every time you need to extract data located at a particular position in a stream. This can be a file in a special record-based format, for instance, or a stream that presents data encapsulated with metadata, like a SOAP envelope. In that case, islice can be seen as a window that slides over each line of data.

tee: The Back and Forth Iterator

An iterator consumes the sequence it works with: there is no turning back. tee provides a pattern to run several independent iterators over the same sequence. This makes it possible to go over the data again, using the information gathered during the first pass. For instance, reading the header of a file can provide information on its nature before running a process over the rest of it:

>>> import itertools
>>> def with_head(iterable, headsize=1):
...     a, b = itertools.tee(iterable)
...     return list(itertools.islice(a, headsize)), b
... 
>>> seq = range(1, 11)   # any sequence will do here
>>> with_head(seq)
([1], <itertools.tee object at 0x100c698>)
>>> with_head(seq, 4)
([1, 2, 3, 4], <itertools.tee object at 0x100c670>)

In this function, two iterators are generated with tee. The first one is used with islice to get the first headsize elements of the iteration and return them as a flat list. The second element returned is a fresh iterator that can be used to perform work over the whole sequence.

groupby: The uniq Iterator

This function works a little like the Unix command uniq. It is able to group the duplicate elements from an iterator, as long as they are adjacent. A key function can be given to groupby for it to compare the elements; otherwise, the elements themselves are compared.

An example use case for groupby is compressing data with run-length encoding (RLE). Each group of adjacent repeated characters of a string is replaced by the number of occurrences followed by the character itself. When the character occurs only once, 1 is used.

For example:

get uuuuuuuuuuuuuuuuuup

will be replaced by:

1g1e1t1 18u1p

Just a few lines are necessary with groupby to obtain RLE:

>>> from itertools import groupby
>>> def compress(data):
...     return ((len(list(group)), name)
...             for name, group in groupby(data))
... 
>>> def decompress(data):
...     return (car * size for size, car in data)
... 
>>> list(compress('get uuuuuuuuuuuuuuuuuup'))
[(1, 'g'), (1, 'e'), (1, 't'), (1, ' '), 
 (18, 'u'), (1, 'p')]
>>> compressed = compress('get uuuuuuuuuuuuuuuuuup')
>>> ''.join(decompress(compressed))
'get uuuuuuuuuuuuuuuuuup'

Note

Compression algorithms:

If you are interested in compression, consider the LZ77 algorithm. It is an enhanced version of RLE that looks for adjacent matching patterns instead of matching characters: http://en.wikipedia.org/wiki/LZ77.

groupby can be used every time a summary has to be done over data. In this matter, the built-in function sorted is very useful to make similar elements adjacent in the data passed to it, as shown in the sketch below.
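A minimal sketch of that combination (the data is arbitrary): sorting first makes equal elements adjacent, so groupby can then summarize them:

>>> from itertools import groupby
>>> data = ['apple', 'pear', 'apple', 'orange', 'pear', 'apple']
>>> [(name, len(list(group)))
...  for name, group in groupby(sorted(data))]
[('apple', 3), ('orange', 1), ('pear', 2)]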

Other Functions

http://docs.python.org/lib/itertools-functions.html will give you an exhaustive list of itertools functions that were not shown in this section. Each of them is presented there with equivalent pure Python code so that you can understand how it works; a few of them are also demonstrated right after the following list:

  • chain(*iterables): This makes an iterator that iterates over the first iterable, then proceeds to the next one, and so on.

  • count([n]): This returns an iterator that yields consecutive integers, like an endless range. It starts at 0, or at n when given.

  • cycle(iterable): This iterates over each element of the iterable, and then restarts. This repeats indefinitely.

  • dropwhile(predicate, iterable): This drops each element from the iterable, as long as the predicate returns True. When the predicate returns False, it starts to yield the rest of the elements.

  • ifilter(predicate, iterable): This is similar to the built-in function filter.

  • ifilterfalse(predicate, iterable): This is similar to ifilter, but will iterate on elements when the predicate is False.

  • imap(function, *iterables): This is similar to the built-in function map, but works over several iterables. It stops when the shortest iterable is exhausted.

  • izip(*iterables): This works like zip but returns an iterator.

  • repeat(object[, times]): This returns an iterator that returns object on each call. It runs times times, or indefinitely when times is not given.

  • starmap(function, iterable): This works like imap but passes the iterable element as a star argument to function. This is helpful when returned elements are tuples that can be passed as arguments to function.

  • takewhile(predicate, iterable): This returns the elements from the iterable, and stops when predicate turns False.
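A few of them in action, as a quick illustration (the values are arbitrary):

>>> from itertools import chain, izip, dropwhile
>>> list(chain('ab', 'cd'))
['a', 'b', 'c', 'd']
>>> list(izip('abc', [1, 2, 3]))
[('a', 1), ('b', 2), ('c', 3)]
>>> list(dropwhile(lambda x: x < 3, [1, 2, 3, 4, 1]))
[3, 4, 1]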

Decorators

Decorators were added in Python 2.4 to make function and method wrapping (a function that receives a function and returns an enhanced one) easier to read and understand. The original use case was to be able to define methods as class methods or static methods at the head of their definition. Before decorators, the syntax was:

>>> class WhatFor(object):
...     def it(cls):
...         print 'work with %s' % cls
...     it = classmethod(it)
...     def uncommon():
...         print 'I could be a global function'
...     uncommon = staticmethod(uncommon)
... 

This syntax was getting hard to read when the methods were big, or when several transformations over the methods were done.

The decorator syntax is lighter and easier to understand:

>>> class WhatFor(object):
...     @classmethod
...     def it(cls):
...         print 'work with %s' % cls
...     @staticmethod
...     def uncommon():
...         print 'I could be a global function'
... 
>>> this_is = WhatFor()
>>> this_is.it()
work with <class '__main__.WhatFor'>
>>> this_is.uncommon()
I could be a global function

When the decorators appeared, many developers in the community started to use them because they became an obvious way to implement some patterns. One of the original mail threads on this was initiated by Jim Hugunin, the IronPython lead developer.

The rest of this section presents how to write decorators, and provides a few examples.

How to Write a Decorator

There are many ways to write custom decorators, but the simplest and most readable way is to write a function that returns a sub-function that wraps the original function call.

A generic pattern is:

>>> def mydecorator(function):
...     def _mydecorator(*args, **kw):
...         # do some stuff before the real 
...         # function gets called 
...         res = function(*args, **kw)
...         # do some stuff after
...         return res
...     # returns the sub-function
...     return _mydecorator
... 

It is a good practice to give an explicit name to the sub-function like _mydecorator, instead of a generic name like wrapper, because it will be easier to read tracebacks when an error is raised in the chain: you will know you are dealing with the given decorator.

When the decorator itself needs arguments, a second level of wrapping has to be used:

def mydecorator(arg1, arg2):
    def _mydecorator(function):
        def __mydecorator(*args, **kw):
            # do some stuff before the real 
            # function gets called 
            res = function(*args, **kw)
            # do some stuff after
            return res
        # returns the sub-function
        return __mydecorator
    return _mydecorator

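It is then applied with its arguments on the decoration line itself. A minimal usage sketch (the argument values are arbitrary placeholders):

@mydecorator('some configuration', 'another value')
def my_function():
    # the outermost call receives the decorator arguments,
    # _mydecorator receives my_function, and __mydecorator
    # is what runs on each call to my_function()
    return 'hello'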
Since decorators are loaded by the interpreter when the module is first read, their usage should be limited to wrappers that can be generically applied. If a decorator is tied to the method's class or to the function's signature it enhances, it should be refactored into a regular callable to avoid complexity. In any case, when the decorators are dealing with APIs, a good practice is to group them in a module that is easy to maintain.

Note

A decorator should focus on arguments that the wrapped function or method receives and returns, and if needed, should limit its introspection work as much as possible.

The common patterns for decorators are:

  • Argument checking

  • Caching

  • Proxy

  • Context provider

Argument checking

Checking the arguments that a function receives or returns can be useful when it is executed in a specific context. For example, if a function is to be called through XML-RPC, Python will not be able to directly provide its full signature as statically-typed languages do. This feature is needed to provide introspection capabilities when the XML-RPC client asks for the function signatures.

Note

The XML-RPC protocol:

The XML-RPC protocol is a lightweight Remote Procedure Call protocol that uses XML over HTTP to encode its calls. It is often used instead of SOAP for simple client-server exchanges.

Unlike SOAP, which provides a page that lists all callable functions (WSDL), XML-RPC does not have a directory of available functions. An extension of the protocol that allows discovering the server API was proposed, and Python's xmlrpclib module implements it. (See http://docs.python.org/lib/serverproxy-objects.html .)

A decorator can provide this type of signature, and make sure that what goes in and out respects it:

>>> from itertools import izip
>>> rpc_info = {}
>>> def xmlrpc(in_=(), out=(type(None),)):
...     def _xmlrpc(function):
...         # registering the signature
...         func_name = function.func_name
...         rpc_info[func_name] = (in_, out)
...
...         def _check_types(elements, types):
...             """Subfunction that checks the types."""
...             if len(elements) != len(types):
...                 raise TypeError('argument count is wrong')
...             typed = enumerate(izip(elements, types))
...             for index, couple in typed:
...                 arg, of_the_right_type = couple 
...                 if isinstance(arg, of_the_right_type):
...                     continue
...                 raise TypeError('arg #%d should be %s' %
...                                 (index, of_the_right_type))
...
...         # wrapped function
...         def __xmlrpc(*args):   # no keywords allowed
...             # checking what goes in
...             checkable_args = args[1:]   # removing self
...             _check_types(checkable_args, in_)
...
...             # running the function 
...             res = function(*args)
...
...             # checking what goes out
...             if not type(res) in (tuple, list):
...                 checkable_res = (res,)
...             else:
...                 checkable_res = res
...             _check_types(checkable_res, out)
...
...             # the function and the type
...             # checking succeeded
...             return res
...         return __xmlrpc
...     return _xmlrpc                      
...

The decorator registers the function into a global dictionary, and keeps a list of the types for its arguments and for the returned values. Note that the example was highly simplified to demonstrate argument-checking decorators.

A usage example would be:

>>> class RPCView(object):
...
...     @xmlrpc((int, int))     # two int -> None
...     def meth1(self, int1, int2):
...         print 'received %d and %d' % (int1, int2)
...
...     @xmlrpc((str,), (int,))     # string -> int
...     def meth2(self, phrase):
...         print 'received %s' % phrase
...         return 12
... 

When it is read, this class definition populates the rpc_info dictionary and can be used in a specific environment, where the argument types are checked:

>>> rpc_info
{'meth2': ((<type 'str'>,), (<type 'int'>,)), 
 'meth1': ((<type 'int'>, <type 'int'>), 
           (<type 'NoneType'>,))}
>>> my = RPCView()
>>> my.meth1(1, 2)
received 1 and 2
>>> my.meth2(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 16, in __xmlrpc
  File "<stdin>", line 11, in _check_types
TypeError: arg #0 should be <type 'str'>

There are many other use cases for argument-checking decorators, such as type enforcement (see http://wiki.python.org/moin/PythonDecoratorLibrary#head-308f2b3507ca91800def19d813348f78db34303e) where you can define several levels of type checking, given a global configuration value:

  • Nothing is checked.

  • The checker just pops warnings.

  • The checker raises TypeError exceptions.

Caching

The caching decorator is quite similar to argument checking, but focuses on functions whose output depends only on their arguments and not on any internal state. Each set of arguments can then be linked to a unique result. This style of programming is characteristic of functional programming (see http://en.wikipedia.org/wiki/Functional_programming), and can be used when the set of input values is finite.

Therefore, a caching decorator can keep the output together with the arguments that were needed to compute it, and return it directly on subsequent calls. This behavior is called memoizing (see http://en.wikipedia.org/wiki/Memoizing), and is quite simple to implement as a decorator:

>>> import time
>>> import hashlib
>>> import pickle
>>> from itertools import chain
>>> cache = {}
>>> def is_obsolete(entry, duration):
...     return time.time() - entry['time'] > duration
... 
>>> def compute_key(function, args, kw):
...     key = pickle.dumps((function.func_name, args, kw))
...     return hashlib.sha1(key).hexdigest()
... 
>>> def memoize(duration=10):
...     def _memoize(function):
...         def __memoize(*args, **kw):
...             key = compute_key(function, args, kw)
...
...             # do we have it already ?       
...             if (key in cache and 
...                 not is_obsolete(cache[key], duration)):
...                 print 'we got a winner'
...                 return cache[key]['value']
...
...             # computing
...             result = function(*args, **kw)
...
...             # storing the result
...             cache[key] = {'value': result, 
...                           'time': time.time()}
...             return result
...         return __memoize
...     return _memoize
...  

A SHA hash key is built using the ordered argument values, and the result is stored in a global dictionary. The hash is made using a pickle, which is a bit of a shortcut to freeze the state of all objects passed as arguments, ensuring that all arguments are good candidates. If a thread or a socket is used as an argument, for instance, a PicklingError will occur. (See http://docs.python.org/lib/node318.html.)

The duration parameter is used to invalidate the cached value when too much time has passed since the last function call.

Here's an example of usage:

>>> @memoize()
... def very_very_very_complex_stuff(a, b):
...     # if your computer gets too hot on this calculation,
...     # consider stopping it
...     return a + b     
... 
>>> very_very_very_complex_stuff(2, 2)
4
>>> very_very_very_complex_stuff(2, 2)
we got a winner
4
>>> @memoize(1)      # invalidates the cache after 1 second
... def very_very_very_complex_stuff(a, b):
...     return a + b
... 
>>> very_very_very_complex_stuff(2, 2)
4
>>> very_very_very_complex_stuff(2, 2)
we got a winner
4
>>> cache
{'c2727f43c6e39b3694649ee0883234cf': {'value': 4, 'time':
 1199734132.7102251}}
>>> time.sleep(2)
>>> very_very_very_complex_stuff(2, 2)
4

Notice that the first decoration used empty parentheses because of the two-level wrapping: @memoize() applies the default duration.

Caching expensive functions can dramatically increase the overall performance of a program, but it has to be used with care. The cached value could also be tied to the function itself to manage its scope and life cycle, instead of a centralized dictionary, as sketched below. But in any case, a more efficient decorator would rely on a specialized cache library based on advanced caching algorithms and, for web applications, on distributed caching features. Memcached is one such tool and can be used from Python.
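A minimal sketch of that variant (the names are illustrative, and compute_key is the helper defined earlier) attaches the cache to the wrapper itself, so its life cycle follows the function's:

def memoize_local(function):
    def _memoize_local(*args, **kw):
        key = compute_key(function, args, kw)
        # the cache lives on the wrapper, not in a global dictionary
        cache = _memoize_local.cache
        if key not in cache:
            cache[key] = function(*args, **kw)
        return cache[key]
    _memoize_local.cache = {}
    return _memoize_local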

Note

Chapter 13 provides detailed information and techniques on caching.

Proxy

Proxy decorators are used to tag and register functions with a global mechanism. For instance, a security layer that protects access to the code, depending on the current user, can be implemented using a centralized checker, with the permission required by the callable:

>>> class User(object):
...    def __init__(self, roles):
...        self.roles = roles
... 
>>> class Unauthorized(Exception):
...     pass
... 
>>> def protect(role):
...    def _protect(function):
...        def __protect(*args, **kw):
...            user = globals().get('user')
...            if user is None or role not in user.roles:
...                raise Unauthorized("I won't tell you")
...            return function(*args, **kw)
...        return __protect
...    return _protect
...

This model is often used in Python web frameworks to define the security over publishable classes. For instance, Django provides decorators to secure function access. (See Chapter 12 called Sessions, Users, and Registration in the Django book at http://www.djangobook.com.)

Here's an example, where the current user is kept in a global variable. The decorator checks his or her roles when the method is accessed:

>>> tarek = User(('admin', 'user'))
>>> bill = User(('user',))
>>> class MySecrets(object):
...     @protect('admin')
...     def waffle_recipe(self):
...          print 'use tons of butter!'
... 
>>> these_are = MySecrets()
>>> user = tarek
>>> these_are.waffle_recipe()
use tons of butter!
>>> user = bill
>>> these_are.waffle_recipe()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in __protect
__main__.Unauthorized: I won't tell you

Context Provider

A context decorator makes sure that the function can run in the correct context, or run some code before and after the function. In other words, it sets and unsets a specific execution environment. For example, when a data item has to be shared among several threads, a lock has to be used to ensure that it is protected from multiple access. This lock can be coded in a decorator as follows:

>>> from threading import RLock
>>> lock = RLock()
>>> def synchronized(function):
...     def _synchronized(*args, **kw):
...         lock.acquire()
...         try:
...             return function(*args, **kw)
...         finally:
...             lock.release()
...     return _synchronized
... 
>>> @synchronized
... def thread_safe():    # make sure it locks the resource
...     pass
... 

Context decorators are being replaced by the usage of the with statement that appeared in Python 2.5. This statement was created to streamline the try..finally pattern, and in some cases, covers the context decorator use cases.

A good place to start to get more decorator use cases is: http://wiki.python.org/moin/PythonDecoratorLibrary.

with and contextlib

The try..finally statement is useful to ensure some cleanup code is run even if an error is raised. There are many use cases for this, such as:

  • Closing a file

  • Releasing a lock

  • Making a temporary code patch

  • Running protected code in a special environment

The with statement factors out these use cases, by providing a simple way to call some code before and after a block of code. For example, working with a file is usually done like this:

>>> hosts = file('/etc/hosts')
>>> try:
...     for line in hosts:
...         if line.startswith('#'):
...             continue
...         print line
... finally:
...     hosts.close()
... 
127.0.0.1       localhost
255.255.255.255 broadcasthost
::1             localhost 

Note

This example is specific to Linux since it reads the hosts file located in /etc, but any text file could have been used here in the same way.

By using the with statement, it can be rewritten like this:

>>> from __future__ import with_statement
>>> with file('/etc/hosts') as hosts:
...     for line in hosts:
...         if line.startswith('#'):
...             continue
...         print line
...
127.0.0.1       localhost
255.255.255.255 broadcasthost
::1             localhost  

Notice that for the 2.5 series, the with statement has to be enabled with an import from the __future__ module; it is directly available in 2.6. It is described in: http://www.python.org/dev/peps/pep-0343.

The other items that are compatible with this statement are classes from the thread and threading modules:

  • thread.LockType

  • threading.Lock

  • threading.RLock

  • threading.Condition

  • threading.Semaphore

  • threading.BoundedSemaphore

All these classes implement two methods: __enter__ and __exit__, which together form the with protocol. In other words, any class can implement it:

>>> class Context(object):
...     def __enter__(self):
...         print 'entering the zone'
...     def __exit__(self, exception_type, exception_value, 
...                  exception_traceback):
...         print 'leaving the zone'
...         if exception_type is None:
...             print 'with no error'
...         else:
...             print 'with an error (%s)' % exception_value
... 
>>> with Context():
...     print 'i am the zone'
... 
entering the zone
i am the zone
leaving the zone
with no error
>>> with Context():
...     print 'i am the buggy zone'
...     raise TypeError('i am the bug')
... 
entering the zone
i am the buggy zone
leaving the zone
with an error (i am the bug)
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: i am the bug

__exit__ receives three arguments that are filled when an error occurs within the code block. If no error occurs, all three arguments are set to None. When an error occurs, __exit__ should not re-raise it, as this is the responsibility of the caller. It can prevent the exception from being propagated, though, by returning True. This is provided to implement some specific use cases, such as the contextmanager decorator that we will see in the next section. But for most use cases, the right behavior for this method is to do some cleanup, as a finally clause would, no matter what happens in the block, and not to return anything.
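For completeness, here is a small sketch (not taken from the text) of a context manager that uses that return value to silence one specific exception type:

>>> class IgnoreTypeError(object):
...     def __enter__(self):
...         return self
...     def __exit__(self, exception_type, exception_value,
...                  exception_traceback):
...         # returning True swallows the exception;
...         # any other value lets it propagate
...         return exception_type is TypeError
... 
>>> with IgnoreTypeError():
...     raise TypeError('silenced')
... 
>>> with IgnoreTypeError():
...     raise ValueError('not silenced')
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: not silenced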

The contextlib Module

A module was added to the standard library to provide helpers to use the with statement. The most useful item is contextmanager, a decorator that turns a generator into a context manager: the code before the yield statement plays the role of __enter__, and the code after it plays the role of __exit__. The previous example written with this decorator will look like this:

>>> from contextlib import contextmanager
>>> from __future__ import with_statement
>>> @contextmanager
... def context():
...     print 'entering the zone'
...     try:
...         yield
...     except Exception, e:
...         print 'with an error (%s)' % e
...         # we need to re-raise here
...         raise e
...     else:
...         print 'with no error'
... 

If any exception occurs, the function needs to re-raise it in order to pass it along. Note that context could have some arguments if needed, as long as they are provided in the call. This small helper simplifies the normal class-based context API, exactly as generators simplify the class-based iterator API.
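For instance, a small sketch of a parameterized context (the tag name is arbitrary):

>>> @contextmanager
... def tagged(name):
...     print '<%s>' % name
...     yield
...     print '</%s>' % name
... 
>>> with tagged('div'):
...     print 'some content'
... 
<div>
some content
</div>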

The two other helpers provided by this module are:

  • closing(element): This is a contextmanager-decorated function that yields an element, and then calls the element's close method on exit. This is useful for classes that deal with streams, for instance (see the example after this list).

  • nested(context1, context2, ...): This is a function that will combine contexts and make nested with calls with them.
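For example, closing can wrap any object that has a close method but is not a native context manager; urllib.urlopen returns such an object (the URL here is only a placeholder):

>>> from contextlib import closing
>>> from urllib import urlopen
>>> with closing(urlopen('http://www.python.org')) as page:
...     headers = page.info()    # the connection is closed on exit
... 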

Context Example

An interesting usage of with is instrumenting code: a class can be decorated for logging when entering the context, and set back to its original state when the context is over. This avoids changing the code itself and allows, for example, a unit test to get some feedback on how the code is used.

In the following example, a context is created to equip all public APIs of a given class:

>>> import logging
>>> from __future__ import with_statement
>>> from contextlib import contextmanager
>>> @contextmanager
... def logged(klass, logger):
...     # logger
...     def _log(f):
...         def __log(*args, **kw):
...             logger(f, args, kw)
...             return f(*args, **kw)
...         return __log
...
...     # let's equip the class
...     for attribute in dir(klass):
...         if attribute.startswith('_'):
...             continue
...         element = getattr(klass, attribute)
...         setattr(klass, '__logged_%s' % attribute, element)
...         setattr(klass, attribute, _log(element))
...
...     # let's work
...     yield klass
...
...     # let's remove the logging
...     for attribute in dir(klass):
...         if not attribute.startswith('__logged_'):
...             continue
...         element = getattr(klass, attribute)
...         setattr(klass, attribute[len('__logged_'):], 
...                 element)    
...         delattr(klass, attribute)
...  

The logger function can then be used to record what APIs are being called in a given context. In the following example, the calls are added to a list to track the API usage, and then to perform some assertions. For instance, if the same API is called more than once, it could mean that the public signature of the class could be refactored to avoid duplicate calls:

>>> class One(object):
...     def _private(self):
...         pass
...     def one(self, other):
...         self.two()
...         other.thing(self)
...         self._private()
...     def two(self):
...         pass
... 
>>> class Two(object):
...     def thing(self, other):
...         other.two()
... 
>>> calls = []
>>> def called(meth, args, kw):
...     calls.append(meth.im_func.func_name)
... 
>>> with logged(One, called):
...     one = One()
...     two = Two()
...     one.one(two)
... 
>>> calls
['one', 'two', 'two']

Summary

In this chapter, we have learned that:

  • List comprehensions are the most convenient way to take existing iterables, do something with them, and then produce new lists.

  • Iterators and generators provide an efficient set of tools to generate and work with sequences.

  • Decorators provide a readable way to wrap existing functions and methods with an additional behavior. This leads to new code-patterns that are very simple to implement and use.

  • The with statement streamlines the try..finally pattern.

The next chapter also covers syntax best-practices, but those dedicated to classes.