Chapter 6. Strings – The Quick Python Book

Chapter 6. Strings

This chapter covers

  • Understanding strings as sequences of characters
  • Using basic string operations
  • Inserting special characters and escape sequences
  • Converting from objects to strings
  • Formatting strings
  • Using the bytes type

Handling text—from user input, to filenames, to processing chunks of text—is a common chore in programming. Python comes with powerful tools to handle and format text. This chapter discusses the standard string and string-related operations in Python.

6.1. Strings as sequences of characters

For the purposes of extracting characters and substrings, strings can be considered sequences of characters, which means you can use index or slice notation:

>>> x = "Hello"
>>> x[0]
'H'
>>> x[-1]
'o'
>>> x[1:]
'ello'

One use for slice notation with strings is to chop the newline off the end of a string, usually a line that’s just been read from a file:

>>> x = "Goodbye\n"
>>> x = x[:-1]
>>> x
'Goodbye'

This is just an example—you should know that Python strings have other, better methods to strip unwanted characters, but this illustrates the usefulness of slicing.

You can also determine how many characters are in the string by using the len function, just like finding out the number of elements in a list:

>>> len("Goodbye")
7

But strings aren’t lists of characters. The most noticeable difference between strings and lists is that, unlike lists, strings can’t be modified. Attempting to say something like string.append('c') or string[0] = 'H' will result in an error. You’ll notice in the previous example that we stripped off the newline from the string by creating a string that was a slice of the previous one, not by modifying the previous string directly. This is a basic Python restriction, imposed for efficiency reasons.
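A minimal sketch of that restriction, using illustrative variable names: both attempts to change the string in place raise an exception, whereas slicing and concatenation build a new string.

```python
text = "hello"

# Strings have no append method, so this raises AttributeError.
try:
    text.append("c")
except AttributeError as err:
    append_error = type(err).__name__

# Item assignment isn't supported either; this raises TypeError.
try:
    text[0] = "H"
except TypeError as err:
    assign_error = type(err).__name__

# Build a new string instead of modifying the old one.
capitalized = "H" + text[1:]
print(append_error, assign_error, capitalized)
```

The original string is untouched throughout; only the name capitalized refers to the new value.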

6.2. Basic string operations

The simplest (and probably most common) way of combining Python strings is to use the string concatenation operator +:

>>> x = "Hello " + "World"
>>> x
'Hello World'

There is an analogous string multiplication operator that I have found sometimes, but not often, useful:

>>> 8 * "x"
'xxxxxxxx'

6.3. Special characters and escape sequences

You’ve already seen a few of the character sequences Python regards as special when used within strings: \n represents the newline character and \t represents the tab character. Sequences of characters that start with a backslash and that are used to represent other characters are called escape sequences. Escape sequences are generally used to represent special characters—that is, characters (such as tab and newline) that don’t have a standard one-character printable representation. This section covers escape sequences, special characters, and related topics in more detail.

6.3.1. Basic escape sequences

Python provides a brief list of two-character escape sequences to use in strings (table 6.1).

Table 6.1. Escape sequences

Escape sequence    Character represented
\'                 Single-quote character
\"                 Double-quote character
\\                 Backslash character
\a                 Bell character
\b                 Backspace character
\f                 Formfeed character
\n                 Newline character
\r                 Carriage return character (not the same as \n)
\t                 Tab character
\v                 Vertical tab character

The ASCII character set, which forms the first 128 characters of the Unicode character set used by Python 3 strings, defines quite a few more special characters. They’re accessed by the numeric escape sequences, described in the next section.

6.3.2. Numeric (octal and hexadecimal) and Unicode escape sequences

You can include any ASCII character in a string by using an octal (base 8) or hexadecimal (base 16) escape sequence corresponding to that character. An octal escape sequence is a backslash followed by one to three digits defining an octal number; the ASCII character corresponding to this octal number is substituted for the octal escape sequence. A hexadecimal escape sequence is similar but starts with \x rather than just \ and, in Python 3, consists of exactly two hexadecimal digits. For example, in the ASCII character table, the character m happens to have decimal value 109. This is octal value 155 and hexadecimal value 6D, so:

>>> 'm'
'm'
>>> '\155'
'm'
>>> '\x6D'
'm'

All three expressions evaluate to a string containing the single character m. But these forms can also be used to represent characters that have no printable representation. The newline character \n, for example, has octal value 012 and hexadecimal value 0A:

>>> '\n'
'\n'
>>> '\012'
'\n'
>>> '\x0A'
'\n'

Because all strings in Python 3 are Unicode strings, they can also contain almost every character from every language available. Although a discussion of the Unicode system is far beyond this book, the following examples illustrate that you can also escape any Unicode character, either by number similar to that shown earlier or by Unicode name:
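As a small sketch of both forms (the variable names are only illustrative), the \N{...} escape selects a character by its Unicode name and the \u escape selects it by a four-digit hexadecimal number:

```python
# A character chosen by Unicode name; this one is also plain ASCII.
a_by_name = "\N{LATIN SMALL LETTER A}"

# The same accented character, once by name and once by number.
a_acute_by_name = "\N{LATIN SMALL LETTER A WITH ACUTE}"
a_acute_by_number = "\u00E1"

print(a_by_name)                              # a
print(a_acute_by_name == a_acute_by_number)   # True
```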

The Unicode character set includes the common ASCII characters.

6.3.3. Printing vs. evaluating strings with special characters

We talked before about the difference between evaluating a Python expression interactively and printing the result of the same expression using the print function. Although the same string is involved, the two operations can produce screen outputs that look different. A string that is evaluated at the top level of an interactive Python session will be shown with all of its special characters as escape sequences, which makes clear what is in the string. Meanwhile, the print function passes the string directly to the terminal program, which may interpret special characters in special ways. For example, here’s what happens with a string consisting of an a followed by a newline, a tab, and a b:

>>> 'a\n\tb'
'a\n\tb'
>>> print('a\n\tb')
a
        b

In the first case, the newline and tab are shown explicitly in the string; in the second, they’re used as newline and tab characters.

A normal print function also adds a newline to the end of the string. Sometimes (for example, when you have lines from files that already end with newlines) you may not want this behavior. Giving the print function an end parameter of "" causes the print function to not append the newline:

>>> print("abc\n")
abc

>>> print("abc\n", end="")
abc
>>>

6.4. String methods

Most of the Python string methods are built into the standard Python string class, so all string objects have them automatically. The standard string module also contains some useful constants. Modules will be discussed in detail in chapter 10.

For the purposes of this section, you need only remember that most string methods are attached to the string object they operate on by a dot (.), as in x.upper(). That is, they’re prepended with the string object followed by a dot.

Because strings are immutable, the string methods are used only to obtain their return value and don’t modify the string object they’re attached to in any way.

We’ll begin with those string operations that are the most useful and commonly used and then go on to discuss some less commonly used but still useful operations. At the end, we’ll discuss a few miscellaneous points related to strings. Not all of the string methods are documented here. See the documentation for a complete list of string methods.

6.4.1. The split and join string methods

Anyone who works with strings is almost certain to find the split and join methods invaluable. They’re the inverse of one another—split returns a list of substrings in the string, and join takes a list of strings and puts them together to form a single string with the original string between each element. Typically, split uses whitespace as the delimiter to the strings it’s splitting, but you can change that via an optional argument.

String concatenation using + is useful but not efficient for joining large numbers of strings into a single string, because each time + is applied, a new string object is created. Our previous “Hello World” example produced two string objects, one of which was immediately discarded. A better option is to use the join method:

>>> " ".join(["join", "puts", "spaces", "between", "elements"])
'join puts spaces between elements'

By changing the string used to join, you can put anything you want between the joined strings:

>>> "::".join(["Separated", "with", "colons"])
'Separated::with::colons'

You can even use an empty string, "", to join elements in a list:

>>> "".join(["Separated", "by", "nothing"])
'Separatedbynothing'

The most common use of split is probably as a simple parsing mechanism for string-delimited records stored in text files. By default, split splits on any whitespace, not just a single space character, but you can also tell it to split on a particular sequence by passing it an optional argument:

>>> x = "You\t\t can have tabs\t\n \t and newlines \n\n " \
... "mixed in"
>>> x.split()
['You', 'can', 'have', 'tabs', 'and', 'newlines', 'mixed', 'in']
>>> x = "Mississippi"
>>> x.split("ss")
['Mi', 'i', 'ippi']

Sometimes it’s useful to permit the last field in a joined string to contain arbitrary text, including, perhaps, substrings that may match what split splits on when reading in that data. You can do this by specifying how many splits split should perform when it’s generating its result, via an optional second argument. If you specify n splits, then split will go along the input string until it has performed n splits (generating a list with n+1 substrings as elements) or until it runs out of string. Here are some examples:

>>> x = 'a b c d'
>>> x.split(' ', 1)
['a', 'b c d']
>>> x.split(' ', 2)
['a', 'b', 'c d']
>>> x.split(' ', 9)
['a', 'b', 'c', 'd']

When using split with its optional second argument, you must supply a first argument. To get it to split on runs of whitespace while using the second argument, use None as the first argument.
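A brief sketch of that combination, with an invented input string: None asks split to use runs of whitespace as the delimiter, and 2 limits it to two splits.

```python
line = "alpha \t beta\n  gamma   delta"

# None means "split on runs of whitespace"; 2 is the maximum number of splits.
fields = line.split(None, 2)
print(fields)   # ['alpha', 'beta', 'gamma   delta']
```

The last element keeps its internal spacing because split stops looking for delimiters once the limit is reached.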

I use split and join extensively, usually when working with text files generated by other programs. But you should know that if you’re able to define your own data file format for use solely by your Python programs, there’s a much better alternative to storing data in text files. We’ll discuss it in chapter 13 when we talk about the pickle module.

6.4.2. Converting strings to numbers

You can use the functions int and float to convert strings into integer or floating-point numbers, respectively. If they’re passed a string that can’t be interpreted as a number of the given type, they will raise a ValueError exception. Exceptions are explained in chapter 14, “Exceptions.” In addition, you may pass int an optional second argument, specifying the numeric base to use when interpreting the input string:
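The following sketch runs through these conversions with values chosen purely for illustration. The two failing calls are collected in a list so that the script keeps running; the last one asks for the string to be read in base 6.

```python
as_float = float("123.456")      # 123.456
as_int = int("3333")             # 3333
from_octal = int("10000", 8)     # 4096
from_binary = int("101", 2)      # 5
from_hex = int("ff", 16)         # 255

# A string that isn't valid for the requested type or base raises ValueError.
messages = []
for text, base in [("123.456", 10), ("123456", 6)]:
    try:
        int(text, base)
    except ValueError as err:
        messages.append(str(err))

for message in messages:
    print(message)
```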

Did you catch the reason for that last error? We requested that the string be interpreted as a base 6 number, but the digit 6 can never appear in a base 6 number. Sneaky!

6.4.3. Getting rid of extra whitespace

A trio of simple methods that are surprisingly useful are strip, lstrip, and rstrip. strip returns a new string that’s the same as the original string, except that any whitespace at the beginning or end of the string has been removed. lstrip and rstrip work similarly, except that they remove whitespace only at the left or right end of the original string, respectively:

>>> x = "  Hello,    World\t\t "
>>> x.strip()
'Hello,    World'
>>> x.lstrip()
'Hello,    World\t\t '
>>> x.rstrip()
'  Hello,    World'

In this example, tab characters are considered to be whitespace. You can always find out which ASCII characters Python considers to be whitespace by accessing the string.whitespace constant. On my Windows system, it gives the following:

>>> import string
>>> string.whitespace
' \t\n\r\x0b\x0c'
>>> " \t\n\r\v\f"
' \t\n\r\x0b\x0c'

The characters given in backslashed hex (\xnn) format represent the vertical tab and formfeed characters. The space character is in there as itself. It may be tempting to change the value of this variable, to attempt to affect how strip and so forth work, but don’t do it. Such an action isn’t guaranteed to give you the results you’re looking for.

But you can change which characters strip, rstrip, and lstrip remove by passing a string containing the characters to be removed as an extra parameter:
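A minimal sketch of this, using an illustrative host name as the input:

```python
x = "www.python.org"

no_w = x.strip("w")        # '.python.org'
no_gor = x.strip("gor")    # 'www.python.'
name = x.strip(".gorw")    # 'python'

print(no_w, no_gor, name)
```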

Note that strip removes any and all of the characters in the extra parameter string, no matter in which order they occur.

The most common use for these functions is as a quick way of cleaning up strings that have just been read in. This is particularly helpful when you’re reading lines from files (discussed in chapter 13), because Python always reads in an entire line, including the trailing newline, if it exists. When you get around to processing the line read in, you typically don’t want the trailing newline. rstrip is a convenient way to get rid of it.

6.4.4. String searching

The string objects provide a number of methods to perform simple string searches. Before I describe them, though, let’s talk about another module in Python: re. (This module will be discussed in depth in chapter 17, “Regular expressions.”)

 

Another method for searching strings: the re module

The re module also does string searching but in a far more flexible manner, using regular expressions. Rather than searching for a single specified substring, an re search can look for a string pattern. You could look for substrings that consist entirely of digits, for example.

Why am I mentioning this, when re is discussed fully later? In my experience, many uses of basic string searches are inappropriate. You’d benefit from a more powerful searching mechanism but aren’t aware that one exists, and so you don’t even look for something better. Perhaps you have an urgent project involving strings and don’t have time to read this entire book. If basic string searching will do the job for you, that’s great. But be aware that you have a more powerful alternative.
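As a small taste of what that looks like, here is a sketch (with an invented sample string) that uses re to find substrings consisting entirely of digits:

```python
import re

text = "Order 66 shipped 3 crates on day 120"

# \d+ matches one or more consecutive digits.
digit_runs = re.findall(r"\d+", text)
first = re.search(r"\d+", text)

print(digit_runs)                      # ['66', '3', '120']
print(first.group(), first.start())    # 66 6
```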

 

The four basic string-searching methods are all similar: find, rfind, index, and rindex. A related method, count, counts how many times a substring can be found in another string. We’ll describe find in detail and then examine how the other methods differ from it.

find takes one required argument: the substring being searched for. find returns the position of the first character of the first instance of substring in the string object, or -1 if substring doesn’t occur in the string:

>>> x = "Mississippi"
>>> x.find("ss")
2
>>> x.find("zz")
-1

find can also take one or two additional, optional arguments. The first of these, if present, is an integer start; it causes find to ignore all characters before position start in string when searching for substring. The second optional argument, if present, is an integer end; it causes find to ignore characters at or after position end in string:

>>> x = "Mississippi"
>>> x.find("ss", 3)
5
>>> x.find("ss", 0, 3)
-1

rfind is almost the same as find, except that it starts its search at the end of string and so returns the position of the first character of the last occurrence of substring in string:

>>> x = "Mississippi"
>>> x.rfind("ss")
5

rfind can also take one or two optional arguments, with the same meanings as those for find.

index and rindex are identical to find and rfind, respectively, except for one difference: if index or rindex fails to find an occurrence of substring in string, it doesn’t return -1 but rather raises a ValueError exception. Exactly what this means will be clear after you read chapter 14, “Exceptions.”

count is used identically to any of the previous four functions but returns the number of non-overlapping times the given substring occurs in the given string:

>>> x = "Mississippi"
>>> x.count("ss")
2

You can use two other string methods to search strings: startswith and endswith. These methods return a True or False result depending on whether the string they’re used on starts or ends with one of the strings given as parameters:

>>> x = "Mississippi"
>>> x.startswith("Miss")
True
>>> x.startswith("Mist")
False
>>> x.endswith("pi")
True
>>> x.endswith("p")
False

Both startswith and endswith can look for more than one string at a time. If the parameter is a tuple of strings, both methods check for all of the strings in the tuple and return a True if any one of them is found:

>>> x.endswith(("i", "u"))
True

startswith and endswith are useful for simple searches.

6.4.5. Modifying strings

Strings are immutable, but string objects have a number of methods that can operate on that string and return a new string that’s a modified version of the original string. This provides much the same effect as direct modification for most purposes. You can find a more complete description of these methods in the documentation.

You can use the replace method to replace occurrences of substring (its first argument) in the string with newstring (its second argument). It also takes an optional third argument, the maximum number of replacements to make (see the documentation for details):

>>> x = "Mississippi"
>>> x.replace("ss", "+++")
'Mi+++i+++ippi'
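A quick sketch of the optional third argument, which limits how many occurrences are replaced (the variable names are illustrative):

```python
x = "Mississippi"

all_replaced = x.replace("ss", "+++")     # every occurrence
first_only = x.replace("ss", "+++", 1)    # at most one replacement

print(all_replaced)   # Mi+++i+++ippi
print(first_only)     # Mi+++issippi
```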

As with the string search functions, the re module provides a much more powerful method of substring replacement.

The string methods maketrans and translate may be used together to translate characters in strings into different characters. Although rarely used, these methods can simplify your life when they’re needed.

Let’s say, for example, that you’re working on a program that translates string expressions from one computer language into another. The first language uses ~ to mean logical not, whereas the second language uses !; the first language uses ^ to mean logical and, whereas the second language uses &; the first language uses ( and ), where the second language uses [ and ]. In a given string expression, you need to change all instances of ~ to !, all instances of ^ to &, all instances of ( to [, and all instances of ) to ]. You could do this using multiple invocations of replace, but an easier and more efficient way is

>>> x = "~x ^ (y % z)"
>>> table = x.maketrans("~^()", "!&[]")
>>> x.translate(table)
'!x & [y % z]'

The first line uses maketrans to make up a translation table from its two string arguments. The two arguments must each contain the same number of characters, and a table will be made such that looking up the nth character of the first argument in that table gives back the nth character of the second argument.

Next, the table produced by maketrans is passed to translate. Then, translate goes over each of the characters in its string object and checks to see if they can be found in the table passed to it. If a character can be found in the translation table, translate replaces that character with the corresponding character looked up in the table, to produce the translated string.

You can give an optional third argument to maketrans, to specify characters that should be removed from the string entirely. See the documentation for details.
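In Python 3, the characters to delete are given as a third argument when the table is built with maketrans. A sketch based on the earlier expression, here also dropping the spaces:

```python
x = "~x ^ (y % z)"

# Third argument to maketrans: characters that translate should delete.
table = str.maketrans("~^()", "!&[]", " ")
result = x.translate(table)

print(result)   # !x&[y%z]
```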

Other string methods perform more specialized tasks. lower converts all alphabetic characters in a string to lowercase, and upper does the opposite. capitalize capitalizes the first character of a string, and title capitalizes all words in a string. swapcase converts lowercase characters to uppercase and uppercase to lowercase in the same string. expandtabs gets rid of tab characters in a string by replacing each tab with enough spaces to reach the next tab stop, at a spacing you can specify. ljust, rjust, and center pad a string with spaces, to justify it in a certain field width. zfill left-pads a numeric string with zeros. Refer to the documentation for details of these methods.
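A short tour of these methods, as a sketch with made-up sample strings; each call returns a new string and leaves the original alone:

```python
greeting = "Hello World"
lowered = greeting.lower()           # 'hello world'
uppered = greeting.upper()           # 'HELLO WORLD'
swapped = greeting.swapcase()        # 'hELLO wORLD'

phrase = "hello world"
capitalized = phrase.capitalize()    # 'Hello world'
titled = phrase.title()              # 'Hello World'

# Tab stops every 4 columns: 'a' is followed by 3 spaces.
expanded = "a\tb".expandtabs(4)      # 'a   b'

left = "hi".ljust(6)                 # 'hi    '
right = "hi".rjust(6)                # '    hi'
middle = "hi".center(6)              # '  hi  '
zero_padded = "42".zfill(5)          # '00042'
```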

6.4.6. Modifying strings with list manipulations

Because strings are immutable objects, there’s no way to directly manipulate them in the same way you can lists. Although the operations that operate on strings to produce new strings (leaving the original strings unchanged) are useful for many things, sometimes you want to be able to manipulate a string as if it were a list of characters. In that case, just turn it into a list of characters, do whatever you want, and turn the resulting list back into a string:
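A minimal sketch of that round trip, with an illustrative string: convert to a list, edit the list in place, and join the pieces back together.

```python
text = "Hello, World"

chars = list(text)      # one list element per character
chars[6:] = []          # drop everything after the comma
chars.reverse()         # reverse the list in place
text = "".join(chars)   # back to a single string

print(text)             # ,olleH
```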

Although split can break a string into a list of pieces, the type-conversion function list is the easier way to get a list of individual characters and is simpler to remember (and, for what it’s worth, you can turn a string into a tuple of characters using the built-in tuple function). To turn the list back into a string, use "".join.

You shouldn’t go overboard with this method because it causes the creation and destruction of new string objects, which is relatively expensive. Processing hundreds or thousands of strings in this manner probably won’t have much of an impact on your program. Processing millions probably will.

6.4.7. Useful methods and constants

String objects also have several useful methods to report qualities of the string, such as whether it consists of digits or alphabetic characters, is all uppercase or lowercase, and so on:

>>> x = "123"
>>> x.isdigit()
True
>>> x.isalpha()
False
>>> x = "M"
>>> x.islower()
False
>>> x.isupper()
True

For a fuller list of all the possible string methods, refer to the string section of the official Python documentation.

Finally, the string module defines some useful constants. You’ve already seen string.whitespace, which is a string made up of the characters Python thinks of as whitespace. string.digits is the string '0123456789'. string.hexdigits includes all the characters in string.digits, as well as 'abcdefABCDEF', the extra characters used in hexadecimal numbers. string.octdigits contains '01234567', just those digits used in octal numbers. string.ascii_lowercase contains all lowercase ASCII alphabetic characters; string.ascii_uppercase contains all uppercase ASCII alphabetic characters; string.ascii_letters contains all of the characters in string.ascii_lowercase and string.ascii_uppercase. You might be tempted to try assigning to these constants to change the behavior of the language. Python would let you get away with this, but it would probably be a bad idea.

Remember that strings are sequences of characters, so you can use the convenient Python in operator to test for a character’s membership in any of these strings, although usually the existing string methods are simpler and easier.
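A small sketch comparing the two approaches; the sample characters are arbitrary:

```python
import string

ch = "7"

in_digits = ch in string.digits        # membership test on the constant
same_answer = ch.isdigit()             # the equivalent string method
in_hex = "f" in string.hexdigits
in_octal = "8" in string.octdigits

print(in_digits, same_answer, in_hex, in_octal)   # True True True False
```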

The most common string operations are shown in table 6.2.

Table 6.2. String operations

String operation       Explanation                                                       Example
+                      Adds two strings together                                         x = "hello " + "world"
*                      Replicates a string                                               x = " " * 20
upper                  Converts a string to uppercase                                    x.upper()
lower                  Converts a string to lowercase                                    x.lower()
title                  Capitalizes the first letter of each word in a string             x.title()
find, index            Searches for the target in a string                               x.find(y), x.index(y)
rfind, rindex          Searches for the target in a string, from the end of the string   x.rfind(y), x.rindex(y)
startswith, endswith   Checks the beginning or end of a string for a match               x.startswith(y), x.endswith(y)
replace                Replaces the target with a new string                             x.replace(y, z)
strip, rstrip, lstrip  Removes whitespace or other characters from the ends of a string  x.strip()
encode                 Converts a Unicode string to a bytes object                       x.encode("utf_8")

Note that these methods don’t change the string itself but return either a location in the string or a new string.

6.5. Converting from objects to strings

In Python, almost anything can be converted to some sort of a string representation, using the built-in repr function. Lists are the only complex Python data types you’re familiar with so far, so let’s turn some lists into their representations:

>>> repr([1, 2, 3])
'[1, 2, 3]'
>>> x = [1]
>>> x.append(2)
>>> x.append([3, 4])
>>> 'the list x is ' + repr(x)
'the list x is [1, 2, [3, 4]]'

The example uses repr to convert the list x into a string representation, which is then concatenated with the other string to form the final string. Without the use of repr, this wouldn’t work. In an expression like "string" + [1, 2] + 3, are you trying to add strings, or add lists, or just add numbers? Python doesn’t know what you want in such a circumstance, and it will do the safe thing (raise an error) rather than make any assumptions. In the previous example, all the elements had to be converted to string representations before the string concatenation would work.

Lists are the only complex Python objects that have been described to this point, but repr can be used to obtain some sort of string representation for almost any Python object. To see this, try repr around a built-in complex object—an actual Python function:

>>> repr(len)
'<built-in function len>'

Python hasn’t produced a string containing the code that implements the len function, but it has at least returned a string—<built-in function len>—that describes what that function is. If you keep the repr function in mind and try it on each Python data type (dictionaries, tuples, classes, and the like) as we get to them in the book, you’ll see that no matter what type of Python object you have, you can get a string saying something about that object.

This is great for debugging programs. If you’re in doubt as to what’s held in a variable at a certain point in your program, use repr and print out the contents of that variable.

We’ve covered how Python can convert any object into a string that describes that object. The truth is, Python can do this in either of two different ways. The repr function always returns what might be loosely called the formal string representation of a Python object. More specifically, repr aims to return a string representation of a Python object from which the original object can be rebuilt, where that is possible. For large, complex objects, this may not be the sort of thing you wish to see in debugging output or status reports.

Python also provides the built-in str function. In contrast to repr, str is intended to produce printable string representations, and it can be applied to any Python object. str returns what might be called the informal string representation of the object. A string returned by str need not define an object fully and is intended to be read by humans, not by Python code.

You may not notice much difference between repr and str when you first start using them, because for most built-in Python objects, such as numbers and lists, str returns the same string as repr. Strings themselves are the main exception: str returns the string unchanged, whereas repr adds quotes and escape sequences. The difference becomes more important when you start defining your own classes. This will be discussed in chapter 15.
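A quick check, with illustrative values, of how str and repr compare on built-in objects: the two agree for a list, while for a string repr adds quotes and shows special characters as escape sequences.

```python
values = [1, 2, [3, 4]]
word = "Goodbye\n"

same_for_list = str(values) == repr(values)   # True

str_word = str(word)     # the string itself, newline included
repr_word = repr(word)   # quoted, with the newline shown as \n

print(same_for_list)
print(repr_word)         # 'Goodbye\n'
```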

So why talk about this now? Basically, I wanted you to be aware that there’s more going on behind the scenes with repr than being able to easily write print functions for debugging. As a matter of good style, you may want to get into the habit of using str rather than repr when creating strings for displaying information.

6.6. Using the format method

You can format strings in Python 3 in two ways. The newer way to format strings in Python is to use the string class’s format method. The format method combines a format string containing replacement fields marked with { } with replacement values taken from the parameters given to the format method. If you need to include a literal { or } in the string, you double it to {{ or }}. The format method supports a powerful string-formatting mini-language and offers almost endless possibilities for manipulating string formatting. On the other hand, it’s fairly simple to use for the most common use cases, so we’ll look at a few basic patterns. Then, if you need to use the more advanced options, you can refer to the string-formatting section of the standard library documentation.

6.6.1. The format method and positional parameters

The simplest use of the string format method uses numbered replacement fields that correspond to the parameters passed to the format function:
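A minimal sketch of numbered replacement fields; the second call applies format to a string variable and uses doubled braces to produce literal ones:

```python
sentence = "{0} is the {1} of {2}".format("Ambrosia", "food", "the gods")

# The format string can be held in a variable; {{ and }} are literal braces.
template = "{{Ambrosia}} is the {0} of {1}"
escaped = template.format("food", "the gods")

print(sentence)   # Ambrosia is the food of the gods
print(escaped)    # {Ambrosia} is the food of the gods
```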

Note that the format method is applied to the format string, which can also be a string variable. Doubling the { } characters escapes them so that they don’t mark a replacement field.

This example has three replacement fields, {0}, {1}, and {2}, which are in turn filled by the first, second, and third parameters. No matter where in the format string we place {0}, it will always be replaced by the first parameter, and so on.

You can also use the positional parameters.

6.6.2. The format method and named parameters

The format method also recognizes named parameters and replacement fields:

>>> "{food} is the food of {user}".format(food="Ambrosia",
... user="the gods")
'Ambrosia is the food of the gods'

In this case, the replacement parameter is chosen by matching the name of the replacement field to the name of the parameter given to the format method.

You can also use both positional and named parameters, and you can even access attributes and elements within those parameters:

>>> "{0} is the food of {user[1]}".format("Ambrosia",
... user=["men", "the gods", "others"])
'Ambrosia is the food of the gods'

In this case, the first parameter is positional, and the second, user[1], refers to the second element of the named parameter user.

6.6.3. Format specifiers

Format specifiers let you specify the result of the formatting with even more power and control than the formatting sequences of the older style of string formatting. The format specifier lets you control the fill character, alignment, sign, width, precision, and type of the data when it’s substituted for the replacement field. As noted earlier, the syntax of format specifiers is a mini-language in its own right and too complex to cover completely here, but the following examples give you an idea of its usefulness:
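A sketch of a few common specifiers applied to an eight-character string in a field 10 characters wide (the variable names are illustrative):

```python
# Width 10; strings are left-justified and padded with spaces by default.
padded = "{0:10} is the food of gods".format("Ambrosia")

# The width itself can come from another parameter.
from_param = "{0:{1}} is the food of gods".format("Ambrosia", 10)

# > right-justifies; a character before > is used as the fill.
right = "{0:>10} is the food of gods".format("Ambrosia")
filled = "{0:&>10} is the food of gods".format("Ambrosia")

print(repr(padded))   # 'Ambrosia   is the food of gods'
print(repr(right))    # '  Ambrosia is the food of gods'
print(repr(filled))   # '&&Ambrosia is the food of gods'
```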

:10 is a format specifier that makes the field 10 spaces wide and pads with spaces. :{1} takes the width from the second parameter. :>10 forces right justification of the field and pads with spaces. :&>10 forces right justification and pads with & instead of spaces.

6.7. Formatting strings with %

This section covers formatting strings with the string modulus (%) operator. It’s used to combine Python values into formatted strings for printing or other use. C users will notice a strange similarity to the printf family of functions. The use of % for string formatting is the old style of string formatting, and I cover it here because it was the standard in earlier versions of Python and you’re likely to see it in code that’s been ported from earlier versions of Python or was written by coders familiar with those versions. This style of formatting is best avoided in new code, where the format method is preferred, even though % formatting remains part of the language.

Here’s an example:

>>> "%s is the %s of %s" % ("Ambrosia", "food", "the gods")
'Ambrosia is the food of the gods'

The string modulus operator (the % that occurs in the middle, not the three instances of %s that come before it in the example) takes two parts: the left side, which is a string; and the right side, which is a tuple. The string modulus operator scans the left string for special formatting sequences and produces a new string by substituting the values on the right side for those formatting sequences, in order. In this example, the only formatting sequences on the left side are the three instances of %s, which stands for “stick a string in here.”

Passing in different values on the right side produces different strings:

>>> "%s is the %s of %s" % ("Nectar", "drink", "gods")
'Nectar is the drink of gods'
>>> "%s is the %s of the %s" % ("Brussels Sprouts", "food",
... "foolish")
'Brussels Sprouts is the food of the foolish'

The members of the tuple on the right will have str applied to them automatically by %s, so they don’t have to already be strings:

>>> x = [1, 2, "three"]
>>> "The %s contains: %s" % ("list", x)
"The list contains: [1, 2, 'three']"

6.7.1. Using formatting sequences

All formatting sequences are substrings contained in the string on the left side of the central %. Each formatting sequence begins with a percent sign and is followed by one or more characters that specify what is to be substituted for the formatting sequence and how the substitution is accomplished. The %s formatting sequence used previously is the simplest formatting sequence, and it indicates that the corresponding string from the tuple on the right side of the central % should be substituted in place of the %s.

Other formatting sequences can be more complex. This one specifies the field width (total number of characters) of a printed number to be six, specifies the number of characters after the decimal point to be two, and left-justifies the number in its field. I’ve put in angle brackets so you can see where extra spaces are inserted into the formatted string:

>>> "Pi is <%-6.2f>" % 3.14159 # use of the formatting sequence: %-6.2f
'Pi is <3.14  >'

All the options for characters that are allowable in formatting sequences are given in the documentation. There are quite a few options, but none are particularly difficult to use. Remember, you can always try a formatting sequence interactively in Python to see if it does what you expect it to do.
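To get a feel for a few of those options, here is a short sketch you can paste into a file or the interactive prompt. The particular widths and values are arbitrary choices for illustration:

```python
value = 3.14159

left = "<%-8.2f>" % value     # left-justified, padded on the right to width 8
right = "<%8.2f>" % value     # right-justified (the default), width 8
zero = "<%08.2f>" % value     # padded with leading zeros to width 8
integer = "<%5d>" % 42        # integer in a field of width 5
hexadecimal = "<%x>" % 255    # integer shown in lowercase hexadecimal

for result in (left, right, zero, integer, hexadecimal):
    print(result)
```

Printing each result inside angle brackets makes the padding visible, just as in the Pi example.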

6.7.2. Named parameters and formatting sequences

Finally, one additional feature is available with the % operator that can be useful in certain circumstances. Unfortunately, to describe it we’re going to have to employ a Python feature we haven’t used yet: dictionaries, commonly called hashtables or associative arrays in other languages. You can skip ahead to the next chapter, “Dictionaries,” to learn about dictionaries, skip this section for now and come back to it later, or read straight through, trusting to the examples to make things clear.

Formatting sequences can specify what should be substituted for them by name rather than by position. When you do this, each formatting sequence has a name in parentheses, immediately following the initial % of the formatting sequence, as in %(pi).2f, where pi is the name.

In addition, the argument to the right of the % operator is no longer given as a single value or tuple of values to be printed but rather as a dictionary of values to be printed, with each named formatting sequence having a correspondingly named key in the dictionary. Using the previous formatting sequence with the string modulus operator, we might produce code like this:

>>> num_dict = {'e': 2.718, 'pi': 3.14159}
>>> print("%(pi).2f - %(pi).4f - %(e).2f" % num_dict)
3.14 - 3.1416 - 2.72

This is particularly useful when you’re using format strings that perform a large number of substitutions, because you no longer have to keep track of the positional correspondences of the right-side tuple of elements with the formatting sequences in the format string. The order in which elements are defined in the dict argument is irrelevant, and the template string may use values from dict more than once (as it does with the 'pi' entry).
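A slightly larger sketch may make the benefit clearer. The `record` dictionary and the template below are invented for illustration. The example reuses one key twice, ignores a key the template doesn't mention, and shows that a name missing from the dictionary raises a KeyError:

```python
record = {"name": "Ambrosia", "kind": "food", "price": 12.5, "unused": 0}

# The same key may appear more than once; keys the template doesn't
# mention (like "unused") are simply ignored.
template = "%(name)s is a %(kind)s. One %(name)s costs %(price).2f."
line = template % record

# A name that isn't in the dictionary raises a KeyError.
try:
    "%(missing)s" % record
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]

print(line)
print(missing_key)
```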

 

Controlling output with the print function

Python’s built-in print function also has some options that can make handling simple string output easier. When used with one parameter, print prints the value and a newline character, so that a series of calls to print puts each value on a separate line:

>>> print("a")
a
>>> print("b")
b

But print can do more than that. You can also give the print function a number of arguments, and they will be printed on the same line, separated by a space and ending with a newline:

>>> print("a", "b", "c")
a b c

If that’s not quite what you need, you can give the print function additional parameters to control what separates each item and what ends the line:

>>> print("a", "b", "c", sep="|")
a|b|c
>>> print("a", "b", "c", end="\n\n")
a b c

>>>

In chapter 12, you’ll also see that the print function can be used to print to files as well as console output.
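As a small preview, and only as a sketch, the example below combines sep and end with print's file parameter. An in-memory io.StringIO object stands in for an open text file here so the result can be inspected. That choice is an assumption made for the example, not something the chapter prescribes:

```python
import io

buffer = io.StringIO()  # in-memory stand-in for an open text file

# sep goes between the items; end replaces the default newline.
print("a", "b", "c", sep=", ", end=".\n", file=buffer)

# An empty sep joins the items with nothing between them.
print("x", "y", sep="", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```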

 

Using the print function’s options gives you enough control for simple text output, but more complex situations are best served by using the format method.

6.8. Bytes

A bytes object is similar to a string object but with an important difference. A string is an immutable sequence of Unicode characters, whereas a bytes object is an immutable sequence of integers with values from 0 to 255. Bytes can be necessary when you’re dealing with binary data—for example, reading from a binary data file.
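The "sequence of integers" view can be checked directly. In this sketch (the sample values are arbitrary), indexing a bytes object gives an integer, slicing gives another bytes object, and a value too large for a single byte is rejected:

```python
data = b"abc"

first = data[0]             # indexing gives an integer, not a character
values = list(data)         # the integers behind the three bytes
piece = data[0:1]           # slicing gives a bytes object
rebuilt = bytes([72, 105])  # building bytes from a list of integers

# Each element must fit in one byte; 255 is accepted, 256 is not.
largest = bytes([255])
try:
    bytes([256])
    range_error = None
except ValueError as exc:
    range_error = str(exc)

print(first, values, piece, rebuilt, largest)
print(range_error)
```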

The key thing to remember is that bytes objects may look like strings, but they can’t be used exactly like a string and they can’t be combined with strings:

The first thing you can see is that to convert from a regular (Unicode) string to bytes, you need to call the string’s encode method. After it’s encoded to a bytes object, the character is now 2 bytes and no longer prints the same way the string did. Further, if you attempt to add a bytes object and a string object together, you get a type error, because the two are incompatible types. Finally, to convert a bytes object back to a string, you need to call that object’s decode method.
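The steps described above can be sketched as follows. The sample character (a lowercase a with an acute accent) is an assumption chosen because it takes 2 bytes in UTF-8, the default encoding used by encode and decode:

```python
# Assumed sample character: one Unicode character that needs 2 bytes in UTF-8.
text = "\N{LATIN SMALL LETTER A WITH ACUTE}"

encoded = text.encode()        # str -> bytes (UTF-8 by default)
print(len(text), len(encoded)) # one character, two bytes
print(encoded)                 # prints as escaped byte values, not as the character

# Bytes and strings can't be added together.
try:
    encoded + "A"
    concat_error = None
except TypeError as exc:
    concat_error = str(exc)
print(concat_error)

decoded = encoded.decode()     # bytes -> str
print(decoded == text)
```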

Most of the time, you shouldn’t need to think about Unicode or bytes at all. But when you need to deal with international character sets, an increasingly common issue, you must understand the difference between regular strings and bytes.

6.9. Summary

Python’s string type gives you several powerful tools for text processing. Almost all of those tools are the methods attached to any string object, although there’s an even more powerful set of tools in the re module. The standard string methods can search and replace, trim off extra characters, change case, and much more. Because strings are immutable—that is, they can’t be changed—the operations that “change” strings return a copy with the changes, but the original remains untouched.
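As a brief recap in code (a sketch with made-up sample values), each "changing" operation below returns a new string and leaves the original alone. The last line uses re.sub from the re module to collapse runs of whitespace:

```python
import re

original = "Goodbye\n"

stripped = original.rstrip("\n")          # trim the trailing newline
upper = original.upper()                  # change case
replaced = original.replace("Good", "B")  # search and replace

# The original string is untouched by all of the calls above.
print(repr(original))
print(repr(stripped), repr(upper), repr(replaced))

# The re module handles pattern-based edits, such as collapsing whitespace.
collapsed = re.sub(r"\s+", " ", "a   b \t c")
print(collapsed)
```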

After lists and strings, the next important Python data structure to consider is the dictionary, before we move on to control structures.