4. Test Data – MySQL 8 Query Performance Tuning: A Systematic Method for Improving Execution Speeds

© Jesper Wisborg Krogh 2020
J. W. KroghMySQL 8 Query Performance Tuninghttps://doi.org/10.1007/978-1-4842-5584-1_4

4. Test Data

Jesper Wisborg Krogh1 
(1)
Hornsby, NSW, Australia
 

Testing is a very important part of the performance tuning work as it is important that you have verified that your changes work before you apply them to your production system. The best data for verification of your changes closely relates to your production data; however, for exploring how MySQL works, it can be better to use some generic test data. This chapter introduces four standard data sets with installation instructions as well as a few other data sets that are available.

Tip

The world, world_x, and sakila databases are used as test data in the remainder of this book.

First, however, you need to know how you can download the databases.

Downloading the Example Databases

Common for the example databases that are discussed in detail in this chapter is that they can be downloaded from https://dev.mysql.com/doc/index-other.html or there is a link to where they can be downloaded from. For several of the databases, there is also online documentation and PDF files linked from this page. The relevant part of the page is shown in Figure 4-1.
Figure 4-1

The table with links to the example databases

The employee data (the employees database) is downloaded from Giuseppe Maxia’s (also known as The Data Charmer) GitHub repository, whereas the other databases are downloaded from Oracle’s MySQL’s site. The download with the employee data also includes a copy of the sakila database. For the employee data, the world database, and the sakila database, there is also documentation available.

Note

If you are not using the latest version of the data, you may see warnings about deprecated features, when you install the test databases. You can ignore those warnings, however, it is recommended to get the latest version of the data.

The menagerie database is a tiny two-table database with a total of fewer than 20 rows that was created for the tutorials section in the MySQL manual. It will not be discussed further.

The world Database

The world sample database is one of the most commonly used databases for simple tests. It consists of three tables with a few hundred to a few thousand rows. This makes it a small data set which means it can easily be used even on small test instances.

Schema

The database consists of the city, country, and countrylanguage tables. The relationship between the tables is shown in Figure 4-2.
Figure 4-2

The world database

The country table includes information about 239 countries and serves as the parent table in foreign keys from the city and countrylanguage tables. There are a total of 4079 cities in the database and 984 combinations of country and language.

Installation

The downloaded file consists of a single file named world.sql.gz or world.sql.zip depending on whether you chose the Gzip or Zip link. In either case, the downloaded archive contains a single file world.sql. The installation of the data is straightforward as all that is required is to execute the script.

If you use MySQL Shell with a copy of the world database from around January 2020 or before, you will need to use the traditional protocol as the X Protocol (the default) requires UTF-8 and the world database used Latin 1. You use the \source command to load the data from MySQL Shell:
MySQL [localhost ssl] SQL> \source world.sql
If you use the legacy mysql command-line client, use the SOURCE command instead:
mysql> SOURCE world.sql

In either case, add the path to the world.sql file if it is not located in the directory where you started MySQL Shell or mysql.

A related database is world_x which contains the same data as world, but it is organized differently.

The world_x Database

MySQL 8 has added support for the MySQL Document Store which supports storing and retrieving data as JavaScript Object Notation (JSON) documents. The world_x database stores some of the data in JSON documents to give you a test database that can readily be used for tests that include the use of JSON.

Schema

The world_x database includes the same three tables as the world database, though the columns are a little different, for example, the city table includes the JSON column Info with the population instead of the Population column and the country table has omitted several columns. Instead, there is the countryinfo table which is a pure Document Store–type table with the information otherwise removed from the country table. The schema diagram is shown in Figure 4-3.
Figure 4-3

The world_x database

While there are no foreign keys from the city and countryinfo tables, they can be joined to the country table using the CountryCode column and doc->>'$.Code'  value, respectively. The _id column of the countryinfo table is an example of a stored generated column where the value is extracted from the JSON document in the doc column.

Installation

The installation of the world_x database is very similar to the world database. You download either the world_x-db.tar.gz or world_x-db.zip file and extract it. The extracted files include a file named world_x.sql as well as a README file. The world_x.sql file includes all the statements required to create the schema.

Since the world_x schema uses UTF-8, you can install it using either of the MySQL protocols. For example, using MySQL Shell:
MySQL [localhost+ ssl] SQL> \source world_x.sql

Add the path to the world_x.sql file if it is not located in the current directory.

The world and world_x databases are very simple which makes them easy to use; however, sometimes you will need something a little more complex which the sakila database can deliver.

The sakila Database

The sakila database is a realistic database that contains a schema for a film rental business with information about the films, inventory, stores, staff, and customers. It adds a full text index, a spatial index, views, and stored programs to provide a more complete example of using MySQL features. The database size is still very moderate making it suitable for small instances.

Schema

The sakila database consists of 16 tables, seven views, three stored procedures, three stored functions, and six triggers. The tables can be split into three groups, customer data, business, and inventory. For brevity, not all columns are included in the diagrams and most indexes are not shown. Figure 4-4 shows a complete overview of the tables, views, and stored routines.
Figure 4-4

Overview of the sakila database

The tables with customer-related data (plus addresses for staff and stores) are in the area in the top-left corner. The area in the lower left includes data related to the business, and the area in the top right contains information about the films and inventory. The lower right is used for the views and stored routines.

Tip

You can view the entire diagram (though formatted differently) by opening the sakila.mwb file included with the installation in MySQL Workbench. This is also a good example of how you can use enhanced entity-relationship (EER) diagrams in MySQL Workbench to document your schema.

As there is a relatively large number of objects, they will be split into five groups (each of the table groups, views, and stored routines) when discussing the schema. The first group is the customer-related data with the tables shown in Figure 4-5.
Figure 4-5

The tables with customer data in the sakila database

There are four tables with data related to the customers. The customer table is the main table, and the address information is stored in the address, city, and country tables.

There are foreign keys between the customer and business groups with a foreign key from the customer table to the store table in the business group. There are also four foreign keys from tables in the business group to the address and customer tables. The business group is shown in Figure 4-6.
Figure 4-6

The tables with business data in the sakila database

The business tables contain information about the stores, staff, rentals, and payments. The store and staff tables have foreign keys in both directions with staff belonging to a store and a store having a manager that is part of the staff. Rentals and payments are handled by a staff member and thus indirectly linked to a store, and payments are for a rental.

The business group of tables is the one with the most relations to other groups. The staff and store tables have foreign keys to the address table, and the rental and payment tables reference the customer. Finally, the rental table has a foreign key to the inventory table which is in the inventory group. The diagram for the inventory group is shown in Figure 4-7.
Figure 4-7

The tables with inventory data in the sakila database

The main table in the inventory group is the film table which contains the metadata about the films the stores offer. Additionally, there is the film_text table with the title and description with a full text index.

There is a many-to-many relationship between the film and the category and actor tables. Finally, there is a foreign key from the inventory table to the store table in the business group.

That covers all the tables in the sakila database, but there are also some views as shown in Figure 4-8.
Figure 4-8

The views in the sakila database

The views can be used like reports and can be divided into two categories. The film_list, nicer_but_slower_film_list, and actor_info views are related to the films stored in the database. The second category contains information related to the stores in the sales_by_store, sales_by_film_category, staff_list, and customer_list views.

To complete the database, there are also the stored functions and procedures shown in Figure 4-9.
Figure 4-9

The stored routines in the sakila database

The film_in_stock() and film_not_in_stock() procedures return a result set consisting of the inventory ids for a given film and store based on whether the film is in stock or not. The total number of inventory entries found is returned as an out parameter. The rewards_report() procedure generates a report based on minimum spends for the last month.

The get_customer_balance() function returns the balance for a given customer on a given data. The two remaining functions check the status of an inventory id with inventory_held_by_customer() returning customer id of the customer currently renting that item (and NULL if no customer is renting it), and if you want to check whether a given inventory id is in stock, you can use the inventory_in_stock() function.

Installation

The downloaded file expands into a directory with three files, of which two create the schema and data and the last file contains the ETL diagram in the format used by MySQL Workbench.

Note

This section and the examples later in the book use the copy of the sakila database that is downloaded from MySQL’s homepage.

The files are
  • sakila-data.sql: The INSERT statements needed to populate the tables as well as the trigger definitions.

  • sakila-schema.sql: The schema definition statements.

  • sakila.mwb: The MySQL Workbench ETL diagram. This is similar to that shown in Figure 4-4 with details in Figures 4-5 to 4-9.

You install the sakila database by first sourcing the sakila-schema.sql file and then the sakila-data.sql file. For example, the following is using MySQL Shell:
MySQL [localhost+ ssl] SQL> \source sakila-schema.sql
MySQL [localhost+ ssl] SQL> \source sakila-data.sql

Add the path to the files if they are not located in the current directory.

Common for the three data sets thus far is that they contain little data. While this is in many cases a nice feature as it makes it easier to work with, in some cases you need a bit more data to explore the difference in query plans. The employees database is an option with more data.

The employees Database

The employees database (called employee data on the MySQL documentation download page; the name of the GitHub repository is test_db) was originally created by Fusheng Wang and Carlo Zaniolo and is the largest of the test data sets linked from MySQL’s homepage. The total size of the data files is around 180 MiB for the non-partitioned version and 440 MiB for the partitioned version.

Schema

The employees database consists of six tables and two views. You can optionally install two more views, five stored functions, and two stored procedures. The tables are shown in Figure 4-10.
Figure 4-10

The tables, views, and routines in the employees database

It is possible to choose to have the salaries and titles tables partitioned by the year of the from_date column as shown in Listing 4-1.
PARTITION BY RANGE  COLUMNS(from_date)
(PARTITION p01 VALUES LESS THAN ('1985-12-31') ENGINE = InnoDB,
 PARTITION p02 VALUES LESS THAN ('1986-12-31') ENGINE = InnoDB,
 PARTITION p03 VALUES LESS THAN ('1987-12-31') ENGINE = InnoDB,
 PARTITION p04 VALUES LESS THAN ('1988-12-31') ENGINE = InnoDB,
 PARTITION p05 VALUES LESS THAN ('1989-12-31') ENGINE = InnoDB,
 PARTITION p06 VALUES LESS THAN ('1990-12-31') ENGINE = InnoDB,
 PARTITION p07 VALUES LESS THAN ('1991-12-31') ENGINE = InnoDB,
 PARTITION p08 VALUES LESS THAN ('1992-12-31') ENGINE = InnoDB,
 PARTITION p09 VALUES LESS THAN ('1993-12-31') ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN ('1994-12-31') ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN ('1995-12-31') ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN ('1996-12-31') ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN ('1997-12-31') ENGINE = InnoDB,
 PARTITION p14 VALUES LESS THAN ('1998-12-31') ENGINE = InnoDB,
 PARTITION p15 VALUES LESS THAN ('1999-12-31') ENGINE = InnoDB,
 PARTITION p16 VALUES LESS THAN ('2000-12-31') ENGINE = InnoDB,
 PARTITION p17 VALUES LESS THAN ('2001-12-31') ENGINE = InnoDB,
 PARTITION p18 VALUES LESS THAN ('2002-12-31') ENGINE = InnoDB,
 PARTITION p19 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB)
Listing 4-1

The optional partitioning of the salaries and titles tables

Table 4-1 shows the number of rows and size of the tablespace files for the tables in the employees database (note that the size may vary a little when you load the data). The size assumes you load the non-partitioned data; the partitioned tables are somewhat larger.
Table 4-1

The size of each table in the employees database

Table

# Rows

Tablespace Size

departments

9

128 kiB

dept_emp

331603

25600 kiB

dept_manager

24

128 kiB

employees

300024

22528 kiB

salaries

2844047

106496 kiB

titles

443308

27648 kiB

By today’s standards, it is still a relatively small amount of data, but it is big enough that you can start to see some performance differences for different query plans.

The views and routines are summarized in Figure 4-11.
Figure 4-11

The views and routines in the employees database

The dept_emp_latest_date and current_dept_emp views are installed together with the tables, whereas the rest of the objects are installed separately in the objects.sql file. The stored routines come with their own built-in help which you can obtain by using the employees_usage() function or the employees_help() procedure. The latter is shown in Listing 4-2.
mysql> CALL employees_help()\G
*************************** 1. row ***************************
info:
    == USAGE ==
    ====================
    PROCEDURE show_departments()
        shows the departments with the manager and
        number of employees per department
    FUNCTION current_manager (dept_id)
        Shows who is the manager of a given departmennt
    FUNCTION emp_name (emp_id)
        Shows name and surname of a given employee
    FUNCTION emp_dept_id (emp_id)
        Shows the current department of given employee
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.02 sec)
Listing 4-2

The built-in help for the stored routines in the employees database

Installation

You can download a ZIP file with the files required for the installation, or you can clone the GitHub repository at https://github.com/datacharmer/test_db. At the time of writing, there is only a single branch named master. If you have downloaded the ZIP file, it will unzip into a directory named test_db-master.

There are several files. The two relevant for installing the employees database in MySQL 8 are employees.sql and employees_partitioned.sql. The difference is whether the salaries and titles tables are partitioned. (There is also employees_partitioned_5.1.sql which is meant for MySQL 5.1 where the partitioning scheme used in employees_partitioned.sql is not supported.)

The data is loaded by sourcing the .dump files using the SOURCE command. At the time of writing, the SOURCE command is not supported in MySQL Shell, so you will need to use the legacy mysql command-line client to import the data. Go to the directory with the source files, and choose the employees.sql or employees_partitioned.sql file, depending on whether you want to use partitioning or not, for example:
mysql> SOURCE employees.sql
The import takes a little time and completes by showing how long it took:
+---------------------+
| data_load_time_diff |
+---------------------+
| 00:01:51            |
+---------------------+
1 row in set (0.44 sec)
Optionally, you can load some extra views and stored routines by sourcing the objects.sql file:
mysql> SOURCE objects.sql

In addition to the data sets discussed here, there are some other choices to obtain example data to work with.

Other Databases

It can happen that you need to perform testing that requires data with some requirements that are not fulfilled by the standard example databases discussed thus far. Fortunately, there are other options available.

Tip

Do not discount the possibility to create your own custom example database, for example, by using data masking on your production data.

If you are looking for a very large real-world example, then you can download the Wikipedia database as described at https://en.wikipedia.org/wiki/Wikipedia:Database_download. The English Wikipedia dump from September 20, 2019, is 16.3 GiB in bzip2 compressed XML format.

If you are looking for JSON data, then an option is the earthquake information from the United States Geological Survey (USGS) which is provided in GeoJSON format with options to download information for earthquakes for the last hour, day, week, or month optionally filtered by the strength of the earthquake. The format description and links to the feeds can be found at https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php. Since the data includes geographic information in the GeoJSON format, it can be useful for testing that requires spatial indexes.

The benchmark tools described in the previous chapter also include test data or support creating test data. This data may also be useful for your own testing.

There are other example databases available if you search the Internet. In the end, the important things to consider are whether the data has a good size for your testing and whether it uses the features you require.

Summary

This chapter has introduced four standard example databases and some other examples of test data. The four standard databases that were discussed were world, world_x, sakila, and employees. These can all be found through the MySQL manual at https://dev.mysql.com/doc/index-other.html. Except for employees, these databases are used for the examples in this book unless stated otherwise.

The world and world_x databases are the simplest with the difference that world_x uses JSON to store some of the information, whereas the world database is purely relational. These databases do not contain much data, but because of their small size and simplicity, they are useful for simple tests and examples. Particularly the world database is used extensively in this book.

The sakila database has a much more complex schema including different index types, views, and stored routines. This makes it more realistic and allows for more complex tests. Yet, the size of the data is still small enough to use it on even small MySQL instances. It is also used extensively in this book.

The employees database has a schema that is in between the world and sakila databases in complexity, but has significantly more data making it better for testing the difference between various query plans. It is also useful if you need to generate some load on the instance, for example, using table scans. The employees database is not directly used in this book, but if you want to reproduce some of the examples where some load is required, then this is the best of the four standard test databases to use.

You should not limit yourself to consider the standard test databases. You may be able to create your own, create one using a benchmark tool, or find data made available on the Internet. Wikipedia’s database and the earthquake data from the United States Geological Survey (USGS) are examples of data that can be downloaded.

This completes the introduction to MySQL query performance tuning. Part II goes through the common sources of information in connection with diagnosing performance problems starting with the Performance Schema.