
© Jesper Wisborg Krogh 2020
J. W. Krogh, MySQL 8 Query Performance Tuning: A Systematic Method for Improving Execution Speeds, https://doi.org/10.1007/978-1-4842-5584-1_3

3. Benchmarking with Sysbench

Jesper Wisborg Krogh, Hornsby, NSW, Australia

It is very important to verify the impact of changes before you apply them to your production systems. This applies to both small changes like modifying a query and large changes such as refactoring the application and schema as well as MySQL upgrades. You may think that the optimal performance test is based on your production schema and data using the same queries that the application executes. However, it is not always as simple as it sounds to recreate the right workload, so sometimes it is necessary to use standard benchmark suites.

This chapter starts out with some best practices for executing benchmarks and an overview of some of the most common benchmarks and tools used with MySQL. Then Sysbench, the most commonly used benchmark tool, is considered in more detail.

Best Practices

It is easy to install a benchmark program and execute it. The difficult part is to use it right. Performing MySQL benchmark tests shares some of the concepts of performance tuning, and the first and most important point is that you need to work in an “informed way.” This means that you must know your tools well and clearly define the goal and success criteria of the tests. For your tools, you need to know how to use them correctly as executing them with default parameters likely will not produce the test that you want.

This is tied together with the goal of the benchmark. What is it you need to determine? For example, you may want to verify the effect of changing some configuration variable, in which case you must make sure your tests are set up so that area is exercised. Consider an option such as innodb_io_capacity, which influences how fast InnoDB writes. If your benchmark is a read-only test, changing innodb_io_capacity will not make any difference. In this context, you also need to make sure that you only change one thing at a time and only make relatively small changes – just as you should when making changes to your production system. Otherwise, if you change several settings at the same time, some may contribute positively to the result and others negatively, but you have no way to determine which changes to keep and which to revert. If you make large changes, you may overshoot the optimal value and end up discarding the change even though there is room for improvement.

When reading the results at the end of the test, you need to understand what the benchmark measures; otherwise, the result is just a meaningless number. This also includes defining which variables to adjust during the tests, and as for performance tuning in general, it is important to keep the number of variables limited, so you can easily identify the effect of each variable. For a result to be valid, you must also ensure that the test is repeatable, that is, if you execute the same test twice, then you get the same result. One requirement for a test to be repeatable is that you have a well-defined starting state of the system.

Tip

Do not assume that one client is enough to generate the load you are aiming at. How many clients are required depends on the number of concurrent queries and the benchmark you are executing.

That leads to the next important point: your benchmarks should reflect the workload of the application. It does not help to use an online transaction processing (OLTP) benchmark to prove that your configuration changes work great if your application has an online analytical processing (OLAP) workload, or to demonstrate great read-only performance if your application is write-heavy.

You may think that the optimal way to design a benchmark is to capture all queries executed in production and replay them as the benchmark. This definitely has some merits, but there are also challenges. It is expensive to collect all queries executed, though if you already have the MySQL Enterprise Audit log enabled for auditing purposes, that can be used. There may also be data privacy problems of copying the production data to the test system. Finally, it is difficult to scale the test to change the size of the data set (whether down to make it more manageable or up to test growth) or to increase the test workload compared to the current production load. For these reasons, it is often necessary to use artificial benchmarks.

Tip

You can use the MySQL Enterprise Audit log (requires subscription) or the general query log (very high overhead) to capture all queries for a period. The log includes timestamps for when the queries were executed, so you can use it to replay the queries in the same order and with the same concurrency. It does, however, require that you create a script yourself to extract the queries and execute them.
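As a sketch of what such an extraction script could look like, the following pulls the statement text out of a general query log excerpt with awk. The sample log lines and the file path are made up for illustration; the extraction only relies on the general log's column layout (time, thread id, command type, argument).

```shell
# Create a tiny, made-up general query log excerpt (format: time, id,
# command, argument). Real logs also contain a header and Quit events.
cat > /tmp/general.log <<'EOF'
2019-10-12T14:21:18.123456Z    12 Connect  sbtest@localhost on sbtest
2019-10-12T14:21:18.223456Z    12 Query    SELECT * FROM city WHERE ID = 130
2019-10-12T14:21:18.323456Z    12 Query    UPDATE city SET Population = 5000 WHERE ID = 130
EOF

# Keep only the Query events and print the SQL statement (everything
# after the third column).
awk '$3 == "Query" { $1 = $2 = $3 = ""; sub(/^ +/, ""); print }' /tmp/general.log
```

A complete replay script would also have to honor the timestamps and thread ids to reproduce the original ordering and concurrency.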

The next point is about the benchmark results, which also relates to the previous points. When you have the result of a benchmark, it is important to understand what the result means and that you do not discard results just because they look wrong. As such, a benchmark result is “never wrong”; it is the result of some work. If the result is unexpected, it is important to understand why it ended up that way. Maybe you did not use the parameters you intended or used a different table size than expected, but it may also be that something else entirely interfered with the benchmark. If something interfered with the benchmark, is it something that could also happen in production? If it can, then the benchmark is very much relevant, and you need to decide how you will handle such a case in production.

To understand what happened during a benchmark, it is also important that you monitor MySQL and the host system. One option is to use the same monitoring solution as the one you use for your production system. However, benchmarks on a test or development system differ a bit from a production system, as during a benchmark you are typically interested in higher-frequency sampling but for a shorter duration, so it can be useful to use a dedicated monitoring solution specifically for benchmarks. One such option is dim_STAT (http://dimitrik.free.fr/) developed by Dimitri Kravtchuk, who is a performance architect for MySQL and is behind many of the MySQL Server benchmarks.

In general, understanding the result is not a simple thing. One thing you also need to be aware of is what happens during a benchmark if there is a temporary stall. Does the benchmark hold back on subsequent queries, or does it keep submitting queries? If it holds back, then the subsequent queries will effectively be faster than they would be in the real world, as users do not stop submitting requests just because there is a backlog.

Finally, a benchmark typically produces several metrics, so you need to analyze the results according to what is most relevant for your system. For example, is latency or throughput the most important? Or do you have requirements for both? Or are you more interested in another metric entirely?
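Latency and throughput are not independent. For closed-loop benchmarks such as the Sysbench OLTP tests, where each client thread waits for one request to finish before submitting the next, they are tied together by Little's law: throughput ≈ concurrency / average latency. A quick sanity check with illustrative numbers (8 client threads, 8.76 ms average latency):

```shell
# Little's law for a closed-loop benchmark: 8 client threads each seeing
# an average latency of 8.76 ms gives threads * 1000 / latency_ms events
# per second. The numbers are illustrative, not from a specific run.
awk 'BEGIN { printf "%.2f\n", 8 * 1000 / 8.76 }'
```

This prints 913.24, so if a run with those numbers reports a far lower throughput, something other than client concurrency is the bottleneck.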

Standard TPC Benchmarks

There is an almost endless list of benchmarks, but the ones that are commonly used boil down to a handful of tests. This does not mean that you should not consider other benchmarks; in the end, the important thing is that the benchmark works for your requirements.

The most commonly used standard benchmarks are defined by TPC (www.tpc.org/), with new benchmarks being designed as hardware and software change, making older benchmarks too simple. The TPC web site includes detailed descriptions and specifications of the benchmarks. Table 3-1 summarizes the current enterprise TPC benchmarks.
Table 3-1

Common TPC benchmarks

TPC-C (OLTP): This is maybe the most classic of the TPC benchmarks and dates back to 1992. It simulates the queries of a wholesale supplier and uses nine tables.

TPC-DI (Data Integration): Tests extract, transform, and load (ETL) workloads.

TPC-DS (Decision Support): This benchmark includes complex queries of a data warehouse (star schema).

TPC-E (OLTP): This is meant as a replacement for TPC-C with a more complex schema and queries, so it is more realistic for modern databases. It includes 33 tables.

TPC-H (Decision Support): This is another classic benchmark which is often used to test optimizer features. It consists of 22 complex queries meant to simulate the reporting side of an OLTP database.

TPC-VMS (Virtualization): This uses the TPC-C, TPC-DS, TPC-E, and TPC-H benchmarks to determine performance metrics for virtualized databases.

The advantages of these standard benchmarks are that you are more likely to find tools implementing them and that you can compare with results obtained by other people.

Tip

If you want to learn more about the TPC benchmarks as well as how to perform database benchmarks the best way, consider the book by Bert Scalzo: Database Benchmarking and Stress Testing (Apress), www.apress.com/gp/book/9781484240076.

In the same way as there are standard benchmarks, there are also some common benchmark tools.

Common Benchmark Tools

Implementing a benchmark is far from trivial, so in most cases it is preferable to use a preexisting benchmark tool that can execute the benchmark for you. Some tools are cross-platform and/or can use several different database systems, whereas others are more specific. You should choose the one that implements the benchmarks you need and works on the platform that hosts your production system.

Table 3-2 summarizes some of the most commonly used benchmark tools to test the performance of MySQL.
Table 3-2

Common benchmarks used with MySQL

Sysbench: This is the most commonly used benchmark and the one that will be covered most in this chapter. It has built-in tests for OLTP workloads, non-database tests (such as pure I/O, CPU, and memory tests), and more. Additionally, the latest versions support custom workloads. It is open source and is mostly used on Linux. It can be downloaded from https://github.com/akopytov/sysbench.

DBT2: DBT2 can be used to emulate OLTP workloads using an order system (TPC-C). DBT2 can also be used to automate Sysbench and is available from https://dev.mysql.com/downloads/benchmarks.html.

DBT3: DBT3 implements the TPC-H benchmark and is used to test the performance of complex queries. It is one of the favorite tests used by the MySQL optimizer developers to verify the performance after implementing new optimizer features. A copy of DBT3 is available from https://sourceforge.net/projects/osdldbt/.

HammerDB: The HammerDB tool is a free cross-database tool with support for both Microsoft Windows and Linux. It has support for the TPC-C and TPC-H benchmarks and is available from https://hammerdb.com/.

Benchmark Factory: Benchmark Factory is a powerful benchmark tool for Microsoft Windows that supports several databases and benchmarks. It supports the TPC-H, TPC-C, TPC-D, and TPC-E benchmarks and more. It is a commercial product (free trial available): www.quest.com/products/benchmark-factory/.

iiBench: iiBench tests how fast you can insert data into the database and is thus useful if you regularly need to ingest large amounts of data. It can be downloaded from https://github.com/tmcallaghan/iibench-mysql.

DVD Store Version 3: The DVD Store combines data for a sample DVD store with a benchmark. It can generate data of any given size, with standard sizes being 10 MB, 1 GB, and 100 GB. It is also useful as general test data and can be downloaded from https://github.com/dvdstore/ds3. It is based on the older Dell DVD Store Database Test Suite.

mysqlslap: The mysqlslap tool is special in that it is included with the MySQL installation. It can be used to generate a concurrent workload against a table of your choice. It is a very simple tool, so it cannot be used for many purposes, but it is easy to use. The manual page for mysqlslap can be found at https://dev.mysql.com/doc/refman/en/mysqlslap.html.

The tool that is most commonly used with MySQL is Sysbench, and the remainder of this chapter covers its installation and example usages.

Sysbench Installation

Since Sysbench is an open source tool, there are several forks available. MySQL maintains one of these forks; however, to get the version with the most recent features, it is recommended to use the fork by Alexey Kopytov. (This is also the fork recommended by MySQL performance architect Dimitri Kravtchuk.) The examples in this chapter all use Kopytov’s fork version 1.0.17 (but note that the version listed in the outputs is 1.1.0), but the examples will be similar for other Sysbench forks as long as the fork is new enough to include the features demonstrated.

There is support for installing Sysbench using native Linux packages, using Homebrew on macOS, or by compiling it yourself. While installing using native packages is simpler, it is in general better to compile Sysbench yourself, as that ensures you compile against the MySQL 8 development libraries, and you can compile Sysbench on more platforms than those for which packages are available.

Tip

For details about all the installation instructions, including required dependencies and using native packages, see https://github.com/akopytov/sysbench. Support for Microsoft Windows has been dropped in Sysbench 1.0. It is currently unknown whether support will be reintroduced. If you are using Microsoft Windows, the recommendation is to install Sysbench through Windows Subsystem for Linux (WSL) (https://msdn.microsoft.com/en-us/commandline/wsl/about) in which case the instructions in this chapter should work with minor modifications (depending on the Linux distribution you choose). An alternative is to use a virtual machine, for example, in VirtualBox.

Compiling software may not be very common any longer, but fortunately it is straightforward to compile Sysbench. You will need to download the source code and then configure the build, compile it, and finally install it.

There are some tools you will need to install before you can compile Sysbench. The exact tools required depend on your operating system. See the installation instructions on the project’s GitHub page for details. For example, on Oracle Linux 7:
shell$ sudo yum install make automake libtool \
                        pkgconfig libaio-devel \
                        openssl-devel
You will also need to have the MySQL 8 development libraries installed. The easiest way to do this on Linux is to install the MySQL repository for your Linux distribution from https://dev.mysql.com/downloads/. Listing 3-1 shows an example of installing the MySQL 8 development libraries on Oracle Linux 7.
shell$ wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
...
Saving to: 'mysql80-community-release-el7-3.noarch.rpm'
100%[=================>] 26,024      --.-K/s   in 0.006s
2019-10-12 14:21:18 (4.37 MB/s) - 'mysql80-community-release-el7-3.noarch.rpm' saved [26024/26024]
shell$ sudo yum install mysql80-community-release-el7-3.noarch.rpm
Loaded plugins: langpacks, ulninfo
Examining mysql80-community-release-el7-3.noarch.rpm: mysql80-community-release-el7-3.noarch
Marking mysql80-community-release-el7-3.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package mysql80-community-release.noarch 0:el7-3 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===========================================================
 Package
   Arch   Version
          Repository                               Size
===========================================================
Installing:
 mysql80-community-release
   noarch el7-3
             /mysql80-community-release-el7-3.noarch  31 k
Transaction Summary
===========================================================
Install  1 Package
Total size: 31 k
Installed size: 31 k
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : mysql80-community-release-el7-3.noarc   1/1
  Verifying  : mysql80-community-release-el7-3.noarc   1/1
Installed:
  mysql80-community-release.noarch 0:el7-3
Complete!
shell$ sudo yum install mysql-devel
...
Dependencies Resolved
===========================================================
 Package       Arch   Version      Repository         Size
===========================================================
Installing:
 mysql-community-client
         x86_64 8.0.17-1.el7 mysql80-community  32 M
     replacing  mariadb.x86_64 1:5.5.64-1.el7
 mysql-community-devel
         x86_64 8.0.17-1.el7 mysql80-community 5.5 M
 mysql-community-libs
         x86_64 8.0.17-1.el7 mysql80-community 3.0 M
     replacing  mariadb-libs.x86_64 1:5.5.64-1.el7
 mysql-community-libs-compat
         x86_64 8.0.17-1.el7 mysql80-community 2.1 M
     replacing  mariadb-libs.x86_64 1:5.5.64-1.el7
 mysql-community-server
         x86_64 8.0.17-1.el7 mysql80-community 415 M
     replacing  mariadb-server.x86_64 1:5.5.64-1.el7
Installing for dependencies:
 mysql-community-common
         x86_64 8.0.17-1.el7 mysql80-community 589 k
Transaction Summary
===========================================================
Install  5 Packages (+1 Dependent package)
Total download size: 459 M
...
Complete!
Listing 3-1

Installing the MySQL 8 development libraries

The output depends on what you have already installed. Notice how several other MySQL packages, including mysql-community-server, are pulled in as dependencies. This is because the mysql-community-devel package in this case replaces another preexisting package which triggers a chain of dependency updates.

Note

If you have an older version of MySQL or a fork installed, all related packages will be upgraded. For this reason, it is best to compile Sysbench on a host where you can freely replace packages or the correct MySQL 8 development libraries are already installed.

You are now ready to consider Sysbench itself. You can choose to either clone the GitHub repository or download the source as a ZIP file. To clone the repository, you need to have git installed and then use the git clone command:
shell$ git clone https://github.com/akopytov/sysbench.git
Cloning into 'sysbench'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 9740 (delta 4), reused 5 (delta 2), pack-reused 9726
Receiving objects: 100% (9740/9740), 4.12 MiB | 2.12 MiB/s, done.
Resolving deltas: 100% (6958/6958), done.
The ZIP file with the source code can be downloaded from the GitHub repository, for example, using wget:
shell$ wget https://github.com/akopytov/sysbench/archive/master.zip
...
Connecting to codeload.github.com (codeload.github.com)|52.63.100.255|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: 'master.zip'
    [    <=>               ] 2,282,636   3.48MB/s   in 0.6s
2019-10-12 16:01:33 (3.48 MB/s) - 'master.zip' saved [2282636]
Alternatively, you can download the ZIP file using your browser as shown in Figure 3-1.
Figure 3-1

Downloading the Sysbench source code from GitHub in a browser

Click Download ZIP and the file will download. Once the source code is downloaded, unzip it.

You are now ready to configure the compilation. Enter the top-level directory with the source code. The directory listing should look similar to the following output:
shell$ ls
autogen.sh    COPYING     Makefile.am    rpm      tests
ChangeLog     debian      missing        scripts  third_party
config        install-sh  mkinstalldirs  snap
configure.ac  m4          README.md      src
The configuration is done using the autogen.sh script followed by the configure command as shown in Listing 3-2.
shell$ ./autogen.sh
autoreconf: Entering directory `.'
...
parallel-tests: installing 'config/test-driver'
autoreconf: Leaving directory `.'
shell$ ./configure
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
...
===========================================================================
sysbench version   : 1.1.0-74f3b6b
CC                 : gcc -std=gnu99
CFLAGS             : -O3 -funroll-loops -ggdb3  -march=core2 -Wall -Wextra -Wpointer-arith -Wbad-function-cast -Wstrict-prototypes -Wnested-externs -Wno-format-zero-length -Wundef -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wcast-align -Wvla   -pthread
CPPFLAGS           : -D_GNU_SOURCE   -I$(top_srcdir)/src -I$(abs_top_builddir)/third_party/luajit/inc -I$(abs_top_builddir)/third_party/concurrency_kit/include
LDFLAGS            : -L/usr/local/lib
LIBS               : -laio -lm
prefix             : /usr/local
bindir             : ${prefix}/bin
libexecdir         : ${prefix}/libexec
mandir             : ${prefix}/share/man
datadir            : ${prefix}/share
MySQL support      : yes
PostgreSQL support : no
LuaJIT             : bundled
LUAJIT_CFLAGS      : -I$(abs_top_builddir)/third_party/luajit/inc
LUAJIT_LIBS        : $(abs_top_builddir)/third_party/luajit/lib/libluajit-5.1.a -ldl
LUAJIT_LDFLAGS     : -rdynamic
Concurrency Kit    : bundled
CK_CFLAGS          : -I$(abs_top_builddir)/third_party/concurrency_kit/include
CK_LIBS            : $(abs_top_builddir)/third_party/concurrency_kit/lib/libck.a
configure flags    :
===========================================================================
Listing 3-2

Configuring Sysbench for compilation and installation

The end of the configuration shows the options that will be used for the compilation. Make sure that MySQL support says yes. The default is to install in /usr/local. You can change that using the --prefix option when executing configure, for example, ./configure --prefix=/home/myuser/sysbench.

The next step is to compile the code, which is done using the make command:
shell$ make -j
Making all in third_party/luajit
...
make[1]: Nothing to be done for `all-am'.
make[1]: Leaving directory `/home/myuser/git/sysbench'

The -j option tells make to compile the source in parallel, which can reduce the compilation time. However, Sysbench is quick to compile in any case, so it is not of great importance here.

The final step is to install the compiled version of Sysbench:
shell$ sudo make install
Making install in third_party/luajit
...
make[2]: Leaving directory `/home/myuser/git/sysbench'
make[1]: Leaving directory `/home/myuser/git/sysbench'

That is it. You are now ready to use Sysbench to perform benchmarks.

Executing Benchmarks

Sysbench includes several benchmarks that are ready to use. This ranges from non-database built-in tests to various database tests. The non-database tests are considered built-in as they are defined within the Sysbench source code itself. The other tests are defined in Lua scripts and are installed in the /usr/local/share/sysbench/ directory (assuming you installed into the default location).

Note

This and the next section assume you have a MySQL instance available for testing on the same host as where you have installed Sysbench. If that is not the case, you need to adjust the hostnames as necessary.

You can get general help to understand the Sysbench arguments by invoking sysbench with the --help argument:
shell$ sysbench --help
...
Compiled-in tests:
  fileio - File I/O test
  cpu - CPU performance test
  memory - Memory functions speed test
  threads - Threads subsystem performance test
  mutex - Mutex performance test
See 'sysbench <testname> help' for a list of options for each test.
At the bottom of the output is a list of the built-in tests and a hint on how to get more information about a given test. You can get a list of the additional tests by listing the files in the shared directory:
shell$ ls /usr/local/share/sysbench/
bulk_insert.lua        oltp_update_index.lua
oltp_common.lua        oltp_update_non_index.lua
oltp_delete.lua        oltp_write_only.lua
oltp_insert.lua        select_random_points.lua
oltp_point_select.lua  select_random_ranges.lua
oltp_read_only.lua     tests
oltp_read_write.lua

The files with the .lua extension, except oltp_common.lua (shared code for the OLTP tests), are the available tests. The Lua language is a lightweight programming language that is often used for embedding code into programs. Working with Lua programs is similar to working with scripting languages such as Python, except your code gets executed through another program (Sysbench in this case).

As mentioned, you can get additional help about the tests by providing the name of the test and the help command. For example, to get additional information about the test defined in oltp_read_only.lua, you can use the help command as shown in Listing 3-3.
shell$ sysbench oltp_read_only help
sysbench 1.1.0-74f3b6b (using bundled LuaJIT 2.1.0-beta3)
oltp_read_only options:
  --auto_inc[=on|off]           Use AUTO_INCREMENT column as Primary Key (for MySQL), or its alternatives in other DBMS. When disabled, use client-generated IDs [on]
  --create_secondary[=on|off]   Create a secondary index in addition to the PRIMARY KEY [on]
  --create_table_options=STRING Extra CREATE TABLE options []
  --delete_inserts=N            Number of DELETE/INSERT combinations per transaction [1]
  --distinct_ranges=N           Number of SELECT DISTINCT queries per transaction [1]
  --index_updates=N             Number of UPDATE index queries per transaction [1]
  --mysql_storage_engine=STRING Storage engine, if MySQL is used [innodb]
  --non_index_updates=N         Number of UPDATE non-index queries per transaction [1]
  --order_ranges=N              Number of SELECT ORDER BY queries per transaction [1]
  --pgsql_variant=STRING        Use this PostgreSQL variant when running with the PostgreSQL driver. The only currently supported variant is 'redshift'. When enabled, create_secondary is automatically disabled, and delete_inserts is set to 0
  --point_selects=N             Number of point SELECT queries per transaction [10]
  --range_selects[=on|off]      Enable/disable all range SELECT queries [on]
  --range_size=N                Range size for range SELECT queries [100]
  --reconnect=N                 Reconnect after every N events. The default (0) is to not reconnect [0]
  --secondary[=on|off]          Use a secondary index in place of the PRIMARY KEY [off]
  --simple_ranges=N             Number of simple range SELECT queries per transaction [1]
  --skip_trx[=on|off]           Don't start explicit transactions and execute all queries in the AUTOCOMMIT mode [off]
  --sum_ranges=N                Number of SELECT SUM() queries per transaction [1]
  --table_size=N                Number of rows per table [10000]
  --tables=N                    Number of tables [1]
Listing 3-3

Obtaining help for the oltp_read_only test

The values in square brackets are the default values.

The help command is just one of several commands available (some tests may not implement all of the commands). The other commands cover the phases of a benchmark test:
  • prepare: Executes the steps required to set up the test, for example, by creating and populating the tables needed by the test.

  • warmup: Ensures the buffers and caches are warm, for example, that table data and indexes have been loaded into the InnoDB buffer pool. This is special for the OLTP benchmarks.

  • run: Executes the test itself. This command is provided by all tests.

  • cleanup: Removes any tables used by the test.

As an example, consider the read-only OLTP test that you retrieved the help for before. First, create a MySQL user that can execute the required queries. The default is to use the sbtest schema for the benchmark, so a simple solution is to create a user with all privileges on this schema:
mysql> CREATE USER sbtest@localhost IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.02 sec)
mysql> GRANT ALL ON sbtest.* TO sbtest@localhost;
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE SCHEMA sbtest;
Query OK, 1 row affected (0.01 sec)

In this case, the user is expected to connect from localhost. In general, that will not be the case, so you need to change the hostname part of the account to reflect where the Sysbench user connects from. The username sbtest was chosen as that is the default used by Sysbench. The sbtest schema is also created, as the Sysbench tests require it to exist when first connecting.

Note

It is strongly recommended to choose a strong password for the account.

If you want to execute a benchmark that uses four tables, each with 20,000 rows, you can prepare the test as shown in Listing 3-4.
shell$ sysbench oltp_read_only \
         --mysql-host=127.0.0.1 \
         --mysql-port=3306 \
         --mysql-user=sbtest \
         --mysql-password=password \
         --mysql-ssl=REQUIRED \
         --mysql-db=sbtest \
         --table_size=20000 \
         --tables=4 \
         --threads=4 \
         prepare
sysbench 1.1.0-74f3b6b (using bundled LuaJIT 2.1.0-beta3)
Initializing worker threads...
Creating table 'sbtest1'...
Creating table 'sbtest3'...
Creating table 'sbtest4'...
Creating table 'sbtest2'...
Inserting 20000 records into 'sbtest2'
Inserting 20000 records into 'sbtest3'
Inserting 20000 records into 'sbtest1'
Inserting 20000 records into 'sbtest4'
Creating a secondary index on 'sbtest3'...
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest4'...
Creating a secondary index on 'sbtest1'...
Listing 3-4

Preparing the test

This creates the four tables sbtest1, sbtest2, sbtest3, and sbtest4 using four threads. The prepare step will be quick in this case as the tables are small; however, if you perform benchmarks using large tables, it can take a significant amount of time to set up the test. As benchmark testing typically involves executing a range of tests, you can speed up the testing by creating a binary backup (copying the tables, either with MySQL shut down or using a tool such as MySQL Enterprise Backup) or a file system snapshot. For each subsequent test, you can restore the backup instead of recreating the tables.
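The restore-instead-of-recreate idea can be sketched as follows. The paths are made up for illustration, and with a real server, you must shut mysqld down (or use a proper backup tool) before copying the files:

```shell
# Hypothetical paths; with a real server, stop mysqld before copying.
datadir=/tmp/fake-datadir
backupdir=/tmp/sbtest-backup

# Stand-in for the prepared sbtest schema (a real one holds .ibd files).
mkdir -p "$datadir/sbtest"
touch "$datadir/sbtest/sbtest1.ibd"

# Snapshot the prepared tables once, right after the prepare step.
rm -rf "$backupdir"
cp -a "$datadir/sbtest" "$backupdir"

# ... run a benchmark that modifies the data ...

# Restore the snapshot before the next test instead of rerunning prepare.
rm -rf "$datadir/sbtest"
cp -a "$backupdir" "$datadir/sbtest"
ls "$datadir/sbtest"
```

A file system snapshot (for example, LVM or ZFS) achieves the same with far less copying for large data sets.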

Optionally, as the next step, you can go through a warmup phase as shown in Listing 3-5.
shell$ sysbench oltp_read_only \
         --mysql-host=127.0.0.1 \
         --mysql-port=3306 \
         --mysql-user=sbtest \
         --mysql-password=password \
         --mysql-ssl=REQUIRED \
         --mysql-db=sbtest \
         --table_size=20000 \
         --tables=4 \
         --threads=4 \
         warmup
sysbench 1.1.0-74f3b6b (using bundled LuaJIT 2.1.0-beta3)
Initializing worker threads...
Preloading table sbtest3
Preloading table sbtest1
Preloading table sbtest2
Preloading table sbtest4
Listing 3-5

Warming MySQL up for the test

Here it is important that you include the --tables and --table_size options, as otherwise only the default number of rows (10,000) of the sbtest1 table will be preloaded. The preloading consists of averaging the id column and executing a simple SELECT COUNT(*) query with the rows fetched in a subquery (the queries have been reformatted):
SELECT AVG(id)
  FROM (SELECT *
          FROM sbtest1 FORCE KEY (PRIMARY)
         LIMIT 20000
       ) t
SELECT COUNT(*)
  FROM (SELECT *
          FROM sbtest1
         WHERE k LIKE '%0%'
         LIMIT 20000
       ) t

So the warmup phase may not be equivalent to running the actual benchmark for a while.

Tip

You can also use the --warmup-time=N option when executing the benchmark to disable statistics for the first N seconds.

The benchmark itself is executed using the run command. There are two options to specify the duration of the test:
  • --events=N: The maximum number of events to execute. The default is 0.

  • --time=N: The maximum duration in seconds. The default is 10.

When the value is 0 for one of the options, it means infinite. So, if you set both --events and --time to 0, the test will run forever. This can be useful, for example, if you are not interested in the benchmark statistics themselves but want to collect monitoring metrics or create a workload while performing some other task.
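The combined effect of the two options can be modeled as follows. This is a sketch of the documented semantics, not Sysbench's own code; 0 means no limit for either option:

```python
def should_stop(events_done: int, elapsed: float,
                max_events: int = 0, max_time: float = 10.0) -> bool:
    """Model the --events/--time stop conditions (0 = no limit).

    The defaults mirror the documented ones: --events=0, --time=10.
    """
    if max_events > 0 and events_done >= max_events:
        return True
    if max_time > 0 and elapsed >= max_time:
        return True
    return False
```

With both limits set to 0, should_stop() never returns True, which corresponds to the run-forever case described above.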

Tip

The author of this book uses Sysbench with both the number of events and time limits set to 0 to generate a concurrent workload for tests creating backups.

If you, for example, want to execute a test for one minute (60 seconds), you can use a command like the one in Listing 3-6.
shell$ sysbench oltp_read_only \
         --mysql-host=127.0.0.1 \
         --mysql-port=3306 \
         --mysql-user=sbtest \
         --mysql-password=password \
         --mysql-ssl=REQUIRED \
         --mysql-db=sbtest \
         --table_size=20000 \
         --tables=4 \
         --time=60 \
         --threads=8 \
         run
sysbench 1.1.0-74f3b6b (using bundled LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 8
Initializing random number generator from current time
Initializing worker threads...
Threads started!
SQL statistics:
    queries performed:
        read:                            766682
        write:                           0
        other:                           109526
        total:                           876208
    transactions:                        54763  (912.52 per sec.)
    queries:                             876208 (14600.36 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)
Throughput:
    events/s (eps):                      912.5224
    time elapsed:                        60.0128s
    total number of events:              54763
Latency (ms):
         min:                                    3.26
         avg:                                    8.76
         max:                                  122.43
         95th percentile:                       11.24
         sum:                               479591.29
Threads fairness:
    events (avg/stddev):           6845.3750/70.14
    execution time (avg/stddev):   59.9489/0.00
Listing 3-6

Executing a Sysbench test for one minute
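The throughput figures in the output follow directly from the counts and the elapsed time. As a quick sanity check using the numbers from Listing 3-6 (Sysbench rounds from its internal timer, so the last digit of the queries-per-second figure may differ slightly):

```python
transactions = 54763
queries = 876208
elapsed = 60.0128  # seconds, as reported under "time elapsed"

tps = transactions / elapsed
qps = queries / elapsed
print(round(tps, 2))  # 912.52, matching "transactions ... per sec."
print(round(qps, 2))  # close to the reported 14600.36 queries per second
```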

Notice that unlike the prepare and warmup phases, the run command was executed with eight threads. The number of threads is often one of the parameters varied in a series of tests to determine how much concurrency the system can sustain. It is necessary to specify the number of tables and rows that the run command should use as otherwise the default values will be used (no state is shared between the Sysbench commands).
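A series of runs at increasing thread counts can be scripted. This hedged sketch only builds and prints the command lines; replace print() with subprocess.run() to execute them, and adjust the connection options to your environment:

```python
BASE = ["sysbench", "oltp_read_only",
        "--mysql-host=127.0.0.1", "--mysql-port=3306",
        "--mysql-user=sbtest", "--mysql-password=password",
        "--mysql-db=sbtest", "--table_size=20000", "--tables=4",
        "--time=60"]

def build_commands(thread_counts):
    """Build one sysbench 'run' invocation per thread count."""
    return [BASE + [f"--threads={n}", "run"] for n in thread_counts]

for cmd in build_commands([1, 2, 4, 8, 16]):
    print(" ".join(cmd))
```

Remember to restore the tables (or rerun prepare) between runs if the test modifies data; for the read-only test that is not necessary.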

Once you are done with the tests, you can tell Sysbench to clean up after itself using the cleanup command as shown in Listing 3-7.
shell$ sysbench oltp_read_only \
         --mysql-host=127.0.0.1 \
         --mysql-port=3306 \
         --mysql-user=sbtest \
         --mysql-password=password \
         --mysql-ssl=REQUIRED \
         --mysql-db=sbtest \
         --tables=4 \
         cleanup
sysbench 1.1.0-74f3b6b (using bundled LuaJIT 2.1.0-beta3)
Dropping table 'sbtest1'...
Dropping table 'sbtest2'...
Dropping table 'sbtest3'...
Dropping table 'sbtest4'...
Listing 3-7

Cleaning up after a test

Notice that it is necessary to specify the number of tables; otherwise, only the first table will be dropped.

The built-in tests are great, but what makes Sysbench a really strong tool is that you can also define your own benchmarks.

Creating Custom Benchmarks

As you saw in the previous section, the database tests that are included with Sysbench are defined in Lua scripts (www.lua.org/). This means that all you need to do to define your own tests is to create a Lua script with the definition of the test and save it in Sysbench's shared directory. One example where this can be useful is if you want to create a test based on the specific requirements of your application, for example, to test the effect of indexes or of refactoring your application.

This section will put together a small example test script, so you can see the principles of creating your own tests. The test can also be found in sequence.lua in this book’s GitHub repository.

Tip

A great way to learn how to write your own Sysbench Lua scripts is to study the existing ones. In addition to the example in this chapter, you can look at the Lua scripts shipped with Sysbench and another relatively simple example in https://gist.github.com/utdrmac/92d00a34149565bc155cdef80b6cba12.

Overview of the Custom Script

The example benchmark test will test the performance of a sequence implemented by having a single row per sequence in a table. Such constructs are sometimes used to implement custom sequences in applications. The table definition and an example of the use of the table is shown in Listing 3-8.
mysql> SHOW CREATE TABLE sbtest.sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` varchar(10) NOT NULL,
  `val` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
mysql> SELECT * FROM sbtest.sbtest1;
+--------+-----+
| id     | val |
+--------+-----+
| sbkey1 |   0 |
+--------+-----+
1 row in set (0.00 sec)
mysql> UPDATE sbtest1
          SET val = LAST_INSERT_ID(val+1)
        WHERE id = 'sbkey1';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM sbtest.sbtest1;
+--------+-----+
| id     | val |
+--------+-----+
| sbkey1 |   1 |
+--------+-----+
1 row in set (0.00 sec)
Listing 3-8

Using a custom sequence table

The LAST_INSERT_ID() function is used in the UPDATE statement to set the session value for the last inserted ID, so it can be fetched afterward in the SELECT statement.
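The effect of the UPDATE/SELECT pair can be modeled with a toy in-memory version of the sequence table. This is an illustration of the pattern only, not of MySQL internals:

```python
# One row per sequence, as in the sbtest1 table above
table = {"sbkey1": 0}
last_insert_id = None  # per-session value set by LAST_INSERT_ID(expr)

def next_val(key):
    """Model UPDATE ... SET val = LAST_INSERT_ID(val + 1) WHERE id = key,
    followed by SELECT LAST_INSERT_ID()."""
    global last_insert_id
    table[key] += 1            # the UPDATE increments the stored value
    last_insert_id = table[key]  # LAST_INSERT_ID(expr) captures it
    return last_insert_id      # SELECT LAST_INSERT_ID() returns it

print(next_val("sbkey1"))  # 1
print(next_val("sbkey1"))  # 2
```

In MySQL the point of the LAST_INSERT_ID(expr) trick is that each session reads back its own value, so concurrent sessions do not interfere with each other's results.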

The example test will have the following features:
  • Support for the prepare, run, cleanup, and help commands.

  • The prepare and run commands can be executed in parallel.

  • Support for specifying the number of tables, table size, and whether explicit transactions are used.

  • Validation that the number of rows per table is in the range 1–99999. The id column of the tables is created as a varchar(10), and the keys are prefixed with sbkey, so there can be at most five digits.
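The 99999 limit follows from simple arithmetic on the column width:

```python
prefix = "sbkey"
column_width = 10                    # the id column is a varchar(10)
digits = column_width - len(prefix)  # 5 characters left for the number
print(10 ** digits - 1)              # 99999, the largest key that fits
```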

The functions that will be implemented are summarized in Figure 3-2.
Figure 3-2

Overview of the functions in the sequence test

The Prepare, Run, and Cleanup groups represent commands, and the Helpers group contains the two helper functions that are used by multiple commands. The run and help commands are special as they always exist. The help is automatically generated based on the options that the script adds, so no special consideration is required for it. There is also a little code outside the functions: a sanity check and the definition of the options that the script supports.

Defining the Options

The options that the script supports are configured by adding elements to the sysbench.cmdline.options hash. This is one of the built-in features of Sysbench that you can use in your scripts. Another is sysbench.cmdline.command which is the name of the command that has been provided for the execution.

Listing 3-9 shows how you can verify that the command has been set and then add the three options that this script supports.
-- Validate that a command was provided
if sysbench.cmdline.command == nil then
    error("Command is required. Supported commands: " ..
          "prepare, run, cleanup, help")
end
-- Specify the supported options for this test
sysbench.cmdline.options = {
    skip_trx = {"Don't start explicit transactions and " ..
                "execute all queries in the AUTOCOMMIT mode",
                false},
    table_size = {"The number of rows per table. Supported " ..
                  "values: 1-99999", 1},
    tables = {"The number of tables", 1}
}
Listing 3-9

Verifying a command is specified and adding the options

The built-in error() function is used to emit the error message with a list of supported commands, if the command is not set. It is not necessary to verify whether the command is one of the supported ones as Sysbench will automatically validate that.

The options are added with an array consisting of the help text and the default value. With the definitions in this script, the generated help text becomes:
shell$ sysbench sequence help
sysbench 1.1.0-74f3b6b (using bundled LuaJIT 2.1.0-beta3)
sequence options
  --skip_trx[=on|off] Don't start explicit transactions and execute all queries in the AUTOCOMMIT mode [off]
  --table_size=N      The number of rows per table. Supported values: 1-99999 [1]
  --tables=N          The number of tables [1]

The option values are made available in the sysbench.opt hash; for example, to get the number of tables in the test, you can use sysbench.opt.tables. The hash is available globally, so you do not need to do anything before you can use it.

You are now ready to implement the three commands that the script supports. Since the run command is mandatory, it is the first one that will be discussed.

The run Command

The run command is special in that it is mandatory and always supports parallel execution. Unlike other commands, which are implemented in a single function (optionally invoking other functions), Sysbench uses three functions for the run command. The three functions that must always exist are
  • thread_init(): This is called when Sysbench initializes the script.

  • thread_done(): This is called when Sysbench is done executing the script.

  • event(): This is where the actual test is implemented and is called once per iteration.

For this example, the thread_init() function can be kept very simple:
-- Initialize the script
-- Initialize the global variables used in the rest of the script
function thread_init()
    -- Initialize the database driver and connections
    db = sysbench.sql.driver()
    cnx = db:connect()
end

For this simple test, all the initialization that is required is to create the connection to MySQL, which consists of initializing the database driver and using it to create the connection. The driver is available from the sysbench object. By creating the connections in the thread_init() function, Sysbench can reuse the connections instead of creating a new connection for each iteration. If you want to simulate creating a new connection for each group of queries, you can instead add the code in the event() function and make the connection object local in the same way as it will be done later for the prepare and cleanup commands.

Similarly, the thread_done() function cleans up after the execution:
-- Clean up after the test
function thread_done()
    -- Close the connection to the database
    cnx:disconnect()
end

All that is required in this case is to close the connection which is done using the disconnect() method of the connection.

The most interesting of the three required functions is the event() function which defines what is done when executing the test. The code for the example script can be seen in Listing 3-10.
-- Called for each iteration
function event()
    -- Check the --skip_trx option which determines
    -- whether explicit transactions are required.
    if not sysbench.opt.skip_trx then
        cnx:query("BEGIN")
    end
    -- Execute the customer test
    do_increment()
    -- If necessary, commit the transaction
    if not sysbench.opt.skip_trx then
        cnx:query("COMMIT")
    end
end
Listing 3-10

The event() function

This code uses one option, the --skip_trx option. If --skip_trx is enabled, the test relies on the autocommit feature; otherwise, explicit BEGIN and COMMIT statements are executed.

Note

In the Sysbench Lua scripts, you cannot use START TRANSACTION to begin a transaction.

In this case, the event() function does not actually perform any work itself. That is deferred to the do_increment() function to show how you can add extra functions to separate out the work, just as in other programs. The do_increment() function together with a couple of helper functions is shown in Listing 3-11.
-- Generate the table name from the table number
function gen_table_name(table_num)
    return string.format("sbtest%d", table_num)
end
-- Generate the key from an id
function gen_key(id)
    return string.format("sbkey%d", id)
end
-- Increment the counter and fetch the new value
function do_increment()
    -- Choose a random table and id
    -- among the tables and rows available
    table_num = math.random(sysbench.opt.tables)
    table_name = gen_table_name(table_num)
    id = math.random(sysbench.opt.table_size)
    key = gen_key(id)
    query = string.format([[
UPDATE %s
   SET val = LAST_INSERT_ID(val+1)
 WHERE id = '%s']], table_name, key)
    cnx:query(query)
    cnx:query("SELECT LAST_INSERT_ID()")
end
Listing 3-11

The do_increment() and helper functions

The gen_table_name() function generates the table name based on an integer, and the gen_key() function similarly generates a key value based on an integer id. The table name and key value are used in a few other places in the script, so by splitting the logic into helper functions, you can ensure they are generated in the same way throughout the script.

The do_increment() function itself starts out generating the table name and key from random values bounded by the number of tables and the number of rows per table in the test. In a real application, you may not have such uniform access to the sequences, in which case you can modify the logic in the script. Finally, the UPDATE and SELECT statements are executed. A possible extension of the script is to use the generated sequence number in some other query, but be careful that you do not end up doing work that is not relevant to what you are trying to benchmark.

That is all that is required for the run command. Notice that nothing was done to implement parallel execution; that is handled automatically by Sysbench unless you need the threads to perform different work. An example where the threads should not perform identical work is the prepare command where each thread is responsible for its own tables.

The prepare Command

The prepare command is an example of a custom command that supports parallel execution. The top-level code for the command is implemented in the do_prepare() function which in turn uses the create_table() function to create one specific table based on the table number passed to the function. The two functions can be seen in Listing 3-12.
-- Prepare the table
-- Can be parallelized up to the number of tables
function do_prepare()
    -- The script only supports up to 99999 rows
    -- as the id column is a varchar(10) and five
    -- characters is used by 'sbkey'
    assert(sysbench.opt.table_size > 0 and
           sysbench.opt.table_size < 100000,
           "Only 1-99999 rows per table is supported.")
    -- Initialize the database driver and connection
    local db = sysbench.sql.driver()
    local cnx = db:connect()
    -- Create table based on thread id
    for i = sysbench.tid % sysbench.opt.threads + 1,
            sysbench.opt.tables,
            sysbench.opt.threads do
        create_table(cnx, i)
    end
    -- Disconnect
    cnx:disconnect()
end
-- Create the Nth table
function create_table(cnx, table_num)
    table_name = gen_table_name(table_num)
    print(string.format(
          "Creating table '%s'...", table_name))
    -- Drop the table if it exists
    query = string.format(
        "DROP TABLE IF EXISTS %s", table_name)
    cnx:query(query)
    -- Create the new table
    query = string.format([[
CREATE TABLE %s (
  id varchar(10) NOT NULL,
  val bigint unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
)]], table_name)
    cnx:query(query)
    -- Insert the rows inside a transaction
    cnx:query("BEGIN")
    for i = 1, sysbench.opt.table_size, 1 do
        query = string.format([[
INSERT INTO %s (id)
VALUES ('%s')]], table_name, gen_key(i))
        cnx:query(query)
    end
    cnx:query("COMMIT")
end
Listing 3-12

The do_prepare() and create_table() functions

The first thing that is done in the do_prepare() function is to verify that the number of rows is within the range 1–99999. This is done using the assert() function where the first argument must evaluate to true; otherwise, the error message given as the second argument is printed and the script exits.

The do_prepare() function is called once per thread, so the parallelization is handled for you (more about this at the end of the example), but you need to ensure that each table is only created once. That is done through the for loop where the modulus of sysbench.tid (the Sysbench thread id) with the number of threads is used to determine the table numbers handled by each thread.
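The effect of that for loop can be reproduced in a few lines of Python (a sketch of the arithmetic only, with Lua's inclusive 1-based ranges translated to range()):

```python
def tables_for_thread(tid: int, threads: int, tables: int):
    """Mirror the Lua loop: for i = sysbench.tid % threads + 1, tables, threads"""
    return list(range(tid % threads + 1, tables + 1, threads))

# With 4 threads and 10 tables, each table is handled by exactly one thread:
for tid in range(4):
    print(tid, tables_for_thread(tid, 4, 10))
```

Thread 0 gets tables 1, 5, 9; thread 1 gets 2, 6, 10; and so on, so every table is created exactly once regardless of the thread count.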

The actual table creation is performed in create_table() to separate out the tasks to make it easier to maintain the script. If the table already exists, it is dropped and then created, and finally the table is populated with the number of rows requested. All rows are inserted in a single transaction to improve the performance. If you need to populate larger tables, it is worth committing after every few thousand rows, but since the maximum number of rows in this table is 99999 and the rows are very small, it is fine to keep things simple and just use a single transaction per table.
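The commit-every-few-thousand-rows approach mentioned above amounts to splitting the row ids into batches, each wrapped in its own transaction. A small helper (illustrative only; the batch size of 5000 is an arbitrary choice) could look like:

```python
def batches(n_rows: int, batch_size: int = 5000):
    """Yield (first, last) row ids for each transaction."""
    for start in range(1, n_rows + 1, batch_size):
        yield start, min(start + batch_size - 1, n_rows)

# Each tuple would be wrapped in one BEGIN ... COMMIT
print(list(batches(12000)))  # [(1, 5000), (5001, 10000), (10001, 12000)]
```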

The cleanup Command

The last command that must be implemented is cleanup which is an example of a single-threaded command. The work for the command is done in the cleanup() function as shown in Listing 3-13.
-- Cleanup after the test
function cleanup()
    -- Initialize the database driver and connection
    local db = sysbench.sql.driver()
    local cnx = db:connect()
    -- Drop each table
    for i = 1, sysbench.opt.tables, 1 do
        table_name = gen_table_name(i)
        print(string.format(
              "Dropping table '%s' ...", table_name))
        query = string.format(
            "DROP TABLE IF EXISTS %s", table_name)
        cnx:query(query)
    end
    -- Disconnect
    cnx:disconnect()
end
Listing 3-13

The cleanup() function

The cleanup() function only supports serial execution, so it can just loop over the tables and drop them one by one.

This leaves a question: How does Sysbench know that the prepare command can be run in parallel, but the cleanup command cannot?

Registering Commands

By default, all commands except run execute serially, and the function implementing a command is named the same as the command. So, for the prepare command, it is necessary to set the prepare object in the script to point to the do_prepare() function with an additional argument specifying that do_prepare() should be called once per thread:
-- Specify the actions other than run that support
-- execution in parallel.
-- (Other supported actions are found based on the
-- function name except 'help' that is built-in.)
sysbench.cmdline.commands = {
    prepare = {do_prepare, sysbench.cmdline.PARALLEL_COMMAND}
}

The sysbench.cmdline.PARALLEL_COMMAND constant is built-in and specifies that the command should be executed in parallel. It is important that this code is after the definition of do_prepare() as otherwise a nil value is assigned. In practice, it is convenient to add the code at the end of the script.

That concludes the script. You can now use it in the same way as the tests that ship with Sysbench provided that you have copied it into the shared Sysbench directory (/usr/local/share/sysbench/ with the default installation directory when you compile Sysbench yourself). Assuming you have saved the script as sequence.lua, an example use of the script is shown – without output – in Listing 3-14.
shell$ sysbench sequence \
         --mysql-host=127.0.0.1 \
         --mysql-port=3306 \
         --mysql-user=sbtest \
         --mysql-password=password \
         --mysql-ssl=REQUIRED \
         --mysql-db=sbtest \
         --table_size=10 \
         --tables=4 \
         --threads=4 \
         prepare
shell$ sysbench sequence \
         --mysql-host=127.0.0.1 \
         --mysql-port=3306 \
         --mysql-user=sbtest \
         --mysql-password=password \
         --mysql-ssl=REQUIRED \
         --mysql-db=sbtest \
         --table_size=10 \
         --tables=4 \
         --time=60 \
         --threads=8 \
         run
shell$ sysbench sequence \
         --mysql-host=127.0.0.1 \
         --mysql-port=3306 \
         --mysql-user=sbtest \
         --mysql-password=password \
         --mysql-ssl=REQUIRED \
         --mysql-db=sbtest \
         --tables=4 \
         cleanup
Listing 3-14

Example commands for the sequence test

Note that as for the oltp_read_only test, the sbtest schema must exist before executing the prepare command. It is left as an exercise to the reader to try the script with different values for --threads, --tables, --table_size, and --skip_trx.

Summary

This chapter has discussed how benchmarks can be used with MySQL. First, some general best practices for using benchmarks were discussed. The single most important point is that you determine what to benchmark and what the success criteria are. This is not all that different from performance tuning in general. It is also important that you understand the tests the benchmark performs and what the result means. Often, you will need to collect additional metrics, either through your normal monitoring solution or through specialized scripts, to determine whether the benchmark was a success.

Next, the standard TPC benchmarks were covered. The TPC-C and TPC-E benchmarks are good for testing OLTP workloads, with TPC-C being the most used as it is the oldest and TPC-E being the most realistic for modern applications. TPC-H and TPC-DS use complex queries that can be great, for example, for exploring changes that can affect the query plan.

While you can choose to implement a benchmark from scratch yourself, it is more likely you will use a preexisting benchmark tool. The most commonly used tool with MySQL is Sysbench which was covered in some detail. First, Sysbench was installed by compiling it. Then it was shown how the standard Sysbench benchmarks can be executed. The real strength of Sysbench, though, is that you can define your own custom tests. A simple example was shown in the last section.

In the same way as it is not always possible to use a real-world benchmark, it is not always possible to use real-world data for general testing. The next chapter explores some generic data sets that are often used with MySQL, several of which are also used in this book.