27. Caching – MySQL 8 Query Performance Tuning: A Systematic Method for Improving Execution Speeds

© Jesper Wisborg Krogh 2020
J. W. KroghMySQL 8 Query Performance Tuninghttps://doi.org/10.1007/978-1-4842-5584-1_27

27. Caching

Jesper Wisborg Krogh1 
(1)
Hornsby, NSW, Australia
 

The cheapest queries are those you do not execute at all. This chapter investigates how you can use caching to avoid executing queries or to reduce the complexity of queries. First, it will be discussed how caching exists everywhere and how there are different types of caching. Then it is covered how you can use caching inside MySQL using cache tables and approximate values. The two next sections consider the two popular products that offer caching: Memcached and ProxySQL. Finally, some caching tips are discussed.

Caching Is Everywhere

Even if you do not think you have implemented caching, you are already using caching in several places. These caches are transparent and maintained at the hardware, operating system, or MySQL levels. The most obvious of these caches is the InnoDB buffer pool.

Figure 27-1 shows examples of how caching exists throughout the system and examples of how custom caching can be added. The picture – including the interactions – is by no means complete, but it serves to illustrate how common caching is and in how many places it can occur.
Figure 27-1

Examples of where caching can occur

In the lower-left corner, there is the CPU which has several levels of caches that cache the instructions and data used for CPU instructions. The operating system implements an I/O cache, and InnoDB has its buffer pool. All of these caches are examples of caches that return the up-to-date data.

There are also caches that may serve slightly stale data. This includes implementing cache tables in MySQL, caching query results in ProxySQL, or caching data directly in the application. In those cases, you typically define a period to consider the data fresh enough, and when it has reached a given age – time to live (TTL)  – the cache entry is invalidated. The Memcached solution is special as there are two versions of it. The regular Memcached daemon uses time to live or some application-depending logic to evict the data when it is too old; however, there is also a special MySQL version which works as a plugin and can fetch the data from the InnoDB buffer pool and write data back to the buffer pool, so the data is never stale.

It may seem wrong to use potentially out-of-date data in your application. In many cases, however, that is perfectly fine as exact data is not required. If you have an application that shows a dashboard of sales figures, how big a difference does it make if the data is current as of the time the queries executed or if they are a few minutes old? By the time the user is done reading the figures, they are likely slightly out of date anyway. The important thing is that the sales figures are consistent and get updated regularly.

Tip

Consider carefully what the requirements for your application are and remember it is easier to start out with relaxed requirements to how up to date data must be and make it more strict if needed than convincing a user that they no longer can have up-to-the-second result. If you use cached data that is not automatically updated to the latest values, you can consider storing the time when the data was current and show that to the user, so the user knows when the data was last refreshed.

The next three sections will go through more specific examples of caching starting with implementing your own caching inside MySQL.

Caching Inside MySQL

A logical place to implement caching is inside MySQL. This is particularly useful if the cached data is used together with other tables. The downside is that it still requires a roundtrip from the application to the database to query the data, and it requires executing a query. This section covers two ways to cache data in MySQL: cache tables and histogram statistics.

Cache Tables

A cache table can be used to pre-calculate data, for example, for a report or a dashboard. It is mostly useful for complex aggregations that are needed frequently.

There are several approaches to use cache tables. You can choose to create a table storing the result for the feature that it is used with. This makes it cheap to use, but also relatively inflexible as it can only be used with that one feature. Alternatively, you can create building blocks that need to be joined together, so they can be used for several features. This makes the queries a little more expensive, but you can reuse the cached data and avoid duplicating the data. It depends on your application which approach is the best, and you may end up choosing a hybrid where some tables are used on their own and others are joined together.

There are two main tactics to populate the cache tables. You can either periodically completely rebuild the tables, or you can use triggers to keep the data up to date continuously. Completely rebuilding the tables works best by creating a new copy of cache tables and at the end of the rebuild using RENAME TABLE to swap the tables around as it avoids deleting a potentially large number of rows in the transaction and it avoids fragmentation building up over time. Alternatively, you can use triggers to update the cached data as the data it depends on changes. Rebuilding the cache tables is the preferred in most cases if it is acceptable to use not completely up-to-date data as it is less error prone and the refresh is done in the background.

Tip

If you rebuild cache tables in place by deleting the existing data inside the transaction, then either disable auto-recalculation of index statistics and use ANALYZE TABLE at the end of the rebuild or enable the innodb_stats_include_delete_marked option.

A special case is a cached column that is included in a table that otherwise does not cache data. An example where a cached column is useful is to store the time, status, or id of the latest event that belongs to some group. Imagine that your application supports sending text messages and for each message you store the history such as when it was created in the application, when it was sent, and when the recipient acknowledged the message. In most cases only the latest status and when the status was reached are needed, so you may want to store that with the message record itself rather than have to query it explicitly. In that case you can use two tables to store the statuses:
CREATE TABLE message (
  message_id bigint unsigned NOT NULL auto_increment,
  message_text varchar(1024) NOT NULL,
  cached_status_time datetime(3) NOT NULL,
  cached_status_id tinyint unsigned NOT NULL,
  PRIMARY KEY (message_id)
);
CREATE TABLE message_status_history (
  message_status_id bigint unsigned NOT NULL auto_increment,
  message_id bigint unsigned NOT NULL,
  status_time datetime(3) NOT NULL,
  status_id tinyint unsigned NOT NULL,
  PRIMARY KEY (message_status_id)
);

In the real world, there may be more columns and foreign keys, but for the example this information will suffice. When the status changes for a message, a row is inserted into the message_status_history table. You can look for the latest row for a message to find the latest status, but here a business rule has been created to update the cached_status_time and cached_status_id in the message table with the latest status and time it was changed. That way, to return to the application details of the message (except when requiring the history), you only need to query the message table. You can update the cached columns through the application or a trigger, or if you do not need the cached status to be completely up to date, you can use a background job.

Tip

Use a naming scheme that makes it clear what data is cached and what is not. You can, for example, prefix cache tables and columns with cached_.

Another case that you can consider a case of caching are the histogram statistics.

Histogram Statistics

Recall from Chapter 16 how histogram statistics are statistics of the frequency each value is encountered for a column. You can take advantage of this and use the histogram statistics as a cache. It is primarily useful if there are at most 1024 unique values for the column as that is the maximum number of buckets supported, so 1024 is the maximum number of values that can be used with singleton histograms.

Listing 27-1 shows an example of using a histogram to return the number of cities in India (CountryCode = IND) in the world database.
-- Create the histogram on the CountryCode
-- column of the world.city table.
mysql> ANALYZE TABLE world.city
        UPDATE HISTOGRAM on CountryCode
          WITH 1024 BUCKETS\G
*************************** 1. row ***************************
   Table: world.city
      Op: histogram
Msg_type: status
Msg_text: Histogram statistics created for column 'CountryCode'.
1 row in set (0.5909 sec)
mysql> SELECT Bucket_Value, Frequency
         FROM (
           SELECT (Row_ID - 1) AS Bucket_Number,
                  SUBSTRING_INDEX(Bucket_Value, ':', -1)
                     AS Bucket_Value,
                  (Cumulative_Frequency
                   - LAG(Cumulative_Frequency, 1, 0)
                         OVER (ORDER BY Row_ID))
                     AS Frequency
             FROM information_schema.COLUMN_STATISTICS
                  INNER JOIN JSON_TABLE(
                     histogram->'$.buckets',
                     '$[*]' COLUMNS(
                          Row_ID FOR ORDINALITY,
                          Bucket_Value varchar(42) PATH '$[0]',
                          Cumulative_Frequency double PATH '$[1]'
                     )
                  ) buckets
            WHERE SCHEMA_NAME = 'world'
                  AND TABLE_NAME = 'city'
                  AND COLUMN_NAME = 'CountryCode'
         ) stats
        WHERE Bucket_Value = 'IND';
+--------------+---------------------+
| Bucket_Value | Frequency           |
+--------------+---------------------+
| IND          | 0.08359892130424124 |
+--------------+---------------------+
1 row in set (0.0102 sec)
mysql> SELECT TABLE_ROWS
         FROM information_schema.TABLES
        WHERE TABLE_SCHEMA = 'world'
              AND TABLE_NAME = 'city';
+------------+
| TABLE_ROWS |
+------------+
|       4188 |
+------------+
1 row in set (0.0075 sec)
mysql> SELECT 0.08359892130424124*4188;
+--------------------------+
| 0.08359892130424124*4188 |
+--------------------------+
|    350.11228242216231312 |
+--------------------------+
1 row in set (0.0023 sec)
mysql> SELECT COUNT(*)
         FROM world.city
        WHERE CountryCode = 'IND';
+----------+
| COUNT(*) |
+----------+
|      341 |
+----------+
1 row in set (0.0360 sec)
Listing 27-1

Using histograms as a cache

If you think the query against COLUMN_STATITICS looks familiar, then it is derived from the one used in Chapter 16 when listing bucket information for a singleton histogram. It is necessary to collect the histogram information in a subquery as otherwise the frequency is not calculated.

You will also need the total number of rows. You can either use the approximate value from the information_schema.TABLES view or cache the result of SELECT COUNT(*) for the table. In the example, the estimate is that the city table has 4188 rows (your estimate may be different) which together with the frequency for India suggests there are around 350 Indian cities in the table. An exact count shows that there are 341. The deviation comes from the total row count estimate (there are 4079 rows in the city table).

Using histograms as a cache is mostly useful for large tables for a column with at most 1024 unique values, particularly if there is no index on the column. This means that it does not match all that many use cases. It does however show an example of thinking outside the box – something that is very useful when you try to find caching solutions.

For more advanced caching solutions, you need to look at third-party solutions or implement your own in the application.

Memcached

Memcached is a simple but highly scalable in-memory key-value store that is popular as a caching tool. It has traditionally been mostly used with web servers but can be used by any kind of application. An advantage of Memcached is it can be distributed across multiple hosts which allows you to create a large cache.

Note

Memcached is only officially supported on Linux and Unix.

There are two ways to use Memcached with MySQL. You can use the regular standalone Memcached or you can use the MySQL InnoDB Memcached plugin. This section will show a simple example of using both. For the full Memcached documentation, see the official homepage at https://memcached.org/ and the official wiki at https://github.com/memcached/memcached/wiki.

Standalone Memcached

The standalone Memcached is the official daemon from https://memcached.org/. It allows you to use it as a distributed cache or to have the cache very close – possibly on the same host – to the application reducing the cost of querying the cache.

There are a few options to install Memcached including using the package manager of the operating system and compiling from source. The simplest is to use your package manager, for example, on Oracle Linux, Red Hat Enterprise Linux, and CentOS 7:
shell$ sudo yum install memcached libevent

The libevent package is included as memcached requires it. On Ubuntu Linux the package is called libevent-dev. You may already have libevent and/or memcached installed in which case the package manager will let you know there is nothing to do.

You start the daemon by using the memcached command. For example, to start it using all the default options
shell$ memcached

If you use it in production, you should configure systemd or whatever service manager you are using to start and stop the daemon when the operating system boots and shuts down. For testing, it is fine to just start it from the command line.

Caution

There is no security support in Memcached. Limit the cached data to nonsensitive data, and make sure that your Memcached instances are only available in the internal network and use a firewall to restrict access. One option is to deploy Memcached on the same host as your application and prevent remote connections.

You can now use Memcached by storing the data you retrieve from MySQL in the cache. There is support for Memcached in several programming languages. For this discussion, Python will be used with the pymemcache module 1 and MySQL Connector/Python. Listing 27-2 shows how to install the modules using pip. The output may look a little different depending on the exact version of Python you are using and what you have installed already, and the name of the Python command depends on your system. At the time of writing, pymemcache supports Python 2.7, 3.5, 3.6, and 3.7. The example uses Python 3.6 installed as an extra package on Oracle Linux 7.
shell$ python3 -m pip install --user pymemcache
Collecting pymemcache
  Downloading https://files.pythonhosted.org/packages/20/08/3dfe193f9a1dc60186fc40d41b7dc59f6bf2990722c3cbaf19cee36bbd93/pymemcache-2.2.2-py2.py3-none-any.whl (44kB)
     |████████████████████████████████| 51kB 3.3MB/s
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from pymemcache) (1.11.0)
Installing collected packages: pymemcache
Successfully installed pymemcache-2.2.2
shell$ python36 -m pip install --user mysql-connector-python
Collecting mysql-connector-python
  Downloading https://files.pythonhosted.org/packages/58/ac/a3e86e5df84b818f69ebb8c89f282efe6a15d3ad63a769314cdd00bccbbb/mysql_connector_python-8.0.17-cp36-cp36m-manylinux1_x86_64.whl (13.1MB)
     |████████████████████████████████| 13.1MB 5.6MB/s
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib64/python3.6/site-packages (from mysql-connector-python) (3.6.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.6/site-packages (from protobuf>=3.0.0->mysql-connector-python) (39.0.1)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.6/site-packages (from protobuf>=3.0.0->mysql-connector-python) (1.11.0)
Installing collected packages: mysql-connector-python
Successfully installed mysql-connector-python-8.0.17
Listing 27-2

Installing the Python pymemcache module

In your application you can query Memcached by a key. If the key is found, Memcached returns the value that was stored with the key, and if it is not found you need to query MySQL and store the result in the cache. Listing 27-3 shows a simple example of doing this querying the world.city table. The program can also be found in the file listing_27_3.py that is included in this book’s GitHub repository. If you want to execute the program, you need to update the connection arguments in connect_args to reflect the settings used to connect to your MySQL instance.
from pymemcache.client.base import Client
import mysql.connector
connect_args = {
    "user": "root",
    "password": "password",
    "host": "localhost",
    "port": 3306,
}
db = mysql.connector.connect(**connect_args)
cursor = db.cursor()
memcache = Client(("localhost", 11211))
sql = "SELECT CountryCode, Name FROM world.city WHERE ID = %s"
city_id = 130
city = memcache.get(str(city_id))
if city is not None:
    country_code, name = city.decode("utf-8").split("|")
    print("memcached: country: {0} - city: {1}".format(country_code, name))
else:
    cursor.execute(sql, (city_id,))
    country_code, name = cursor.fetchone()
    memcache.set(str(city_id), "|".join([country_code, name]), expire=60)
    print("MySQL: country: {0} - city: {1}".format(country_code, name))
memcache.close()
cursor.close()
db.close()
Listing 27-3

Simple Python program using memcached and MySQL

The program starts out creating a connection both to MySQL and the memcached daemon. In this case the connection parameters and the id to query are hardcoded. In a real program, you should read the connection parameters from a configuration file or similar.

Caution

Never store connection details in the application. Particularly never hardcode the password. Storing the connection details in the application is both inflexible and insecure.

The program then tries to fetch the data from Memcached; notice how the integer is converted to a string as Memcached uses strings for keys. If the key is found, the country code and name are extracted from the cached value by splitting the string at the | character. If the key is not found in the cache, the city data are fetched from MySQL and stored in the cache with the time to keep the value in the cache set to 60 seconds. Print statements are added for each case to show where the data was fetched from.

The first time you execute the program after each restart of memcached, it will end up querying MySQL:
shell$ python3 listing_27_3.py
MySQL: country: AUS - city: Sydney
On subsequent executions for up to a minute, the data will be found in the cache:
shell$ python3 listing_27_3.py
memcached: country: AUS - city: Sydney
When you are done testing Memcached, you can stop it using Ctrl+C in the session where memcached is running or by sending it a SIGTEM (15) signal, for example:
shell$ kill -s SIGTERM $(pidof memcached)

Using the Memcached directly as in this example has the advantage that you can have a pool of daemons and you can run the daemon close to the application, possibly even on the same host as the application. The disadvantage is that you must maintain the cache yourself. An alternative is to use the memcached plugin that comes from MySQL which will manage the cache for you and even automatically persist writes to the cache.

MySQL InnoDB Memcached Plugin

The InnoDB Memcached plugin was introduced in MySQL 5.6 as a way to access InnoDB data without the overhead of parsing the SQL statements. The primary use of the plugin is to let InnoDB handle the caching through its buffer pool and just use Memcached as a mechanism to query the data. Some of the nice features of using the plugin this way are that writes to the plugin are written to the underlying InnoDB table, the data is always up to date, and you can use both SQL and Memcached to access the data concurrently.

Note

Make sure you have stopped the standalone Memcached process before installing the MySQL InnoDB Memcached plugin as they by default use the same port. If you do not, you will keep connecting to the standalone process.

Before you install the MySQL memcached daemon, you must ensure that the libevent package is installed like for the standalone Memcached installation. Once you have installed libevent, you need to install the innodb_memcache schema which includes the tables that are used for the configuration. You perform the installation by sourcing the share/innodb_memcached_config.sql file that is included in the MySQL distribution. The file is relative to the MySQL base directory which you can find through the basedir system variable, for example:
mysql> SELECT @@global.basedir AS basedir;
+---------+
| basedir |
+---------+
| /usr/   |
+---------+
1 row in set (0.00 sec)

If you have installed MySQL using the RPM from https://dev.mysql.com/downloads/, the command is

mysql> SOURCE /usr/share/mysql-8.0/innodb_memcached_config.sql

Note

Be aware that this command does not work in MySQL Shell as the script includes the USE command without a semicolon which MySQL Shell does not support in scripts.

The script also creates the test.demo_test table which will be used in the rest of this discussion.

The innodb_memcache schema consists of three tables:
  • cache_policies: The configuration of the cache policies which defines how the caching should work. The default is to leave it to InnoDB. This is usually the recommended and ensures that you will never read stale data.

  • config_options: The configuration options for the plugin. This includes which separator to use when returning multiple columns for the value and the table map delimiter.

  • containers: The definition of the mapping to the InnoDB tables. You must add a mapping for all the tables you want to use with the InnoDB memcached plugin.

The containers table is the table you will use the most. By default, the table includes a mapping for the test.demo_test table:
mysql> SELECT * FROM innodb_memcache.containers\G
*************************** 1. row ***************************
                  name: aaa
             db_schema: test
              db_table: demo_test
           key_columns: c1
         value_columns: c2
                 flags: c3
            cas_column: c4
    expire_time_column: c5
unique_idx_name_on_key: PRIMARY
1 row in set (0.0007 sec)

You can use the name to reference the table defined by db_schema and db_table when querying the table. The key_columns column defines the columns in the InnoDB table that is used for the key lookup. You specify the columns you want to include in the query results in the value_columns column. If you include multiple columns, you use the separator configured in the config_options table in the row with name = separator (the default is |) to separate the column names.

The cas_column and expire_time_column columns are rarely needed and will not be discussed further here. The final column, unique_idx_name_on_key, is the name of a unique index in the table, preferably the primary key.

Tip

The detailed description of the tables and their use can be found in https://dev.mysql.com/doc/refman/en/innodb-memcached-internals.html.

You are now ready to install the plugin itself. You can do that using the INSTALL PLUGIN command (remember this does not work on Windows):
mysql> INSTALL PLUGIN daemon_memcached soname "libmemcached.so";
Query OK, 0 rows affected (0.09 sec)
This statement must be executed using the legacy MySQL protocol (by default port 3306) as the X Protocol (by default port 33060) does not allow you to install plugins. That is it – the InnoDB memcached plugin is now ready for testing. The simplest way to test it is to use the telnet client. Listing 27-4 shows a simple example specifying the container explicitly and using the default container.
shell$ telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
get @@aaa.AA
VALUE @@aaa.AA 8 12
HELLO, HELLO
END
get AA
VALUE AA 8 12
HELLO, HELLO
END
Listing 27-4

Testing InnoDB memcached with telnet

To make it easier to see the two commands, an empty line has been inserted before each. The first command uses @@ to specify the container name before the key value. The second command relies on Memcached using the default container (the first entry when sorting alphabetically in ascending order by the container name). You exit telnet by pressing Ctrl+] followed by the quit command:
^]
telnet> quit
Connection closed.
The daemon uses port 11211 by default as for the standalone Memcached instance. If you want to change the port or any of the other Memcached options, you can use the daemon_memcached_option option which takes a string with the memcached options. For example, to set the port to 22222
[mysqld]
daemon_memcached_option = "-p22222"

The option can only be set in the MySQL configuration file or on the command line, so it requires a restart of MySQL to make the change take effect.

If you add new entries to the containers table or change existing entries, you will need to restart the memcached plugin to make it read the definitions again. You can do that by restarting MySQL or by uninstalling and installing the plugin:
mysql> UNINSTALL PLUGIN daemon_memcached;
Query OK, 0 rows affected (4.05 sec)
mysql> INSTALL PLUGIN daemon_memcached soname "libmemcached.so";
Query OK, 0 rows affected (0.02 sec)
In practice you will mostly be using the plugin from your application. The usage is straightforward if you are used to use Memcached. As an example, consider Listing 27-5 which shows a few Python commands using the pymemcache module . Note that the example assumes you have set the port back to 11211.
shell$ python3
Python 3.6.8 (default, May 16 2019, 05:58:38)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36.0.1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymemcache.client.base import Client
>>> client = Client(('localhost', 11211))
>>> client.get('@@aaa.AA')
b'HELLO, HELLO'
>>> client.set('@@aaa.BB', 'Hello World')
True
>>> client.get('@@aaa.BB')
b'Hello World'
Listing 27-5

Using the InnoDB memcached plugin with Python

The interactive Python environment is used to query the test.demo_test table through the memcached plugin. After creating the connection, the existing row is queried using the get() method, and a new row is inserted using the set() method. There is no need to set a timeout in this case as the set() method ends up writing directly to InnoDB. Finally, the new row is retrieved again. Notice how simple this example is compared to the regular Memcached where you need to maintain the cache yourself.

You can verify that the new row was really inserted into the table by querying it in MySQL:
mysql> SELECT * FROM test.demo_test;
+----+--------------+----+----+----+
| c1 | c2           | c3 | c4 | c5 |
+----+--------------+----+----+----+
| AA | HELLO, HELLO |  8 |  0 |  0 |
| BB | Hello World  |  0 |  1 |  0 |
+----+--------------+----+----+----+
2 rows in set (0.0032 sec)

There is more to using the MySQL InnoDB Memcached plugin. If you plan to use it, you are encouraged to read the “InnoDB memcached Plugin” section in the reference manual at https://dev.mysql.com/doc/refman/en/innodb-memcached.html.

Another popular utility that supports caching is ProxySQL.

ProxySQL

The ProxySQL project2 is founded by René Cannaò and is an advanced proxy that supports load balancing, routing based on query rules, caching, and more. The caching feature caches based on query rules, for example, you can set that you want to cache queries with a given digest. The cache is automatically expired based on the time to live value you set for the query rule.

You download ProxySQL from https://github.com/sysown/proxysql/releases/. At the time of writing, the latest release is version 2.0.8 which is the release used in the examples.

Note

ProxySQL is only officially supported for Linux. For the full documentation including installation instructions for the supported distributions, see https://github.com/sysown/proxysql/wiki.

Listing 27-6 shows an example of installing ProxySQL 2.0.8 on Oracle Linux using the RPM from the ProxySQL GitHub repository. The installation process is similar on other Linux distributions using the package command for the distribution (but of course the output will be different depending on the package command used). After the installation has completed, ProxySQL is started.
shell$ wget https://github.com/sysown/proxysql/releases/download/v2.0.8/proxysql-2.0.8-1-centos7.x86_64.rpm
...
Length: 9340744 (8.9M) [application/octet-stream]
Saving to: 'proxysql-2.0.8-1-centos7.x86_64.rpm'
100%[===========================>] 9,340,744   2.22MB/s   in 4.0s
2019-11-24 18:41:34 (2.22 MB/s) - 'proxysql-2.0.8-1-centos7.x86_64.rpm' saved [9340744/9340744]
shell$ sudo yum install proxysql-2.0.8-1-centos7.x86_64.rpm
Loaded plugins: langpacks, ulninfo
Examining proxysql-2.0.8-1-centos7.x86_64.rpm: proxysql-2.0.8-1.x86_64
Marking proxysql-2.0.8-1-centos7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package proxysql.x86_64 0:2.0.8-1 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==============================================================
 Package  Arch   Version Repository                       Size
==============================================================
Installing:
 proxysql x86_64 2.0.8-1 /proxysql-2.0.8-1-centos7.x86_64  35 M
Transaction Summary
==============================================================
Install  1 Package
Total size: 35 M
Installed size: 35 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : proxysql-2.0.8-1.x86_64                    1/1
warning: group proxysql does not exist - using root
warning: group proxysql does not exist - using root
Created symlink from /etc/systemd/system/multi-user.target.wants/proxysql.service to /etc/systemd/system/proxysql.service.
  Verifying  : proxysql-2.0.8-1.x86_64                    1/1
Installed:
  proxysql.x86_64 0:2.0.8-1
Complete!
shell$ sudo systemctl start proxysql
Listing 27-6

Installing and starting ProxySQL

You can configure ProxySQL only through its admin interface. This uses the mysql command-line client and has a familiar feel for MySQL administrators. By default, ProxySQL uses port 6032 for the administration interface, and the administrator username is admin with the password set to admin. Listing 27-7 shows an example of connecting to the administration interface and listing the schema and tables available.
shell$ mysql --host=127.0.0.1 --port=6032 \
             --user=admin --password \
             --default-character-set=utf8mb4 \
             --prompt='ProxySQL> '
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.30 (ProxySQL Admin Module)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
ProxySQL> SHOW SCHEMAS;
+-----+---------------+-------------------------------------+
| seq | name          | file                                |
+-----+---------------+-------------------------------------+
| 0   | main          |                                     |
| 2   | disk          | /var/lib/proxysql/proxysql.db       |
| 3   | stats         |                                     |
| 4   | monitor       |                                     |
| 5   | stats_history | /var/lib/proxysql/proxysql_stats.db |
+-----+---------------+-------------------------------------+
5 rows in set (0.00 sec)
ProxySQL> SHOW TABLES;
+--------------------------------------------+
| tables                                     |
+--------------------------------------------+
| global_variables                           |
| mysql_aws_aurora_hostgroups                |
| mysql_collations                           |
| mysql_galera_hostgroups                    |
| mysql_group_replication_hostgroups         |
| mysql_query_rules                          |
| mysql_query_rules_fast_routing             |
| mysql_replication_hostgroups               |
| mysql_servers                              |
| mysql_users                                |
| proxysql_servers                           |
| runtime_checksums_values                   |
| runtime_global_variables                   |
| runtime_mysql_aws_aurora_hostgroups        |
| runtime_mysql_galera_hostgroups            |
| runtime_mysql_group_replication_hostgroups |
| runtime_mysql_query_rules                  |
| runtime_mysql_query_rules_fast_routing     |
| runtime_mysql_replication_hostgroups       |
| runtime_mysql_servers                      |
| runtime_mysql_users                        |
| runtime_proxysql_servers                   |
| runtime_scheduler                          |
| scheduler                                  |
+--------------------------------------------+
24 rows in set (0.00 sec)
Listing 27-7

The administration interface

While the tables are grouped in schemas, you can access tables directly without referencing the schema. The output of SHOW TABLES shows the tables in the main schema which are the ones associated with the configuration of ProxySQL.

The configuration is a two-stage process where you first prepare the new configuration and then apply it. Applying the changes means saving them to disk if you want to persist them and to load them into the runtime threads.

The tables with the runtime_ prefix in the name are for configuration pushed to the runtime threads. One way of configuring ProxySQL is to use a SET statement similar to setting system variables in MySQL, but you can also use UPDATE statements. The first step should be to change the admin password (and optionally the administrator username) which you can do by setting the admin-admin_credentials variable as shown in Listing 27-8.
ProxySQL> SET admin-admin_credentials = 'admin:password';
Query OK, 1 row affected (0.01 sec)
ProxySQL> SAVE ADMIN VARIABLES TO DISK;
Query OK, 32 rows affected (0.02 sec)
ProxySQL> LOAD ADMIN VARIABLES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)
ProxySQL> SELECT @@admin-admin_credentials;
+---------------------------+
| @@admin-admin_credentials |
+---------------------------+
| admin:password            |
+---------------------------+
1 row in set (0.00 sec)
Listing 27-8

Setting the password for the administrator account

The value for the admin-admin_credentials option is the username and password separated by a colon. The SAVE ADMIN VARIABLES TO DISK statement persists the change, and the LOAD ADMIN VARIABLES TO RUNTIME command applies the changes to the runtime threads. It is necessary to load the variable into the runtime threads as ProxySQL keeps a copy of the variables in each thread for performance reasons. You can query the current values (whether applied or pending) as you can query system variables in MySQL.

You configure the MySQL backend instances that ProxySQL can use to direct the queries in the mysql_servers table. For this discussion, a single instance on the same host as ProxySQL will be used. Listing 27-9 shows how to add it to the list of servers that ProxySQL can route to.
ProxySQL> SHOW CREATE TABLE mysql_servers\G
*************************** 1. row ***************************
       table: mysql_servers
Create Table: CREATE TABLE mysql_servers (
    hostgroup_id INT CHECK (hostgroup_id>=0) NOT NULL DEFAULT 0,
    hostname VARCHAR NOT NULL,
    port INT CHECK (port >= 0 AND port <= 65535) NOT NULL DEFAULT 3306,
    gtid_port INT CHECK (gtid_port <> port AND gtid_port >= 0 AND gtid_port <= 65535) NOT NULL DEFAULT 0,
    status VARCHAR CHECK (UPPER(status) IN ('ONLINE','SHUNNED','OFFLINE_SOFT', 'OFFLINE_HARD')) NOT NULL DEFAULT 'ONLINE',
    weight INT CHECK (weight >= 0 AND weight <=10000000) NOT NULL DEFAULT 1,
    compression INT CHECK (compression IN(0,1)) NOT NULL DEFAULT 0,
    max_connections INT CHECK (max_connections >=0) NOT NULL DEFAULT 1000,
    max_replication_lag INT CHECK (max_replication_lag >= 0 AND max_replication_lag <= 126144000) NOT NULL DEFAULT 0,
    use_ssl INT CHECK (use_ssl IN(0,1)) NOT NULL DEFAULT 0,
    max_latency_ms INT UNSIGNED CHECK (max_latency_ms>=0) NOT NULL DEFAULT 0,
    comment VARCHAR NOT NULL DEFAULT ",
    PRIMARY KEY (hostgroup_id, hostname, port) )
1 row in set (0.01 sec)
ProxySQL> INSERT INTO mysql_servers
                      (hostname, port, use_ssl)
          VALUES ('127.0.0.1', 3306, 1);
Query OK, 1 row affected (0.01 sec)
ProxySQL> SAVE MYSQL SERVERS TO DISK;
Query OK, 0 rows affected (0.36 sec)
ProxySQL> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)
Listing 27-9

Adding a MySQL instance to the list of servers

The example shows how you can use SHOW CREATE TABLE to get information about the mysql_servers table. The table definition includes information about the settings you can include and allowed values. Other than the hostname, all settings have a default value. The remaining part of the listing inserts a row for the MySQL instance on localhost port 3306 with the requirement that SSL is used. The change is then persisted to disk and loaded into the runtime threads.

Note

SSL can only be used from ProxySQL to the MySQL instance, not between the client and ProxySQL.

You will also need to specify which users can use the connection. First, create a user in MySQL:
mysql> CREATE USER myuser@'127.0.0.1'
              IDENTIFIED WITH mysql_native_password
              BY 'password';
Query OK, 0 rows affected (0.0550 sec)
mysql> GRANT ALL ON world.* TO myuser@'127.0.0.1';
Query OK, 0 rows affected (0.0422 sec)
ProxySQL does not presently support the caching_sha2_password authentication plugin, which is the default in MySQL 8, when you connect using MySQL Shell (but there is support using the mysql command-line client), so you need to create the user with the mysql_native_password plugin. Then add the user in ProxySQL:
ProxySQL> INSERT INTO mysql_users
                     (username,password)
          VALUES ('myuser', 'password');
Query OK, 1 row affected (0.00 sec)
ProxySQL> SAVE MYSQL USERS TO DISK;
Query OK, 0 rows affected (0.06 sec)
ProxySQL> LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)
You can now connect to MySQL through ProxySQL. The SQL interface by default uses port 6033. You connect through ProxySQL in the same way as usual except for the port number and possibly the hostname:
shell$ mysqlsh --user=myuser --password \
               --host=127.0.0.1 --port=6033 \
               --sql --table \
               -e "SELECT * FROM world.city WHERE ID = 130;"
+-----+--------+-------------+-----------------+------------+
| ID  | Name   | CountryCode | District        | Population |
+-----+--------+-------------+-----------------+------------+
| 130 | Sydney | AUS         | New South Wales |    3276207 |
+-----+--------+-------------+-----------------+------------+
ProxySQL collects statistics in a similar way to the Performance Schema. You can query the statistics in the stats_mysql_query_digest and stats_mysql_query_digest_reset tables. The difference between the two tables is that the latter only includes the digests since you queried the table the last time. For example, to get the queries ordered by their total execution time
ProxySQL> SELECT count_star, sum_time,
                 digest, digest_text
            FROM stats_mysql_query_digest_reset
           ORDER BY sum_time DESC\G
*************************** 1. row ***************************
 count_star: 1
   sum_time: 577149
     digest: 0x170E9EDDB525D570
digest_text: select @@sql_mode;
*************************** 2. row ***************************
 count_star: 1
   sum_time: 5795
     digest: 0x94656E0AA2C6D499
digest_text: SELECT * FROM world.city WHERE ID = ?
2 rows in set (0.01 sec)
If you see a query that you would like to cache the result of, you can add a query rule based on the digest of the query. Assuming you want to cache the result of querying the world.city table by ID (digest 0x94656E0AA2C6D499), you can add a rule like the following:
ProxySQL> INSERT INTO mysql_query_rules
                     (active, digest, cache_ttl, apply)
          VALUES (1, '0x94656E0AA2C6D499', 60000, 1);
Query OK, 1 row affected (0.01 sec)
ProxySQL> SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.09 sec)
ProxySQL> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)

The active column specifies whether ProxySQL should take the rule into account when evaluating rules that can be used. The digest is the digest of the query you want to cache, and the cache_ttl specifies how long in milliseconds the result should be used before it is considered expired, and the result is refreshed. The time to live has been set to 60000 milliseconds (1 minute) to allow you time to execute the queries a few times before the cache gets invalidated. Setting apply to 1 means that no later rules will be evaluated when the query matches this rule.

If you execute the query a few times within a minute, you can query the cache statistics in the table stats_mysql_global to see how the cache is used. An example of the output is
ProxySQL> SELECT *
            FROM stats_mysql_global
           WHERE Variable_Name LIKE 'Query_Cache%';
+--------------------------+----------------+
| Variable_Name            | Variable_Value |
+--------------------------+----------------+
| Query_Cache_Memory_bytes | 3659           |
| Query_Cache_count_GET    | 6              |
| Query_Cache_count_GET_OK | 5              |
| Query_Cache_count_SET    | 1              |
| Query_Cache_bytes_IN     | 331            |
| Query_Cache_bytes_OUT    | 1655           |
| Query_Cache_Purged       | 0              |
| Query_Cache_Entries      | 1              |
+--------------------------+----------------+
8 rows in set (0.01 sec)

Your data will most likely be different. It shows that the cache uses 3659 bytes and there have been six queries against the cache and, in five of those cases, the result was returned from the cache. The last of the six queries required executing the query against the MySQL backend.

There are two options you can set to configure the cache. These are
  • mysql-query_cache_size_MB: The maximum size of the cache in megabytes. This is a soft limit that is used by the purging thread to decide how many queries to purge from the cache. So the memory usage may temporarily be larger than the configured size. The default is 256.

  • mysql-query_cache_stores_empty_result: Whether result sets with no rows are cached. The default is true. This can also be configured per query in the query rules table.

You change the configuration similar to how you changed the administrator password earlier. For example, to limit the query cache to 128 megabytes
ProxySQL> SET mysql-query_cache_size_MB = 128;
Query OK, 1 row affected (0.00 sec)
ProxySQL> SAVE MYSQL VARIABLES TO DISK;
Query OK, 121 rows affected (0.04 sec)
ProxySQL> LOAD MYSQL VARIABLES TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

This first prepares the configuration change, then saves it to disk, and finally loads the MySQL variables into the runtime threads.

If you want to use ProxySQL, you are encouraged to consult the wiki on the ProxySQL GitHub project at https://github.com/sysown/proxysql/wiki.

Caching Tips

If you decide to implement caching for your MySQL instances, there are a few things to take into consideration. This section investigates some general caching tips.

The most important consideration is what to cache. The example earlier in this chapter of caching the result of a single-row primary key lookup is not a good example of the type of queries that benefit the most from caching. In general, the more complex and expensive the query is and the more often the query is executed, the better a candidate the query is. One thing that can make the caching more effective is to split complex queries into smaller parts. That way you can cache the result of each part of the complex query separately which makes it more likely to be reused.

You should also take into consideration how much data the query returns. If the query returns a large result set, you may end up using all the memory you have made available for caching for a single query.

Another consideration is where to have the cache. The closer you can place the cache to the application, the more efficient it is as it reduces the time spent on network communication. The downside is that if you have multiple application instances, you will have to choose between duplicating the cache and having a remote shared cache. The exception is if you need to use the cached data with other MySQL tables. In that case, it may be better to keep the cache inside MySQL in the form of a cache table or similar.

Summary

This chapter has provided an overview of caching with MySQL. It started out describing how caching is found everywhere from inside the CPUs to dedicated caching processes. It was then discussed how you can use cache tables and histograms for caching inside MySQL.

The two main sections discussed using Memcached and ProxySQL for caching. Memcached is an in-memory key-value store that you can use from your application, or you can use the special version included with MySQL that allows you to interact directly with InnoDB. ProxySQL combines a router and caching mechanism which stores the result sets transparently according to the query rules you have defined.

Finally, a few considerations regarding caching were covered. The more often you execute a query, and the more expensive it is to execute, the more you benefit from caching. The second consideration is that the closer you can place the cache to the application, the better.

That concludes the last chapter of the journey through MySQL 8 query performance tuning. Hopefully it has been a rewarding journey, and you feel ready to use the tools and techniques in your work. Remember that the more you practice query tuning, the better you become at it. Happy query tuning.