7. The Information Schema
When you need to optimize a query, it is common that you need information about the schema, indexes, and the like. In that case, the Information Schema is a good resource of data. This chapter introduces the Information Schema together with an overview of the views it contains. The Information Schema is used on several occasions in the rest of the book.
What Is the Information Schema?
The Information Schema is a schema common to several of the relational databases including MySQL where it was added in MySQL 5.0. MySQL mostly follows the SQL:2003 standard for F021 Basic information schema with the changes necessary to reflect the unique features of MySQL and with additional views that are not part of the standard.
The Information Schema is virtual in the sense that no data is stored in it. For this reason, this chapter refers to all views and tables as views even if a SHOW CREATE TABLE displays it as if it was a regular table. This is also in line with the information_schema.TABLES view that has the table type set to SYSTEM VIEW for all the objects.
After the introduction of the Performance Schema in MySQL 5.5, the aim is to make relatively static data such as schema information available through the Information Schema and more volatile data belonging to the Performance Schema. That said, it is not always clear-cut what belongs where, for example, index statistics are relatively volatile, but are also part of the schema information. There is also some information such as the InnoDB metrics that for historical reasons still reside in the Information Schema.
As such, you can consider the Information Schema a collection of data describing the MySQL instance. In MySQL 8 with the relational data dictionary, several of the views are simple SQL views on the underlying data dictionary tables. This means that the performance of many Information Schema queries in MySQL 8 will be vastly superior to what you may have experienced in older versions. This is particularly the case when querying schema data that does not require retrieving information from the storage engine.
If you are still using MySQL 5.7 or earlier, be careful with queries against views such as the TABLES and COLUMNS views in the Information Schema. They can take a long time if the tables they contain data for are not yet in the table definition cache or if the cache is not large enough to hold all tables. An example of the performance difference of the Information Schema between MySQL 5.7 and 8 is discussed in a blog by the MySQL Server team: https://mysqlserverteam.com/mysql-8-0-scaling-and-performance-of-information_schema/.
The Information Schema is a virtual database, and the access to the views works a little different from other tables. All users will see that the information_schema schema exists, and they will see all views. However, the result of querying the views depends on the privileges assigned to the account. For example, an account that has no other privileges than the global USAGE privilege will only see the Information Schema views when querying the information_schema.TABLES view.
Now, it is time to look at what kind of information you can find in the Information Schema views.
The data that is available in the Information Schema ranges from high-level information about the system to low-level InnoDB metrics. This section provides an overview of the views but will not go into detail as the most important of the views from a performance tuning perspective are discussed in the relevant parts of later chapters.
Some plugins add their own views to the Information Schema. The extra plugin views are not considered here.
The highest level of information that is available in the Information Schema concerns the whole MySQL instance. This includes such information as which character sets are available and which plugins are installed.
Information Schema views with system information
The character sets available.
The collations available for each character set. This includes the id of the collation which in some cases (e.g., in the binary log) is used to uniquely specify both the collation and character set.
The mapping of collations to character sets (the same as the first two columns of COLLATIONS).
The storage engines that are known and whether they are loaded.
A list of the default stopwords that are used when creating a full text index on an InnoDB table.
A list of the keywords in MySQL and whether the keyword is reserved.
The plugins known to MySQL including the status.
The resource groups that are used by threads to do their work. A resource group specifies the priority of a thread and which CPUs it can use.
A list of the spatial reference systems including the SRS_ID column which contains the id used to specify a reference system for spatial columns.
The system-related views largely work as reference views with the RESOURCE_GROUPS table being somewhat a difference as it is possible to add resource groups as it will be discussed in Chapter 17.
The query uses the COLUMNS view to find all column names except for the system schemas (you can choose to include those, if you use those in your application or in scripts). The COLUMNS view is one of several views describing the schema objects.
Information Schema views with schema information
This view contains information about the CHECK constraints and is available in MySQL 8.0.16 and later.
The definition of histograms including the statistics. This is a very useful view for query performance tuning.
The column definitions.
The definitions of the stored events.
Information about InnoDB tablespace files.
Metadata information for columns in InnoDB tables.
This view links the InnoDB tablespace ids to the file system paths.
Metadata for columns included in InnoDB indexes.
Metadata for the InnoDB foreign keys.
Lists the child and parent columns of InnoDB foreign keys.
A snapshot of the INNODB_FT_DELETED view during an OPTIMIZE TABLE statement for the InnoDB table specified in the innodb_ft_aux_table option .
Configuration information for full text indexes on the InnoDB table specified in the innodb_ft_aux_table option.
Rows that have been deleted from full text indexes for the InnoDB table specified in the innodb_ft_aux_table option. InnoDB uses this extra list for performance reasons to avoid having to update the index itself for each DML statement.
Newly inserted rows into the full text indexes for the InnoDB table specified in the innodb_ft_aux_table option. InnoDB uses this extra list for performance reasons to avoid having to update the index itself for each DML statement.
The inverted full text index for the InnoDB table specified in the innodb_ft_aux_table option.
Information about indexes on InnoDB tables. This includes internal information such as the page number of the root page and the merge threshold.
Metadata for the InnoDB tables.
Metadata for the InnoDB tablespaces.
This view combines the SPACE, NAME, FLAG, and SPACE_TYPE columns from INNODB_TABLESPACES with the PATH column from INNODB_DATAFILES to provide a summary of the InnoDB tablespace.
Table statistics for InnoDB tables. Some of these statistics are updated at the same time as index statistics; others are maintained at an ongoing basis.
Metadata for InnoDB temporary tables (both internal and explicit).
Internal metadata information about virtual generated columns on InnoDB tables.
Information about the primary keys, unique keys, and foreign keys.
Information about the parameters for stored functions and stored procedures.
Information about table partitions.
Information about foreign keys.
The definition of stored functions and stored procedures.
Information about the schemas (databases). (Schemata is technically the correct word for the plural form of schema, but most use schemas nowadays.)
Information about columns with a spatial data type.
The index definitions and statistics. This is one of the most useful views when it comes to query performance turning.
Summary of the primary, unique, and foreign keys and CHECK constraints.
Information about tables and views and their properties.
This view is only used for NDB Cluster tablespaces.
The trigger definitions.
Lists the stored functions used in views. This table was added in 8.0.13.
Lists the tables referenced by views. This view was added in 8.0.13.
The view definitions.
TABLE_NAME: Used in the views not specific to InnoDB for the table name.
TABLE_SCHEMA: Used in the views not specific to InnoDB for the schema name.
COLUMN_NAME: Used in the views not specific to InnoDB for the column name.
SPACE: Used in the InnoDB-specific views for the tablespace id.
TABLE_ID: Used in the InnoDB-specific views to uniquely identify the table. This is also used internally in InnoDB.
NAME: The InnoDB-specific views use a column called NAME to give the name of the object irrespective of the object type.
The fk_film_actor_film constraint references the film_id column in the film table. You can use this as the starting point for finding the full chain of foreign keys either by manually executing the query for each table returned in the query against the KEY_COLUMN_USAGE view or by creating a recursive common table expression (CTE). This is left as an exercise for the reader.
For an example where the KEY_COLUMN_USAGE view is used in a recursive common table expression to find the chain of foreign key dependencies, see https://mysql.wisborg.dk/tracking-foreign-keys.
The diagram is created using the reverse engineering feature of MySQL Workbench.
The views with information specific to InnoDB use the SPACE and TABLE_ID to identify the tablespace and table. Each tablespace has a unique id with ranges reserved for different tablespace types. For example, the data dictionary tablespace file (<datadir>/mysql.ibd) has space id 4294967294, the temporary tablespace has id 4294967293, undo log tablespaces start with 4294967279 and decrement, and user tablespaces start at 1.
The values in the INNODB_FT_CONFIG view may differ for you.
InnoDB also includes views with information that relates to performance. These will be discussed together with a few other performance-related tables.
Information Schema views with performance-related information
A list of the pages in the InnoDB buffer pool which can be used to determine which tables and indexes are currently cached.
Warning: There is a high overhead of querying this table particularly for large buffer pools and many tables and indexes. It is best used on test systems.
Information about the pages in the InnoDB buffer pool and how they are ordered in the least recently used (LRU) list.
Warning: There is a high overhead of querying this table particularly for large buffer pools and many tables and indexes. It is best used on test systems.
Statistics about the usage of the InnoDB buffer pool. The information is similar to what can be found in the SHOW ENGINE INNODB STATUS output in the BUFFER POOL AND MEMORY section. This is one of the most useful views.
A summary of the number of index pages cached in the InnoDB buffer pool for each index.
Statistics about operations related to compressed InnoDB tables.
The same as INNODB_CMP but grouped by the index.
Statistics about compressed pages in the InnoDB buffer pool.
Similar to the global status variables but specific to InnoDB.
Metadata including the connection id, file path, and size for InnoDB temporary tablespace files (each session gets its own file in MySQL 8.0.13 and later). It can be used to link a session to a tablespace file which is very useful if you notice one file becoming large. The view was added in 8.0.13.
Information about InnoDB transactions.
When the optimizer trace is enabled, the trace can be queried from this view.
The same as SHOW PROCESSLIST.
When profiling is enabled, the profiling statistics can be queried from this view. This is deprecated, and it is recommended to use the Performance Schema instead.
For the views with information about InnoDB compressed tables, the table with _RESET as the suffix returns the operation and timing statistics as deltas since the last time the view was queried.
innodb_monitor_disable: Disable one or more metrics.
innodb_monitor_enable: Enable one or more metrics.
innodb_monitor_reset: Reset the counter for one or more metrics.
innodb_monitor_reset_all: Reset all statistics including the counter, minimum, and maximum values for one or more metrics.
Using the INNODB_METRICS view
First, the metrics are enabled using the innodb_monitor_enable variable; then the values are retrieved. In addition to the values shown, there is also a set of columns with the _RESET suffix which are reset when you set the innodb_monitor_reset (only the counter) or innodb_monitor_reset_all system variable. Finally, the metrics are disabled again.
The metrics have varying overheads, so you are recommended to test with your workload before enabling metrics in production.
The InnoDB metrics are also included in the sys.metrics view together with the global status variables and a few other metrics and when the metrics are retrieved.
The remaining Information Schema views contain information about privileges.
MySQL uses privileges assigned to the accounts to determine which accounts can access which schemas, tables, and columns. The common way to determine the privileges for a given account is to use the SHOW GRANTS statement, but the Information Schema also includes views that allow you to query the privileges.
Information Schema tables with privilege information
The global privileges.
Privileges to access schemas.
Privileges to access tables.
Privileges to access columns.
Using the Information Schema privilege views
Notice how the single quotes around the username and hostname are escaped by doubling the quotes.
While the views with the privilege information are not directly usable for performance tuning, they are very useful for maintaining a stable system as you can use them to easily identify whether any accounts have privileges that they do not need.
It is best practice to limit accounts to have just the privileges they need and no more. That is one of the steps to keep the system secure.
The last topic to consider about the Information Schema is how data related to index statistics are cached.
Caching of Index Statistics Data
One thing that is important to understand is where the information in the index statistics–related views (and the equivalent SHOW statements) comes from. Most of the data comes from the MySQL data dictionary. In MySQL 8, the data dictionary is stored in an InnoDB table, so the views are just normal SQL views on top of the data dictionary. (You can, for example, try to execute SHOW CREATE VIEW information_schema.STATISTICS to get the definition of the STATISTICS view.)
The index statistics themselves are however still originating from the storage engine layer, so it is relatively expensive to query those. To improve the performance, the statistics are cached in the data dictionary. You can control how old the statistics are allowed to be before MySQL refreshes the cache. This is done with the information_schema_stats_expiry variable which defaults to 86400 seconds (one day). If you set the value to 0, you will always get the latest values available from the storage engine; this is the equivalent of the MySQL 5.7 behavior. The variable can be set both at the global and session scopes, so you can set it to 0 for the session, if you are investigating an issue where it is important to see the current statistics, for example, if the optimizer is not using the index you expect.
Use the information_schema_stats_expiry variable to control how long index statistics can be cached in the data dictionary. This is only for displaying purposes – the optimizer always uses the latest statistics. Setting information_schema_stats_expiry to 0 to disable caching can, for example, be useful when investigating an issue with the wrong index being used by the optimizer. You can change the value both at the global and session scopes as needed.
Columns affected by information_schema_stats_expiry
The estimate for the number of unique values for the part of the index up and including to the column in the same row.
The next value for the auto-increment counter for the table.
The estimated data length divided with the estimated number of rows.
The table checksum. It is not used by InnoDB, so the value is NULL.
When the table was last checked (CHECK TABLE). For partitioned tables, InnoDB always returns NULL.
When the table was created.
An estimate of the amount of free space in the tablespace the table belongs to. For InnoDB, this is the size of completely free extents minus a safety margin.
The estimated size of the row data. For InnoDB, it is the size of the clustered index, which is found as the number of pages in the clustered index multiplied with the page size.
The estimated size of secondary indexes. For InnoDB, this is the sum of pages in non-clustered indexes times the page size.
The maximum allowed size of the data length. It is not used by InnoDB, so the value is NULL.
The estimated number of rows. For InnoDB tables, this comes from the cardinality of the primary key or clustered index.
When the tablespace file was last updated. For tables in the InnoDB system tablespace, the value is NULL. As data is written to the tablespace asynchronously, the time will not in general reflect the time of the last statement changing the data.
You can force an update of this data for a given table by executing ANALYZE TABLE for the table.
When the cached data has not yet expired, that is, it was refreshed less than information_schema_stats_expiry seconds ago
When information_schema_stats_expiry is set to 0
When MySQL or InnoDB is running in a read-only mode, that is, when one of the modes, read_only, super_read_only, transaction_read_only, or innodb_read_only, is enabled.
When the query also includes data from the Performance Schema
This chapter introduced the Information Schema by first discussing what the Information Schema is and how the user privileges work. The remainder of the chapter walked through the standard views and how caching works. The Information Schema views can be grouped by the type of information they contain: system, schema, performance, and privilege information.
The system information includes the character sets and collations, resource groups, keywords, and information related to spatial data. This is useful as an alternative to using the reference manual.
The schema information is the largest group of views and includes all the information available from schema data down to columns, indexes, and constraints. These views together with the performance views that have information such as metrics and InnoDB buffer pool statistics are the most commonly used views in performance tuning. The privilege-related views are not so often used for performance tuning, but they are very useful to help maintain a stable system.
A common shortcut to obtain information from the Information Schema views is to use a SHOW statement . These will be discussed in the next chapter.