
Chapter 7

Query Execution and Optimisation

Database operations are quite expensive in terms of CPU time and of the wait time they impose on the user. Optimising them helps tremendously in boosting the overall throughput.

Chapter Highlights

  • Internals of Query Execution
  • Fundamentals of Query Optimisation
  • Examples of Optimisation
  • Recommendations for Query Optimisation
  • Database Statistics

7.1 QUERY PROCESSING

The DBMS performs a number of steps while executing a query. The conceptual view of this process is shown in Fig. 7.1.

Fig. 7.1 Query execution process

Another way of depicting this is captured in Fig. 7.2.

Now, let us discuss these steps.

1. Decomposition

In this step, the SQL query written by the user/application programmer is transformed into an internal, machine-understandable form (called the internal representation). Other names for this form are abstract syntax tree and query tree.

There are usually several ways to do the same thing. Optimisation means choosing the best approach of all the available ones. It is a bit like time management: depending on how well we manage our time, we may complete none, just one, a few, or many of our tasks in the best possible manner.

Fig. 7.2 Query execution – Another view

Consider the following SQL query, which finds out the names of employees who work in the Admin department:

SELECT Name
FROM Employee A, Department B
WHERE A.Department_ID = B.Department_ID
AND B.Department_name = 'Admin'

The step-by-step process of building a query tree for this query is shown in Fig. 7.3.

2. Optimisation

This is the step in which the optimiser performs a number of optimisation processes. The optimiser first transforms the query into an internal form, called the canonical form. It then improves the original query so that it can be executed more efficiently, and calculates the likely cost of executing it. The result is an optimised version of the original query.

Fig. 7.3 Process of building the query tree

3. Choosing low-level operators

In this step, decisions regarding the execution of the query are taken. They may include the possible use of indexes, physical paths of the database, and so on. Examples of low-level operations are join, projection and restriction. There are usually interdependencies among them. For instance, a selection operation may require sorting, duplicate elimination and filtering. For every low-level operation, the optimiser has a pre-defined set of implementation procedures. Every such procedure has a cost associated with it. The cost is a function of many factors, such as CPU utilisation, disk I/O, memory requirements and communication needs. Based on current database information from the database catalog (e.g. indexes, page sizes), the interdependency information and the cost, the optimiser chooses one or more of the possible procedures. This is called access path selection.
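To make access path selection concrete, consider a purely illustrative example (the numbers are invented for this sketch, not taken from any real system). Suppose the Employee table occupies 1,000 disk pages, an index on Department_number has 3 levels, and 50 rows satisfy Department_number = 10. A full table scan would cost roughly 1,000 page reads, whereas the index-based access path would cost roughly 3 index-page reads plus, at worst, 50 data-page reads. The optimiser would compare these estimated costs and select the index-based path.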

4. Code generation and execution

In this step, the optimiser generates the possible query plans and chooses the cheapest among them. A query plan is a group of implementation procedures, in which each procedure carries out a low-level operation. There can be multiple plans for a query, and one of them is chosen depending on the cost. The final cost of a query is the sum of the costs of the individual implementation procedures. This estimate may not always be accurate, since it is based on intermediate (temporary) results.

7.2 USING INDEXES

We know that an index provides quicker access to data. Conceptually, a database index is similar to the index provided at the end of a book. An index in a book contains a list of important words, arranged in ascending order, along with the page numbers on which each word is found. This is shown in Fig. 7.4.

Fig. 7.4 Index concept

A database index is not very different. It contains the following:

  • A list of values in the specified column (on which the index is created) – This is similar to the words in the book index shown here.
  • The row identifiers in which these values occur – This is similar to the page numbers of a book on which the word is located.

Surprisingly, indexes are not really considered while relational databases are designed. In contrast, indexes are very much a part of the logical design stage in most other file or database systems. In the case of relational databases, however, indexes can be created, modified and deleted without changing the inherent database design or application logic. The only effect of such an operation is a change in performance (which becomes either better or worse, depending on what is done). The idea is illustrated in Fig. 7.5.

Fig. 7.5 Index considerations

In most cases, the optimiser chooses the most effective index. Thus, the focus of the people who participate in database design and maintenance activities should be to provide a good set of indexes, and then rely on the optimiser to make use of them. This leads to the most effective use of analysis time and also provides for the best possible performance.

7.3 OPTIMISER FUNCTIONALITY

Throughout the rest of the chapter, we shall consider an Employee table and a Department table with the structures shown in Fig. 7.6.

Fig. 7.6 Sample table structures

The optimiser can use one of the two techniques shown in Fig. 7.7 to execute any given query that involves two or more columns.

Fig. 7.7 Query execution approaches

Let us now discuss these approaches, taking the example of the following query:

SELECT *
FROM Employee
WHERE Department_number = 10
AND Salary >= 5000

7.3.1 Driver Index

In the driver index approach, the optimiser avoids reading the database table twice for the two conditions joined by AND. That is, it performs the reading only for one of the conditions; once a list of rows matching this condition is prepared, it applies a filter based on the second condition. This is illustrated in Fig. 7.8.

Fig. 7.8 Driver index approach

Note that the database is read based on the index on the Department_number column. This means that the optimiser starts reading from the first instance where Department_number = 10 is found, and reads all the rows where this condition is met. Whenever it finds a salary of 5000 or above in any such row, it adds the row to a temporary list. After all the rows for Department_number 10 are read, the list contains exactly the rows that match the WHERE condition specified in the query earlier (i.e. the employees who work in department 10 and earn a salary of 5000 or more).
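As a minimal sketch of the pre-requisite for this strategy (the index name and exact syntax are illustrative, not taken from the text):

CREATE INDEX Idx_Emp_Dept             -- hypothetical index name
ON Employee (Department_number);      -- lets the optimiser jump straight to Department_number = 10

With such an index in place, only the rows of department 10 are ever read; the Salary condition is applied merely as a filter on those rows.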

7.3.2 List Merge

This approach is much costlier than the driver index approach and should therefore be avoided as far as possible. In the list merge approach, the optimiser performs the table scan twice, that is, it reads the table twice. The first table scan is for the expression before the AND clause, and the second scan is for the expression after the AND clause. The algorithm used by this approach is shown in Fig. 7.9. The employees who are in department 10 and who also earn a salary of 5000 or above form a part of List-B.

We can clearly see that the list merge approach is quite cumbersome and time-consuming. As far as possible, we should discourage the optimiser from choosing this approach.

Fig. 7.9 List merge approach


The DBMS has a special built-in component, called the optimiser. The optimiser decides how to deal with database operations so as to make them effective and fast.

7.4 IMPLEMENTING SELECT

Let us take a look at how some of the SELECT SQL queries are implemented and executed by the optimiser.

7.4.1 Simple SELECT

A simple SELECT statement means that the FROM clause contains just one table, and the WHERE clause also contains straightforward conditions. The options for implementing such a SELECT statement are as follows.

1. Sequential search

Sequential search, also called linear search, is the simplest approach to any computer-based search operation. Here, every row of the table is retrieved, and if the column values match the selection criteria, the row is included in the output. No intelligent processing is involved.

Sequential search is the traditional mechanism of searching for information. It is very simple to understand, but can be very poor in performance at times. As the name says, the process of searching for information in a sequential search is sequential (i.e. one item after the other). Given a list of items, the algorithm shown in Fig. 7.10 can be used to determine whether a particular item exists in the list.

Fig. 7.10 Sequential search

The important point in this algorithm is that we start with the first item in the list and go up to the end of the list, or until the item to be searched is found, whichever is earlier.

Table 7.1 shows the number of times this algorithm is likely to be executed, depending on the best, average and worst possible cases. We assume that the list contains N items.

Table 7.1 Sequential search: Likely number of iterations

Case Meaning Number of iterations
Best The item to be searched is the first item in the list 1
Average The item to be searched is found somewhere close to the middle of the list N/2
Worst The item to be searched is the last item in the list, or it does not exist at all N

Note that the average number of iterations (N/2) is what is most likely over a period of time. Thus, if we have 10,000 items in a list, searching would require, on average, (10,000/2), that is, 5,000 iterations.
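In RDBMS terms, a sequential search typically results when the WHERE clause refers to a column that has no index. A small sketch (assuming, purely for illustration, that no index exists on the Name column):

SELECT *
FROM Employee
WHERE Name = 'Anita';    -- no index on Name (assumed), so every row must be examined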

2. Binary search

Binary search is a vast improvement over sequential search. However, unlike sequential search, binary search cannot always be used: for it to work, the items in the list must be sorted (in ascending or descending order). In RDBMS terms, this means that there must be an index on the column on which binary search is being performed—this is a pre-requisite for binary search. Note that this condition does not apply to sequential search, which works equally well on sorted and unsorted lists. If the list to be searched is not sorted, binary search fails.

The approach employed by binary search is called divide-and-conquer. Assume that a list contains 1000 elements, sorted in ascending order of their values, and that an item x is to be searched for in that list. The approach used by the binary search algorithm is shown in Fig. 7.11.

Fig. 7.11 Binary search

This is summarised in Table 7.2.

Table 7.2 Next action depending on the outcome of binary search

Outcome Next action
(a) x = Last element of Half-1 Stop processing.
(b) x > Last element of Half-1 Search for x within Half-2.
(c) x < Last element of Half-1 Search for x within Half-1.
  • Clearly, case (a) is quite simple and there is nothing to discuss. So, we shall now concentrate on case (b) and case (c).
  • In case (b), we need to look for x in Half-2. So, we can safely discard Half-1 altogether and worry only about Half-2. We shall use the same approach as before. We will now divide Half-2, made up of 500 elements, into two halves, each consisting of 250 elements. Let us call these two halves of Half-2 as Half-2-1 and Half-2-2. We now compare x with the last element of Half-2-1. This will also have three possibilities—(a), (b) or (c)—as discussed earlier. Depending on that, we can stop processing, or divide Half-2-1 or Half-2-2 into two halves, and so on. We stop when either a match for x is found, or when we split the list so much that it cannot be split any further.
  • Case (c) is similar to case (b). However, we shall describe it fully, for the sake of completeness. In case (c), we need to look for x in Half-1. So, we can safely discard Half-2 altogether and worry only about Half-1. We shall use the same approach as before and divide Half-1, made up of 500 elements, into two halves, each consisting of 250 elements. Let us call these two halves of Half-1 as Half-1-1 and Half-1-2. We now compare x with the last element of Half-1-1. This will also have three possibilities—(a), (b) or (c)—as discussed earlier. Depending on that, we can stop processing, or divide Half-1-1 or Half-1-2 into two halves, and so on. We stop when either a match for x is found, or when we have split the list so much that it cannot be split any further.

An example would help make this discussion clearer. Suppose we have a list of numbers arranged in ascending order—from 1 to 10—as follows.

1  2  3  4  5  6  7  8  9  10

The more optimised a query, the faster it executes, and the better the overall throughput. Query optimisation can be done at two levels: (a) leaving it to the DBMS to decide about optimisation, and (b) taking steps to do the optimisation ourselves. It is best to use a combination of the two techniques.

Assume that we are searching for number 8 (x) in the list. The search would proceed as follows.

  1. Firstly, divide the original list into two halves, the first half containing numbers 1-5, and the second half containing numbers 6-10.
  2. Compare x (i.e. 8) with the last number of the first half (i.e. 5). Since 8 > 5, from our earlier logic, case (b) is true. This means that we should discard the first half (i.e. 1-5) and look for x in the second half (i.e. 6-10). Thus, our new list looks as follows.

    6  7  8  9  10

  3. Now, we divide the current list (i.e. 6-10) into two halves, one containing the numbers 6-7 and the other containing the numbers 8-10. We now compare x (i.e. 8) with the last element of the first half (i.e. 7). Since x (i.e. 8) is greater than this, case (b) applies again. Therefore, we again discard the first half (i.e. 6-7) and use only the second one. Accordingly, our new list now looks as follows.

    8  9  10

  4. We now divide the current list into two halves. The first half would contain 8, and the second half would contain 9 and 10. We now compare x (i.e. 8) with the last element of the first half (i.e. 8). Since a match is found, case (a) is true. So, we display an appropriate message, indicating that the element to be searched has been found, and stop processing.

Note that the binary search required just three compare operations. In contrast, a sequential search would have required 8 compare operations. Thus, binary search can be really fast. Of course, the more elements there are in the list to be searched, the more effective binary search is compared to sequential search. For smaller lists, the overhead involved in sorting the list so as to allow binary search can outweigh the cost of simply performing a sequential search. Some judicious decision-making is therefore required in such cases to weigh the advantages of sequential search against those of binary search.
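To generalise the arithmetic: binary search halves the list at every step, so a list of N items needs at most about log2 N comparisons. For the 10,000-item list mentioned earlier, that is roughly 14 comparisons, against an average of 5,000 for sequential search.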

3. Using primary index for searching one row/record

By using a primary index, the record being searched for can be located directly, without scanning through the table. This is a very simple option, which we have discussed at length on many occasions. It is shown in Fig. 7.12.

Fig. 7.12 Use of primary index to search one record
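A small sketch of such a lookup (the employee number is invented for illustration):

SELECT *
FROM Employee
WHERE Employee_number = 1234;    -- the primary index locates this single row directly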

4. Using primary index to search for multiple records

This option is a variation of the earlier method. Here, relational operators such as <, >, <= and >= are used. The idea is to use the index to look up the start of a set of matching records, and then to retrieve all the preceding or following matching records, as per the comparison condition. The concept is illustrated in Fig. 7.13.

Fig. 7.13 Use of primary index to search multiple records
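A sketch of the range form (the boundary value is illustrative):

SELECT *
FROM Employee
WHERE Employee_number >= 1000;   -- the index locates the first match; the following entries are then read in order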

7.4.2 Complex SELECT Implementation

A complex SELECT statement contains a conjunctive condition. This means that it contains multiple conditions joined by operators such as AND or OR. The following techniques can be used to search for information in a complex SELECT statement.

1. Conjunctive selection using an individual index

In this approach, we assume that an index is available on one of the individual columns involved. It is used just like the approaches described earlier in the case of the simple SELECT.

2. Conjunctive selection using a combination index

If two or more columns are involved in a query, then we may use a combined index on these columns, as sketched below.
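For the sample query used earlier in the chapter, such a combined index might look like the following sketch (the index name is hypothetical):

CREATE INDEX Idx_Emp_Dept_Sal                -- hypothetical name
ON Employee (Department_number, Salary);     -- one index serving both conditions of the WHERE clause

This single index can satisfy WHERE Department_number = 10 AND Salary >= 5000 in one index scan, though it serves fewer queries than two narrow indexes would (see Recommendation 2 later in this chapter).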

7.4.3 JOIN Implementation

The JOIN condition in an SQL query can be implemented in various ways. Some of the possible techniques are described below.

1. Nested loop join

The nested loop join, also called the brute force join, combines every row of the outer table with every row of the inner table, and checks whether the two rows satisfy the join condition. As we can imagine, this is quite expensive.

2. Single loop join

In the single loop join, we assume that an index exists on one of the two (or more) columns being joined. Let us consider that the join is based on two columns, C1 and C2, and that an index exists on C1. Given this background, the way the retrieval works is shown in Fig. 7.14.

Fig. 7.14 Single loop join concept

The conceptual view of this is shown in Fig. 7.15.

3. Sort join

In the sort join approach, it is assumed that both the columns being joined have an index on them. The idea is to read both the tables through their indexes sequentially, thereby reading them in ascending/descending order. We then match (i.e. join) rows that have the same value in both columns. Because the tables are read in sorted order, this approach also works quite fast.
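A sketch of the pre-requisite for a sort join between Employee and Department (index names hypothetical):

CREATE INDEX Idx_Emp_Dept ON Employee (Department_number);
CREATE INDEX Idx_Dept_Dept ON Department (Department_number);
-- both tables can now be read in Department_number order and merged row by row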

Fig. 7.15 Single loop join concept

7.4.4 PROJECT Implementation

The implementation of the projection operation is quite simple. We simply limit the retrieval to the columns that the query mentions. The only variation is that we may need to perform duplicate elimination if the DISTINCT SQL clause is used.

7.4.5 SET Operator Implementation

Implementation of the set operators, such as Cartesian product, union, intersection and difference, can be quite expensive. They are briefly described below.

1. Cartesian product

This is the most expensive of all query operations. Assuming that we want to take the Cartesian product of two tables—A and B—the result is the combination of every row of A with every row of B, carrying all the attributes of A and B. Clearly, this places heavy demands on the CPU, memory and disk space. In fact, whenever possible, one should avoid Cartesian products.

2. UNION, INTERSECT and DIFFERENCE

These operations can work on union-compatible tables. Variations of the basic list-merge technique are used here.

(a) UNION: One needs to scan and merge the tables in question concurrently. Whenever a row is found in the merged result, it is retained; if it is found multiple times, the duplicates are rejected and just one copy is retained in the result.

(b) INTERSECT: The only difference between this and UNION is that here, a row is retained only if it is found twice in the merged list (i.e. it appears in both tables).

(c) DIFFERENCE: A row is retained in the result only if it comes from the first table and there is no corresponding row in the second table.

7.4.6 Aggregate Functions Implementation

Examples of aggregate functions are MIN, MAX, AVG, SUM and COUNT. Indexes are quite useful for the implementation of these functions. For instance, consider the following query:

SELECT MAX (Salary)
FROM Emp

Clearly, the last (rightmost) entry in the index can be used here. Similarly, for the MIN function, the first (leftmost) entry in the index can be used.
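Assuming an index on the Salary column (the index name is hypothetical), both extremes become single probes into the index:

CREATE INDEX Idx_Emp_Salary ON Emp (Salary);   -- hypothetical name

SELECT MAX (Salary) FROM Emp;    -- read the rightmost (last) index entry
SELECT MIN (Salary) FROM Emp;    -- read the leftmost (first) index entry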

7.5 OPTIMISATION RECOMMENDATIONS

Let us now list some of the most notable recommendations in the field of database performance optimisation.

Recommendation 1: Focus on the WHERE clause in the query.

The WHERE clause is the prime focus of the query optimiser. In fact, every column listed in the WHERE clause is a possible candidate for an index. We need to find out if we should:

  1. Create indexes on some of them, or
  2. Leave the database in its current state (without a defined index), or
  3. Delete an index that is present

We can see that Employee_number is the primary key of the table. Therefore, an index on this column is almost inevitable. However, we should also examine whether there are frequent queries based on the other columns (e.g. WHERE Salary BETWEEN 10000 AND 20000). If so, we should create indexes on those columns as well.
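A sketch of the resulting decisions (index names hypothetical):

CREATE UNIQUE INDEX Idx_Emp_Number ON Employee (Employee_number);  -- the primary key index
CREATE INDEX Idx_Emp_Salary ON Employee (Salary);                  -- only if salary-based queries are frequent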

Recommendation 2: Use narrow indexes.

We can create an index either on one column or on multiple columns. The recommendation is to use as many single-column indexes as possible. The reason is that single-column indexes provide the optimiser with a number of options to choose from; the optimiser can try a number of permutations and combinations because of this. Instead, if the indexes are based on multiple columns, then the optimiser does not have many choices: it must either use the indexes as they are or discard them altogether.

At the same time, it is worth emphasising that there should not be unnecessary indexes. We should remember that although indexes speed up query processing considerably, they can also slow down overall processing, because they need to be updated every time there is a database change (i.e. whenever we execute an INSERT, UPDATE or DELETE statement). Therefore, there is a definite cost associated with maintaining indexes.
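A sketch of the trade-off (index names hypothetical):

-- Narrow indexes: the optimiser may use either one, both, or neither
CREATE INDEX Idx_Emp_Dept ON Employee (Department_number);
CREATE INDEX Idx_Emp_Salary ON Employee (Salary);

-- Wide index: largely take-it-or-leave-it for the optimiser
CREATE INDEX Idx_Emp_Dept_Sal ON Employee (Department_number, Salary);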

Recommendation 3: If there are two or more AND constructs in a query, put the most limiting expression first (most limiting refers to the one that returns the minimum number of rows).

We know that A AND B is true only if both A and B are true. Keeping this in mind, the most effective way of evaluating the expression A AND B is to have A be the more limiting of the two. The fewer the rows that satisfy A, the more rows are eliminated without even requiring B to be evaluated; we need to evaluate B only for rows where A is true. By choosing as A the expression that returns the minimum number of rows, this process can be optimised to the best possible level.

Let us consider an example. Study the queries shown in Fig. 7.16. We are interested in finding a list of employees who work in department number 10 and earn a salary of Rs 5000 or above. We assume that there is an index on each of the two columns referred to in the WHERE clause (i.e. Department_number and Salary).

Fig. 7.16 Coding the most limiting expression first in a query containing AND
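A sketch of the two queries, reconstructed from the description (the exact form in Fig. 7.16 may differ):

-- Query (a): the department condition first
SELECT *
FROM Employee
WHERE Department_number = 10
AND Salary >= 5000;

-- Query (b): the salary condition first
SELECT *
FROM Employee
WHERE Salary >= 5000
AND Department_number = 10;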

Several simple tricks can help in optimisation. For example, in an OR operation, it is better to have the condition that is most likely to be satisfied on the left of the OR operator. On the other hand, in the case of AND, it is better to keep this condition on the right side of the AND operator.

The first query [i.e. Query (a)] would be more effective if the number of employees belonging to department number 10 is smaller than the number of employees earning a salary of 5000 or above. Conversely, the second query [i.e. Query (b)] would be more effective if only a few employees earn a salary of 5000 or above, but a far greater number of employees belong to department 10.

Recommendation 4: If there are two or more OR constructs in a query, put the expression that returns the maximum number of rows first.

We know that A OR B is true if either or both of A and B are true. Keeping this in mind, the most effective way of evaluating the expression A OR B is to have A be the less limiting of the two. The more rows that satisfy A, the fewer rows require B to be evaluated; we need to evaluate B only for rows where A is false. By choosing as A the expression that returns the maximum number of rows, this process can be optimised to the best possible level.

We can see that this description is exactly the opposite of the description for the previous case. The major difference, of course, is that the positions of A and B are reversed when we move from AND to OR.

We shall not go further into this recommendation; interested readers can relate the earlier example to this one.

Recommendation 5: In the case of a join, put the table with the minimum number of qualified rows last in the FROM clause, and first in the join expression of the WHERE clause. Then AND the join expression with a condition such as column >= ' ' (blank) to force the optimiser to choose this expression as the driver during query evaluation.

Suppose we want to join the Employee and Department tables based on the Department_number field. The simple and commonly-used approach is shown in section (b) of Fig. 7.17. However, section (a) shows an unusual approach, which is quite interesting.

Fig. 7.17 Join recommendation
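A sketch of the two forms, reconstructed from the description (aliases and the column list are illustrative):

-- (a) Department placed last in FROM; its column leads the join expression,
--     and the extra predicate makes it the driver
SELECT A.Name
FROM Employee A, Department B
WHERE B.Department_number = A.Department_number
AND B.Department_number > 0;

-- (b) The plain join, written the usual way
SELECT A.Name
FROM Employee A, Department B
WHERE A.Department_number = B.Department_number;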

The DBMS maintains various kinds of information and statistics to help in optimisation. The DBMS decides whether to use indexes, whether to do complete table scans or not, and so on, based on a number of parameters, and figures out the best way to accomplish this.

Let us assume that the Employee table contains 5000 rows and the Department table contains 20 rows. What we have done in case (a) is to force the optimiser to go for a driver index strategy instead of a list merge strategy, since we want the optimiser to use Department.Department_number as the driver. The expression B.Department_number > 0 in the WHERE clause ensures this. In other words, because we are driving the query based on this expression, only 20 rows would be returned first, corresponding to the 20 rows in the Department table. The optimiser would then try to find matching values only for these 20 rows in the Employee table. In contrast, approach (b) is a simple join: the optimiser would join the 5000 rows of the Employee table with the 20 rows of the Department table, and then perform a filtering operation based on the WHERE clause. Clearly, approach (a) is far better.

Recommendation 6: Use ORDER BY even if it is redundant; it causes the performance to improve.

Suppose we want to find a list of employees who earn a salary of 5000 or above, in the order of the employee numbers. Consider the two equivalent queries shown in Fig. 7.18.

Fig. 7.18 Using redundant ORDER BY clause
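A sketch of the two queries, reconstructed from the description and the assumptions below (the exact form in Fig. 7.18 may differ):

-- (a) The redundant predicate and ORDER BY steer the optimiser towards the Employee_number index
SELECT *
FROM Employee
WHERE Salary >= 5000
AND Employee_number > 0
ORDER BY Employee_number;

-- (b) The plain form; with no usable predicate on Employee_number, a full scan and a sort are likely
SELECT *
FROM Employee
WHERE Salary >= 5000
ORDER BY Employee_number;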

We assume the following:

  1. The Employee table is indexed on the Employee_number column.
  2. The value of Employee_number is greater than zero.
  3. There is no index on the Salary column.

In order to evaluate the query to find the list of employees earning a salary of 5000 or more, the optimiser has two options:

(a) Retrieve all rows based on Employee_number. Remove rows with Salary < 5000. Note that no sorting operation is required here as the index based on the Employee_number would be in use.

(b) Retrieve rows containing a salary value of 5000 or above. This means that the full table needs to be scanned, and the result sorted.

Option (a) is better, as there is no index on the Salary column and there would be few rows matching the criterion Salary >= 5000. As against this, in approach (b), we must read all the rows in the Employee table, accepting only those where Salary >= 5000. Note that the index on the Employee_number column would not be used in this case.

Approaches (a) and (b) mentioned here correspond to approaches (a) and (b) in the figure shown earlier. Clearly, approach (a) is better.

Recommendation 7: Be aware of the fact that LIKE may not be optimised.

Let us assume that we want to find a list of employees whose name consists of four characters, starts with character A and ends with L. The standard approach to find this information is shown in section (b) of Fig. 7.19. However, an unusual but interesting approach is shown in section (a) of the figure.

Fig. 7.19 Optimising in the case of LIKE
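A sketch of the two forms, reconstructed from the description (the exact range predicate is illustrative):

-- (a) An extra range clause lets an index on Name narrow the rows before LIKE is applied
SELECT Name
FROM Employee
WHERE Name >= 'A' AND Name < 'B'
AND Name LIKE 'A__L';

-- (b) LIKE on its own, which the optimiser may not be able to optimise
SELECT Name
FROM Employee
WHERE Name LIKE 'A__L';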

Note that we have limited the searching operation in case (a) by adding an extra clause in the WHERE condition. This clause filters the rows for names and only then hands over control to the LIKE clause, ensuring that a lot of filtering has already been done. This does not happen in case (b), which can therefore cause the query to take a longer time to run.

Recommendation 8: If the table is small, force the optimiser not to use an index.

Suppose there are only 1000 rows in the Employee table. We want to find a list of employees who earn a salary of more than 5000. Let us assume that we do have an index on the Salary column. Then, we can write the query to find the required list of employees in two ways, as shown in Fig. 7.20.

Fig. 7.20 Forcing the optimiser not to use an index
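A sketch of the two forms (the exact trick used in Fig. 7.20 may differ; adding 0 to a numeric column is one common way of hiding an index from the optimiser):

-- (a) The expression on Salary stops the index from being used, forcing a full table scan
SELECT *
FROM Employee
WHERE Salary + 0 > 5000;

-- (b) The plain form, which uses the index on Salary
SELECT *
FROM Employee
WHERE Salary > 5000;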

In case (a), because of the change in the WHERE clause, the optimiser is fooled into believing that it cannot use the index on the Salary column. This expression forces a full table scan. This is recommended here because the table is small: the whole table can be read in one disk access. If this were not the case, query (b) would be recommended instead. Query (b) would always cause at least two disk accesses, one for the index portion and one for the data portion.

Recommendation 9: Use OR instead of UNION if the predicate columns are not indexed.

Consider the two queries shown in Fig. 7.21.

Fig. 7.21 OR versus UNION
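A sketch of the two queries, reconstructed from the description:

-- (a) A single SELECT with OR: one pass over the table
SELECT *
FROM Employee
WHERE Employee_number = 10
OR Department_number = 10;

-- (b) UNION: two SELECTs, hence two passes, followed by a merge that eliminates duplicates
SELECT * FROM Employee WHERE Employee_number = 10
UNION
SELECT * FROM Employee WHERE Department_number = 10;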

We assume that, for some reason, the Employee_number and Department_number columns are not indexed. Many optimisers perform optimisation only if a query contains a single WHERE clause in a single SELECT statement. In query (b), two SELECT statements are executed, so two separate passes are required. In the first pass, the optimiser looks for rows matching employee number 10, and in the second pass, it looks for rows matching department number 10. It then merges these lists and filters out the duplicates. This is not the case with approach (a), which is therefore better.

7.6 DATABASE STATISTICS

A DBMS maintains information on the various tables and other database objects, such as indexes. This information is called database statistics. Some examples of database statistics are shown in Table 7.3.

Table 7.3 Database statistics examples

Database object Statistics maintained
Table Number of rows, Number of pages occupied on the disk
Column Number of distinct values in the column, Maximum and minimum values, If the column has an index – 10 most frequently occurring values in the column with their frequencies
Index Levels of the index, Pages in the index

Whenever the DBMS takes decisions about optimising a query, it refers to these database statistics. Of course, this is internal to the DBMS; neither the programmer nor the end user needs to know anything about it. However, the Database Administrator (DBA) needs to monitor database statistics quite closely and take corrective action as and when needed.

Key Terms

  • Abstract syntax tree
  • Binary search
  • Brute force
  • Canonical form
  • Conjunctive condition
  • Database statistics
  • Driver index
  • Internal representation
  • Linear search
  • List merge
  • Nested loop join
  • Query optimisation
  • Query plan
  • Query tree
  • Sequential search
  • Single loop join
  • Sort join

Chapter Summary

  • The DBMS performs a number of steps while executing a query.
  • A query goes through the DML processor, the optimiser and the run-time manager.
  • The four main steps in query execution are decomposition, optimisation, choosing low-level operations and execution.
  • Usage of indexes can greatly speed up query execution.
  • The database optimiser enhances the speed of query execution.
  • There are two approaches for query execution: driver index and list merge.
  • In the driver index approach, the optimiser avoids reading a table more than once.
  • In the list merge approach, the optimiser scans a table twice.
  • The DBMS may use sequential search or binary search for implementing the SELECT operation.
  • Sequential search is a process of scanning all the rows in a table. It is quite costly.
  • Binary search is a process of scanning very few rows in a table. It is quite efficient, but it needs the data to be sorted in ascending or descending order.
  • Apart from what the optimiser does internally, the programmer/DBA can apply a number of recommendations to speed up query processing.
  • Database statistics help in determining how a query executes and why it may be executing slowly.

State whether the following statements are true or false.

  1. Query optimisation is not important at all.
  2. Query optimisation is a complicated process.
  3. The DBMS does not have an inbuilt query optimiser.
  4. In addition to what a DBMS can do, the DBA can also perform query optimisation.
  5. Sequential search is faster than binary search.
  6. Binary search requires that the data be sorted in an ascending or descending order.
  7. There are no standard query optimisation recommendations.
  8. Driver index is faster than list merge.
  9. Indexes can greatly enhance query performance.
  10. Database statistics contain information regarding the database objects.

Choose the correct option for each of the following.

  1. The first step in query processing is ________.
     (a) decomposition  (b) optimisation  (c) choosing low-level operations  (d) execution

  2. The last step in query processing is ________.
     (a) decomposition  (b) optimisation  (c) choosing low-level operations  (d) execution

  3. In ________, the query is transformed into an abstract syntax tree.
     (a) decomposition  (b) optimisation  (c) choosing low-level operations  (d) execution

  4. The output of the ________ is that the query is transformed into a canonical form.
     (a) decomposition  (b) optimisation  (c) choosing low-level operations  (d) execution

  5. In the ________ approach, the table is scanned only once.
     (a) list and join  (b) list merge  (c) driver index  (d) list driver

  6. In the ________ approach, the table is scanned twice.
     (a) list and join  (b) list merge  (c) driver index  (d) list driver

  7. In ________, the entire table is scanned.
     (a) sequential search  (b) index search  (c) direct search  (d) binary search

  8. In ________, the entire table is not scanned.
     (a) sequential search  (b) index search  (c) direct search  (d) binary search

  9. A complex SELECT statement is called a ________.
     (a) sub-query  (b) join  (c) filter  (d) conjunctive condition

Review Questions

  1. What is query optimisation?
  2. What are the steps in query processing?
  3. What is an optimiser?
  4. How can indexes help in query optimisation?
  5. What is the driver index approach?
  6. Discuss the list merge approach.
  7. Describe the sequential search process.
  8. How does binary search work?
  9. Detail any two query optimisation recommendations.
  10. What are database statistics?

Exercises

  1. Study the differences in the way a query executes in the various RDBMS products.
  2. Is it always better to leave query optimisation decisions to the DBMS? Why?
  3. If we have an AND condition in the query, what should be our strategy?
  4. If we have an OR condition in the query, what would be our strategy?
  5. Read about correlated sub-queries and think about their optimisation.
  6. Is join faster than a sub-query in general? Why?
  7. Read about a concept called hit rate. What significance does it have for query optimisation? (Hint: It is related to the number of records being processed against the number of records in a file/table.)
  8. When would you not use an index?
  9. Can we specify that an index not be used in file processing? Can we do so in the case of database processing?
  10. Why are update operations costly on tables that have indexes?