SQL Optimization for MySQL & MariaDB

Ok, so you have your server configured at optimal performance, a great caching strategy, and your code has been refactored into its best working order. Now what? Time to take a dive into the slowest part of the web server chain: the database.

Normalizing Data

Breaking up our tables to reduce and eliminate data redundancy not only helps reduce the size of data storage, but can also increase performance. As well, it can also decrease performance in many situations, and at times developers have had to denormalize in order recover better read times. Your mileage may vary.

Normalization is typically broken into 5 categories. The Wikipedia links do a decent job of explaining each one:

  • 1NF (First normal form): No repeating groups of data
  • 2NF (Second normal form): 1NF, plus all of it's non-key attributes are fully dependent on its PRIMARY KEY
  • 3NF (Third normal form): 2NF, plus all of its attributes are directly dependent on the PRIMARY KEY
  • 4NF (Fourth Normal Form): 3NF, plus it must not contain more than one multivalued dependency.
  • 5NF (Fifth Normal Form): A table is 5NF if and only if every non-trivial join dependency in it is implied by the candidate keys.

Indexes

This seems like a no-brainer, but proper indexing is pretty much rule #1 for optimizing queries. Focus on rows that match conditions on a SELECT's WHERE clause. Don't create an unnecessary index, as this wastes space and time for MySQL in determining which index to use, as well as time costs with inserts, updates and deletes.

KEY and INDEX mean the same thing in MySQL.

PRIMARY and UNIQUE are similar in nature:

  • PRIMARY is a column or a set of columns that can identify a ROW. It is a special UNIQUE key. You can only have one of these per table.
  • There can be many UNIQUE indexes. An example would be a mobile phone number broken up over several columns, such as +country code ( area code ) prefix - number.

Joins vs Sub Selects

For those of us who used MySQL before the OUTER JOIN existed, there were subqueries to save the day. According to MySQL's documentation, the LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better.

Storage Engines

While there are more than two storage engines with MySQL and now MariaDB, the two most commonly used ones are MyISAM and InnoDB. A somewhat dated review by Tag1 Consulting titled "MySQL Engines: MyISAM vs. InnoDB" explains a couple of the key issues most DBAs consider important (such as reliability).

MariaDB provides XtraDB, a performance enhanced fork of InnoDB, as well as many other storage engines.

Benchmarking

MySQL's BENCHMARK() function can be used on its client to repeatedly execute an expression a set number of times to measure elapsed time on the client end.

mysqlslap is a diagnostic program used to emulate client load.

For web-based services, ab - Apache HTTP server benchmarking tool can be used to measure database performance that is included in overall web server load.

Query Optimization

MySQL's EXPLAIN and EXPLAIN EXTENDED statements can be used in conjunction with a SELECT query to get information from the optimizer about how tables are joined, in what order, etc.

MySQL's ANALYZE TABLE should be run when there's a question as to whether specific indexes are being properly used.

Profiling

We can set up server variables to profile queries without indexes, slow queries, long query times and more. Cloud hosting provider DigitalOcean has an article explaining how to set the MySQL query profiler up.

Partition Tables

If you are doing full table scans and are managing tables that are larger than 100GB in size, partitioning tables is highly recommended from MySQL.