Optimize MySQL

When MySQL uses indexes

  • Using >, >=, =, <, <=, IF NULL and BETWEEN on a key.
    • SELECT * FROM tablename WHERE keypart1=1 AND key_part2 > 5;
    • SELECT * FROM tablename WHERE keypart1 IS NULL;
  • When you use a LIKE that doesn't start with a wildcard.
    • SELECT * FROM tablename WHERE keypart1 LIKE 'jani%'
  • Retrieving rows from other tables when performing joins.
    • SELECT * from t1,t2 where t1.col=t2.key_part
  • Find the MAX() or MIN() value for a specific index.
    • SELECT MIN(keypart2),MAX(keypart2) FROM tablename where keypart1=10
  • ORDER BY or GROUP BY on a prefix of a key.
    • SELECT * FROM foo ORDER BY keypart1,keypart2,key_part3
  • When all columns used in the query are part of one key.
    • SELECT keypart3 FROM tablename WHERE key_part1=1

When MySQL doesn't use an index

  • Indexes are NOT used if MySQL can calculate that it will probably be faster to scan the whole table.
  • For example if key_part1 is evenly distributed between 1 and 100, it's not good to use an index in the following query:
    • SELECT * FROM tablename where keypart1 > 1 and key_part1 < 90
  • If you are using HEAP tables and you don't search on all key parts with =
  • When you use ORDER BY on a HEAP table
  • If you are not using the first key part
    • SELECT * FROM tablename WHERE keypart2=1
  • If you are using LIKE that starts with a wildcard
    • SELECT * FROM tablename WHERE keypart1 LIKE '%jani%'
  • When you search on one index and do an ORDER BY on another
    • SELECT * from tablename WHERE keypart1 = # ORDER BY key2

Learn to use EXPLAIN

Use EXPLAIN on every query that you think is too slow!

EXPLAIN SELECT t3.date_of_action, t1.transaction_id
WHERE t2.id = t1.transaction_id 
AND t3.id = t2.group_id      
ORDER BY t3.date_of_action, t1.transaction_id;  
table type possible_keys key key_len ref rows Extra
t1 ALL NULL NULL NULL NULL 11 Using temporary; Using filesort
t2 ref id id 4 t1.transaction_id 13
t3 eq_ref PRIMARY PRIMARY 4 t2.group_id 1

Types ALL and range signal a potential problem.


Use SHOW processlist to find out what is going on:

Id User Host db Command Time State Info
6 monty localhost bp Query 15 Sending data SELECT * FROM station,station AS s1
8 monty localhost Query 0 SHOW PROCESSLIST

Use KILL in mysql or mysqladmin to kill off runaway threads.

How to find out how MySQL solves a query

Run the following commands and try to understand the output:

  • SELECT ...;

MySQL is extremely good

  • For logging.
  • When you do many connects; Connect is very fast.
  • Where you use SELECT and INSERT at the same time.
  • When you don't combine updates with selects that take a long time.
  • When most selects/updates are using unique keys.
  • When you use many tables without long conflicting locks.
  • When you have big tables (MySQL uses a very compact table format).

Things to avoid with MySQL

  • Updates to a table or INSERT on a table with deleted rows, combined with SELECTS that take a long time.
  • HAVING on things you can have in a WHERE clause.
  • JOINS without using keys or keys which are not unique enough.
  • JOINS on columns that have different column types.
  • Using HEAP tables when not using a full key match with =
  • Forgetting a WHERE clause with UPDATE or DELETE in the MySQL monitor. If you tend to do this, use the --i-am-a-dummy option to the mysq client.

Different locks in MySQL

  • Internal table locks.
  • LOCK TABLES (Works on all table types)
  • Page locks (for BDB tables)
  • ALTER TABLE also does a table lock on BDB tables.
  • LOCK TABLES gives you multiple readers on a table or one writer.
  • Normally a WRITE lock has higher priority than a READ lock to avoid starving the writers. For writers that are not important one can use the LOW_PRIORITY keyword to let the lock handler prefer readers.

Tricks to give MySQL more information to solve things better

Note that you can always comment out a MySQL feature to make the query portable:
  • Will force MySQL to make a temporary result set. As soon as the temporary set is done, all locks on the tables are released. This can help when you get a problem with table locks or when it takes a long time to transfer the result to the client.
  • To tell the optimizer that the result set will only contain a few rows.
  • To tell the optimizer that the result set will contain many rows.
  • Forces the optimizer to join the tables in the order in which they are listed in the FROM clause.
  • SELECT ... FROM tablename [USE INDEX (indexlist) | IGNORE INDEX (indexlist)] tablename2

Forces MySQL to use/ignore the listed indexes.

Example of doing transactions

How to do a transaction with MyISAM tables:

LOCK TABLES trans READ, customer WRITE;
SELECT sum(value) FROM trans WHERE customer_id=some_id;
UPDATE customer SET total_value=sum_from_previous_statement WHERE customer_id=some_id;

How to do a transaction with Berkeley DB tables:

SELECT sum(value) FROM trans WHERE customer_id=some_id;
UPDATE customer SET total_value=sum_from_previous_statement WHERE customer_id=some_id;

Note that you can often avoid transactions altogether by doing:

UPDATE customer SET value=value+new_value WHERE customer_id=some_id;  

Example of using REPLACE

REPLACE works exactly like INSERT, except that if an old record in the table has the same value as a new record on a unique index, the old record is deleted before the new record is inserted. Instead of using

SELECT 1 FROM t1 WHERE key=#    
IF found-row      
  LOCK TABLES t1      
  DELETE FROM t1 WHERE key1=#      
  INSERT INTO t1 VALUES (...)      
REPLACE INTO t1 VALUES (...)      

General tips

  • Use short primary keys. Use numbers, not strings, when joining tables.
  • When using multi-part keys, the first part should be the most-used key.
  • When in doubt, use columns with more duplicates first to get better key compression.
  • If you run the client and MySQL server on the same machine, use sockets instead of TCP/IP when connecting to MySQL (this can give you up to a 7.5 % improvement). You can do this by specifying no hostname or localhost when connecting to the MySQL server.
  • Use --skip-locking (default on some OSes) if possible. This will turn off external locking and will give better performance.
  • Use application-level hashed values instead of using long keys:
FROM table_name 
WHERE hash = MD5(concat(col1,col2)) 
AND col_1='constant' 
AND col_2='constant'  
  • Store BLOB's that you need to access as files in files. Store only the file name in the database.
  • It is faster to remove all rows than to remove a large part of the rows.
  • If SQL is not fast enough, take a look at the lower level interfaces to access the data.

Benefits of using MySQL 3.23

  • MyISAM ; Portable BIG table format
  • HEAP ; In memory tables
  • Berkeley DB ; Transactional tables from Sleepycat.
  • A lot of raised limits
  • Dynamic character sets
  • More STATUS variables.
  • CHECK and REPAIR table.
  • Faster GROUP BY and DISTINCT
  • LEFT JOIN ... IF NULL optimization.
  • CREATE TEMPORARY table_name (...)
  • Automatic conversion of temporary HEAP to MyISAM tables
  • Replication
  • mysqlhotcopy script.

Important features that we are actively working on

  • Improving transactions
  • Fail safe replication
  • Text searching
  • Delete with many tables (Updates with many tables will be done after this.)
  • Better key cache
  • Atomic RENAME (RENAME TABLE foo as fooold, foonew AS foo)
  • A query cache
  • A better GUI client