Notes from the Maximum Velocity MySQL talk by Jay Pipes
On Tuesday, 3 April 2007, I attended a three-hour class entitled,
"Maximum Velocity MySQL" in Omaha, NE. The class was taught by
Jay Pipes, MySQL Community Relations Manager and co-author of
Pro MySQL. From the advertisement,
the class covered:
- How to diagnose MySQL database performance bottlenecks using a variety of tools.
- How to identify and FIX problem SQL code.
- Best practices for writing streamlined, efficient SQL code that takes advantage of MySQL's strengths.
- About advanced index concepts.
- How to build an effective indexing strategy.
- How to choose appropriate storage engines for different types of data.
- How to adjust MySQL server parameters to squeeze the most performance out of your application.
- How to plan for growth, in your code and in your index strategy.
- How you can receive automatic preemptive advice & code for improving performance!
Slides for this presentation are
here (680KB PDF). Below are the
notes that I scribbled during the presentation. There were enough of them
and the class was good enough that I figured I'd benefit from typing them up.
Benchmarking
Profiling
- Diagnose bottlenecks on a running system
- For OLAP (lots of reads), memory is usually the bottleneck
- For OLTP (online transaction processing), I/O is usually the bottleneck
- turn on slow query logging in
my.cf
- Despite its name, the
log_long_format variable lists any
query that doesn't use an index
-
mysqldumpslow dumps the entire DB/table by selecting 1 row
at a time
EXPLAIN command
- Prepend "
explain" before any "select Shows execution plan chosen by mysql for a query
- Type of "all" means the entire table was searched.
- Sequential reading of all data is often faster than pulling lots of
indexed rows, so an available index may be ignored.
- Type of "index"
- Selecting only columns which are in an index is faster even
when pulling all rows, becasue it can scan the much smaller
index entries rather than the much larger row data.
- Look for "
Extra: using index"
- Type of "range" (like "between", "in", and <> in "where" clause)
- Very old and mature optimizer
- Very fast run time if where column is indexed, otherwise
searches all rows
- "
Extra: using where" means all rows were retrieved,
then filtered
Schema
- If table contains lots of columns which are infrequently used, split
the less used fields into a separate table to reduce the foootprint
of each row and improve the memory cache (buffer pool) utilization.
- Updates to a table invalidate all query cache rows which reference that
table, so split static and dynamic fields into separate tables to
improve cache hit rate for static data.
- "
select count(*) from table" in innodb requires a full
table scan, but is instantaneous in MyISAM.
- "
engine=memoryinit_sql script.
-
heap_table_size sets the size of memory tables - 32MB is
default.
- To query count of innodb databases, you can create a memory table
that stores the number of records in an innodb table. This is
filled at boot time by the
init_sql script and then
updated by the app when the table is modified.
-
INFORMATION_SCHEMA is new in MySQL 5.0.
- When creating indices, you can determine their usefulness by checking
the selectability:
distinct(*) / count(*)
- Rather than storing IP addresses as 15-character strings, store them
as 4-byte integers. Use the builtin functions
INET_ATON()
and INET_NTOA() to convert IP strings to ints.
- Never group by a non-indexed column.
- Autoincrement primary keys don't lock MyISAM tables when updated
like most other updates do.
- Use the "archive" storage engine for web logs. Archive is
as fast as MyISAM, and compresses its data using gzip, but it
can't be modified once inserted. To delete old entries,
use partitioning in v5.1 to split the data into buckets, then
periodically drop the entire bucket of old rows.
SQL Coding
- Inner join on a temp table is much faster than correlated subqueries.
- Applying a built-in function on a field prevents the use of any index
that may be on that field.
- A like match with a fixed prefix (
LIKE "abc%") uses the
"range" type and is quite fast.
- The
CURRENT_DATE() function is dynamic and therefore doesn't
allow storage of results in the query cache. It's faster to
hardcode today's date as obtained from the calling language.
- Doing a postfix match (
LIKE "%abc") requires a full text
search. It's faster to create a new field storing the reversed
field using the REVERSE() function and do an index search
on that (LIKE CONCAT(REVERSE(pattern), '%')).
You can insert the calculated reversed field on the fly with triggers.
- Unlike most DB's, stored procedures in MySQL are cached per connection
thread, not globally. If the connection is kept open for lots of
requests, they work well. For web apps that close the connection
after a few queries, stored procs are inefficient.
- PHP
- Avoid persistent connections, as individual connections from PHP
to MySQL are 10-100 times faster than to other DB's (has to
do with the driver).
- The
mysqli driver is faster than the
mysql driver when using prepared statements.
- Use PDO v5.1 (not to be
confused with MySQL 5.1)
- Using prepared statements avoids SQL injection and is faster.
- Serve static web content using a light weight HTTP server
like
lighthttpd,
not a heavy weight server like Apache.
- "Smarty" is a PHP templating system
Server Tuning
- "
show status" or "show status like 'Qc%'"
shows various server stats.
- When setting server variables at startup, be aware of which variables
are global and which are per thread. Setting a huge buffer per
thread will kill the server.
- Adding more RAM is the best way to boost performance via hardware upgrades.
- MyISAM variaables
-
key_buffer_size
-
key_block_unused=0 means you've run out
-
table_cache shows simultaneous open connections to
each table
-
myisam_sort_buffer_size - large helps when building
indices, but reduce it again when done. This setting is per
connection, so you can set it dynamically after you connect,
and it'll go away when your thread dies.
Misc Info
Good references for MySQL support and tuning include:
Adding "\G" to the end of a query in the mysql command line
UI will present the output as row based rather than column based.
DailyWTF has curious things every day.
last updated 5 Apr 2007
Obi-Wan (obiwan@jedi.com)
|