Overview
Profiling, analyzing, and then fixing queries is probably the most often-repeated part of a DBA's job, and one that keeps evolving: as new features are added to the application, new queries pop up that need to be analyzed and fixed. There are not many tools out there that can make this easy. One that can is pt-query-digest (from Percona Toolkit), which provides all the data points you need to attack the right query in the right way.

Vanilla MySQL does have its limitations here: it reports only a small subset of statistics for each query. Percona Server, by comparison, can report extra statistics about the query's execution plan (whether the query cache was used, whether a filesort was needed, whether a temporary table was created in memory or on disk, whether a full scan was done, and so on) as well as InnoDB statistics (IO read operations, the number of unique pages the query accessed, how long the query waited for row locks, etc.). So Percona Server reports a plethora of useful information. Another great thing about Percona Server is the ability to enable slow query logging for all running connections, not just for new connections as in MySQL. This is very helpful for measurement, because otherwise we might not catch some long-running connections.
Now let’s get started.
Before We Start!
Before we start, make sure you have enabled slow query logging and set a low enough value for long_query_time. We normally use long_query_time=0, because any higher value, say 0.1 seconds, will miss all queries shorter than that, which may well be the majority of your workload. So if you want to analyze what causes load on your server, you are better off with a value of 0. But only keep it at 0 for a period long enough to gather sufficient statistics about the queries; after that, remember to set it back to a value greater than 0, because otherwise you can end up with a really large log file. Another thing we normally do is set log_slow_verbosity to 'full'. This variable is available in Percona Server and logs all the extra statistics mentioned in the overview above.
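As a rough sketch, the setup described above comes down to three variables (log_slow_verbosity exists only in Percona Server; check the documentation for the exact variable names in your version, and remember to restore long_query_time afterwards):

```sql
-- Log every query (threshold 0) with full per-query statistics.
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0;
SET GLOBAL log_slow_verbosity = 'full';  -- Percona Server only

-- ...gather statistics for a while, then restore a saner threshold
-- so the log file does not grow without bound:
SET GLOBAL long_query_time = 1;
```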
If you are using the vanilla MySQL server, you will see an entry like this in the slow query log:
# Time: 111229  3:02:26
# User@Host: msandbox[msandbox] @ localhost []
# Query_time: 2.365434  Lock_time: 0.000000  Rows_sent: 1  Rows_examined: 655360
use test;
SET timestamp=1325145746;
select count(*) from auto_inc;
Compare that to Percona Server with log_slow_verbosity=full:
# Time: 111229  3:11:26
# User@Host: msandbox[msandbox] @ localhost []
# Thread_id: 1  Schema: test  Last_errno: 0  Killed: 0
# Query_time: 0.117904  Lock_time: 0.002886  Rows_sent: 1  Rows_examined: 655360  Rows_affected: 0  Rows_read: 655361
# Bytes_sent: 68  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: F00
# QC_Hit: No  Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
# InnoDB_IO_r_ops: 984  InnoDB_IO_r_bytes: 16121856  InnoDB_IO_r_wait: 0.001414
# InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
# InnoDB_pages_distinct: 973
SET timestamp=1325146286;
select count(*) from auto_inc;
Note that, unlike the general query log, logging queries this way makes statistics available that are collected after the query has actually executed; no such statistics exist for queries logged via the general query log, which records them before execution.
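pt-query-digest parses these entries for you, but the attribute format itself is simple: each "#" comment line is a series of "Name: value" pairs. A toy sketch (sample values copied from the Percona Server entry above):

```python
import re

# A few attribute lines from the Percona Server slow-log entry shown above.
entry = """\
# Query_time: 0.117904  Lock_time: 0.002886  Rows_sent: 1  Rows_examined: 655360
# QC_Hit: No  Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# InnoDB_IO_r_ops: 984  InnoDB_IO_r_bytes: 16121856  InnoDB_IO_r_wait: 0.001414
"""

# Every attribute is a "Name: value" pair; pull them all into a dict.
attrs = dict(re.findall(r"(\w+): (\S+)", entry))

print(attrs["Query_time"], attrs["Rows_examined"], attrs["Full_scan"])
# prints: 0.117904 655360 Yes
```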
Installing the pt-query-digest tool (as well as the other tools from Percona Toolkit) is very easy and is explained in the Percona Toolkit installation documentation.
Before we move forward, I would like to point out that the results shown in this blog post come from queries gathered from a Percona Server instance running on my personal Amazon micro instance.
Using pt-query-digest
Using pt-query-digest is pretty straightforward:
pt-query-digest /path/to/slow-query.log
Note that executing pt-query-digest can be quite CPU- and memory-intensive, so ideally you should download the slow query log to another machine and run the tool there.
Analyzing pt-query-digest Output
Now let's see what output it returns. The first part of the output is an overall summary:
# 8.1s user time, 60ms system time, 26.23M rss, 62.49M vsz
# Current date: Thu Dec 29 07:09:32 2011
# Hostname: somehost.net
# Files: slow-query.log.1
# Overall: 20.08k total, 167 unique, 16.04 QPS, 0.01x concurrency ________
# Time range: 2011-12-28 18:42:47 to 19:03:39
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time             8s     1us    44ms   403us   541us     2ms    98us
# Lock time          968ms       0    11ms    48us   119us   134us    36us
# Rows sent        105.76k       0    1000    5.39    9.83   32.69       0
# Rows examine     539.46k       0  15.65k   27.52   34.95  312.56       0
# Rows affecte       1.34k       0      65    0.07       0    1.35       0
# Rows read        105.76k       0    1000    5.39    9.83   32.69       0
# Bytes sent        46.63M      11 191.38k   2.38k   6.63k  11.24k  202.40
# Merge passes           0       0       0       0       0       0       0
# Tmp tables         1.37k       0      61    0.07       0    0.91       0
# Tmp disk tbl         490       0      10    0.02       0    0.20       0
# Tmp tbl size      72.52M       0 496.09k   3.70k       0  34.01k       0
# Query size         3.50M      13   2.00k  182.86  346.17  154.34   84.10
# InnoDB:
# IO r bytes        96.00k       0  32.00k   20.86       0  816.04       0
# IO r ops               6       0       2    0.00       0    0.05       0
# IO r wait           64ms       0    26ms    13us       0   530us       0
# pages distin      28.96k       0      48    6.29   38.53   10.74    1.96
# queue wait             0       0       0       0       0       0       0
# rec lock wai           0       0       0       0       0       0       0
# Boolean:
# Filesort        4% yes,  95% no
# Filesort on     0% yes,  99% no
# Full scan       4% yes,  95% no
# QC Hit          0% yes,  99% no
# Tmp table       4% yes,  95% no
# Tmp table on    2% yes,  97% no
It tells you that a total of 20.08k queries were captured, which are actually invocations of 167 distinct queries. Following that are summaries of various data points, such as total and average query execution time, the number of temporary tables created in memory vs. on disk, the percentage of queries that needed a full scan, InnoDB IO statistics, and so on. One suggestion here: give more weight to the values reported in the 95% (95th percentile) column, as they give a more accurate picture than averages, which can be skewed by outliers. For example, the summary shows that at the 95th percentile a query reads approximately 38.53 distinct InnoDB pages (meaning 616.48K of data), yet 95% of the time InnoDB IO r ops is 0, which means those pages are being accessed in memory. What it does tell us, though, is that if these queries ran on a cold MySQL instance, they would require reading nearly 40 pages from disk.
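The 616.48K figure follows directly from the page count, assuming InnoDB's default 16KB page size:

```python
# 95th-percentile distinct InnoDB pages touched per query, from the
# "pages distin" row of the summary above.
pages_distinct = 38.53
PAGE_SIZE_KB = 16  # InnoDB default page size is 16KB

data_kb = pages_distinct * PAGE_SIZE_KB
print(f"{data_kb:.2f}K of data")  # 38.53 pages * 16KB = 616.48K
```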
Let's analyze the next part of the output produced by pt-query-digest.
# Profile
# Rank Query ID           Response time Calls R/Call Apdx V/M   Item
# ==== ================== ============= ===== ====== ==== ===== ==========
#    1 0x92F3B1B361FB0E5B  4.0522 50.0%   312 0.0130 1.00  0.00 SELECT wp_options
#    2 0xE71D28F50D128F0F  0.8312 10.3%  6412 0.0001 1.00  0.00 SELECT poller_output poller_item
#    3 0x211901BF2E1C351E  0.6811  8.4%  6416 0.0001 1.00  0.00 SELECT poller_time
#    4 0xA766EE8F7AB39063  0.2805  3.5%   149 0.0019 1.00  0.00 SELECT wp_terms wp_term_taxonomy wp_term_relationships
#    5 0xA3EEB63EFBA42E9B  0.1999  2.5%    51 0.0039 1.00  0.00 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary wp_pp_hits wp_posts
#    6 0x94350EA2AB8AAC34  0.1956  2.4%    89 0.0022 1.00  0.01 UPDATE wp_options
#    7 0x7AEDF19FDD3A33F1  0.1381  1.7%   909 0.0002 1.00  0.00 SELECT wp_options
#    8 0x4C16888631FD8EDB  0.1160  1.4%     5 0.0232 1.00  0.00 SELECT film
#    9 0xCFC0642B5BBD9AC7  0.0987  1.2%    50 0.0020 1.00  0.01 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary wp_pp_hits
#   10 0x88BA308B9C0EB583  0.0905  1.1%     4 0.0226 1.00  0.01 SELECT poller_item
#   11 0xD0A520C9DB2D6AC7  0.0850  1.0%   125 0.0007 1.00  0.00 SELECT wp_links wp_term_relationships wp_term_taxonomy
#   12 0x30DA85C940E0D491  0.0835  1.0%   542 0.0002 1.00  0.00 SELECT wp_posts
#   13 0x8A52FE35D340A347  0.0767  0.9%     4 0.0192 1.00  0.00 TRUNCATE TABLE poller_time
#   14 0x3E84BF7C0C2A3005  0.0624  0.8%   272 0.0002 1.00  0.00 SELECT wp_postmeta
#   15 0xA01053DA94ED829E  0.0567  0.7%   213 0.0003 1.00  0.00 SELECT data_template_rrd data_input_fields
#   16 0xBE797E1DD5E4222F  0.0524  0.6%    79 0.0007 1.00  0.00 SELECT wp_posts
#   17 0xF8EC4434E0061E89  0.0475  0.6%    62 0.0008 1.00  0.00 SELECT wp_terms wp_term_taxonomy
#   18 0xCDFFAD848B0C1D52  0.0465  0.6%     9 0.0052 1.00  0.01 SELECT wp_posts wp_term_relationships
#   19 0x5DE709416871BF99  0.0454  0.6%   260 0.0002 1.00  0.00 DELETE poller_output
#   20 0x428A588445FE580B  0.0449  0.6%   260 0.0002 1.00  0.00 INSERT poller_output
# MISC 0xMISC              0.8137 10.0%  3853 0.0002   NS   0.0 <147 ITEMS>
The above part of the output ranks the queries, showing first the ones with the largest impact: the longest total run time, which typically (though not always) identifies the queries causing the highest load on the server. As we can see, the one causing the highest load is "SELECT wp_options". This fingerprint is simply a unique way of identifying the query; it means a SELECT executed against the wp_options table. Another thing to note is the last line of the output, the # MISC part: it tells you how much of the "load" is not covered by the top queries. Here MISC accounts for only 10%, which means that by reviewing these top 20 queries we have essentially reviewed most of the load.
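Under the hood, this ranking is simply "group queries by fingerprint, sum the response time, sort descending". A toy sketch with a few numbers copied from the profile above (the R/Call values are rounded in the report, so the computed shares will not exactly match the report's percentages, which are taken out of the full 8.1s across all 167 queries):

```python
# (fingerprint, calls, response time per call in seconds) for the
# top three queries from the profile above.
profile = [
    ("SELECT poller_output poller_item", 6412, 0.0001),
    ("SELECT wp_options", 312, 0.0130),
    ("SELECT poller_time", 6416, 0.0001),
]

# Rank by total response time (calls x R/Call), descending.
ranked = sorted(profile, key=lambda q: q[1] * q[2], reverse=True)
total = sum(calls * r_call for _, calls, r_call in profile)
for name, calls, r_call in ranked:
    share = 100 * calls * r_call / total
    print(f"{name}: {calls * r_call:.4f}s ({share:.1f}%)")
```

Even though "SELECT wp_options" is called far less often than the poller queries, its high per-call cost puts it at the top of the ranking.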
Now let's take a look at the most important part of the output:
# Query 1: 0.26 QPS, 0.00x concurrency, ID 0x92F3B1B361FB0E5B at byte 14081299
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: |   _^   |
# Time range: 2011-12-28 18:42:47 to 19:03:10
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          1     312
# Exec time     50      4s     5ms    25ms    13ms    20ms     4ms    12ms
# Lock time      3    32ms    43us   163us   103us   131us    19us    98us
# Rows sent     59  62.41k     203     231  204.82  202.40    3.99  202.40
# Rows examine  13  73.63k     238     296  241.67  246.02   10.15  234.30
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read     59  62.41k     203     231  204.82  202.40    3.99  202.40
# Bytes sent    53  24.85M  46.52k  84.36k  81.56k  83.83k   7.31k  79.83k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0       0       0       0       0       0       0       0
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   0       0       0       0       0       0       0       0
# Query size     0  21.63k      71      71      71      71       0      71
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin  40  11.77k      34      44   38.62   38.53    1.87   38.53
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Full scan    100% yes,   0% no
# String:
# Databases    wp_blog_one (264/84%), wp_blog_tw… (36/11%)... 1 more
# Hosts
# InnoDB trxID 86B40B (1/0%), 86B430 (1/0%), 86B44A (1/0%)... 309 more
# Last errno   0
# Users        wp_blog_one (264/84%), wp_blog_two (36/11%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  #################
#  10ms  ################################################################
# 100ms
#   1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_options'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_options`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'\G
This is the part of the output dealing with the analysis of the query taking up the longest total run time, the query ranked #1. The first row shows the Count, the number of times this query was executed. Now look at the values in the 95% column: this query takes 20ms at the 95th percentile, sends 202 rows and 83.83k of data per execution, and examines 246 rows each time. Another important thing shown here is that every execution reads approximately 38.53 distinct InnoDB pages (meaning 616.48k of data), and you can also see that this query does a full scan every time it runs. The "Databases" section shows the names of the databases where this query was executed. Next, the "Query_time distribution" section shows how this query's execution times are distributed; the majority lie in the range >= 10ms and < 100ms. Finally, the "Tables" section lists the statements you can use to gather more data about the underlying tables involved and the query execution plan used by MySQL.
The end result might be that you limit the number of rows the query returns, either with a LIMIT clause or by filtering on the option_name column, or you might even compress the values stored in the option_value column so that less data is read and sent.
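To get a feel for the possible win from compressing option_value, you can check how well a typical value compresses. The sample value here is made up (real values would come from SELECT option_value FROM wp_options), but wp_options rows are often serialized PHP, which is highly repetitive and compresses well:

```python
import zlib

# Hypothetical serialized-PHP-style option value, repeated to mimic a
# large, repetitive wp_options row.
value = ('a:3:{s:6:"widget";a:2:{s:5:"title";s:7:"Example";s:4:"text";'
         's:5:"hello";}' * 40).encode()

compressed = zlib.compress(value)
ratio = len(compressed) / len(value)
print(len(value), "->", len(compressed), f"({ratio:.0%} of original)")
```

If the ratio is small, compressing the column (or compressing in the application before storing) directly reduces the bytes read and sent per execution.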
Let’s analyze another query, this time query ranked #4 by pt-query-digest.
# Query 4: 0.12 QPS, 0.00x concurrency, ID 0xA766EE8F7AB39063 at byte 4001761
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: |   .^_  |
# Time range: 2011-12-28 18:42:47 to 19:02:57
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          0     149
# Exec time      3   281ms   534us    17ms     2ms     4ms     2ms     1ms
# Lock time      1    14ms    56us   179us    92us   159us    29us    80us
# Rows sent      8   9.01k       0     216   61.90  202.40   59.80   24.84
# Rows examine   8  45.05k       0   1.05k  309.59 1012.63  299.07  124.25
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read      8   9.01k       0     216   61.90  202.40   59.80   24.84
# Bytes sent     1 622.17k     694  13.27k   4.18k  11.91k   3.35k   2.06k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables    10     149       1       1       1       1       0       1
# Tmp disk tbl  30     149       1       1       1       1       0       1
# Tmp tbl size  16  12.00M       0 287.72k  82.47k 270.35k  79.86k  33.17k
# Query size     1  45.68k     286     345  313.91  329.68   13.06  313.99
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin   2     683       2       7    4.58    6.98    1.29    4.96
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Filesort     100% yes,   0% no
# Tmp table    100% yes,   0% no
# Tmp table on 100% yes,   0% no
# String:
# Databases    wp_blog_one (105/70%), wp_blog_tw... (34/22%)... 1 more
# Hosts
# InnoDB trxID 86B40F (1/0%), 86B429 (1/0%), 86B434 (1/0%)... 146 more
# Last errno   0
# Users        wp_blog_one (105/70%), wp_blog_two (34/22%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us  ###################
#   1ms  ################################################################
#  10ms  #
# 100ms
#   1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_terms'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_terms`\G
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_term_taxonomy'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_term_taxonomy`\G
#    SHOW TABLE STATUS FROM `wp_blog_one ` LIKE 'wp_term_relationships'\G
#    SHOW CREATE TABLE `wp_blog_one `.`wp_term_relationships`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT t.*, tt.*, tr.object_id FROM wp_terms AS t INNER JOIN wp_term_taxonomy AS tt ON tt.term_id = t.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category', 'post_tag', 'post_format') AND tr.object_id IN (733) ORDER BY t.name ASC\G
Let's again take a look at the 95% column in the above output. The query execution time is 4ms, and the query sends 202 rows while having to examine 1012 rows per execution. What is interesting here is that this query needs to do a filesort 100% of the time and also creates an on-disk temporary table on every execution. Those are the two things you would most likely want to fix about this query. The temporary table size needed per query is 270.35k, which is not much considering that tmp_table_size is set to 32M on this server, so the on-disk tables are probably being created because the query accesses BLOB or TEXT columns, which MySQL cannot store in an in-memory temporary table. A quick fix could therefore be to select only the needed columns, excluding the BLOB/TEXT ones, instead of selecting every column from all the tables involved in the query.
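As an illustrative sketch, the wp_terms query from above could be rewritten with an explicit column list. The column names below are from a stock WordPress schema (verify against SHOW CREATE TABLE on your install); the point is to leave out wp_term_taxonomy.description, a LONGTEXT column that forces the implicit temporary table to disk:

```sql
-- Instead of t.*, tt.*: name only the columns the application needs,
-- leaving out tt.description (LONGTEXT) so the temporary table used
-- for the sort can stay in memory.
SELECT t.term_id, t.name, t.slug, t.term_group,
       tt.term_taxonomy_id, tt.taxonomy, tt.parent, tt.count,
       tr.object_id
FROM wp_terms AS t
INNER JOIN wp_term_taxonomy AS tt
        ON tt.term_id = t.term_id
INNER JOIN wp_term_relationships AS tr
        ON tr.term_taxonomy_id = tt.term_taxonomy_id
WHERE tt.taxonomy IN ('category', 'post_tag', 'post_format')
  AND tr.object_id IN (733)
ORDER BY t.name ASC;
```

After a change like this, re-run pt-query-digest and check that "Tmp table on disk" for this fingerprint has dropped to 0%.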
Conclusion
The only conclusion I can draw is: get yourself Percona Server, turn on log_slow_verbosity, and start using pt-query-digest. Your job of identifying the queries producing the most load will become that much simpler.
The post Identifying the load with the help of pt-query-digest and Percona Server appeared first on MySQL Performance Blog.