MySQL::Log::ParseFilter - Parse and filter MySQL slow, general and binary logs
use MySQL::Log::ParseFilter;
# Parse all unique queries from logs given on command line
%params = (logs => \@ARGV,
queries => \%queries);
parse_slow_logs(%params);
parse_general_logs(%params);
parse_binary_logs(%params);
calc_final_values(%params);
MySQL::Log::ParseFilter is a Perl module for parsing and filtering MySQL slow, general and binary logs. MySQL::Log::ParseFilter also parses and filters user-defined logs: logs with variable headers and SQL statement meta-properties.
Each MySQL log is formatted differently and poses many problems to accurate parsing. From the range of MySQL server versions to the vast extent of SQL syntax, parsing a MySQL log file is rarely a trivial task if done well.
MySQL::Log::ParseFilter handles all the heavy log chopping, hacking and filtering, allowing a script to simply extract the data that it wants.
The following functions are exported by default:
set_meta_filter($filter)$filter. $filter is a scalar containing a single
string of meta-property filter conditions. See Meta-Property.
Returns nothing.
set_statement_filter($filter)$filter. $filter is a scalar containing a single
string of allowed or disallowed SQL statement types. See SQL Statement.
Returns nothing.
set_grep($pattern)$pattern is a scalar containing a single
Perl regex pattern without m// or similar. For example: "^SELECT foo FROM (?:this|that)".
Returns nothing.
parse_binary_logs(%params)%params may contain the following key/values:
logs => ref to array having log file names to parse (REQUIRED)
queries => ref to hash in which to save unique queries (REQUIRED)
all_queries => ref to array in which to save all queries (optional)
NOTE: MySQL binary logs are, as the name suggests, binary--they are not
text files. It is necessary to first ``decode'' a binary log with the
MySQL-provided program mysqlbinlog. The log files given to
parse_binary_logs() must be the text output from mysqlbinlog ran first
on the binary log files (without the --short-form option).
Returns nothing.
parse_general_logs(%params)%params may contain the following key/values:
logs => ref to array having log file names to parse (REQUIRED)
queries => ref to hash in which to save unique queries (REQUIRED)
all_queries => ref to array in which to save all queries (optional)
users => ref to hash in which to save unique users (optional)
Returns nothing.
parse_slow_logs(%params)%params may contain the following key/values:
logs => ref to array having log file names to parse (REQUIRED)
queries => ref to hash in which to save unique queries (REQUIRED)
all_queries => ref to array in which to save all queries (optional)
users => ref to hash in which to save unique users (optional)
microslow => 0 = regular slow log, 1 = microslow log (optional)
0 is default for microslow.
Returns nothing.
set_udl_format($format_file)$format_file. $format_file is a
scalar containing a single file name. See USER-DEFINED LOGS.
This function should be called before calling parse_udl_logs().
Returns nothing.
parse_udl_logs(%params)%params may contain the following key/values:
logs => ref to array having log file names to parse (REQUIRED)
queries => ref to hash in which to save unique queries (REQUIRED)
all_queries => ref to array in which to save all queries (optional)
users => ref to hash in which to save unique users (optional)
Returns nothing.
See USER-DEFINED LOGS.
calc_final_values(%params, $grand_totals)This function calculates: number of unique queries, c_sum_p, averages, grand total sums, percent true for true/false (or yes/no) meta-properties, per-meta-property percentages of grand total sums, per-user percentages of all users, and number of unique users.
%params is the same hashed passed earlier to one of the parse_ functions.
$grand_totals is a reference to a hash in which to save the grand total sums.
Pass 0 or undef if you do not want grand total sums.
Returns total number of queries.
apply_final_meta_filters(%params, $total_queries)calc_final_values().
%params is the same hashed passed earlier to calc_final_values()
or to one of the parse_ functions if you did not calculate final
values for some reason.
$total_queries is a reference to a scalar having the total number of queries.
Usually, this is obtained from the return value of calc_final_values().
$total_queries will be adjusted to account for queries which were removed
by a filter. Pass 0 or undef if you do not want an adjusted total queries.
Returns total number of queries removed.
Every SQL statement has many meta-properties. These are values about the SQL statement such as its execution time, how many rows it examines, the MySQL connection ID it is associated with, etc. It is by these values that a MySQL log is filtered (and usually sorted).
Every type of MySQL log provides different meta-properties. The list of all meta-properties is very long so only the basics are given here. For the full list visit http://hackmysql.com/mysqlsla_filters.
But first, it is important to understand the naming scheme that MySQL::Log::ParseFilter uses for meta-properties.
Meta-properties are either numeric or string. For strings, the naming scheme
does not change: db is always just db. For numeric values, however,
several additional meta-properties are created and identified by consistent
extensions to the base meta-property name.
Take for example t from slow logs. In addition to this base
meta-property, MySQL::Log::ParseFilter also creates: t_min,
t_max, t_avg, t_sum and t_sum_p (unless set_save_meta_values()
was disabled; see OPTIONS).
These additional meta-properties are identified by their extensions: _min,
_max, _avg, _sum, _sum_p. These extensions are consistent and
form the naming scheme for most numeric meta-properties. (cid is a
notable exception.) They tell you as well as MySQL::Log::ParseFilter what
the additional meta-property value represents: the minimum, maximum, average
and sum value of their base meta-property.
_sum_p means percentage that the base meta-property constitutes of the
grand total sum for all those base meta-properties (if grand total sum were
calculated when calling calc_final_values()).
There is another extension for true/false (yes/no) meta-properties: _t
and _t_p. Currently, this type of meta-property is only found in microslow logs.
This naming scheme is very important when working with user-defined logs because it allows you to know in advance the names the of additional meta-properties that MySQL::Log::ParseFilter will create from the given bases meta-properties.
c_sum : Total number of times SQL statement appears in log
host : Host name of MySQL connection
ip : IP address of MySQL connection
l : Time spent acquiring lock
l_sum : Total time spent acquiring lock
re : Number of rows examined
rs : Number of rows sent
t : Execution time
t_sum : Total execution time
user : User of MySQL connection
c_sum : Total number of times SQL statement appears in log
cid : Connection ID of MySQL connection
host : Host name of MySQL connection
user : User of MySQL connection
c_sum : Total number of times SQL statement appears in log
cid : Connection ID of MySQL connection
ext : Execution time
err : Error code (if any) caused by SQL statement
sid : Server ID of MySQL server
All filters are inclusive: every condition for every filter must pass for the statement to be saved.
The format of $filter when calling set_meta_filter($filter) is
[CONDITION],[CONDITION]... where each [CONDITION] is [meta][op][val].
[meta] is a meta-property name (listed above or from the full list
at http://hackmysql.com/mysqlsla_filters). [op] is either >, <, or =.
And [val] is the value against which [meta] from the log must
pass according to [op]. [val] is numeric or string according to
[meta]. For string values, only = is valid for [op].
The format of $filter when calling set_statement_filter($filter) is
[+-][TYPE],[TYPE]....
[+-] is give only once at the first start of the filter string. + means that
the filter is positive: allow only the given [TYPE]s. - means that the filter
is negative: remove the given [TYPE]s. [TYPE] is a SQL statement type:
SELECT, UPDATE, INSERT, DO, SET, CREATE, DROP, ALTER, etc.
# Parse general logs given on command line extracting only SELECT queries
# using database foo and calculate grand total sums
my %queries;
my %grand_totals;
my %params = (
logs => \@ARGV,
queries => \%queries,
);
set_meta_filter("db=foo");
set_statement_filter("+SELECT");
parse_general_logs(%params);
calc_final_values(%params, \%grand_totals);
# Parse slow logs given on command line removing SET statements and
# extracting only queries which took longer than 5 seconds to execute
my %queries;
my %params = (
logs => \@ARGV,
queries => \%queries,
);
set_meta_filter("t>5");
set_statement_filter("-SET");
parse_slow_logs(%params);
calc_final_values(%params, 0);
# Parse output files from mysqlbinlog given on command line extracting
# only INSERT and UPDATE queries which account for more than 75% of all
# INSERT and UPDATE queries extracted
my %queries;
my %grand_totals;
my %params = (
logs => \@ARGV,
queries => \%queries,
);
set_meta_filter("c_sum_p>75");
set_statement_filter("+INSERT,UPDATE");
parse_binary_logs(%params);
calc_final_values(%params, \%grand_totals);
apply_final_meta_filters(%params, 0);
#!/usr/bin/perl -w
use strict;
use MySQL::Log::ParseFilter;
my %queries;
if(@ARGV != 2) {
print "dump_type dumps a unique sample of all statements of TYPE from general LOG.\n";
print "Usage: dump_type TYPE LOG\n";
exit;
}
set_statement_filter("+$ARGV[0]");
parse_general_logs( (logs => [ $ARGV[1] ], queries => \%queries) );
foreach(keys %queries) { print "$queries{$_}->{sample}\n"; }
exit;
MySQL::Log::ParseFilter has six functions to set special options which can be
imported with the :options tag (use MySQL::Log::ParseFilter qw(:DEFAULT :options)).
set_save_meta_values($val)sample,
db, cid.
Any meta-property value not check in apply_final_meta_filters()
can still be used (t, l, host, cid, etc.)
set_save_all_values($val)meta_all). Default 0 (disabled). Can be set to 1 (enabled). This does not
affect user-defined logs which has a seperate mechanism for saving all values
(type nf).
At present, enabling this option causes the following all values to be saved:
for microslow (msl) patched slow log with InnoDB values: iorops_all,
iorbytes_all, iorwait_all, reclwait_all, qwait_all, pages_all;
for binary logs: ext_all.
set_IN_abstraction($val)This is an experimental option. Normally, all IN clauses are condensed from
IN (N, N, N) to IN (N3). This option furthers this abstraction by grouping
the condensed IN clauses in groups of $val where $val is the ``dividing line.''
Example: with $val=10 IN (N3) becomes IN (N0-9). Therefore, any IN clause
with 0 to 9 values will be condensed and then further abstracted to IN (N0-9).
Likewise, any IN clauses with 10 to 19 values will be condensed and further
abstracted to IN (10-19).
set_VALUES_abstraction($val)This is an experimental option. Normally, all VALUES clauses are condensed from
VALUES (NULL, 'foo'), (NULL, 'bar') to VALUES (NULL, 'S')2. This option
furthers this abstractiong by removing that number of condensed value sets: 2.
Example: two queries INSERT INTO table VALUES ('S') and
INSERT INTO table VALUES ('S'), ('S'), ('S') are first condensed to
INSERT ... VALUES ('S')1 and INSERT ... VALUES ('S')3 then further abstracted
to one single query: INSERT INTO table VALUES ('S').
set_atomic_statements($val)This is an experimental option. Normally, each statement in a multi-statement group is filtered individually: only those which fail a filter are removed and those which pass are kept. With this option enabled, if any one statement in a group fails, the entire group of statements is removed.
This option does not apply to general logs because general logs never group statements.
set_db_inheritance($val)Normally, the log must explicitly specify the database for each statement. Or, in the case of general logs, the current database is tracked by other means. Sometimes, however, logs only specify the database explicitly once. If this option is enabled, all statements following an explicit database specification inherit that database.
MySQL::Log::ParseFilter can parse user-defined logs which have variable headers and meta-property values. Accomplishing this is not a trivial task. Therefore the subject is not covered here but at http://hackmysql.com/udl.
The following four functions can be imported with the :hacks tag.
get_meta_filter()meta => [ op, value ].
get_statement_filter()type => 0. Also has pos_neg => 1 (positive) or 0 (negative).
passes_meta_filter($meta, $val, $type)$meta is a meta-property name. $val is the log value. $type is
'n' (numeric) or 's' (string). Returns 1 on pass, 0 on fail.
passes_statement_filter($type)$type is a SQL statement type (SELECT, CREATE, DROP, etc.), case
insensitive. Returns 1 on pass, 0 on fail.
Calling MySQL::Log::ParseFilter::set_debug(1) will enable debugging and cause
MySQL::Log::ParseFilter to print a flood of debugging information to STDOUT.
This may be necessary if you feel a function is not working correctly because,
although they do not return errors, they print debugging messages.
There are no known bugs. Please contact me if you find one. Expect that I will ask for at least a portion of your log because that makes finding and fixing the bug easier.
Daniel Nichter <perl@hackmysql.com>
v1.00
Copyright 2008 Daniel Nichter
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.