ChistaDATA Anansi Query Profiler

Introduction

Top-n queries extract the top or bottom n rows from a result set. In other words, they identify the best or worst examples, such as the top 10 places in a particular area or the 5 worst-performing retailers. The most common use of this type of query is in business intelligence, where it’s important to monitor the performance of specific organisations in order to identify opportunities to increase revenue or reduce costs. This is where ChistaDATA Anansi comes in.
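As a quick illustration of the pattern, the query below finds the 5 worst-performing retailers; the retailers table and its columns are hypothetical and serve only to show the shape of a top-n query.

select retailer_name, sum(revenue) as total_revenue
from retailers                        -- hypothetical table
group by retailer_name
order by total_revenue asc            -- ascending puts the worst performers first
limit 5;                              -- keep only the bottom 5 rows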

ChistaDATA Anansi is a log analysis tool for ClickHouse and PostgreSQL, designed as a query profiler. The reports it generates provide information on a variety of query characteristics, including execution time, memory usage, bytes read, and other elementary but essential details.

Setting Up ClickHouse Logging

Anansi is fully compatible with ClickHouse, and we can use this effective tool by running it directly; there is no need to set any parameters. Full documentation is available on the ChistaDATA Anansi GitHub page.
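As a minimal sketch, an invocation against ClickHouse might look like the following; the log path is an assumption based on a typical package installation, so adjust it to your setup.

./anansi-profiler /var/log/clickhouse-server/clickhouse-server.log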

Setting Up PostgreSQL Logging

To use Anansi effectively, logging in PostgreSQL should be set up to give Anansi as much information as possible. Unlike some similar tools, Anansi does not affect the performance of your database.

alter system set log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h'; -- if log_destination = 'stderr'
alter system set log_min_duration_statement = 0; -- set to 0 if we need to catch all the queries
select pg_reload_conf(); -- once the above two parameters have been set, this reloads the configuration so they take effect without a restart
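To confirm that the new values are active, you can check them from the same session:

show log_line_prefix;             -- should now return %t [%p]: user=%u,db=%d,app=%a,client=%h
show log_min_duration_statement;  -- should now return 0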

In addition, the following configuration options, which come with PostgreSQL by default, should be used. Note that the Anansi tool currently only supports the stderr format in PostgreSQL, but we will continue to add support for other log formats.

#log_destination = 'stderr'             # Valid values are combinations of
                                        # stderr, csvlog, jsonlog, syslog, and
                                        # eventlog, depending on platform.
                                        # csvlog and jsonlog require
                                        # logging_collector to be on.

# This is used when logging to stderr:
#logging_collector = off                # Enable capturing of stderr, jsonlog,
                                        # and csvlog into log files. Required
                                        # to be on for csvlogs and jsonlogs.
                                        # (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'log'                  # directory where log files are written,
                                        # can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'        # log file name pattern,
                                        # can include strftime() escapes

Having done that, we can move on to creating and checking our queries.

Queries

The Queries section is often the most interesting part of the output, as it provides details of the different types of queries generated by applications and users. With more knowledge of the database churn caused by a user or application, the database system can be better tuned in the future.

The output shows the many types of queries, including INSERT, UPDATE, DELETE, SELECT, etc., and all of these query types are collected by default. If you want to discard a query type, you can use the --discard-queries option. Database-specific queries are displayed in the output accordingly.

Generating Reports

The Anansi tool is run from the command line to generate reports from the logs. There are many options, but the simplest configuration is to specify the log file, the output as a txt file, and the log line prefix used by the database.

./anansi-profiler /var/log/postgresql/postgresql-15-main.log --database-type=postgres --log-prefix="%t [%p]: user=%u,db=%d,app=%a,client=%h" --discard-queries="delete" -n 10

We specified --database-type as postgres because that is what we are using now. We also caught the top 10 queries with the -n parameter. By default, Anansi catches all DML/DDL queries, but in this scenario we don’t want to include delete operations for now, so --discard-queries handles that. The --log-prefix parameter allows the Anansi profiler to correctly identify and parse the relevant information in the log file.

Output

In the output file, you will be able to see the number of times each query was run, the memory used, and where the query came from. You will also see queries of the same format grouped together, as illustrated below; don’t worry, you’ll still easily catch the query that breaks the trend in terms of execution time.
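To see what grouping by query format means, consider the two hypothetical statements below; the table name is invented for illustration, and the exact placeholder notation Anansi prints may differ.

select * from price_paid where id = 42;
select * from price_paid where id = 7;
-- both would be reported as a single grouped query, e.g. select * from price_paid where id = ?;

Here is an example of the Anansi output: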

# Query 1 : 0.003 QPS
# Time range: From 2023-04-17 10:09:22 +0000 UTC to 2023-04-17 10:09:22 +0000 UTC
# ====================================================================
# Attribute      total     min     max     avg     95%  stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Count           1.00 
# Exec time      5.73s   5.73s   5.73s   5.73s   5.73s   0.00s   5.73s
# Peak Memory            0.00B   0.00B   0.00B   0.00B   0.00B   0.00B
# ====================================================================
# Databases    postgres (1/1)  
# Hosts        [local] (1/1)  
# Users        postgres (1/1)  
# Completion   1/1
# Query_time distribution
# ====================================================================
#   1us  
#  10us  
# 100us  
#   1ms  
#  10ms  
# 100ms  
#    1s  ############################################################
#  10s+  
# ====================================================================
# Query
insert into land_registry_price_paid_uk select * from land_registry_price_paid_uk  limit 5000000	;

Conclusion

While we’ve focused on finding the worst performers with Anansi in ClickHouse and PostgreSQL environments, also note that there is an S3 connection option available. Stay tuned: ChistaDATA will soon add support for MySQL databases and for log formats other than stderr.

