
ClickHouse Recommended Troubleshooting Metrics

Each check below lists the alert name, the shell or SQL command that detects the condition, and its severity.

Alert: ClickHouse status
Severity: Critical

```shell
curl 'http://localhost:8123/'
```

Expected output: Ok.

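Both HTTP status checks in this article compare the response body with the literal string `Ok.`. The following is a minimal POSIX shell sketch of that comparison, not part of ClickHouse or of any existing tool: `CH_URL`, `is_ok_body` and `ch_endpoint_ok` are hypothetical names, and the 5-second timeout is an assumption.

```shell
#!/bin/sh
# Hypothetical base URL of the ClickHouse HTTP interface; override via the environment.
CH_URL=${CH_URL:-http://localhost:8123}

# Succeeds (exit status 0) only when the response body is exactly "Ok.".
# Command substitution has already stripped the trailing newline.
is_ok_body() {
  [ "$1" = "Ok." ]
}

# Fetches an endpoint such as "/" or "/replicas_status" and checks its body.
# Fails if curl cannot connect or if the body is anything other than "Ok.".
# The 5-second timeout is an assumption, not a ClickHouse default.
ch_endpoint_ok() (
  body=$(curl --silent --max-time 5 "$CH_URL$1") || exit 1
  is_ok_body "$body"
)
```

For example, `ch_endpoint_ok / || echo 'CRITICAL: ClickHouse is down'` covers the server check, and `ch_endpoint_ok /replicas_status` covers the replication check. Because curl is called without `--fail`, an error response still fails the check, since its body differs from `Ok.`.
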
Alert: Too many simultaneous queries (maximum: 100 by default)
Severity: Critical

```sql
select value from system.metrics
where metric = 'Query'
```

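Since `Query` is a gauge of the queries running right now, one way to alert on it is as a share of the configured ceiling. The sketch below is illustrative only: `query_load_level` is a hypothetical name, the 80% warning level is an assumption, and the ceiling passed as the second argument should be the limit configured on your server (100 by default, as stated above).

```shell
#!/bin/sh
# Prints OK, WARNING or CRITICAL for the current number of running queries.
# $1: value of the 'Query' metric from system.metrics
# $2: configured maximum number of simultaneous queries (e.g. 100)
# Assumption: WARNING starts at 80% of the maximum, CRITICAL at 100%.
query_load_level() (
  current=$1
  max=$2
  if [ "$max" -le 0 ]; then
    echo "UNKNOWN"   # a percentage needs a positive ceiling
    exit 0
  fi
  pct=$(( current * 100 / max ))
  if [ "$pct" -ge 100 ]; then
    echo "CRITICAL"
  elif [ "$pct" -ge 80 ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
)

# Example, using the HTTP interface the same way as the commands in this article:
#   current=$(echo "select value from system.metrics where metric='Query'" \
#     | curl 'http://localhost:8123/' --silent --data-binary @-)
#   query_load_level "$current" 100
```
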
Alert: Replication status
Severity: High

```shell
curl 'http://localhost:8123/replicas_status'
```

Expected output: Ok.

Alert: Read-only replicas (also reflected by replicas_status)
Severity: High

```sql
select value from system.metrics
where metric = 'ReadonlyReplica'
```

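Several checks in this article (read-only replicas, stuck replication tasks, detached parts) reduce to "alert when a count is above a threshold, usually zero". A small helper can also catch the case where the HTTP call returned an exception message instead of a number. This sketch assumes the query output is a single non-negative integer; `check_count` and its exit codes (0 OK, 1 alert, 2 unexpected output) are hypothetical conventions, not something ClickHouse defines.

```shell
#!/bin/sh
# Prints one status line for a count-style check.
# $1: check name; $2: value returned by the query; $3: threshold (default 0).
# Exit status: 0 = OK, 1 = value above threshold, 2 = non-numeric response
# (for example an exception message returned instead of a number).
check_count() (
  name=$1
  value=$2
  threshold=${3:-0}
  case $value in
    ''|*[!0-9]*)
      echo "ERROR $name: unexpected response '$value'"
      exit 2
      ;;
  esac
  if [ "$value" -gt "$threshold" ]; then
    echo "ALERT $name: $value"
    exit 1
  fi
  echo "OK $name: $value"
)

# Example:
#   v=$(echo "select value from system.metrics where metric='ReadonlyReplica'" \
#     | curl 'http://localhost:8123/' --silent --data-binary @-)
#   check_count ReadonlyReplica "$v"
```
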
Alert: Some replication tasks are stuck
Severity: High

```sql
select count()
from system.replication_queue
where num_tries > 100 or num_postponed > 1000
```

Alert: ZooKeeper is available
Severity: Critical for writes

```sql
select count() from system.zookeeper
where path = '/'
```

Alert: ZooKeeper exceptions
Severity: Medium

```sql
select value from system.events
where event = 'ZooKeeperHardwareExceptions'
```

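`system.events` counters such as `ZooKeeperHardwareExceptions` are cumulative since server start, so an alert should fire on their increase between two polls rather than on the raw value. An event that has not occurred yet may return no row at all, so the sketch treats empty input as 0. `counter_delta` is a hypothetical helper, and treating a decrease as a restart (counters start again from zero) is an assumption. The same approach fits the other `system.events` checks in this article, such as `ReplicatedDataLoss`, `RejectedInserts` and `DataAfterMergeDiffersFromReplica`.

```shell
#!/bin/sh
# Prints how much a cumulative system.events counter grew between two polls.
# $1: previous sample; $2: current sample. Empty input (no row yet) counts as 0.
counter_delta() (
  prev=${1:-0}
  curr=${2:-0}
  if [ "$curr" -lt "$prev" ]; then
    # Counter went backwards: assume the server restarted and counted from zero.
    echo "$curr"
  else
    echo $(( curr - prev ))
  fi
)

# Example with a hypothetical state file holding the previous sample:
#   state=/tmp/zk_hw_exceptions.last
#   curr=$(echo "select value from system.events where event='ZooKeeperHardwareExceptions'" \
#     | curl 'http://localhost:8123/' --silent --data-binary @-)
#   prev=$(cat "$state" 2>/dev/null)
#   delta=$(counter_delta "$prev" "$curr")
#   echo "${curr:-0}" > "$state"
#   [ "$delta" -gt 0 ] && echo "ALERT: $delta new ZooKeeper hardware exceptions"
```
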
Alert: Other CH nodes are available
Severity: High

```shell
for node in $(echo "select distinct host_address from system.clusters where host_name != 'localhost'" \
    | curl 'http://localhost:8123/' --silent --data-binary @-); do
  curl "http://$node:8123/" --silent
done | sort -u
```

Expected output: Ok.

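In the node-availability command, `sort -u` collapses identical lines, and a node that does not answer prints nothing; as long as one node returns `Ok.`, the combined output can still read `Ok.` while another node is down. A per-node variant makes each failure visible. This is a sketch: `node_status` and `report_nodes` are hypothetical names, port 8123 is taken from the original command, and the 5-second timeout is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: prints the body returned by one node's HTTP interface
# (empty if the node cannot be reached within the assumed 5-second timeout).
node_status() {
  curl --silent --max-time 5 "http://$1:8123/"
}

# Reads host addresses (one per line) on stdin and reports each node separately,
# so an unreachable node shows up instead of vanishing from `sort -u` output.
# Exit status is 1 if any node did not answer "Ok.".
report_nodes() (
  failed=0
  while IFS= read -r node; do
    [ -n "$node" ] || continue
    if [ "$(node_status "$node")" = "Ok." ]; then
      echo "$node: Ok."
    else
      echo "$node: FAILED"
      failed=1
    fi
  done
  exit "$failed"
)

# Example, reusing the node query from the command above:
#   echo "select distinct host_address from system.clusters where host_name != 'localhost'" \
#     | curl 'http://localhost:8123/' --silent --data-binary @- \
#     | report_nodes
```
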
Alert: All CH clusters are available (that is, every configured cluster has enough replicas to serve queries)
Severity: Critical

```shell
for cluster in $(echo "select distinct cluster from system.clusters where host_name != 'localhost'" \
    | curl 'http://localhost:8123/' --silent --data-binary @-); do
  clickhouse-client --query="select '$cluster', 'OK' from cluster('$cluster', system, one)"
done
```

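Alert: There are files in 'detached' folders
Severity: Medium

```shell
find /var/lib/clickhouse/data/*/*/detached/* -type d | wc -l
```

On ClickHouse 19.8 and later, the same check is available through SQL:

```sql
select count() from system.detached_parts
```
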
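Alert: Too many parts (the number of parts is growing, inserts are being delayed, or inserts are being rejected)
Severity: Critical

```sql
-- Largest number of parts in any single partition
select value from system.asynchronous_metrics
where metric = 'MaxPartCountForPartition';

-- Delayed inserts: cumulative count (events) and currently delayed (metrics)
select value from system.events where event = 'DelayedInserts';
select value from system.metrics where metric = 'DelayedInserts';

-- Rejected inserts: cumulative count
select value from system.events where event = 'RejectedInserts';
```
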
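The too-many-parts symptoms map to two MergeTree settings: an INSERT is slowed down when the number of active parts in a single partition exceeds `parts_to_delay_insert`, and rejected with a "Too many parts" error when it exceeds `parts_to_throw_insert`. The sketch below compares `MaxPartCountForPartition` with those two values, which you pass in after reading them from `system.merge_tree_settings`. It is a simplification: `parts_level` is a hypothetical name, table-level overrides of these settings are ignored, whole-number input is assumed, and the exact boundary comparison used by the server may differ between versions.

```shell
#!/bin/sh
# Classifies MaxPartCountForPartition against the MergeTree insert thresholds.
# $1: MaxPartCountForPartition; $2: parts_to_delay_insert; $3: parts_to_throw_insert.
# Read the thresholds from your own server, for example:
#   select name, value from system.merge_tree_settings
#   where name in ('parts_to_delay_insert', 'parts_to_throw_insert')
parts_level() (
  parts=$1
  delay=$2
  throw=$3
  if [ "$parts" -gt "$throw" ]; then
    echo "CRITICAL: above parts_to_throw_insert"
  elif [ "$parts" -gt "$delay" ]; then
    echo "WARNING: above parts_to_delay_insert"
  else
    echo "OK"
  fi
)
```
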
Alert: Dictionaries: exception
Severity: Medium

```sql
select concat(name, ': ', last_exception)
from system.dictionaries
where last_exception != ''
```

Alert: ClickHouse has been restarted

Either of the following queries returns the server uptime:

```sql
select uptime();

select value from system.asynchronous_metrics
where metric = 'Uptime'
```

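The restart check can be automated by remembering the previous uptime: uptime only grows while the process runs, so a smaller value at the next poll means the server restarted in between. When there is no previous sample, the sketch falls back to flagging an uptime shorter than one polling interval. `restarted` is a hypothetical helper, the 60-second default interval is an assumption, and it expects the whole-second value returned by `select uptime()`.

```shell
#!/bin/sh
# Exit status 0 means "the server restarted since the last poll".
# $1: previous uptime in seconds (empty on the first poll)
# $2: current uptime in seconds
# $3: polling interval in seconds, used only when there is no previous sample
restarted() (
  prev=$1
  curr=$2
  interval=${3:-60}   # assumed default polling interval
  if [ -z "$prev" ]; then
    # No history: a process younger than one interval started since the last poll.
    [ "$curr" -lt "$interval" ]
  else
    # Uptime never decreases while the same process keeps running.
    [ "$curr" -lt "$prev" ]
  fi
)
```
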
Alert: DistributedFilesToInsert should not be always increasing
Severity: Medium

```sql
select value from system.metrics
where metric = 'DistributedFilesToInsert'
```

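"Always increasing" can be checked over a window of successive samples of `DistributedFilesToInsert`: if every sample is larger than the one before it, the queue of files waiting to be sent to remote shards never drained during the window. The window length and sampling interval are left to the caller, and `always_increasing` is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds when every sample is strictly greater than the previous one.
# Arguments: successive values of DistributedFilesToInsert, oldest first.
always_increasing() (
  [ "$#" -ge 2 ] || exit 1   # at least two samples are needed to see a trend
  prev=$1
  shift
  for v in "$@"; do
    [ "$v" -gt "$prev" ] || exit 1
    prev=$v
  done
  exit 0
)

# Example: always_increasing 120 180 260 340 && echo "ALERT: Distributed queue keeps growing"
```
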
Alert: A data part was lost
Severity: High

```sql
select value from system.events
where event = 'ReplicatedDataLoss'
```

Alert: Data parts are not the same on different replicas
Severity: Medium

```sql
select value from system.events where event = 'DataAfterMergeDiffersFromReplica';
select value from system.events where event = 'DataAfterMutationDiffersFromReplica';
```


The following queries are recommended for inclusion in monitoring:

  • SELECT * FROM system.replicas: the state of replicated tables on the local server. For more information, see the System Tables section of the ClickHouse documentation.
  • SELECT * FROM system.merges: the speed and progress of merges that are currently running.
  • SELECT * FROM system.mutations ORDER BY create_time DESC: mutations, newest first, with their progress.

ChistaDATA Corporation is not affiliated with ClickHouse Corporation.


ChistaDATA Inc. Knowledge base is licensed under the Apache License, Version 2.0 (the “License”)

Copyright 2022 ChistaDATA Inc

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.