How to Reduce Query Memory Usage in ClickHouse: Troubleshooting Memory Limit Exceeded Errors



Memory limit exceeded errors are among the most common challenges ClickHouse users face when running complex analytical queries. Whether you’re performing large aggregations, executing heavy joins, or processing massive datasets, understanding how to manage and optimize memory usage is crucial for maintaining query performance and system stability.

This comprehensive guide will walk you through proven strategies to troubleshoot and resolve memory limit exceeded errors in ClickHouse, covering configuration settings, external aggregation techniques, memory-efficient functions, and profiling tools.

Understanding Memory Limit Exceeded Errors

When ClickHouse encounters a memory limit exceeded error, it means a query has attempted to consume more RAM than allocated. These errors typically occur during:

  • Large aggregations with high cardinality GROUP BY operations
  • Complex joins involving multiple large tables
  • Sorting operations on substantial datasets
  • Subqueries that materialize large intermediate results
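
In client output or the server log, such a failure typically surfaces as error code 241 (the sizes below are illustrative):

Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 9.39 GiB
(attempt to allocate chunk of 4194304 bytes), maximum: 9.31 GiB. (MEMORY_LIMIT_EXCEEDED)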

The key to resolving these issues lies in understanding ClickHouse’s memory management settings and implementing appropriate optimization strategies.

Configuring max_memory_usage

The max_memory_usage setting is your first line of defense against runaway queries. This parameter defines the maximum amount of RAM a single query can consume on a server.

Setting Memory Limits

You can configure max_memory_usage at multiple levels:

Session Level:

SET max_memory_usage = 10000000000; -- 10 GB

User Level:

-- Set in users.xml or via SQL
ALTER USER clickhouse_user SETTINGS max_memory_usage = 10000000000;
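
For the configuration-file route, the limit lives in a settings profile in users.xml. A minimal sketch (the profile name and value here are illustrative):

<clickhouse>
    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
        </default>
    </profiles>
</clickhouse>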

Query Level:

SELECT * FROM large_table
SETTINGS max_memory_usage = 5000000000;

Best Practices for max_memory_usage

  • Start conservative: Begin with lower limits and increase as needed
  • Monitor system resources: Ensure total memory allocation across concurrent queries doesn’t exceed available RAM
  • Consider query patterns: Adjust limits based on typical query complexity
  • Use constraints: Implement minimum and maximum bounds to prevent misconfiguration (example below)
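
A hedged sketch of such constraints via a settings profile (the profile name and bounds are illustrative):

-- Users assigned this profile cannot set max_memory_usage outside the MIN/MAX bounds
CREATE SETTINGS PROFILE IF NOT EXISTS bounded_memory
SETTINGS max_memory_usage = 10000000000 MIN 1000000000 MAX 20000000000;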

The max_memory_usage_for_user setting provides an additional layer of control by limiting total memory consumption across all queries from a specific user.
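
A brief illustration combining both limits (the user name and values are hypothetical):

-- Cap a single query at 10 GB, and all concurrent queries from this user at 30 GB
ALTER USER reporting_user SETTINGS
    max_memory_usage = 10000000000,
    max_memory_usage_for_user = 30000000000;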

Leveraging External Aggregation

When queries require more memory than available RAM, external aggregation allows ClickHouse to spill data to disk, preventing memory limit errors while maintaining query functionality.

Configuring max_bytes_before_external_group_by

The max_bytes_before_external_group_by setting determines when GROUP BY operations should start writing temporary data to the filesystem instead of keeping everything in memory. The ClickHouse documentation suggests setting it to roughly half of max_memory_usage, so spilling begins well before the hard limit is hit.

SET max_bytes_before_external_group_by = 20000000000; -- 20 GB

When this threshold is reached, ClickHouse automatically:

  1. Flushes aggregation data to temporary disk storage
  2. Continues processing with reduced memory footprint
  3. Merges results from disk and memory at query completion
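
A hedged sketch of pairing the two settings (values are illustrative, following the half-the-limit guidance above):

SET max_memory_usage = 40000000000;                   -- hard per-query limit (40 GB)
SET max_bytes_before_external_group_by = 20000000000; -- begin spilling at ~half the limit

SELECT user_id, count() AS events
FROM large_table
GROUP BY user_id;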

External Sorting Configuration

Similarly, for sorting operations:

SET max_bytes_before_external_sort = 20000000000;

Performance Considerations

While external aggregation prevents memory errors, it comes with trade-offs:

  • Slower query execution: Disk I/O is significantly slower than RAM operations
  • Increased disk usage: Ensure adequate temporary storage space
  • I/O contention: Multiple concurrent queries using external aggregation can strain disk subsystems

Optimization tip: Use external aggregation as a safety mechanism rather than a primary strategy. Optimize queries and data structures first to minimize memory requirements.

Distributed Aggregation Memory Efficiency

For distributed queries, enable memory-efficient aggregation:

SET distributed_aggregation_memory_efficient = 1;

This setting reduces memory consumption during distributed GROUP BY operations by processing data in smaller chunks.

Memory-Efficient Aggregate Functions

Choosing the right aggregate functions can dramatically reduce memory consumption, especially when dealing with high-cardinality data.

uniqCombined: The Memory-Efficient Choice

The uniqCombined function is specifically designed for memory-efficient distinct counting:

SELECT uniqCombined(user_id) FROM events;

How uniqCombined works:

  • Small sets: Uses an array for exact counting
  • Medium sets: Transitions to a hash table
  • Large sets: Switches to HyperLogLog algorithm with fixed memory usage

Advantages over uniq():

  • Consumes several times less memory
  • Provides higher accuracy for large datasets
  • Bounded memory footprint once it switches to HyperLogLog, regardless of cardinality
  • Better suited for distributed aggregations

Memory Usage Comparison

-- Memory-intensive approach
SELECT uniq(high_cardinality_column) FROM large_table;

-- Memory-efficient approach
SELECT uniqCombined(high_cardinality_column) FROM large_table;

The uniqCombined function maintains consistent memory usage even with billions of unique values, making it ideal for large-scale analytics.
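
To verify the difference on your own data, you can compare the peak memory each variant recorded in system.query_log (a sketch; the ILIKE filter pattern is illustrative):

SELECT
    query,
    formatReadableSize(memory_usage) AS peak_memory
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%high_cardinality_column%'
ORDER BY event_time DESC
LIMIT 10;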

Other Memory-Efficient Functions

Consider these alternatives for common operations:

  • uniqCombined64: Like uniqCombined but using a 64-bit hash for all types, reducing error at very high cardinalities
  • quantileTDigest: Memory-efficient percentile calculations
  • sumMap: Efficient key-value aggregations
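
Brief hedged examples of the latter two (table and column names are hypothetical):

-- Approximate 95th percentile with a fixed-size t-digest sketch
SELECT quantileTDigest(0.95)(response_time_ms) FROM requests;

-- Aggregate parallel key/value arrays into a single map-like result
SELECT sumMap(metric_keys, metric_values) FROM metrics;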

Query Profiling to Identify Memory-Hungry Operations

Effective troubleshooting requires identifying which queries and operations consume excessive memory. ClickHouse provides robust profiling tools for this purpose.

Using system.query_log

The system.query_log table tracks detailed metrics for all executed queries:

SELECT
    query,
    formatReadableSize(memory_usage) AS peak_memory, -- in query_log, memory_usage records the query's peak
    query_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY memory_usage DESC
LIMIT 10;

This query identifies the top 10 memory-intensive queries, helping you prioritize optimization efforts.

Monitoring Active Queries

Track currently running queries and their memory consumption:

SELECT
    query_id,
    user,
    memory_usage,
    peak_memory_usage,
    elapsed
FROM system.processes
ORDER BY memory_usage DESC;
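
If a runaway query turns up, it can be cancelled by its query_id (the id below is illustrative):

KILL QUERY WHERE query_id = 'my-query-id';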

User-Level Memory Tracking

Monitor memory usage by user:

SELECT
    user,
    memory_usage,
    peak_memory_usage
FROM system.user_processes
ORDER BY memory_usage DESC;

Sampling Query Profiler

Enable the sampling profiler for detailed execution analysis:

SET query_profiler_real_time_period_ns = 1000000; -- sample wall-clock time every 1 ms
SET query_profiler_cpu_time_period_ns = 1000000;  -- sample CPU time every 1 ms

This provides insights into which code routines consume the most resources during query execution.
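
The collected samples land in system.trace_log and can be symbolized with the introspection functions (a sketch; this assumes allow_introspection_functions is permitted and debug symbols are installed):

SET allow_introspection_functions = 1;

SELECT
    count() AS samples,
    arrayStringConcat(
        arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n'
    ) AS stack
FROM system.trace_log
WHERE trace_type = 'Real'
GROUP BY trace
ORDER BY samples DESC
LIMIT 5;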

Allocation Profiling

For deep memory analysis, ClickHouse's built-in memory profiler records allocation stack traces to system.trace_log (jemalloc-based heap profiling is also available at the server level):

SET memory_profiler_step = 4194304; -- collect a stack trace each time query memory grows by another 4 MiB

This tracks memory allocations and helps identify memory leaks or unexpected allocation patterns.
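
The recorded allocation samples can then be inspected in system.trace_log (a hedged sketch; for memory trace types, size holds the amount allocated in bytes):

SELECT
    event_time,
    formatReadableSize(size) AS allocation
FROM system.trace_log
WHERE trace_type = 'Memory'
ORDER BY event_time DESC
LIMIT 10;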

Advanced Memory Management Techniques

Memory Overcommit

Configure memory overcommit to handle temporary memory spikes:

SET memory_overcommit_ratio_denominator = 1073741824; -- 1 GiB

When the memory limit is reached under overcommit, ClickHouse selects the query with the highest overcommit ratio for termination, protecting overall system stability.

Query Optimization Strategies

Beyond configuration, optimize queries to reduce inherent memory requirements:

  1. Filter early: Apply WHERE clauses before aggregations
  2. Limit cardinality: Use LIMIT clauses when appropriate
  3. Mind JOIN order: Put the smaller table on the right-hand side, since ClickHouse builds its in-memory hash table from the right table
  4. Use appropriate data types: Avoid Nullable columns when possible
  5. Leverage materialized views: Pre-aggregate data for common queries (see the sketch after this list)
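
As an illustration of the last point, a materialized view can pre-aggregate distinct counts using the -State/-Merge combinators (the schema here is hypothetical):

-- Maintain compact per-day uniqCombined states as data arrives
CREATE MATERIALIZED VIEW daily_uniques
ENGINE = AggregatingMergeTree()
ORDER BY day AS
SELECT
    toDate(event_time) AS day,
    uniqCombinedState(user_id) AS uniques
FROM events
GROUP BY day;

-- Query the pre-aggregated states instead of scanning the raw table
SELECT day, uniqCombinedMerge(uniques) AS unique_users
FROM daily_uniques
GROUP BY day;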

Debugging Memory Issues

When troubleshooting persistent memory problems:

  1. Identify problematic queries using query_log
  2. Analyze query execution plans with EXPLAIN (a short example follows this list)
  3. Review table schemas for optimization opportunities
  4. Monitor system resources during peak usage
  5. Test configuration changes in non-production environments
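
For example, EXPLAIN PIPELINE exposes the processing stages, including the aggregation transforms, that a query will run through (table and column names are illustrative):

EXPLAIN PIPELINE
SELECT user_id, count()
FROM events
GROUP BY user_id;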

Best Practices Summary

To effectively manage memory usage in ClickHouse:

  • Set appropriate limits: Configure max_memory_usage based on available resources and query patterns
  • Enable external aggregation: Use max_bytes_before_external_group_by as a safety net for large operations
  • Choose efficient functions: Prefer uniqCombined and similar memory-optimized functions
  • Profile regularly: Monitor query performance and memory consumption proactively
  • Optimize queries: Focus on query structure and data modeling before relying solely on configuration
  • Plan capacity: Ensure adequate RAM and disk space for your workload
  • Test thoroughly: Validate configuration changes under realistic load conditions

Conclusion

Memory limit exceeded errors don’t have to be a roadblock in your ClickHouse journey. By understanding and properly configuring memory settings, leveraging external aggregation, using memory-efficient functions, and implementing comprehensive profiling, you can build robust analytical queries that scale with your data.

Remember that memory optimization is an iterative process. Start with conservative settings, monitor performance, and adjust based on your specific workload characteristics. With these tools and techniques, you’ll be well-equipped to handle even the most demanding analytical queries while maintaining system stability and performance.

We are an full-stack ClickHouse infrastructure operations Consulting, Support and Managed Services provider with core expertise in performance, scalability and data SRE. Based out of California, Our consulting and support engineering team operates out of San Francisco, Vancouver, London, Germany, Russia, Ukraine, Australia, Singapore and India to deliver 24*7 enterprise-class consultative support and managed services. We operate very closely with some of the largest and planet-scale internet properties like PayPal, Garmin, Honda cars IoT project, Viacom, National Geographic, Nike, Morgan Stanley, American Express Travel, VISA, Netflix, PRADA, Blue Dart, Carlsberg, Sony, Unilever etc