PREWHERE vs WHERE in ClickHouse Queries

PREWHERE vs WHERE: When and How to Optimize Your ClickHouse Queries


Introduction

Query performance in ClickHouse often hinges on how efficiently you filter data. While most SQL users are familiar with the WHERE clause, ClickHouse offers a powerful optimization feature called PREWHERE that can dramatically reduce I/O and accelerate query execution. Understanding when and how to use PREWHERE versus WHERE is essential for database architects and performance engineers working with large-scale analytical workloads.


What Is PREWHERE and Why Does It Matter?

PREWHERE is a query execution optimization unique to ClickHouse that reduces I/O by filtering data before reading all columns.In columnar databases like ClickHouse, reading unnecessary columns is expensive—PREWHERE solves this by applying filters early in the query pipeline.

How PREWHERE Works

When you use PREWHERE, ClickHouse follows this execution pattern:

  1. Read only the columns needed for the PREWHERE condition
  2. Apply the filter to eliminate rows
  3. Read remaining columns only for rows that passed the filter

This approach is particularly effective when:

  • You’re selecting many columns but filtering on just a few
  • Your filter condition has high selectivity (eliminates most rows)
  • The filtered columns are smaller than the columns you’re selecting

WHERE Clause: The Standard Approach

The traditional WHERE clause reads all columns specified in the SELECT statement before applying filters. For queries that select many columns but filter on few, this results in unnecessary I/O that slows performance.


Automatic PREWHERE Optimization

One of ClickHouse’s most powerful features is its ability to automatically apply PREWHERE optimization without explicit user intervention.

When ClickHouse Applies PREWHERE Automatically

ClickHouse’s query optimizer automatically converts WHERE conditions to PREWHERE when:

  • The query targets MergeTree family tables
  • The filter condition operates on a minority of columns
  • The optimization is likely to reduce I/O significantly

This automatic behavior is controlled by the optimize_move_to_prewhere setting, which is enabled by default.

-- This setting controls automatic PREWHERE optimization
SET optimize_move_to_prewhere = 1;  -- Enabled (default)

Example of Automatic Optimization

-- You write this query with WHERE
SELECT user_id, name, email, address, phone, metadata
FROM users
WHERE country = 'US' AND age > 25;

-- ClickHouse automatically optimizes it to behave like PREWHERE
-- Reading country and age columns first, filtering, then reading other columns

When to Use PREWHERE Manually

While automatic optimization works well in most cases, there are scenarios where explicitly using PREWHERE gives you more control and guaranteed behavior.

Ideal Use Cases for Manual PREWHERE

1. Strong Filtration on Few Columns

When you know your filter will eliminate most rows and operates on a small subset of columns:

-- Explicit PREWHERE for guaranteed optimization
SELECT *
FROM events
PREWHERE event_type = 'purchase' AND timestamp > '2025-01-01'
WHERE user_id IN (SELECT user_id FROM premium_users);

2. Large Column Selection with Selective Filters

When selecting many or all columns but filtering on specific, highly selective conditions:

-- Reading all columns but filtering on just one
SELECT *
FROM logs
PREWHERE status_code >= 500;

3. Combining PREWHERE and WHERE

Use PREWHERE for the most selective filters and WHERE for additional conditions:

SELECT user_id, session_data, full_profile
FROM user_sessions
PREWHERE country_code = 'US'  -- High selectivity, small column
WHERE session_duration > 300;  -- Additional filter after initial reduction

When NOT to Use PREWHERE

Avoid PREWHERE when:

  • Filtering on columns that are part of the sorting key (use WHERE instead)
  • The filter condition is not selective (eliminates few rows)
  • You’re already filtering on most or all columns in the table
  • Working with non-MergeTree table engines

Measuring PREWHERE Performance Impact

Understanding the actual performance benefit requires measurement. Here’s how to evaluate PREWHERE effectiveness:

1. Use EXPLAIN to Analyze Query Plans

EXPLAIN indexes = 1
SELECT *
FROM large_table
PREWHERE category = 'electronics'
WHERE price > 100;

This shows how ClickHouse processes the query and which indexes it uses.

2. Compare Query Execution Metrics

Query the system.query_log table to compare performance:

-- Check read bytes and execution time
SELECT 
    query,
    read_bytes,
    read_rows,
    query_duration_ms
FROM system.query_log
WHERE query LIKE '%your_table%'
ORDER BY event_time DESC
LIMIT 10;

3. Monitor I/O Reduction

The key metric is bytes read. A successful PREWHERE optimization should show:

  • Significantly fewer read_bytes compared to equivalent WHERE queries
  • Lower query_duration_ms for queries selecting many columns

Practical Performance Example

-- Without PREWHERE (or automatic optimization disabled)
SELECT * FROM events WHERE user_id = 12345;
-- Reads: 10 GB, Duration: 5.2s

-- With PREWHERE
SELECT * FROM events PREWHERE user_id = 12345;
-- Reads: 500 MB, Duration: 0.3s

In this example, PREWHERE reduced I/O by 95% and improved query speed by 17x.


Best Practices for PREWHERE Optimization

1. Filter on Non-Key Columns First

Apply PREWHERE to columns not in the sorting key, then use WHERE for key columns:

SELECT *
FROM orders
PREWHERE status = 'completed'  -- Not in sorting key
WHERE order_date >= '2025-01-01';  -- In sorting key

2. Leverage Automatic Optimization

Trust ClickHouse’s automatic PREWHERE for most queries, but use explicit PREWHERE when you need guaranteed behavior or have domain knowledge about data distribution.

3. Test with Real Data

Always benchmark queries with production-like data volumes. PREWHERE benefits scale with:

  • Table size
  • Number of columns selected
  • Filter selectivity

4. Combine with Other Optimizations

PREWHERE works best alongside other ClickHouse optimizations:

  • Proper sorting keys
  • Appropriate partitioning
  • Compression codecs
  • Materialized views for common filters

5. Monitor Settings

Verify automatic optimization is enabled:

SELECT name, value 
FROM system.settings 
WHERE name = 'optimize_move_to_prewhere';

Common Pitfalls to Avoid

1. Overusing PREWHERE

Don’t force PREWHERE on every query. Let automatic optimization handle most cases.

2. PREWHERE with FINAL

When using FINAL with ReplacingMergeTree, PREWHERE optimization may not work as expected if the WHERE condition doesn’t contain key columns.

3. Ignoring Column Order

Place the most selective filters in PREWHERE, less selective ones in WHERE.


Conclusion

PREWHERE is a powerful optimization that can dramatically improve ClickHouse query performance by reducing unnecessary I/O. While ClickHouse automatically applies this optimization in most cases, understanding when to use it explicitly—and how to measure its impact—empowers you to fine-tune queries for maximum efficiency.

Key Takeaways:

  • PREWHERE filters data before reading all columns, reducing I/O
  • ClickHouse automatically applies PREWHERE optimization by default
  • Use explicit PREWHERE for strong filtration on minority columns
  • Measure impact through system.query_log metrics and EXPLAIN plans
  • Combine PREWHERE with proper indexing and partitioning for best results

By mastering PREWHERE versus WHERE, you’ll unlock significant performance gains in your ClickHouse analytical workloads—especially for queries that select many columns but filter on few.


Ready to optimize your ClickHouse queries? Start by analyzing your most expensive queries in system.query_log and experiment with explicit PREWHERE on high-volume tables!

Further Reading

You might also like:

About ChistaDATA Inc. 178 Articles
We are an full-stack ClickHouse infrastructure operations Consulting, Support and Managed Services provider with core expertise in performance, scalability and data SRE. Based out of California, Our consulting and support engineering team operates out of San Francisco, Vancouver, London, Germany, Russia, Ukraine, Australia, Singapore and India to deliver 24*7 enterprise-class consultative support and managed services. We operate very closely with some of the largest and planet-scale internet properties like PayPal, Garmin, Honda cars IoT project, Viacom, National Geographic, Nike, Morgan Stanley, American Express Travel, VISA, Netflix, PRADA, Blue Dart, Carlsberg, Sony, Unilever etc