Introduction to Materialized Views In ClickHouse

What is Materialized View | What should it mean to us?

While normal views only store the queries and fetch the data from the related tables according to the specified criteria when requested, materialized views also keep the data returned as a result of these queries as well as queries. So it acts as a kind of table. The purpose of using it is; to prepare only the data of the tables with huge data that meet the specified criteria and present them quickly. The materialized view stores the SQL data like a table in the database. The answer to the question, What is ClickHouse Materialized View is actually the advantages it offers us. Using Materialized Views gives us 2 advantages;

  • Duplicate data is created by copying table data. For example, instead of going to remote servers and querying a table, we can kind of copy a copy of the table to our own database with Materialized Views. This materialized view detects changes such as update-insert-delete in the table or view it is a copy of and updates itself at certain time intervals or after certain database operations. In this way, a copy of the table’s data on that remote server can always be kept up-to-date as mv. Therefore, we can now use our Materialized View, which is fed from the remote server from time to time, up-to-date on our own server. At the same time, the processing load on the main server is reduced.
  • Of course, it provides an amazing performance. For example, consider a view that uses data from several tables; it will take a lot of effort to fetch data from several tables. But if we create the expression, we are querying this view as Materialized View. It will use the data copied into the MV. So it will not go and fetch the data from its own tables.

Difference between Views and Materialized Views in SQL

As we know, the main constituent of any database is its table. To make data accessibility custom, there is a concept of Views. In other words, we can say that with the help of Views of a table, we can restrict any user to access only that data which is supposed to be accessed by him. Now, based on the characteristics and features of the views, we can distinguish between Views and Materialized Views.

Following are the important differences between Views and Materialized Views.

Sr. No. Key Views Materialized Views
1 Definition Technically View of a table is a logical virtual copy of the table created by “select query” but the result is not stored anywhere in the disk, and every time we need to fire the query when we need data, so always we get updated or latest data from original tables. Materialized Views are also the logical virtual copy of data driven by the select query, but the result of the query will get stored in the table or disk.
2 Storage In Views, the resulting tuples of the query expression are not getting stored on the disk. Only the query expression is stored on the disk. Materialized Views, both query expression and resulting tuples of the query get stored on the disk.
3 Query Execution As mentioned above, in the case of Views, the query expression is stored on the disk and not its result, so the query expression gets executed whenever a user tries to fetch data from it so that the user will get the latest updated value every time. Materialized Views the result of the query is getting stored on the disk, and hence the query expression does not get executed whenever a user tries to fetch the data, so the user will not get the latest updated value if it gets changed in the database.
4 Cost Effective As Views does not have any storage cost associated with it, so it also does not have any update cost associated with it. Materialized Views does have a storage cost associated with it, so it also has an updated cost associated with it.
5 Design Views in SQL are designed with a fixed architecture approach, due to which there is an SQL standard for defining a view. Materialized Views in SQL are designed with a generic architecture approach, so there is no SQL standard for defining them, and their functionality is provided by some database systems as an extension.
6 Usage Views are generally used when data is to be accessed infrequently, and data in the table get updated frequently. Materialized Views are used when data is to be accessed frequently, and data in the table does not get updated frequently.

Example 1

Following SQL statement returns result in 1.818 seconds.

SELECT pickup_location_id,
       sum(passenger_count) / pc_cnt AS pc_avg,
       count() AS pc_cnt
FROM tripdata
GROUP BY pickup_location_id
ORDER BY pc_cnt DESC
LIMIT 10

Now let’s create MATERIALIZED VIEW

CREATE MATERIALIZED VIEW tripdata_smt_mv
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(pickup_date)
ORDER BY (pickup_location_id, dropoff_location_id)
POPULATE
AS SELECT
       pickup_date,
       pickup_location_id,
       dropoff_location_id,
       sum(passenger_count) AS passenger_count_sum,
       count() AS trips_count
FROM tripdata
GROUP BY 
       pickup_date,
       pickup_location_id,
       dropoff_location_id

Then rerun the SELECT statement.

SELECT pickup_location_id, 
       sum(passenger_count) / pc_cnt AS pc_avg, 
       count() AS pc_cnt
FROM tripdata 
GROUP BY pickup_location_id 
ORDER BY pc_cnt DESC 
LIMIT 10

Result;

Elapsed Time: 0.016 seconds.

A materialized view is 113 times faster.

Example 2

Now we will give an example of another materialized view on aggregate functions.

Following select statement will return 1 row in 4.640 seconds.

SELECT min(fare_amount),
       avg(fare_amount),
       max(fare_amount),
       sum(fare_amount),
       count
FROM tripdata
WHERE (fare_amount > 0) AND (fare_amount < 500.)

Now let’s create a materialized view.

CREATE MATERIALIZED VEW tripdata_agg_mv
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(pickup_date)
ORDER BY (pickup_location_id, dropoff_location_id)
POPULATE AS SELECT
              pickup_date,
              pickup_location_id,
              dropoff_location_id,
              minState(fare_amount) AS fare_amount_min,
              avgState(fare_amount) AS fare_amount_avg,
              maxState(fare_amount) AS fare_amount_max,
              sumState(fare_amount) AS fare_amount_sum,
              countState() AS fare_amount_count
FROM tripdata
WHERE (fare_amount > 0 ) AND (fare_amount < 500.)
GROUP BY pickup_date, pickup_location_id, dropoff_location_id

Run the following SQL command. It will take only 0.017 seconds. That means materialized view is 274 times faster.

SELECT minMerge(fare_amount_min) AS fare_amount_min,
       avgMerge(fare_amount_avg) AS fare_amount_avg,
       sumMerge(fare_amount_sum) AS fare_amount_sum,
       countMerge(fare_amount_count) AS fare_amount_count
FROM tripdata_agg_mv

Materialized View Operations

Drop View

DROP TABLE tripdata_agg_mv

Update View

ALTER TABLE tripdata_agg_mv
      ADD COLUMN customer_id UInit32 AFTER sku,
      MODIFY ORDER BY ORDER BY (sku, hour, customer_id)

Recreate View

CREATE MATERIALIZED VIEW sales_amount_mv TO sales_amount_agg
AS SELECT 
      toStartOfHour(datetime) AS hour,
      sumState(amount) AS amount_sum,
      avgState(amount) AS amount_avg,
      sku, customer_id
FROM sales
GROUP BY hour, sku, customer_id

Conclusion

Materialized views, unlike normal views, keep your data in a table along with your queries. And this table is updated as you take action. Using a materialized view saves you a lot of time in your operations.

For more information, please visit the official ClickHouse docs here.

To learn more about Materialized Views in ClickHouse, read the following articles:

About Can Sayn 36 Articles
Can Sayın is experienced Database Administrator in open source relational and NoSql databases, working in complicated infrastructures. Over 5 years industry experience, he gain managing database systems. He is working at ChistaDATA Inc. His areas of interest are generally on open source systems.
Contact: Website