What is Materialized View | What should it mean to us?
While normal views only store the queries and fetch the data from the related tables according to the specified criteria when requested, materialized views also keep the data returned as a result of these queries as well as queries. So it acts as a kind of table. The purpose of using it is; to prepare only the data of the tables with huge data that meet the specified criteria and present them quickly. The materialized view stores the SQL data like a table in the database. The answer to the question, What is ClickHouse Materialized View is actually the advantages it offers us. Using Materialized Views gives us 2 advantages;
- Duplicate data is created by copying table data. For example, instead of going to remote servers and querying a table, we can kind of copy a copy of the table to our own database with Materialized Views. This materialized view detects changes such as update-insert-delete in the table or view it is a copy of and updates itself at certain time intervals or after certain database operations. In this way, a copy of the table’s data on that remote server can always be kept up-to-date as mv. Therefore, we can now use our Materialized View, which is fed from the remote server from time to time, up-to-date on our own server. At the same time, the processing load on the main server is reduced.
- Of course, it provides an amazing performance. For example, consider a view that uses data from several tables; it will take a lot of effort to fetch data from several tables. But if we create the expression, we are querying this view as Materialized View. It will use the data copied into the MV. So it will not go and fetch the data from its own tables.
Difference between Views and Materialized Views in SQL
As we know, the main constituent of any database is its table. To make data accessibility custom, there is a concept of Views. In other words, we can say that with the help of Views of a table, we can restrict any user to access only that data which is supposed to be accessed by him. Now, based on the characteristics and features of the views, we can distinguish between Views and Materialized Views.
Following are the important differences between Views and Materialized Views.
Sr. No. | Key | Views | Materialized Views |
1 | Definition | Technically View of a table is a logical virtual copy of the table created by “select query” but the result is not stored anywhere in the disk, and every time we need to fire the query when we need data, so always we get updated or latest data from original tables. | Materialized Views are also the logical virtual copy of data driven by the select query, but the result of the query will get stored in the table or disk. |
2 | Storage | In Views, the resulting tuples of the query expression are not getting stored on the disk. Only the query expression is stored on the disk. | Materialized Views, both query expression and resulting tuples of the query get stored on the disk. |
3 | Query Execution | As mentioned above, in the case of Views, the query expression is stored on the disk and not its result, so the query expression gets executed whenever a user tries to fetch data from it so that the user will get the latest updated value every time. | Materialized Views the result of the query is getting stored on the disk, and hence the query expression does not get executed whenever a user tries to fetch the data, so the user will not get the latest updated value if it gets changed in the database. |
4 | Cost Effective | As Views does not have any storage cost associated with it, so it also does not have any update cost associated with it. | Materialized Views does have a storage cost associated with it, so it also has an updated cost associated with it. |
5 | Design | Views in SQL are designed with a fixed architecture approach, due to which there is an SQL standard for defining a view. | Materialized Views in SQL are designed with a generic architecture approach, so there is no SQL standard for defining them, and their functionality is provided by some database systems as an extension. |
6 | Usage | Views are generally used when data is to be accessed infrequently, and data in the table get updated frequently. | Materialized Views are used when data is to be accessed frequently, and data in the table does not get updated frequently. |
Example 1
Following SQL statement returns result in 1.818 seconds.
SELECT pickup_location_id, sum(passenger_count) / pc_cnt AS pc_avg, count() AS pc_cnt FROM tripdata GROUP BY pickup_location_id ORDER BY pc_cnt DESC LIMIT 10
Now let’s create MATERIALIZED VIEW
CREATE MATERIALIZED VIEW tripdata_smt_mv ENGINE = SummingMergeTree PARTITION BY toYYYYMM(pickup_date) ORDER BY (pickup_location_id, dropoff_location_id) POPULATE AS SELECT pickup_date, pickup_location_id, dropoff_location_id, sum(passenger_count) AS passenger_count_sum, count() AS trips_count FROM tripdata GROUP BY pickup_date, pickup_location_id, dropoff_location_id
Then rerun the SELECT statement.
SELECT pickup_location_id, sum(passenger_count) / pc_cnt AS pc_avg, count() AS pc_cnt FROM tripdata GROUP BY pickup_location_id ORDER BY pc_cnt DESC LIMIT 10
Result;
Elapsed Time: 0.016 seconds.
A materialized view is 113 times faster.
Example 2
Now we will give an example of another materialized view on aggregate functions.
Following select statement will return 1 row in 4.640 seconds.
SELECT min(fare_amount), avg(fare_amount), max(fare_amount), sum(fare_amount), count FROM tripdata WHERE (fare_amount > 0) AND (fare_amount < 500.)
Now let’s create a materialized view.
CREATE MATERIALIZED VEW tripdata_agg_mv ENGINE = SummingMergeTree PARTITION BY toYYYYMM(pickup_date) ORDER BY (pickup_location_id, dropoff_location_id) POPULATE AS SELECT pickup_date, pickup_location_id, dropoff_location_id, minState(fare_amount) AS fare_amount_min, avgState(fare_amount) AS fare_amount_avg, maxState(fare_amount) AS fare_amount_max, sumState(fare_amount) AS fare_amount_sum, countState() AS fare_amount_count FROM tripdata WHERE (fare_amount > 0 ) AND (fare_amount < 500.) GROUP BY pickup_date, pickup_location_id, dropoff_location_id
Run the following SQL command. It will take only 0.017 seconds. That means materialized view is 274 times faster.
SELECT minMerge(fare_amount_min) AS fare_amount_min, avgMerge(fare_amount_avg) AS fare_amount_avg, sumMerge(fare_amount_sum) AS fare_amount_sum, countMerge(fare_amount_count) AS fare_amount_count FROM tripdata_agg_mv
Materialized View Operations
Drop View
DROP TABLE tripdata_agg_mv
Update View
ALTER TABLE tripdata_agg_mv ADD COLUMN customer_id UInit32 AFTER sku, MODIFY ORDER BY ORDER BY (sku, hour, customer_id)
Recreate View
CREATE MATERIALIZED VIEW sales_amount_mv TO sales_amount_agg AS SELECT toStartOfHour(datetime) AS hour, sumState(amount) AS amount_sum, avgState(amount) AS amount_avg, sku, customer_id FROM sales GROUP BY hour, sku, customer_id
Conclusion
Materialized views, unlike normal views, keep your data in a table along with your queries. And this table is updated as you take action. Using a materialized view saves you a lot of time in your operations.
For more information, please visit the official ClickHouse docs here.
To learn more about Materialized Views in ClickHouse, read the following articles: