1. Home
  2. Knowledge Base
  3. ClickHouse DBA
  4. How to Diff Two Tables in ClickHouse?
  1. Home
  2. Knowledge Base
  3. ClickHouse Support
  4. How to Diff Two Tables in ClickHouse?

How to Diff Two Tables in ClickHouse?

Assume we have two tables with identical amounts of data, each with roughly n million rows for each. If we want to compare the data row by row, we can utilize with sipHash64(*) hash function.

What is sipHash64?

This hashing operation uses cryptography. Compared to the MD5 function, it operates at least three times faster. The function converts each input parameter to a string and calculates its hash value. Then employs the following procedure to merge hashes:

The function obtains the array of hashes after hashing each supplied argument.
The function creates a hash for the array of the first and second components.
The function then produces a hash for the array comprising the hash value computed in the preceding step and the third member of the initial hash array.
The previous phase is repeated for each of the initial hash array’s remaining components.

Let’s learn more about the cars table and one of the tables and compare the two tables!

SELECT * FROM cars

┌─year─┬─brand───┬─gearbox─┬─km_info─┐
│ 2008 │ Audi    │ Auto    │  130000 │
│ 2016 │ VW      │ Manual  │   85000 │
│ 2019 │ BMW     │ Auto    │   60000 │
│ 2020 │ Peugeot │ Auto    │   30000 │
└──────┴─────────┴─────────┴─────────┘

Let’s learn more about the vehicle table, which can be compared to the cars table.

SELECT * FROM vehicle

┌─year─┬─type──┬─brand_name─┬─route─┬─duration─┬─fuel_type─┬─gear─┬─breakdown─┐
│ 2008 │ car   │ Audi       │ West  │ 1 Week   │ Gas       │ Auto │ Yes       │
│ 2010 │ car   │ Mercedes   │ South │ 30 Day   │ Gas       │ Auto │ No        │
│ 2019 │ car   │ BMW        │ South │ 1 Week   │ Diesel    │ Auto │ No        │
│ 2019 │ plane │ Airbus     │ North │ 1 Day    │ Avgas     │ Auto │ No        │
│ 2021 │ plane │ Airbus     │ East  │ 2 Day    │ Avgas     │ Auto │ No        │
└──────┴───────┴────────────┴───────┴──────────┴───────────┴──────┴───────────┘

Well, two tables have some mutual columns, and we want to compare the data in these mutual columns.

This query gives us the difference between cars and vehicle tables according to the data hashes.

SELECT * FROM cars
WHERE sipHash64(year, brand, gearbox) NOT IN (
    SELECT sipHash64(year, brand_name, gear)
    FROM vehicle)

┌─year─┬─brand───┬─gearbox─┬─km_info─┐
│ 2016 │ VW      │ Manual  │   85000 │
│ 2020 │ Peugeot │ Auto    │   30000 │
└──────┴─────────┴─────────┴─────────┘

Also, this query gives us the difference between vehicle and cars tables according to the data hashes.

SELECT * FROM vehicle
WHERE sipHash64(year, brand_name, gear) NOT IN (
    SELECT sipHash64(year, brand, gearbox)
    FROM cars)

┌─year─┬─type──┬─brand_name─┬─route─┬─duration─┬─fuel_type─┬─gear─┬─breakdown─┐
│ 2010 │ car   │ Mercedes   │ South │ 30 Day   │ Gas       │ Auto │ No        │
│ 2019 │ plane │ Airbus     │ North │ 1 Day    │ Avgas     │ Auto │ No        │
│ 2021 │ plane │ Airbus     │ East  │ 2 Day    │ Avgas     │ Auto │ No        │
└──────┴───────┴────────────┴───────┴──────────┴───────────┴──────┴───────────┘

 

Was this article helpful?

Related Articles

CHISTADATA IS COMMITTED TO OPEN SOURCE SOFTWARE AND BUILDING HIGH PERFORMANCE COLUMNSTORES

In the spirit of freedom, independence and innovation. ChistaDATA Corporation is not affiliated with ClickHouse Corporation 

Need Support?

Can't find the answer you're looking for?
Contact Support

ChistaDATA Inc. Knowledge base is licensed under the Apache License, Version 2.0 (the “License”)

Copyright 2022 ChistaDATA Inc

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.