In our previous KB, we mentioned the functions. You can view the tables and data from earlier KBs. We’ll attempt to go through count functions today and compare them.
10.000.000 records were used in this test. Uniq values are available in the columns.
DISTINCT
SELECT countDistinct(id, address, value) FROM exp_test ┌─uniqExact(id, address, value)─┐ │ 3876385 │ └───────────────────────────────┘ 1 row in set. Elapsed: 0.748 sec. Processed 10.00 million rows, 120.00 MB (13.36 million rows/s., 160.37 MB/s.)
uniqExact
SELECT uniqExact(*) FROM exp_test ┌─uniqExact(id, address, value)─┐ │ 3876385 │ └───────────────────────────────┘ 1 row in set. Elapsed: 0.712 sec. Processed 10.00 million rows, 120.00 MB (14.04 million rows/s., 168.45 MB/s.)
uniqCombined64
SELECT uniqCombined64(id, address, value) FROM exp_test ┌─uniqCombined64(id, address, value)─┐ │ 3871184 │ └────────────────────────────────────┘ 1 row in set. Elapsed: 0.184 sec. Processed 10.00 million rows, 120.00 MB (54.33 million rows/s., 651.96 MB/s.)
uniq
SELECT uniq(*) FROM exp_test ┌─uniq(id, address, value)─┐ │ 3864706 │ └──────────────────────────┘ 1 row in set. Elapsed: 0.114 sec. Processed 10.00 million rows, 120.00 MB (87.71 million rows/s., 1.05 GB/s.)
Similar findings were achieved when we ran this test with various count numbers. These findings lead us to the conclusion that the DISTINCT function equals uniqExact. For more imprecise results, use the uniqExact function; for results where speed is critical, use the uniq function on large datasets.