How to calculate the ratio of empty/zero values in every column in a table
Learn how to calculate the ratio of empty or zero values in every column of a ClickHouse table to optimize sparse column serialization.
How to calculate the ratio of empty/zero values in every column in a table
If a column is sparse (empty or contains mostly zeros), ClickHouse can encode it in a sparse format and automatically optimize calculations - the data does not require full decompression during queries. In fact, if you know how sparse a column is, you can define its ratio using the ratio_of_defaults_for_sparse_serialization setting to optimize serialization.
This handy query can take a while, but it analyzes every row in your table and determines the ratio of values that are zero (or the default) in every column in the specified table:
SELECT *
APPLY x -> (x = defaultValueOfArgumentType(x)) APPLY avg APPLY x -> round(x, 3)
FROM table_name
FORMAT Vertical
For example, we ran this query above on the environmental sensors dataset table named sensors which has over 20B rows and 19 columns:
SELECT *
APPLY x -> (x = defaultValueOfArgumentType(x)) APPLY avg APPLY x -> round(x, 3)
FROM sensors
FORMAT Vertical