ClickHouse doesn't have a PIVOT clause, but we can get close to this functionality using aggregate function combinators. Let's see how to do this using the UK housing prices dataset.
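Before turning to the dataset, it helps to see the -Map combinator on its own. The following is a minimal sketch that calls sumMap on a single map; the map literal is an assumed illustrative value.

```sql
-- Assumed illustrative input: one map with a single key.
-- With only one row to aggregate, sumMap has nothing to add together.
SELECT sumMap(map('ClickHouse', 3));
```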
This isn't a particularly interesting example as it returns the same map that we passed in.
Let's now call sumMap over multiple rows of maps:
WITH values AS (
SELECT map('ClickHouse', 3) AS value
UNION ALL
SELECT map('ClickBench', 2, 'ClickHouse', 4) AS value
)
SELECT sumMap(value)
FROM values;
The key ClickHouse appears in both rows, so its values are summed (3 + 4 = 7). The key ClickBench appears in only one row, so its sum is just that single value, 2.
We can also use maxMap to find the maximum values per key:
WITH values AS (
SELECT map('ClickHouse', 3) AS value
UNION ALL
SELECT map('ClickBench', 2, 'ClickHouse', 4) AS value
)
SELECT maxMap(value)
FROM values;
Or we can use avgMap to find the average value per key:
WITH values AS (
SELECT map('ClickHouse', 3) AS value
UNION ALL
SELECT map('ClickBench', 2, 'ClickHouse', 4) AS value
)
SELECT avgMap(value)
FROM values;
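Several -Map aggregates can also be computed side by side in one query. The sketch below reuses the same two rows and adds minMap, which the -Map combinator should provide in the same way as maxMap. The column aliases are illustrative names, not part of the original examples.

```sql
WITH values AS (
    SELECT map('ClickHouse', 3) AS value
    UNION ALL
    SELECT map('ClickBench', 2, 'ClickHouse', 4) AS value
)
SELECT
    sumMap(value) AS totals,   -- per-key sum
    minMap(value) AS minimums, -- per-key minimum
    maxMap(value) AS maximums  -- per-key maximum
FROM values;
```

Each result column is itself a map, so every key ends up with one sum, one minimum, and one maximum.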
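Let's work out the median price per decade for each county in the dataset. Note that the year alias below holds the first year of each decade (for example, 2010 for any sale in the 2010s):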
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT
county,
medianMap(map(year, price)) AS medianPrices
FROM uk.uk_price_paid
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
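To see how the decade key is derived, the bucketing expression can be tried in isolation. This is a small sketch using hypothetical sample dates (they are not taken from the dataset) and an illustrative alias, decade.

```sql
-- Hypothetical sample dates, used only to show the bucketing.
SELECT
    d,
    -- Round each date down to the start of its 10-year interval,
    -- then keep only the year of that start date.
    year(toStartOfInterval(d, toIntervalYear(10))) AS decade
FROM
(
    SELECT arrayJoin([toDate('2012-03-15'), toDate('2019-12-31'), toDate('2021-01-01')]) AS d
);
```

Judging by the 2010 and 2020 keys in the results shown later, the first two dates should land in the 2010 bucket and the third in the 2020 bucket.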
We can filter the results to only include data from 2010 onward:
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT
county,
medianMap(map(year, price)) AS medianPrices
FROM uk.uk_price_paid
WHERE year >= 2010
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
If we also want the maximum price per decade, we can add the maxMap function that we saw earlier:
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT
county,
medianMap(map(year, price)) AS medianPrices,
maxMap(map(year, price)) AS maxPrices
FROM uk.uk_price_paid
WHERE year >= 2010
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
Alternatively, we can compute the average price using avgMap.
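A sketch of that query, obtained by swapping the maxMap column of the previous query for an avgMap column (the avgPrices alias is an assumption):

```sql
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT
    county,
    medianMap(map(year, price)) AS medianPrices,
    avgMap(map(year, price)) AS avgPrices -- per-decade mean, returned unrounded
FROM uk.uk_price_paid
WHERE year >= 2010
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
```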
The averages come back with many decimal places, which we can clean up by using the mapApply function to call the floor function on each value in the map:
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT
county,
medianMap(map(year, price)) AS medianPrices,
mapApply((k, v) -> (k, floor(v)), avgMap(map(year, price))) AS avgPrices
FROM uk.uk_price_paid
WHERE year >= 2010
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
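The mapApply call is easier to follow on a small literal map. The sketch below uses assumed illustrative keys and values. The lambda receives each key and value and returns a new (key, value) pair, so the keys pass through unchanged while floor drops the fractional part of each value.

```sql
-- Assumed illustrative values, not taken from the dataset.
SELECT mapApply((k, v) -> (k, floor(v)), map('a', 1.75, 'b', 2.5)) AS floored;
```

Swapping floor for another rounding function such as round would follow the same pattern.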
Flexible grouping: counties, districts, and postcodes
Let's try to group by some different fields.
This time we're going to compute the median price per decade grouped by county and district:
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT
county,
district,
medianMap(map(year, price)) AS medianPrices
FROM uk.uk_price_paid
WHERE year >= 2010
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
    ┌─county─────────────┬─district───────────────┬─medianPrices────────────────┐
 1. │ GREATER LONDON     │ CROYDON                │ {2010:298475,2020:400000}   │
 2. │ GREATER LONDON     │ CITY OF WESTMINSTER    │ {2010:800000,2020:935000}   │
 3. │ GREATER LONDON     │ SOUTHWARK              │ {2010:437000,2020:540000}   │
 4. │ TYNE AND WEAR      │ NEWCASTLE UPON TYNE    │ {2010:144000,2020:162500}   │
 5. │ WEST MIDLANDS      │ WALSALL                │ {2010:137450,2020:162000}   │
 6. │ GREATER LONDON     │ CITY OF LONDON         │ {2010:725875,2020:840000}   │
 7. │ GREATER LONDON     │ HILLINGDON             │ {2010:329125,2020:439000}   │
 8. │ GREATER MANCHESTER │ MANCHESTER             │ {2010:144972.5,2020:190000} │
 9. │ GREATER LONDON     │ HAMMERSMITH AND FULHAM │ {2010:622250,2020:750000}   │
10. │ GREATER LONDON     │ ISLINGTON              │ {2010:500000,2020:640000}   │
    └────────────────────┴────────────────────────┴─────────────────────────────┘
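If a more conventional pivot layout is preferred, with one column per decade instead of a single map, one option is the -If combinator. The following is a sketch under the assumption that only the 2010 and 2020 buckets are of interest; the column aliases median2010s and median2020s are hypothetical names.

```sql
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT
    county,
    district,
    -- Each medianIf only aggregates rows whose decade matches its condition.
    medianIf(price, year = 2010) AS median2010s,
    medianIf(price, year = 2020) AS median2020s
FROM uk.uk_price_paid
WHERE year >= 2010
GROUP BY ALL
ORDER BY max(price) DESC
LIMIT 10;
```

The trade-off is that every decade has to be listed by hand, whereas the map version picks up whatever keys are present in the data.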
We could also group by decade and use the full postcode (postcode1 and postcode2 joined with a space) as the map key:
WITH year(toStartOfInterval(date, toIntervalYear(10))) AS year
SELECT
year,
medianMap(map(postcode1 || ' ' || postcode2, price)) AS medianPrices
FROM uk.uk_price_paid
WHERE postcode1 = 'NP1'
GROUP BY ALL;