
Time-series query performance

After optimizing storage, the next step is improving query performance. This section explores two key techniques: optimizing ORDER BY keys and using materialized views. We'll see how these approaches can reduce query times from seconds to milliseconds.

Optimize ORDER BY keys

Before attempting other optimizations, you should optimize the table's ordering key to ensure ClickHouse produces the fastest possible results. Choosing the right key largely depends on the queries you're going to run. Suppose most of our queries filter by the project and subproject columns. In this case, it's a good idea to add them to the ordering key, as well as the time column, since we query on time as well.

Let's create another version of the table that has the same column types as wikistat, but is ordered by (project, subproject, time).

CREATE TABLE wikistat_project_subproject
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = MergeTree
ORDER BY (project, subproject, time);
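
The new table matches wikistat in schema but starts out empty. To compare like for like, it needs the same data; a minimal way to copy it (assuming wikistat has already been loaded) is a plain INSERT INTO ... SELECT:

INSERT INTO wikistat_project_subproject
SELECT *
FROM wikistat;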

Let's now compare multiple queries to get an idea of how essential our ordering key expression is to performance. Note that we haven't applied our previous data type and codec optimizations, so any query performance differences are only based on the sort order.

For each query below, the first timing is against the table ordered by (time) and the second is against the table ordered by (project, subproject, time).

SELECT project, sum(hits) AS h
FROM wikistat
GROUP BY project
ORDER BY h DESC
LIMIT 10;

Ordered by (time): 2.381 sec. Ordered by (project, subproject, time): 1.660 sec.

SELECT subproject, sum(hits) AS h
FROM wikistat
WHERE project = 'it'
GROUP BY subproject
ORDER BY h DESC
LIMIT 10;

Ordered by (time): 2.148 sec. Ordered by (project, subproject, time): 0.058 sec.

SELECT toStartOfMonth(time) AS m, sum(hits) AS h
FROM wikistat
WHERE (project = 'it') AND (subproject = 'zero')
GROUP BY m
ORDER BY m DESC
LIMIT 10;

Ordered by (time): 2.192 sec. Ordered by (project, subproject, time): 0.012 sec.

SELECT path, sum(hits) AS h
FROM wikistat
WHERE (project = 'it') AND (subproject = 'zero')
GROUP BY path
ORDER BY h DESC
LIMIT 10;

Ordered by (time): 2.968 sec. Ordered by (project, subproject, time): 0.010 sec.
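
If you want to see why the filtered queries benefit so much, you can ask ClickHouse to explain how the primary index is used. This is only an illustrative sketch; the exact output depends on your ClickHouse version, but it should show far fewer granules selected on the table ordered by (project, subproject, time):

EXPLAIN indexes = 1
SELECT path, sum(hits) AS h
FROM wikistat_project_subproject
WHERE (project = 'it') AND (subproject = 'zero')
GROUP BY path
ORDER BY h DESC
LIMIT 10;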

Materialized views

Another option is to use materialized views to aggregate and store the results of popular queries. These results can be queried instead of the original table. Suppose the following query is executed quite often in our case:

SELECT path, SUM(hits) AS v
FROM wikistat
WHERE toStartOfMonth(time) = '2015-05-01'
GROUP BY path
ORDER BY v DESC
LIMIT 10
┌─path──────────────────┬────────v─┐
│ -                     │ 89650862 │
│ Angelsberg            │ 19165753 │
│ Ana_Sayfa             │  6368793 │
│ Academy_Awards        │  4901276 │
│ Accueil_(homonymie)   │  3805097 │
│ Adolf_Hitler          │  2549835 │
│ 2015_in_spaceflight   │  2077164 │
│ Albert_Einstein       │  1619320 │
│ 19_Kids_and_Counting  │  1430968 │
│ 2015_Nepal_earthquake │  1406422 │
└───────────────────────┴──────────┘

10 rows in set. Elapsed: 2.285 sec. Processed 231.41 million rows, 9.22 GB (101.26 million rows/s., 4.03 GB/s.)
Peak memory usage: 1.50 GiB.

Create materialized view

We can create the following materialized view:

CREATE TABLE wikistat_top
(
    `path` String,
    `month` Date,
    `hits` UInt64
)
ENGINE = SummingMergeTree
ORDER BY (month, hits);
CREATE MATERIALIZED VIEW wikistat_top_mv 
TO wikistat_top
AS
SELECT
    path,
    toStartOfMonth(time) AS month,
    sum(hits) AS hits
FROM wikistat
GROUP BY path, month;

Backfilling destination table

This destination table will only be populated when new records are inserted into the wikistat table, so we need to do some backfilling.

The easiest way to do this is to use an INSERT INTO SELECT statement that inserts directly into the materialized view's target table, using the view's SELECT query (the transformation):

INSERT INTO wikistat_top
SELECT
    path,
    toStartOfMonth(time) AS month,
    sum(hits) AS hits
FROM wikistat
GROUP BY path, month;

Depending on the cardinality of the raw data set (we have 1 billion rows!), this can be a memory-intensive approach. Alternatively, you can use a variant that requires minimal memory:

  • Creating a temporary table with a Null table engine
  • Connecting a copy of the normally used materialized view to that temporary table
  • Using an INSERT INTO SELECT query, copying all data from the raw data set into that temporary table
  • Dropping the temporary table and the temporary materialized view.

With that approach, rows from the raw data set are copied block-wise into the temporary table (which doesn't store any of these rows), and for each block of rows, a partial state is calculated and written to the target table, where these states are incrementally merged in the background.

CREATE TABLE wikistat_backfill
(
    `time` DateTime,
    `project` String,
    `subproject` String,
    `path` String,
    `hits` UInt64
)
ENGINE = Null;

Next, we'll create a materialized view to read from wikistat_backfill and write into wikistat_top:

CREATE MATERIALIZED VIEW wikistat_backfill_top_mv 
TO wikistat_top
AS
SELECT
    path,
    toStartOfMonth(time) AS month,
    sum(hits) AS hits
FROM wikistat_backfill
GROUP BY path, month;

And then finally, we'll populate wikistat_backfill from the initial wikistat table:

INSERT INTO wikistat_backfill
SELECT * 
FROM wikistat;

Once that query's finished, we can delete the backfill table and materialized view:

DROP VIEW wikistat_backfill_top_mv;
DROP TABLE wikistat_backfill;
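
Before relying on the materialized view, it can be worth a quick sanity check that the backfill captured everything. One hypothetical way to do that is to compare the total hit counts in the raw table and the target table, which should match if the backfill covered the full data set:

-- Compare totals between the raw table and the materialized view's target table
SELECT
    (SELECT sum(hits) FROM wikistat) AS raw_total,
    (SELECT sum(hits) FROM wikistat_top) AS mv_total;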

Now we can query the materialized view instead of the original table:

SELECT path, sum(hits) AS hits
FROM wikistat_top
WHERE month = '2015-05-01'
GROUP BY ALL
ORDER BY hits DESC
LIMIT 10;
┌─path──────────────────┬─────hits─┐
│ -                     │ 89543168 │
│ Angelsberg            │  7047863 │
│ Ana_Sayfa             │  5923985 │
│ Academy_Awards        │  4497264 │
│ Accueil_(homonymie)   │  2522074 │
│ 2015_in_spaceflight   │  2050098 │
│ Adolf_Hitler          │  1559520 │
│ 19_Kids_and_Counting  │   813275 │
│ Andrzej_Duda          │   796156 │
│ 2015_Nepal_earthquake │   726327 │
└───────────────────────┴──────────┘

10 rows in set. Elapsed: 0.004 sec.

The performance improvement here is dramatic: the query previously took just over 2 seconds to answer, and now it takes only 4 milliseconds.
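
Note that the materialized view keeps wikistat_top up to date automatically: any new rows inserted into wikistat are aggregated through wikistat_top_mv at insert time. As a rough illustration (the row below is purely hypothetical test data, so only run this on a throwaway copy of the table):

-- Hypothetical test row; it flows through wikistat_top_mv into wikistat_top
INSERT INTO wikistat VALUES (toDateTime('2015-05-15 12:00:00'), 'en', '', 'Test_page', 1);

SELECT sum(hits) AS hits
FROM wikistat_top
WHERE month = '2015-05-01' AND path = 'Test_page';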