JOIN Clause
Join produces a new table by combining columns from one or multiple tables by using values common to each. It is a common operation in databases with SQL support, which corresponds to relational algebra join. The special case of one table join is often referred to as “self-join”.
Syntax
SELECT <expr_list>
FROM <left_table>
[GLOBAL] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI|ANY|ALL|ASOF] JOIN <right_table>
(ON <expr_list>)|(USING <column_list>) ...
Expressions from
ON
clause and columns from
USING
clause are called “join keys”. Unless otherwise stated, join produces a
Cartesian product
from rows with matching “join keys”, which might produce results with much more rows than the source tables.
Related Content
- Blog: ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Part 1
- Blog: ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 2
- Blog: ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 3
- Blog: ClickHouse: A Blazingly Fast DBMS with Full SQL Join Support - Under the Hood - Part 4
Supported Types of JOIN
All standard SQL JOIN types are supported:
-
INNER JOIN
, only matching rows are returned. -
LEFT OUTER JOIN
, non-matching rows from left table are returned in addition to matching rows. -
RIGHT OUTER JOIN
, non-matching rows from right table are returned in addition to matching rows. -
FULL OUTER JOIN
, non-matching rows from both tables are returned in addition to matching rows. -
CROSS JOIN
, produces cartesian product of whole tables, “join keys” are not specified.
JOIN
without specified type implies
INNER
. Keyword
OUTER
can be safely omitted. Alternative syntax for
CROSS JOIN
is specifying multiple tables in
FROM clause
separated by commas.
Additional join types available in ClickHouse:
-
LEFT SEMI JOIN
andRIGHT SEMI JOIN
, a whitelist on “join keys”, without producing a cartesian product. -
LEFT ANTI JOIN
andRIGHT ANTI JOIN
, a blacklist on “join keys”, without producing a cartesian product. -
LEFT ANY JOIN
,RIGHT ANY JOIN
andINNER ANY JOIN
, partially (for opposite side ofLEFT
andRIGHT
) or completely (forINNER
andFULL
) disables the cartesian product for standardJOIN
types. -
ASOF JOIN
andLEFT ASOF JOIN
, joining sequences with a non-exact match.ASOF JOIN
usage is described below. -
PASTE JOIN
, performs a horizontal concatenation of two tables.
When
join_algorithm
is set to
partial_merge
,
RIGHT JOIN
and
FULL JOIN
are supported only with
ALL
strictness (
SEMI
,
ANTI
,
ANY
, and
ASOF
are not supported).
Settings
The default join type can be overridden using join_default_strictness setting.
The behavior of ClickHouse server for
ANY JOIN
operations depends on the
any_join_distinct_right_table_keys
setting.
See also
- join_algorithm
- join_any_take_last_row
- join_use_nulls
- partial_merge_join_optimizations
- partial_merge_join_rows_in_right_blocks
- join_on_disk_max_files_to_merge
- any_join_distinct_right_table_keys
Use the
cross_to_inner_join_rewrite
setting to define the behavior when ClickHouse fails to rewrite a
CROSS JOIN
as an
INNER JOIN
. The default value is
1
, which allows the join to continue but it will be slower. Set
cross_to_inner_join_rewrite
to
0
if you want an error to be thrown, and set it to
2
to not run the cross joins but instead force a rewrite of all comma/cross joins. If the rewriting fails when the value is
2
, you will receive an error message stating "Please, try to simplify
WHERE
section".
ON Section Conditions
An
ON
section can contain several conditions combined using the
AND
and
OR
operators. Conditions specifying join keys must refer both left and right tables and must use the equality operator. Other conditions may use other logical operators but they must refer either the left or the right table of a query.
Rows are joined if the whole complex condition is met. If the conditions are not met, still rows may be included in the result depending on the
JOIN
type. Note that if the same conditions are placed in a
WHERE
section and they are not met, then rows are always filtered out from the result.
The
OR
operator inside the
ON
clause works using the hash join algorithm — for each
OR
argument with join keys for
JOIN
, a separate hash table is created, so memory consumption and query execution time grow linearly with an increase in the number of expressions
OR
of the
ON
clause.
If a condition refers columns from different tables, then only the equality operator (
=
) is supported so far.
Example
Consider
table_1
and
table_2
:
┌─Id─┬─name─┐ ┌─Id─┬─text───────────┬─scores─┐
│ 1 │ A │ │ 1 │ Text A │ 10 │
│ 2 │ B │ │ 1 │ Another text A │ 12 │
│ 3 │ C │ │ 2 │ Text B │ 15 │
└────┴──────┘ └────┴────────────────┴────────┘
Query with one join key condition and an additional condition for
table_2
:
SELECT name, text FROM table_1 LEFT OUTER JOIN table_2
ON table_1.Id = table_2.Id AND startsWith(table_2.text, 'Text');
Note that the result contains the row with the name
C
and the empty text column. It is included into the result because an
OUTER
type of a join is used.
┌─name─┬─text───┐
│ A │ Text A │
│ B │ Text B │
│ C │ │
└──────┴────────┘
Query with
INNER
type of a join and multiple conditions:
SELECT name, text, scores FROM table_1 INNER JOIN table_2
ON table_1.Id = table_2.Id AND table_2.scores > 10 AND startsWith(table_2.text, 'Text');
Result:
┌─name─┬─text───┬─scores─┐
│ B │ Text B │ 15 │
└──────┴────────┴────────┘
Query with
INNER
type of a join and condition with
OR
:
CREATE TABLE t1 (`a` Int64, `b` Int64) ENGINE = MergeTree() ORDER BY a;
CREATE TABLE t2 (`key` Int32, `val` Int64) ENGINE = MergeTree() ORDER BY key;
INSERT INTO t1 SELECT number as a, -a as b from numbers(5);
INSERT INTO t2 SELECT if(number % 2 == 0, toInt64(number), -number) as key, number as val from numbers(5);
SELECT a, b, val FROM t1 INNER JOIN t2 ON t1.a = t2.key OR t1.b = t2.key;
Result:
┌─a─┬──b─┬─val─┐
│ 0 │ 0 │ 0 │
│ 1 │ -1 │ 1 │
│ 2 │ -2 │ 2 │
│ 3 │ -3 │ 3 │
│ 4 │ -4 │ 4 │
└───┴────┴─────┘
Query with
INNER
type of a join and conditions with
OR
and
AND
:
By default, non-equal conditions are supported as long as they use columns from the same table.
For example,
t1.a = t2.key AND t1.b > 0 AND t2.b > t2.c
, because
t1.b > 0
uses columns only from
t1
and
t2.b > t2.c
uses columns only from
t2
.
However, you can try experimental support for conditions like
t1.a = t2.key AND t1.b > t2.key
, check out section below for more details.
SELECT a, b, val FROM t1 INNER JOIN t2 ON t1.a = t2.key OR t1.b = t2.key AND t2.val > 3;
Result:
┌─a─┬──b─┬─val─┐
│ 0 │ 0 │ 0 │
│ 2 │ -2 │ 2 │
│ 4 │ -4 │ 4 │
└───┴────┴─────┘
[experimental] Join with inequality conditions for columns from different tables
This feature is experimental. To use it, set
allow_experimental_join_condition
to 1 in your configuration files or by using the
SET
command:
SET allow_experimental_join_condition=1
Otherwise, you'll get
INVALID_JOIN_ON_EXPRESSION
.
Clickhouse currently supports
ALL/ANY/SEMI/ANTI INNER/LEFT/RIGHT/FULL JOIN
with inequality conditions in addition to equality conditions. The inequality conditions are supported only for
hash
and
grace_hash
join algorithms. The inequality conditions are not supported with
join_use_nulls
.
Example
Table
t1
:
┌─key──┬─attr─┬─a─┬─b─┬─c─┐
│ key1 │ a │ 1 │ 1 │ 2 │
│ key1 │ b │ 2 │ 3 │ 2 │
│ key1 │ c │ 3 │ 2 │ 1 │
│ key1 │ d │ 4 │ 7 │ 2 │
│ key1 │ e │ 5 │ 5 │ 5 │
│ key2 │ a2 │ 1 │ 1 │ 1 │
│ key4 │ f │ 2 │ 3 │ 4 │
└──────┴──────┴───┴───┴───┘
Table
t2
┌─key──┬─attr─┬─a─┬─b─┬─c─┐
│ key1 │ A │ 1 │ 2 │ 1 │
│ key1 │ B │ 2 │ 1 │ 2 │
│ key1 │ C │ 3 │ 4 │ 5 │
│ key1 │ D │ 4 │ 1 │ 6 │
│ key3 │ a3 │ 1 │ 1 │ 1 │
│ key4 │ F │ 1 │ 1 │ 1 │
└──────┴──────┴───┴───┴───┘
SELECT t1.*, t2.* from t1 LEFT JOIN t2 ON t1.key = t2.key and (t1.a < t2.a) ORDER BY (t1.key, t1.attr, t2.key, t2.attr);
key1 a 1 1 2 key1 B 2 1 2
key1 a 1 1 2 key1 C 3 4 5
key1 a 1 1 2 key1 D 4 1 6
key1 b 2 3 2 key1 C 3 4 5
key1 b 2 3 2 key1 D 4 1 6
key1 c 3 2 1 key1 D 4 1 6
key1 d 4 7 2 0 0 \N
key1 e 5 5 5 0 0 \N
key2 a2 1 1 1 0 0 \N
key4 f 2 3 4 0 0 \N
NULL values in JOIN keys
The NULL is not equal to any value, including itself. It means that if a JOIN key has a NULL value in one table, it won't match a NULL value in the other table.
Example
Table
A
:
┌───id─┬─name────┐
│ 1 │ Alice │
│ 2 │ Bob │
│ ᴺᵁᴸᴸ │ Charlie │
└──────┴─────────┘
Table
B
:
┌───id─┬─score─┐
│ 1 │ 90 │
│ 3 │ 85 │
│ ᴺᵁᴸᴸ │ 88 │
└──────┴───────┘
SELECT A.name, B.score FROM A LEFT JOIN B ON A.id = B.id
┌─name────┬─score─┐
│ Alice │ 90 │
│ Bob │ 0 │
│ Charlie │ 0 │
└─────────┴───────┘
Notice that the row with
Charlie
from table
A
and the row with score 88 from table
B
are not in the result because of the NULL value in the JOIN key.
In case you want to match NULL values, use the
isNotDistinctFrom
function to compare the JOIN keys.
SELECT A.name, B.score FROM A LEFT JOIN B ON isNotDistinctFrom(A.id, B.id)
┌─name────┬─score─┐
│ Alice │ 90 │
│ Bob │ 0 │
│ Charlie │ 88 │
└─────────┴───────┘
ASOF JOIN Usage
ASOF JOIN
is useful when you need to join records that have no exact match.
Algorithm requires the special column in tables. This column:
- Must contain an ordered sequence.
- Can be one of the following types: Int, UInt , Float , Date , DateTime , Decimal .
-
For
hash
join algorithm it can’t be the only column in theJOIN
clause.
Syntax
ASOF JOIN ... ON
:
SELECT expressions_list
FROM table_1
ASOF LEFT JOIN table_2
ON equi_cond AND closest_match_cond
You can use any number of equality conditions and exactly one closest match condition. For example,
SELECT count() FROM table_1 ASOF LEFT JOIN table_2 ON table_1.a == table_2.b AND table_2.t <= table_1.t
.
Conditions supported for the closest match:
>
,
>=
,
<
,
<=
.
Syntax
ASOF JOIN ... USING
:
SELECT expressions_list
FROM table_1
ASOF JOIN table_2
USING (equi_column1, ... equi_columnN, asof_column)
ASOF JOIN
uses
equi_columnX
for joining on equality and
asof_column
for joining on the closest match with the
table_1.asof_column >= table_2.asof_column
condition. The
asof_column
column is always the last one in the
USING
clause.
For example, consider the following tables:
table_1 table_2
event | ev_time | user_id event | ev_time | user_id
----------|---------|---------- ----------|---------|----------
... ...
event_1_1 | 12:00 | 42 event_2_1 | 11:59 | 42
... event_2_2 | 12:30 | 42
event_1_2 | 13:00 | 42 event_2_3 | 13:00 | 42
... ...
ASOF JOIN
can take the timestamp of a user event from
table_1
and find an event in
table_2
where the timestamp is closest to the timestamp of the event from
table_1
corresponding to the closest match condition. Equal timestamp values are the closest if available. Here, the
user_id
column can be used for joining on equality and the
ev_time
column can be used for joining on the closest match. In our example,
event_1_1
can be joined with
event_2_1
and
event_1_2
can be joined with
event_2_3
, but
event_2_2
can’t be joined.
ASOF JOIN
is supported only by
hash
and
full_sorting_merge
join algorithms.
It's
not
supported in the
Join
table engine.
PASTE JOIN Usage
The result of
PASTE JOIN
is a table that contains all columns from left subquery followed by all columns from the right subquery.