WITH [ RECURSIVE ] <cte_name> AS
<anchor_clause> UNION ALL <recursive_clause>
SELECT ... FROM ...;
Where:
anchor_clause
selects an initial row or set of rows that represent the top of the hierarchy. For
example, if you are trying to display all the employees in a company, the anchor clause would select the President
of the company.
The anchor clause is a SELECT statement and can contain any supported SQL constructs.
The anchor clause cannot reference the cte_name
.
recursive_clause
selects the next layer of the hierarchy based on the previous layer. In the first iteration, the previous layer is the
result set from the anchor clause. In subsequent iterations, the previous layer is the most recent completed iteration.
The recursive_clause
is a SELECT statement; however, the statement is restricted
to projections, joins, and filters. In addition, the following are not allowed in the statement:
Aggregate or window functions.
GROUP BY
, ORDER BY
, LIMIT
, or DISTINCT
.
The recursive clause can reference the cte_name
like a regular table or view.
For a more detailed description of the syntax, see WITH.
Logically, the recursive CTE is evaluated as follows:
The anchor_clause
is evaluated and its result is written to both the final result set and to a working table.
The cte_name
is effectively an alias to the working table; in other words, a query referencing the
cte_name
reads from the working table.
While the working table is not empty:
The recursive_clause
is evaluated, using the current contents of the working table wherever cte_name
is referenced.
The result of recursive_clause
is written to both the final result set and a temp table.
The working table is overwritten by the content of the temp table.
Effectively, the output of the previous iteration is stored in a working table named cte_name
, and that table is
then one of the inputs to the next iteration. The working table contains only the result of the most recent iteration.
The accumulated results from all iterations so far are stored elsewhere.
After the final iteration, the accumulated results are available to the main SELECT statement by referencing cte_name
.
Recursive CTE Considerations
Potential for Infinite Loops
Constructing a recursive CTE incorrectly can cause an infinite loop. In these cases, the query continues to run until the query
succeeds, the query times out (e.g. exceeds the number of seconds specified by the
STATEMENT_TIMEOUT_IN_SECONDS parameter), or you cancel the query.
For information on how infinite loops can occur and for guidelines on how to avoid this problem, see
Troubleshooting a Recursive CTE.
Non-Contiguous Hierarchies
This topic described hierarchies and how parent-child relationships can be used by recursive CTEs. In all of the examples
in this topic, the hierarchies are contiguous.
For information about non-contiguous hierarchies, see Querying Hierarchical Data.
Examples
This section includes both non-recursive and recursive CTEs examples to contrast the two types.
Non-Recursive, Two-Level, Self-joining CTE
This example uses a table of employees and managers:
CREATE OR REPLACE TABLE employees (title VARCHAR, employee_ID INTEGER, manager_ID INTEGER);
INSERT INTO employees (title, employee_ID, manager_ID) VALUES
('President', 1, NULL), -- The President has no manager.
('Vice President Engineering', 10, 1),
('Programmer', 100, 10),
('QA Engineer', 101, 10),
('Vice President HR', 20, 1),
('Health Insurance Analyst', 200, 20);
A two-level self-join of this employee table looks like:
SELECT
emps.title,
emps.employee_ID,
mgrs.employee_ID AS MANAGER_ID,
mgrs.title AS "MANAGER TITLE"
FROM employees AS emps LEFT OUTER JOIN employees AS mgrs
ON emps.manager_ID = mgrs.employee_ID
ORDER BY mgrs.employee_ID NULLS FIRST, emps.employee_ID;
+----------------------------+-------------+------------+----------------------------+
| TITLE | EMPLOYEE_ID | MANAGER_ID | MANAGER TITLE |
|----------------------------+-------------+------------+----------------------------|
| President | 1 | NULL | NULL |
| Vice President Engineering | 10 | 1 | President |
| Vice President HR | 20 | 1 | President |
| Programmer | 100 | 10 | Vice President Engineering |
| QA Engineer | 101 | 10 | Vice President Engineering |
| Health Insurance Analyst | 200 | 20 | Vice President HR |
+----------------------------+-------------+------------+----------------------------+
The query above shows all the employees. Each manager’s employees appear near their manager in the report. However, the
report doesn’t visually show the hierarchy. Without looking carefully at the data, you don’t know how many levels there
are in the organization, and you need to read each row in order to see which employees are associated with a specific
manager.
A recursive CTE can display this hierarchical data as a sideways tree, as shown in the next section.
Recursive CTE with Indented Output
Below are two examples of using a recursive CTE:
The first uses indentation to show the different levels of the hierarchy. To simplify this example, the code does not
produce the rows in a particular order.
The second example uses indentation and shows each manager’s employees immediately below their manager.
Unordered Output
Here is the first example.
1) WITH RECURSIVE managers
2) (indent, employee_ID, manager_ID, employee_title)
3) AS
4) (
6) SELECT '' AS indent, employee_ID, manager_ID, title AS employee_title
7) FROM employees
8) WHERE title = 'President'
10) UNION ALL
12) SELECT indent || '--- ',
13) employees.employee_ID, employees.manager_ID, employees.title
14) FROM employees JOIN managers
15) ON employees.manager_ID = managers.employee_ID
16) )
18) SELECT indent || employee_title AS Title, employee_ID, manager_ID
19) FROM managers
20) ;
The query includes the following sections:
Line 2 contains the column names for the “view” (CTE).
Lines 4 - 16 contain the CTE.
Lines 6 - 8 contain the anchor clause of the CTE.
Lines 12 - 15 contain the recursive clause of the CTE.
Lines 18 - 19 contain the main SELECT that uses the CTE as a view. This SELECT references:
The CTE name (managers
), defined in line 1.
The CTE’s columns (indent
, employee_id
, etc.) defined in line 2.
The CTE contains two SELECT statements:
The SELECT statement in the anchor clause is executed once and provides the set of rows from the
first (top) level of the hierarchy.
The SELECT in the recursive clause can reference the CTE. You can think of the query as
iterating, with each iteration building on the previous iterations’ query results.
In the manager/employee example, the anchor clause emits the first row, which is the row that describes the company
president.
In the next iteration of the recursive clause, the recursive clause finds all the rows whose manager is the company
president (i.e. it finds all of the vice presidents). The 3rd iteration finds all the employees whose manager is one
of the vice presidents. Iteration continues until there is an iteration in which all of the rows retrieved are rows of
leaf-level employees who do not manage anyone. The statement does one more iteration, looking for (but not finding)
any employees whose managers are leaf-level employees. That iteration produces 0 rows, and the iteration stops.
Throughout these iterations, the UNION ALL clause accumulates the results. The results of each iteration are added
to the results of the previous iterations. After the last iteration completes, the accumulated rows (like any rows
produced in a WITH clause) are made available to the main SELECT clause in the query. That main SELECT can then query
those rows.
This particular example query uses indentation to show the hierarchical nature of the data. If you look at the output,
you see that the lower the level of the employee, the further that employee’s data is indented.
The indentation is controlled by the column named indent
. The indentation starts at 0 characters (an empty string
in the anchor clause), and increases by 4 characters (---
) for each iteration (i.e. for each level in the hierarchy).
Not surprisingly, it is very important to construct the join(s) correctly, and to select the correct columns in the
recursive clause. The columns in the SELECT of the recursive clause must correspond correctly to the columns in
the anchor clause. Remember that the query starts with the President, then selects the Vice Presidents, and then
selects the people who report directly to the Vice Presidents, etc. Each iteration looks for employees whose
manager_id
field corresponds to one of the managers.employee_id
values produced in the previous iteration.
Expressed another way, the employee ID in the managers “view” is the manager ID for the next level of employees. The
employee IDs must progress downward through the hierarchy (President, Vice President, senior manager, junior manager, etc.)
during each iteration. If the employee IDs don’t progress, then the query can loop infinitely (if the same manager_ID
keeps appearing in the managers.employee_ID
column in different iterations), or skip a level, or fail in other ways.
Ordered Output
The previous example had no ORDER BY clause, so even though each employee’s record is indented properly, each employee did
not necessarily appear directly underneath their manager. The example below generates output with correct indentation, and
with each manager’s employees directly underneath their manager.
The query’s ORDER BY clause uses an additional column, named sort_key
. The sort key accumulates as the recursive clause
iterates; you can think of the sort key as a string that contains the entire chain of command above you (your manager, your
manager’s manager, etc.). The most senior person in that chain of command (the President) is at the beginning of the sort
key string. Although you normally wouldn’t display the sort key, the query below includes the sort key in the output so that
it is easier to understand the output.
Each iteration should increase the length of the sort key by the same amount (same number of characters), so the query uses
a UDF (user-defined function) named skey
, with the following definition, to generate consistent-length segments of the
sort key:
CREATE OR REPLACE FUNCTION skey(ID VARCHAR)
RETURNS VARCHAR
SUBSTRING('0000' || ID::VARCHAR, -4) || ' '
Here is an example of output from the SKEY
function:
SELECT skey(12);
+----------+
| SKEY(12) |
|----------|
| 0012 |
+----------+
Here is the final version of the query. This puts each manager’s employees immediately underneath that manager, and indents based
on the “level” of the employee:
WITH RECURSIVE managers
-- Column list of the "view"
(indent, employee_ID, manager_ID, employee_title, sort_key)
-- Common Table Expression
-- Anchor Clause
SELECT '' AS indent,
employee_ID, manager_ID, title AS employee_title, skey(employee_ID)
FROM employees
WHERE title = 'President'
UNION ALL
-- Recursive Clause
SELECT indent || '--- ',
employees.employee_ID, employees.manager_ID, employees.title,
sort_key || skey(employees.employee_ID)
FROM employees JOIN managers
ON employees.manager_ID = managers.employee_ID
-- This is the "main select".
SELECT
indent || employee_title AS Title, employee_ID,
manager_ID,
sort_key
FROM managers
ORDER BY sort_key
+----------------------------------+-------------+------------+-----------------+
| TITLE | EMPLOYEE_ID | MANAGER_ID | SORT_KEY |
|----------------------------------+-------------+------------+-----------------|
| President | 1 | NULL | 0001 |
| --- Vice President Engineering | 10 | 1 | 0001 0010 |
| --- --- Programmer | 100 | 10 | 0001 0010 0100 |
| --- --- QA Engineer | 101 | 10 | 0001 0010 0101 |
| --- Vice President HR | 20 | 1 | 0001 0020 |
| --- --- Health Insurance Analyst | 200 | 20 | 0001 0020 0200 |
+----------------------------------+-------------+------------+-----------------+
The next query shows how to reference a field from the previous (higher) level in the hierarchy; pay particular attention to the
mgr_title
column:
WITH RECURSIVE managers
-- Column names for the "view"/CTE
(employee_ID, manager_ID, employee_title, mgr_title)
-- Common Table Expression
-- Anchor Clause
SELECT employee_ID, manager_ID, title AS employee_title, NULL AS mgr_title
FROM employees
WHERE title = 'President'
UNION ALL
-- Recursive Clause
SELECT
employees.employee_ID, employees.manager_ID, employees.title, managers.employee_title AS mgr_title
FROM employees JOIN managers
ON employees.manager_ID = managers.employee_ID
-- This is the "main select".
SELECT employee_title AS Title, employee_ID, manager_ID, mgr_title
FROM managers
ORDER BY manager_id NULLS FIRST, employee_ID
+----------------------------+-------------+------------+----------------------------+
| TITLE | EMPLOYEE_ID | MANAGER_ID | MGR_TITLE |
|----------------------------+-------------+------------+----------------------------|
| President | 1 | NULL | NULL |
| Vice President Engineering | 10 | 1 | President |
| Vice President HR | 20 | 1 | President |
| Programmer | 100 | 10 | Vice President Engineering |
| QA Engineer | 101 | 10 | Vice President Engineering |
| Health Insurance Analyst | 200 | 20 | Vice President HR |
+----------------------------+-------------+------------+----------------------------+
Parts Explosion
Manager/employee hierarchies are not the only type of variable-depth hierarchies that you can store in a single table and process
with a recursive CTE. Another common example of hierarchical data is a “parts explosion”, in which each component can be listed with
its sub-components, each of which can be listed with its sub-sub-components.
For example, suppose that your table contains hierarchical data, such as the components of a car. Your car probably contains
components such as an engine, wheels, etc. Many of those components contain sub-components (e.g. an engine might contain a fuel pump).
The fuel pump might contain a motor, tubing, etc. You could list all the components and their sub-components using a recursive CTE.
For an example of a query that produces a parts explosion, see WITH.
Troubleshooting a Recursive CTE
Recursive CTE Query Runs Until It Succeeds or Times Out
This issue can be caused by two different scenarios:
Your data hierarchy might have a cycle.
You might have created an infinite loop.
Cause 1: Cyclic Data Hierarchy
If your data hierarchy contains a cycle (i.e. it is not a true tree), there are multiple possible solutions:
- Solution 1.1:
If the data is not supposed to contain a cycle, correct the data.
- Solution 1.2:
Limit the query in some way (e.g. limit the number of rows of output). For example:
WITH RECURSIVE t(n) AS
SELECT 1
UNION ALL
SELECT N + 1 FROM t
SELECT n FROM t LIMIT 10;
- Solution 1.3:
Do not use a query that contains a recursive CTE, which expects hierarchical data.
Cause 2: Infinite Loop
An infinite loop can happen if the projection clause in the recursive_clause
outputs a value
from the “parent” (the previous iteration) instead of the “child” (the current iteration) and then the next
iteration uses that value in a join when it should use the current iteration’s value in the join.
The following pseudo-code shows an approximate example of this:
CREATE TABLE employees (employee_ID INT, manager_ID INT, ...);
INSERT INTO employees (employee_ID, manager_ID) VALUES
(1, NULL),
(2, 1);
WITH cte_name (employee_ID, manager_ID, ...) AS
-- Anchor Clause
SELECT employee_ID, manager_ID FROM table1
UNION ALL
SELECT manager_ID, employee_ID -- <<< WRONG
FROM table1 JOIN cte_name
ON table1.manager_ID = cte_name.employee_ID
SELECT ...
In this example, the recursive clause passes its parent value (manager_id
) in the column that should have the
current/child value (employee_id
). The parent will show up as the “current” value in the next iteration, and will
be passed again as the “current” value to the following generation, so the query never progresses down through the
levels; it keeps processing the same level each time.
- Step 1:
Suppose that the anchor clause selects the values employee_id = 1
and manager_id = NULL
.
employee_ID manager_ID
----------- ---------
1 NULL
- Step 2:
During the first iteration of the recursive clause, employee_id = 2
and manager_id = 1
in table1
.
employee_ID manager_ID
----------- ----------
1 NULL
table1
:
employee_ID manager_ID
----------- ----------
Result of the join in the recursive clause:
table1.employee_ID table1.manager_ID cte.employee_ID cte.manager_ID
----------------- ----------------- --------------- --------------
2 1 1 NULL
Projection:
employee_ID manager_ID
----------- ----------
However, because the employee_id
and manager_id
columns are reversed in the projection, the actual output of
the query (and thus the content of the CTE at the start of the next iteration) is:
employee_ID manager_ID
----------- ----------
1 2 -- Because manager and employee IDs reversed
- Step 3:
During the second iteration of the recursive clause:
employee_ID manager_ID
----------- ----------
table1
:
employee_ID manager_ID
----------- ----------
Result of join in recursive clause:
table1.employee_ID table1.manager_ID cte.employee_ID cte.manager_ID
----------------- ----------------- --------------- --------------
2 1 1 2
Projection:
employee_ID manager_ID
----------- ----------
Result of the query (contents of CTE at start of next iteration):
employee_ID manager_ID
----------- ----------
1 2 -- Because manager and employee IDs reversed
As you can see, at the end of the second iteration, the row in the CTE is the same as it was at the start of the
iteration:
employee_id
is 1
.
manager_id
is 2
.
Thus, the result of the join during the next iteration will be the same as the result of the join during the current
iteration, and the query loops infinitely.
If you have created an infinite loop:
- Solution 2:
Make sure that the recursive clause passes the correct variables in the correct order.
Also make sure that the JOIN condition in the recursive clause is correct. In a typical case, the parent of the
“current” row should be joined to the child/current value of the parent row.