let’s say the following N1QL with CASE WHEN
SELECT CASE WHEN condition1(fieldA,fieldB,fieldC) THEN field1
WHEN condition2(fieldA,fieldD,fieldE) THEN field2
WHEN condition3(fieldA,fieldF,fieldG) THEN field3
WHEN condition4(fieldA,fieldF,fieldH) THEN field4
ELSE MISSING
END AS field
FROM default
WHERE condition(fieldX,fieldY)
AS evaluation process of CASE WHEN shows this N1QL can rewrite as following
SELECT field1 AS field
FROM default
WHERE condition(fieldX,fieldY)
AND condition1(fieldA,fieldB,fieldC)
UNION
SELECT field2 AS field
FROM default
WHERE condition(fieldX,fieldY)
AND NOT (condition1(fieldA,fieldB,fieldC))
AND condition2(fieldA,fieldD,fieldE)
UNION
SELECT field3 AS field
FROM default
WHERE condition(fieldX,fieldY)
AND NOT (condition1(fieldA,fieldB,fieldC))
AND NOT (condition2(fieldA,fieldD,fieldE))
AND condition3(fieldA,fieldF,fieldG)
UNION
SELECT field4 AS field
FROM default
WHERE condition(fieldX,fieldY)
AND NOT (condition1(fieldA,fieldB,fieldC))
AND NOT (condition2(fieldA,fieldD,fieldE))
AND NOT (condition3(fieldA,fieldF,fieldG))
AND condition4(fieldA,fieldF,fieldH)
am I right?
if so,let’s talk about which is better for N1QL query efficiency.
If I can use correct index I think the second N1QL should be more efficiency, because it will use index to get more documents.
the test step (on CB 5.0.0-2194 Enterprise Edition (build-2194)) as following:
I have create 2 index as following
create index idx_case_when_union_test_1 on default(fieldX,fieldY,fieldA,fieldB,fieldC,fieldD,fieldE,fieldF,fieldG,fieldH,field1,field2,field3,field4);
create index idx_case_when_union_test_2 on default(fieldA,fieldB,fieldC,fieldD,fieldE,fieldF,fieldG,fieldH,field1,field2,field3,field4) WHERE (fieldX == "X" AND fieldY == "Y");
and for the first N1QL
SELECT CASE WHEN fieldA == "A" AND fieldB == "B" AND fieldC == "C" THEN field1
WHEN fieldA == "A" AND fieldD == "D" AND fieldE == "E" THEN field2
WHEN fieldA == "A" AND fieldF == "F" AND fieldG == "G" THEN field3
WHEN fieldA == "A" AND fieldF == "F" AND fieldH == "H" THEN field4
ELSE MISSING
END AS field
FROM default
WHERE fieldX == "X" AND fieldY == "Y"
the explain shows
gist.github.com
CASE_WHEN_EXPLAIN
"#operator": "Sequence",
"#stats": {
"#phaseSwitches": 2,
"execTime": "964ns",
"kernTime": "2.489323ms"
"~children": [
"#operator": "Authorize",
This file has been truncated. show original
FROM default
WHERE fieldX == "X" AND fieldY == "Y"
AND fieldA == "A" AND fieldB == "B" AND fieldC == "C"
UNION
SELECT field2 AS field
FROM default
WHERE fieldX == "X" AND fieldY == "Y"
AND NOT (fieldA == "A" AND fieldB == "B" AND fieldC == "C")
AND fieldA == "A" AND fieldD == "D" AND fieldE == "E"
UNION
SELECT field3 AS field
FROM default
WHERE fieldX == "X" AND fieldY == "Y"
AND NOT (fieldA == "A" AND fieldB == "B" AND fieldC == "C")
AND NOT (fieldA == "A" AND fieldD == "D" AND fieldE == "E")
AND fieldA == "A" AND fieldF == "F" AND fieldG == "G"
UNION
SELECT field4 AS field
FROM default
WHERE fieldX == "X" AND fieldY == "Y"
AND NOT (fieldA == "A" AND fieldB == "B" AND fieldC == "C")
AND NOT (fieldA == "A" AND fieldD == "D" AND fieldE == "E")
AND NOT (fieldA == "A" AND fieldF == "F" AND fieldG == "G")
AND fieldA == "A" AND fieldF == "F" AND fieldH == "H"
the explain shows
gist.github.com
UNION_SELECT_EXPLAIN
"#operator": "Sequence",
"#stats": {
"#phaseSwitches": 2,
"execTime": "805ns",
"kernTime": "2.835992ms"
"~children": [
"#operator": "Authorize",
This file has been truncated. show original
it looks like both N1QL use cover index with index idx_case_when_union_test_1
.
which one is better?
PS: I use USE INDEX (`idx_case_when_union_test_2`)
for both N1QL,only the second N1QL works.
atom_yang:
PS: I use USE INDEX (idx_case_when_union_test_2
) for both N1QL,only the second N1QL works.
Query1 cannot use index idx_case_when_union_test_2 because leading index key is not part of the predicate.
Which one is better depends on the data and how many CASE statements etc.
In my opinion CASE WHEN will be better. Pros and Cons
Predicates are pushed exactly and single IndexScan will be used (less load on indexer)
CASE evaluation might be expensive depends on which WHEN condition it matches.
For single query latency might be higher than UNION but loaded system can scale better.
UNION Pros and Cons
As each UNION arm projects different attributes, there will not be any duplicates across UNION arms you can use UNION ALL.
Each UNION query will have different precise IndexScans and each IndexScan will have precise Spans. This result in more IndexScans and there might be load on indexer Also for each Spans all predicates may not pushed to indexer. This may lead to false positives form indexer.
Take example : fieldX == “X” AND fieldY == “Y”
AND fieldA == “A” AND fieldD == “D” AND fieldE == “E”
Given index only filedX,filedY,fieldA can be pushed to indexer not fieldD, fieldE
Query has NOT predicate which needs to be DNF transformed. NOT (fieldA == “A” AND fieldB == “B” AND fieldC == “C”) ===> (fieldA != “A” OR fieldB != “B” OR fieldC != “C”). This may require more indexes, IndexScans and face same problem as described above.