添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接

JSON创建索引概述

对于JSON数据来说并没有什么特殊的SQL数据类型,所以创建索引的方式也并无特别之处。可以对 json_value 函数返回值创建B-tree索引或者位图索引,也可以对 is json js not json json_exists 等函数创建位图索引。

一般来说,位图索引通常是创建在那些唯一值很少的字段上,比如性别之类。

函数索引只会在对应函数的查询sql上才会生效,这在具体使用中无可避免存在一些局限性,所以在创建这些函数索引之前需要充分了解到业务经常使用的场景。

对于那些查询场景不确定的情况也有一种对应方式,那就是 JSON search index ,这是一种常规索引,不局限于特定的JSON路径表达式。它非常适合结构式的查询,比如查找JSON字段中的具体值,并且通过 json_textcontains 来进行全文查询,在各种字符串值中寻找特定的单词。

对于同一个JSON列来说,可以同时创建函数索引和全文索引,这里要注意的是JSON search index是专门用来对应JSON数据的全文索引。

Oracle建议使用AL32UTF8字符集,否则创建和使用索引时会自动的进行字符集的转换,而这种转换是有损的,也就表示某些数据查询的时候会无法返回

创建一张测试JSON表

1
2
3
4
5
6
7
drop table json_test purge;

create table json_test (
id raw(16) not null,
document clob
CONSTRAINT ensure_json CHECK (document IS JSON)
);

插入测试数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
INSERT INTO json_test
VALUES (SYS_GUID(),
'{ "PONumber" : 1600,
"Reference" : "ABULL-20140421",
"Requestor" : "Alexis Bull",
"User" : "ABULL",
"CostCenter" : "A50",
"ShippingInstructions" : { "name" : "Alexis Bull",
"Address": { "street" : "200 Sporting Green",
"city" : "South San Francisco",
"state" : "CA",
"zipCode" : 99236,
"country" : "United States of America" },
"Phone" : [ { "type" : "Office", "number" : "909-555-7307" },
{ "type" : "Mobile", "number" : "415-555-1234" } ] },
"Special Instructions" : null,
"AllowPartialShipment" : false,
"LineItems" : [ { "ItemNumber" : 1,
"Part" : { "Description" : "One Magic Christmas",
"UnitPrice" : 19.95,
"UPCCode" : 13131092899 },
"Quantity" : 9.0 },
{ "ItemNumber" : 2,
"Part" : { "Description" : "Lethal Weapon",
"UnitPrice" : 19.95,
"UPCCode" : 85391628927 },
"Quantity" : 5.0 } ] }');

创建位图索引

创建基于json_exists返回数据的位图索引,这种索引就非常适合json_exists,因为返回的结果只有true或者false

1
2
3
4
create bitmap index idx_name
on json_test (json_exists(document, '$.ShippingInstructions.name'));

Index created.

创建基于json_value函数返回数据的位图索引,这种就适合字段里只有少量可能的数据的情况

1
2
3
4
drop index idx_name;

create bitmap index idx_name
on json_test (json_value(document, '$.ShippingInstructions'));

创建JSON_VALUE函数索引

可以创建基于JSON_VALUE的函数索引,标准语法即可,显式的指定函数json_value,或者直接简单的用点表示法的语法。这些方法创建的索引对基于点表示法查询和json_value查询都可应用。

1
CREATE UNIQUE INDEX po_num_idx ON json_test po (po.document.PONumber);

这里基于点表示法创建了一个函数索引,另外也可以通过json_value函数来创建基于POnumber字段的索引

1
2
create unique index po_num_idx2 
on json_test (json_value(document, '$.PONumber' returning number error on error null on empty));

注意这里的 ERROR ON ERROR 表示如果数据包含不存在的POnumber的记录、有多个PONumber、或者有个PONumber不是数字型则索引就会创建失败,如果这个索引已经存在,则不符合要求的数据会插入失败。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
set autotrace traceonly

select a.document."User",a.document."CostCenter" from json_test a
where json_value(document, '$.PONumber' returning number error on error null on empty) = 1600;

Execution Plan
----------------------------------------------------------
Plan hash value: 1738154326

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3489 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX2 | 1 | | 1 (0)| 00:00:01 |
| 4 | JSONTABLE EVALUATION | | | | | |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(JSON_VALUE("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON , '$.POnumber'
RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=1600)

如果使用点表示法来查询,则同样可以使用这个索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
select a.document."User",a.document."CostCenter" from json_test a
where a.document."PONumber" = 1600;

Execution Plan
----------------------------------------------------------
Plan hash value: 1738154326

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3489 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX2 | 1 | | 1 (0)| 00:00:01 |
| 4 | JSONTABLE EVALUATION | | | | | |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(JSON_VALUE("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON , '$.PONumber'
RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=1600)

这里可以看到,尽管使用点表示法时,但是谓词仍将其看做JSON_VALUE格式的数据而使用json_value的索引,并没有看见其进行隐式转换成JSON_QUERY,这里主要是优化器进行了更优解而选择了json_value表达式的索引。

而当我们不使用这个索引时,就会看到预期的json_query转换

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
drop index po_num_idx2;

select a.document."User",a.document."CostCenter" from json_test a
where a.document."PONumber" = '1600';

Execution Plan
----------------------------------------------------------
Plan hash value: 4223338378

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX | 1 | | 1 (0)| 00:00:01 |
| 4 | JSONTABLE EVALUATION | | | | | |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(JSON_QUERY("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON ,
'$.PONumber' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON
ERROR)='1600')

Oracle建议使用下面的方式对json_value创建函数索引

索引值可以灵活对应点表示法的查询,无论何时都会返回JSON数据。包含非标量的JSON数据,可以匹配除了json_value查询的点表示法查询。

  • json_value表达式,指定 RETURNING 数据类型,使用 ERROR ON ERROR (可选 NULL ON EMPTY

    索引值只有指定数据类型的标量数据,但是仍可以适用于那些返回标量数据的点表示法查询。

    JSON_VALUE索引和查询的数据类型注意事项

    默认情况下,json_value函数返回的是varchar2类型数据,当创建一个基于json_value的函数索引时,除非指定 RETURNING 语句返回一个不同类型的数据,否则当谓词条件是一个非varchar2值的时候,就并不会适用索引。

    返回值为varchar2类型,谓词为number型

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    select a.document."User",a.document."CostCenter" from json_test a
    where json_value(document, '$.PONumber' error on error null on empty) = 1600;

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2422004586

    ------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 82 | 278K| 32 (0)| 00:00:01 |
    | 1 | NESTED LOOPS | | 82 | 278K| 32 (0)| 00:00:01 |
    | 2 | TABLE ACCESS FULL | JSON_TEST | 1 | 3476 | 3 (0)| 00:00:01 |
    |* 3 | JSONTABLE EVALUATION | | | | | |
    ------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    3 - filter(TO_NUMBER("P"."C_01$")=1600)

    多JSON字段创建复合索引

    当对JSON对象的多个字段建复合索引时,首选需要对其创建虚拟列,然后在虚拟列上创建B-tree索引。

    1
    2
    3
    4
    5
    6
    7
    alter table json_test add (userid varchar2(50)
    generated always as (json_value(document, '$.User' returning varchar2(50))));

    alter table json_test add (cost varchar2(50)
    generated always as (json_value(document, '$.CostCenter' returning varchar2(50))));

    create index idx_json on json_test(userid,cost);

    两种方式查询可以用到复合索引

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    select document from json_test
    where userid='ABULL'
    and cost='A50';

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 521514918

    -------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 3496 | 1 (0)| 00:00:01 |
    | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 |
    |* 2 | INDEX RANGE SCAN | IDX_JSON | 1 | | 1 (0)| 00:00:01 |
    -------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("USERID"='ABULL' AND "COST"='A50')
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    select document from json_test
    where json_value(document,'$.User' returning varchar2(50))='ABULL'
    and json_value(document,'$.CostCenter' returning varchar2(50))='A50';

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 521514918

    -------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 3496 | 1 (0)| 00:00:01 |
    | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 |
    |* 2 | INDEX RANGE SCAN | IDX_JSON | 1 | | 1 (0)| 00:00:01 |
    -------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("JSON_TEST"."USERID"='ABULL' AND "JSON_TEST"."COST"='A50')

    JSON全文索引

    JSON search索引是一个常规索引,它可以提高临时结构查询和全文搜索的性能。 这是一个Oracle Text索引,专门设计用于JSON数据。

    如果你使用12cR1版本创建过JSON search索引,则在新版本之后建议drop后重建

    JSON search index

    通过关键字 FOR JSON 来执行 CREATE SEARCH INDEX 语句进行索引的创建,JSON全文索引的维护是异步的,这样可以避免在业务高峰期的索引维护成本,而选择在业务低谷的时候进行这些消耗资源的操作。异步维护也就意味着在索引同步之前,那些被修改的数据或新插入的数据并不会在索引里存在记录。

    创建JSON全文索引

    1
    create search index idx_search on json_test(document) for json;

    当JSON全文索引创建完毕以后,就可以使用 json_textcontains 来进行查询

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    select document from json_test
    where json_textcontains(document, '$.LineItems.Part.Description', 'Magic');

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1340174430

    ------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 |
    | 1 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3488 | 1 (0)| 00:00:01 |
    |* 2 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'Magic INPATH
    (/LineItems/Part/Description)')>0)

    执行计划中出现了DOMAIN INDEX表示sql使用了全文索引。

    上面的例子仅查询了描述字段里出现Magic的情况,而比如想查询Description同时包含 Magic Christmas 的情况则需要使用操作符 & and

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    select document from json_test
    where json_textcontains(document, '$.LineItems.Part.Description', 'Magic \& Christmas');

    { "PONumber" : 1600,
    "Reference" : "ABULL-20140421",
    "Requestor" : "Alexis Bull",
    "User" : "ABULL",
    "CostCenter" : "A50",
    "ShippingInstructions" : { "name" : "Alexis Bull",
    "Address": { "street" : "200 Sporting Green",
    "city" : "South San Francisco",
    "state" : "CA",
    "zipCode" : 99236,
    "country" : "United States of America" },
    "Phone" : [ { "type" : "Office", "number" : "909-555-7307" },
    { "type" : "Mobile", "number" : "415-555-1234" } ] },
    "Special Instructions" : null,
    "AllowPartialShipment" : false,
    "LineItems" : [ { "ItemNumber" : 1,
    "Part" : { "Description" : "One Magic Christmas",
    "UnitPrice" : 19.95,
    "UPCCode" : 13131092899 },
    "Quantity" : 9.0 },
    { "ItemNumber" : 2,
    "Part" : { "Description" : "Lethal Weapon",
    "UnitPrice" : 19.95,
    "UPCCode" : 85391628927 },
    "Quantity" : 5.0 } ] }

    这里使用了反斜杠转义 &

    也可以用and,但是注意这里查询包含保留字and,所以需要加上花括号{}

    1
    2
    select document from json_test
    where json_textcontains(document, '$.LineItems.Part.Description', '{Magic and Christmas}');

    JSON全文索引同样支持那些不包含json_textcontains的查询

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    SELECT COUNT(*)
    FROM json_test
    WHERE JSON_EXISTS(document, '$.Reference');

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2409884336

    -------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 |
    | 1 | SORT AGGREGATE | | 1 | 3488 | | |
    |* 2 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 |
    -------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'HASPATH(/Refer
    ence)')>0)
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    SELECT COUNT(*)
    FROM json_test
    where json_value(document, '$.ShippingInstructions.name') = 'Alexis Bull';

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3056167535

    -------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 |
    | 1 | SORT AGGREGATE | | 1 | 3488 | | |
    |* 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3488 | 1 (0)| 00:00:01 |
    |* 3 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 |
    -------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - filter(JSON_VALUE("DOCUMENT" FORMAT JSON , '$.ShippingInstructions.name'
    RETURNING VARCHAR2(4000) NULL ON ERROR)='Alexis Bull')
    3 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'{Alexis Bull} INPATH
    (/ShippingInstructions/name)')>0)
  •