JSON functions and operators #

The SQL standard describes functions and operators to process JSON data. They allow you to access JSON data according to its structure, generate JSON data, and store it persistently in SQL tables.

Importantly, the SQL standard imposes that there is no dedicated data type to represent JSON data in SQL. Instead, JSON data is represented as character or binary strings. Although Trino supports JSON type, it is not used or produced by the following functions.

Trino supports three functions for querying JSON data: json_exists , json_query , and json_value . Each of them is based on the same mechanism of exploring and processing JSON input using JSON path.

Trino also supports two functions for generating JSON data – json_array , and json_object .

JSON path language #

The JSON path language is a special language, used exclusively by certain SQL operators to specify the query to perform on the JSON input. Although JSON path expressions are embedded in SQL queries, their syntax significantly differs from SQL. The semantics of predicates, operators, etc. in JSON path expressions generally follow the semantics of SQL. The JSON path language is case-sensitive for keywords and identifiers.

JSON path syntax and semantics #

JSON path expressions are recursive structures. Although the name “path” suggests a linear sequence of operations going step by step deeper into the JSON structure, a JSON path expression is in fact a tree. It can access the input JSON item multiple times, in multiple ways, and combine the results. Moreover, the result of a JSON path expression is not a single item, but an ordered sequence of items. Each of the sub-expressions takes one or more input sequences, and returns a sequence as the result.

In the lax mode, most path operations first unnest all JSON arrays in the input sequence. Any divergence from this rule is mentioned in the following listing. Path modes are explained in JSON path modes .

The JSON path language features are divided into: literals, variables, arithmetic binary expressions, arithmetic unary expressions, and a group of operators collectively known as accessors.

literals #

numeric literals

They include exact and approximate numbers, and are interpreted as if they were SQL values.

-1, 1.2e3, NaN
string literals
They are enclosed in double quotes.
"Some text"
boolean literals
true, false
null literal
It has the semantics of the JSON null, not of SQL null. See Comparison rules.
current item variable
It is used inside the filter expression to refer to the currently processed
item from the input sequence.
last subscript variable
It refers to the last index of the innermost enclosing array. Array indexes
in JSON path expressions are zero-based.
arithmetic binary expressions#
The JSON path language supports five arithmetic binary operators:
<path1> + <path2>
<path1> - <path2>
<path1> * <path2>
<path1> / <path2>
<path1> % <path2>
Both operands, <path1> and <path2>, are evaluated to sequences of
items. For arithmetic binary operators, each input sequence must contain a
single numeric item. The arithmetic operation is performed according to SQL
semantics, and it returns a sequence containing a single element with the
result.
The operators follow the same precedence rules as in SQL arithmetic operations,
and parentheses can be used for grouping.
arithmetic unary expressions#
+ <path>
- <path>
The operand <path> is evaluated to a sequence of items. Every item must be
a numeric value. The unary plus or minus is applied to every item in the
sequence, following SQL semantics, and the results form the returned sequence.
member accessor#
The member accessor returns the value of the member with the specified key for
each JSON object in the input sequence.
<path>.key
<path>."key"
The condition when a JSON object does not have such a member is called a
structural error. In the lax mode, it is suppressed, and the faulty object is
excluded from the result.
Let <path> return a sequence of three JSON objects:
{"customer" : 100, "region" : "AFRICA"},
{"region" : "ASIA"},
{"customer" : 300, "region" : "AFRICA", "comment" : null}
the expression <path>.customer succeeds in the first and the third object,
but the second object lacks the required member. In strict mode, path
evaluation fails. In lax mode, the second object is silently skipped, and the
resulting sequence is 100, 300.
All items in the input sequence must be JSON objects.
Trino does not support JSON objects with duplicate keys.
wildcard member accessor#
Returns values from all key-value pairs for each JSON object in the input
sequence. All the partial results are concatenated into the returned sequence.
<path>.*
Let <path> return a sequence of three JSON objects:
{"customer" : 100, "region" : "AFRICA"},
{"region" : "ASIA"},
{"customer" : 300, "region" : "AFRICA", "comment" : null}
The results is:
100, "AFRICA", "ASIA", 300, "AFRICA", null
All items in the input sequence must be JSON objects.
The order of values returned from a single JSON object is arbitrary. The
sub-sequences from all JSON objects are concatenated in the same order in which
the JSON objects appear in the input sequence.
descendant member accessor#
Returns the values associated with the specified key in all JSON objects on all
levels of nesting in the input sequence.
<path>..key
<path>.."key"
The order of returned values is that of preorder depth first search. First, the
enclosing object is visited, and then all child nodes are visited.
This method does not perform array unwrapping in the lax mode. The results
are the same in the lax and strict modes. The method traverses into JSON
arrays and JSON objects. Non-structural JSON items are skipped.
Let <path> be a sequence containing a JSON object:
    "id" : 1,
    "notes" : [{"type" : 1, "comment" : "foo"}, {"type" : 2, "comment" : null}],
    "comment" : ["bar", "baz"]
<path>..comment --> ["bar", "baz"], "foo", null
array accessor#
Returns the elements at the specified indexes for each JSON array in the input
sequence. Indexes are zero-based.
<path>[ <subscripts> ]
The <subscripts> list contains one or more subscripts. Each subscript
specifies a single index or a range (ends inclusive):
<path>[<path1>, <path2> to <path3>, <path4>,...]
In lax mode, any non-array items resulting from the evaluation of the input
sequence are wrapped into single-element arrays. Note that this is an exception
to the rule of automatic array wrapping.
Each array in the input sequence is processed in the following way:
The variable last is set to the last index of the array.
All subscript indexes are computed in order of declaration. For a
singleton subscript <path1>, the result must be a singleton numeric item.
For a range subscript <path2> to <path3>, two numeric items are expected.
The specified array elements are added in order to the output sequence.
Let <path> return a sequence of three JSON arrays:
[0, 1, 2], ["a", "b", "c", "d"], [null, null]
The following expression returns a sequence containing the last element from
every array:
<path>[last] --> 2, "d", null
The following expression returns the third and fourth element from every array:
<path>[2 to 3] --> 2, "c", "d"
Note that the first array does not have the fourth element, and the last array
does not have the third or fourth element. Accessing non-existent elements is a
structural error. In strict mode, it causes the path expression to fail. In lax
mode, such errors are suppressed, and only the existing elements are returned.
Another example of a structural error is an improper range specification such
as 5 to 3.
Note that the subscripts may overlap, and they do not need to follow the
element order. The order in the returned sequence follows the subscripts:
<path>[1, 0, 0] --> 1, 0, 0, "b", "a", "a", null, null, null
wildcard array accessor#
Returns all elements of each JSON array in the input sequence.
<path>[*]
In lax mode, any non-array items resulting from the evaluation of the input
sequence are wrapped into single-element arrays. Note that this is an exception
to the rule of automatic array wrapping.
The output order follows the order of the original JSON arrays. Also, the order
of elements within the arrays is preserved.
Let <path> return a sequence of three JSON arrays:
[0, 1, 2], ["a", "b", "c", "d"], [null, null]
<path>[*] --> 0, 1, 2, "a", "b", "c", "d", null, null
filter#
Retrieves the items from the input sequence which satisfy the predicate.
<path>?( <predicate> )
JSON path predicates are syntactically similar to boolean expressions in SQL.
However, the semantics are different in many aspects:
They operate on sequences of items.
They have their own error handling (they never fail).
They behave different depending on the lax or strict mode.
The predicate evaluates to true, false, or unknown. Note that some
predicate expressions involve nested JSON path expression. When evaluating the
nested path, the variable @ refers to the currently examined item from the
input sequence.
The following predicate expressions are supported:
Conjunction
<predicate1> && <predicate2>
Disjunction
<predicate1> || <predicate2>
Negation
! <predicate>
exists predicate
exists( <path> )
Returns true if the nested path evaluates to a non-empty sequence, and
false when the nested path evaluates to an empty sequence. If the path
evaluation throws an error, returns unknown.
starts with predicate
<path> starts with "Some text"
<path> starts with $variable
The nested <path> must evaluate to a sequence of textual items, and the
other operand must evaluate to a single textual item. If evaluating of either
operand throws an error, the result is unknown. All items from the sequence
are checked for starting with the right operand. The result is true if a
match is found, otherwise false. However, if any of the comparisons throws
an error, the result in the strict mode is unknown. The result in the lax
mode depends on whether the match or the error was found first.
is unknown predicate
( <predicate> ) is unknown
Returns true if the nested predicate evaluates to unknown, and
false otherwise.
Comparisons
<path1> == <path2>
<path1> <> <path2>
<path1> != <path2>
<path1> < <path2>
<path1> > <path2>
<path1> <= <path2>
<path1> >= <path2>
Both operands of a comparison evaluate to sequences of items. If either
evaluation throws an error, the result is unknown




    
. Items from the left and
right sequence are then compared pairwise. Similarly to the starts with
predicate, the result is true if any of the comparisons returns true,
otherwise false. However, if any of the comparisons throws an error, for
example because the compared types are not compatible, the result in the strict
mode is unknown. The result in the lax mode depends on whether the true
comparison or the error was found first.
Comparison rules#
Null values in the context of comparison behave different than SQL null:
null == null –> true
null != null, null < null, … –> false
null compared to a scalar value –> false
null compared to a JSON array or a JSON object –> false
When comparing two scalar values, true or false is returned if the
comparison is successfully performed. The semantics of the comparison is the
same as in SQL. In case of an error, e.g. comparing text and number,
unknown is returned.
Comparing a scalar value with a JSON array or a JSON object, and comparing JSON
arrays/objects is an error, so unknown is returned.
Examples of filter#
Let <path> return a sequence of three JSON objects:
{"customer" : 100, "region" : "AFRICA"},
{"region" : "ASIA"},
{"customer" : 300, "region" : "AFRICA", "comment" : null}
<path>?(@.region != "ASIA") --> {"customer" : 100, "region" : "AFRICA"},
                                {"customer" : 300, "region" : "AFRICA", "comment" : null}
<path>?(!exists(@.customer)) --> {"region" : "ASIA"}
The following accessors are collectively referred to as item methods.
double()#
Converts numeric or text values into double values.
<path>.double()
Let <path> return a sequence -1, 23e4, "5.6":
<path>.double() --> -1e0, 23e4, 5.6e0
ceiling(), floor(), and abs()#
Gets the ceiling, the floor or the absolute value for every numeric item in the
sequence. The semantics of the operations is the same as in SQL.
Let <path> return a sequence -1.5, -1, 1.3:
<path>.ceiling() --> -1.0, -1, 2.0
<path>.floor() --> -2.0, -1, 1.0
<path>.abs() --> 1.5, 1, 1.3
keyvalue()#
Returns a collection of JSON objects including one object per every member of
the original object for every JSON object in the sequence.
<path>.keyvalue()
The returned objects have three members:
“name”, which is the original key,
“value”, which is the original bound value,
“id”, which is the unique number, specific to an input object.
Let <path> be a sequence of three JSON objects:
{"customer" : 100, "region" : "AFRICA"},
{"region" : "ASIA"},
{"customer" : 300, "region" : "AFRICA", "comment" : null}
<path>.keyvalue() --> {"name" : "customer", "value" : 100, "id" : 0},
                      {"name" : "region", "value" : "AFRICA", "id" : 0},
                      {"name" : "region", "value" : "ASIA", "id" : 1},
                      {"name" : "customer", "value" : 300, "id" : 2},
                      {"name" : "region", "value" : "AFRICA", "id" : 2},
                      {"name" : "comment", "value" : null, "id" : 2}
It is required that all items in the input sequence are JSON objects.
The order of the returned values follows the order of the original JSON
objects. However, within objects, the order of returned entries is arbitrary.
type()#
Returns a textual value containing the type name for every item in the
sequence.
<path>.type()
This method does not perform array unwrapping in the lax mode.
The returned values are:
"null" for JSON null,
"number" for a numeric item,
"string" for a textual item,
"boolean" for a boolean item,
"date" for an item of type date,
"time without time zone" for an item of type time,
"time with time zone" for an item of type time with time zone,
"timestamp without time zone" for an item of type timestamp,
"timestamp with time zone" for an item of type timestamp with time zone,
"array" for JSON array,
"object" for JSON object,
size()#
Returns a numeric value containing the size for every JSON array in the
sequence.
<path>.size()
This method does not perform array unwrapping in the lax mode. Instead, all
non-array items are wrapped in singleton JSON arrays, so their size is 1.
It is required that all items in the input sequence are JSON arrays.
Let <path> return a sequence of three JSON arrays:
[0, 1, 2], ["a", "b", "c", "d"], [null, null]
<path>.size() --> 3, 4, 2
Limitations#
The SQL standard describes the datetime() JSON path item method and the
like_regex() JSON path predicate. Trino does not support them.
JSON path modes#
The JSON path expression can be evaluated in two modes: strict and lax. In the
strict mode, it is required that the input JSON data strictly fits the schema
required by the path expression. In the lax mode, the input JSON data can
diverge from the expected schema.
The following table shows the differences between the two modes.
Performing an operation which requires a non-array on an array, e.g.:
$.key requires a JSON object
$.floor() requires a numeric value
ERROR
The array is automatically unnested, and the operation is performed on
each array element.
Performing an operation which requires an array on an non-array, e.g.:
$[0], $[*], $.size()
ERROR
The non-array item is automatically wrapped in a singleton array, and
the operation is performed on the array.
A structural error: accessing a non-existent element of an array or a
non-existent member of a JSON object, e.g.:
$[-1] (array index out of bounds)
$.key, where the input JSON object does not have a member key
ERROR
The error is suppressed, and the operation results in an empty sequence.
Examples of the lax mode behavior#
Let <path> return a sequence of three items, a JSON array, a JSON object,
and a scalar numeric value:
[1, "a", null], {"key1" : 1.0, "key2" : true}, -2e3
The following example shows the wildcard array accessor in the lax mode. The
JSON array returns all its elements, while the JSON object and the number are
wrapped in singleton arrays and then unnested, so effectively they appear
unchanged in the output sequence:
<path>[*] --> 1, "a", null, {"key1" : 1.0, "key2" : true}, -2e3
When calling the size() method, the JSON object and the number are also
wrapped in singleton arrays:
<path>.size() --> 3, 1, 1
In some cases, the lax mode cannot prevent failure. In the following example,
even though the JSON array is unwrapped prior to calling the floor()
method, the item "a" causes type mismatch.
<path>.floor() --> ERROR
json_exists#
The json_exists function determines whether a JSON value satisfies a JSON
path specification.
JSON_EXISTS(
    json_input [ FORMAT JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ] ],
    json_path
    [ PASSING json_argument [, ...] ]
    [ { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ]
The json_path is evaluated using the json_input as the context variable
($), and the passed arguments as the named variables ($variable_name).
The returned value is true if the path returns a non-empty sequence, and
false if the path returns an empty sequence. If an error occurs, the
returned value depends on the ON ERROR clause. The default value returned
ON ERROR is FALSE. The ON ERROR clause is applied for the following
kinds of errors:
Input conversion errors, such as malformed JSON
JSON path evaluation errors, e.g. division by zero
json_input is a character string or a binary string. It should contain
a single JSON item. For a binary string, you can specify encoding.
json_path is a string literal, containing the path mode specification, and
the path expression, following the syntax rules described in
JSON path syntax and semantics.
'strict ($.price + $.tax)?(@ > 99.9)'
'lax $[0 to 1].floor()?(@ > 10)'
In the PASSING clause you can pass arbitrary expressions to be used by the
path expression.
PASSING orders.totalprice AS O_PRICE,
        orders.tax % 10 AS O_TAX
The passed parameters can be referenced in the path expression by named
variables, prefixed with $.
'lax $?(@.price > $O_PRICE || @.tax > $O_TAX)'
Additionally to SQL values, you can pass JSON values, specifying the format and
optional encoding:
PASSING orders.json_desc FORMAT JSON AS o_desc,
        orders.binary_record FORMAT JSON ENCODING UTF16 AS o_rec
Note that the JSON path language is case-sensitive, while the unquoted SQL
identifiers are upper-cased. Therefore, it is recommended to use quoted
identifiers in the PASSING clause:
'lax $.$KeyName' PASSING nation.name AS KeyName --> ERROR; no passed value found
'lax $.$KeyName' PASSING nation.name AS "KeyName" --> correct
Examples#
Let customers be a table containing two columns: id:bigint,
description:varchar.
description
‘{“comment” : “nice”, “children” : [10, 13, 16]}’
‘{“comment” : “problematic”, “children” : [8, 11]}’
‘{“comment” : “knows best”, “children” : [2]}’
The following query checks which customers have children above the age of 10:
SELECT
      json_exists(
                  description,
                  'lax $.children[*]?(@ > 10)'
                 ) AS children_above_ten
FROM customers
In the following query, the path mode is strict. We check the third child for
each customer. This should cause a structural error for the customers who do
not have three or more children. This error is handled according to the ON ERROR clause.
SELECT
      json_exists(
                  description,
                  'strict $.children[2]?(@ > 10)'
                  UNKNOWN ON ERROR
                 ) AS child_3_above_ten
FROM customers
The json_query function extracts a JSON value from a JSON value.
JSON_QUERY(
    json_input [ FORMAT JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ] ],
    json_path
    [ PASSING json_argument [, ...] ]
    [ RETURNING type [ FORMAT JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ] ] ]
    [ WITHOUT [ ARRAY ] WRAPPER |
      WITH [ { CONDITIONAL | UNCONDITIONAL } ] [ ARRAY ] WRAPPER ]
    [ { KEEP | OMIT } QUOTES [ ON SCALAR STRING ] ]
    [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON EMPTY ]
    [ { ERROR | NULL | EMPTY ARRAY | EMPTY OBJECT } ON ERROR ]
The constant string json_path is evaluated using the json_input as the
context variable ($), and the passed arguments as the named variables
($variable_name).
The returned value is a JSON item returned by the path. By default, it is
represented as a character string (varchar). In the RETURNING clause,
you can specify other character string type or varbinary. With
varbinary, you can also specify the desired encoding.
json_input is a character string or a binary string. It should contain
a single JSON item. For a binary string, you can specify encoding.
json_path is a string literal, containing the path mode specification, and
the path expression, following the syntax rules described in
JSON path syntax and semantics.
'strict $.keyvalue()?(@.name == $cust_id)'
'lax $[5 to last]'
In the PASSING clause you can pass arbitrary expressions to be used by the
path expression.
PASSING orders.custkey AS CUST_ID
The passed parameters can be referenced in the path expression by named
variables, prefixed with $.
'strict $.keyvalue()?(@.value == $CUST_ID)'
Additionally to SQL values, you can pass JSON values, specifying the format and
optional encoding:
PASSING orders.json_desc FORMAT JSON AS o_desc,
        orders.binary_record FORMAT JSON ENCODING UTF16 AS o_rec
Note that the JSON path language is case-sensitive, while the unquoted SQL
identifiers are upper-cased. Therefore, it is recommended to use quoted
identifiers in the PASSING clause:
'lax $.$KeyName' PASSING nation.name AS KeyName --> ERROR; no passed value found
'lax $.$KeyName' PASSING nation.name AS "KeyName" --> correct
The ARRAY WRAPPER clause lets you modify the output by wrapping the results
in a JSON array. WITHOUT ARRAY WRAPPER is the default option. WITH CONDITIONAL ARRAY WRAPPER wraps every result which is not a singleton JSON
array or JSON object. WITH UNCONDITIONAL ARRAY WRAPPER wraps every result.
The QUOTES clause lets you modify the result for a scalar string by
removing the double quotes being part of the JSON string representation.
Examples#
Let customers be a table containing two columns: id:bigint,
description:varchar.
description
‘{“comment” : “nice”, “children” : [10, 13, 16]}’
‘{“comment” : “problematic”, “children” : [8, 11]}’
‘{“comment” : “knows best”, “children” : [2]}’
The following query gets the children array for each customer:
SELECT
      json_query(
                 description,
                 'lax $.children'
                ) AS children
FROM customers
The following query gets the collection of children for each customer.
Note that the json_query function can only output a single JSON item. If
you don’t use array wrapper, you get an error for every customer with multiple
children. The error is handled according to the ON ERROR clause.
SELECT
      json_query(
                 description,
                 'lax $.children[*]'
                 WITHOUT ARRAY WRAPPER
                 NULL ON ERROR
                ) AS children
FROM customers
The following query gets all children above the age of 12 for each customer,
wrapped in a JSON array. The second and the third customer don’t have children
of this age. Such case is handled according to the ON EMPTY clause. The
default value returned ON EMPTY is NULL. In the following example,
EMPTY ARRAY ON EMPTY is specified.
SELECT
      json_query(
                 description,
                 'strict $.children[*]?(@ > 12)'
                 WITH ARRAY WRAPPER
                 EMPTY ARRAY ON EMPTY
                ) AS children
FROM customers
The following query shows the result of the QUOTES clause. Note that KEEP QUOTES is the default.
SELECT
      json_query(description, 'strict $.comment' KEEP QUOTES) AS quoted_comment,
      json_query(description, 'strict $.comment' OMIT QUOTES) AS unquoted_comment
FROM customers
If an error occurs, the returned value depends on the



        

        

    
        推荐文章
    

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            豁达的板栗
                             · 
                            Passing all values of an array into a single x-www-form-urlencoded body - How To - Make Community
                        
                        
                        1 月前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            低调的核桃
                             · 
                            Offsets array in a unstructured grid (VTU) - ParaView Support - ParaView
                        
                        
                        1 月前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            鼻子大的人字拖
                             · 
                            How to Prevent Out of Bounds Read
                        
                        
                        1 月前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            安静的手套
                             · 
                            Postman flow - dealing with large arrays - out of memory - Ask the Experts and Postman Tips - Postma
                        
                        
                        1 月前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            闷骚的树叶
                             · 
                            How Fast is .NET Garbage Collector? Part 2- CodeProject
                        
                        
                        3 周前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            睿智的楼房
                             · 
                            什么是用户体验设计？-即时设计
                        
                        
                        3 月前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            魁梧的小刀
                             · 
                            省十三届人大二次会议《关于解决沈阳西部与主城间交通不畅问题的建议》(1501号)答复-省人大十三届二次会议2019-沈阳市人民政府
                        
                        
                        3 月前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            买醉的草稿本
                             · 
                            Fluid -24- Leancloud 失效解决方案 —— 自建站点 PV UV 统计-腾讯云开发者社区-腾讯云
                        
                        
                        6 月前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            发财的楼房
                             · 
                            Volvo FE。 您的多功能城市貨車。 | Volvo Trucks
                        
                        
                        8 月前
                        
                            
                        
                    
                
                
            
        

    

        

            
                
                

                    
                        
                    
                    
                    
                        
                            温柔的酱牛肉
                             · 
                            与《源氏物语》合为“平安文学双璧”，村上春树、川端康成一生为之着迷_日本
                        
                        
                        8 月前