Functions for Splitting Strings
splitByChar
Splits a string into substrings separated by a specified character. Uses a constant string
separator
which consists of exactly one character.
Returns an array of selected substrings. Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.
Syntax
splitByChar(separator, s[, max_substrings]))
Arguments
-
separator
— The separator which should contain exactly one character. String . -
s
— The string to split. String . -
max_substrings
— An optionalInt64
defaulting to 0. Ifmax_substrings
> 0, the returned array will contain at mostmax_substrings
substrings, otherwise the function will return as many substrings as possible.
Returned value(s)
-
An array of selected substrings. Array ( String ).
Empty substrings may be selected when:
-
A separator occurs at the beginning or end of the string;
-
There are multiple consecutive separators;
-
The original string
s
is empty.
The behavior of parameter
max_substrings
changed starting with ClickHouse v22.11. In versions older than that,
max_substrings > 0
meant that
max_substring
-many splits were performed and that the remainder of the string was returned as the final element of the list.
For example,
-
in v22.10:
SELECT splitByChar('=', 'a=b=c=d', 2);
returned['a','b','c=d']
-
in v22.11:
SELECT splitByChar('=', 'a=b=c=d', 2);
returned['a','b']
A behavior similar to ClickHouse pre-v22.11 can be achieved by setting
splitby_max_substrings_includes_remaining_string
SELECT splitByChar('=', 'a=b=c=d', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1 -- ['a', 'b=c=d']
Example
SELECT splitByChar(',', '1,2,3,abcde');
Result:
┌─splitByChar(',', '1,2,3,abcde')─┐
│ ['1','2','3','abcde'] │
└─────────────────────────────────┘
splitByString
Splits a string into substrings separated by a string. It uses a constant string
separator
of multiple characters as the separator. If the string
separator
is empty, it will split the string
s
into an array of single characters.
Syntax
splitByString(separator, s[, max_substrings]))
Arguments
-
separator
— The separator. String . -
s
— The string to split. String . -
max_substrings
— An optionalInt64
defaulting to 0. Whenmax_substrings
> 0, the returned substrings will be no more thanmax_substrings
, otherwise the function will return as many substrings as possible.
Returned value(s)
Empty substrings may be selected when:
- A non-empty separator occurs at the beginning or end of the string;
- There are multiple consecutive non-empty separators;
-
The original string
s
is empty while the separator is not empty.
Setting
splitby_max_substrings_includes_remaining_string
(default: 0) controls if the remaining string is included in the last element of the result array when argument
max_substrings
> 0.
Example
SELECT splitByString(', ', '1, 2 3, 4,5, abcde');
Result:
┌─splitByString(', ', '1, 2 3, 4,5, abcde')─┐
│ ['1','2 3','4,5','abcde'] │
└───────────────────────────────────────────┘
SELECT splitByString('', 'abcde');
Result:
┌─splitByString('', 'abcde')─┐
│ ['a','b','c','d','e'] │
└────────────────────────────┘
splitByRegexp
Splits a string into substrings separated by a regular expression. It uses a regular expression string
regexp
as the separator. If the
regexp
is empty, it will split the string
s
into an array of single characters. If no match is found for this regular expression, the string
s
won't be split.
Syntax
splitByRegexp(regexp, s[, max_substrings]))
Arguments
-
regexp
— Regular expression. Constant. String or FixedString . -
s
— The string to split. String . -
max_substrings
— An optionalInt64
defaulting to 0. Whenmax_substrings
> 0, the returned substrings will be no more thanmax_substrings
, otherwise the function will return as many substrings as possible.
Returned value(s)
Empty substrings may be selected when:
- A non-empty regular expression match occurs at the beginning or end of the string;
- There are multiple consecutive non-empty regular expression matches;
-
The original string
s
is empty while the regular expression is not empty.
Setting
splitby_max_substrings_includes_remaining_string
(default: 0) controls if the remaining string is included in the last element of the result array when argument
max_substrings
> 0.
Example
SELECT splitByRegexp('\\d+', 'a12bc23de345f');
Result:
┌─splitByRegexp('\\d+', 'a12bc23de345f')─┐
│ ['a','bc','de','f'] │
└────────────────────────────────────────┘
SELECT splitByRegexp('', 'abcde');
Result:
┌─splitByRegexp('', 'abcde')─┐
│ ['a','b','c','d','e'] │
└────────────────────────────┘
splitByWhitespace
Splits a string into substrings separated by whitespace characters. Returns an array of selected substrings.
Syntax
splitByWhitespace(s[, max_substrings]))
Arguments
-
s
— The string to split. String . -
max_substrings
— An optionalInt64
defaulting to 0. Whenmax_substrings
> 0, the returned substrings will be no more thanmax_substrings
, otherwise the function will return as many substrings as possible.
Returned value(s)
Setting
splitby_max_substrings_includes_remaining_string
(default: 0) controls if the remaining string is included in the last element of the result array when argument
max_substrings
> 0.
Example
SELECT splitByWhitespace(' 1! a, b. ');
Result:
┌─splitByWhitespace(' 1! a, b. ')─┐
│ ['1!','a,','b.'] │
└─────────────────────────────────────┘
splitByNonAlpha
Splits a string into substrings separated by whitespace and punctuation characters.