Please use the form below to provide your feedback. Because your feedback is valuable to us,
the information you submit in this form is recorded in our issue tracking system (JIRA), which is publicly available.
You can track the status of your feedback using the ticket number displayed in the dialog once you submit the form.
The
CREATE INDEX
statement allows you to create a secondary index.
Secondary indexes contain a filtered or a full set of keys in a given keyspace.
Secondary indexes are optional but increase query efficiency on a keyspace.
CREATE INDEX
allows you to make multiple concurrent index creation requests.
The command starts a task to create the index definition in the background.
If there is an index creation task already running, the Index Service queues the incoming index creation request.
CREATE INDEX
returns as soon as the index creation phase is complete.
By default, when the index creation phase is complete, the Index Service triggers the index build phase.
If you lose connectivity, the index build operation continues in the background.
You can defer the index build phase using the
defer_build
clause.
In deferred build mode,
CREATE INDEX
creates the index definition, but does not trigger the index build phase.
You can then build the index using the
BUILD INDEX
command.
You can create multiple identical secondary indexes on a keyspace and place them on separate nodes for better index availability.
The recommended way to do this is using the
num_replica
option.
RBAC Privileges
User executing the CREATE INDEX statement must have the
Query Manage Index
privilege granted on the keyspace.
For more details about user roles, see
Authorization
.
create-index ::= 'CREATE' 'INDEX' index-name ( 'IF' 'NOT' 'EXISTS' )? 'ON' keyspace-ref
'(' index-key lead-key-attribs? ( ( ',' index-key key-attribs? )+ )? ')'
index-partition? where-clause? index-using? index-with?
[Required] A unique name that identifies the index.
Valid GSI index names can contain any of the following characters:
A-Z
a-z
0-9
#
_
, and must start with a letter, [
A-Z
a-z
].
The minimum length of an index name is 1 character and there is no maximum length set for an index name.
When querying, if the index name contains a
#
or
_
character, you must enclose the index name within backticks.
[Required] Specifies the keyspace where the index is created.
Refer to
Keyspace Reference
below.
index-key
[Required] Specifies an index key.
Refer to
Index Key
below.
lead-key-attribs
[Optional] Specifies attributes for the leading index key.
Refer to
Index Key Attributes
below.
key-attribs
[Optional] Specifies attributes for a non-leading index key.
Refer to
Index Key Attributes
below.
index-partition
[Optional] Specifies index partitions.
Refer to
PARTITION BY HASH Clause
below.
where-clause
[Optional] Specifies filters for a partial index.
Refer to
WHERE Clause
below.
index-using
[Optional] Specifies the index type.
Refer to
USING Clause
below.
index-with
[Optional] Specifies options for the index.
Refer to
WITH Clause
below.
The optional
IF NOT EXISTS
clause enables the statement to complete successfully when the specified index already exists.
If an index with the same name already exists within the specified keyspace, then:
If there is a hyphen (-) inside any part of the keyspace reference, you must wrap that part of the keyspace reference in backticks (` `).
Refer to the examples below.
If the keyspace is a named collection, or the default collection in the default scope within a bucket, the keyspace reference may be a keyspace path.
In this case, the
query context
should not be set.
namespace
(Optional) An
identifier
that refers to the
namespace
of the keyspace.
Currently, only the
default
namespace is available.
If the namespace name is omitted, the default namespace in the current session is used.
bucket
(Required) An
identifier
that refers to the
bucket name
of the keyspace.
scope
(Optional) An
identifier
that refers to the
scope name
of the keyspace.
If omitted, the bucket’s default scope is used.
collection
(Optional) An
identifier
that refers to the
collection name
of the keyspace.
If omitted, the default collection in the bucket’s default scope is used.
For example,
default:`travel-sample`
indicates the default collection in the default scope in the
travel-sample
bucket in the
default
namespace.
Similarly,
default:`travel-sample`.inventory.airline
indicates the
airline
collection in the
inventory
scope in the
travel-sample
bucket in the
default
namespace.
Alternatively, if the keyspace is a named collection, the keyspace reference may be just the collection name with no path.
In this case, you must set the
query context
to indicate the required namespace, bucket, and scope.
collection
(Required) An
identifier
that refers to the
collection name
of the keyspace.
Refers to an attribute name or a scalar function or an ARRAY expression on the attribute.
This constitutes an index-key for the index.
A SQL++
expression
over any fields in the document.
This cannot use constant expressions, aggregate functions, or sub-queries.
array-expr
An array expression.
Array indexing enables you to create global indexes on array elements and optimize the execution of queries involving array elements.
For details, refer to
Array Indexing
.
The optional
INCLUDE MISSING
clause ensures that documents which do not include the index key field are indexed regardless.
If this clause is not present, then documents without the index key field are not indexed.
The
INCLUDE MISSING
clause can only be applied to the leading index key.
The
INCLUDE MISSING
clause may be included before or after the
ASC
or
DESC
keyword.
Used to partition the index.
Index partitioning helps increase the query performance by dividing and spreading a large index of documents across multiple nodes, horizontally scaling out an index as needed.
For details, refer to
Index Partitioning
.
where-clause ::= 'WHERE' cond
[Optional] An array of strings, each of which represents a node name.
You can specify multiple nodes to distribute replicas of an index across nodes running the indexing service: for example,
WITH {"nodes": ["node1:8091", "node2:8091", "node3:8091"]}
.
For details and examples, refer to
Index Replication
.
If specifying both
nodes
and
num_replica
, the number of nodes in the array must be one greater than the specified number of replicas otherwise the index creation will fail.
If
nodes
is not specified, then the system chooses nodes on which to place the new index and any replicas, in order to achieve the best resource utilization across nodes running the indexing service.
This is done by taking into account the current resource usage statistics of index nodes.
A node name passed to the
nodes
property must include the cluster administration port, by default 8091.
For example
WITH {"nodes": ["192.0.2.0:8091"]}
instead of
WITH {"nodes": ["192.0.2.0"]}
.
When set to
true
, the
CREATE INDEX
operation queues the task for building the index but immediately pauses the building of the index of type GSI.
Index building requires an expensive scan operation.
Deferring building of the index with multiple indexes can optimize the expensive scan operation.
Admins can defer building multiple indexes and, using the
BUILD INDEX
statement, multiple indexes to be built efficiently with one efficient scan of bucket data.
false
When set to
false
, the
CREATE INDEX
operation queues the task for building the index and immediately kicks off the building of the index of type GSI.
The indexer will automatically distribute these replicas amongst index nodes in the cluster for load-balancing and high availability purposes.
The indexer will attempt to distribute the replicas based on the server groups in use in the cluster where possible.
If the value of this property is not less than the number of index nodes in the cluster, then the index creation will fail.
Index metadata provides a state field.
This state field and other index metadata can be queried using
system:indexes
.
The index state may be
scheduled for creation
,
deferred
,
building
,
pending
,
online
,
offline
, or
abridged
.
You can also monitor the index state using the Couchbase Web Console.
If you kick off multiple index creation operations concurrently, you may sometimes see transient errors similar to the following.
If this error occurs, the Index Service tries to run the failed operation again in the background until it succeeds, up to a maximum of 1000 retries.
"code": 5000,
"msg": "GSI CreateIndex() - cause: Encountered transient error. Index creation will be retried in background. Error: Index ... will retry building in the background for reason: Build Already In Progress. Keyspace ...",
"query": "..."
If the Index Service still cannot create the index after the maximum number of retries, the index state is marked as
offline
.
You must drop the failed index using the
DROP INDEX
command.
If you have an index on a simple expression, such as
geo.alt
, you can use that index to satisfy a query on an
aggregate
of that expression, such as
MIN(geo.alt)
or
MAX(geo.alt)
.
For details and examples, refer to
Operator Pushdowns
.
In the
Indexes screen in the Couchbase Web Console
, index replicas are marked with their replica ID.
If you select
view by server node
from the drop-down menu, you can see the server node where each index and index replica is placed.
You can also query the
system:indexes
catalog to find the ID of an index replica and see which node it is placed on.
By default, index replicas are used to serve index scans.
The system automatically load-balances an index scan across the index and all its replicas.
Adding index replicas enables you to scale scan throughput, in addition to providing high availability.
To try the examples in this section, you must set the query context as described in each example.
Example 1. Create an index in the default scope and collection
For this example, unset the query context.
For more information, see
Query Context
.
Create a secondary index that contains airports with an
alt
value greater than 1000 on the node
127.0.0.1
.
CREATE INDEX idx_default_over1000
ON `travel-sample`(geo.alt)
WHERE geo.alt > 1000
USING GSI
WITH {"nodes": ["127.0.0.1:8091"]};
For this example, the path to the required keyspace is specified by the query, so you do not need to set the query context.
Create a secondary index that contains airports with an
alt
value greater than 1000 on the node
127.0.0.1
.
CREATE INDEX idx_airport_over1000
ON `travel-sample`.inventory.airport(geo.alt)
WHERE geo.alt > 1000
USING GSI
WITH {"nodes": ["127.0.0.1:8091"]};
For this example, set the query context to the
inventory
scope in the travel sample dataset.
For more information, see
Query Context
.
Create a secondary index with the
defer_build
option.
CREATE INDEX idx_landmark_country
ON landmark(country)
USING GSI
WITH {"defer_build":true};
Query
system:indexes
for the status of the index.
SELECT * FROM system:indexes WHERE name="idx_landmark_country";
Results
"indexes": {
"bucket_id": "travel-sample",
"datastore_id": "http://127.0.0.1:8091",
"id": "d079aec40eb0c6cc",
"index_key": [
"`country`"
"keyspace_id": "landmark",
"name": "idx_landmark_country",
"namespace_id": "default",
"scope_id": "inventory",
"state": "deferred",
(1)
"using": "gsi"
For this example, set the query context to the
inventory
scope in the travel sample dataset.
For more information, see
Query Context
.
Kick off a deferred build using the index name.
BUILD INDEX ON landmark(idx_landmark_country) USING GSI;
Query
system:indexes
for the status of the index.
SELECT * FROM system:indexes WHERE name="idx_landmark_country";
Results
"indexes": {
"bucket_id": "travel-sample",
"datastore_id": "http://127.0.0.1:8091",
"id": "d079aec40eb0c6cc",
"index_key": [
"`country`"
"keyspace_id": "landmark",
"name": "idx_landmark_country",
"namespace_id": "default",
"scope_id": "inventory",
"state": "online",
(1)
"using": "gsi"
For this example, set the query context to the
inventory
scope in the travel sample dataset.
For more information, see
Query Context
.
The following statement will not index airports where the
district
field is missing.
CREATE INDEX idx_airport_missing
ON airport(district, name);
The following statement will index all airports, even if the
district
field is not included in the document.
CREATE INDEX idx_airport_include
ON airport(district INCLUDE MISSING, name);