This topic introduces schema in Milvus. Schema is used to define the properties of a collection and the fields within.
A field schema is the logical definition of a field. It is the first thing you need to define before defining a collection schema and managing collections.
Milvus supports only one primary key field in a collection.
To simplify data inserts, Milvus allows you to specify a default value for each scalar field, except the primary key field, when you create the field schema. If you leave such a field empty when inserting data, the default value you specified for it applies.
Create a regular field schema:
from pymilvus import FieldSchema, DataType
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)
Create a field schema with default field values:
from pymilvus import FieldSchema, DataType
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    # Rows inserted without an "age" value get the default value 25.
    FieldSchema(name="age", dtype=DataType.INT64, default_value=25, description="age"),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector"),
]
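The effect of a default value can be pictured with a small sketch that runs without a Milvus server. It only mimics the documented behavior (an empty field receives the configured default) in plain Python. The names `apply_defaults` and `DEFAULTS` are hypothetical and are not part of pymilvus.

```python
# Illustrative only: mimics the documented effect of default_value in plain Python.
# `apply_defaults` and `DEFAULTS` are hypothetical names, not pymilvus APIs.

DEFAULTS = {"age": 25}  # mirrors default_value=25 on the "age" field


def apply_defaults(row: dict, defaults: dict) -> dict:
    """Return a copy of `row` with missing or None fields filled from `defaults`."""
    filled = dict(row)
    for field, value in defaults.items():
        if filled.get(field) is None:
            filled[field] = value
    return filled


row_without_age = {"id": 1, "embedding": [0.0] * 128}
row_with_age = {"id": 2, "age": 31, "embedding": [0.0] * 128}

print(apply_defaults(row_without_age, DEFAULTS)["age"])  # 25
print(apply_defaults(row_with_age, DEFAULTS)["age"])     # 31
```

The helper returns a copy, so the caller's row is left unchanged. An explicitly supplied value always takes precedence over the default.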
DataType defines the kind of data a field contains. Different fields support different data types.
Primary key field supports:
INT64: numpy.int64
VARCHAR: VARCHAR
Scalar field supports:
BOOL: Boolean (true or false)
INT8: numpy.int8
INT16: numpy.int16
INT32: numpy.int32
INT64: numpy.int64
FLOAT: numpy.float32
DOUBLE: numpy.double
VARCHAR: VARCHAR
JSON: JSON
ARRAY: Array
JSON is available as a composite data type. A JSON field comprises key-value pairs. Each key is a string, and each value can be a number, string, Boolean value, array, or list. For details, refer to JSON: a new data type.
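As a rough illustration of the shape described above, a JSON field value maps naturally to a Python dict with string keys. The sketch below checks that shape using only the standard library. The helper `is_json_field_value` is hypothetical and is not a pymilvus API, and the sample keys are made up for the example.

```python
import json


# Illustrative only: `is_json_field_value` is a hypothetical helper, not a pymilvus API.
def is_json_field_value(value) -> bool:
    """True if `value` is a dict with string keys that serializes to JSON."""
    if not isinstance(value, dict):
        return False
    if not all(isinstance(key, str) for key in value):
        return False
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


# Sample value with a string, a number, a Boolean, and an array.
article_meta = {"title": "Schema basics", "views": 1024, "published": True, "tags": ["milvus", "schema"]}

print(is_json_field_value(article_meta))           # True
print(is_json_field_value({1: "non-string key"}))  # False
print(is_json_field_value({"when": {1, 2}}))       # False: a set is not JSON-serializable
```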
Vector field supports:
BINARY_VECTOR: Stores binary data as a sequence of 0s and 1s, used for compact feature representation in image processing and information retrieval.
FLOAT_VECTOR: Stores 32-bit floating-point numbers, commonly used in scientific computing and machine learning for representing real numbers.
FLOAT16_VECTOR: Stores 16-bit half-precision floating-point numbers, used in deep learning and GPU computations for memory and bandwidth efficiency.
BFLOAT16_VECTOR: Stores 16-bit floating-point numbers with reduced precision but the same exponent range as Float32, popular in deep learning for reducing memory and computational requirements without significantly impacting accuracy.
SPARSE_FLOAT_VECTOR: Stores a list of non-zero elements and their corresponding indices, used for representing sparse vectors. For more information, refer to Sparse Vectors.
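To make the trade-offs between these types concrete, the sketch below computes the raw payload size of one vector from the element widths listed above (1, 32, 16, and 16 bits) and shows one way to picture a sparse vector as a mapping from index to non-zero value. The helpers `dense_vector_bytes` and `to_sparse` are hypothetical names used for illustration. The sizes exclude any index or metadata overhead.

```python
# Illustrative only: `dense_vector_bytes` and `to_sparse` are hypothetical helpers,
# not pymilvus APIs.

# Bits used per element, taken from the type descriptions above.
BITS_PER_ELEMENT = {
    "BINARY_VECTOR": 1,
    "FLOAT_VECTOR": 32,
    "FLOAT16_VECTOR": 16,
    "BFLOAT16_VECTOR": 16,
}


def dense_vector_bytes(dtype: str, dim: int) -> int:
    """Raw payload size in bytes of one vector of the given type and dimension."""
    bits = BITS_PER_ELEMENT[dtype] * dim
    if bits % 8:
        # Only reachable for binary vectors, which are packed 8 dimensions per byte.
        raise ValueError("binary vector dimension must be a multiple of 8")
    return bits // 8


def to_sparse(dense: list) -> dict:
    """Keep only the non-zero elements, keyed by their index."""
    return {index: value for index, value in enumerate(dense) if value != 0}


for name in BITS_PER_ELEMENT:
    print(name, dense_vector_bytes(name, 128))

print(to_sparse([0.0, 0.5, 0.0, 0.0, 1.25]))  # {1: 0.5, 4: 1.25}
```

At 128 dimensions this gives 16 bytes for a binary vector, 512 bytes for a float vector, and 256 bytes for either 16-bit type.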
Milvus supports multiple vector fields in a collection. For more information, refer to Hybrid Search.
A collection schema is the logical definition of a collection. Usually you need to define the field schema before defining a collection schema and managing collections.
Field schema properties:
| Properties | Description | Note |
| --- | --- | --- |
| is_primary | Whether to set the field as the primary key field or not | Data type: Boolean (true or false). Mandatory for the primary key field |
| auto_id (Mandatory for primary key field) | Switch to enable or disable automatic ID (primary key) allocation. | True or False |
| max_length (Mandatory for VARCHAR field) | Maximum length of strings allowed to be inserted. | [1, 65535] |
| dim | Dimension of the vector | Data type: Integer ∈ [1, 32768]. Mandatory for a dense vector field. Omit for a sparse vector field. |
| is_partition_key | Whether this field is a partition-key field. | Data type: Boolean (true or false). |
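The constraints in the field schema properties can be expressed as a small pre-flight check. The sketch below is not how pymilvus validates a schema. `FieldSpec` and `check_field` are hypothetical names, and the rules are limited to those stated on this page (primary key types, the `max_length` range, and the `dim` range).

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: `FieldSpec` and `check_field` are hypothetical, not pymilvus classes.
DENSE_VECTOR_TYPES = {"BINARY_VECTOR", "FLOAT_VECTOR", "FLOAT16_VECTOR", "BFLOAT16_VECTOR"}
PRIMARY_KEY_TYPES = {"INT64", "VARCHAR"}


@dataclass
class FieldSpec:
    name: str
    dtype: str
    is_primary: bool = False
    max_length: Optional[int] = None
    dim: Optional[int] = None


def check_field(field: FieldSpec) -> list:
    """Return a list of problems found; an empty list means the field passes."""
    problems = []
    if field.is_primary and field.dtype not in PRIMARY_KEY_TYPES:
        problems.append(f"{field.name}: primary key must be INT64 or VARCHAR")
    if field.dtype == "VARCHAR":
        if field.max_length is None or not 1 <= field.max_length <= 65535:
            problems.append(f"{field.name}: max_length must be in [1, 65535]")
    if field.dtype in DENSE_VECTOR_TYPES:
        if field.dim is None or not 1 <= field.dim <= 32768:
            problems.append(f"{field.name}: dim must be in [1, 32768]")
    if field.dtype == "SPARSE_FLOAT_VECTOR" and field.dim is not None:
        problems.append(f"{field.name}: omit dim for a sparse vector field")
    return problems


print(check_field(FieldSpec("id", "INT64", is_primary=True)))  # []
print(check_field(FieldSpec("position", "VARCHAR")))           # max_length is missing
print(check_field(FieldSpec("embedding", "FLOAT_VECTOR", dim=40000)))  # dim out of range
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a schema at once.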
Define the field schemas before defining a collection schema.
from pymilvus import FieldSchema, CollectionSchema, DataType
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)
# position_field is included so that its is_partition_key setting takes effect.
schema = CollectionSchema(fields=[id_field, age_field, embedding_field, position_field], auto_id=False, enable_dynamic_field=True, description="desc of a collection")
Create a collection with the schema specified:
from pymilvus import Collection
collection_name1 = "tutorial_1"
collection1 = Collection(name=collection_name1, schema=schema, using='default', shards_num=2)
You can define the shard number with shards_num.
You can define the Milvus server on which you wish to create a collection by specifying the alias in using.
You can enable the partition key feature on a field by setting is_partition_key to True on the field if you need to implement partition-key-based multi-tenancy.
You can enable dynamic schema by setting enable_dynamic_field to True in the collection schema.
You can also create a collection with Collection.construct_from_dataframe, which automatically generates a collection schema from a pandas DataFrame and creates a collection.
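import random
import pandas as pd
from pymilvus import Collection

nb, dim = 3000, 128  # example row count and vector dimension (assumed values)
df = pd.DataFrame({
    "id": [i for i in range(nb)],
    "age": [random.randint(20, 40) for i in range(nb)],
    "embedding": [[random.random() for _ in range(dim)] for _ in range(nb)],
    "position": "test_pos"
})
# Generate the schema from the DataFrame and create the collection.
collection, ins_res = Collection.construct_from_dataframe(
    'my_collection',
    df,
    primary_field='id',
    auto_id=False
)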
Collection schema properties:
| Properties | Description | Note |
| --- | --- | --- |
| partition_key_field | Name of a field that is designed to act as the partition key. | Data type: String. Optional |
| enable_dynamic_field | Whether to enable dynamic schema or not | Data type: Boolean (true or false). Optional, defaults to False. For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections. |
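One way to picture what enable_dynamic_field changes is to separate the keys of an incoming row into those declared in the schema and those that are not. The sketch below models only that split in plain Python, assuming that undeclared keys are rejected when the option is off and kept alongside the declared fields when it is on. `split_dynamic` and `SCHEMA_FIELDS` are hypothetical names, and this is not Milvus's implementation.

```python
# Illustrative only: `split_dynamic` and `SCHEMA_FIELDS` are hypothetical names.
# Assumption: undeclared keys are rejected when dynamic schema is disabled.

SCHEMA_FIELDS = {"id", "age", "embedding"}  # fields declared in the collection schema


def split_dynamic(row: dict, schema_fields: set, enable_dynamic_field: bool):
    """Split `row` into (declared fields, extra key-value pairs)."""
    declared = {k: v for k, v in row.items() if k in schema_fields}
    extras = {k: v for k, v in row.items() if k not in schema_fields}
    if extras and not enable_dynamic_field:
        raise ValueError(f"fields not in schema: {sorted(extras)}")
    return declared, extras


row = {"id": 1, "age": 30, "embedding": [0.1, 0.2], "color": "red"}

declared, extras = split_dynamic(row, SCHEMA_FIELDS, enable_dynamic_field=True)
print(sorted(declared))  # ['age', 'embedding', 'id']
print(extras)            # {'color': 'red'}
```

With `enable_dynamic_field=False`, the same row raises a `ValueError` in this sketch because `color` is not a declared field.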