
This topic introduces schema in Milvus. A schema defines the properties of a collection and of the fields within it.

Field schema

A field schema is the logical definition of a field. You must define field schemas before you can define a collection schema and manage collections.

Milvus supports only one primary key field in a collection.
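
A minimal sketch of the single-primary-key rule follows. It assumes each field is modelled as a plain dict instead of a pymilvus FieldSchema. The helper `find_primary_key` is hypothetical and only illustrates the check; pymilvus and the Milvus server perform their own validation.

```python
# Hypothetical, pymilvus-free model: each field is a plain dict.
def find_primary_key(fields):
    """Return the name of the single primary key field, or raise ValueError."""
    primaries = [f["name"] for f in fields if f.get("is_primary", False)]
    if len(primaries) != 1:
        raise ValueError(
            f"expected exactly one primary key field, found {len(primaries)}: {primaries}"
        )
    return primaries[0]

fields = [
    {"name": "id", "dtype": "INT64", "is_primary": True},
    {"name": "age", "dtype": "INT64"},
    {"name": "embedding", "dtype": "FLOAT_VECTOR", "dim": 128},
]
print(find_primary_key(fields))  # id
```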


Create a field schema

To simplify data inserts, Milvus lets you specify a default value for each scalar field, except the primary key field, when you create its field schema. If you leave such a field empty when inserting data, the default value you specified for it applies.

Create a regular field schema:

from pymilvus import FieldSchema, DataType
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
# The following creates a field and uses it as the partition key
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)
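
The field properties documented for Milvus limit max_length to [1, 65,535] for VARCHAR fields. They also limit dim to [1, 32768] for dense vector fields, and dim is omitted for sparse vectors. The sketch below checks plain-dict mirrors of the fields created above against those ranges. The helper `check_field` and the dict layout are hypothetical; this is not how pymilvus validates schemas.

```python
# Hypothetical checks mirroring the documented ranges; not the pymilvus validator.
MAX_LENGTH_RANGE = (1, 65_535)  # VARCHAR max_length
DIM_RANGE = (1, 32_768)         # dense vector dim

def check_field(spec: dict) -> None:
    """Raise ValueError if a plain-dict field spec violates the max_length or dim ranges."""
    name, dtype = spec["name"], spec["dtype"]
    if dtype == "VARCHAR":
        lo, hi = MAX_LENGTH_RANGE
        if not lo <= spec.get("max_length", 0) <= hi:
            raise ValueError(f"{name}: max_length must be in [{lo}, {hi}]")
    elif dtype == "SPARSE_FLOAT_VECTOR":
        # Sparse vectors carry no fixed dimension
        if "dim" in spec:
            raise ValueError(f"{name}: omit dim for a sparse vector field")
    elif dtype.endswith("_VECTOR"):
        # Dense vector types: BINARY_, FLOAT_, FLOAT16_, BFLOAT16_VECTOR
        lo, hi = DIM_RANGE
        if not lo <= spec.get("dim", 0) <= hi:
            raise ValueError(f"{name}: dim must be in [{lo}, {hi}]")

# Plain-dict mirrors of the fields created above
specs = [
    {"name": "id", "dtype": "INT64", "is_primary": True},
    {"name": "age", "dtype": "INT64"},
    {"name": "embedding", "dtype": "FLOAT_VECTOR", "dim": 128},
    {"name": "position", "dtype": "VARCHAR", "max_length": 256, "is_partition_key": True},
]
for spec in specs:
    check_field(spec)  # passes silently
```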

Create a field schema with default field values:

from pymilvus import FieldSchema, DataType
fields = [
  FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
  # configure default value `25` for field `age`
  FieldSchema(name="age", dtype=DataType.INT64, default_value=25, description="age"),
  FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector"),
]
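
The default-value behaviour can be modelled in plain Python, as in the sketch below. It treats both a missing key and an explicit None as "empty", which is an assumption; check your Milvus version for the exact rule. The helper `fill_defaults` is hypothetical and only illustrates the semantics, not how Milvus applies defaults internally.

```python
# Hypothetical model of default-value semantics; this is not pymilvus code.
PRIMARY_KEY = "id"
DEFAULTS = {"age": 25}  # mirrors default_value=25 on the `age` field above

def fill_defaults(row, defaults=DEFAULTS, primary_key=PRIMARY_KEY):
    """Return a copy of `row` in which empty scalar fields take their default values."""
    if primary_key in defaults:
        raise ValueError("the primary key field cannot have a default value")
    filled = dict(row)
    for name, value in defaults.items():
        # Assumption: both a missing key and an explicit None count as "empty"
        if filled.get(name) is None:
            filled[name] = value
    return filled

rows = [
    {"id": 1, "age": 31, "embedding": [0.1] * 128},
    {"id": 2, "embedding": [0.2] * 128},               # age omitted
    {"id": 3, "age": None, "embedding": [0.3] * 128},  # age explicitly None
]
filled = [fill_defaults(row) for row in rows]
print([row["age"] for row in filled])  # [31, 25, 25]
```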

Supported data types

DataType defines the kind of data a field contains. Different fields support different data types.
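
The scalar integer types listed below map to fixed-width numpy integers (numpy.int8 through numpy.int64). Each therefore holds a fixed two's-complement range, so pick a type wide enough for your values. The sketch below computes those ranges from the bit widths. The helpers `int_range` and `fits` are hypothetical names used only for illustration.

```python
# Bit widths of the numpy integer types that the scalar integer DataTypes map to
INT_BITS = {"INT8": 8, "INT16": 16, "INT32": 32, "INT64": 64}

def int_range(dtype: str) -> tuple:
    """Inclusive (min, max) of a two's-complement signed integer of the given DataType."""
    bits = INT_BITS[dtype]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def fits(dtype: str, value: int) -> bool:
    """True if `value` lies within the range of `dtype`."""
    lo, hi = int_range(dtype)
    return lo <= value <= hi

print(int_range("INT8"))                      # (-128, 127)
print(fits("INT8", 200), fits("INT16", 200))  # False True
```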

  • Primary key field supports:
      ◦ INT64: numpy.int64
      ◦ VARCHAR: VARCHAR

  • Scalar field supports:
      ◦ BOOL: Boolean (true or false)
      ◦ INT8: numpy.int8
      ◦ INT16: numpy.int16
      ◦ INT32: numpy.int32
      ◦ INT64: numpy.int64
      ◦ FLOAT: numpy.float32
      ◦ DOUBLE: numpy.double
      ◦ VARCHAR: VARCHAR
      ◦ JSON: JSON
      ◦ ARRAY: Array
    JSON is a composite data type. A JSON field consists of key-value pairs. Each key is a string, and a value can be a number, string, boolean value, or array (list). For details, refer to JSON: a new data type.

  • Vector field supports:
      ◦ BINARY_VECTOR: Stores binary data as a sequence of 0s and 1s, used for compact feature representation in image processing and information retrieval.
      ◦ FLOAT_VECTOR: Stores 32-bit floating-point numbers, commonly used in scientific computing and machine learning to represent real numbers.
      ◦ FLOAT16_VECTOR: Stores 16-bit half-precision floating-point numbers, used in deep learning and GPU computations for memory and bandwidth efficiency.
      ◦ BFLOAT16_VECTOR: Stores 16-bit floating-point numbers with reduced precision but the same exponent range as Float32, popular in deep learning for reducing memory and computational requirements without significantly impacting accuracy.
      ◦ SPARSE_FLOAT_VECTOR: Stores a list of non-zero elements and their corresponding indices, used for representing sparse vectors. For more information, refer to Sparse Vectors.
    Milvus supports multiple vector fields in a collection. For more information, refer to Hybrid Search.
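
FLOAT16_VECTOR and BFLOAT16_VECTOR both use 16 bits per element, but they split those bits differently. float16 has 5 exponent bits and 10 mantissa bits. bfloat16 has 8 exponent bits, the same as float32, and 7 mantissa bits. The sketch below uses only Python's standard `struct` module, with no Milvus involved, to show the trade-off: float16 is more precise for small values, while bfloat16 can represent magnitudes that overflow float16. The bfloat16 conversion here truncates for simplicity; this is an assumption, since real converters usually round to nearest even.

```python
import struct

def to_float16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (1 sign, 5 exponent, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bfloat16(x: float) -> float:
    """Round-trip x through bfloat16 (1 sign, 8 exponent, 7 mantissa bits).

    Keeps the top 16 bits of the float32 encoding (truncation).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

third = 1 / 3
# float16 keeps more mantissa bits, so its rounding error is smaller here
print(abs(to_float16(third) - third), abs(to_bfloat16(third) - third))

print(to_bfloat16(1e10))  # close to 1e10: bfloat16 keeps float32's exponent range
try:
    to_float16(1e10)      # float16's largest finite value is 65504
except OverflowError as err:
    print("float16 overflow:", err)
```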

A field schema accepts the following properties:

  • name: Name of the field to create. Data type: String. Mandatory.
  • dtype: Data type of the field. Mandatory.
  • description: Description of the field. Data type: String. Optional.
  • is_primary: Whether to set the field as the primary key field. Data type: Boolean (true or false). Mandatory for the primary key field.
  • auto_id: Switch to enable or disable automatic ID (primary key) allocation. Data type: Boolean (true or false). Mandatory for the primary key field.
  • max_length: Maximum length of strings allowed to be inserted. Range: [1, 65,535]. Mandatory for a VARCHAR field.
  • dim: Dimension of the vector. Data type: Integer ∈ [1, 32768]. Mandatory for a dense vector field; omit for a sparse vector field.
  • is_partition_key: Whether this field is a partition-key field. Data type: Boolean (true or false).

Collection schema

A collection schema is the logical definition of a collection. You need to define the field schemas before defining a collection schema and managing collections.

Create a collection schema

Define the field schemas before defining a collection schema.

from pymilvus import FieldSchema, CollectionSchema, DataType
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
# Enable partition key on a field if you need to implement multi-tenancy based on the partition-key field
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)
# Set enable_dynamic_field to True if you need to use dynamic fields.
schema = CollectionSchema(fields=[id_field, age_field, embedding_field, position_field], auto_id=False, enable_dynamic_field=True, description="desc of a collection")
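
With a partition-key field such as `position`, Milvus assigns each entity to a partition based on a hash of its partition-key value. Entities that share a value, for example the same tenant, therefore land in the same partition. The sketch below illustrates that idea only. MD5 is a stand-in chosen because it is stable across processes, unlike Python's built-in `hash()`. Milvus uses its own hash function and partition count, and both `NUM_PARTITIONS` and `partition_for` are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, for illustration only

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition-key value to a partition index using a stable hash (MD5 as a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Entities tagged with a tenant in the `position` field (example data)
entities = [
    {"id": 1, "position": "tenant_a"},
    {"id": 2, "position": "tenant_b"},
    {"id": 3, "position": "tenant_a"},
    {"id": 4, "position": "tenant_c"},
]

partitions = defaultdict(list)
for entity in entities:
    partitions[partition_for(entity["position"])].append(entity["id"])

# ids 1 and 3 share a key, so they always land in the same partition
print(dict(partitions))
```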
    

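Create a collection with the schema specified:

from pymilvus import Collection, connections
# Assumes a Milvus server is reachable at localhost:19530; register it under the alias "default"
connections.connect(alias="default", host="localhost", port="19530")
collection_name1 = "tutorial_1"
collection1 = Collection(name=collection_name1, schema=schema, using='default', shards_num=2)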
    
  • You can define the shard number with shards_num.
  • You can choose the Milvus server on which to create the collection by passing its connection alias in using.
  • You can enable the partition key feature on a field by setting is_partition_key to True on the field if you need to implement partition-key-based multi-tenancy.
  • You can enable dynamic schema by setting enable_dynamic_field to True in the collection schema if you need dynamic fields.
  • You can also create a collection with Collection.construct_from_dataframe, which automatically generates a collection schema from a pandas DataFrame and creates the collection.

import random
import pandas as pd
from pymilvus import Collection

nb = 3000  # number of rows (example value)
dim = 128  # vector dimension (example value)
df = pd.DataFrame({
    "id": [i for i in range(nb)],
    "age": [random.randint(20, 40) for i in range(nb)],
    "embedding": [[random.random() for _ in range(dim)] for _ in range(nb)],
    "position": "test_pos"
})
collection, ins_res = Collection.construct_from_dataframe(
    'my_collection',
    df,
    primary_field='id',
    auto_id=False
)
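
As the dynamic-schema bullet above notes, setting enable_dynamic_field to True lets Milvus accept keys in an inserted entity that the schema does not declare and keep them as dynamic fields. With the setting disabled, such keys are rejected. The sketch below models that split in plain Python. It assumes the four fields defined earlier (id, age, embedding, position). The helper `split_row` is hypothetical, and raising ValueError is only a stand-in for the server's error.

```python
# Hypothetical model of dynamic-field handling; not pymilvus or server code.
SCHEMA_FIELDS = {"id", "age", "embedding", "position"}

def split_row(row, schema_fields=SCHEMA_FIELDS, enable_dynamic_field=True):
    """Split an entity into declared fields and undeclared (dynamic) extras."""
    declared = {k: v for k, v in row.items() if k in schema_fields}
    extras = {k: v for k, v in row.items() if k not in schema_fields}
    if extras and not enable_dynamic_field:
        # Without dynamic schema, undeclared keys are an error
        raise ValueError(f"fields not in schema: {sorted(extras)}")
    return declared, extras

row = {"id": 7, "age": 30, "embedding": [0.0] * 128, "position": "p1", "color": "red"}
declared, extras = split_row(row)
print(sorted(declared), extras)  # ['age', 'embedding', 'id', 'position'] {'color': 'red'}
```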
    

The collection schema itself accepts the following properties:

  • fields: Fields in the collection to create. Mandatory.
  • description: Description of the collection. Data type: String. Optional.
  • partition_key_field: Name of a field that is designed to act as the partition key. Data type: String. Optional.
  • enable_dynamic_field: Whether to enable dynamic schema. Data type: Boolean (true or false). Optional, defaults to False.

For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections.

What’s next