pymilvus - read the docs

39
pymilvus Release 0.3.0 Milvus Jan 21, 2021

Upload: others

Post on 02-Dec-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: pymilvus - Read the Docs

pymilvusRelease 0.3.0

Milvus

Jan 21, 2021

Page 2: pymilvus - Read the Docs
Page 3: pymilvus - Read the Docs

TABLE OF CONTENTS

1 Overview 11.1 Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Installing via pip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1.2 Installing in a virtual environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1.3 Installing a specific PyMilvus version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.1.4 Installing from source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.1.5 Verifying installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2.2 Connect to Milvus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2.3 Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2.3.1 collection_param . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2.3.2 segment_row_limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2.3.3 auto_id . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.4 Create Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2.5 Create Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2.6 Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2.7 Insert Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.8 Flush . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.9 Get Detailed information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.10 Count Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.2.11 Get . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2.11.1 Get Entities by ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.2.12 Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.2.12.1 Search Entities by Vector Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . 81.2.12.2 Search Entities filtered by fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.2.13 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.2.13.1 Delete Entities by id . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.2.13.2 Drop a Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.2.13.3 Drop a Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.3 API reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.3.1 Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.3.1.1 Constructor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.3.1.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121.3.1.3 APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.4 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171.4.1 Intro to Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.4.1.1 IVF_FLAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181.4.1.2 BIN_IVF_FLAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181.4.1.3 IVF_PQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

i

Page 4: pymilvus - Read the Docs

1.4.1.4 IVF_SQ8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201.4.1.5 IVF_SQ8_HYBRID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201.4.1.6 ANNOY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211.4.1.7 HNSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221.4.1.8 RHNSW_PQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231.4.1.9 RHNSW_SQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231.4.1.10 NSG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.5 Query DSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251.5.1 Leaf query clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251.5.2 Compound query clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261.5.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261.5.4 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

1.6 Search results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311.6.1 How to deal with search results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

1.7 Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311.8 FAQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

1.8.1 I’m getting random “socket operation on non-socket” errors from gRPC. . . . . . . . . . . . 321.8.2 How to fix the error when I install PyMilvus on Windows? . . . . . . . . . . . . . . . . . . 32

1.9 Contributing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321.9.1 Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321.9.2 Submit Pull Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331.9.3 Github workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331.9.4 Contribution guideline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

1.10 About this documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Index 35

ii

Page 5: pymilvus - Read the Docs

CHAPTER

ONE

OVERVIEW

PyMilvus is a python SDK for Milvus and is a recommended way to work with Milvus. This documentation coversevery thing you need to know about PyMilvus.

Installation Instructions on how to install PyMilvus.

Tutorial A quick start to use PyMilvus.

API reference The complete API documentation.

Index Index and relevant parameters.

Query DSL Query Domain Specific Language(DSL).

Search results How to deal with search results.

Changelog Changes in the latest PyMilvus.

Contributing Method of contribution, bug shooting and contribution guide.

FAQ Some questions that come up often.

About this documentation How this documentation is generated.

1.1 Installation

1.1.1 Installing via pip

PyMilvus is in the Python Package Index.

PyMilvus only support python3(>= 3.6), usually, it’s ok to install PyMilvus like below.

$ python3 -m pip install pymilvus

1.1.2 Installing in a virtual environment

It’s recommended to use PyMilvus in a virtual environment, using virtual environment allows you to avoid installingPython packages globally which could break system tools or other projects. We use virtualenv as an example todemonstrate how to install and using PyMilvus in a virtual environment. See virtualenv for more information aboutwhy and how.

$ python3 -m pip install virtualenv$ virtualenv venv$ source venv/bin/activate(venv) $ pip install pymilvus

1

Page 6: pymilvus - Read the Docs

pymilvus, Release 0.3.0

If you want to exit the virtualenv venv, you can use deactivate.

(venv) $ deactivate$

1.1.3 Installing a specific PyMilvus version

Here we assume you are already in a virtual environment.

Suitable PyMilvus version depends on Milvus version you are using. See install pymilvus for recommended pymilvusversion.

If you want to install a specific version of PyMilvus:

(venv) $ pip install pymilvus==0.3.0

If you want to upgrade PyMilvus into the latest version published:

(venv) $ pip install --upgrade pymilvus

1.1.4 Installing from source

This will install the latest PyMilvus into your virtual environment.

(venv) $ pip install git+https://github.com/milvus-io/pymilvus.git

1.1.5 Verifying installation

Your installation is correct if the following command in the Python shell doesn’t raise an exception.

(venv) $ python -c "from milvus import Milvus, DataType"

Section author: Yangxuan@milvus

1.2 Tutorial

This is a basic introduction to Milvus by PyMilvus.

For a runnable python script, checkout example.py on PyMilvus Github, or hello milvus on milvus offical website. It’sa good recommended start to get started with Milvus and PyMilvus as well.

1.2.1 Prerequisites

Before we start, there are some prerequisites.

Make sure that:

• You have a running Milvus instance.

• PyMilvus is correctly installed.

2 Chapter 1. Overview

Page 7: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.2.2 Connect to Milvus

First of all, we need to import PyMilvus.

>>> from milvus import Milvus, DataType

Then, we can make connection with Milvus server. By default Milvus runs on localhost in port 19530, so you can usedefault value to connect to Milvus.

>>> host = '127.0.0.1'>>> port = '19530'>>> client = Milvus(host, port)

After connecting, we can communicate with Milvus in the following ways. If you are confused about the terminology,see Milvus Terminology for explanations.

1.2.3 Collection

Now let’s create a new collection. Before we start, we can list all the collections already exist. For a brand new Milvusrunning instance, the result should be empty.

>>> client.list_collections()[]

To create collection, we need to provide a unique collection name and some other parameters. collection_nameshould be a unique string to collections already exist. collection_param consists of 3 components, they arefields, segment_row_limit and auto_id.

>>> collection_name = 'demo_film_tutorial'>>> collection_param = {... "fields": [... {... "name": "release_year",... "type": DataType.INT32... },... {... "name": "embedding",... "type": DataType.FLOAT_VECTOR,... "params": {"dim": 8}... },... ],... "segment_row_limit": 4096,... "auto_id": False... }

1.2. Tutorial 3

Page 8: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.2.3.1 collection_param

In the collection_param, there are 2 fields in fields: embedding and release_year.

release_year: The name of the first field is release_year, and the type of it is DataType.INT32, it’s a field tostore release year of a film.

embedding: The name of the second field is embedding, the type of it is DataType.FLOAT_VECTOR. It alsohas an extra parameter “params” with dimension 8. It’s a float vector field to store embedding of a film. For aFLOAT_VECTOR, the “dim” in “params” is required. You can also add params in other types of field as inFLOAT_VECTOR field.

1.2.3.2 segment_row_limit

Milvus controls the size of data segment according to the segment_row_limit, you can refer Storage Conceptsfor more information about segment and segment_row_limit.

1.2.3.3 auto_id

auto_id is used to tell Milvus if we want ids auto-generated or ids user provided for each entity. If False, itmeans we’ll have to provide our own ids for entities while inserting, on the contrary, if True, Milvus will generateids automatically and we can’t provide our own ids for entities.

1.2.4 Create Collection

Now we can create a collection:

>>> client.create_collection(collection_name, collection_param)

Then you can list collections and ‘demo_film_tutorial’ will be in the result.

>>> client.list_collections()['demo_film_tutorial']

You can also get info of the collection.

Note: For a better output format, we use pprint to print the result.

>>> from pprint import pprint>>> info = client.get_collection_info(collection_name)>>> pprint(info){'auto_id': False,'fields': [{'indexes': [{}],

'name': 'release_year','params': {},'type': <DataType.INT32: 4>},

{'indexes': [{}],'name': 'embedding','params': {'dim': 8},'type': <DataType.FLOAT_VECTOR: 101>}],

'segment_row_limit': 4096}

4 Chapter 1. Overview

Page 9: pymilvus - Read the Docs

pymilvus, Release 0.3.0

You can see from the output, all the infos are the same as we provide, but there’s one more called indexes.

This tutorial is a basic intro tutorial, building index won’t be covered by this tutorial. If you want to go further intoMilvus with indexes, it’s recommended to check our example_index.py.

If you’re already known about indexes from example_index.py, and you want a full lists of params supported byPyMilvus, you check out Index chapter of the PyMilvus documentation.

Further more, if you want to get a thorough view of indexes, check our official website for Vector Index.

1.2.5 Create Partition

If you don’t create a partition, there will be a default one called “_default”, all the entities will be inserted into the“_default” partition. You can check it by list_partitions()

>>> client.list_partitions(collection_name)['_default']

You can provide a partition tag to create a new partition.

>>> client.create_partition(collection_name, "American")>>> client.list_partitions(collection_name)['American', '_default']

1.2.6 Entities

An entities is a group of fields that correspond to real world objects. Here is an example of 3 entities structured in listof dictionary.

>>> import random>>> The_Lord_of_the_Rings = [... {... "id": 1,... "title": "The_Fellowship_of_the_Ring",... "release_year": 2001,... "embedding": [random.random() for _ in range(8)]... },... {... "id": 2,... "title": "The_Two_Towers",... "release_year": 2002,... "embedding": [random.random() for _ in range(8)]... },... {... "id": 3,... "title": "The_Return_of_the_King",... "release_year": 2003,... "embedding": [random.random() for _ in range(8)]... }... ]

1.2. Tutorial 5

Page 10: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.2.7 Insert Entities

To insert entities into Milvus, we need to group data from the same field like below.

>>> ids = [k.get("id") for k in The_Lord_of_the_Rings]>>> release_years = [k.get("release_year") for k in The_Lord_of_the_Rings]>>> embeddings = [k.get("embedding") for k in The_Lord_of_the_Rings]

Then we can create hybrid entities to insert into Milvus.

>>> hybrid_entities = [... # Milvus doesn't support string type yet,... # so we cannot insert "title".... {... "name": "release_year",... "values": release_years,... "type": DataType.INT32... },... {... "name": "embedding",... "values": embeddings,... "type": DataType.FLOAT_VECTOR... },... ]

If the hybrid entities inserted successfully, ids we provided will be returned.

Note: If we create collection with auto_id = True, we can’t provide ids of our own, and the returned idsis automatically generated by Milvus. If partition_tag isn’t provided, these entities will be inserted into the“_default” partition.

>>> client.insert(collection_name, hybrid_entities, ids, partition_tag="American")[1, 2, 3]

1.2.8 Flush

After successfully inserting 3 entities into Milvus, we can Flush data from memory to disk so that we can retrievethem. Milvus also performs an automatic flush with a fixed interval(1 second), see Data Flushing.

You can flush multiple collections at one time, so be aware the parameter is a list.

>>> client.flush([collection_name])

1.2.9 Get Detailed information

After insert, we can get the detail of collection statistics info by get_collection_stats()

Note: Again, we are using pprint to provide a better format.

6 Chapter 1. Overview

Page 11: pymilvus - Read the Docs

pymilvus, Release 0.3.0

>>> info = client.get_collection_stats(collection_name)>>> pprint(info){'data_size': 18156,'partition_count': 2,'partitions': [{'data_size': 0,

'id': 13,'row_count': 0,'segment_count': 0,'segments': None,'tag': '_default'},

{'data_size': 18156,'id': 14,'row_count': 3,'segment_count': 1,'segments': [{'data_size': 18156,

'files': [{'data_size': 4124,'field': '_id','name': '_raw','path': '/C_7/P_14/S_7/F_49'},{'data_size': 5724,'field': '_id','name': '_blf','path': '/C_7/P_14/S_7/F_53'},{'data_size': 4112,'field': 'release_year','name': '_raw','path': '/C_7/P_14/S_7/F_51'},{'data_size': 4196,'field': 'embedding','name': '_raw','path': '/C_7/P_14/S_7/F_50'}],

'id': 7,'row_count': 3}],

'tag': 'American'}],'row_count': 3}

1.2.10 Count Entities

We can also count how many entities are there in the collection.

>>> client.count_entities(collection_name)3

1.2.11 Get

1.2.11.1 Get Entities by ID

You can get entities by their ids.

>>> films = client.get_entity_by_id(collection_name, ids=[1, 200])

If id exists, an entity will be returned. If id doesn’t exist, None will be return. For the example above, collection“demo_film_tutorial” has an entity(id = 1), but doesn’t have an entity(id = 200), so the result films will

1.2. Tutorial 7

Page 12: pymilvus - Read the Docs

pymilvus, Release 0.3.0

only have one entity, the other is None. You can get the entity fields like below. Because embeddings are randomgenerated, so the value of embedding may differ.

>>> for film in films:... if film is not None:... print(film.id, film.get("release_year"), film.get("embedding"))...1 2001 [0.5146051645278931, 0.9257888197898865, 0.8659316301345825, 0.→˓8082002401351929, 0.33681046962738037, 0.7135953307151794, 0.14593836665153503, 0.→˓9224222302436829]

If you want to know all the fields names, you can get them by:

>>> for film in films:... if film is not None:... film.fields...['release_year', 'embedding']

1.2.12 Search

1.2.12.1 Search Entities by Vector Similarity

You can get entities by vector similarity. Assuming we have a film_A like below, and we want to get top 2 films thatare most similar with it.

>>> film_A = {... "title": "random_title",... "release_year": 2002,... "embedding": [random.random() for _ in range(8)]... }

We need to prepare query DSL(Domain Specific Language) for this search, for more information about does anddon’ts for Query DSL , please refer to PyMilvus documentation Query DSL chapter.

>>> dsl = {... "bool": {... "must": [... {... "vector": {... "embedding": {... "topk": 2,... "query": [film_A.get("embedding")],... "metric_type": "L2"... }... }... }... ]... }... }

Then we can search by this dsl.

Note: If we don’t provide anything in “fields”, there will only be ids and distances in the results. Only what wehave provided in the “fields” can be obtained finally.

8 Chapter 1. Overview

Page 13: pymilvus - Read the Docs

pymilvus, Release 0.3.0

>>> results = client.search(collection_name, dsl, fields=["release_year"])

The returned results is a 1 * 2 structure, 1 for 1 entity querying, 2 for top 2. For more clarity, we obtain the film asbelow. If you want to know how to deal with search result in a better way, you can refer to search result in PyMilvusdoc.

>>> entities = results[0]>>> film_1 = entities[0]>>> film_2 = entities[1]

Then how do we get ids, distances and fields? It’s as below.

Note: Because embeddings are randomly generated, so the retrieved film id, distance and field may differ.

>>> film_1.id # id3

>>> film_1.distance # distance0.3749755918979645

>>> film_1.entity.get("release_year") # fields2003

1.2.12.2 Search Entities filtered by fields.

Milvus can also search entities back by vector similarity combined with fields filtering. Again we will be using queryDSL, please refer to PyMilvus documentation Query DSL for more information.

For the same film_A, now we also want to search back top 2 most similar films, but with one more condition: therelease year of films searched back should be the same as film_A. Here is how we organize query DSL.

>>> dsl_hybrid = {... "bool": {... "must": [... {... "term": {"release_year": [film_A.get("release_year")]}... },... {... "vector": {... "embedding": {... "topk": 2,... "query": [film_A.get("embedding")],... "metric_type": "L2"... }... }... }... ]... }... }

Then we’ll do search as above. This time we will only get 1 film back because there is only 1 film whose release yearis 2002, the same as film_A. You can confirm it by len().

1.2. Tutorial 9

Page 14: pymilvus - Read the Docs

pymilvus, Release 0.3.0

>>> results = client.search(collection_name, dsl_hybrid, fields=["release_year"])>>> len(results[0])1

We can also search back entities with fields in specific range like below, we want to get top 2 films that are most similarwith film_A, and we want the release_year of entities returned must be larger than film_A’s release_year

>>> dsl_hybrid = {... "bool": {... "must": [... {... # "GT" for greater than... "range": {"release_year": {"GT": film_A.get("release_year")}}... },... {... "vector": {... "embedding": {"topk": 2, "query": [film_A.get("embedding")],→˓"metric_type": "L2"}... }... }... ]... }... }

This query will only get 1 film back too, because there is only 1 film whose release year is larger than 2002.

Again, for more information about query DSL, please refer to our documentation Query DSL.

1.2.13 Deletion

Finally, let’s move on to deletion in Milvus. We can delete entities by ids, drop a whole partition, or drop the entirecollection.

1.2.13.1 Delete Entities by id

You can delete entities by their ids.

>>> client.delete_entity_by_id(collection_name, ids=[1, 2])Status(code=0, message='OK')

>>> client.count_entities(collection_name)1

1.2.13.2 Drop a Partition

You can also drop a partition.

Danger: Once you drop a partition, all the data in this partition will be deleted too.

>>> client.drop_partition(collection_name, "American")Status(code=0, message='OK')

10 Chapter 1. Overview

Page 15: pymilvus - Read the Docs

pymilvus, Release 0.3.0

>>> client.count_entities(collection_name)0

1.2.13.3 Drop a Collection

Finally, you can drop an entire collection.

Danger: Once you drop a collection, all the data in this collection will be deleted too.

>>> client.drop_collection(collection_name)Status(code=0, message='OK')

Section author: Yangxuan@milvus

1.3 API reference

1.3.1 Client

1.3.1.1 Constructor

Constructor DescriptionMilvus() milvus client

1.3. API reference 11

Page 16: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.3.1.2 Methods

API Descriptioncreate_collection() Create a collection.has_collection() Check if collection exists.get_collection_info() Obtain collection information.count_entities() Obtain the number of entity in a collection.list_collections() Get the list of collections.get_collection_stats() Obtain collection statistics information.load_collection() Load collection from disk to memory.drop_collection() drop a collection.insert() insert entities into specified collection.get_entity_by_id() Obtain entities by providing entity ids.list_id_in_segment() Obtain the list of ids in specified segment.create_index() Create an index on specified field.drop_index() Drop index on specified field.create_partition() Create a partition under specified collection.has_partition() Check if specified partition exists under a collection.list_partitions() Obtain list of partitions under a collection.drop_partition() Drop specified partition under a collection.search() Search approximate nearest entities.search_in_segment() Search approximate nearest entities in specified segments.delete_entity_by_id() Delete entities by providing entity ids.flush() Flush collection data from memory to storage.compact() Compact specified collection.

1.3.1.3 APIs

class milvus.Milvus(host=None, port=None, handler='GRPC', pool='SingletonThread', **kwargs)

compact(collection_name, threshold=0.2, timeout=None, **kwargs)Compacts a specified collection. After deleting some data in a segment, you can call compact to free upthe disk space occupied by the deleted data. Calling compact also deletes empty segments, but does notmerge segments.

Parameters

• collection_name (str) – The name of the collection to compact.

• threshold – The threshold for compact. When the percentage of deleted entities in asegment is below the threshold, the server skips this segment when compacting the collec-tion. The default value is 0.2, range is [0, 1].

Returns Status of compact request. The compact request will still execute successfully if serverskip some of collections, in this case the returned status will differ. Note that in currentversion his is an EXPERIMENTAL function.

Return type Status.

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

count_entities(collection_name, timeout=30)Returns the number of entities in a specified collection.

12 Chapter 1. Overview

Page 17: pymilvus - Read the Docs

pymilvus, Release 0.3.0

Parameters collection_name (str) – The name of the collection to count entities of.

Returns The number of entities

Return type int

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

create_collection(collection_name, fields, timeout=30)Creates a collection.

Parameters collection_name – The name of the collection. A collection name can onlyinclude

numbers, letters, and underscores, and must not begin with a number. :type str :param fields: Field param-eters. :type fields: str

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

create_index(collection_name, field_name, params, timeout=None, **kwargs)Creates an index for a field in a specified collection. Milvus does not support creating multiple indexes fora field. In a scenario where the field already has an index, if you create another one that is equivalent (interms of type and parameters) to the existing one, the server returns this index to the client; otherwise, theserver replaces the existing index with the new one.

Parameters

• collection_name (str) – The name of the collection to create field indexes.

• field_name (str) – The name of the field to create an index for.

• params (dict) – Indexing parameters.

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

create_partition(collection_name, partition_tag, timeout=30)Creates a partition in a specified collection. You only need to import the parameters of partition_tag tocreate a partition. A collection cannot hold partitions of the same tag, whilst you can insert the same tagin different collections.

Parameters

• collection_name (str) – The name of the collection to create partitions in.

• partition_tag (str) – Name of the partition.

• partition_tag – The tag name of the partition.

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

delete_entity_by_id(collection_name, ids, timeout=None)Deletes the entities specified by a given list of IDs.

Parameters

• collection_name (str) – The name of the collection to remove entities from.

• ids (list[int]) – A list of IDs of the entities to delete.

Returns Status of delete request. The delete request will still execute successfully if Some ofids may not exist in specified collection, in this case the returned status will differ. Note thatin current version his is an EXPERIMENTAL function.

1.3. API reference 13

Page 18: pymilvus - Read the Docs

pymilvus, Release 0.3.0

Return type Status.

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

drop_collection(collection_name, timeout=30)Deletes a specified collection.

Parameters collection_name (str) – The name of the collection to delete.

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

drop_index(collection_name, field_name, timeout=30)Removes the index of a field in a specified collection.

Parameters

• collection_name (str) – The name of the collection to remove the field index from.

• field_name (str) – The name of the field to remove the index of.

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

drop_partition(collection_name, partition_tag, timeout=30)Deletes the specified partitions in a collection.

Parameters

• collection_name (str) – The name of the collection to delete partitions from.

• partition_tag (str) – The tag name of the partition to delete.

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

flush(collection_name_array=None, timeout=None, **kwargs)Flushes data in the specified collections from memory to disk. When you insert or delete data, the serverstores the data in the memory temporarily and then flushes it to the disk at fixed intervals. Calling flushensures that the newly inserted data is visible and the deleted data is no longer recoverable.

Parameters collection_name_array (An array of names of thecollections to flush.) – list[str]

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

get_collection_info(collection_name, timeout=30)Returns information of a specified collection, including field information of the collection and index infor-mation of fields.

Parameters collection_name (str) – The name of the collection to describe.

Returns The information of collection to describe.

Return type dict

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

get_collection_stats(collection_name, timeout=30)Returns statistical information about a specified collection, including the number of entities and the storagesize of each segment of the collection.

Parameters collection_name (str) – The name of the collection to get statistics about.

14 Chapter 1. Overview

Page 19: pymilvus - Read the Docs

pymilvus, Release 0.3.0

Returns The collection stats.

Return type dict

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

get_entity_by_id(collection_name, ids, fields=None, timeout=None)Returns the entities specified by given IDs.

Parameters

• collection_name (str) – The name of the collection to retrieve entities from.

• ids (list[int]) – A list of IDs of the entities to retrieve.

Returns The entities specified by given IDs.

Return type Entities

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

has_collection(collection_name, timeout=30)Checks whether a specified collection exists.

Parameters collection_name (str) – The name of the collection to check.

Returns If specified collection exists

Return type bool

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

has_partition(collection_name, partition_tag, timeout=30)Checks if a specified partition exists in a collection.

Parameters

• collection_name (str) – The name of the collection to find the partition in.

• partition_tag (str) – The tag name of the partition to check

Returns Whether a specified partition exists in a collection.

Return type bool

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

insert(collection_name, entities, ids=None, partition_tag=None, params=None, timeout=None,**kwargs)

Inserts entities in a specified collection.

Parameters

• collection_name (str.) – The name of the collection to insert entities in.

• entities (list) – The entities to insert.

• ids (list[int]) – The list of ids corresponding to the inserted entities.

• partition_tag (str) – The name of the partition to insert entities in. The defaultvalue is None. The server stores entities in the “_default” partition by default.

Returns list of ids of the inserted vectors.

Return type list[int]

1.3. API reference 15

Page 20: pymilvus - Read the Docs

pymilvus, Release 0.3.0

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

list_collections(timeout=30)Returns a list of all collection names.

Returns List of collection names, return when operation is successful

Return type list[str]

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

list_id_in_segment(collection_name, segment_id, timeout=None)Returns all entity IDs in a specified segment.

Parameters

• collection_name (str) – The name of the collection that contains the specified seg-ment

• segment_id (int) – The ID of the segment. You can get segment IDs by calling theget_collection_stats() method.

Returns List of IDs in a specified segment.

Return type list[int]

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

list_partitions(collection_name, timeout=30)Returns a list of all partition tags in a specified collection.

Parameters collection_name (str) – The name of the collection to retrieve partition tagsfrom.

Returns A list of all partition tags in specified collection.

Return type list[str]

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

load_collection(collection_name, timeout=None)

Loads a specified collection from disk to memory.

Parameters collection_name (str) – The name of the collection to load.

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

search(collection_name, dsl, partition_tags=None, fields=None, timeout=None, **kwargs)Searches a collection based on the given DSL clauses and returns query results.

Parameters

• collection_name (str) – The name of the collection to search.

• dsl (dict) – The DSL that defines the query.

• partition_tags (list[str]) – The tags of partitions to search.

• fields (list[str]) – The fields to return in the search result

16 Chapter 1. Overview

Page 21: pymilvus - Read the Docs

pymilvus, Release 0.3.0

Returns Query result.

Return type QueryResult

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

search_in_segment(collection_name, segment_ids, dsl, fields=None, timeout=None, **kwargs)Searches in the specified segments of a collection.

The Milvus server stores entity data into multiple files. Searching for entities in specific files is a methodused in Mishards. Obtain more detail about Mishards, see <a href=”https://github.com/milvus-io/milvus/tree/master/shards”>

Parameters

• collection_name (str:param collection_name: table name beenqueried) – The name of the collection to search.

• dsl (dict) – The DSL that defines the query.:type collection_name: str

• partition_tags (list[str]:param file_ids: Specified filesid array) – The tags of partitions to search.

• fields (list[str]:type query_records: list[list[float]]) – Thefields to return in the search result

Returns Query result.

Return type QueryResult

Raises RpcError: If grpc encounter an error ParamError: If parameters are invalid BaseExcep-tion: If the return result from server is not ok

1.4 Index

1.4.1 Intro to Index

For more detailed information about indexes, please refer to Milvus documentation index chapter.

To learn how to choose an appropriate index for your application scenarios, please read How to Select an Index inMilvus.

To learn how to choose an appropriate index for a metric, see Distance Metrics.

• IVF_FLAT

• BIN_IVF_FLAT

• IVF_PQ

• IVF_SQ8

• IVF_SQ8_HYBRID

• ANNOY

• HNSW

• RHNSW_PQ

• RHNSW_SQ

• NSG

1.4. Index 17

Page 22: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.4.1.1 IVF_FLAT

IVF (Inverted File) is an index type based on quantization. It divides the points in space into nlist units by clusteringmethod. During searching vectors, it compares the distances between the target vector and the center of all the units,and then select the nprobe nearest unit. Then, it compares all the vectors in these selected cells to get the final result.

IVF_FLAT is the most basic IVF index, and the encoded data stored in each unit is consistent with the original data.

• building parameters:

nlist: Number of cluster units.

# IVF_FLATclient.create_index(collection_name, field_name, {

"index_type": "IVF_FLAT","metric_type": "L2", # one of L2, IP"params": {

"nlist": 100 # int. 1~65536}

})

• search parameters:

nprobe: Number of inverted file cell to probe.

# IVF_FLATvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

"nprobe": 8 # int. 1~nlist(cpu), 1~min[2048, nlist](gpu)}

}}

1.4.1.2 BIN_IVF_FLAT

BIN_IVF_FLAT is a binary variant of IVF_FLAT.

• building parameters:

nlist: Number of cluster units.

# BIN_IVF_FLATclient.create_index(collection_name, field_name, {

"index_type": "BIN_IVF_FLAT","metric_type": "jaccard", # one of jaccard, hamming, tanimoto"params": {

"nlist": 100 # int. 1~65536}

})

• search parameters:

nprobe: Number of inverted file cell to probe.

18 Chapter 1. Overview

Page 23: pymilvus - Read the Docs

pymilvus, Release 0.3.0

# BIN_IVF_FLATvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "jaccard", # one of jaccard, hamming, tanimoto"params": {

"nprobe": 8 # int. 1~nlist(cpu), 1~min[2048, nlist](gpu)}

}}

1.4.1.3 IVF_PQ

PQ (Product Quantization) uniformly decomposes the original high-dimensional vector space into Cartesian productsof m low-dimensional vector spaces, and then quantizes the decomposed low-dimensional vector spaces. Instead ofcalculating the distances between the target vector and the center of all the units, product quantization enables thecalculation of distances between the target vector and the clustering center of each low-dimensional space and greatlyreduces the time complexity and space complexity of the algorithm.

IVF_PQ performs IVF index clustering, and then quantizes the product of vectors. Its index file is even smaller thanIVF_SQ8, but it also causes a loss of accuracy during searching.

• building parameters:

nlist: Number of cluster units.

m: Number of factors of product quantization. CPU-only Milvus: m dim (mod m); GPU-enabled Milvus:m {1, 2, 3, 4, 8, 12, 16, 20, 24, 28, 32, 40, 48, 56, 64, 96}, and (dim / m) {1, 2, 3, 4, 6, 8, 10, 12, 16, 20, 24, 28,32}. (m x 1024) MaxSharedMemPerBlock of your graphics card.

# IVF_PQclient.create_index(collection_name, field_name, {

"index_type": "IVF_PQ","metric_type": "L2", # one of L2, IP"params": {

"nlist": 100, # int. 1~65536"m": 8

}})

• search parameters:

nprobe: Number of inverted file cell to probe.

# IVF_PQvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

"nprobe": 8 # int. 1~nlist(cpu), 1~min[2048, nlist](gpu)}

}}

1.4. Index 19

Page 24: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.4.1.4 IVF_SQ8

IVF_SQ8 does scalar quantization for each vector placed in the unit based on IVF. Scalar quantization converts eachdimension of the original vector from a 4-byte floating-point number to a 1-byte unsigned integer, so the IVF_SQ8index file occupies much less space than the IVF_FLAT index file. However, scalar quantization results in a loss ofaccuracy during searching vectors.

• building parameters:

nlist: Number of cluster units.

# IVF_SQ8client.create_index(collection_name, field_name, {

"index_type": "IVF_SQ8","metric_type": "L2", # one of L2, IP"params": {

"nlist": 100 # int. 1~65536}

})

• search parameters:

nprobe: Number of inverted file cell to probe.

# IVF_SQ8vector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

"nprobe": 8 # int. 1~nlist(cpu), 1~min[2048, nlist](gpu)}

}}

1.4.1.5 IVF_SQ8_HYBRID

Optimized version of IVF_SQ8 that requires both CPU and GPU to work. Unlike IVF_SQ8, IVF_SQ8H uses aGPU-based coarse quantizer, which greatly reduces time to quantize.

IVF_SQ8H is an IVF_SQ8 index that optimizes query execution.

The query method is as follows:

• If nq gpu_search_threshold, GPU handles the entire query task.

• If nq < gpu_search_threshold, GPU handles the task of retrieving the nprobe nearest unit in the IVFindex file, and CPU handles the rest.

• building parameters:

nlist: Number of cluster units.

# IVF_SQ8_HYBRIDclient.create_index(collection_name, field_name, {

"index_type": "IVF_SQ8_HYBRID","metric_type": "L2", # one of L2, IP"params": {

(continues on next page)

20 Chapter 1. Overview

Page 25: pymilvus - Read the Docs

pymilvus, Release 0.3.0

(continued from previous page)

"nlist": 100 # int. 1~65536}

})

• search parameters:

nprobe: Number of inverted file cell to probe.

# IVF_SQ8_HYBRIDvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

"nprobe": 8 # int. 1~nlist(cpu), 1~min[2048, nlist](gpu)}

}}

1.4.1.6 ANNOY

ANNOY (Approximate Nearest Neighbors Oh Yeah) is an index that uses a hyperplane to divide a high-dimensionalspace into multiple subspaces, and then stores them in a tree structure.

When searching for vectors, ANNOY follows the tree structure to find subspaces closer to the target vector, andthen compares all the vectors in these subspaces (The number of vectors being compared should not be less thansearch_k) to obtain the final result. Obviously, when the target vector is close to the edge of a certain subspace,sometimes it is necessary to greatly increase the number of searched subspaces to obtain a high recall rate. There-fore, ANNOY uses n_trees different methods to divide the whole space, and searches all the dividing methodssimultaneously to reduce the probability that the target vector is always at the edge of the subspace.

• building parameters:

n_trees: The number of methods of space division.

# ANNOYclient.create_index(collection_name, field_name, {

"index_type": "ANNOY","metric_type": "L2", # one of L2, IP"params": {

"n_trees": 8 # int. 1~1024}

})

• search parameters:

search_k: The number of nodes to search. -1 means 5% of the whole data.

# ANNOYvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

(continues on next page)

1.4. Index 21

Page 26: pymilvus - Read the Docs

pymilvus, Release 0.3.0

(continued from previous page)

"search_k": -1 # int. {-1} U [top_k, n*n_trees], n represents vectors→˓count.

}}

}

1.4.1.7 HNSW

HNSW (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm. It builds a multi-layernavigation structure for an image according to certain rules. In this structure, the upper layers are more sparse and thedistances between nodes are farther; the lower layers are denser and he distances between nodes are closer. The searchstarts from the uppermost layer, finds the node closest to the target in this layer, and then enters the next layer to beginanother search. After multiple iterations, it can quickly approach the target position.

In order to improve performance, HNSW limits the maximum degree of nodes on each layer of the graph to M. Inaddition, you can use efConstruction (when building index) or ef (when searching targets) to specify a searchrange.

• building parameters:

M: Maximum degree of the node.

efConstruction: Take the effect in stage of index construction.

# HNSWclient.create_index(collection_name, field_name, {

"index_type": "HNSW","metric_type": "L2", # one of L2, IP"params": {

"M": 16, # int. 4~64"efConstruction": 40 # int. 8~512

}})

• search parameters:

ef: Take the effect in stage of search scope, should be larger than top_k.

# HNSWvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

"ef": 64 # int. top_k~32768}

}}

22 Chapter 1. Overview

Page 27: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.4.1.8 RHNSW_PQ

RHNSW_PQ is a variant index type combining PQ and HNSW. It first uses PQ to quantize the vector, then usesHNSW to quantize the PQ quantization result to get the index.

• building parameters:

M: Maximum degree of the node.

efConstruction: Take effect in stage of index construction.

PQM: m for PQ.

# RHNSW_PQclient.create_index(collection_name, field_name, {

"index_type": "RHNSW_PQ","metric_type": "L2","params": {

"M": 16, # int. 4~64"efConstruction": 40, # int. 8~512"PQM": 8, # int. CPU only. PQM = dim (mod m)

}})

• search parameters:

ef: Take the effect in stage of search scope, should be larger than top_k.

# RHNSW_PQvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

"ef": 64 # int. top_k~32768}

}}

1.4.1.9 RHNSW_SQ

RHNSW_SQ is a variant index type combining SQ and HNSW. It first uses SQ to quantize the vector, then usesHNSW to quantize the SQ quantization result to get the index.

• building parameters:

M: Maximum degree of the node.

efConstruction: Take effect in stage of index construction, search scope.

# RHNSW_SQclient.create_index(collection_name, field_name, {

"index_type": "RHNSW_SQ","metric_type": "L2", # one of L2, IP"params": {

"M": 16, # int. 4~64"efConstruction": 40 # int. 8~512

(continues on next page)

1.4. Index 23

Page 28: pymilvus - Read the Docs

pymilvus, Release 0.3.0

(continued from previous page)

}})

• search parameters:

ef: Take the effect in stage of search scope, should be larger than top_k.

# RHNSW_SQvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

"ef": 64 # int. top_k~32768}

}}

1.4.1.10 NSG

NSG (Refined Navigating Spreading-out Graph) is a graph-based indexing algorithm. It sets the center position ofthe whole image as a navigation point, and then uses a specific edge selection strategy to control the out-degree ofeach point (less than or equal to out_degree). Therefore, it can reduce memory usage and quickly locate the targetposition nearby during searching vectors.

The graph construction process of NSG is as follows:

1. Find knng nearest neighbors for each point.

2. Iterate at least search_length times based on knng nearest neighbor nodes to selectcandidate_pool_size possible nearest neighbor nodes.

3. Construct the out-edge of each point in the selected candidate_pool_size nodes according to the edgeselection strategy.

The query process is similar to the graph building process. It starts from the navigation point and iterates at leastsearch_length times to get the final result.

• building parameters:

search_length: Number of query iterations.

out_degree: Maximum out-degree of the node.

candidate_pool_size: Candidate pool size of the node.

knng: Number of nearest neighbors

# NSGclient.create_index(collection_name, field_name, {

"index_type": "NSG","metric_type": "L2","params": {

"search_length": 60, # int. 10~300"out_degree": 30, # int. 5~300"candidate_pool_size": 300, # int. 50~1000"knng": 50 # int. 5~300

(continues on next page)

24 Chapter 1. Overview

Page 29: pymilvus - Read the Docs

pymilvus, Release 0.3.0

(continued from previous page)

}})

• search parameters:

search_length: Number of query iterations

# NSGvector_query = {

"field_name": {"topk": top_k,"query": queries,"metric_type": "L2", # one of L2, IP"params": {

"search_length": 100 # int. 10~300}

}}

Section author: Godchen@milvus

1.5 Query DSL

Inspired by ElasticSearch, Milvus provides a Query DSL(Domain Specific Language) consisting of two types ofclauses to define queries:

1.5.1 Leaf query clauses

Leaf query clauses look for a particular value in a particular field. Currently milvus support term, range, vectorqueries.

• : term query matches the entities whose specified field value are in the specified list. Theformat is "term": {"$field": [$value1, $value2, ...]} or "term": {"$field":{"values": [$value1, $value2, ...]}}

• : range query matches the entities whose specified field value are under the specified range. The format is"range": {"$field": {"$range-type": $value1, ... }}. The supported range typesare:

– “GT”(greater than)

– “GTE”(greater than or equal)

– “LT”(less than)

– “LTE”(less than or equal).

• : vector query only takes effect on vector fields, and approximately search nearest vectors. The format shouldbe {"topk": $topk, "query": $vectors, "params": {...}, "metric_type":$metric}.

Here, topk is the number of approximately nearest top-k entities for each query vector. query is a list ofquery vectors. params is search parameters, and metric_type indicates which distance computation typeto be used. metric_type is not necessary if it is specified when creating index.

Note that, the ``term`` and ``range`` query act as filter queries for ``vector`` query, they are pre-filtered.

1.5. Query DSL 25

Page 30: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.5.2 Compound query clauses

Currently, milvus only support boolean query, i.e. bool. There are three types:

Types Descriptionmust The clause must appear in matching entities.must_not The clause must not appear in matching entities.should At least one query in the clause must appear in matching entities.

1.5.3 Examples

Here are some examples:

• Example 1

In this example, we demonstrate writing a dsl. Assume that the target collection contains two scalarfield named ‘A’ and ‘B’, and a vector field named ‘Vec’ with dim 2. We require the result satisfy theconditions as follows:

1. The value of field ‘A’ is in [1, 2, 5].

2. The value of field ‘B’ is in the range between 1 and 100.

To satisfy condition 1, we need a term query which is:

term_query = {"term": {

"A": [1, 2, 5]}

}

To satisfy condition 2, we need a range query which is:

range_query = {"range": {

"B": {"GT": 1,"LT": 100

}}

}

Besides, we want to find the results which are top 10 most approximate vectors compared with [0.1, 0.2] in the filteredresults. So a possible vector query is like:

vector_query = {"vector": {

"topk": 10,"query": [[0.1, 0.2]],"metric_type": "L2","params": {

"nprobe": 10 # assume "Vec" field has been indexed by 'IVF_FLAT'}

}}

26 Chapter 1. Overview

Page 31: pymilvus - Read the Docs

pymilvus, Release 0.3.0

To satisfy the two conditions at the same time, we need a boolean query must , under which the term_query ,range_query and vector_query lie.

According to the above, we can output the dsl:

dsl = {"bool": {

"must": [term_query, range_query, vector_query]}

}

The full view is:

{"bool": {

"must":[{

"term": {"A": [1, 2, 5]

}},{

"range": {"B": {

"GT": 1,"LT": 100

}}

},{

"vector": {"Vec": {

"topk": 10,"query": [[0.1, 0.2]],"metric_type": "L2","params": {

"nprobe": 10}

}}

}]

}}

For each query vector, the results are sorted by distance in descending order.

• Example 2

In this example, we demonstrate writing a dsl with should query. Assume that the target col-lection contains two scalar field named rating and release_year, and a vector field namedembedding with dim 2. We require the result satisfy the conditions as follows:

1. The rating is high enough or recently released, which means rating >= 8.0 or release_year is 2019or 2020.

To satisfy condition 1, we will use should query including a range query and a term query as below.

1.5. Query DSL 27

Page 32: pymilvus - Read the Docs

pymilvus, Release 0.3.0

range_query = {"range": {

"rating": {"gte": 8.0}}

},

term_query = {"term": {

"release_year": [2019, 2020]}

}

should_query = { "should": [ range_query, term_query ] }

This is a typical vector query.

vector_query = {"vector": {

"topk": 10,"query": [[0.1, 0.2]],"metric_type": "L2","params": {

"nprobe": 10 # assume "Vec" field has been indexed by 'IVF_FLAT'}

}}

If we want to combine these two querys, the recommended pattern is:

dsl = {"bool": {

"must": [{ "must": [ should_query ] },{ "must": [ vector_query ] }

]}

}

The full view is:

{"bool": {

"must": [{

"must": [{

"should": [{

"range": {"rating": {"gte": 8.0}

}},{

"term": {"release_year": [2019, 2020]

}}

(continues on next page)

28 Chapter 1. Overview

Page 33: pymilvus - Read the Docs

pymilvus, Release 0.3.0

(continued from previous page)

]}

]},{

"must": [{

"vector": {"embedding": {

"topk": 10,"query": [[0.1, 0.2]],"metric_type": "L2","params": {

"nprobe": 10}

}}

}]

}]

}}

1.5.4 Constraints

Caution: The dsl clause abide by the follow rules.

• vector query cannot belong should and must_not. The follow clauses are not permitted:

# This is an invalid clause because `vector` is under `should`{

"should": ["vector": {...},...

]}

# This is an invalid clause because `vector` is under `must_not`{

"must_not": ["vector": {...},...

]}

• bool query cannot have a should or must_not clause directly. The follow clauses are not permitted:

# This is an invalid clause because `should` is under `bool`{

"bool": {"should": [...]

(continues on next page)

1.5. Query DSL 29

Page 34: pymilvus - Read the Docs

pymilvus, Release 0.3.0

(continued from previous page)

}}

# This is an invalid clause because `must_not` is under `bool`{

"bool": {"must_not": [...]

}}

• A leaf query cannot combine with compound query in the same clause. The follow clause is not permitted:

# This is an invalid clause because `must` is side by side with `term`{

"bool": {"must": [...],"term": {...}

}}

• The whole clause must and can only contain a vector query. The follow clauses are not permitted:

# This is an invalid clause because `vector` not exists.{

"bool": {"must": [

{"term": {...}

},{

"range": {...}}

]}

}

# This is an invalid clause because there are `vector` queries.{

"bool": {"must": [

{"vector": {...}

},{

"vector": {...}}

]}

}

Section author: Bosszou@milvus

30 Chapter 1. Overview

Page 35: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.6 Search results

1.6.1 How to deal with search results

The invocation of search() is like this:

>>> results = client.search('demo', dsl)

The result object can be used as a 2-D array. results[i] (0 <= i < len(results)) represents topk results of i-th queryvector, and results[i][j] (0 <= j < len( results[i] )) represents j-th result of i-th query vector. To get result id anddistance, you can invoke like this:

>>> id = results[i][j].id>>> distance = results[i][j].distance

The results object can be iterated, so you can traverse the results with two-level loop:

>>> for raw_result in results:... for result in raw_result:... id = result.id # result id... distance = result.distance

You can obtain field values in the results object, but first you need to specify fields you wanted before search:

>>> results = client.search('demo', dsl, fields=['A', 'B']) # specify wanted fields→˓are 'A' and 'B'

then you can obtain them:

>>> value_A = results[i][j].entity.get('A')

or

>>> value_A = getattr(results[i][j].entity, 'A')

or

>>> value_A = results[i][j].entity.value_of_field('A')

Section author: Bosszou@milvus

1.7 Changelog

v0.3.0

• Incompatibly upgrade APIs for supporting hybrid functionality. The passing parameters of the following APIshas been changed:

– create_collection()

– insert()

– create_index()

– drop_index()

– search()

1.6. Search results 31

Page 36: pymilvus - Read the Docs

pymilvus, Release 0.3.0

– search_in_segment()

– get_entity_by_id()

• Add passing parameter threshold in API compact()

• Raise exception if the status of returned values from server is not OK for all the APIs.

• Remove parameter status in return values of APIs except compact() and delete_entity_by_id() where statusis reserved experimentally.

Section author: Bosszou@milvus

1.8 FAQ

• I’m getting random “socket operation on non-socket” errors from gRPC.

• How to fix the error when I install PyMilvus on Windows?

1.8.1 I’m getting random “socket operation on non-socket” errors from gRPC.

Make sure to set the environment variable GRPC_ENABLE_FORK_SUPPORT=1. For reference, see this post.

1.8.2 How to fix the error when I install PyMilvus on Windows?

Try installing PyMilvus in a Conda environment.

Section author: Yangxuan@milvus

1.9 Contributing

• Open Issues

• Submit Pull Requests

• Github workflow

• Contribution Guideline

Contributing is warmly welcomed. You can contribute to PyMilvus project by opening issues and submitting pullrequests on PyMilvus Github page.

1.9.1 Open Issues

To request a new feature, report a bug or ask a question, it’s recommended for you to open an issue.

For a feature You can tell us why you need it and we will decide whether to implement it soon. If we think it’s agood improvement, we will make it a feature request and start to work on it. It’s also welcomed for you to openan issue with your PR as a solution.

For a bug You need to tell us as much information as possible, better start with our bug report template. Withinformation, we can reproduce the bug easily and solve it later.

For a question It’s welcomed to ask any questions about PyMilvus and Milvus, we are pleased to communicate withyou.

32 Chapter 1. Overview

Page 37: pymilvus - Read the Docs

pymilvus, Release 0.3.0

1.9.2 Submit Pull Requests

If you have improvements to PyMilvus, please submit pull requests(PR) to master, see workflow below.

PR for codes, you need to tell us why we need it, mentioning an existing issue would be better.

PR for docs, you also need to tell us why we need it.

Your PRs will be reviewed and checked, merged into our project if approved.

1.9.3 Github workflow

This is a brief instruction of Github workflow for beginners.

• Fork the PyMilvus repository on Github.

• Clone your fork to your local machine with git clone [email protected]:<your_user_name>/pymilvus.git.

• Create a new branch with git checkout -b my_working_branch.

• Make your changes, commit, then push to your forked repository.

• Visit Github and make you PR.

If you already have an existing local repository, always update it before you start to make changes like below:

$ git remote add upstream [email protected]:milvus-io/pymilvus.git$ git checkout master$ git pull upstream master$ git checkout -b my_working_branch

1.9.4 Contribution guideline

1. Update CHANGELOG.md

2. Add unit tests for your codes

3. Pass pylint check

4. For documentations

You need to enter the doc directory and run make html, please refer to About this documentations.

Section author: Yangxuan@milvus

1.10 About this documentation

This documentation is generated using the Sphinx documentation generator. The source files for the documentationare located in the doc/ directory of the pymilvus distribution. To generate the docs locally run the following commandunder directory doc/ :

$ make html

The documentation should be generated under directory build/html.

To preview it, you can open index.html in your browser.

Or run a web server in that directory:

1.10. About this documentation 33

Page 38: pymilvus - Read the Docs

pymilvus, Release 0.3.0

$ python3 -m http.server

Then open your browser to http://localhost:8000.

Section author: Bosszou@milvus

Section author: Bosszou@milvus, Godchen@milvus, Yangxuan@milvus

34 Chapter 1. Overview

Page 39: pymilvus - Read the Docs

INDEX

Ccompact() (milvus.Milvus method), 12count_entities() (milvus.Milvus method), 12create_collection() (milvus.Milvus method), 13create_index() (milvus.Milvus method), 13create_partition() (milvus.Milvus method), 13

Ddelete_entity_by_id() (milvus.Milvus method),

13drop_collection() (milvus.Milvus method), 14drop_index() (milvus.Milvus method), 14drop_partition() (milvus.Milvus method), 14

Fflush() (milvus.Milvus method), 14

Gget_collection_info() (milvus.Milvus method),

14get_collection_stats() (milvus.Milvus

method), 14get_entity_by_id() (milvus.Milvus method), 15

Hhas_collection() (milvus.Milvus method), 15has_partition() (milvus.Milvus method), 15

Iinsert() (milvus.Milvus method), 15

Llist_collections() (milvus.Milvus method), 16list_id_in_segment() (milvus.Milvus method),

16list_partitions() (milvus.Milvus method), 16load_collection() (milvus.Milvus method), 16

MMilvus (class in milvus), 12

Ssearch() (milvus.Milvus method), 16search_in_segment() (milvus.Milvus method), 17

35