Vector stores

A vector store holds the embeddings of the chunks and retrieves the chunks most relevant to a given query. Every vector store in Dingo must be a subclass of agent_dingo.rag.base.BaseVectorStore and must implement two methods:

upsert_chunks: takes a list of embedded chunks and stores them in the database.
retrieve: takes an embedded query and returns a list of the most relevant chunks.
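
To make the two-method contract concrete, here is a minimal in-memory sketch. It does not import Dingo: the Chunk, RetrievedChunk and BaseVectorStore classes below are simplified stand-ins whose field names and signatures are assumptions modelled on the usage shown on this page, and InMemoryVectorStore with its cosine-similarity ranking is purely illustrative. A real implementation would subclass agent_dingo.rag.base.BaseVectorStore and use Dingo's own chunk types.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List


@dataclass
class Chunk:
    # Stand-in for Dingo's embedded chunk type (field names are assumptions).
    content: str
    embedding: List[float]
    document_metadata: dict = field(default_factory=dict)


@dataclass
class RetrievedChunk:
    # Stand-in mirroring the fields visible in the printed output on this page.
    content: str
    document_metadata: dict
    score: float


class BaseVectorStore(ABC):
    # Stand-in for agent_dingo.rag.base.BaseVectorStore (signatures assumed).
    @abstractmethod
    def upsert_chunks(self, chunks: List[Chunk]) -> None: ...

    @abstractmethod
    def retrieve(self, k: int, query: List[float]) -> List[RetrievedChunk]: ...


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore(BaseVectorStore):
    """Hypothetical store that keeps chunks in a dict and ranks by cosine similarity."""

    def __init__(self) -> None:
        self._chunks: Dict[str, Chunk] = {}

    def upsert_chunks(self, chunks: List[Chunk]) -> None:
        # Keying by content is an illustrative choice: re-inserting the same
        # text overwrites the previous entry instead of duplicating it.
        for chunk in chunks:
            self._chunks[chunk.content] = chunk

    def retrieve(self, k: int, query: List[float]) -> List[RetrievedChunk]:
        scored = [
            RetrievedChunk(c.content, c.document_metadata,
                           cosine_similarity(query, c.embedding))
            for c in self._chunks.values()
        ]
        scored.sort(key=lambda r: r.score, reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.upsert_chunks([
    Chunk("Dingoes live in Australia.", [1.0, 0.0], {"source": "a.txt"}),
    Chunk("Qdrant is a vector database.", [0.0, 1.0], {"source": "b.txt"}),
])
top = store.retrieve(k=1, query=[0.1, 0.9])
print(top[0].content)  # Qdrant is a vector database.
```

The toy two-dimensional embeddings are only there to keep the example readable; in practice the embeddings come from an embedder and the ranking is delegated to the underlying database.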

Currently, Dingo supports two major open-source vector databases: Qdrant and Chroma DB.

from agent_dingo.rag.vector_stores.qdrant import Qdrant

chunks = [...]  # list of embedded chunks

query = "query"
# embedder is assumed to be an already created embedder instance
query_embedding = embedder.embed([query])[0]

vector_store = Qdrant(collection_name="collection_name", embedding_size=384)
vector_store.upsert_chunks(chunks)
retrieved_chunks = vector_store.retrieve(k=1, query=query_embedding)
print(retrieved_chunks)
# RetrievedChunk(content='...', document_metadata={'source': 'file.docx', 'paragraph': 7}, score=0.913)

Supported parameters:

agent_dingo.rag.vector_stores.qdrant.Qdrant:
- collection_name: the name of the collection in Qdrant.
- embedding_size: the size of the embeddings.
- host: the host of the Qdrant server; defaults to None.
- port: the port of the Qdrant server; defaults to None.
- path: the path of the Qdrant database (for a local instance); defaults to None, in which case the database is stored in memory.
- url: the full URL of the hosted Qdrant service; defaults to None.
- api_key: the API key for the Qdrant service; defaults to None.
- recreate_collection: a boolean flag that determines whether to recreate the collection if it already exists; defaults to False.
- upsert_batch_size: the number of chunks to upsert in a single batch; defaults to 32.

agent_dingo.rag.vector_stores.chromadb.ChromaDB:
- collection_name: the name of the collection in ChromaDB.
- path: the path of the ChromaDB database; defaults to None.
- host: the host of the ChromaDB server; defaults to None.
- port: the port of the ChromaDB server; defaults to None.
- recreate_collection: a boolean flag that determines whether to recreate the collection if it already exists; defaults to False.
- upsert_batch_size: the number of chunks to upsert in a single batch; defaults to 32.
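
The effect of upsert_batch_size can be pictured with a small sketch. The helper below, iter_batches, is a hypothetical name and not part of Dingo; it only illustrates, under the assumption that batching means splitting the chunk list into consecutive slices, how a list of chunks would be divided when the batch size is 32.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items (illustrative helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 70 placeholder chunks split with the default batch size of 32.
chunks = [f"chunk-{i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(chunks, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```

A larger batch size means fewer round trips to the database at the cost of bigger individual requests; the right value likely depends on the embedding size and on whether the store is local or remote.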