RAG
Embedder
The embedder is responsible for converting the chunks into vectors. Every embedder in Dingo must be a subclass of the agent_dingo.rag.base.BaseEmbedder
class. The embedder should implement two methods:
embed
: which takes a list of strings and returns a list of embeddings.embed_chunks
: which takes a list of chunks and modifies the embedding field of each chunk.
At the moment, Dingo provides open source SentenceTransformer
embedder that uses the Sentence Transformers library, as well as an OpenAIEmbedder
that supports any provider that is compatible with a standard OpenAI REST API.
from agent_dingo.rag.embedders.sentence_transformer import SentenceTransformer
embedder = SentenceTransformer(model_name="paraphrase-MiniLM-L6-v2")
embeddings = embedder.embed(["text 1"])
print(embeddings)
# [[...]]
Supported parameters:
agent_dingo.rag.embedders.sentence_transformer.SentenceTransformer
:model_name
: the model to use, by default isparaphrase-MiniLM-L6-v2
.batch_size
: the number of samples passed in one request, by default is 128.
agent_dingo.rag.embedders.openai.OpenAIEmbedder
:model
: the model to use, by default istext-embedding-3-small
.base_url
: the base URL of the provider. By default is None, which means it will use the OpenAI API.dimensions
: the number of dimensions the resulting output embeddings should have. By default is None, which means that the default dimension size defined by the provider will be used.