yaab.rag

Built-in RAG: documents, chunkers, embedders, stores, rerankers, KnowledgeBase.

Built-in, provider-neutral RAG (retrieval-augmented generation).

Unlike SDKs that delegate RAG to a managed cloud service, YAAB ships the whole pipeline as open, swappable components. It also includes governance pieces: per-user and per-document access control at retrieval time, source citations, embedding caching, incremental dedup indexing, retrieval guardrails, and RAG faithfulness evaluation.

from yaab import Agent
from yaab.rag import KnowledgeBase, Document

kb = KnowledgeBase()
kb.add(Document(text="Paris is the capital of France.", source="geo.md"))
agent = Agent("assistant", model="openai/gpt-4o", tools=[kb.as_tool()])

CharacterChunker

Fixed-size character windows with overlap.
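
To make the windowing concrete, here is a minimal pure-Python sketch of fixed-size character windows with overlap. It is not YAAB's implementation: the function name character_windows and the parameter names chunk_size and overlap are assumptions chosen for illustration.

```python
def character_windows(text: str, chunk_size: int = 10, overlap: int = 3) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return windows
```

Each window starts chunk_size - overlap characters after the previous one, so neighbouring chunks repeat the overlap region and a sentence cut at a boundary still appears whole in at least one of them when it is shorter than the overlap.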

ParagraphChunker

Split on blank lines; oversized paragraphs fall back to character windows.
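
The behaviour described above can be sketched in a few lines. This is an illustrative approximation only: paragraph_chunks is a hypothetical name, and the fallback here uses plain non-overlapping windows, which may differ from what ParagraphChunker actually does.

```python
import re


def paragraph_chunks(text: str, chunk_size: int = 40) -> list[str]:
    """Split on blank lines; cut paragraphs longer than chunk_size into windows."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip empty fragments left by repeated blank lines
        if len(paragraph) <= chunk_size:
            chunks.append(paragraph)
        else:
            # Fallback (assumed): non-overlapping character windows.
            chunks.extend(
                paragraph[i:i + chunk_size]
                for i in range(0, len(paragraph), chunk_size)
            )
    return chunks
```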

SentenceChunker

Pack whole sentences up to chunk_size characters.
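
As a rough illustration of sentence packing, the sketch below greedily appends whole sentences to the current chunk until the next one would exceed chunk_size. The name sentence_chunks and the regex-based sentence splitter are assumptions; the real chunker may detect sentence boundaries differently.

```python
import re


def sentence_chunks(text: str, chunk_size: int = 50) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size or not current:
            # Fits, or it is the first sentence of a chunk. A single sentence
            # longer than chunk_size is kept whole rather than split.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```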

FaithfulnessEvaluator

LLM-judge groundedness: does the answer follow from the context?

Model-agnostic and best-effort: a parse failure returns 0.0 so an ungradeable answer is never silently treated as faithful.
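
The fail-closed parsing rule can be illustrated as follows. This is a sketch under assumptions: the helper name parse_judge_score is hypothetical, and it assumes the judge is prompted to reply with a bare number between 0 and 1, which the source does not specify.

```python
def parse_judge_score(reply: str) -> float:
    """Parse a judge reply into a score in [0, 1]; anything unparseable scores 0.0."""
    try:
        score = float(reply.strip())
    except ValueError:
        return 0.0  # ungradeable reply: never treat it as faithful
    if not 0.0 <= score <= 1.0:
        return 0.0  # out-of-range values (including NaN) are also rejected
    return score
```

Defaulting to 0.0 means a malformed judge reply can only lower a faithfulness score, never raise it.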

BM25Index

An in-memory Okapi BM25 index over short documents keyed by id.

search

search(query: str, *, k: int = 10) -> list[tuple[str, float]]

Top-k (doc_id, score) pairs for query, highest first.
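
For readers unfamiliar with Okapi BM25, the following self-contained sketch shows the scoring idea behind a search like this. It is not BM25Index itself: the function bm25_search, the whitespace tokenizer, and the k1 and b defaults are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_search(docs: dict[str, str], query: str, *, k: int = 10,
                k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs for query, highest first."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    query_terms = set(query.lower().split())
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

Rare terms get a larger idf weight, which is why a sparse index of this kind complements embedding search for exact identifiers and uncommon words.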

KnowledgeBase

A ready-to-use RAG knowledge base over pluggable components.

add

add(documents: list[Document] | Document, *, dedup: bool = True) -> int

Chunk, embed, and store documents. Returns the chunk count added.

With dedup (default), chunks whose content was already indexed are skipped — re-ingesting an unchanged corpus is a cheap no-op and repeated runs don't duplicate context (incremental indexing).
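
One common way to get this behaviour is to key chunks by a content hash. The sketch below illustrates that idea with a hypothetical DedupIndex class; whether KnowledgeBase hashes content this way is an assumption, not something the reference states.

```python
import hashlib


class DedupIndex:
    """Toy index that skips chunks whose exact content was already added."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.chunks: list[str] = []

    def add(self, chunks: list[str], *, dedup: bool = True) -> int:
        """Store chunks and return how many were actually added."""
        added = 0
        for chunk in chunks:
            digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            if dedup and digest in self._seen:
                continue  # identical content already indexed
            self._seen.add(digest)
            self.chunks.append(chunk)
            added += 1
        return added
```

Re-adding an unchanged corpus returns 0 in this sketch, and only new or modified chunks would need to be embedded again.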

delete

delete(*, source: str) -> int

Remove all chunks originating from source. Returns count removed.

reindex

reindex(documents: list[Document] | Document, *, source: str) -> int

Replace a source's chunks with freshly-ingested ones (incremental update).

retrieve (async)

retrieve(query: str, *, k: int = 5, where: Filter | None = None, rerank_top_n: int | None = None) -> list[RetrievedChunk]

Retrieve the top chunks for query (recall → optional rerank).

With hybrid=True, the dense (embedding) recall is fused with a sparse BM25 recall via reciprocal-rank fusion before any reranking, so chunks containing a query's exact rare terms reliably surface.

augment (async)

augment(query: str, *, k: int = 5, where: Filter | None = None) -> tuple[str, list[RetrievedChunk]]

Return a context block (with citations) plus the retrieved chunks.

Use for classic context-stuffing RAG: prepend the block to a prompt.
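
The context-stuffing pattern can be sketched without any YAAB objects. The exact layout of the block returned by augment is not documented here, so the numbered "[n] text (source: ...)" format and both helper names below are assumptions for illustration only.

```python
def build_context_block(chunks: list[tuple[str, str]]) -> str:
    """Format (text, source) pairs as a numbered, cited context block."""
    return "\n".join(
        f"[{i}] {text} (source: {source})"
        for i, (text, source) in enumerate(chunks, start=1)
    )


def stuff_prompt(question: str, context_block: str) -> str:
    """Prepend the context block to the user's question."""
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )
```

In real use, the context block would come from augment rather than from build_context_block, and the resulting prompt would be sent to the model directly instead of exposing retrieval as a tool.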

as_tool

as_tool(*, name: str | None = None, description: str | None = None, k: int = 5, scope_from_deps: str | None = None) -> Any

Expose retrieval as an agent tool.

scope_from_deps, if set, names a field read from ctx.deps and used as a metadata filter value (keyed by the same name) — so an agent run for user "alice" only retrieves alice's documents.
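
The scoping rule amounts to building an equality filter from one field of the run's dependencies. The sketch below shows that logic in isolation; Deps, scope_filter, and matches are hypothetical stand-ins, not YAAB types or functions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Deps:
    """Hypothetical per-run dependencies object (stand-in for ctx.deps)."""
    user_id: str


def scope_filter(deps: Any, scope_from_deps: Optional[str]) -> dict[str, Any]:
    """Build a metadata filter keyed by the same name as the deps field."""
    if scope_from_deps is None:
        return {}  # no scoping requested
    return {scope_from_deps: getattr(deps, scope_from_deps)}


def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """True when every filter key equals the corresponding metadata value."""
    return all(metadata.get(key) == value for key, value in where.items())
```

With scope_from_deps="user_id" and a run whose deps carry user_id="alice", only chunks whose metadata contains user_id "alice" pass the filter.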

KnowledgeBaseMemory

A yaab.memory.MemoryService backed by a KnowledgeBase.

add ingests a memory as a single-chunk document, with no splitting because a memory statement is already atomic. search retrieves the most similar memories and adapts the RAG RetrievedChunk results back into (MemoryRecord, score) tuples, so the class is a drop-in replacement for InMemoryVectorMemory.

search also accepts app_name and user_id as named parameters so that yaab.runner.Runner (which inspects the signature to thread the run's identity and app scope) and yaab.memory.manager.MemoryManager can scope retrieval. Namespace filtering pushes down to the store's metadata where filter, keeping per-user and per-app isolation cheap even at scale.

add (async)

add(text: str, *, metadata: dict | None = None) -> MemoryRecord

Store a memory statement. Returns the corresponding MemoryRecord.

The memory is indexed as a one-chunk document so it survives wherever the KnowledgeBase's vector store lives. Indexing uses dedup=False because consolidation is the caller's job (MemoryManager), so a write is never silently dropped.

search (async)

search(query: str, *, k: int = 5, app_name: str | None = None, user_id: str | None = None) -> list[tuple[MemoryRecord, float]]

Retrieve up to k memories most similar to query.

app_name / user_id (when given) become a metadata filter so only memories in that namespace are returned. Results are adapted from RAG RetrievedChunks into (MemoryRecord, score) tuples.

CrossEncoderReranker

Cross-encoder reranker (pip install sentence-transformers).

Scores each (query, chunk) pair with a cross-encoder model, the precision standard for reranking. The model is loaded lazily on first use; pass a preloaded model to skip lazy loading, for example in tests.

KeywordReranker

Blend the vector score with query-term overlap (lexical hybrid).

Final score = (1 - weight) * vector_score + weight * lexical_overlap, where overlap is the fraction of distinct query terms present in the chunk.
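
The blending formula above is easy to check by hand. The sketch below applies it to (text, vector_score) pairs; keyword_rerank, the whitespace tokenizer, and the default weight of 0.3 are assumptions rather than the reranker's actual interface.

```python
def keyword_rerank(query: str, chunks: list[tuple[str, float]], *,
                   weight: float = 0.3) -> list[tuple[str, float]]:
    """Blend each vector score with query-term overlap and sort highest first."""
    query_terms = set(query.lower().split())
    rescored = []
    for text, vector_score in chunks:
        chunk_terms = set(text.lower().split())
        # Fraction of distinct query terms that appear in the chunk.
        overlap = len(query_terms & chunk_terms) / len(query_terms) if query_terms else 0.0
        final = (1 - weight) * vector_score + weight * overlap
        rescored.append((text, final))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

With weight=0.3, a chunk scoring 0.55 on vectors but matching two of three query terms ends at 0.7 * 0.55 + 0.3 * (2/3) = 0.585, overtaking a chunk that scores 0.60 on vectors with no term overlap (0.42).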

LLMReranker

Score each chunk's relevance with a model and keep the top n.

Best-effort and model-agnostic; parsing failures fall back to the original retrieval score so a flaky judge never drops valid context.
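
The fallback behaviour can be sketched with a stand-in judge. Everything here is hypothetical: rerank_with_judge is not a YAAB function, and the judge is modelled as a plain callable that returns the model's raw text reply.

```python
from typing import Callable


def rerank_with_judge(chunks: list[tuple[str, float]],
                      judge: Callable[[str], str],
                      n: int) -> list[tuple[str, float]]:
    """Rescore chunks with a judge, keeping the retrieval score on parse failure."""
    rescored = []
    for text, retrieval_score in chunks:
        try:
            score = float(judge(text).strip())
        except ValueError:
            # Unparseable reply: fall back to the original retrieval score
            # so the chunk is not dropped because of a flaky judge.
            score = retrieval_score
        rescored.append((text, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)[:n]
```

Note the contrast with faithfulness judging: a reranker that fails open keeps context available, whereas an evaluator that fails closed avoids overstating quality.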

InMemoryVectorStore

Process-local vector store over the Rust top-k similarity op.

PgVectorStore

pgvector-backed store (psycopg v3, imported lazily).

Stores chunks in a table with a vector column and a JSONB metadata column; similarity uses pgvector's <=> cosine-distance operator. Metadata filters are applied as JSONB containment so per-tenant isolation pushes down to the database.

VectorStore

Bases: Protocol

Pluggable vector storage + similarity search.

Chunk

Bases: BaseModel

A retrievable unit produced by splitting a Document.

Carries an embedding (filled at index time) and a back-reference to its document so retrieved context can be attributed to a source (citations).

Document

Bases: BaseModel

A source document before chunking: raw text plus metadata.

source is a stable identifier (path, URL, db id) used for lineage and document-level access control; metadata carries anything else (app_name, user_id, tags, timestamps).

RetrievedChunk

Bases: BaseModel

A chunk returned from retrieval, with its relevance score.

citation

citation() -> str

A short, human-readable source attribution for this chunk.

context_relevance

context_relevance(query: str, chunks: list[RetrievedChunk]) -> float

Fraction of query terms covered by the retrieved context (0–1).

A cheap recall proxy: low values mean retrieval surfaced off-topic context.
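
A term-coverage metric of this kind can be approximated as below. The sketch works on plain chunk texts instead of RetrievedChunk objects, and the regex tokenizer and the name context_relevance_sketch are assumptions.

```python
import re


def _terms(text: str) -> set[str]:
    """Lowercased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def context_relevance_sketch(query: str, chunk_texts: list[str]) -> float:
    """Fraction of distinct query terms that appear anywhere in the context."""
    query_terms = _terms(query)
    if not query_terms:
        return 0.0
    context_terms: set[str] = set()
    for text in chunk_texts:
        context_terms |= _terms(text)
    return len(query_terms & context_terms) / len(query_terms)
```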

faithfulness

faithfulness(answer: str, chunks: list[RetrievedChunk]) -> float

Fraction of answer terms supported by the retrieved context (0–1).

A deterministic groundedness proxy: low values flag answer content that is not present in any retrieved chunk (potential hallucination). Stopword-ish short tokens are ignored to reduce noise.
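
The same idea applied to the answer side looks roughly like this. The minimum token length of 3 used to drop short, stopword-like tokens is an assumed threshold, as are the function name and the choice to return 0.0 for an answer with no scorable terms.

```python
import re


def faithfulness_sketch(answer: str, chunk_texts: list[str], *, min_len: int = 3) -> float:
    """Fraction of answer terms (of at least min_len characters) found in the context."""
    answer_terms = {t for t in re.findall(r"\w+", answer.lower()) if len(t) >= min_len}
    if not answer_terms:
        return 0.0  # assumed: nothing to verify is not counted as faithful
    context_terms: set[str] = set()
    for text in chunk_texts:
        context_terms |= set(re.findall(r"\w+", text.lower()))
    return len(answer_terms & context_terms) / len(answer_terms)
```

For the context "Paris is the capital of France." and the answer "Paris is the capital of Spain", the scorable terms are paris, the, capital, and spain; three of the four are supported, giving 0.75 and flagging "spain" as ungrounded.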

reciprocal_rank_fusion

reciprocal_rank_fusion(rankings: list[list[str]], *, k: int = 60) -> list[tuple[str, float]]

Fuse several ranked id-lists into one by reciprocal-rank fusion.

Each list contributes 1 / (k + rank) to every id it ranks (rank starting at 1). Order-only, so the lists need no comparable scores. Returns (id, fused_score) sorted highest first.
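
The description above translates almost directly into code. The following is an independent sketch of reciprocal-rank fusion with the same signature shape, named rrf_sketch to make clear it is not the library function.

```python
from collections import defaultdict


def rrf_sketch(rankings: list[list[str]], *, k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id-lists: each list adds 1 / (k + rank) to every id it ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Fusing ["a", "b", "c"] with ["b", "c", "a"] ranks "b" first (1/62 + 1/61), then "a" (1/61 + 1/63), then "c" (1/63 + 1/62). Only positions matter, so a dense ranking and a BM25 ranking can be combined without normalising their scores.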

load

load(path: str) -> list[Document]

Load a file into Documents, dispatching on its extension.

load_bytes

load_bytes(data: bytes, *, source: str, fmt: str = 'text') -> list[Document]

Load from in-memory bytes (e.g. an upload) using a named format.

load_directory

load_directory(directory: str, *, glob: str = '**/*', recursive: bool = True) -> list[Document]

Load every supported file under directory matching glob.