yaab.rag¶
Built-in RAG: documents, chunkers, embedders, stores, rerankers, KnowledgeBase.
Built-in, provider-neutral RAG (retrieval-augmented generation).
Unlike SDKs that delegate RAG to a managed cloud service, YAAB ships the whole pipeline as open, swappable components. It also includes governance features: per-user and per-document access control at retrieval time, source citations, embedding caching, incremental indexing with deduplication, retrieval guardrails, and RAG faithfulness evaluation.
from yaab import Agent
from yaab.rag import KnowledgeBase, Document
kb = KnowledgeBase()
kb.add(Document(text="Paris is the capital of France.", source="geo.md"))
agent = Agent("assistant", model="openai/gpt-4o", tools=[kb.as_tool()])
CharacterChunker ¶
Fixed-size character windows with overlap.
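The windowing idea can be sketched in a few lines. This is a minimal, standalone illustration and not YAAB's implementation; the function name character_windows and the parameter names chunk_size and overlap are assumptions made for the example.

```python
def character_windows(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size windows; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window start advances
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return windows


if __name__ == "__main__":
    for window in character_windows("abcdefghij", chunk_size=4, overlap=1):
        print(window)
```

The overlap means a sentence cut at a window boundary still appears intact in at least one neighbouring window, at the cost of indexing some characters twice.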
ParagraphChunker ¶
Split on blank lines; oversized paragraphs fall back to character windows.
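A rough sketch of that behaviour, under stated assumptions: the name paragraph_chunks and the chunk_size parameter are hypothetical, and the fallback here uses non-overlapping windows, which may differ from what ParagraphChunker actually does.

```python
import re


def paragraph_chunks(text: str, chunk_size: int = 200) -> list[str]:
    """Split on blank lines; break oversized paragraphs into character windows."""
    chunks: list[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip empty pieces produced by leading/trailing blank lines
        if len(paragraph) <= chunk_size:
            chunks.append(paragraph)
        else:
            # Fallback (assumption): plain non-overlapping character windows.
            chunks.extend(
                paragraph[i:i + chunk_size]
                for i in range(0, len(paragraph), chunk_size)
            )
    return chunks
```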
SentenceChunker ¶
Pack whole sentences up to chunk_size characters.
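Sentence packing can be illustrated with a greedy loop. This is a sketch only: the function name sentence_chunks, the regex-based sentence splitter, and the choice to keep an over-long sentence whole are all assumptions, not documented SentenceChunker behaviour.

```python
import re


def sentence_chunks(text: str, chunk_size: int = 80) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters."""
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if not current or len(candidate) <= chunk_size:
            # Either the chunk is empty (a long sentence is kept whole)
            # or the sentence still fits.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```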
FaithfulnessEvaluator ¶
LLM-judge groundedness: does the answer follow from the context?
Model-agnostic and best-effort: a parse failure returns 0.0 so an ungradeable answer is never silently treated as faithful.
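The fail-closed parsing rule can be sketched as follows. The judge's reply format is not specified here, so this example assumes a reply containing a number in [0, 1]; parse_judge_score is a hypothetical helper, not part of YAAB.

```python
import re


def parse_judge_score(reply: str) -> float:
    """Extract a 0-1 score from a judge reply; anything unparseable scores 0.0."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        return 0.0  # no number at all: treat the answer as not faithful
    score = float(match.group())
    # Out-of-range values are treated as a parse failure too (assumption).
    return score if 0.0 <= score <= 1.0 else 0.0
```

Failing closed means an ungradeable answer lowers the metric rather than inflating it.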
BM25Index ¶
An in-memory Okapi BM25 index over short documents keyed by id.
search ¶
Top-k (doc_id, score) pairs for query, highest first.
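For readers unfamiliar with Okapi BM25, a compact standalone version is sketched below. It is not BM25Index itself: the class name TinyBM25, the whitespace tokenizer, the defaults k1=1.5 and b=0.75, and the non-negative IDF variant are assumptions chosen for the example.

```python
import math
from collections import Counter


class TinyBM25:
    """Minimal in-memory Okapi BM25 over documents keyed by id."""

    def __init__(self, k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.docs: dict[str, Counter] = {}  # doc_id -> term frequencies

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = Counter(text.lower().split())

    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]:
        if not self.docs:
            return []
        n_docs = len(self.docs)
        avg_len = sum(sum(tf.values()) for tf in self.docs.values()) / n_docs
        scores: dict[str, float] = {}
        for term in set(query.lower().split()):
            df = sum(1 for tf in self.docs.values() if term in tf)
            if df == 0:
                continue  # term occurs nowhere, contributes nothing
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for doc_id, tf in self.docs.items():
                freq = tf[term]
                if freq == 0:
                    continue
                doc_len = sum(tf.values())
                # Length normalisation: longer-than-average documents are penalised.
                norm = freq + self.k1 * (1 - self.b + self.b * doc_len / avg_len)
                scores[doc_id] = scores.get(doc_id, 0.0) + idf * freq * (self.k1 + 1) / norm
        # Highest score first, truncated to k results.
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

Documents that match none of the query terms are omitted from the result rather than returned with a zero score.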
KnowledgeBase ¶
A ready-to-use RAG knowledge base over pluggable components.
add ¶
Chunk, embed, and store documents. Returns the chunk count added.
With dedup (default), chunks whose content was already indexed are
skipped — re-ingesting an unchanged corpus is a cheap no-op and repeated
runs don't duplicate context (incremental indexing).
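One common way to get this behaviour is to key chunks by a content hash. The sketch below illustrates the idea only; the DedupIndex class, the SHA-256 choice, and the dedup keyword are assumptions and are not taken from YAAB's source.

```python
import hashlib


class DedupIndex:
    """Toy index that skips chunks whose exact content was already stored."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.chunks: list[str] = []

    def add(self, chunks: list[str], dedup: bool = True) -> int:
        added = 0
        for chunk in chunks:
            digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            if dedup and digest in self._seen:
                continue  # identical content already indexed: skip it
            self._seen.add(digest)
            self.chunks.append(chunk)
            added += 1
        return added  # number of chunks actually stored
```

Re-ingesting the same chunks returns 0 and leaves the index unchanged, which is what makes repeated ingestion runs cheap.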
delete ¶
Remove all chunks originating from source. Returns count removed.
reindex ¶
Replace a source's chunks with freshly-ingested ones (incremental update).
retrieve (async) ¶
retrieve(query: str, *, k: int = 5, where: Filter | None = None, rerank_top_n: int | None = None) -> list[RetrievedChunk]
Retrieve the top chunks for query (recall → optional rerank).
With hybrid=True the dense (embedding) recall is fused with a sparse
BM25 recall via reciprocal-rank fusion before any reranking, so a query's
exact rare terms reliably surface their chunk.
augment (async) ¶
augment(query: str, *, k: int = 5, where: Filter | None = None) -> tuple[str, list[RetrievedChunk]]
Return a context block (with citations) plus the retrieved chunks.
Use for classic context-stuffing RAG: prepend the block to a prompt.
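Context stuffing might look like the sketch below. The exact layout of the block that augment returns is not documented here, so the numbered-citation format, the prompt wording, and both helper names are purely illustrative assumptions.

```python
def build_context_block(chunks: list[tuple[str, str]]) -> str:
    """Render (text, source) pairs as a numbered, citable context block."""
    return "\n".join(
        f"[{i}] {text} (source: {source})"
        for i, (text, source) in enumerate(chunks, start=1)
    )


def stuff_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Prepend the context block to the user's question."""
    context = build_context_block(chunks)
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    retrieved = [("Paris is the capital of France.", "geo.md")]
    print(stuff_prompt("What is the capital of France?", retrieved))
```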
as_tool ¶
as_tool(*, name: str | None = None, description: str | None = None, k: int = 5, scope_from_deps: str | None = None) -> Any
Expose retrieval as an agent tool.
scope_from_deps, if set, names a field read from ctx.deps and
used as a metadata filter value (keyed by the same name) — so an agent
run for user "alice" only retrieves alice's documents.
KnowledgeBaseMemory ¶
A yaab.memory.MemoryService backed by a KnowledgeBase.
add ingests a memory as a single-chunk document (no splitting — a memory
statement is already atomic); search retrieves the most similar memories
and adapts the RAG RetrievedChunk results back into MemoryRecord /
score tuples so it is a drop-in for InMemoryVectorMemory.
search also accepts app_name / user_id as named parameters so that
yaab.runner.Runner (which inspects the signature to thread the run's
identity/app scope) and yaab.memory.manager.MemoryManager can scope
retrieval. Namespace filtering is pushed down to the store's metadata
where filter, which keeps per-user/app isolation cheap even at scale.
add (async) ¶
add(text: str, *, metadata: dict | None = None) -> MemoryRecord
Store a memory statement. Returns the corresponding MemoryRecord.
The memory is indexed as a one-chunk document, so it persists wherever the
KnowledgeBase's vector store lives. Indexing uses dedup=False because
consolidation is the caller's job (MemoryManager), so a write is never silently dropped.
search (async) ¶
search(query: str, *, k: int = 5, app_name: str | None = None, user_id: str | None = None) -> list[tuple[MemoryRecord, float]]
Retrieve up to k memories most similar to query.
app_name / user_id (when given) become a metadata filter so only
memories in that namespace are returned. Results are adapted from RAG
RetrievedChunks into (MemoryRecord, score) tuples.
CrossEncoderReranker ¶
Cross-encoder reranker. Requires sentence-transformers (pip install sentence-transformers).
Scores each (query, chunk) pair jointly with a cross-encoder model, which is
typically the most precise reranking option. The model is loaded lazily on
first use; pass a preloaded model to skip loading (useful in tests).
KeywordReranker ¶
Blend the vector score with query-term overlap (lexical hybrid).
Final score = (1 - weight) * vector_score + weight * lexical_overlap,
where overlap is the fraction of distinct query terms present in the chunk.
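The blend formula above can be worked through directly. This is a sketch, not the KeywordReranker code: the function name keyword_blend, the whitespace tokenizer, the default weight of 0.3, and the empty-query behaviour are assumptions.

```python
def keyword_blend(query: str, chunk_text: str, vector_score: float, weight: float = 0.3) -> float:
    """Blend a vector score with the fraction of distinct query terms in the chunk."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return vector_score  # assumption: nothing lexical to blend in
    chunk_terms = set(chunk_text.lower().split())
    overlap = len(query_terms & chunk_terms) / len(query_terms)
    return (1 - weight) * vector_score + weight * overlap


if __name__ == "__main__":
    chunk = "paris is the capital of france"
    # Both query terms appear in the chunk, so overlap is 1.0.
    print(keyword_blend("capital france", chunk, vector_score=0.5, weight=0.5))
    # Only one of two query terms appears, so overlap is 0.5.
    print(keyword_blend("capital germany", chunk, vector_score=0.5, weight=0.5))
```

With weight=0 the vector ordering is unchanged; with weight=1 the ranking is purely lexical.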
LLMReranker ¶
Score each chunk's relevance with a model and keep the top n.
Best-effort and model-agnostic; parsing failures fall back to the original retrieval score so a flaky judge never drops valid context.
InMemoryVectorStore ¶
Process-local (in-memory, non-persistent) vector store built on the Rust top-k similarity operation.
PgVectorStore ¶
pgvector-backed store (psycopg v3, imported lazily).
Stores chunks in a table with a vector column and a JSONB metadata
column; similarity uses pgvector's <=> cosine-distance operator. Metadata
filters are applied as JSONB containment so per-tenant isolation pushes down
to the database.
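pgvector's <=> operator returns cosine distance, that is, 1 minus cosine similarity, so smaller values mean closer vectors. The pure-Python sketch below reproduces that ordering for intuition only; it does not touch a database, the helper names are hypothetical, and raising on zero vectors is a choice made for this example.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def nearest(query: list[float], rows: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Order rows by ascending distance, as an ORDER BY on the distance would."""
    scored = [(row_id, cosine_distance(query, vec)) for row_id, vec in rows.items()]
    return sorted(scored, key=lambda item: item[1])[:k]
```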
VectorStore ¶
Bases: Protocol
Pluggable vector storage + similarity search.
Chunk ¶
Bases: BaseModel
A retrievable unit produced by splitting a Document.
Carries an embedding (filled at index time) and a back-reference to its document so retrieved context can be attributed to a source (citations).
Document ¶
Bases: BaseModel
A source document before chunking: raw text plus metadata.
source is a stable identifier (path, URL, db id) used for lineage and
document-level access control; metadata carries anything else
(app_name, user_id, tags, timestamps).
RetrievedChunk ¶
Bases: BaseModel
A chunk returned from retrieval, with its relevance score.
context_relevance ¶
context_relevance(query: str, chunks: list[RetrievedChunk]) -> float
Fraction of query terms covered by the retrieved context (0–1).
A cheap recall proxy: low values mean retrieval surfaced off-topic context.
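A term-coverage metric of this kind can be sketched as below. The function is named term_coverage to make clear it is an illustration rather than the library function; the \w+ tokenizer, the use of plain strings instead of RetrievedChunk objects, and the 0.0 result for an empty query are assumptions.

```python
import re


def term_coverage(query: str, chunk_texts: list[str]) -> float:
    """Fraction of distinct query terms that appear anywhere in the retrieved text."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    if not query_terms:
        return 0.0  # assumption: an empty query covers nothing
    context_terms = set(re.findall(r"\w+", " ".join(chunk_texts).lower()))
    return len(query_terms & context_terms) / len(query_terms)
```

Being purely lexical, such a proxy undercounts paraphrases: a chunk that answers the query in different words still scores low.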
faithfulness ¶
faithfulness(answer: str, chunks: list[RetrievedChunk]) -> float
Fraction of answer terms supported by the retrieved context (0–1).
A deterministic groundedness proxy: low values flag answer content that is not present in any retrieved chunk (potential hallucination). Short, stopword-like tokens are ignored to reduce noise.
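The same idea applied to the answer side, with short tokens dropped, could look like this sketch. It is not the library function: the name supported_fraction, the minimum token length of 4, the extra list of unsupported terms, and the 0.0 score for an answer with no scorable terms are all assumptions.

```python
import re


def supported_fraction(answer: str, chunk_texts: list[str], min_len: int = 4) -> tuple[float, list[str]]:
    """Return (fraction of answer terms found in context, sorted unsupported terms)."""
    context_terms = set(re.findall(r"\w+", " ".join(chunk_texts).lower()))
    # Drop short tokens ("is", "the", "of", ...) as a crude stopword filter.
    answer_terms = {t for t in re.findall(r"\w+", answer.lower()) if len(t) >= min_len}
    if not answer_terms:
        return 0.0, []  # assumption: nothing scorable is not counted as faithful
    unsupported = sorted(answer_terms - context_terms)
    score = (len(answer_terms) - len(unsupported)) / len(answer_terms)
    return score, unsupported


if __name__ == "__main__":
    context = ["Paris is the capital of France."]
    print(supported_fraction("Paris is the capital of Germany", context))
```

Listing the unsupported terms alongside the score makes a low value easier to debug than the number alone.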
reciprocal_rank_fusion ¶
Fuse several ranked id-lists into one by reciprocal-rank fusion.
Each list contributes 1 / (k + rank) to every id it ranks (rank starting
at 1). Order-only, so the lists need no comparable scores. Returns
(id, fused_score) sorted highest first.
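The fusion rule described above fits in a few lines. This standalone sketch uses the hypothetical name rrf and a default of k=60, a value commonly used for reciprocal-rank fusion; the default used by reciprocal_rank_fusion itself is not stated here.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id-lists: each list adds 1 / (k + rank) to every id it ranks."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    dense = ["a", "b", "c"]   # e.g. embedding recall order
    sparse = ["c", "a", "d"]  # e.g. BM25 recall order
    for doc_id, score in rrf([dense, sparse]):
        print(doc_id, round(score, 5))
```

Ids ranked by both lists ("a" and "c" in the example) outrank ids that appear in only one, even though no raw scores were compared.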