RAG (retrieval-augmented generation)

YAAB ships RAG built in and provider-neutral: it is neither delegated to a managed cloud service nor left for you to assemble from parts. The pipeline follows the de facto standard (Document → Chunk → Embedder → VectorStore → Retriever → Reranker), wrapped in a single KnowledgeBase object. On top of that it adds governance features that RAG toolkits commonly omit: per-user and per-document access control at retrieval time, source citations, embedding caching, incremental indexing with deduplication, retrieval guardrails, and faithfulness evaluation.

Quickstart

from yaab import Agent, Document, KnowledgeBase

kb = KnowledgeBase()
kb.add(Document(text="Paris is the capital of France.", source="geo.md"))

# Use it as an agent tool — the agent retrieves on demand:
agent = Agent("assistant", model="openai/gpt-4o", tools=[kb.as_tool()])
print(agent.run_sync("What is the capital of France?").output)

Or retrieve directly (classic context-stuffing), with citations:

# augment() is awaited, so call it from async code (e.g. inside an `async def`):
block, chunks = await kb.augment("capital of France?", k=3)
# block: "[geo.md#0] Paris is the capital of France."

The pipeline (all swappable)

from yaab.rag import KnowledgeBase, SentenceChunker, InMemoryVectorStore, KeywordReranker
from yaab.memory.embedders import LiteLLMEmbedder, CachingEmbedder

kb = KnowledgeBase(
    chunker=SentenceChunker(chunk_size=800),
    embedder=CachingEmbedder(LiteLLMEmbedder("openai/text-embedding-3-small")),
    store=InMemoryVectorStore(),
    reranker=KeywordReranker(weight=0.4),
)

Every component is defined as a typing.Protocol, so any class with the matching methods drops in: a Chroma, Qdrant, or Pinecone store behind VectorStore, or a cross-encoder behind Reranker. Built-ins:

Concern	Ships
Chunkers	`CharacterChunker`, `SentenceChunker`, `ParagraphChunker`
Embedders	`hashing_embedder` (offline), `LiteLLMEmbedder` (any provider), `CachingEmbedder`
Vector stores	in-memory · pgvector/Aurora · Chroma · Qdrant · OpenSearch · Oracle 23ai
Rerankers	`KeywordReranker` (lexical hybrid), `LLMReranker`, `CrossEncoderReranker`
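
To illustrate what "drops in" means for a typing.Protocol, here is a minimal sketch of structural typing with a custom reranker. The Chunk dataclass, the rerank method name, and its signature are assumptions made for the example; YAAB's real Reranker protocol may differ (it may be async, for instance), so check the library before copying the shape.

```python
from dataclasses import dataclass
from typing import List, Protocol, runtime_checkable


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk (not YAAB's class)."""
    text: str
    score: float


@runtime_checkable
class Reranker(Protocol):
    """Assumed protocol shape, for illustration only."""

    def rerank(self, query: str, chunks: List[Chunk]) -> List[Chunk]: ...


class LengthPenaltyReranker:
    """Does not inherit from Reranker; having a matching method is enough."""

    def rerank(self, query: str, chunks: List[Chunk]) -> List[Chunk]:
        # Prefer high scores, with a small penalty for very long chunks.
        return sorted(
            chunks,
            key=lambda c: c.score - 0.001 * len(c.text),
            reverse=True,
        )


reranker = LengthPenaltyReranker()
print(isinstance(reranker, Reranker))  # structural check passes

ranked = reranker.rerank("q", [Chunk("x" * 500, 0.6), Chunk("short", 0.5)])
print([c.text[:5] for c in ranked])
```

Because the check is structural, a third-party store or reranker needs no YAAB base class, only the methods the protocol declares.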

Hybrid search (sparse + dense)

KnowledgeBase(hybrid=True) runs sparse BM25 recall alongside dense (embedding) recall and merges the two rankings with reciprocal rank fusion. Exact rare terms that an embedding tends to gloss over (an error code, a product SKU, a surname) still surface, while semantic matches keep their reach.

from yaab.rag import KnowledgeBase, Document

kb = KnowledgeBase(hybrid=True)
kb.add([
    Document(text="The mitochondria is the powerhouse of the cell.", source="a"),
    Document(text="Photosynthesis converts light into chemical energy.", source="b"),
])
chunks = await kb.retrieve("mitochondria powerhouse", k=1)

Both arms over-fetch, and the fused top-k is returned. Fusion uses rank order only, so the sparse and dense scores never need to be on a comparable scale. Everything else (chunking, citations, access control) is unchanged.
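
For readers unfamiliar with rank-based fusion, the following is a minimal sketch of reciprocal rank fusion over two ranked ID lists. The function name, the document IDs, and the constant k=60 (a commonly used default in the literature) are assumptions for illustration; the source does not state which constant or tie-breaking rule YAAB uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=None):
    """Fuse ranked ID lists by summing 1 / (k + rank), with rank starting at 1.

    Only positions are used, so the input lists need no comparable scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_k is None else fused[:top_k]


dense = ["doc_a", "doc_b", "doc_c"]    # order from the embedding arm
sparse = ["doc_c", "doc_a", "doc_d"]   # order from the BM25 arm
print(reciprocal_rank_fusion([dense, sparse], top_k=2))
```

A document ranked well by both arms (doc_a here) beats one that only a single arm ranks first, which is why each arm fetches more than k candidates before fusing.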

Production vector stores

All stores implement the same VectorStore protocol and honor metadata filters passed as where=..., so per-tenant isolation is pushed down to the database or cluster. Pick a store by class or by name; the full matrix is in Storage & backends.

from yaab.rag import KnowledgeBase, PgVectorStore           # yaab-sdk[postgres]

# Postgres / Amazon Aurora PostgreSQL with pgvector:
kb = KnowledgeBase(store=PgVectorStore("postgresql://…@aurora-endpoint/db", dim=1536))

# Amazon OpenSearch Service / Serverless:           yaab-sdk[opensearch]
from yaab.rag import OpenSearchVectorStore
kb = KnowledgeBase(store=OpenSearchVectorStore(index="kb", hosts=[{"host": "...", "port": 443}]))

# Oracle Database 23ai AI Vector Search:            yaab-sdk[oracle]
from yaab.rag import OracleVectorStore
kb = KnowledgeBase(store=OracleVectorStore(dsn="...", user="...", password="..."))

# Chroma (yaab-sdk[chroma]) and Qdrant (yaab-sdk[qdrant]) likewise.
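
As a rough picture of what "pushing a filter down" can look like on Postgres, the sketch below turns a where dict into a parameterized query that filters in SQL before the nearest-neighbor ordering. It is not PgVectorStore's actual query: the table name chunks, the columns metadata (JSONB) and embedding, and the helper build_query are all hypothetical. The only real pieces are the JSONB `->>` operator and pgvector's `<=>` cosine-distance operator.

```python
def build_query(where, query_vector_literal, k):
    """Build a parameterized pgvector query with the metadata filter in SQL.

    Hypothetical schema: chunks(id, text, metadata JSONB, embedding vector).
    Keys and values are passed as parameters, never interpolated.
    """
    clauses, params = [], []
    for key, value in sorted(where.items()):
        clauses.append("metadata ->> %s = %s")   # JSONB field as text
        params.extend([key, str(value)])
    where_sql = " AND ".join(clauses) if clauses else "TRUE"
    sql = (
        "SELECT id, text FROM chunks "
        f"WHERE {where_sql} "
        "ORDER BY embedding <=> %s::vector "      # pgvector cosine distance
        "LIMIT %s"
    )
    return sql, params + [query_vector_literal, k]


sql, params = build_query({"tenant": "acme"}, "[0.1,0.2,0.3]", 5)
print(sql)
print(params)
```

Filtering inside the query means rows belonging to other tenants are never returned to the application, in contrast to fetching broadly and discarding rows afterwards.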

Governance features

Per-user / document-level access control

Tag documents with metadata, then filter at retrieval — so an agent run for one user never retrieves another's documents:

kb.add(Document(text="Alice's note", source="a", metadata={"user": "alice"}))
results = await kb.retrieve("note", where={"user": "alice"})   # only alice's

Wire it to the agent automatically with scope_from_deps:

tool = kb.as_tool(scope_from_deps="user")   # reads ctx.deps.user as the filter
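
The filtering rule itself is simple equality matching on metadata. The sketch below shows one plausible reading of it, assuming a chunk matches only when every key in where is present with an equal value; the matches helper and the dict-based chunks are illustrative, not YAAB internals.

```python
from typing import Optional


def matches(metadata: dict, where: Optional[dict]) -> bool:
    """True when every key in `where` is present in metadata with an equal value."""
    if not where:
        return True  # no filter: everything is visible
    return all(
        key in metadata and metadata[key] == value
        for key, value in where.items()
    )


chunks = [
    {"text": "Alice's note", "metadata": {"user": "alice"}},
    {"text": "Bob's note", "metadata": {"user": "bob"}},
    {"text": "Untagged note", "metadata": {}},
]

visible = [c["text"] for c in chunks if matches(c["metadata"], {"user": "alice"})]
print(visible)
```

Note that under this reading an untagged chunk is hidden from a scoped query, which is the safer default for multi-user data.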

Source citations

Every RetrievedChunk carries a citation (source#index). augment() prefixes each chunk in the returned block with its citation, e.g. [geo.md#0], so answers can attribute their context.
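
A minimal sketch of how such a cited context block could be assembled, following the source#index format shown in the quickstart. The format_context helper, the tuple layout, and the newline separator between chunks are assumptions for illustration.

```python
def format_context(chunks):
    """Render (source, index, text) tuples as '[source#index] text' lines."""
    return "\n".join(
        f"[{source}#{index}] {text}" for source, index, text in chunks
    )


block = format_context([
    ("geo.md", 0, "Paris is the capital of France."),
    ("geo.md", 1, "Lyon is a city in France."),
])
print(block)
```

Keeping the citation inline with each chunk lets the model quote the marker back in its answer, which can then be mapped to the original source.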

Incremental, dedup indexing

Re-ingesting unchanged content is a cheap no-op, and a single source can be replaced or removed in place:

kb.add(docs)                                   # dedups by content hash
kb.reindex(new_docs, source="policy.md")       # replace one source's chunks
kb.delete(source="policy.md")                  # remove a source
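
The idea behind content-hash deduplication can be sketched in a few lines. This toy index is an illustration under stated assumptions, not YAAB's implementation: the DedupIndex class is hypothetical, it hashes the text only (so identical text from two sources counts as one entry), and it skips chunking and embedding entirely.

```python
import hashlib


class DedupIndex:
    """Toy index that skips content it has already seen (hypothetical)."""

    def __init__(self):
        self._by_hash = {}  # sha256 hex digest -> (source, text)

    def add(self, source, text):
        """Index text; return False if identical content is already present."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self._by_hash:
            return False  # unchanged content: nothing to embed or store
        self._by_hash[digest] = (source, text)
        return True

    def delete(self, source):
        """Remove every entry that came from `source`."""
        self._by_hash = {
            h: entry for h, entry in self._by_hash.items() if entry[0] != source
        }

    def reindex(self, source, texts):
        """Replace one source's entries with new content."""
        self.delete(source)
        return [self.add(source, t) for t in texts]

    def __len__(self):
        return len(self._by_hash)


index = DedupIndex()
print(index.add("policy.md", "Refunds within 30 days."))   # new content
print(index.add("policy.md", "Refunds within 30 days."))   # duplicate, skipped
index.reindex("policy.md", ["Refunds within 14 days."])
print(len(index))
```

The saving comes from the early return: a duplicate never reaches the embedder, which is usually the expensive step.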

Retrieval guardrails

Filter weak or unsafe context before it reaches the model (context-poisoning / leakage defense):

kb = KnowledgeBase(
    min_score=0.2,                               # drop weak recall
    context_guard=lambda rc: "secret" not in rc.text.lower(),
)
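
Conceptually, both settings act as a filter over the retrieved candidates. The sketch below shows that filtering step in isolation, assuming a score threshold is applied first and the guard is a predicate that returns True for chunks to keep; the Retrieved dataclass and apply_guardrails function are hypothetical names, not YAAB's API.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    """Hypothetical retrieved chunk with a similarity score."""
    text: str
    score: float


def apply_guardrails(chunks, min_score=0.0, context_guard=None):
    """Keep chunks that clear the score threshold and pass the guard predicate."""
    kept = []
    for chunk in chunks:
        if chunk.score < min_score:
            continue  # weak recall: likely noise
        if context_guard is not None and not context_guard(chunk):
            continue  # rejected by the safety predicate
        kept.append(chunk)
    return kept


candidates = [
    Retrieved("Public pricing table", 0.80),
    Retrieved("The SECRET launch date", 0.90),
    Retrieved("Unrelated boilerplate", 0.05),
]
safe = apply_guardrails(
    candidates,
    min_score=0.2,
    context_guard=lambda rc: "secret" not in rc.text.lower(),
)
print([c.text for c in safe])
```

A high score does not exempt a chunk from the guard: the second candidate scores best but is still dropped.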

Faithfulness evaluation

Is the answer grounded in the retrieved context? RAGAS-style metrics, native:

from yaab.rag import faithfulness, context_relevance, FaithfulnessEvaluator

faithfulness(answer, chunks)        # deterministic 0–1 groundedness proxy
context_relevance(query, chunks)    # deterministic 0–1 retrieval-relevance proxy
await FaithfulnessEvaluator("openai/gpt-4o").ascore(answer, chunks)  # LLM judge

These plug into the governance eval framework and the drift monitor for ongoing RAG quality tracking.
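
The source does not say how the deterministic proxies are computed. One simple way to build a 0–1 groundedness proxy is lexical overlap: the fraction of answer tokens that also appear in the retrieved context. The sketch below illustrates that idea only; the function names are hypothetical and the formula should not be read as YAAB's.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_faithfulness(answer, chunk_texts):
    """Fraction of answer tokens found anywhere in the context (0.0 to 1.0)."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set()
    for text in chunk_texts:
        context_tokens |= tokens(text)
    return len(answer_tokens & context_tokens) / len(answer_tokens)


context = ["Paris is the capital of France."]
print(overlap_faithfulness("Paris is the capital of France.", context))
print(overlap_faithfulness("Berlin is the capital of Germany.", context))
```

The second call still scores well above zero because function words overlap, which shows why a lexical proxy is best treated as a cheap screening signal, with an LLM judge reserved for cases that need real entailment checking.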

Embedding cache

CachingEmbedder wraps any embedder to avoid re-embedding identical text (re-indexing, repeated queries) — a recurring RAG cost sink:

emb = CachingEmbedder(LiteLLMEmbedder("openai/text-embedding-3-small"))
print(emb.hits, emb.misses)
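
To make the hit/miss counters concrete, here is a minimal sketch of a caching wrapper around an arbitrary embedding function. It assumes the cache is keyed by a hash of the exact text; the CountingCache class and toy_embed function are hypothetical stand-ins, not YAAB's CachingEmbedder.

```python
import hashlib


class CountingCache:
    """Wrap an embedding function and memoize results by content hash."""

    def __init__(self, embed_fn):
        self._embed = embed_fn
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._embed(text)  # only misses pay for a call
        return self._cache[key]


def toy_embed(text):
    """Stand-in for a real (paid, remote) embedding call."""
    return [float(len(text)), float(text.count(" "))]


cache = CountingCache(toy_embed)
cache.embed("hello world")
cache.embed("hello world")   # served from the cache
cache.embed("other text")
print(cache.hits, cache.misses)
```

With a real provider behind the wrapper, the miss count is the number of billable embedding calls, so the hit ratio is a direct measure of the savings.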