Streaming & events¶
YAAB gives you two complementary streams:
- Token stream — text deltas for a single answering turn (chat UX).
- Event stream — typed semantic events across the whole tool loop (UIs, tracing, audit).
Token streaming¶
agent.stream(...) returns an async iterator of strings. It runs a single
answering turn (no tool loop) and respects input guardrails. Over HTTP, the
serve layer exposes this as Server-Sent Events at POST /chat/stream
(see Serving).
The semantic event stream¶
agent.stream_events(...) drives the whole tool loop and yields a typed
Event for every step — token deltas, tool calls, sub-agent transfers, and the
final result:
from yaab import EventType
async for event in agent.stream_events("What's the weather in Paris?"):
if event.type is EventType.TEXT_DELTA:
print(event.payload["delta"], end="", flush=True)
elif event.type is EventType.TOOL_CALL:
print(f"\n[calling {event.payload['name']}]")
Event types (yaab.EventType) and when the runner emits them:
| Type | When | Key payload |
|---|---|---|
RUN_START |
run begins | prompt |
USER_MESSAGE |
user turn recorded | content |
TEXT_DELTA |
token delta (only on stream_events / Runner.stream_run) |
delta |
MODEL_DELTA |
a reasoning/thinking trace arrives | reasoning |
MODEL_RESPONSE |
a model reply | content, tool_calls |
TOOL_CALL |
a tool is invoked | name, arguments |
TOOL_RESULT |
a tool returns | name, result |
AGENT_TRANSFER |
run delegated to a sub-agent | to |
FINAL_OUTPUT |
final answer produced | output |
RUN_END |
run complete | result |
ERROR |
run failed | error |
The non-streaming run/run_sync collect the same events (minus TEXT_DELTA)
into result.events, so you can audit a finished run the same way you'd watch a
live one.
Reasoning traces¶
When a provider exposes a reasoning/thinking trace (o-series, DeepSeek R1,
Anthropic extended thinking), it is captured on ModelResponse.reasoning and
emitted as a MODEL_DELTA event carrying a reasoning payload — so you can
surface the model's thinking without parsing it out of the answer.
Streaming over HTTP (SSE)¶
The FastAPI app exposes both streams:
POST /run/stream # semantic events as SSE (event: <type>, data: <json>)
POST /chat/stream # token deltas as SSE (data: <token>), terminated by [DONE]
from yaab.serve import fastapi_server_app
app = fastapi_server_app(agent) # mount with uvicorn / your ASGI server
Consume with any SSE client: