Building the Self-Driving Entity
By Fred Pope (@fred_pope)
Modern agent frameworks let us go beyond "assistants that answer" to entities that operate. A self-driving entity is an AI system that:
- Perceives its environment and context
- Plans across time and uncertainty
- Acts through tools and services with guardrails
- Learns from outcomes to improve future behavior
- Explains what it did and why
This post offers a system design you can ship, not just a metaphor.
The Agent Stack (from signals to outcomes)
Layer 0 — Interfaces
Events, webhooks, cron, chat, tickets, emails, API calls. Normalize all triggers into a common Intent object (who, what, when, constraints, priority, risk tier).
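A minimal sketch of that Intent envelope as a Python dataclass. The field names and RiskTier values are illustrative, not from any particular framework:

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class RiskTier(Enum):
    # Illustrative tiers; align these with the R0-R3 tiers in the GRC section below.
    R0 = "read_only"
    R1 = "idempotent_write"
    R2 = "transactional_write"
    R3 = "irreversible_write"

@dataclass(frozen=True)
class Intent:
    """Common envelope every trigger (webhook, cron, chat, email) is normalized into."""
    who: str            # principal that initiated the trigger
    what: str           # requested outcome, as plain language or a task code
    when: datetime      # deadline or scheduled time
    constraints: dict = field(default_factory=dict)  # e.g. {"max_spend_usd": 50}
    priority: int = 3   # 1 = urgent ... 5 = background
    risk: RiskTier = RiskTier.R0
```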
Layer 1 — Perception
Ingest the state needed to decide: recent events, relevant docs/records, feature flags, calendars, prices, ledgers, etc. Use retrieval pipelines to construct a minimal, verifiable context (the "context diet" principle).
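One way to enforce the diet is to have each intent type declare the minimal fields it needs. A sketch building on the Intent object above; CONTEXT_SPECS and fetch_field are hypothetical names standing in for your retrieval pipeline:

```python
# Hypothetical registry: each intent type declares the minimal facts it needs.
CONTEXT_SPECS = {
    "refund_request": ["order.total", "order.status", "customer.tier"],
}

def fetch_field(source: str, fieldname: str, intent) -> object:
    """Stub for your retrieval pipeline (DB lookup, vector search, API call)."""
    return f"<{source}.{fieldname} for {intent.who}>"

def retrieve_context(intent) -> dict:
    """Context diet: fetch only the declared fields, as facts rather than prose blobs."""
    ctx = {}
    for path in CONTEXT_SPECS.get(intent.what, []):
        source, _, fieldname = path.partition(".")
        ctx[path] = fetch_field(source, fieldname, intent)
    return ctx
```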
Layer 2 — Memory
- Working memory: scratchpad for the current episode.
- Episodic memory: append-only log of interactions and actions.
- Semantic memory: durable knowledge (KB/graph + vector index).
- Skills memory: reusable plans/tools (macros) discovered from past success (all four stores are sketched below).
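A minimal composition of these four stores, with the backends left as assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryLayer:
    """Illustrative composition of the four stores; the backends are assumptions."""
    working: dict = field(default_factory=dict)    # scratchpad, discarded after the episode
    episodic: list = field(default_factory=list)   # append-only log of interactions/actions
    semantic: object = None                        # e.g. a KB/graph client + vector index
    skills: dict = field(default_factory=dict)     # name -> reusable plan/macro

    def log(self, event: dict) -> None:
        self.episodic.append(event)                # append-only: never mutate past entries

    def promote_skill(self, name: str, plan: dict) -> None:
        self.skills[name] = plan                   # extracted from recurring successful traces
```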
Layer 3 — Reasoning & Planning
Planner selects a strategy (single-shot, multi-step plan, debate, tool search). Supports time horizons (T+0 execution vs. T+N with deadline-aware scheduling).
Layer 4 — Action & Tooling
Capability registry with typed tool contracts (a contract sketch follows the list). Every tool declares:
- Inputs/outputs and schemas
- Side-effect class (read-only, idempotent, transactional)
- Risk tier & required approvals
- Cost profile and latency SLO
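A contract carrying those declarations might look like this. ToolContract, SideEffect, and REGISTRY are illustrative names; RiskTier is reused from the Intent sketch above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class SideEffect(Enum):
    READ_ONLY = "read_only"
    IDEMPOTENT = "idempotent"
    TRANSACTIONAL = "transactional"

@dataclass(frozen=True)
class ToolContract:
    """Everything the gate needs to reason about a tool before it runs."""
    name: str
    input_schema: dict          # e.g. a JSON Schema for arguments
    output_schema: dict
    side_effect: SideEffect
    risk: RiskTier              # tier enum from the Intent sketch
    required_approvals: int     # 0 = fully automated
    cost_usd_per_call: float
    latency_slo_ms: int
    fn: Optional[Callable] = None   # the actual implementation

REGISTRY: dict[str, ToolContract] = {}   # capability registry, keyed by tool name
```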
Layer 5 — Learning
Close the loop with reflection (self-critique), reward shaping (did the outcome match the goal?), dataset curation (promote good traces), and skill extraction (convert recurring successful traces into named skills).
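Skill extraction can start as a frequency count over the trace log. A sketch reusing the MemoryLayer above; the trace shape and promotion threshold are assumptions:

```python
from collections import Counter

def extract_skills(traces: list[dict], memory: MemoryLayer, min_successes: int = 3) -> None:
    """Promote plans that succeeded repeatedly into named, reusable skills."""
    wins = Counter()
    by_signature = {}
    for t in traces:
        if t["outcome"] == "success":
            # Signature: intent type plus the ordered tools the plan invoked.
            sig = (t["intent_type"], tuple(step["tool"] for step in t["plan"]))
            wins[sig] += 1
            by_signature[sig] = t["plan"]
    for sig, count in wins.items():
        if count >= min_successes:
            memory.promote_skill(name=f"{sig[0]}::auto", plan=by_signature[sig])
```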
Layer 6 — Governance & Safety
Policies, permissions, audit trails, human-in-the-loop gates, sandboxes, and rollback. Treat every action as a signed, explainable event.
Layer 7 — Observability & Economics
Traces, metrics, cost per outcome, regret/rollback rate, autonomy ratio, and an autonomy budget that limits spend per task or per day.
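The autonomy budget can be a small stateful guard the gate consults before execution. A sketch with illustrative per-task and per-day limits:

```python
import time

class AutonomyBudget:
    """Caps spend per task and per day; the policy gate consults it before execution."""
    def __init__(self, per_task_usd: float = 5.0, per_day_usd: float = 200.0):
        self.per_task_usd = per_task_usd
        self.per_day_usd = per_day_usd
        self._day = time.strftime("%Y-%m-%d")
        self._spent_today = 0.0

    def allows(self, plan_cost_usd: float) -> bool:
        today = time.strftime("%Y-%m-%d")
        if today != self._day:                      # daily window rolls over
            self._day, self._spent_today = today, 0.0
        return (plan_cost_usd <= self.per_task_usd
                and self._spent_today + plan_cost_usd <= self.per_day_usd)

    def charge(self, actual_cost_usd: float) -> None:
        self._spent_today += actual_cost_usd
```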
Key idea: Plot outcomes, not tokens. Your north star is time-to-resolution at target quality, with bounded risk and cost.
Autonomy Levels (adapted from the SAE driving-automation levels)
- L0 — Advisor: Suggests actions; never executes.
- L1 — Assisted: Executes read-only tools; proposes write ops.
- L2 — Co-Pilot: Executes low-risk writes in sandbox; requires human review for medium/high.
- L3 — Bounded Autonomy: Executes within capsules (pre-approved scopes: systems, data, spend). Automatic rollback on failure.
- L4 — Domain Autonomy: Runs entire workflows end-to-end inside a domain (e.g., support triage→resolution) with exception escalation.
- L5 — Open-World: Cross-domain autonomy with dynamic capability discovery and governance (rare, aspirational in production).
Use gates to promote between levels based on metrics, tests, and incident history; a minimal promotion check is sketched below.
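A promotion gate can be as blunt as a handful of thresholds. In this sketch every threshold and metric name is an assumption to tune per domain:

```python
def may_promote(metrics: dict, current_level: int) -> bool:
    """Illustrative promotion gate between autonomy levels; thresholds are assumptions."""
    return (metrics["task_success_rate"] >= 0.95   # at target quality
            and metrics["rollback_rate"] <= 0.02   # low regret
            and metrics["open_incidents"] == 0     # clean incident history
            and metrics["days_at_level"] >= 14     # sustained, not a lucky week
            and current_level < 5)
```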
Core Design Principles
- Context diet → retrieve only what the plan requires; prefer facts over prose; prefer tables over blobs.
- Typed actions → tools are contracts with validation; never free-form APIs.
- Plan, then act → require plans to be simulated before execution.
- Progressive disclosure of power → capabilities unlock with proven reliability.
- Safety-first economics → budget and risk are first-class inputs, not afterthoughts.
- Explainability by construction → every decision has a traceable rationale and references.
- Data flywheel → good traces become skills; skills update the planner's policy.
Coordination Patterns
- Manager ↔ Workers: A manager agent decomposes tasks; workers own tools.
- Router: Lightweight gatekeeper routes intents to the best specialist.
- Debate / Critic: Two planners propose; a critic selects/edits the plan.
- Market of Agents: Specialists bid with expected utility; orchestrator picks.
- Memory-centric: One agent with strong memory/skills—not always multi-agent.
Pick the simplest pattern that meets the objective; multi-agent is a means, not a badge. A router sketch follows.
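For instance, the Router can be a few lines in front of your specialists. In this sketch, Specialist and the domain names are placeholders:

```python
class Specialist:
    """Placeholder agent; real specialists own their tools and skills."""
    def __init__(self, domain: str):
        self.domain = domain
    def handle(self, intent, ctx):
        print(f"[{self.domain}] handling: {intent}")

SPECIALISTS = {
    "billing": Specialist("billing"),
    "support": Specialist("support"),
}
FALLBACK = Specialist("generalist")

def route(intent_type: str) -> Specialist:
    """Lightweight gatekeeper: prefix-match the intent type, else fall back."""
    for domain, agent in SPECIALISTS.items():
        if intent_type.startswith(domain):
            return agent
    return FALLBACK
```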
Reference Architecture (Mermaid)
```mermaid
flowchart TD
    E[External Events / User] --> I[Intent Normalizer]
    I --> P[Planner]
    M[(Memory Layer)] --> P
    P -->|Plan| S[Simulator]
    S -->|OK| G{Policy+Budget Gate}
    G -- approved --> A[Actuator / Tool Runner]
    A --> R[Results]
    R --> C[Critic / Reflection]
    C --> M
    R --> O[Observer]
    G -- needs approval --> H[Human Gate]
    H --> A
```
Minimal Agent Loop (framework-agnostic pseudocode)
```python
while True:
    intent = receive_intent()
    ctx = retrieve_context(intent)          # context diet: only what the plan needs

    plan = planner.propose(intent, ctx)
    sim = simulate(plan, ctx)
    if not sim.ok:
        plan = planner.revise(plan, sim.feedback)   # one revision pass; loop if needed

    gate = policy.check(plan, budget=intent.budget, risk=intent.risk)
    if not gate.approved:
        report.emit(trace(intent, plan, denial=gate.reason))
        continue                            # never execute a rejected plan
    if gate.requires_human:
        plan = human.review(plan)           # human gate for higher-risk writes

    result = tools.execute(plan)
    critique = critic.review(intent, plan, result)
    learn.update_traces(intent, plan, result, critique)   # feed the data flywheel
    report.emit(trace(intent, plan, result, critique))
```
Drop-in frameworks (examples): LangGraph (graph-orchestrated agents), AutoGen (dialog-centric multi-agent), CrewAI (role-based teams), LlamaIndex Agents, Haystack Agents. Choose based on ergonomics, tracing, and tool typing support.
Governance, Risk & Controls (GRC)
Risk tiers: R0 read-only; R1 idempotent writes; R2 transactional writes with auto-rollback; R3 irreversible writes (require human gate).
Controls: allow-lists, capability tokens, environment sandboxes, PII redaction, egress filters, rate limits, kill-switch.
Audit: immutable action ledger with references to inputs, plan, tool calls, and outcomes.
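A policy gate that maps these tiers to decisions can start small. A sketch, reusing the RiskTier enum from the Intent sketch, with GateDecision as an illustrative record:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateDecision:
    approved: bool
    requires_human: bool
    reason: str = ""

def check(risk: RiskTier, budget_ok: bool, actor_allowed: bool) -> GateDecision:
    """R0/R1 run automatically, R2 runs with auto-rollback armed, R3 needs a human."""
    if not actor_allowed:
        return GateDecision(False, False, "actor not on allow-list")
    if not budget_ok:
        return GateDecision(False, False, "autonomy budget exhausted")
    if risk is RiskTier.R3:
        return GateDecision(True, True, "irreversible write: human gate required")
    return GateDecision(True, False)
```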
Metrics That Matter
- Task success rate (at target quality)
- Time-to-resolution (p50/p95)
- Regret/rollback rate ("we had to undo it")
- Autonomy ratio (# tasks fully automated / total)
- Memory hit rate and retrieval precision
- Tool success rate and latency
- Unit economics: $/successful outcome
Tie promotions between autonomy levels to these metrics.
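Most of these metrics reduce to ratios over the action ledger. A sketch, with the trace field names (outcome, rolled_back, human_touched, cost_usd) as assumptions:

```python
def scorecard(traces: list[dict]) -> dict:
    """Compute promotion-relevant metrics from a window of completed traces."""
    total = len(traces)
    wins = [t for t in traces if t["outcome"] == "success"]
    rollbacks = [t for t in traces if t.get("rolled_back")]
    automated = [t for t in traces if not t.get("human_touched")]
    spend = sum(t.get("cost_usd", 0.0) for t in traces)
    return {
        "task_success_rate": len(wins) / total if total else 0.0,
        "rollback_rate": len(rollbacks) / total if total else 0.0,
        "autonomy_ratio": len(automated) / total if total else 0.0,
        "cost_per_success_usd": spend / len(wins) if wins else float("inf"),
    }
```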
Implementation Checklist
Week 0–1: Shape the problem
- Define top 3 outcome metrics and guardrails.
- Inventory tools with typed contracts and risk tiers.
- Build a golden-path test set (10–20 real tasks with ground truth).
Week 2–3: Build the loop
- Implement intent normalization and retrieval pipelines (context diet).
- Ship planner→simulate→gate→act→reflect loop with tracing.
- Add human gate + rollback for R2+ actions.
Week 4–6: Earn autonomy
- Roll out L1→L2 on low-risk tasks; collect traces.
- Start skill extraction from repeated successes.
- Add budgets, alerts, and dashboards.
Week 7+: Scale & harden
- Promote to L3 in bounded capsules; run chaos and red-team scenarios.
- Introduce multi-agent only where single-agent saturates.
Failure Modes & How to Avoid Them
- Hallucination from overstuffed context → enforce the context diet + tool-first design.
- Action drift (doing more than asked) → strict schemas & policy gates.
- Silent failures → mandatory tracing and anomaly alerts on success-rate drops.
- Data exfiltration → egress filters, redaction, capability scoping.
- Tool flakiness → idempotency keys, retries with backoff, circuit breakers (sketched below).
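For the last item, a retry helper with exponential backoff, jitter, and an idempotency key might look like this. TransientToolError and the retry limits are assumptions:

```python
import random
import time

class TransientToolError(Exception):
    """Raised by tools for retryable failures (timeouts, 429s, etc.)."""

def call_with_backoff(tool_fn, args: dict, idempotency_key: str,
                      attempts: int = 4, base_delay_s: float = 0.5):
    """Retry a flaky tool with exponential backoff and jitter; the idempotency
    key lets the tool deduplicate repeated writes across retries."""
    for attempt in range(attempts):
        try:
            return tool_fn(idempotency_key=idempotency_key, **args)
        except TransientToolError:
            if attempt == attempts - 1:
                raise                     # let a circuit breaker upstream trip
            time.sleep(base_delay_s * (2 ** attempt) + random.uniform(0, 0.1))
```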
Where to Start (three pragmatic wedges)
- Close the loop on a single workflow (e.g., ticket triage→resolution or invoice validation→posting).
- Automate investigations first (diagnostics, not writes).
- Create a skills library from recurring successful traces.
One-Paragraph Summary
A self-driving entity is an agentic system that pairs a typed action surface with plan–simulate–gate–act–learn loops, governed by policy and budgets, and powered by memory. Start with bounded autonomy on a narrow workflow, earn trust with metrics and auditability, and promote autonomy level by level. Multi-agent patterns are tools, not goals. The result is software that not only acts—but improves with every run.