Building the Self-Driving Entity

Modern agent frameworks let us go beyond "assistants that answer" to entities that operate. A self-driving entity is an AI system that:

  1. Perceives its environment and context
  2. Plans across time and uncertainty
  3. Acts through tools and services with guardrails
  4. Learns from outcomes to improve future behavior
  5. Explains what it did and why

This post offers a system design you can ship, not just a metaphor.

The Agent Stack (from signals to outcomes)

Layer 0 — Interfaces

Events, webhooks, cron, chat, tickets, emails, API calls. Normalize all triggers into a common Intent object (who, what, when, constraints, priority, risk tier).
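
A minimal Python sketch of that Intent object (field names and defaults are illustrative, not a prescribed schema):

from dataclasses import dataclass, field
from enum import Enum

class RiskTier(Enum):
    R0 = 0   # read-only
    R1 = 1   # idempotent writes
    R2 = 2   # transactional writes with auto-rollback
    R3 = 3   # irreversible writes

@dataclass
class Intent:
    """Illustrative shape for a normalized trigger."""
    who: str                                  # principal we act on behalf of
    what: str                                 # goal in one sentence
    when: str                                 # deadline or schedule, ISO 8601
    constraints: list[str] = field(default_factory=list)
    priority: int = 3                         # 1 = urgent ... 5 = background
    risk: RiskTier = RiskTier.R0
    budget_usd: float = 0.0                   # autonomy budget for this task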

Layer 1 — Perception

Ingest the state needed to decide: recent events, relevant docs/records, feature flags, calendars, prices, ledgers, etc. Use retrieval pipelines to construct a minimal, verifiable context (the "context diet" principle).
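
A sketch of the context diet in code, assuming each store exposes a search(query, top_k) method (both that interface and the token heuristic are illustrative):

def estimate_tokens(fact: str) -> int:
    return max(1, len(fact) // 4)              # crude chars-per-token heuristic

def retrieve_context(intent, stores, max_tokens=2000):
    """Context diet: collect only verifiable facts, under a hard token cap."""
    facts, used = [], 0
    for store in stores:                       # events, records, calendars, ...
        for fact in store.search(intent.what, top_k=5):
            cost = estimate_tokens(fact)
            if used + cost > max_tokens:       # stop before the prompt bloats
                return facts
            facts.append(fact)
            used += cost
    return facts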

Layer 2 — Memory

  • Working memory: scratchpad for the current episode.
  • Episodic memory: append-only log of interactions and actions.
  • Semantic memory: durable knowledge (KB/graph + vector index).
  • Skills memory: reusable plans/tools (macros) discovered from past success.
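
In code, the four memories might sit behind one small object, along these lines (names and structures are illustrative; in practice semantic memory is a real KB/graph plus a vector index):

from dataclasses import dataclass, field

@dataclass
class MemoryLayer:
    """Illustrative container for the four memories above."""
    working: dict = field(default_factory=dict)    # scratchpad, cleared per episode
    episodic: list = field(default_factory=list)   # append-only interaction log
    semantic: dict = field(default_factory=dict)   # durable knowledge
    skills: dict = field(default_factory=dict)     # named, reusable plans

    def log_episode(self, event: dict) -> None:
        self.episodic.append(event)                # never mutate past entries

    def promote_skill(self, name: str, plan: list) -> None:
        self.skills[name] = plan                   # successful trace -> reusable macro

    def end_episode(self) -> None:
        self.working.clear()                       # working memory is ephemeral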

Layer 3 — Reasoning & Planning

Planner selects a strategy (single-shot, multi-step plan, debate, tool search). Supports time horizons (T+0 execution vs. T+N with deadline-aware scheduling).
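
Strategy dispatch can start as a simple heuristic; the thresholds and ordering below are illustrative placeholders, not recommendations:

def choose_strategy(risk_tier: int, deadline_hours: float) -> str:
    """Dispatch on intent shape (heuristics illustrative, not prescriptive)."""
    if deadline_hours > 24:
        return "scheduled"          # T+N: deadline-aware scheduling
    if risk_tier >= 2:
        return "multi_step_plan"    # high stakes: decompose and simulate each step
    return "single_shot"            # cheap, low-risk: execute at T+0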

Layer 4 — Action & Tooling

Capability registry with typed tool contracts. Every tool declares (sketched in code after this list):

  • Inputs/outputs and schemas
  • Side-effect class (read-only, idempotent, transactional)
  • Risk tier & required approvals
  • Cost profile and latency SLO
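
A minimal registry entry might look like this (a sketch; field names are illustrative, and the schemas would be real JSON Schemas in practice):

from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable

class SideEffect(Enum):
    READ_ONLY = "read_only"
    IDEMPOTENT = "idempotent"
    TRANSACTIONAL = "transactional"

@dataclass(frozen=True)
class ToolContract:
    """Illustrative capability-registry entry covering the fields above."""
    name: str
    input_schema: dict            # e.g., a JSON Schema for arguments
    output_schema: dict
    side_effect: SideEffect
    risk_tier: int                # R0..R3, matching the GRC section
    required_approvals: list[str]
    cost_per_call_usd: float
    latency_slo_ms: int
    fn: Callable[..., Any]        # the actual implementation

def register(registry: dict, contract: ToolContract) -> None:
    registry[contract.name] = contract   # lookup by name at plan time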

Layer 5 — Learning

Close the loop with reflection (self-critique), reward shaping (did the outcome match the goal?), dataset curation (promote good traces), and skill extraction (convert recurring successful traces into named skills).
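
Skill extraction can start as simple trace mining, along these lines (a sketch; the trace record shape and the threshold are assumptions):

from collections import Counter

def extract_skills(traces: list[dict], min_successes: int = 3) -> dict[str, list]:
    """Promote action sequences that repeatedly succeeded into named skills."""
    counts, plans = Counter(), {}
    for t in traces:                                      # assumed trace shape
        if not t["success"]:
            continue
        key = tuple(step["tool"] for step in t["plan"])   # plan signature
        counts[key] += 1
        plans[key] = t["plan"]
    # Only recurring winners become skills; one-off luck does not.
    return {
        "skill_" + "_".join(key): plans[key]
        for key, n in counts.items()
        if n >= min_successes
    }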

Layer 6 — Governance & Safety

Policies, permissions, audit trails, human-in-the-loop gates, sandboxes, and rollback. Treat every action as a signed, explainable event.

Layer 7 — Observability & Economics

Traces, metrics, cost per outcome, regret/rollback rate, autonomy ratio, and an autonomy budget that limits spend per task or per day.
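
One way to enforce an autonomy budget, as a sketch (the per-task and per-day caps are illustrative policy choices):

import datetime as dt

class AutonomyBudget:
    """Caps spend per task and per day; the gate consults this before acting."""
    def __init__(self, per_task_usd: float, per_day_usd: float):
        self.per_task_usd = per_task_usd
        self.per_day_usd = per_day_usd
        self._day = dt.date.today()
        self._spent_today = 0.0

    def allow(self, estimated_cost_usd: float) -> bool:
        today = dt.date.today()
        if today != self._day:                      # reset the daily cap at midnight
            self._day, self._spent_today = today, 0.0
        return (estimated_cost_usd <= self.per_task_usd
                and self._spent_today + estimated_cost_usd <= self.per_day_usd)

    def record(self, actual_cost_usd: float) -> None:
        self._spent_today += actual_cost_usd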

Key idea: Measure outcomes, not tokens. Your north star is time-to-resolution at target quality, with bounded risk and cost.

Autonomy Levels (adapted from the SAE levels for driving automation)

  • L0 — Advisor: Suggests actions; never executes.
  • L1 — Assisted: Executes read-only tools; proposes write ops.
  • L2 — Co-Pilot: Executes low-risk writes in sandbox; requires human review for medium/high.
  • L3 — Bounded Autonomy: Executes within capsules (pre-approved scopes: systems, data, spend). Automatic rollback on failure.
  • L4 — Domain Autonomy: Runs entire workflows end-to-end inside a domain (e.g., support triage→resolution) with exception escalation.
  • L5 — Open-World: Cross-domain autonomy with dynamic capability discovery and governance (rare, aspirational in production).

Use gates to promote between levels based on metrics, tests, and incident history.
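
A promotion gate can read straight off the "Metrics That Matter" section below; the thresholds here are illustrative, not recommendations:

def may_promote(level: int, stats: dict) -> bool:
    """Gate for moving an agent from `level` to `level + 1` (thresholds illustrative)."""
    return (
        stats["task_success_rate"] >= 0.95         # at target quality
        and stats["rollback_rate"] <= 0.01         # regret stays rare
        and stats["incidents_30d"] == 0            # clean recent history
        and stats["golden_path_pass_rate"] == 1.0  # test set still green
    )

# Example: an L2 co-pilot earning bounded autonomy (L3)
stats = {"task_success_rate": 0.97, "rollback_rate": 0.004,
         "incidents_30d": 0, "golden_path_pass_rate": 1.0}
next_level = 3 if may_promote(2, stats) else 2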

Core Design Principles

  1. Context diet → retrieve only what the plan requires; prefer facts over prose; prefer tables over blobs.
  2. Typed actions → tools are contracts with validation; never free-form APIs.
  3. Plan, then act → require plans to be simulated before execution.
  4. Progressive disclosure of power → capabilities unlock with proven reliability.
  5. Safety-first economics → budget and risk are first-class inputs, not afterthoughts.
  6. Explainability by construction → every decision has a traceable rationale and references.
  7. Data flywheel → good traces become skills; skills update the planner's policy.

Coordination Patterns

  • Manager ↔ Workers: A manager agent decomposes tasks; workers own tools.
  • Router: Lightweight gatekeeper routes intents to the best specialist.
  • Debate / Critic: Two planners propose; a critic selects/edits the plan.
  • Market of Agents: Specialists bid with expected utility; orchestrator picks.
  • Memory-centric: One agent with strong memory/skills—not always multi-agent.

Pick the simplest pattern that meets the objective; multi-agent is a means, not a badge.
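
The Router, for instance, can start embarrassingly simple, as in this sketch (keyword matching stands in for the small classifier you would use in practice; all names are hypothetical):

from types import SimpleNamespace

def route(intent, specialists: dict) -> str:
    """Router pattern: cheap gatekeeper first, strongest generalist as fallback."""
    for keyword, agent in specialists.items():
        if keyword != "generalist" and keyword in intent.what.lower():
            return agent
    return specialists["generalist"]

specialists = {"invoice": "billing_agent", "ticket": "support_agent",
               "generalist": "fallback_agent"}
intent = SimpleNamespace(what="Validate invoice #1042 against the PO")
assert route(intent, specialists) == "billing_agent"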

Reference Architecture (Mermaid)

flowchart TD
  E[External Events / User] --> I[Intent Normalizer]
  I --> P[Planner]
  M[(Memory Layer)] --> P
  P -->|Plan| S[Simulator]
  S -->|revise| P
  S -->|OK| G{Policy+Budget Gate}
  G -- approved --> A[Actuator / Tool Runner]
  A --> R[Results]
  R --> C[Critic / Reflection]
  C --> M
  R --> O[Observer]
  G -- needs approval --> H[Human Gate]
  H --> A

Minimal Agent Loop (framework-agnostic pseudocode)

while True:
    intent = receive_intent()
    ctx = retrieve_context(intent)      # context diet: only what the plan needs
    plan = planner.propose(intent, ctx)
    # Plan, then act: revise until the simulator accepts (bounded retries)
    for _ in range(MAX_REVISIONS):
        sim = simulate(plan, ctx)
        if sim.ok:
            break
        plan = planner.revise(plan, sim.feedback)
    else:
        escalate(intent, reason="plan failed simulation")
        continue
    gate = policy.check(plan, budget=intent.budget, risk=intent.risk)
    if not gate.approved:
        escalate(intent, reason=gate.reason)       # denied outright
        continue
    if gate.requires_human:
        plan = human.review(plan)                  # human gate for risky writes
    result = tools.execute(plan)
    critique = critic.review(intent, plan, result)        # reflection
    learn.update_traces(intent, plan, result, critique)   # data flywheel
    report.emit(trace(intent, plan, result, critique))    # audit trail

Drop-in frameworks (examples): LangGraph (graph-orchestrated agents), AutoGen (dialog-centric multi-agent), CrewAI (role-based teams), LlamaIndex Agents, Haystack Agents. Choose based on ergonomics, tracing, and tool typing support.

Governance, Risk & Controls (GRC)

Risk tiers: R0 read-only; R1 idempotent writes; R2 transactional writes with auto-rollback; R3 irreversible writes (require human gate).
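
Mapping those tiers to execution modes can be a one-liner, sketched here (mode names are illustrative):

def gate_action(risk_tier: str) -> str:
    """Map the risk tiers defined above to execution modes (illustrative)."""
    return {
        "R0": "execute",                 # read-only: always safe
        "R1": "execute",                 # idempotent: safe to retry
        "R2": "execute_with_rollback",   # transactional: wrap + auto-undo on failure
        "R3": "human_gate",              # irreversible: require approval
    }[risk_tier]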

Controls: allow-lists, capability tokens, environment sandboxes, PII redaction, egress filters, rate limits, kill-switch.

Audit: immutable action ledger with references to inputs, plan, tool calls, and outcomes.

Metrics That Matter

  • Task success rate (at target quality)
  • Time-to-resolution (p50/p95)
  • Regret/rollback rate ("we had to undo it")
  • Autonomy ratio (# tasks fully automated / total)
  • Memory hit rate and retrieval precision
  • Tool success rate and latency
  • Unit economics: $/successful outcome

Tie promotions between autonomy levels to these metrics.
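
A roll-up of those metrics from emitted traces might look like this (the trace field names are assumptions):

def metrics_snapshot(traces: list[dict]) -> dict:
    """Aggregate trace records into the promotion metrics (fields illustrative)."""
    done = [t for t in traces if t["finished"]]
    ok = [t for t in done if t["success"]]
    return {
        "task_success_rate": len(ok) / max(1, len(done)),
        "rollback_rate": sum(t["rolled_back"] for t in done) / max(1, len(done)),
        "autonomy_ratio": sum(t["fully_automated"] for t in done) / max(1, len(done)),
        "usd_per_outcome": sum(t["cost_usd"] for t in done) / max(1, len(ok)),
    }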

Implementation Checklist

Week 0–1: Shape the problem

  • Define top 3 outcome metrics and guardrails.
  • Inventory tools with typed contracts and risk tiers.
  • Build a golden-path test set (10–20 real tasks with ground truth).

Week 2–3: Build the loop

  • Implement intent normalization and retrieval pipelines (context diet).
  • Ship planner→simulate→gate→act→reflect loop with tracing.
  • Add human gate + rollback for R2+ actions.

Week 4–6: Earn autonomy

  • Roll out L1→L2 on low-risk tasks; collect traces.
  • Start skill extraction from repeated successes.
  • Add budgets, alerts, and dashboards.

Week 7+: Scale & harden

  • Promote to L3 in bounded capsules; run chaos and red-team scenarios.
  • Introduce multi-agent only where single-agent saturates.

Failure Modes & How to Avoid Them

  • Hallucination from overstuffed context → enforce the context diet + tool-first design.
  • Action drift (doing more than asked) → strict schemas & policy gates.
  • Silent failures → required tracing and anomaly alerts on drop in success rate.
  • Data exfiltration → egress filters, redaction, capability scoping.
  • Tool flakiness → idempotency keys, retries with backoff, circuit breakers (see the sketch below).
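
The last bullet's defenses fit in a few lines; a sketch, where TransientToolError and the backoff schedule are illustrative:

import random
import time
import uuid

class TransientToolError(Exception):
    """Retryable tool failure (timeouts, rate limits, flaky upstreams)."""

def call_with_retries(tool, payload: dict, max_attempts: int = 4):
    """Idempotency key is fixed across attempts, so retries never double-apply."""
    payload = {**payload, "idempotency_key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        try:
            return tool(payload)
        except TransientToolError:
            if attempt == max_attempts - 1:
                raise                                   # let a circuit breaker trip
            time.sleep(2 ** attempt + random.random())  # backoff with jitter: ~1s, 2s, 4s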

Where to Start (three pragmatic wedges)

  1. Close the loop on a single workflow (e.g., ticket triage→resolution or invoice validation→posting).
  2. Automate investigations first (diagnostics, not writes).
  3. Create a skills library from recurring successful traces.

One-Paragraph Summary

A self-driving entity is an agentic system that pairs a typed action surface with plan–simulate–gate–act–learn loops, governed by policy and budgets, and powered by memory. Start with bounded autonomy on a narrow workflow, earn trust with metrics and auditability, and promote autonomy level by level. Multi-agent patterns are tools, not goals. The result is software that not only acts but also improves with every run.