Propose-then-act: the architecture of an AI agent for production ops
Why an AI agent that acts without asking is unacceptable in production, and how propose-then-act, combined with cost-aware routing, cut our inference costs by roughly 70% while preserving the audit trail.
Propose-then-act architecture: why an AI agent must ask permission before it acts
Many AI-agent demos are captivating: the user asks for something, the agent plans, the agent executes, the result appears. In production, this flow is unacceptable. When the agent operates on real infrastructure (running deploys, modifying DNS, changing firewall configurations, initiating financial transactions), execution without confirmation becomes, over enough runs, a predictable source of incidents.
IRIS, the orchestrator agent CAI Technology runs internally for data-centre and operations work, is built on the propose-then-act architecture. In this article we explain why we chose this pattern, what it took to implement it correctly, and the economic surprise that emerged: the propose-then-act structure, combined with cost-aware routing across models, reduced our inference cost by roughly 70% compared with a naive “one big model does everything” baseline.
TL;DR
- An AI agent that acts in production without confirmation is a legal and operational liability, however good the model.
- The propose-then-act pattern separates two activities: the agent proposes a textual plan, the human approves, and only then does the concrete action run.
- The approval surface is where value emerges: the human sees what the agent would do before the action happens, not after.
- Cost-aware routing: the expensive model is used only for the plan-design phase; the cheap model handles repetitive execution; a very small local model handles state polling.
- In our internal production use, this structure cut inference cost by roughly 70% compared with a single-model architecture.
Why execution without confirmation is wrong
In a demo, an agent that “repairs an alert” in 15 seconds without human intervention looks miraculous. On real infrastructure, the same scenario raises three problems:
Operational risk. A language model can misinterpret the request, hallucinate context, or merge two distinct operations into one. In a demo, an error costs 15 seconds of re-running. In production, an error can mean an hour of downtime, a firewall that blocks legitimate traffic, or a duplicated transaction.
Auditability. A confirmation-less process leaves an open question after any incident: “who decided this action?” The answer “the agent” does not satisfy an audit board, a client bound by SOC 2 or ISO 27001 requirements, or a manager accountable for operations. Human approval at decision time leaves a trace with a name, a date, and the reasoning.
Organisational trust. An AI agent that “acts on its own” quickly loses the ops team’s acceptance. The first time the agent is wrong, trust breaks, and the project’s long-term viability depends on that trust. An agent that proposes and waits preserves the team’s autonomy and amplifies it.
What propose-then-act looks like in practice
The pattern has three rigidly separated phases.
Phase 1: Propose. The agent receives the user’s request, analyses context (reading from databases, inventory sources, logs), and produces a textual plan. The plan describes what will be done, in what order, with what preconditions, and the risk of each step. The plan does not execute anything.
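Below is a minimal Python sketch of what a proposal might look like as a data structure. The names `Plan`, `PlanStep` and `Risk` are hypothetical and do not come from IRIS. The plan is pure data: ordered steps, each with preconditions and a risk level. It has deliberately no method that executes anything.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Risk(IntEnum):
    # IntEnum so that risk levels compare naturally: LOW < MEDIUM < HIGH.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class PlanStep:
    description: str                                        # what will be done, in plain language
    preconditions: list[str] = field(default_factory=list)  # must hold before the step runs
    risk: Risk = Risk.LOW                                   # assessed risk of this step


@dataclass(frozen=True)
class Plan:
    request: str                                          # the user's original request
    steps: list[PlanStep] = field(default_factory=list)   # ordered steps

    def overall_risk(self) -> Risk:
        # A plan is as risky as its riskiest step; an empty plan counts as low risk.
        return max((s.risk for s in self.steps), default=Risk.LOW)

    # Deliberately no execute() method: a proposal is data, not an action.
```

A plan such as `Plan("Add DNS record", [PlanStep("Add A record for X with IP Y, TTL 300", ["zone X exists"], Risk.HIGH)])` is what gets shown to the approver. Nothing in the object can touch infrastructure.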
Phase 2: Approve. The plan is presented to the user via an asynchronous channel (Telegram, web UI, email). The user replies with approval, modification, or rejection. The approval is recorded in an audit database with timestamp, user ID, plan ID. If the user requests modifications, the agent returns to Phase 1 with the updated context.
Phase 3: Act. Only after explicit approval does the agent execute the plan. Execution proceeds step by step, with verification after each step. If a step fails unexpectedly, execution stops and the flow returns to Phase 2 with an error report.
This separation has an important consequence: the user does not have to be a technical expert to approve or reject. The plan is described in natural language, with technical detail available on request. An operations manager who does not read bash commands can understand “I will add a new DNS record for subdomain X with IP Y and TTL 300” and answer yes or no knowing exactly what is being approved.
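The step-by-step rule can be sketched as a loop. It runs each step, verifies the output, and stops at the first failure. The report it returns goes back to the approval phase and is not retried automatically. `Step`, `ExecutionReport` and `execute_approved_plan` are hypothetical names. In a real system, `run` would wrap a controlled subprocess call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], str]         # performs the action and returns its output
    verify: Callable[[str], bool]  # checks that output before the next step starts


@dataclass
class ExecutionReport:
    completed: list[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    error: Optional[str] = None

    @property
    def succeeded(self) -> bool:
        return self.failed_step is None


def execute_approved_plan(steps: list[Step]) -> ExecutionReport:
    """Run approved steps in order and stop at the first failure.

    A failed report is meant to go back to the approval phase, not to be retried here.
    """
    report = ExecutionReport()
    for step in steps:
        try:
            output = step.run()
        except Exception as exc:  # unexpected failure: stop immediately
            report.failed_step, report.error = step.name, f"raised {type(exc).__name__}: {exc}"
            return report
        if not step.verify(output):
            # Output failed validation, so the remaining steps are not attempted.
            report.failed_step, report.error = step.name, f"verification failed on output {output!r}"
            return report
        report.completed.append(step.name)
    return report
```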
IRIS flow diagram
User writes on Telegram →
IRIS (orchestrator agent on a premium model) receives the message →
Recognises intent and classifies complexity →
Pulls context from the database (inventory, existing configurations) →
Produces a textual plan with steps, preconditions, risk →
Sends the plan to Telegram for inline approval →
Waits for the user’s response →
If APPROVED:
  IRIS (now on a cheaper execution model) runs the plan step by step →
  Each step: controlled subprocess command, output validation →
  On error: stops, reports to the user, asks how to proceed →
  On full success: confirms in Telegram, writes to the audit DB
If REJECTED:
  IRIS asks the user for additional context →
  Reformulates the plan and returns to the approval step
State polling in the background:
  A very small local model reads server status and alerts the user
  only on anomalies; it generates no plans
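The flow can also be read as a small state machine. In this sketch the state and event names are hypothetical. The property worth checking is that the only way into execution is an explicit approval. A failed step also routes back to the human rather than into an automatic retry.

```python
from enum import Enum, auto


class State(Enum):
    PROPOSING = auto()          # premium model builds the plan
    AWAITING_APPROVAL = auto()  # plan sent to the user; nothing runs
    EXECUTING = auto()          # cheaper model runs the approved steps
    DONE = auto()               # success confirmed and written to the audit DB


# (current state, event) -> next state. Any pair not listed is rejected.
TRANSITIONS = {
    (State.PROPOSING, "plan_ready"): State.AWAITING_APPROVAL,
    (State.AWAITING_APPROVAL, "approved"): State.EXECUTING,
    (State.AWAITING_APPROVAL, "rejected"): State.PROPOSING,     # ask for context, re-plan
    (State.EXECUTING, "step_failed"): State.AWAITING_APPROVAL,  # report, ask how to proceed
    (State.EXECUTING, "all_steps_ok"): State.DONE,
}


def next_state(state: State, event: str) -> State:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state.name}") from None
```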
The important detail: a single agent does not run on a single model. Different phases use different models, chosen for the cost/quality ratio of the task at hand.
Cost-aware routing: why it cut the bill
Naively, we could run everything on a powerful premium model. The cost would scale linearly with daily invocations. For an ops agent serving dozens of requests per day plus continuous polling, the bill rises fast.
Cost-aware routing exploits the observation that different workflow phases have different cognitive requirements:
Design phase (propose). Requires real reasoning: understanding an ambiguous request, navigating documentation, inferring system state, generating a structured plan. Here a powerful premium model is worth the cost. Called rarely (dozens of invocations per day).
Execution phase. Once the plan is approved, execution is a sequence of structured commands with validation at every step. The logic is largely deterministic; the language model is used only for parsing outputs, recognising error conditions, and formatting reports. Here a mid-tier model is perfectly adequate. Called frequently (hundreds of invocations per execution).
State polling. Runs in the background, reads status, and decides whether the user should be alerted. Low cognitive demand. Here a very small local model running on our own infrastructure, with no per-call cost, is enough. Called constantly (thousands of invocations per hour).
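Below is a minimal sketch of phase-based routing. `Phase`, `Tier`, `ROUTES` and `volume_shares` are hypothetical names, and the tiers stand in for whichever concrete models a deployment uses. The design choice it illustrates is that the workflow phase selects the tier, never the wording of the request. Plan design therefore always lands on the premium tier.

```python
from collections import Counter
from enum import Enum


class Phase(Enum):
    PROPOSE = "propose"
    EXECUTE = "execute"
    POLL = "poll"


class Tier(Enum):
    PREMIUM = "premium"  # hosted, most capable, highest price per call
    MID = "mid"          # hosted, cheaper, adequate for structured execution
    LOCAL = "local"      # small self-hosted model, no per-call fee


# The tier depends only on the workflow phase, so plans are always designed on PREMIUM.
ROUTES = {
    Phase.PROPOSE: Tier.PREMIUM,
    Phase.EXECUTE: Tier.MID,
    Phase.POLL: Tier.LOCAL,
}


def route(phase: Phase) -> Tier:
    return ROUTES[phase]


def volume_shares(calls: list[Phase]) -> dict[Tier, float]:
    """Fraction of invocations that landed on each tier (all 0.0 if there were none)."""
    counts = Counter(route(p) for p in calls)
    total = sum(counts.values())
    return {tier: (counts[tier] / total if total else 0.0) for tier in Tier}
```

For example, a mix of 1 propose call, 5 execute calls and 14 polls yields volume shares of 0.05, 0.25 and 0.70.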
In our internal configuration, cost distribution shifts as follows:
- Single premium model: 100% of volume on premium; cost 100% (baseline)
- Cost-aware routing: 5% of volume on premium, 25% on mid-tier, 70% on local; aggregated cost ~30% of baseline
The roughly 70% reduction is not a projected estimate; it is measured over three months of internal operation. Aggregated quality (approved plan success rate, execution error rate, qualitative user satisfaction) is indistinguishable from the mono-model variant.
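One way to read these figures uses our own simplified notation, not the article’s measurement method. Let phase i ∈ {d, e, s} (design, execution, state polling) have n_i invocations averaging t_i tokens each. Let p_P and p_M be the per-token prices of the premium and mid-tier models; the local model has no per-call fee. The cost ratio against the single-model baseline is then:

```latex
\frac{C_{\text{routed}}}{C_{\text{mono}}}
  = \frac{p_P\, n_d t_d + p_M\, n_e t_e + 0 \cdot n_s t_s}
         {p_P\,(n_d t_d + n_e t_e + n_s t_s)}
```

Tokens per call likely differ sharply between phases: a design call carries far more context than a status poll. For that reason, a tier’s share of invocations is not its share of the bill. The ~30% aggregate is therefore a measured ratio, and the volume split alone does not determine it.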
Three implementation traps
Trap 1: automated confirmation. After a few weeks, users start approving everything the agent sends without reading the plan, and approval becomes an empty step. Our solution: for high-impact actions (DNS, firewall, deploy), the agent refuses a plain “ok” and requires a specific phrase confirming the action. For low-impact actions, “ok” is enough. Categorisation is per action, not per agent.
Trap 2: over-detailed plans. Early iterations produced 40-line plans listing every bash command, and users did not read them. We cut plans to 5–10 lines at the level of user actions (“I will add a DNS record”), with technical detail available on demand. Approval is given on understanding the action, not on command syntax.
Trap 3: letting the cheap model propose. An obvious temptation: if the cheap model can execute, can it also plan? Answer: no. Plan quality determines execution quality. A vague or wrong plan wastes human time at approval or rejection and introduces operational risk. Paying for a premium model at propose time pays for itself immediately.
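Below is a minimal sketch of a per-action approval gate. The article does not specify the confirmation phrase. The `confirm <action> <plan id>` format, the set of accepted short replies, and all function names here are hypothetical. The phrase names both the action and the plan, so a reflexive “ok” cannot satisfy a high-impact request.

```python
from enum import Enum


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical categorisation, decided per action type rather than per agent.
HIGH_IMPACT_ACTIONS = {"dns", "firewall", "deploy"}
SIMPLE_APPROVALS = {"ok", "yes", "approve"}


def impact_of(action_type: str) -> Impact:
    return Impact.HIGH if action_type.lower() in HIGH_IMPACT_ACTIONS else Impact.LOW


def required_phrase(action_type: str, plan_id: str) -> str:
    # Naming both the action and the plan means a reflexive "ok" cannot satisfy it.
    return f"confirm {action_type} {plan_id}".lower()


def _normalise(reply: str) -> str:
    # Case-insensitive; runs of whitespace collapse to single spaces.
    return " ".join(reply.lower().split())


def accepts(action_type: str, plan_id: str, reply: str) -> bool:
    reply = _normalise(reply)
    if impact_of(action_type) is Impact.LOW:
        return reply in SIMPLE_APPROVALS
    return reply == required_phrase(action_type, plan_id)
```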
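Below is a minimal sketch of “summary by default, detail on demand”. The names and the hard limit of 10 lines are hypothetical; the limit is taken from the upper end of the 5–10 line target. If a plan exceeds the limit, the sketch refuses to render it, on the assumption that the planner should then summarise at a higher level.

```python
from dataclasses import dataclass, field

MAX_PLAN_LINES = 10  # upper end of the 5-10 line target


@dataclass
class Step:
    summary: str                                       # one line at user-action level
    commands: list[str] = field(default_factory=list)  # technical detail, shown on demand


def render_summary(steps: list[Step]) -> str:
    """What the approver sees by default: one numbered line per action."""
    if len(steps) > MAX_PLAN_LINES:
        # Too granular to approve meaningfully; the planner should summarise at a higher level.
        raise ValueError(f"plan has {len(steps)} lines; limit is {MAX_PLAN_LINES}")
    return "\n".join(f"{i}. {s.summary}" for i, s in enumerate(steps, start=1))


def render_detail(steps: list[Step], number: int) -> str:
    """Technical detail for one step, shown only when the user asks (1-based number)."""
    if not 1 <= number <= len(steps):
        raise IndexError(f"no step {number}")
    step = steps[number - 1]
    return "\n".join([step.summary, *(f"  $ {cmd}" for cmd in step.commands)])
```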
Questions for a CTO evaluating an AI agent
Three checks before accepting an AI agent in production:
- “Does it ask for permission before executing high-impact actions?” If the answer involves “trust the model”, reconsider. Audit demands a human signature, not a model probability.
- “Do you have a per-action audit log?” Who approved, when, on what data. Operational logs alone are insufficient; every action must be linked to its human approver.
- “How do you manage cost at scale?” An agent that handles a few requests a day on a premium model works in a pilot. At 10x scale, the bill becomes visible. Cost-aware routing is a sign of operational maturity.
Operational conclusion
The propose-then-act pattern is not a restriction on an AI agent’s capability. It is the correct shape of an AI agent participating in real operations. The team stays in control, the agent accelerates, the audit is intact. The economic advantage — reduced cost through cost-aware routing — is a pleasant side effect, not the primary motive.
IRIS has been running internally for months. Our team prefers working with it to working without it; our internal audit trail is more detailed than before; queries with no impact (status, info, lookup) are served instantly without consuming the orchestrator’s time. The decision to keep a human in the loop did not slow operations; it consolidated them.
Next step
If your team is evaluating how to introduce an AI agent into operations without losing control, we offer a 30-minute technical consultation at no cost.