158 AI agents on a single SEAP bid: anatomy of a procurement pipeline
A SEAP bid is not solved with one big GPT prompt. Bid365 runs 158+ specialised agents across 6 systems with dual-pass QA and 11 HITL gates. Here is the architecture and why it matters.
The question we receive most often from IT directors at large firms: “Why do you need 158 AI agents? Can’t GPT-5 read the documentation and write the bid?”
The honest technical answer is: GPT-5 or Claude Opus 4.7 can write something. But that “something” will not pass the legal evaluator at a Romanian contracting authority on SEAP. It will not respect art. 210 of Law 98/2016. It will not tie quantities to the team and to the execution Gantt chart. And when the mathematical scoring is contested on technical grounds, the big prompt collapses, because it lacks requirement-to-response traceability.
Bid365 is our automation platform for Romanian public procurement. This article explains why our architecture has 158+ specialised agents across 6 systems, how they coordinate, and which trade-offs we have consciously accepted.
TL;DR
- A complete SEAP bid has three envelopes (administrative, technical, financial) with dozens of interdependent requirements. A single LLM cannot maintain cross-document consistency.
- 158+ specialised agents in Bid365, grouped into 6 systems: OF2 (bids), CS (specifications), SF (feasibility studies), PT (technical projects), CK (integrity checker), HG907 cost engine.
- Dual-pass QA: deterministic plus semantic. 11+ mandatory HITL gates (a human approves every irreversible decision).
- Accepted trade-off: longer latency (10-30 minutes per complete bid) in exchange for verifiable quality and complete auditability.
Why one LLM is not enough
A SEAP bid for a medium-complexity construction work (300,000 EUR) typically contains:
- 60-90 requirements in the Specifications, each with its own admissibility condition.
- 12-25 legal requirements (art. 164 Law 98/2016, DUAE declarations, ANAF certificates, etc.).
- Financial proposal with HG 907/2016 + Annex 8 cost breakdown (43 chapters, ~150 cost articles).
- Team with 8-15 experts, each with verified CVs.
- Gantt execution chart over 6-18 months.
- Approximately 100-200 pages of documents, plus annexes.
If you write all of that into a single prompt and ask an LLM to solve it, you get what is called plausible output: a text that looks correct but breaks under spot-checking. For example, the total quantity in the F3 cost form does not match the F6 physical-value chart, or the proposed experts do not cover all key personnel requirements.
The problem is not the LLM’s capacity, but the fact that any serious bid contains interdependent constraints that cannot be solved in one pass. Change one quantity and you must update the recap, the cash flow, the physical-value chart, and sometimes the team. A multi-agent orchestrator handles those interdependencies explicitly through dependency graphs; a monolithic prompt ignores them.
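To make the dependency-graph idea concrete, here is a minimal sketch, not Bid365's actual orchestrator. The artefact names and the edges between them are assumptions chosen to mirror the example in the text (quantities, recap, cash flow, physical-value chart, team). It uses the standard-library `graphlib` module to work out what has to be regenerated, and in which order, after one input changes.

```python
from graphlib import TopologicalSorter

# Hypothetical artefacts: each key depends on the artefacts in its set.
DEPENDS_ON = {
    "cost_recap": {"quantities"},
    "team": {"quantities"},
    "schedule": {"team"},
    "physical_value_chart": {"quantities", "schedule"},
    "cash_flow": {"cost_recap", "schedule"},
}


def affected_by(changed: str) -> list[str]:
    """Return every artefact to regenerate after `changed` is edited,
    ordered so that each artefact comes after the ones it depends on."""
    # Collect the transitive dependents of the changed artefact.
    dirty = {changed}
    grew = True
    while grew:
        grew = False
        for node, deps in DEPENDS_ON.items():
            if node not in dirty and deps & dirty:
                dirty.add(node)
                grew = True
    # Sort the whole graph topologically, then keep only the dirty nodes.
    order = TopologicalSorter(DEPENDS_ON).static_order()
    return [node for node in order if node in dirty and node != changed]


print(affected_by("quantities"))
```

With these assumed edges, editing `quantities` marks all five downstream artefacts for regeneration, while editing only `schedule` touches just the physical-value chart and the cash flow.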
The six systems
Bid365 operates six agent systems, each independent but orchestrated from a central coordination layer.
OF2 — Bid generator (31 agents, 9 layers)
The flagship system. Automatically generates complete bids in the SEAP three-envelope structure.
L0 — Master orchestrator
L1 — Input and analysis
OF2-02 Advanced intake (CS requirement extraction)
OF2-03 Agentic RAG (consults 9 vector collections)
OF2-04 Project reconstructor
L2 — Evaluation and intelligence
OF2-05 Gap analysis ────► HITL gate (GO / NO-GO)
OF2-06 Scoring reverse engine
OF2-07 Competition simulator (3 bidder profiles)
OF2-30 Fiscal-financial law expert
OF2-31 Public procurement law expert
L3 — Strategy
OF2-08 Strategy & win maximization ────► HITL gate
OF2-09 Compliance pre-check (ANAF)
L4 — Costs
OF2-10 Real cost engine
OF2-11 Market price validator
OF2-12 Margin strategy optimizer
L5 — Content generation (parallel)
OF2-13 Technical proposal
OF2-14 Financial proposal
OF2-15 Team and expert profiles
OF2-16 Schedule and methodology
OF2-17 Documents and declarations
L6 — Scoring
OF2-18 Mathematical scoring engine
OF2-19 Scoring optimizer (iterative)
L7 — Simulated evaluation (parallel)
OF2-20 Eval committee orchestrator
OF2-21 Strict technical evaluator
OF2-22 Financial evaluator
OF2-23 Legal evaluator
OF2-24 Sceptical evaluator (adversarial)
L8 — QA and assembly
OF2-25 Meta supervisor (cross-agent consistency)
OF2-26 Global QA enhanced (10 dimensions)
OF2-27 Stress test & Monte Carlo
OF2-28 Bid compilation (3 SEAP envelopes)
OF2-29 Critic & self-optimization ────► HITL gate
└─ loop max 3 iterations if score < 85%
Notice the design: layers L1-L4 do analysis, strategy and costing, L5 generates in parallel along five distinct dimensions, L6-L7 do scoring and adversarial self-evaluation, and L8 does QA and final assembly. At three points (gap analysis, strategy, final critic), a human validates and authorises continuation.
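The critic loop at the end of the tree (at most 3 iterations while the score is below 85%) can be sketched as follows. This is an illustration under assumptions: the function names and the callables `score`, `revise` and `human_approves` are hypothetical stand-ins for the scoring agents, the revision step and the HITL gate. We read "max 3 iterations" as at most three revision passes.

```python
from typing import Callable

MAX_ITERATIONS = 3    # from the diagram: "loop max 3 iterations"
TARGET_SCORE = 85.0   # from the diagram: "if score < 85%"


def critic_loop(
    draft: str,
    score: Callable[[str], float],
    revise: Callable[[str], str],
    human_approves: Callable[[str, float], bool],
) -> tuple[str, float, bool]:
    """Revise the draft until it reaches the target score or the iteration
    budget is spent, then hand the result to a human gate."""
    current = score(draft)
    iterations = 0
    while current < TARGET_SCORE and iterations < MAX_ITERATIONS:
        draft = revise(draft)
        current = score(draft)
        iterations += 1
    # The loop never approves anything itself; a person makes the final call.
    approved = human_approves(draft, current)
    return draft, current, approved
```

The point of the bounded loop is that a draft which cannot reach the target still terminates and still lands in front of an operator, together with its final score.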
CK — Integrity checker (20 agents, 9 layers)
Sits above all other systems. Operates in three modes: contestation (when a bid loses and the client wants to challenge), defence (when the client is contested), full audit (procedure verification at the contracting authority’s request).
CK-04 Document Normalizer builds a Unified Document Model. CK-05 Requirement Graph Builder turns the CS into a directed requirement graph. CK-06 Traceability checks, through semantic similarity, that every CS requirement has an explicit answer in the bid. CK-10 Collusion Detector looks for ten collusion indicators (bids with identical patterns, anomalous pricing, correlated beneficial owners). CK-18 CNSC/Court Outcome Simulator runs 500 Monte Carlo scenarios to estimate the probability that a contestation is admitted.
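The traceability check can be illustrated with a small sketch. It is not the CK-06 implementation: a real system would compare dense embeddings, whereas this example uses plain word counts as a toy stand-in so that it runs with the standard library only. The function names and the 0.5 threshold are assumptions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def untraced(
    requirements: dict[str, str],
    bid_sections: list[str],
    threshold: float = 0.5,  # assumed cut-off, to be tuned on real data
) -> list[str]:
    """Return the IDs of requirements whose best-matching bid section
    falls below the similarity threshold."""
    section_vecs = [embed(s) for s in bid_sections]
    missing = []
    for req_id, text in requirements.items():
        req_vec = embed(text)
        best = max((cosine(req_vec, s) for s in section_vecs), default=0.0)
        if best < threshold:
            missing.append(req_id)
    return missing
```

Every ID returned by `untraced` is a requirement with no sufficiently close answer in the bid, which is exactly the kind of gap that matters in a contestation.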
CS, SF, PT — Technical documentation generators
CS (Specifications): 20 agents across seven layers. Generates specifications validated along four dimensions (technical, legal, market, historical).
SF (Feasibility Studies): 17 agents across five layers. Generates SF/DALI with a full cost-benefit analysis (ACB) per HG 907/2016.
PT (Technical Projects): 15 agents across five layers. Generates technical projects for IT and construction.
HG907 cost engine — pure computational
This system intentionally uses no LLM: it is a purely computational engine built on Decimal arithmetic with ROUND_HALF_UP at two decimals. It generates cost breakdowns 100% compliant with HG 907/2016 + Annex 8 + Fiscal Code art. 220¹, and produces DOCX, XLSX with live formulas, and PDF.
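A short sketch of what Decimal with ROUND_HALF_UP buys, using Python's standard `decimal` module. The helper names `money` and `with_vat` are hypothetical, the VAT rate is a parameter rather than a value taken from this article, and rounding the VAT amount before adding it is an assumption about the rounding policy.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def money(value: Decimal) -> Decimal:
    """Round to two decimals, with ties always going away from zero."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def with_vat(net: Decimal, vat_rate: Decimal) -> Decimal:
    """Gross amount for a net amount; the rate is passed in, not hard-coded."""
    return money(net + money(net * vat_rate))


# Binary floats cannot represent 2.675 exactly, so round() lands on 2.67.
print(round(2.675, 2))           # 2.67
print(money(Decimal("2.675")))   # 2.68
```

Constructing every amount from a string (`Decimal("2.675")`) rather than from a float is what keeps the value exact from input to output.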
Why no LLM? Because a one-leu error on a 300,000 EUR bid can disqualify it. Financial arithmetic cannot be entrusted to a probabilistic model. EligibilityRouter decides whether the tender requires HG907 (works) or a simple summary table is enough (servers, software). LegalValidator has 14 validation rules (hg907.*, math.*, fiscal.*, annex8.*).
Verified end-to-end on real bids: the pipeline reproduces the F3 recap to the leu — T1 = 13,750.84 → T4 = 16,530.76 → Total with VAT 316,762.97. All 14 rules PASS.
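A rule-based validator of this kind can be sketched as a registry of named checks. Only the `math.` and `fiscal.` namespaces come from the text above; the two rule IDs, the `Breakdown` type and its fields are invented for illustration and do not describe the real LegalValidator.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Callable


@dataclass(frozen=True)
class Breakdown:
    lines: tuple[Decimal, ...]  # net value of each cost line
    total_net: Decimal
    vat: Decimal
    vat_rate: Decimal


def total_equals_sum_of_lines(b: Breakdown) -> bool:
    return sum(b.lines, Decimal("0")) == b.total_net


def vat_matches_rate(b: Breakdown) -> bool:
    expected = (b.total_net * b.vat_rate).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return expected == b.vat


# Hypothetical rule IDs, grouped by namespace.
RULES: dict[str, Callable[[Breakdown], bool]] = {
    "math.total_equals_sum_of_lines": total_equals_sum_of_lines,
    "fiscal.vat_matches_rate": vat_matches_rate,
}


def validate(b: Breakdown) -> dict[str, bool]:
    """Run every rule and report PASS (True) or FAIL (False) per rule ID."""
    return {rule_id: check(b) for rule_id, check in RULES.items()}
```

Because each rule is a pure function over exact Decimal values, a failing rule ID points at one specific inconsistency instead of a general "something is off".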
Coordination through layers
All six systems coordinate through a four-layer hierarchy:
Layer 1 — Coordinator. The OF2-01 master orchestrator (or equivalent in other systems) manages the dependency graph between agents. Decides what runs when, what input goes into which agent, when control hands over to a HITL gate.
Layer 2 — Generator. Agents that produce actual content: technical proposals, financial proposals, declarations, charts. Each specialised on a single dimension.
Layer 3 — Validator. Agents that verify what layer 2 produced. Legal, technical, financial, sceptical (adversarial — actively look for problems) validators. Bid365 has 40+ dedicated QA agents.
Layer 4 — HITL gate. The point at which the decision rises to a human operator. The eleven gates are: GO/NO-GO post-gap-analysis, strategy validation, cost validation, team variant choice, schedule approval, market price validation, critic iteration decision, technical proposal signature, financial proposal signature, DUAE signature, final submission.
Nobody at our company believes an AI can sign a SEAP bid on behalf of a firm. The operator keeps legal responsibility; the AI cuts working time from days to minutes while preserving traceability of every decision.
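One way to picture a HITL gate that preserves traceability is an append-only decision log, where nothing irreversible runs unless the latest decision for that gate is an approval. The sketch below is an assumption-laden illustration, not Bid365 code: the class names, fields and the idea of hashing the reviewed payload are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)
class GateDecision:
    gate: str
    operator: str
    approved: bool
    input_digest: str
    decided_at: str


@dataclass
class AuditLog:
    entries: list[GateDecision] = field(default_factory=list)

    def record(self, gate: str, operator: str, approved: bool, payload: dict) -> GateDecision:
        # Hash the exact input the operator saw, so the decision can later
        # be tied to one specific version of the bid.
        canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        decision = GateDecision(
            gate, operator, approved, digest,
            datetime.now(timezone.utc).isoformat(),
        )
        self.entries.append(decision)
        return decision


def require_approval(log: AuditLog, gate: str) -> None:
    """Raise unless the most recent decision for `gate` is an approval."""
    for decision in reversed(log.entries):
        if decision.gate == gate:
            if decision.approved:
                return
            break
    raise PermissionError(f"gate {gate!r} has not been approved")
```

Calling `require_approval(log, "final_submission")` before the submission step turns "a human approves every irreversible decision" into something the code enforces, and the log answers who decided what, when, and on which input.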
RAG-AGI+ v2.0 — knowledge layers
All agents consume knowledge through our hybrid RAG pipeline. Modern open-source stack: vector database, multilingual 1024-dimension embedding, hybrid dense + sparse retrieval with RRF (Reciprocal Rank Fusion), CRAG (corrective RAG), HyDE (Hypothetical Document Embeddings), reranker on results.
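Of the techniques listed, Reciprocal Rank Fusion is the easiest to show in a few lines. The sketch below fuses a dense and a sparse ranking by giving each document the sum of 1 / (k + rank) over the lists it appears in. The function name is ours, and k = 60 is a commonly used default rather than a value taken from our pipeline.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]
sparse = ["b", "d", "a"]
print(reciprocal_rank_fusion([dense, sparse]))  # ['b', 'a', 'd', 'c']
```

RRF only looks at ranks, never at raw scores, which is why it can combine dense and sparse retrievers whose scores live on different scales.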
Knowledge volumes:
- 300,000+ vectors indexed across eight collections.
- 2.8M+ legal documents (legislation + Official Gazette + Supreme Court + CNSC).
- 32,097 monitored tenders (PNRR 19,807 + EU Funds 1,167 + SEAP 11,047 + TED 76).
- 8,000+ public beneficiaries with automatic enrichment from ONRC + ANAF.
The pipeline runs on a daily cron schedule:
- 02:30 SEAP delta (30 days)
- 03:30 TED Romania
- 04:00 PNRR fetch
- 04:30 EU Funds
- 05:00 lifecycle manager
- 06:00 award + winning firm
- 07:00 ingestion of new documents into the vector DB
Trade-offs we have accepted
Higher latency. A complete bid runs 10-30 minutes end-to-end (depending on the size of the CS). A big GPT prompt may produce a draft in 2-3 minutes. The trade-off pays off: manually verifying a bid produced by the 158-agent pipeline takes less time than manually verifying a bid from a monolithic LLM, because in our case the final verification is direct (the HITL gates have already verified the key points).
Higher compute cost. 158 agents × ~100K tokens average ≈ 15.8M tokens per bid. On open-source models run locally (our stack uses open-source LLMs running on our EU-resident infrastructure), cost per token is far below external API prices. On GPT-4 or Claude over API, it would be prohibitive.
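The token arithmetic is easy to reproduce. The sketch below only restates the two figures from the paragraph above; `cost_per_bid` is a hypothetical helper, and the price per million tokens is left as a parameter because no price is given here.

```python
AGENTS = 158
AVG_TOKENS_PER_AGENT = 100_000  # the rough average quoted above

tokens_per_bid = AGENTS * AVG_TOKENS_PER_AGENT
print(f"{tokens_per_bid:,} tokens per bid")  # 15,800,000 tokens per bid


def cost_per_bid(price_per_million_tokens: float) -> float:
    """Cost of one bid for a given price per million tokens (any currency)."""
    return tokens_per_bid / 1_000_000 * price_per_million_tokens
```

Plugging in your own per-million-token price, local or API, gives a direct per-bid comparison.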
Maintenance complexity. 158 agents means 158 prompts to update when a law changes. We have a prompt versioning system with regression testing on historical bids. The change to Law 98/2016 through OUG 19/2024 required updates to 23 prompts; regression tests caught 4 incompatibilities in the first run.
Dependency on trained HITL operators. An operator who clicks “approve” on every HITL gate without reading makes the system worse than a monolithic LLM. We invest in operator training: a two-day course on the pipeline architecture, what to verify at each gate, and what counts as a red flag.
Multi-agent vs monolithic prompt pattern
In short, when do you use multi-agent vs a single large LLM?
Use multi-agent when:
- The output has interdependent constraints that can be mechanically verified (cost breakdown, scoring, legal compliance).
- There are points where a human must decide (GO/NO-GO, variant choice).
- You need full auditability (who decided what, when, with what input).
- Quality matters more than first-draft speed.
Use a monolithic prompt when:
- The output is descriptive or creative (an email, an article, a summary).
- Final verification is manual on free text anyway.
- There are no mathematical or legal constraints that create incompatibilities between parts of the output.
Bid365 is clearly in the first category. A blog article (like this one) is clearly in the second.
Related articles
- HG907 quotation engine without LLM: deterministic-legal precision
- Propose-then-act: the architecture of an AI agent for production ops
- Citation grounding: implementing a 4-gate pipeline
- Cost-aware LLM routing: how to cut 70% of the bill while keeping quality
- Pillar Bid365 — public procurement automation
Next steps
If your firm works with SEAP, PNRR, or EU Funds and you want to evaluate bid pipeline automation, the Bid365 page has the full specs, a live demo, and a 30-day pilot plan. Or contact us for a technical discussion.
Related reading: Pre-production hardening (how we test platforms internally before production) · Bid365 page · Iris page (the AI orchestrator that coordinates agents).