CAI Technology
aegis · 12 min read

AI incident analysis with a local LLM: triage from 30 minutes to 30 seconds

SOC pipeline: alert → context from a similar-incidents DB → AI narrative → proposed next action. How an open-source 30B-class LLM delivers quality triage without sending logs to the cloud.

CAI Technology · Last reviewed: 4/30/2026

TL;DR

A modern SecOps stack (Wazuh + Suricata + Zeek + Falco + observability metrics) generates 1,000-10,000 alerts/day for a 200-500 server estate. Without automated triage, the operator reads 100-500 significant alerts per day, each requiring 15-45 minutes of investigation. The arithmetic does not work — either you over-staff the SOC, or you ignore alerts and miss real attacks.
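A quick check of that arithmetic, as a minimal sketch. The alert counts and investigation times are the ranges quoted above; the 8-hour shift is an assumption added here for illustration.

```python
# Daily manual triage workload implied by the figures in the TL;DR.
ALERTS_PER_DAY = (100, 500)    # significant alerts the operator reads
MINUTES_PER_ALERT = (15, 45)   # manual investigation time per alert
SHIFT_HOURS = 8                # assumption: one analyst works one 8-hour shift


def analyst_hours(alerts: int, minutes_each: float) -> float:
    """Total investigation time per day, in hours."""
    return alerts * minutes_each / 60


low = analyst_hours(ALERTS_PER_DAY[0], MINUTES_PER_ALERT[0])
high = analyst_hours(ALERTS_PER_DAY[1], MINUTES_PER_ALERT[1])
print(f"{low:.0f}-{high:.0f} analyst-hours/day")
print(f"{low / SHIFT_HOURS:.1f}-{high / SHIFT_HOURS:.1f} analyst shifts/day")
```

Even the low end (100 alerts at 15 minutes each) comes to 25 analyst-hours per day, which is more than three full shifts.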

This article describes the AI incident analysis pattern we use in AEGIS, its architecture, and why a local LLM (not a frontier API) is the right choice.

The problem: alert fatigue

Public studies (Ponemon, IBM Cost of Data Breach) show SOCs miss attacks because analysts are overwhelmed. Alert volume exceeds human processing capacity. The typical reactions:

  1. Raise the threshold: respond only above a higher minimum severity. You miss attacks below the threshold.
  2. Suppress: disable noisy rules. You miss the attacks those rules would have flagged.
  3. Outsource: hand triage to an MSP. You lose business-specific context.

None is good. All sacrifice sensitivity or context.

The correct solution: keep sensitivity high (all important alerts come in) but automate first-line triage. The operator sees the AI summary plus raw alerts, not every alert in detail.

The pattern: 5 stages

┌──────────────────────────────────────────────────────────────────┐
│                   AI INCIDENT ANALYSIS PIPELINE                   │
│                                                                   │
│  1. ALERT       2. CONTEXT      3. RAG          4. LLM            │
│     Suricata,   Aggregator      Similar         Triage:           │
│     Wazuh,      collects        incidents       severity,         │
│     Falco,      logs in time    from history    type,             │
│     Prometheus  window for      DB              recommendation    │
│                 src/dst                                           │
│                                                                   │
│       ▼            ▼               ▼                ▼             │
│       └────────────┴───────────────┴────────────────┘             │
│                                                                   │
│  5. OPERATOR DASHBOARD                                            │
│     - Original alert                                              │
│     - AI narrative                                                │
│     - Recommended action (block, isolate, ignore, escalate)       │
│     - Operator acts: approve / modify / reject                    │
└──────────────────────────────────────────────────────────────────┘

Stage 1: alert

The primary source. Suricata emits an alert on a signature match. Wazuh emits an alert on an escalated HIDS rule. Prometheus alerts on a metric anomaly. Falco alerts on a runtime container event.

All alerts enter Alertmanager (or equivalent), which routes them to the AI pipeline.

Standard format: JSON with mandatory fields — timestamp, source (suricata/wazuh/falco/prometheus), severity, host, src_ip, dst_ip, description, raw_event.
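A minimal sketch of checking an incoming alert against that field list. The field names and source values come from the text above; the function name, the ISO 8601 timestamp format, and the sample values are assumptions for illustration.

```python
import json
from datetime import datetime

REQUIRED_FIELDS = ("timestamp", "source", "severity", "host",
                   "src_ip", "dst_ip", "description", "raw_event")
KNOWN_SOURCES = {"suricata", "wazuh", "falco", "prometheus"}


def parse_alert(payload: str) -> dict:
    """Parse a JSON alert and reject it if a mandatory field is missing."""
    alert = json.loads(payload)
    if not isinstance(alert, dict):
        raise ValueError("alert must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in alert]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    if alert["source"] not in KNOWN_SOURCES:
        raise ValueError(f"unknown source: {alert['source']!r}")
    # Assumption: timestamps arrive as ISO 8601 strings with an explicit offset.
    alert["timestamp"] = datetime.fromisoformat(alert["timestamp"])
    return alert


sample = json.dumps({
    "timestamp": "2026-04-30T10:15:00+00:00",
    "source": "suricata",
    "severity": "High",
    "host": "web-01",
    "src_ip": "203.0.113.7",
    "dst_ip": "10.0.0.5",
    "description": "Possible SSH brute force",
    "raw_event": {"signature_id": 1},
})
print(parse_alert(sample)["source"])
```

Rejecting malformed alerts at the entry point keeps the later stages from having to guess at missing fields.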

Stage 2: context aggregator

A FastAPI service receives the alert and aggregates context from the last N minutes (typically 30-60) for the involved hosts and IPs.

Output: structured context, ~5-15 KB JSON.
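The core of the aggregation step can be sketched as a time-window filter over events that touch the same host or IPs. This is only an illustration with the standard library: the function name, the in-memory event list, and the output keys are assumptions, and a real service would query the log stores instead.

```python
from datetime import datetime, timedelta


def aggregate_context(alert: dict, events: list[dict],
                      window_minutes: int = 60) -> dict:
    """Collect events from the window before the alert that share its host or IPs."""
    end = alert["timestamp"]
    start = end - timedelta(minutes=window_minutes)
    # Entities named in the alert; None is dropped in case a field is empty.
    involved = {alert.get("host"), alert.get("src_ip"), alert.get("dst_ip")} - {None}
    related = [
        e for e in events
        if start <= e["timestamp"] <= end
        and involved & {e.get("host"), e.get("src_ip"), e.get("dst_ip")}
    ]
    related.sort(key=lambda e: e["timestamp"])
    return {
        "window_minutes": window_minutes,
        "entities": sorted(involved),
        "events": related,
    }
```

Keeping the window and the entity filter explicit makes it easy to tune N per alert source without touching the rest of the pipeline.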

Stage 3: RAG with similar-incidents DB

This is the differentiator versus a simple LLM call. We have a database of historical incidents — each with the original alert, context, AI narrative, operator decision, and final outcome (real attack vs false positive).

For a new alert, we run similarity search: find 3-5 similar incidents from history. “Similar” = embedding similarity on alert description + IP/host pattern.

Output: 3-5 similar incidents with their narratives and outcome. These enter the LLM prompt as few-shot examples.
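The retrieval step can be sketched as a cosine-similarity ranking over stored embeddings. The embedding model and the storage layer are not specified here, so the sketch assumes each incident already carries an `embedding` vector (a hypothetical key) and covers only the description similarity, not the IP/host pattern match.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_similar(query_vec: list[float], incidents: list[dict], k: int = 5):
    """Return the k historical incidents closest to the query as (score, incident) pairs."""
    scored = [(cosine(query_vec, inc["embedding"]), inc) for inc in incidents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A linear scan like this is enough for a few thousand incidents; a vector index would replace it as the history grows.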

The incidents DB is built in the first 3-6 months of operation. Each operator-triaged incident is saved with the final decision — as the DB grows, AI triage quality grows.

Stage 4: LLM triage

The LLM prompt:

You are a senior SOC analyst. You receive a SIEM alert with full context and
similar historical incidents. Produce structured triage:

1. Estimated SEVERITY (Low/Medium/High/Critical) with 1-sentence rationale.
2. Attack TYPE (brute-force / RCE / exfiltration / persistence / scan / other).
3. STATE (active / contained / past).
4. NARRATIVE (3-5 sentences, explanation for the operator).
5. RECOMMENDATION (block IP / isolate host / ignore / escalate / investigate).

ALERT:
{alert_json}

CONTEXT (last 60 min):
{context_json}

SIMILAR INCIDENTS:
{similar_incidents}

The LLM produces structured output (JSON) in 2-5 seconds.
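A minimal sketch of filling the prompt placeholders and checking the model's JSON reply. The placeholder names and the allowed values are taken from the prompt above; the JSON key names (`severity`, `severity_rationale`, `type`, `state`, `narrative`, `recommendation`) and both function names are assumptions, since the output schema is not shown.

```python
import json

# Allowed values per field, copied from the prompt; key names are hypothetical.
ALLOWED = {
    "severity": {"Low", "Medium", "High", "Critical"},
    "type": {"brute-force", "RCE", "exfiltration", "persistence", "scan", "other"},
    "state": {"active", "contained", "past"},
    "recommendation": {"block IP", "isolate host", "ignore", "escalate", "investigate"},
}
FREE_TEXT = ("severity_rationale", "narrative")


def build_prompt(template: str, alert: dict, context: dict, similar: list) -> str:
    """Fill the {alert_json}, {context_json} and {similar_incidents} placeholders."""
    return template.format(
        alert_json=json.dumps(alert, indent=2, default=str),
        context_json=json.dumps(context, indent=2, default=str),
        similar_incidents=json.dumps(similar, indent=2, default=str),
    )


def parse_triage(raw: str) -> dict:
    """Parse the LLM reply and reject anything outside the expected schema."""
    triage = json.loads(raw)
    if not isinstance(triage, dict):
        raise ValueError("triage must be a JSON object")
    for field, allowed in ALLOWED.items():
        if triage.get(field) not in allowed:
            raise ValueError(f"invalid {field}: {triage.get(field)!r}")
    for field in FREE_TEXT:
        value = triage.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing text field: {field}")
    return triage
```

A reply that fails validation can be retried or sent to the operator without a narrative, so a malformed answer never reaches the dashboard as if it were a valid triage.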

Model used. A modern open-source model in the 30-70B-parameter class (Qwen3 family, Llama 3.x, or equivalent), fine-tuned on the internal incident corpus. It runs on an enterprise GPU (A100/H100 or open equivalent). Latency: 2-5 seconds end-to-end.

Why not a frontier API. Two reasons: log confidentiality (see On-premise SIEM with a local LLM) and cost (10K alerts/day × 5K tokens/alert = 50M tokens/day, prohibitive for a frontier API).

Stage 5: operator dashboard

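The dashboard shows a card with the original alert, the AI narrative, the recommended action (block, isolate, ignore, escalate), and the operator controls: approve / modify / reject.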

The operator reads the narrative, validates it against the original alert, and decides. For ~70% of alerts (the clear ones), they accept the AI recommendation directly. For 25% they modify it (different severity, different action). For 5% they reject it completely (a false positive that the AI did not catch).

Triage quality in production

In current production (3 months of operation after the pilot phase):

Compared with manual triage (baseline estimate):

For a 200-server estate with 800 significant alerts/day, that is the difference between needing 3 SOC analysts and 1 SOC analyst.

Pitfalls

AI hallucination. LLMs can invent details. Mitigation: the prompt asks only for structured fields, the output is validated against a schema, and the original alert is shown alongside the narrative so the operator sees both.

Bias in incidents DB. If the operator triaged certain alert types incorrectly in the early months, the AI learns those mistakes. Mitigation: quarterly review of the DB, manual labeling of edge cases.

Quality drift. Attacks evolve. Old patterns disappear, new patterns appear. Mitigation: quarterly re-training with fresh data, plus a safety rule — alerts with no similarity to history go straight to the operator without AI pre-triage.
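The safety rule can be sketched as a gate on the best similarity score. The 0.75 threshold and the route names are hypothetical; the right cut-off depends on the embedding model and would need tuning against real history.

```python
def route_alert(similar: list[tuple[float, dict]],
                min_similarity: float = 0.75) -> str:
    """Choose AI pre-triage only when history contains a close enough match.

    `similar` holds (score, incident) pairs from the similarity search.
    """
    best = max((score for score, _ in similar), default=0.0)
    return "ai_pretriage" if best >= min_similarity else "operator_direct"
```

An alert with no retrieved incidents at all falls through to `operator_direct`, which matches the intent of the rule: novel patterns get human eyes first.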

GPU infrastructure. Enterprise GPU costs. Mitigation: a single GPU serves 1,000-5,000 inferences/hour — sufficient capacity for a 500-1,000 server estate.
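A rough capacity check, as a sketch. It compares the 10K alerts/day figure used earlier for the cost estimate with the 1,000-5,000 inferences/hour range above, and it assumes a flat arrival rate; real alert traffic is bursty, so the headroom during an incident is smaller than these averages suggest.

```python
ALERTS_PER_DAY = 10_000                   # upper figure used in the cost estimate
GPU_INFERENCES_PER_HOUR = (1_000, 5_000)  # single-GPU throughput range

avg_alerts_per_hour = ALERTS_PER_DAY / 24
headroom = tuple(rate / avg_alerts_per_hour for rate in GPU_INFERENCES_PER_HOUR)

print(f"average load: {avg_alerts_per_hour:.0f} alerts/hour")
print(f"headroom: {headroom[0]:.1f}x to {headroom[1]:.1f}x")
```

On these numbers the average load is about 417 alerts per hour, or 2.4x to 12x below the stated throughput.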

Interpretability. “Why did AI say it is Critical?” — the operator needs to see the reasoning. Mitigation: prompt explicitly requires “1-sentence rationale” for each field, plus links to similar incidents used as context.

Integration with Wazuh Active Response

The AI triage pattern is complementary to Wazuh Active Response. AR acts automatically on clear rules (SSH brute force, AV signature match), where there is no point waiting for the LLM. AI triage adds value on ambiguous alerts, where the decision requires context and reasoning.

Combined scheme:
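A minimal sketch of that split: alerts matching a clear rule are left to Active Response, and everything else goes through AI triage. The `rule_group` field and the group names are hypothetical and not part of the mandatory alert fields listed earlier.

```python
# Hypothetical rule groups that Active Response already handles on its own.
AUTO_RESPONSE_GROUPS = {"ssh_bruteforce", "av_signature_match"}


def dispatch(alert: dict) -> str:
    """Send clear-rule alerts to Active Response and the rest to AI triage."""
    if alert.get("rule_group") in AUTO_RESPONSE_GROUPS:
        return "active_response"
    return "ai_triage"


print(dispatch({"rule_group": "ssh_bruteforce"}))
print(dispatch({"rule_group": "unusual_outbound_traffic"}))
```

Keeping the list of auto-response groups short and explicit means an alert only skips the LLM when the rule is unambiguous.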

Next steps

Implementing the pattern in an AEGIS stack takes 2-3 months: GPU + LLM deployment, Alertmanager integration, building the context aggregator, the first 3 months of incidents DB, fine-tuning. See AEGIS for the full roadmap or contact us for a technical session.

Related: On-premise SIEM with a local LLM · Wazuh Active Response patterns · Suricata + Zeek coexistence · Propose-then-act Iris.
