aegis · April 30, 2026 · 12 min read

AI incident analysis with a local LLM: triage from 30 minutes to 30 seconds

SOC pipeline: alert → context from a similar-incidents DB → AI narrative → next-action proposed. How an open-source 30B-class LLM delivers quality triage without sending logs to the cloud.

CAI Technology · Last reviewed: 4/30/2026

TL;DR

Manual triage of a SOC alert typically takes 15-45 minutes (read logs, correlate with similar incidents, evaluate severity, write summary).
The AI incident analysis pattern automates this phase: alert → context from history DB → AI narrative → next-action proposed → operator validates.
With an open-source 30-70B-parameter LLM run locally (enterprise GPU), triage drops to 30 seconds per alert.
The operator receives a ready-made summary and approves/rejects — their attention is used only for irreversible decisions.
At 1,000+ alerts/day, the pattern reduces alert fatigue and lets a small SOC cover a large estate.

A modern SecOps stack (Wazuh + Suricata + Zeek + Falco + observability metrics) generates 1,000-10,000 alerts/day for a 200-500 server estate. Without automated triage, the operator reads 100-500 significant alerts per day, each requiring 15-45 minutes of investigation. The arithmetic does not work: even at the low end, 100 alerts at 15 minutes each is 25 analyst-hours per day. Either you over-staff the SOC, or you ignore alerts and miss real attacks.

This article describes the AI incident analysis pattern we use in AEGIS, its architecture, and why a local LLM (not a frontier API) is the right choice.

The problem: alert fatigue

Public studies (Ponemon, IBM Cost of Data Breach) show SOCs miss attacks because analysts are overwhelmed. Alert volume exceeds human processing capacity. The typical reactions:

Raise the threshold: respond only above a higher minimum severity. You miss attacks below the threshold.
Suppress: disable noisy rules. You miss the attacks those rules would have caught.
Outsource: hand triage to an MSP. You lose business-specific context.

None is good. All sacrifice sensitivity or context.

The correct solution: keep sensitivity high (all important alerts come in) but automate first-line triage. The operator sees the AI summary plus raw alerts, not every alert in detail.

The pattern: 5 stages

┌──────────────────────────────────────────────────────────────────┐
│                   AI INCIDENT ANALYSIS PIPELINE                   │
│                                                                   │
│  1. ALERT       2. CONTEXT      3. RAG          4. LLM            │
│     Suricata,   Aggregator      Similar         Triage:           │
│     Wazuh,      collects        incidents       severity,         │
│     Falco,      logs in time    from history    type,             │
│     Prometheus  window for      DB              recommendation    │
│                 src/dst                                           │
│                                                                   │
│       ▼            ▼               ▼                ▼             │
│       └────────────┴───────────────┴────────────────┘             │
│                                                                   │
│  5. OPERATOR DASHBOARD                                            │
│     - Original alert                                              │
│     - AI narrative                                                │
│     - Recommended action (block, isolate, ignore, escalate)       │
│     - Operator acts: approve / modify / reject                    │
└──────────────────────────────────────────────────────────────────┘

Stage 1: alert

The primary source. Suricata emits an alert on a signature match. Wazuh emits an alert when a HIDS rule escalates. Prometheus alerts on a metric anomaly. Falco alerts on a runtime container event.

All alerts enter Alertmanager (or equivalent), which routes them to the AI pipeline.

Standard format: JSON with mandatory fields — timestamp, source (suricata/wazuh/falco/prometheus), severity, host, src_ip, dst_ip, description, raw_event.
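
As an illustration of that contract, here is a minimal sketch of a validator for the normalized alert. The field names come from the list above; the `Alert` class, the `parse_alert` helper, the ISO 8601 timestamp format, and the assumption that `raw_event` is a JSON object are ours, not something the pipeline prescribes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any

ALLOWED_SOURCES = {"suricata", "wazuh", "falco", "prometheus"}
REQUIRED_FIELDS = (
    "timestamp", "source", "severity", "host",
    "src_ip", "dst_ip", "description", "raw_event",
)


@dataclass(frozen=True)
class Alert:
    """Normalized alert (hypothetical type, mirrors the mandatory fields)."""
    timestamp: datetime
    source: str
    severity: str
    host: str
    src_ip: str
    dst_ip: str
    description: str
    raw_event: dict[str, Any]


def parse_alert(payload: dict[str, Any]) -> Alert:
    """Validate a decoded JSON alert and return a typed Alert."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    if payload["source"] not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {payload['source']!r}")
    fields = {name: payload[name] for name in REQUIRED_FIELDS}
    # Assumption: timestamps arrive as ISO 8601 strings with an explicit offset.
    fields["timestamp"] = datetime.fromisoformat(payload["timestamp"])
    return Alert(**fields)
```

Rejecting malformed alerts at the entrance keeps the later stages free of defensive checks.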

Stage 2: context aggregator

A FastAPI service receives the alert and aggregates context from the last N minutes (typically 30-60) for involved hosts and IPs:

All syslog logs from the target host in the time window.
All Suricata/Zeek alerts for the attacker IP in the last 24h.
All Wazuh events on the target host.
Relevant Prometheus metrics (CPU, RAM, network) on the host in the window.
GeoIP, ASN, threat intel match for the attacker IP.

Output: structured context, ~5-15 KB JSON.
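
The core of the aggregation is a time-window filter over events that touch the alert's host or IPs. The sketch below shows only that step, in plain Python, assuming events are already loaded as dicts with `datetime` timestamps; `collect_context` and the dict layout are illustrative names, and the 24-hour lookback for the attacker IP, metrics, and GeoIP enrichment are left out for brevity.

```python
from datetime import datetime, timedelta


def collect_context(alert, events, window_minutes=60):
    """Return events inside the window before the alert that involve
    the alert's host or one of its IP addresses."""
    end = alert["timestamp"]
    start = end - timedelta(minutes=window_minutes)
    involved_ips = {alert["src_ip"], alert["dst_ip"]}

    selected = [
        event for event in events
        if start <= event["timestamp"] <= end
        and (
            event.get("host") == alert["host"]
            or event.get("src_ip") in involved_ips
            or event.get("dst_ip") in involved_ips
        )
    ]
    return {
        "window_minutes": window_minutes,
        "host": alert["host"],
        # Chronological order makes the context easier to read in a prompt.
        "events": sorted(selected, key=lambda event: event["timestamp"]),
    }
```

In a real deployment the filter would run as queries against the log stores rather than over an in-memory list.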

Stage 3: RAG with similar-incidents DB

This is the differentiator versus a simple LLM call. We have a database of historical incidents — each with the original alert, context, AI narrative, operator decision, and final outcome (real attack vs false positive).

For a new alert, we run similarity search: find 3-5 similar incidents from history. “Similar” = embedding similarity on alert description + IP/host pattern.

Output: 3-5 similar incidents with their narratives and outcome. These enter the LLM prompt as few-shot examples.
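
A minimal sketch of the retrieval step, assuming each stored incident already carries a precomputed `embedding` vector. The embedding model, the `top_k_similar` name, and the record layout are assumptions; the IP/host pattern component of "similar" is not modeled here, and a production system would more likely use a vector index than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query_embedding, incidents, k=5):
    """Return up to k (score, incident) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_embedding, incident["embedding"]), incident)
        for incident in incidents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The best score is also useful on its own: a low maximum similarity signals that history has nothing comparable to offer.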

The incidents DB is built in the first 3-6 months of operation. Each operator-triaged incident is saved with the final decision — as the DB grows, AI triage quality grows.

Stage 4: LLM triage

The LLM prompt:

You are a senior SOC analyst. You receive a SIEM alert with full context and
similar historical incidents. Produce structured triage:

1. Estimated SEVERITY (Low/Medium/High/Critical) with 1-sentence rationale.
2. Attack TYPE (brute-force / RCE / exfiltration / persistence / scan / other).
3. STATE (active / contained / past).
4. NARRATIVE (3-5 sentences, explanation for the operator).
5. RECOMMENDATION (block IP / isolate host / ignore / escalate / investigate).

ALERT:
{alert_json}

CONTEXT (last 60 min):
{context_json}

SIMILAR INCIDENTS:
{similar_incidents}

The LLM produces structured output (JSON) in 2-5 seconds.
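
Filling the template is plain string assembly. The sketch below shows one way to do it, with the numbered field list shortened to a single instruction line; `build_prompt`, `format_incident`, and the `narrative`/`outcome` keys on stored incidents are hypothetical names chosen for the example.

```python
import json

PROMPT_TEMPLATE = """You are a senior SOC analyst. You receive a SIEM alert with full context and
similar historical incidents. Produce structured triage as JSON.

ALERT:
{alert_json}

CONTEXT (last 60 min):
{context_json}

SIMILAR INCIDENTS:
{similar_incidents}
"""


def format_incident(index, incident):
    """Render one historical incident as a numbered few-shot example."""
    return f"{index}. {incident['narrative']} Outcome: {incident['outcome']}."


def build_prompt(alert, context, similar):
    """Fill the triage template with the alert, its context, and history."""
    lines = [format_incident(i, inc) for i, inc in enumerate(similar, start=1)]
    return PROMPT_TEMPLATE.format(
        # default=str lets datetime values serialize without a custom encoder.
        alert_json=json.dumps(alert, indent=2, sort_keys=True, default=str),
        context_json=json.dumps(context, indent=2, sort_keys=True, default=str),
        similar_incidents="\n".join(lines) or "(none found)",
    )
```

Sorting keys keeps the prompt stable for identical inputs, which helps when comparing model outputs across runs.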

Model used. Modern open-source 30-70B-parameter class (Qwen3 family, Llama 3.x, or equivalent), fine-tuned on the internal incident corpus. Runs on an enterprise GPU (A100/H100 or open equivalent). Latency: 2-5 seconds end-to-end.

Why not a frontier API. Two reasons: log confidentiality (see On-premise SIEM with a local LLM) and cost (10K alerts/day × 5K tokens/alert = 50M tokens/day, prohibitive for a frontier API).
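
The token volume is easy to reproduce. The snippet below redoes the arithmetic and adds a 30-day projection; the `api_cost_per_day` helper takes the price as an argument because no specific API price is assumed here.

```python
def tokens_per_day(alerts_per_day: int, tokens_per_alert: int) -> int:
    """Total tokens the triage prompts consume in one day."""
    return alerts_per_day * tokens_per_alert


def api_cost_per_day(daily_tokens: int, price_per_million_tokens: float) -> float:
    """Daily cost for a given per-million-token price (supply your own quote)."""
    return daily_tokens / 1_000_000 * price_per_million_tokens


daily = tokens_per_day(10_000, 5_000)
monthly = daily * 30
print(f"{daily:,} tokens/day, {monthly:,} tokens per 30-day month")
# prints "50,000,000 tokens/day, 1,500,000,000 tokens per 30-day month"
```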

Stage 5: operator dashboard

The dashboard shows a card with:

Original alert (raw).
AI narrative (readable in 30 seconds).
AI recommendation with action buttons.
Links to similar historical incidents.
“Feedback” button — the operator marks triage as correct/incorrect (feeds future training).

The operator reads the narrative, validates against the original alert, decides. For ~70% of alerts (the clear ones), they accept the AI recommendation directly. For 25% they modify (different severity, different action). For 5% they reject completely (false positive that AI did not catch).

Triage quality in production

In current production (3 months of operation after the pilot phase):

Alerts/day: ~3,500 initial → ~800 after automatic Graylog filtering → AI triage.
AI triage latency: 3-7 seconds median (p95 at 12 seconds).
Acceptance rate: 73% AI recommendations accepted without modification.
Modification: 22% recommendations modified (usually severity or detail).
Reject: 5% recommendations fully rejected.
False negative: 2 cases in 3 months where AI marked “Low” for what turned out to be a real attack. Detected retrospectively via other signals.
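
Rates like these fall out of the operator feedback directly. A minimal sketch, assuming each triaged alert is recorded with one of three decision labels (the label strings and the `decision_rates` name are illustrative):

```python
from collections import Counter

DECISIONS = ("accepted", "modified", "rejected")


def decision_rates(decisions):
    """Share of each operator decision among all recorded decisions."""
    counts = Counter(decisions)
    unknown = set(counts) - set(DECISIONS)
    if unknown:
        raise ValueError(f"unknown decision labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {decision: 0.0 for decision in DECISIONS}
    return {decision: counts[decision] / total for decision in DECISIONS}
```

Tracking the same three rates per alert source or per attack type shows where the model is weakest.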

Compared with manual triage (baseline estimate):

Manual triage: 15-30 minutes median per significant alert.
AI triage + operator validation: 30-60 seconds median.
Operator throughput: from ~30 alerts/day (manual) to ~300 alerts/day (AI-assisted).

For a 200-server estate with 800 significant alerts/day, that is the difference between needing 3 SOC analysts and 1 SOC analyst.

Pitfalls

AI hallucination. LLMs can invent details. Mitigation: prompt requires only structured fields, output schema validation, original alert presented IN PARALLEL with the narrative (operator sees both).
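
Output schema validation can be as simple as checking every field against the values the prompt allows. The sketch below assumes the model returns a JSON object with the keys `severity`, `type`, `state`, `narrative`, and `recommendation`; those key names are our assumption, while the allowed values are taken from the prompt.

```python
import json

ALLOWED_VALUES = {
    "severity": {"Low", "Medium", "High", "Critical"},
    "type": {"brute-force", "RCE", "exfiltration", "persistence", "scan", "other"},
    "state": {"active", "contained", "past"},
    "recommendation": {"block IP", "isolate host", "ignore", "escalate", "investigate"},
}


def validate_triage(raw: str) -> dict:
    """Parse the model output and reject anything outside the schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("triage output must be a JSON object")
    for field, allowed in ALLOWED_VALUES.items():
        value = data.get(field)
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"invalid {field}: {value!r}")
    narrative = data.get("narrative")
    if not isinstance(narrative, str) or not narrative.strip():
        raise ValueError("narrative must be a non-empty string")
    return data
```

An output that fails validation should reach the operator as a raw alert without an AI card, rather than be silently repaired.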

Bias in incidents DB. If in early months the operator triaged certain types incorrectly, AI learns the mistakes. Mitigation: quarterly review of the DB, manual labeling of edge cases.

Quality drift. Attacks evolve. Old patterns disappear, new patterns appear. Mitigation: quarterly re-training with fresh data, plus a safety rule — alerts with no similarity to history go straight to the operator without AI pre-triage.

GPU infrastructure. Enterprise GPUs are expensive. Mitigation: a single GPU serves 1,000-5,000 inferences/hour, which is sufficient capacity for a 500-1,000 server estate.

Interpretability. “Why did AI say it is Critical?” The operator needs to see the reasoning. Mitigation: the prompt explicitly requires a 1-sentence rationale for the severity estimate, plus links to the similar incidents used as context.

Integration with Wazuh Active Response

The AI triage pattern is complementary to Wazuh Active Response. AR auto-acts on clear rules (SSH brute force, AV signature match), where there is no point waiting for the LLM. AI triage adds value on ambiguous alerts, where the decision requires context and reasoning.

Combined scheme:

Clear alerts with obvious action → AR directly, plus AI does post-triage for audit.
Ambiguous alerts → AI triage primary, operator validates, then action (manual or AR-trigger).
Alerts with no historical pattern → operator directly, no pre-triage.
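
The three branches can be sketched as a small routing function. Everything named here is hypothetical: the `Route` enum, the rule identifiers, and especially the `min_similarity` threshold of 0.6, which would need tuning against real incident data.

```python
from enum import Enum


class Route(Enum):
    ACTIVE_RESPONSE = "active_response"   # AR acts now, AI post-triage for audit
    AI_TRIAGE = "ai_triage"               # AI pre-triage, operator validates
    OPERATOR_DIRECT = "operator_direct"   # no usable history, skip pre-triage


def route_alert(rule_id, best_similarity, clear_rule_ids, min_similarity=0.6):
    """Pick the handling path for one alert.

    best_similarity is the top score from the similar-incidents search,
    or None when the history DB returned nothing.
    """
    if rule_id in clear_rule_ids:
        return Route.ACTIVE_RESPONSE
    if best_similarity is None or best_similarity < min_similarity:
        return Route.OPERATOR_DIRECT
    return Route.AI_TRIAGE
```

Checking the clear-rule set first means a known brute-force pattern is never delayed by a similarity lookup.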

Next steps

Implementing the pattern in an AEGIS stack takes 2-3 months: GPU + LLM deployment, Alertmanager integration, building the context aggregator, the first 3 months of incidents DB, fine-tuning. See AEGIS for the full roadmap or contact us for a technical session.
