Eliminating Bottlenecks in Incident Response

Your alerts tell you what broke. They can't tell you why. Using AI-based failure detection can close that gap.

On-call engineers spend most of an incident not detecting it but explaining it. The alerts fire and the dashboard shows a metric crossing a line. Everything after that is manual work – correlating the alert with recent deploys, walking the log streams, and finding the actual cause. Detection is largely solved. Explanation isn't, and that's where AI failure detection has the clearest practical payoff right now.

What threshold alerts actually do

Rules-based and threshold alerts fire when a metric crosses a line. They do not notice when several seemingly unrelated metrics together describe a coherent failure. A queue depth creeping up, a downstream service quietly retrying, a feature flag flipped two hours ago – none of these trips a rule on its own, and the alert that eventually fires is usually a symptom three layers downstream from the cause.
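To make that limitation concrete, here is a minimal sketch of per-metric threshold rules. The metric names, limits, and the 75% "near limit" cutoff are illustrative assumptions, not values from any particular alerting product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    metric: str
    limit: float

    def fires(self, sample: dict[str, float]) -> bool:
        # A threshold rule looks at exactly one metric and nothing else.
        return sample.get(self.metric, 0.0) > self.limit


# Hypothetical rules; names and limits are made up for illustration.
RULES = [
    ThresholdRule("queue_depth", 1000),
    ThresholdRule("retry_rate", 0.20),
    ThresholdRule("error_rate", 0.05),
]

# Every value sits below its own limit, so no rule fires.
sample = {"queue_depth": 850, "retry_rate": 0.17, "error_rate": 0.01}
fired = [rule.metric for rule in RULES if rule.fires(sample)]

# Two metrics are close to their limits at the same moment, which no
# single rule can see because each rule evaluates one metric in isolation.
near_limit = [rule.metric for rule in RULES if sample[rule.metric] >= 0.75 * rule.limit]

print("fired:", fired)
print("near limit together:", near_limit)
```

Nothing pages, even though queue depth and retry rate are climbing together. Tightening the limits only trades missed incidents for noise; the joint pattern is what needs a reader.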

Modern observability platforms already collect the high-cardinality data needed to find that cause. Charity Majors at Honeycomb has been pointing this out for years: the signal is in the data, the bottleneck is the operator reading it.

Adding more human readers doesn't scale. Adding model readers does.

A small model as a log reader

A small, fast model placed in front of the log stream and alert queue can read what an on-call engineer would read. It doesn't make decisions. It summarizes and correlates.

A Haiku-class model can answer questions like these in seconds:

  • "These three alerts fired in this order. Are they describing one incident or three?"
  • "A deployment landed forty minutes ago. What's in the new logs that wasn't in the old?"
  • "User complaints are spiking. Error rates are flat. What is degrading silently?"
  • "This stack trace is almost identical to one from last Tuesday. Did the underlying race actually get fixed?"

The output is a finding, not a generic alert: a short narrative with the suspected pattern, the supporting evidence, and a suggested next check.
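One way to pin down what a finding looks like is a small typed record plus a prompt that asks the model for exactly those fields. The sketch below is model-agnostic. `complete` is a hypothetical callable standing in for whatever client you use, and the JSON field names are assumptions chosen to mirror the three parts described above.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    pattern: str          # suspected failure pattern
    evidence: list[str]   # alerts or log lines that support it
    next_check: str       # what the operator should look at next


def build_prompt(alerts: list[str], log_lines: list[str]) -> str:
    return (
        "You are reading alerts and logs for an on-call engineer.\n"
        "Summarize and correlate. Do not take any action.\n"
        'Reply with JSON only: {"pattern": str, "evidence": [str], "next_check": str}\n\n'
        "Alerts, in firing order:\n" + "\n".join(alerts)
        + "\n\nRecent log lines:\n" + "\n".join(log_lines)
    )


def read_incident(
    alerts: list[str],
    log_lines: list[str],
    complete: Callable[[str], str],  # hypothetical: prompt in, model text out
) -> Finding:
    raw = complete(build_prompt(alerts, log_lines))
    data = json.loads(raw)  # raises ValueError if the model did not return JSON
    return Finding(
        pattern=data["pattern"],
        evidence=list(data["evidence"]),
        next_check=data["next_check"],
    )


# Stub model so the sketch runs without network access.
def fake_complete(prompt: str) -> str:
    return json.dumps({
        "pattern": "one incident: retries upstream of a growing queue",
        "evidence": ["retry_rate alert", "queue_depth alert"],
        "next_check": "compare against the deploy from forty minutes ago",
    })


finding = read_incident(
    ["retry_rate high", "queue_depth high"],
    ["WARN retrying request to billing"],
    fake_complete,
)
print(finding.pattern)
```

In a real pipeline you would swap `fake_complete` for a call to a small model and decide what to do when the reply fails to parse. The sketch simply lets the exception propagate.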

The same primitive works on legacy systems

The same pattern applies to a fifteen-year-old Java monolith and to an agentic workflow shipped last quarter. A model reading logs is indifferent to whether the system writing them is deterministic, generative, or somewhere in between.

That matters because most production stacks are mixed. An ETL job from 2014 runs next to a RAG pipeline from last month, and the on-call engineer holds both in their head. A reader on the unified log stream gives those two halves a common language for what's actually happening.

Escalate to a stronger model first, a human second

When the small reader finds something worth escalating, the first stop should be a stronger model with full context, not the on-call engineer. Sonnet- or Opus-class models can correlate across services, draft a hypothesis, propose a fix, and only then surface the result to the operator.
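The ordering can be sketched as a short function: small reader first, stronger model second, human last. Every name here (`Triage`, `RootCauseDraft`, `handle`, and the three callables) is hypothetical and stands in for your own model clients and paging system.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Triage:
    summary: str
    escalate: bool


@dataclass
class RootCauseDraft:
    hypothesis: str
    proposed_fix: str


def handle(
    context: str,
    small_reader: Callable[[str], Triage],
    strong_model: Callable[[str, str], RootCauseDraft],
    page_operator: Callable[[RootCauseDraft], None],
) -> Optional[RootCauseDraft]:
    triage = small_reader(context)
    if not triage.escalate:
        return None  # nothing worth a stronger model or a human
    # The stronger model gets the full context plus the small reader's summary.
    draft = strong_model(context, triage.summary)
    # The human is paged last, and sees an analysis rather than a raw log line.
    page_operator(draft)
    return draft
```

The design choice worth noting is that `page_operator` only ever receives a `RootCauseDraft`. The type itself enforces that a human never gets paged with an unexplained alert.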

Human-in-the-loop doesn't go away. It moves up the stack. The artifact a human reviews changes from a log line to a root-cause analysis.

Frameworks are packaging the pattern

Frameworks like Flue describe this as a four-step loop: route simple operations to cheap models or deterministic logic, monitor everything for overt and silent failures, escalate complex cases to stronger models with context, and let the escalated session make corrective tool calls when appropriate.
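As a rough illustration of the four steps, and explicitly not Flue's actual API, the loop might look like the sketch below. All function and parameter names are assumptions, and for brevity the monitor only inspects the cheap path.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outcome:
    result: str
    handled_by: str  # "cheap" or "strong"
    proposed_actions: list[str] = field(default_factory=list)


def run_operation(
    op: str,
    is_simple: Callable[[str], bool],
    cheap: Callable[[str], str],
    looks_failed: Callable[[str, str], bool],
    strong: Callable[[str], tuple[str, list[str]]],
) -> Outcome:
    context = f"operation: {op}"
    # Step 1: route simple operations to a cheap model or deterministic logic.
    if is_simple(op):
        result = cheap(op)
        # Step 2: monitor the result, including "successful" but wrong outputs.
        if not looks_failed(op, result):
            return Outcome(result, "cheap")
        context += f"\ncheap result flagged by monitor: {result}"
    # Step 3: escalate to a stronger model, passing along what is known so far.
    result, actions = strong(context)
    # Step 4: the escalated session returns corrective tool calls it wants made.
    return Outcome(result, "strong", actions)
```

Here the corrective actions come back as a list of proposals rather than being executed inside the loop, which leaves room for a review step.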

The fourth step is what's new. The same model that writes the root-cause analysis can also restart the pod, roll back the deploy, or open the PR. The operator's job shifts from execution to review.
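A minimal sketch of what "review instead of execute" could mean in code: the model proposes actions, an allowlist decides which tools exist at all, and an approval callback decides which proposals run. The tool names echo the examples above, but the tools themselves are dry-run stand-ins that only record what would have happened. The approval gate is an assumption about how review might be wired in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    target: str
    reason: str


def make_dry_run_tools(journal: list[str]) -> dict[str, Callable[[str], None]]:
    # Stand-in tools: each one appends to a journal instead of touching infrastructure.
    def record(name: str) -> Callable[[str], None]:
        return lambda target: journal.append(f"{name} {target}")

    return {name: record(name) for name in ("restart_pod", "rollback_deploy", "open_pr")}


def execute_reviewed(
    actions: list[ProposedAction],
    tools: dict[str, Callable[[str], None]],
    approve: Callable[[ProposedAction], bool],
) -> list[ProposedAction]:
    executed = []
    for action in actions:
        tool = tools.get(action.tool)
        if tool is None:
            continue  # the model asked for a tool that is not registered
        if approve(action):  # operator (or policy) review happens here
            tool(action.target)
            executed.append(action)
    return executed
```

Whether `approve` is a human click, a policy check, or both is a decision for each team. The point of the sketch is only that the model's output is a proposal until something approves it.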

What this changes in practice

The alert is no longer the artifact an on-call engineer reviews; the model's finding is. The runbook changes with it, from a static "if X then Y" file to a structured prompt that the stronger model uses to investigate, with the operator as final reviewer.
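For a sense of what a runbook-as-prompt might look like, the sketch below keeps the runbook as structured data and renders it into instructions for the stronger model. The field names and wording are assumptions for illustration, not an established format.

```python
from dataclasses import dataclass, field


@dataclass
class Runbook:
    service: str
    symptom: str
    checks: list[str]
    allowed_actions: list[str] = field(default_factory=list)

    def to_prompt(self, finding: str) -> str:
        lines = [
            f"Service: {self.service}",
            f"Symptom: {self.symptom}",
            f"Finding from the log reader: {finding}",
            "Investigate in this order:",
            *[f"{i}. {check}" for i, check in enumerate(self.checks, start=1)],
            "Actions you may propose: " + (", ".join(self.allowed_actions) or "none"),
            "Return a root-cause analysis for operator review.",
        ]
        return "\n".join(lines)


# Hypothetical runbook entry.
runbook = Runbook(
    service="checkout-api",
    symptom="p99 latency above SLO",
    checks=["Diff logs before and after the last deploy", "Check downstream retry rates"],
    allowed_actions=["rollback_deploy"],
)
print(runbook.to_prompt("retries rising since the last deploy"))
```

The old "if X then Y" knowledge is still there in `checks` and `allowed_actions`. What changes is that it guides an investigation instead of dictating a single response.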

The win isn't detecting more failures. It's explaining the ones we already detect.

The question isn't whether AI belongs in your incident pipeline. It's how much longer your team can afford to keep paging humans to do work the model is ready to do.