A Slack Engineering Triage Agent That Would Actually Help
Every engineering team eventually creates the same Slack channel.
#eng-help
Then it becomes a junk drawer:
- "Can someone check this error?"
- "Who owns billing webhooks?"
- "Is this deploy related?"
- "Can we get a status update?"
- "Is this a bug or expected behavior?"
A Slack agent sounds perfect here. But the naive version is just another noisy bot.
The useful version is a workflow.
What the agent should own
The agent should not be the final authority on everything.
It should own the first 10 minutes of triage:
- Classify the request.
- Find the likely system or owner.
- Gather the obvious evidence.
- Draft the next action.
- Ask for approval before anything risky.
- Leave a clean trail for the human who takes over.
That is valuable because most of the pain in Slack support is not finding the hard technical answer. It is gathering the context around it.
The workflow shape
I would split the workflow into four agents.
Intake Router
Reads the Slack thread and decides whether this is support, bug, incident, product, deploy, billing, or access work.
Context Scout
Pulls the boring-but-critical context: related GitHub files, recent deploys, Linear issues, runbooks, dashboards, and errors.
Resolution Planner
Turns the evidence into a short plan: likely cause, confidence, owner, next action, and draft reply.
Policy Auditor
Checks whether the plan is allowed. Anything involving production mutation, billing, customer data, or external messaging needs approval.
This is not fancy. That is why it works.
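The four-agent split can be sketched as four plain functions passing one plan object along. This is a minimal sketch, not a production router. The keyword table stands in for what would likely be a model call in a real Intake Router. The Context Scout is a stub for read-only GitHub, Linear, docs, and observability lookups. The risk-tagging rules in the planner are hypothetical. The four risk tags mirror the categories the Policy Auditor must escalate.

```python
from dataclasses import dataclass, field

# Hypothetical keyword table standing in for the Intake Router's model.
# Order matters: the first matching category wins.
CATEGORY_KEYWORDS = {
    "incident": ["outage", "incident", "paging"],
    "deploy": ["deploy", "rollback", "release"],
    "billing": ["billing", "invoice", "refund"],
    "access": ["access", "permission", "login"],
    "bug": ["error", "exception", "bug"],
    "product": ["expected behavior", "feature"],
}

# The four kinds of risk the Policy Auditor escalates for human approval.
RISKY_TAGS = {"production_mutation", "billing", "customer_data", "external_messaging"}


@dataclass
class Plan:
    category: str
    evidence: list[str]
    next_action: str
    risk_tags: set[str] = field(default_factory=set)
    needs_approval: bool = False


def intake_router(thread_text: str) -> str:
    """Classify the thread; fall back to plain support work."""
    text = thread_text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "support"


def context_scout(category: str) -> list[str]:
    """Stub: a real scout would make read-only calls to GitHub, Linear, docs, and observability."""
    return [f"placeholder evidence for {category}"]


def resolution_planner(category: str, evidence: list[str]) -> Plan:
    """Draft a next action and tag any risk it carries (hypothetical rules)."""
    risk_tags: set[str] = set()
    if category == "billing":
        risk_tags.add("billing")
    if category in {"deploy", "incident"}:
        risk_tags.add("production_mutation")  # e.g. the plan might suggest a rollback
    return Plan(category, evidence, f"Ask the {category} owner to review the evidence.", risk_tags)


def policy_auditor(plan: Plan) -> Plan:
    """Flag the plan for human approval if it touches any risky area."""
    plan.needs_approval = bool(plan.risk_tags & RISKY_TAGS)
    return plan


def triage(thread_text: str) -> Plan:
    category = intake_router(thread_text)
    evidence = context_scout(category)
    return policy_auditor(resolution_planner(category, evidence))
```

For example, `triage("Who owns billing webhooks?")` yields a billing plan with `needs_approval` set. Each stage is a plain function, so you can swap the keyword router for a model call without touching the auditor.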
Tools it needs
The tool list should be small at first:
- Slack for thread context and replies.
- GitHub for code search, PRs, and CI status.
- Linear or Jira for ownership and existing work.
- Notion or docs for runbooks.
- Observability read APIs for errors, traces, and deploy events.
Default to read-only. Make writing a separate permission.
Reading a GitHub file is not the same risk as creating an issue. Reading a Slack thread is not the same risk as posting an incident conclusion.
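One way to make "read-only by default" concrete is a small permission table that the agent runtime checks before every tool call. This sketch is not tied to any real connector API. The tool names, the action names, and the per-tool write grant are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class ToolCall:
    tool: str      # hypothetical tool id, e.g. "github"
    action: str    # hypothetical action name, e.g. "create_issue"
    access: Access


class ToolPolicy:
    """Reads are allowed for known tools; writes need a separate, explicit grant per tool."""

    def __init__(self, tools: set[str]):
        self.tools = set(tools)
        self.write_grants: set[str] = set()

    def grant_write(self, tool: str) -> None:
        if tool not in self.tools:
            raise ValueError(f"unknown tool: {tool}")
        self.write_grants.add(tool)

    def allows(self, call: ToolCall) -> bool:
        if call.tool not in self.tools:
            return False  # unknown tools are denied outright
        if call.access is Access.READ:
            return True
        return call.tool in self.write_grants
```

Write access is granted per tool, so letting the agent open GitHub issues does not also let it post incident conclusions to Slack.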
The Slack reply should be boring
This is what a good reply looks like:
I found two related signals. The billing webhook service deployed 18 minutes before the first error, and the failing requests are all missing customer_id. I found one related PR and one open issue. Suggested next step: ask the billing owner to confirm whether the new parser handles legacy payloads. Confidence: medium.
Notice what is missing:
- No fake certainty.
- No giant explanation.
- No "I have resolved the issue" nonsense.
- No action taken without approval.
The agent is useful because it turns a vague Slack thread into a sourced handoff.
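A reply in this style can be rendered from structured plan data instead of free text. That way every claim carries a source link and a confidence label is always present. This is a sketch under assumptions: the `Signal` and `TriagePlan` types, the phrase blocklist, and the example URL are hypothetical. The function only drafts text. Posting it would be a separate, approved step.

```python
from dataclasses import dataclass

# Phrases that claim a resolution the agent cannot verify (hypothetical blocklist).
CLAIMS_OF_RESOLUTION = ("i have resolved", "i fixed", "issue is resolved")
ALLOWED_CONFIDENCE = {"low", "medium", "high"}


@dataclass
class Signal:
    summary: str
    source_url: str  # every claim links to something a human can check


@dataclass
class TriagePlan:
    signals: list[Signal]
    next_step: str
    confidence: str  # one of ALLOWED_CONFIDENCE


def render_reply(plan: TriagePlan) -> str:
    """Render a short, sourced draft reply; nothing is posted here."""
    if plan.confidence not in ALLOWED_CONFIDENCE:
        raise ValueError(f"unexpected confidence: {plan.confidence!r}")
    if any(phrase in plan.next_step.lower() for phrase in CLAIMS_OF_RESOLUTION):
        raise ValueError("draft claims a resolution; suggest a next step instead")
    if plan.signals:
        lines = [f"I found {len(plan.signals)} related signal(s):"]
        lines += [f"- {s.summary} ({s.source_url})" for s in plan.signals]
        confidence = plan.confidence
    else:
        # No evidence: say so instead of guessing, and force confidence to low.
        lines = ["I did not find related signals yet."]
        confidence = "low"
    lines.append(f"Suggested next step: {plan.next_step}")
    lines.append(f"Confidence: {confidence}.")
    return "\n".join(lines)
```

The renderer refuses to produce a reply with no confidence label or with a claimed fix. That keeps the "boring" format from eroding as the prompts behind the planner change.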
What to measure
Do not measure "messages sent."
Measure:
- Correct request classification.
- Correct owner routing.
- Source link coverage.
- Human correction rate.
- Time to useful first response.
- Unsafe action rejection.
An agent that sends 500 Slack messages is not a win. An agent that saves 10 minutes on 40 internal requests a week is.
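These metrics are cheap to compute if each triaged thread leaves a small record once a human takes over. Below is a sketch that assumes a hypothetical `TriageRecord`, whose ground-truth fields (actual category, actual owner, whether the human corrected the agent) are filled in by that human.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TriageRecord:
    predicted_category: str
    actual_category: str          # labeled by the human who took over
    predicted_owner: str
    actual_owner: str
    claims: int                   # factual claims in the agent's reply
    sourced_claims: int           # claims that carried a source link
    human_corrected: bool
    minutes_to_first_useful_reply: float
    unsafe_actions_proposed: int
    unsafe_actions_blocked: int


def triage_metrics(records: list[TriageRecord]) -> dict[str, float]:
    n = len(records)
    if n == 0:
        raise ValueError("no records")
    total_claims = sum(r.claims for r in records)
    proposed = sum(r.unsafe_actions_proposed for r in records)
    return {
        "classification_accuracy": sum(r.predicted_category == r.actual_category for r in records) / n,
        "routing_accuracy": sum(r.predicted_owner == r.actual_owner for r in records) / n,
        # With no claims at all there is nothing unsourced, so coverage is 1.0.
        "source_link_coverage": sum(r.sourced_claims for r in records) / total_claims if total_claims else 1.0,
        "human_correction_rate": sum(r.human_corrected for r in records) / n,
        "median_minutes_to_useful_reply": median(r.minutes_to_first_useful_reply for r in records),
        # Share of proposed unsafe actions that the approval gate stopped.
        "unsafe_action_rejection_rate": sum(r.unsafe_actions_blocked for r in records) / proposed if proposed else 1.0,
    }
```

Using the median rather than the mean for response time keeps one slow weekend thread from hiding an otherwise fast first response.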
Build it in Codelit
Try this in Agent Workflow mode:
Build a Slack engineering triage agent called Hermes. It should classify engineering requests, gather GitHub, Linear, docs, and observability context, draft a response, route owners, and require human approval before risky actions.
Open the Hermes workflow template
Start small. Let it triage. Make it earn write access later.