AI agents, agentic workflow, architecture, LLM, tool use, production

Agentic Workflow Builder: Design AI Agents Before You Wire Tools

May 21, 2026 · 7 min read · By Codelit Team

Agentic Workflow Builder

Most teams start building agents at the prompt layer:

"You are a helpful assistant. Use Slack, GitHub, and Notion..."

That is not enough. A real agent is not just a prompt with tools attached. It is an operating system for work: who can trigger it, what context it can read, which actions require approval, which model handles each task, what gets logged, and how the team knows it is safe to run.

Codelit now has a third creation mode for that: Agent Workflow.

Architecture answers: How is the system shaped?

Product Board answers: What should we build?

Agent Workflow answers: How should autonomous work happen?

The Agent Workflow Template

Use this structure before you connect a single production account.

{
  "title": "Slack Engineering Triage Agent",
  "goal": "Resolve or route routine engineering requests with auditable evidence",
  "triggers": ["Slack mention", "incident keyword", "scheduled digest"],
  "agents": ["Intake Router", "Context Scout", "Resolution Planner", "Policy Auditor"],
  "tools": ["Slack", "GitHub", "Linear", "Notion", "Observability API"],
  "modelRoutes": ["classification", "context synthesis", "policy review", "final response"],
  "guardrails": ["read-only default", "human approval for writes", "audit every tool call"],
  "evaluations": ["routing accuracy", "grounded response rate", "unsafe action rejection"]
}

The important part is not the JSON. It is the discipline behind it.
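One way to make that discipline checkable is to refuse a draft that leaves any part of the template empty. The sketch below is only an illustration: the field names come from the template above, while the `validate_workflow` helper and its rules are assumptions, not part of Codelit.

```python
import json

# Field names are taken from the template; the validation rules are assumptions.
REQUIRED_TEXT_FIELDS = ["title", "goal"]
REQUIRED_LIST_FIELDS = [
    "triggers", "agents", "tools", "modelRoutes", "guardrails", "evaluations",
]


def validate_workflow(spec: dict) -> list[str]:
    """Return human-readable problems; an empty list means the draft passes."""
    problems = []
    for name in REQUIRED_TEXT_FIELDS:
        value = spec.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name}: must be a non-empty string")
    for name in REQUIRED_LIST_FIELDS:
        value = spec.get(name)
        if not isinstance(value, list) or not value:
            problems.append(f"{name}: must be a non-empty list")
    return problems


# A draft that forgot its guardrails.
draft = json.loads("""
{
  "title": "Slack Engineering Triage Agent",
  "goal": "Resolve or route routine engineering requests with auditable evidence",
  "triggers": ["Slack mention"],
  "agents": ["Intake Router"],
  "tools": ["Slack"],
  "modelRoutes": ["classification"],
  "guardrails": [],
  "evaluations": ["routing accuracy"]
}
""")

print(validate_workflow(draft))  # ['guardrails: must be a non-empty list']
```

A check like this could run before any connector is authorized, so an incomplete plan never reaches a production account.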

Add Skills As The Memory Of The Workflow

Skills are where agent design starts to become reusable. Instead of stuffing every instruction into one giant prompt, package specialized behavior into small instruction packs:

A workflow skill for the task contract, approval policy, and output format.
A domain skill for incident triage, support replies, PR review, research, or browser operations.
A policy skill for permissions, sensitive data, billing, deploys, and external communication.
An eval skill for red-team scenarios and release gates.

Each skill should have activation rules. The agent should load the full instructions, references, and scripts only when the task actually needs them.

That keeps context clean and makes expertise portable across agents.
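As a rough sketch of activation rules, each skill can pair a cheap matching check with a loader that only runs when the check passes. The `Skill` class, the skill names, and the keyword rules here are hypothetical; a real system might activate on classifier output rather than substrings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    # Activation rule: a cheap check run against the task description.
    activates_on: Callable[[str], bool]
    # Loader runs only when the rule matches, so unused skills add no context.
    load_instructions: Callable[[], str]


def select_skills(task: str, skills: list[Skill]) -> dict[str, str]:
    """Load full instructions only for skills whose activation rule matches."""
    return {s.name: s.load_instructions() for s in skills if s.activates_on(task)}


# Hypothetical skills; the instruction text stands in for real files.
SKILLS = [
    Skill("incident-triage", lambda t: "incident" in t.lower(),
          lambda: "Incident triage instructions"),
    Skill("pr-review", lambda t: "pull request" in t.lower(),
          lambda: "PR review rubric"),
    Skill("policy", lambda t: True,
          lambda: "Approval and data-handling rules"),
]

loaded = select_skills("Incident: checkout latency spike", SKILLS)
print(sorted(loaded))  # ['incident-triage', 'policy']
```

The policy skill activates on every task in this sketch, which reflects one possible choice: policy is always in context, domain expertise is loaded on demand.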

1. Start With The Operating Goal

Weak goal:

"Make a Slack agent."

Production goal:

"Resolve or route 80% of routine engineering requests in Slack while preserving source links, approval gates, and audit logs."

That goal tells you what the agent is allowed to optimize for. It also tells you what not to build. A triage agent should not become a deployment agent just because it has GitHub access.

2. Split One Agent Into Specialist Agents

Single-agent systems become vague quickly, because one model has to classify, retrieve, reason, write, check policy, and decide whether to act, all from the same prompt.

Split the work:

Intake Router classifies the request and priority.
Context Scout gathers docs, issues, code, logs, and source links.
Resolution Planner turns evidence into a plan or response.
Policy Auditor decides whether the next action is allowed.

This gives you cleaner prompts, easier evals, better logs, and a safer escalation model.
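The split can be sketched as a pipeline of small stages that each do one job and leave a trace entry. This is a minimal illustration with stub logic in place of model calls; the `Ticket` shape, the keyword rule, and the `runbook://` link are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    text: str
    category: str = ""
    evidence: list[str] = field(default_factory=list)
    plan: str = ""
    approved: bool = False
    trace: list[str] = field(default_factory=list)  # which stage ran, in order


def intake_router(t: Ticket) -> Ticket:
    # Stub classifier; a real router would call a fast model.
    t.category = "incident" if "down" in t.text.lower() else "request"
    t.trace.append("intake_router")
    return t


def context_scout(t: Ticket) -> Ticket:
    t.evidence.append(f"runbook://{t.category}")  # hypothetical source link
    t.trace.append("context_scout")
    return t


def resolution_planner(t: Ticket) -> Ticket:
    t.plan = f"Reply with {len(t.evidence)} source link(s)"
    t.trace.append("resolution_planner")
    return t


def policy_auditor(t: Ticket) -> Ticket:
    # Only read-only replies pass here; anything else would go to a human.
    t.approved = t.plan.startswith("Reply")
    t.trace.append("policy_auditor")
    return t


PIPELINE = [intake_router, context_scout, resolution_planner, policy_auditor]


def run(text: str) -> Ticket:
    ticket = Ticket(text)
    for stage in PIPELINE:
        ticket = stage(ticket)
    return ticket


result = run("API is down for the EU region")
print(result.category, result.approved, result.trace)
```

Because each stage has one input and one output, each can be evaluated and logged on its own, which is where the cleaner evals and logs come from.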

3. Treat Tools As Permissions, Not Decorations

Tool access is where agent design gets real. Every connector should define:

What the agent can read.
What the agent can write.
Whether the auth comes from OAuth, API key, service account, or user session.
Whether the action is low, medium, or high risk.
Which actions require human approval.

For example, a GitHub tool might allow code search and CI inspection by default, but require approval before posting a review, opening an issue, or pushing a commit.
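The GitHub example could be written down as a small permission table. The sketch below assumes hypothetical action names and a deny-by-default rule for anything undeclared; it does not call the GitHub API.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Action:
    name: str
    writes: bool
    risk: Risk


@dataclass(frozen=True)
class Connector:
    name: str
    auth: str  # "oauth", "api_key", "service_account", or "user_session"
    actions: tuple[Action, ...]

    def decide(self, action_name: str) -> str:
        """Return 'allow', 'needs_approval', or 'deny' for a requested action."""
        for action in self.actions:
            if action.name == action_name:
                if action.writes or action.risk is not Risk.LOW:
                    return "needs_approval"
                return "allow"
        return "deny"  # assumption: anything not declared is not permitted


# Hypothetical action names mirroring the GitHub example in the text.
github = Connector(
    name="GitHub",
    auth="oauth",
    actions=(
        Action("search_code", writes=False, risk=Risk.LOW),
        Action("inspect_ci", writes=False, risk=Risk.LOW),
        Action("post_review", writes=True, risk=Risk.MEDIUM),
        Action("push_commit", writes=True, risk=Risk.HIGH),
    ),
)

for name in ("search_code", "push_commit", "delete_repo"):
    print(name, "->", github.decide(name))
```

Keeping the table as data rather than prompt text means the approval rule is enforced by code, whatever the model decides to attempt.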

4. Put MCP Between Agents And Systems

The Model Context Protocol (MCP) is the right boundary for many agent workflows because it separates the agent from the systems it wants to touch.

Model your MCP layer explicitly:

Tools for actions like pulls.diff, browser.snapshot, tests.run, or customer.read.
Resources for docs, runbooks, schemas, traces, source ledgers, and code roots.
Prompts for reusable workflows like incident summaries, support drafts, and review rubrics.
Roots for approved filesystem or repository boundaries.
Sampling only when a server should request model work through the client with a human-visible approval path.

Do not hide this inside a vague "connect tools" box. Make each server, capability, exposed action, auth mode, and approval policy visible.
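One way to keep the layer visible is a plain manifest that a review script can lint. The sketch below is a planning artifact only: it does not use any MCP SDK, and the server names, policy values (`"auto"`, `"human"`), and root URI are assumptions.

```python
from dataclasses import dataclass, field

VALID_POLICIES = {"auto", "human"}  # assumed policy vocabulary


@dataclass
class McpServerEntry:
    server: str
    auth_mode: str
    tools: dict[str, str] = field(default_factory=dict)  # tool name -> approval policy
    resources: list[str] = field(default_factory=list)
    prompts: list[str] = field(default_factory=list)
    roots: list[str] = field(default_factory=list)
    sampling: bool = False


def undeclared_policies(entries: list[McpServerEntry]) -> list[str]:
    """List every exposed tool that lacks a recognized approval policy."""
    return [
        f"{entry.server}:{tool}"
        for entry in entries
        for tool, policy in entry.tools.items()
        if policy not in VALID_POLICIES
    ]


MANIFEST = [
    McpServerEntry(
        server="github-mcp",  # hypothetical server name
        auth_mode="oauth",
        tools={"pulls.diff": "auto", "tests.run": "human"},
        resources=["runbooks", "schemas"],
        prompts=["review rubric"],
        roots=["repo://platform"],  # hypothetical root
    ),
    McpServerEntry(
        server="crm-mcp",  # hypothetical server name
        auth_mode="service_account",
        tools={"customer.read": ""},  # policy left blank on purpose
    ),
]

print(undeclared_policies(MANIFEST))  # ['crm-mcp:customer.read']
```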

5. Route Models By Task

Different tasks need different models. Match model cost and capability to the difficulty and risk of each step.

Use a cheap, fast model for:

Intent classification
Tool routing
Basic extraction

Use a stronger reasoning model for:

Root-cause analysis
Multi-source synthesis
Code review
Planning risky actions

Use a policy-focused route for:

Approval checks
Data access boundaries
Irreversible action review

This is also where bring-your-own-key (BYOK) matters. If a user brings OpenAI, Anthropic, Gemini, or OpenRouter keys, the workflow should respect those provider choices instead of hiding model decisions in a generic chat call.
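A routing table makes those decisions explicit. In this sketch the task names follow the lists above, while the tier labels, the `route` function, and the placeholder model IDs are assumptions; no provider SDK is called.

```python
# Task -> tier, following the three groups described above.
TASK_TIERS = {
    "intent_classification": "fast",
    "tool_routing": "fast",
    "basic_extraction": "fast",
    "root_cause_analysis": "reasoning",
    "multi_source_synthesis": "reasoning",
    "code_review": "reasoning",
    "risky_action_planning": "reasoning",
    "approval_check": "policy",
    "data_access_boundary": "policy",
    "irreversible_action_review": "policy",
}


def route(task: str, byok: dict[str, tuple[str, str]]) -> tuple[str, str]:
    """Return the (provider, model) the user configured for this task's tier."""
    if task not in TASK_TIERS:
        # Fail loudly rather than silently falling back to a default model.
        raise ValueError(f"no route declared for task: {task}")
    tier = TASK_TIERS[task]
    if tier not in byok:
        raise ValueError(f"no model configured for tier: {tier}")
    return byok[tier]


# Hypothetical BYOK configuration; the model IDs are placeholders.
user_config = {
    "fast": ("openrouter", "fast-model-id"),
    "reasoning": ("anthropic", "reasoning-model-id"),
    "policy": ("openai", "policy-model-id"),
}

print(route("code_review", user_config))  # ('anthropic', 'reasoning-model-id')
```

Raising on an unknown task is a deliberate choice in this sketch: a missing route is a planning gap, not something to paper over with a generic chat call.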

6. Add Guardrails Before The Demo

Good guardrails are specific enough to enforce:

Default every connector to read-only.
Require a human click before writes, billing changes, deploy actions, or external messages.
Never quote secrets, tokens, or private customer identifiers.
Persist an audit record for every tool call.
Mark unsupported claims as inference.

Guardrails should be visible in the workflow, not buried in a prompt nobody reads after launch.
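Three of these guardrails can be sketched as a single wrapper around tool calls: block unapproved writes, redact secret-looking values, and record every attempt. The regex, the in-memory `AUDIT_LOG`, and the `guarded_call` signature are assumptions; a production system would need a far more thorough secret detector and durable storage.

```python
import re
from typing import Callable, Optional

# Deliberately simple pattern: "token=...", "secret: ...", "api_key=...".
SECRET_PATTERN = re.compile(
    r"\b(token|secret|api[_-]?key)\s*[=:]\s*\S+", re.IGNORECASE
)
AUDIT_LOG: list[dict] = []  # in-memory stand-in for a persistent audit store


def redact(text: str) -> str:
    """Replace secret-looking values while keeping the key name visible."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def guarded_call(
    tool: str,
    is_write: bool,
    run: Callable[[], str],
    approved_by: Optional[str] = None,
) -> str:
    """Run a tool call under read-only default, redaction, and audit logging."""
    if is_write and approved_by is None:
        AUDIT_LOG.append({"tool": tool, "outcome": "blocked", "approved_by": None})
        return "blocked: human approval required"
    output = redact(run())
    AUDIT_LOG.append({"tool": tool, "outcome": "ok", "approved_by": approved_by})
    return output


print(guarded_call("ci.logs", False, lambda: "deploy failed, token=abc123 expired"))
print(guarded_call("slack.post", True, lambda: "posted"))
print(guarded_call("slack.post", True, lambda: "posted", approved_by="alice"))
print(len(AUDIT_LOG))  # 3
```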

7. Define Evals And Harnesses That Match The Job

An agent workflow needs tests that match the work.

For a Slack engineering triage agent:

Routing accuracy on labeled Slack threads.
Grounded response rate with source links.
Unsafe action rejection on red-team cases.
Time to useful answer.
Human override rate.

For a GitHub PR review agent:

Useful finding rate.
False blocking rate.
Critical path coverage for auth, billing, deploy, and migrations.
Secret redaction.
CI failure explanation quality.

Do not wait until production to decide what "good" means.
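To show what deciding up front can look like, here is a small scoring sketch for three of the triage metrics. The `Case` fields and the sample data are invented for illustration; real cases would come from labeled Slack threads and red-team scenarios.

```python
from dataclasses import dataclass


@dataclass
class Case:
    expected_route: str
    actual_route: str
    cited_sources: int
    is_red_team: bool = False
    action_rejected: bool = False


def rate(hits: int, total: int) -> float:
    """Fraction of hits, or 0.0 when there are no cases to score."""
    return hits / total if total else 0.0


def score(cases: list[Case]) -> dict[str, float]:
    normal = [c for c in cases if not c.is_red_team]
    red = [c for c in cases if c.is_red_team]
    return {
        "routing_accuracy": rate(
            sum(c.expected_route == c.actual_route for c in normal), len(normal)
        ),
        "grounded_response_rate": rate(
            sum(c.cited_sources > 0 for c in normal), len(normal)
        ),
        "unsafe_action_rejection": rate(
            sum(c.action_rejected for c in red), len(red)
        ),
    }


# Invented sample data, not real results.
cases = [
    Case("incident", "incident", cited_sources=2),
    Case("access", "access", cited_sources=0),
    Case("incident", "request", cited_sources=1),
    Case("request", "request", cited_sources=0),
    Case("deploy", "deploy", cited_sources=0, is_red_team=True, action_rejected=True),
    Case("billing", "billing", cited_sources=0, is_red_team=True, action_rejected=True),
]

print(score(cases))
```

On this sample the routing accuracy is 0.75, the grounded response rate is 0.5, and the unsafe action rejection rate is 1.0.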

Then add harnesses that make the checks operational:

Sandbox harness for browser-operation workflows.
Replay harness for incident and support conversations.
Prompt-regression harness for PR review and research workflows.
Release-gate harness for agent eval labs.
Approval harness for high-risk tool calls.

The harness is how a policy becomes something the system can enforce.
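A release-gate harness can be as small as a threshold comparison that blocks a rollout. The thresholds and metric values below are assumptions chosen for illustration, not recommended targets.

```python
# Assumed minimums; each team would set its own.
THRESHOLDS = {
    "routing_accuracy": 0.90,
    "grounded_response_rate": 0.95,
    "unsafe_action_rejection": 1.00,
}


def release_gate(
    metrics: dict[str, float], thresholds: dict[str, float]
) -> tuple[bool, list[str]]:
    """Return (passed, failures); a missing metric counts as 0.0 and fails."""
    failures = [
        f"{name}: {metrics.get(name, 0.0):.2f} < {minimum:.2f}"
        for name, minimum in thresholds.items()
        if metrics.get(name, 0.0) < minimum
    ]
    return (not failures, failures)


# Invented metrics from a hypothetical eval run.
candidate = {
    "routing_accuracy": 0.93,
    "grounded_response_rate": 0.91,
    "unsafe_action_rejection": 1.0,
}

passed, failures = release_gate(candidate, THRESHOLDS)
print(passed, failures)  # False ['grounded_response_rate: 0.91 < 0.95']
```

Treating a missing metric as a failure keeps the gate from passing simply because an eval was never run.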

8. Generate The Architecture Last

Once the workflow is clear, the architecture is much easier to derive, because most components follow from decisions you have already made:

Webhook receiver for Slack, GitHub, Linear, or app events.
Agent orchestrator.
Tool execution layer.
Queue for long-running work.
Memory or context store.
Connector token vault.
Audit log.
Evaluation pipeline.
Human approval UI.
Observability and alerting.

That is why Agent Workflow fits next to Architecture and Product Board in Codelit. It is not a replacement for them. It is the missing planning layer for autonomous work.

Try It In Codelit

Start with one of these:

"Build a Hermes-style Slack agent for engineering triage, incidents, and internal platform requests."
"Build an OpenClaw browser operations agent that completes approved tasks across internal web apps."
"Build a PatchPilot issue-to-PR coding agent for scoped Linear and GitHub tickets."
"Build a GitHub pull request review agent that checks architecture risk, tests, and security before merge."
"Build an internal operations agent for an AI startup that connects Slack, GitHub, Stripe, and Notion."
"Build an EvalForge agent evaluation lab for red-team scenarios, guardrails, and release gates."

Then use the generated workflow to produce the product board or architecture.

The shape is simple: design the work, then design the system that runs the work.

