Agent Guardrails

Safety rules, anomaly detection, DLP, onboarding enforcement, and human-in-the-loop approvals for this agent.

Overview

For Everyone

Guardrails are the safety net for your AI agents. They are the rules that keep agents from going rogue: they detect anomalies, block data leaks, enforce policies, and require human approval for sensitive actions. Think of guardrails as a security guard, compliance officer, and safety inspector combined, all watching your agent 24/7.

For Developers

The Guardrails section aggregates eight API endpoints on mount:

Emergency controls: POST /guardrails/pause/:agentId, POST /guardrails/resume/:agentId, POST /guardrails/kill/:agentId.
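
As a rough sketch of how a client might address the three emergency endpoints above, the helper below builds the method and path for each action. Only the three route shapes come from this page; the EmergencyAction type, the emergencyRequest helper, and the omission of any base URL or authentication header are assumptions made for illustration.

```typescript
// Hypothetical helper: only the three route shapes are taken from the docs above.
type EmergencyAction = "pause" | "resume" | "kill";

interface RequestDescriptor {
  method: "POST";
  path: string;
}

// Build the request for an emergency control. The agent ID is URL-encoded
// so that unusual characters cannot alter the route.
function emergencyRequest(action: EmergencyAction, agentId: string): RequestDescriptor {
  if (agentId.length === 0) {
    throw new Error("agentId must not be empty");
  }
  return {
    method: "POST",
    path: `/guardrails/${action}/${encodeURIComponent(agentId)}`,
  };
}
```

The returned descriptor can be handed to whatever HTTP client the deployment already uses, with the base URL and credentials added there.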

How It Works

  1. Status Check — The status bar shows whether the agent is active or paused, with emergency controls (Pause, Resume, Kill).
  2. Rule Evaluation — Rules are evaluated at runtime against agent behavior. Rules can be agent-specific or global (org-wide).
  3. Intervention — When a rule triggers, an intervention is recorded. Depending on the rule's action, the agent may be warned, paused, or killed.
  4. DLP Scanning — Outbound content is scanned for sensitive data. Violations are logged with severity and action taken.
  5. Approval Flow — Some actions require human approval before proceeding. Pending approvals appear in the Approvals tab.
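
Steps 2 and 3 can be pictured as a small evaluation loop. The sketch below is illustrative only: the Rule and Intervention shapes, the field names, the "observed value at or above threshold" trigger condition, and the "most severe action wins" policy are all assumptions, not the product's actual data model.

```typescript
type RuleAction = "warn" | "pause" | "kill";

// Hypothetical rule shape for illustration.
interface Rule {
  id: string;
  enabled: boolean;
  threshold: number; // assumed: the rule fires when the metric reaches this value
  action: RuleAction;
}

interface Intervention {
  ruleId: string;
  action: RuleAction;
  observed: number;
}

// Evaluate every enabled rule against one observed metric value and
// record an intervention for each rule that triggers.
function evaluateRules(rules: Rule[], observed: number): Intervention[] {
  return rules
    .filter((rule) => rule.enabled && observed >= rule.threshold)
    .map((rule) => ({ ruleId: rule.id, action: rule.action, observed }));
}

// Pick the most severe action among the interventions (kill > pause > warn).
// Returns null when nothing triggered.
function strongestAction(interventions: Intervention[]): RuleAction | null {
  const order: RuleAction[] = ["warn", "pause", "kill"];
  let best: RuleAction | null = null;
  for (const item of interventions) {
    if (best === null || order.indexOf(item.action) > order.indexOf(best)) {
      best = item.action;
    }
  }
  return best;
}
```

Keeping the recording step (evaluateRules) separate from the decision step (strongestAction) mirrors the split described above between logging an intervention and acting on it.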

Key Concepts

Status Bar & Emergency Controls

The top status bar shows at a glance whether the agent is active or paused and provides the emergency controls: Pause, Resume, and Kill.

Rules Tab

Rules are organized by category, each with its own icon and color. The rules relevant to an agent are its agent-specific rules plus all global rules (those with no agentIds filter).
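
The selection of agent-relevant rules can be sketched as a simple filter. The agentIds field name comes from this page; the ScopedRule shape, the rulesForAgent helper, and treating an empty array the same as a missing filter are assumptions.

```typescript
// Hypothetical rule shape: only the agentIds field name appears in the docs.
interface ScopedRule {
  id: string;
  agentIds?: string[]; // assumed: absent or empty means the rule is global
}

// Return the rules that apply to one agent: rules that list the agent
// explicitly, plus global rules that carry no agentIds filter.
function rulesForAgent(rules: ScopedRule[], agentId: string): ScopedRule[] {
  return rules.filter((rule) => {
    const ids = rule.agentIds || [];
    return ids.length === 0 || ids.indexOf(agentId) !== -1;
  });
}
```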

Rule Categories

Category             Rule Types
Anomaly Detection    Error Rate, Cost Velocity, Volume Spike, Off-Hours Activity, Session Anomaly
Policy Compliance    Policy Violation, Escalation Failure
Communication        Tone Violation, Keyword Detection
Memory               Memory Flood
Onboarding           Onboarding Bypass
Security             Data Leak Attempt, Repeated Error, Prompt Injection

Rule Configuration

Each rule has:

Interventions Tab

A log of every guardrail trigger. Each entry shows:

Tip: Frequent interventions from the same rule may mean the agent's instructions conflict with the guardrail. Review and adjust either the agent's behavior or the rule.

DLP Tab

Data Loss Prevention violations occur when the agent tries to share sensitive data (PII, credentials, proprietary information). Each violation shows the triggering rule, severity, matched content snippet (truncated to 100 chars), and action taken (detected, redacted, blocked).
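
To make the violation record concrete, here is a minimal sketch of how an entry with a truncated snippet might be assembled. The three action values and the 100-character limit come from the description above; the DlpViolation shape, its field names, and the choice to cut without appending an ellipsis are assumptions.

```typescript
type DlpAction = "detected" | "redacted" | "blocked";

// Hypothetical record shape mirroring the fields listed above.
interface DlpViolation {
  rule: string;
  severity: string;
  snippet: string;
  action: DlpAction;
}

const SNIPPET_LIMIT = 100;

// Cut the matched content to at most `limit` UTF-16 code units.
// No ellipsis is appended; whether the product adds one is not documented here.
function truncateSnippet(matched: string, limit: number = SNIPPET_LIMIT): string {
  return matched.length <= limit ? matched : matched.slice(0, limit);
}

// Assemble a violation entry with the snippet already truncated.
function makeViolation(rule: string, severity: string, matched: string, action: DlpAction): DlpViolation {
  return { rule, severity, snippet: truncateSnippet(matched), action };
}
```

Because the cut is by code units, a snippet ending in an emoji or other surrogate pair could be split; a production version would likely need to handle that.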

Onboarding Tab

Tracks whether the agent has completed required onboarding steps:

Approvals Tab

The human-in-the-loop safety system. Two sections:

Best Practices

Troubleshooting

Agent keeps getting paused

Check the Interventions tab to identify which rule is triggering. Either the rule is too sensitive (raise the threshold or adjust the window) or the agent's behavior needs adjustment.

Rule not triggering

Verify: (1) the rule is enabled (toggle is on), (2) it targets this agent or is global, (3) the threshold hasn't been set too high, (4) the cooldown hasn't silenced recent triggers.
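
The four checks above can be expressed as a small diagnostic function. This is a sketch under assumptions: the RuleState shape, the millisecond cooldown, the lastTriggeredAt timestamp, and the "fires at or above threshold" semantics are hypothetical and may not match the real rule configuration.

```typescript
// Hypothetical snapshot of a rule's configuration and recent state.
interface RuleState {
  enabled: boolean;
  agentIds?: string[];      // assumed: absent or empty means global
  threshold: number;        // assumed: rule fires at or above this value
  cooldownMs: number;       // assumed unit: milliseconds
  lastTriggeredAt?: number; // epoch milliseconds of the last trigger, if any
}

// Return a human-readable reason for each check that would stop the rule firing.
// An empty array means none of the four checks explains the silence.
function whyNotTriggering(rule: RuleState, agentId: string, observed: number, now: number): string[] {
  const reasons: string[] = [];
  if (!rule.enabled) {
    reasons.push("rule is disabled");
  }
  const ids = rule.agentIds || [];
  if (ids.length > 0 && ids.indexOf(agentId) === -1) {
    reasons.push("rule does not target this agent");
  }
  if (observed < rule.threshold) {
    reasons.push("observed value is below the threshold");
  }
  if (rule.lastTriggeredAt !== undefined && now - rule.lastTriggeredAt < rule.cooldownMs) {
    reasons.push("rule is still in cooldown");
  }
  return reasons;
}
```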

DLP violations for legitimate content

DLP rules may be too broad. Review the matched content to verify it's actually sensitive. Adjust DLP patterns at the organization level or create agent-specific overrides in the Security tab.

Kill agent didn't work

Kill sends a termination signal. If the agent has external processes, they may continue running. Check the deployment system (PM2, Docker, etc.) to verify all processes stopped.

AgenticMail Enterprise Documentation