Agent Guardrails
Safety rules, anomaly detection, DLP, onboarding enforcement, and human-in-the-loop approvals for this agent.
Overview
For Everyone
Guardrails are the safety net for your AI agents. They are the rules that keep agents from going rogue: they detect anomalies, block data leaks, enforce policies, and require human approval for sensitive actions. Think of guardrails as a security guard, compliance officer, and safety inspector combined, all watching your agent 24/7.
For Developers
The Guardrails section aggregates eight API endpoints on mount:
GET /guardrails/status/:agentId — Pause state and intervention count.
GET /guardrails/rules?orgId=… — All org rules (filtered client-side to agent-relevant ones).
GET /guardrails/interventions?agentId=… — Intervention history.
GET /dlp/violations?agentId=… — Data Loss Prevention violations.
GET /onboarding/status/:agentId — Onboarding completion status.
GET /onboarding/progress/:agentId — Per-policy acknowledgment progress.
GET /approvals/pending?agentId=… — Awaiting human approval.
GET /approvals/history?agentId=… — Past approval decisions.
Emergency controls: POST /guardrails/pause/:agentId, POST /guardrails/resume/:agentId, POST /guardrails/kill/:agentId.
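As a rough illustration of the mount-time aggregation, the sketch below builds the eight GET URLs and loads them in parallel. The paths come from the list above; `baseUrl`, the function names, the result keys, and the injected `fetchJson` helper are assumptions, as is reading the `…` in each query string as the ID value alone.

```typescript
// Hypothetical helper: any function that fetches a URL and returns parsed JSON.
type FetchJson = (url: string) => Promise<unknown>;

// Builds the eight GET URLs listed above. The keys are illustrative names only.
function guardrailUrls(baseUrl: string, agentId: string, orgId: string): Record<string, string> {
  const a = encodeURIComponent(agentId);
  const o = encodeURIComponent(orgId);
  return {
    status: `${baseUrl}/guardrails/status/${a}`,
    rules: `${baseUrl}/guardrails/rules?orgId=${o}`,
    interventions: `${baseUrl}/guardrails/interventions?agentId=${a}`,
    dlpViolations: `${baseUrl}/dlp/violations?agentId=${a}`,
    onboardingStatus: `${baseUrl}/onboarding/status/${a}`,
    onboardingProgress: `${baseUrl}/onboarding/progress/${a}`,
    approvalsPending: `${baseUrl}/approvals/pending?agentId=${a}`,
    approvalsHistory: `${baseUrl}/approvals/history?agentId=${a}`,
  };
}

// Requests all endpoints in parallel and returns the responses under the same keys.
async function loadGuardrails(
  urls: Record<string, string>,
  fetchJson: FetchJson
): Promise<Record<string, unknown>> {
  const keys = Object.keys(urls);
  const results = await Promise.all(keys.map((k) => fetchJson(urls[k])));
  const out: Record<string, unknown> = {};
  keys.forEach((k, i) => {
    out[k] = results[i];
  });
  return out;
}
```

Injecting `fetchJson` keeps the sketch independent of any particular HTTP client or authentication scheme, neither of which this page specifies.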
How It Works
- Status Check — The status bar shows whether the agent is active or paused, with emergency controls (Pause, Resume, Kill).
- Rule Evaluation — Rules are evaluated at runtime against agent behavior. Rules can be agent-specific or global (org-wide).
- Intervention — When a rule triggers, an intervention is recorded. Depending on the rule's action, the agent may be warned, paused, or killed.
- DLP Scanning — Outbound content is scanned for sensitive data. Violations are logged with severity and action taken.
- Approval Flow — Some actions require human approval before proceeding. Pending approvals appear in the Approvals tab.
Key Concepts
Status Bar & Emergency Controls
The top status bar provides at-a-glance safety information and emergency actions:
- Active — Agent is operating normally.
- Paused — Agent is suspended. All processing is halted.
- Intervention Count — Total number of guardrail triggers.
- Active Rules — Shows "X/Y rules active" (enabled vs total).
Emergency controls:
- Pause — Immediately suspends the agent. It can be resumed later.
- Resume — Restarts a paused agent.
- Kill — Terminates all running processes immediately. Requires confirmation. Use as a last resort.
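The emergency controls map onto the three POST endpoints listed earlier. A minimal sketch, assuming the confirmation for Kill is enforced on the client before the request is built; `emergencyRequest` and `activeRulesLabel` are hypothetical names, and the rule shape is an assumption.

```typescript
type EmergencyAction = "pause" | "resume" | "kill";

interface EmergencyRequest {
  method: "POST";
  path: string;
}

// Builds the request for an emergency control. Kill is refused unless the
// caller passes confirmed = true (for example after a confirmation dialog).
function emergencyRequest(
  action: EmergencyAction,
  agentId: string,
  confirmed = false
): EmergencyRequest {
  if (action === "kill" && !confirmed) {
    throw new Error("Kill requires explicit confirmation");
  }
  return { method: "POST", path: `/guardrails/${action}/${encodeURIComponent(agentId)}` };
}

// Formats the "X/Y rules active" label: enabled rules over total rules.
function activeRulesLabel(rules: { enabled: boolean }[]): string {
  const enabled = rules.filter((r) => r.enabled).length;
  return `${enabled}/${rules.length} rules active`;
}
```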
Rules Tab
Rules are organized by category, each with an icon and distinct color. Agent-relevant rules include both agent-specific rules and global rules (no agentIds filter).
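The client-side filtering described above might look like the following sketch. The `GuardrailRule` shape is an assumption beyond the `agentIds` field named in the text, and treating an empty `agentIds` array as global is also an assumption.

```typescript
// Assumed rule shape: only agentIds is named on this page.
interface GuardrailRule {
  id: string;
  category: string;
  agentIds?: string[];
}

// A rule with no agentIds filter applies org-wide.
function isGlobalRule(rule: GuardrailRule): boolean {
  return !rule.agentIds || rule.agentIds.length === 0;
}

// Keeps global rules plus rules that explicitly target this agent.
function rulesForAgent(rules: GuardrailRule[], agentId: string): GuardrailRule[] {
  return rules.filter(
    (r) => isGlobalRule(r) || (r.agentIds || []).indexOf(agentId) !== -1
  );
}

// Groups rules by category for display in the Rules tab.
function groupByCategory(rules: GuardrailRule[]): Record<string, GuardrailRule[]> {
  const groups: Record<string, GuardrailRule[]> = {};
  for (const r of rules) {
    (groups[r.category] = groups[r.category] || []).push(r);
  }
  return groups;
}
```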
Rule Categories
| Category | Rule Types |
| --- | --- |
| Anomaly Detection | Error Rate, Cost Velocity, Volume Spike, Off-Hours Activity, Session Anomaly |
| Policy Compliance | Policy Violation, Escalation Failure |
| Communication | Tone Violation, Keyword Detection |
| Memory | Memory Flood |
| Onboarding | Onboarding Bypass |
| Security | Data Leak Attempt, Repeated Error, Prompt Injection |
Rule Configuration
Each rule has:
- Name & Description — Human-readable identification.
- Action — What happens when triggered: Alert, Notify, Log, Pause Agent, or Kill Agent.
- Severity — Low, Medium, High, or Critical. Color-coded.
- Threshold & Window — Numeric threshold within a time window (e.g., "10 errors in 60 minutes").
- Keywords & Patterns — For communication and security rules. Comma-separated keywords or regex patterns.
- Cooldown — Minutes between repeated triggers of the same rule.
- Toggle Switch — Enable/disable without deleting.
- Global Badge — Rules without agent-specific targeting show a "Global" badge.
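To make the interaction between threshold, window, and cooldown concrete, here is one plausible reading as a sketch. The field names, the millisecond timestamps, and the choice to trigger at "greater than or equal to" the threshold are all assumptions; the actual evaluator may differ.

```typescript
// Assumed rule fields, named after the configuration options above.
interface ThresholdRule {
  enabled: boolean;
  threshold: number;       // e.g. 10 errors
  windowMinutes: number;   // e.g. within 60 minutes
  cooldownMinutes: number; // minimum gap between repeated triggers
}

const MS_PER_MINUTE = 60000;

// Returns true when the rule should fire at nowMs, given the timestamps of
// matching events and the time the rule last fired (null if it never has).
function shouldTrigger(
  rule: ThresholdRule,
  eventTimesMs: number[],
  nowMs: number,
  lastTriggeredMs: number | null
): boolean {
  if (!rule.enabled) return false;
  // Cooldown: stay silent until enough time has passed since the last trigger.
  if (lastTriggeredMs !== null && nowMs - lastTriggeredMs < rule.cooldownMinutes * MS_PER_MINUTE) {
    return false;
  }
  // Count only events that fall inside the sliding window ending at nowMs.
  const windowStart = nowMs - rule.windowMinutes * MS_PER_MINUTE;
  const recent = eventTimesMs.filter((t) => t > windowStart && t <= nowMs).length;
  return recent >= rule.threshold;
}
```

Under this reading, raising the threshold or shortening the window makes a rule less sensitive, and a longer cooldown reduces repeated triggers without changing what counts as a violation.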
Interventions Tab
A log of every guardrail trigger. Each entry shows:
- Timestamp — When the intervention occurred.
- Type — Block (red), Warn (yellow), or Log (blue).
- Severity — Color-coded severity level.
- Description — What happened.
- Resolution — What action was taken.
Tip: Frequent interventions from the same rule may mean the agent's instructions conflict with the guardrail. Review and adjust either the agent's behavior or the rule.
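One way to spot the pattern described in the tip is to count interventions per rule. This is a sketch only: the `ruleId` field is an assumption (the entry fields listed above do not include it), and `noisyRules` is a hypothetical helper.

```typescript
// Assumed intervention shape; ruleId is hypothetical.
interface InterventionEntry {
  ruleId: string;
  type: "block" | "warn" | "log";
  timestamp: string;
}

// Returns rules that fired at least minCount times, most frequent first.
function noisyRules(
  interventions: InterventionEntry[],
  minCount: number
): { ruleId: string; count: number }[] {
  const counts: Record<string, number> = {};
  for (const entry of interventions) {
    counts[entry.ruleId] = (counts[entry.ruleId] || 0) + 1;
  }
  return Object.keys(counts)
    .map((ruleId) => ({ ruleId: ruleId, count: counts[ruleId] }))
    .filter((e) => e.count >= minCount)
    .sort((x, y) => y.count - x.count);
}
```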
DLP Tab
Data Loss Prevention violations are recorded when the agent tries to share sensitive data (PII, credentials, proprietary information). Each violation shows the triggering rule, severity, matched content snippet (truncated to 100 characters), and action taken (detected, redacted, or blocked).
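A minimal sketch of pattern-based DLP scanning with the 100-character snippet limit. The patterns, field names, and `scanOutbound` function are illustrative assumptions and not the product's actual DLP rules.

```typescript
type DlpAction = "detected" | "redacted" | "blocked";

// Assumed pattern and violation shapes, for illustration only.
interface DlpPattern {
  rule: string;
  regex: RegExp;
  severity: "low" | "medium" | "high" | "critical";
  action: DlpAction;
}

interface DlpViolation {
  rule: string;
  severity: string;
  snippet: string;
  action: DlpAction;
}

const SNIPPET_MAX = 100;

// Truncates the matched content so the log never stores more than max characters.
function truncateSnippet(text: string, max: number = SNIPPET_MAX): string {
  return text.length > max ? text.slice(0, max) : text;
}

// Scans outbound content and records the first match of each pattern.
function scanOutbound(content: string, patterns: DlpPattern[]): DlpViolation[] {
  const violations: DlpViolation[] = [];
  for (const p of patterns) {
    const match = content.match(p.regex);
    if (match) {
      violations.push({
        rule: p.rule,
        severity: p.severity,
        snippet: truncateSnippet(match[0]),
        action: p.action,
      });
    }
  }
  return violations;
}

// Illustrative patterns only; real DLP rules are usually far stricter.
const examplePatterns: DlpPattern[] = [
  { rule: "Email address", regex: /[\w.+-]+@[\w-]+\.[\w.]+/, severity: "medium", action: "redacted" },
  { rule: "API key (illustrative format)", regex: /sk-[A-Za-z0-9]{16,}/, severity: "critical", action: "blocked" },
];
```

Reviewing the stored snippet against the pattern that produced it is usually the quickest way to tell a true violation from an overly broad pattern.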
Onboarding Tab
Tracks whether the agent has completed required onboarding steps:
- Status Badge — "Onboarded" (green) or "Not Onboarded" (yellow).
- Progress Checklist — Each policy with a checkmark (✓) or empty circle showing acknowledgment status.
- Actions — "Start" to initiate onboarding, "Force Complete" to skip remaining steps (use cautiously).
Approvals Tab
The human-in-the-loop safety system. Two sections:
- Pending Approvals — Actions waiting for human review. Each shows a description, timestamp, and Approve/Reject buttons.
- Approval History — Past decisions with status (Approved/Rejected), description, timestamp, and who decided.
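The relationship between the two sections can be sketched as a state transition: a decision removes an item from the pending list and adds a record to the history. All type and field names here are assumptions, and this page does not document the endpoint that submits a decision, so the sketch models local state only.

```typescript
type ApprovalDecision = "approved" | "rejected";

// Assumed shapes based on the fields shown in the Approvals tab.
interface PendingApproval {
  id: string;
  description: string;
  requestedAt: string;
}

interface ApprovalRecord extends PendingApproval {
  status: ApprovalDecision;
  decidedBy: string;
  decidedAt: string;
}

interface ApprovalState {
  pending: PendingApproval[];
  history: ApprovalRecord[];
}

// Returns a new state with the item moved from pending to history.
function decideApproval(
  state: ApprovalState,
  id: string,
  status: ApprovalDecision,
  decidedBy: string,
  decidedAt: string
): ApprovalState {
  const item = state.pending.filter((p) => p.id === id)[0];
  if (!item) {
    throw new Error(`No pending approval with id ${id}`);
  }
  const record: ApprovalRecord = {
    id: item.id,
    description: item.description,
    requestedAt: item.requestedAt,
    status: status,
    decidedBy: decidedBy,
    decidedAt: decidedAt,
  };
  return {
    pending: state.pending.filter((p) => p.id !== id),
    history: [record].concat(state.history), // newest decision first
  };
}
```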
Best Practices
- Start with alerting, then escalate — Begin with "Alert" actions and monitor. Only escalate to "Pause" or "Kill" for rules you're confident about.
- Set reasonable thresholds — Too strict causes constant interruptions; too loose defeats the purpose. Calibrate based on observed behavior.
- Use cooldowns to prevent alert fatigue — A 30-minute cooldown prevents the same rule from firing hundreds of times.
- Review interventions weekly — Patterns in interventions reveal agent behavior issues or overly strict rules.
- Approve or reject pending items promptly — Agents waiting for approval are blocked. Don't leave them hanging.
- If you always approve the same type, update the rule — Persistent approvals for the same action type suggest the rule is too restrictive.
- Complete onboarding before production use — Don't force-complete unless absolutely necessary. Onboarding ensures the agent understands org policies.
Troubleshooting
Agent keeps getting paused
Check the Interventions tab to identify which rule is triggering. The rule may be too sensitive (raise the threshold or shorten the window), or the agent's behavior may need adjustment.
Rule not triggering
Verify: (1) the rule is enabled (toggle is on), (2) it targets this agent or is global, (3) the threshold hasn't been set too high, (4) the cooldown hasn't silenced recent triggers.
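The four checks above can be expressed as a small diagnostic. This is a sketch under assumed field names; `whyNotTriggering` is a hypothetical helper, and `observedCount` stands for however many matching events occurred in the rule's window.

```typescript
// Assumed snapshot of a rule's configuration and last trigger time.
interface RuleSnapshot {
  enabled: boolean;
  agentIds?: string[];
  threshold: number;
  cooldownMinutes: number;
  lastTriggeredMs: number | null;
}

// Returns the reasons a rule would stay silent; an empty list means none of
// the four checks explains it.
function whyNotTriggering(
  rule: RuleSnapshot,
  agentId: string,
  observedCount: number,
  nowMs: number
): string[] {
  const reasons: string[] = [];
  if (!rule.enabled) {
    reasons.push("rule is disabled");
  }
  const isGlobal = !rule.agentIds || rule.agentIds.length === 0;
  if (!isGlobal && (rule.agentIds || []).indexOf(agentId) === -1) {
    reasons.push("rule does not target this agent and is not global");
  }
  if (observedCount < rule.threshold) {
    reasons.push(`observed count ${observedCount} is below threshold ${rule.threshold}`);
  }
  if (rule.lastTriggeredMs !== null && nowMs - rule.lastTriggeredMs < rule.cooldownMinutes * 60000) {
    reasons.push("rule is still in cooldown");
  }
  return reasons;
}
```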
DLP violations for legitimate content
DLP rules may be too broad. Review the matched content to verify it's actually sensitive. Adjust DLP patterns at the organization level or create agent-specific overrides in the Security tab.
Kill agent didn't work
Kill sends a termination signal. If the agent has external processes, they may continue running. Check the deployment system (PM2, Docker, etc.) to verify all processes stopped.