Hacker Bob logo Hacker Bob

Architecture briefing

How Hacker Bob works

A simple explanation of the agents, pipeline, MCP memory, and evidence flow behind Bob.

Event slide deck

Scan this first

Open Hacker Bob on GitHub.

Repo, install command, release notes, issues, and source code are all here.

Repository: https://github.com/vmihalis/hacker-bob

First principle

Ethical hacking is hacking with permission.

Same technical curiosity. Different contract, boundaries, evidence handling, and disclosure path.

Set the frame: Authorization changes everything

Rules of engagement

1. Get authorization

Only test targets, accounts, and methods that are explicitly allowed.

2. Stay in scope

Respect program limits, rate limits, data rules, and third-party boundaries.

3. Minimize harm

Collect the proof needed, redact sensitive data, and report responsibly.

Why this matters: Bob should amplify discipline, not risk

Bug bounty basics

Permission to evaluate, with a rulebook.

A company publishes scope, rules, rewards, and a report form. You test only the allowed assets and submit reproducible evidence.

1. Read the scope before touching anything.

2. Test carefully and avoid real user harm.

3. Report proof that someone can reproduce.

4. Wait for triage, fix, and reward decisions.

Where to start: Public programs, beginner-friendly scopes, and company security pages

Threat landscape: April 2026

The attack surface is moving faster than static playbooks.

This month alone: software supply chain compromise, stolen SaaS tokens, OAuth abuse, active zero-days, and critical infrastructure intrusion disclosures.

Research checked Apr 30, 2026: Selected public incidents, not a complete global census

Recent incidents: supply chain

Apr 29: SAP npm packages

Official SAP-related packages were compromised to steal developer and cloud credentials.

Apr 22: Bitwarden CLI

A malicious npm delivery path briefly distributed a backdoored CLI package.

Apr 7: Snowflake customers

Stolen third-party integration tokens drove data-theft attempts against customer accounts.

Pattern: Trusted software paths became attack paths

Recent incidents: enterprise and data

Apr 20: Vercel

A third-party AI/OAuth integration became a route into internal systems and customer data exposure.

Apr 26: Itron

The utility technology firm disclosed unauthorized access to internal systems.

Apr 13: Booking / Basic-Fit

Public breach reports included reservation data exposure and a 1M-member fitness breach.

Pattern: Identity, SaaS, and data access are primary targets

Recent incidents: exploitation

Zero-days and exposed admin surfaces are still live fire.

April reporting included active exploitation involving Fortinet EMS, Microsoft Defender, Windows Shell, LiteLLM, Qinglong, cPanel/WHM, and WordPress plugin backdoors.

Exposed management
Weak auth
Known exploited flaws
Credential theft
Data impact

Pattern: Attackers combine old gaps with fast exploitation
AI

The big shift

SECURITY IS BECOMING AGENTIC

Attack and defense are moving from fixed scripts to systems that observe, decide, use tools, and adapt.

Transition slide: This is why Bob's architecture matters

Agent-assisted attacks

The attacker loop is becoming agent-assisted.

Not just static scripts: operators now use AI to research, plan, generate lures, test infrastructure, triage data, and adapt faster.

Precise claim: Human operators still steer many attacks, but agents compress time and skill

Why this changes defense

Static scripts execute.

Agents decide what to try next.

1. Observe the target.

2. Choose the next test.

3. Call tools.

4. Adapt from results.

Bridge to Bob: Ethical automation needs memory, scope, evidence, and human control

Use this slide if challenged: Claims are current as of Apr 30, 2026

The problem

A bug bounty evaluation is not one task.

It is surface discovery, auth, testing, chaining, verifying, collecting evidence, grading, and writing.

Start here: Bob is built around the workflow

One sentence

Bob turns the evaluation into a state machine with receipts.

Every phase leaves structured evidence behind.

Simple mental model: State machine + artifacts

Architecture in three layers

1. Commands

The human starts, resumes, checks, and debugs Bob.

2. Orchestrator

The root skill chooses the next phase and starts agents.

3. MCP memory

The local server owns state, findings, evidence, and telemetry.

What to remember: Human control, agent work, durable state

Layer 1

Commands are the control panel.

The operator does not manage every file. They use simple Claude Code commands.

Common commands (Claude Code)
/bob-evaluate target.com
/bob-evaluate resume target.com
/bob-status
/bob-debug
/bob-egress
User-facing surface: Simple commands over a complex workflow

Layer 2

The orchestrator coordinates.

It does not do random target testing. It delegates work and keeps the run on rails.

1. Read the current session state.

2. Decide the next phase.

3. Spawn the right specialist agent.

4. Wait for structured output.

Root skill (bob-evaluate): Coordinates, delegates, resumes

Layer 3

MCP is Bob's memory and rulebook.

The local `bountyagent` server validates tools, writes artifacts, enforces phase gates, and records telemetry.

state, findings, coverage, evidence, grade, audit
Source of truth: JSON and JSONL, not chat memory
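To make "JSON and JSONL, not chat memory" concrete, here is a minimal sketch of an append-only JSONL store. The helper names `append_record` and `read_records` are hypothetical and are not taken from the `bountyagent` server; the sketch only illustrates the one-record-per-line idea.

```python
import json
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one JSON object as a single line (hypothetical helper)."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def read_records(path: Path) -> list:
    """Read every record back; a missing file means no records yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Appending instead of rewriting keeps earlier records untouched, which is one plausible reason to prefer JSONL for findings, coverage, and audit trails.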

Context strategy

Long runs can drown the model.

Surface-discovery data, traffic, auth, findings, retries, dead ends, and evidence can quickly become too much for one chat to carry.

The challenge: Too much context causes drift

Context strategy

Bob keeps memory outside the chat.

The chat coordinates. MCP stores. Agents receive only the slice they need.

compact briefs, surface assignments, artifact reads, structured handoffs
The answer: Use context like a budget

Context strategy

Each agent gets a narrow slice.

Evaluator: One surface

Enough detail to test one target area well.

Verifier: One claim set

Enough detail to replay and challenge findings.

Reporter: Only survivors

Enough detail to write from verified evidence.

Strategic compression: Less context per agent, better focus

Context strategy

The evaluator brief is curated.

Bob does not paste the whole session. It builds a compact brief from MCP state.

Example: evaluator brief contents (high level)
{
  "assigned_surface": "API-1",
  "auth_hint": "attacker + victim",
  "coverage_summary": "2 tested, 1 promising",
  "traffic_hints": ["/api/invoices/:id"],
  "dead_ends": ["/old-login"],
  "ranking_reason": "auth + object IDs"
}
Important idea: Briefs are generated from durable state

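A brief like the one above could be derived from stored state along these lines. This is a sketch under assumptions: `build_brief` and its arguments are hypothetical, and only the field names are borrowed from the deck's examples.

```python
def build_brief(surface_id, state, coverage, dead_ends):
    """Build a compact brief for one surface from durable records (hypothetical)."""
    # Only coverage rows for the assigned surface are summarized.
    rows = [row for row in coverage if row["surface_id"] == surface_id]
    tested = sum(1 for row in rows if row["status"] == "tested")
    promising = sum(1 for row in rows if row["status"] == "promising")
    return {
        "assigned_surface": surface_id,
        "auth_hint": state["auth_status"],
        "coverage_summary": f"{tested} tested, {promising} promising",
        "dead_ends": sorted(set(dead_ends)),
    }
```

The point of the sketch is that the brief is a summary computed from records, so it stays small no matter how long the session has run.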
Example: state.json (simplified)
{
  "target": "example.com",
  "phase": "EVALUATE",
  "evaluation_wave": 2,
  "pending_wave": null,
  "auth_status": "attacker_and_victim",
  "total_findings": 1
}

Context strategy

State makes Bob resumable.

If the chat stops, Bob can read the session state and continue from the right place.

Key idea: Progress lives on disk, not only in the conversation
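Resuming from disk can be sketched as a small decision over the state file. The function `resume_action` is hypothetical, and treating a non-null `pending_wave` as "a wave still needs merging" is an assumption, not documented behavior.

```python
import json
from pathlib import Path


def resume_action(state_path: Path) -> str:
    """Decide what to do next from state.json alone (hypothetical sketch)."""
    state = json.loads(state_path.read_text(encoding="utf-8"))
    # Assumption: a non-null pending_wave means a wave was launched but not merged.
    if state.get("pending_wave") is not None:
        return f"merge wave {state['pending_wave']} in {state['phase']}"
    return f"continue {state['phase']}"
```

Nothing here reads chat history; the next action is a function of the file contents.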

The whole run

Seven phases, one direction.

SURFACE_DISCOVERY: Find surfaces
AUTH: Get profiles
EVALUATE: Test surfaces
CHAIN: Combine impact
VERIFY: Replay claims
GRADE: Decide quality
REPORT: Write result
FSM: SURFACE_DISCOVERY -> AUTH -> EVALUATE -> CHAIN -> VERIFY -> GRADE -> REPORT

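The seven-phase order can be sketched as a tiny forward-only state machine. The phase names come from the deck; `next_phase` and `can_advance` are hypothetical helpers, and the sketch deliberately leaves out the HOLD verdict, which sends a run back for more testing.

```python
from typing import Optional

PHASES = [
    "SURFACE_DISCOVERY", "AUTH", "EVALUATE", "CHAIN",
    "VERIFY", "GRADE", "REPORT",
]


def next_phase(current: str) -> Optional[str]:
    """Return the phase after `current`, or None once REPORT is reached."""
    index = PHASES.index(current)  # raises ValueError for unknown phases
    return PHASES[index + 1] if index + 1 < len(PHASES) else None


def can_advance(current: str, target: str) -> bool:
    """Allow exactly one step forward: no skipping and no going back."""
    return next_phase(current) == target
```

For example, `can_advance("AUTH", "EVALUATE")` is true, while `can_advance("AUTH", "VERIFY")` is false because it would skip two phases.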
01

Pipeline step

SURFACE_DISCOVERY

Bob builds the map before anyone starts testing.

Phase 1 of 7. Output: attack_surface.json

Phase 1: SURFACE_DISCOVERY

Bob first builds a map.

Surface discovery finds attack surfaces and gives each one a stable ID.

Example: attack_surface.json (simplified)
{
  "target": "example.com",
  "surfaces": [
    {
      "id": "API-1",
      "type": "api",
      "url": "https://api.example.com",
      "priority": "HIGH",
      "hints": ["auth", "object_ids"]
    }
  ]
}
Output: Surfaces evaluators can be assigned to

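One way to keep surface IDs stable across reruns is to reuse any ID already assigned to a URL and only number new surfaces. This is a sketch, not Bob's actual scheme: `assign_surface_ids`, the `known` mapping, and the `TYPE-N` numbering rule are all assumptions modeled on IDs like "API-1".

```python
from collections import defaultdict


def assign_surface_ids(surfaces, known=None):
    """Give each surface an ID like "API-1", reusing IDs from `known` (url -> id)."""
    known = dict(known or {})
    counters = defaultdict(int)
    # Start each counter at the highest number already used for that prefix.
    for surface_id in known.values():
        prefix, _, number = surface_id.rpartition("-")
        counters[prefix] = max(counters[prefix], int(number))
    result = []
    for surface in surfaces:
        url = surface["url"]
        if url not in known:
            prefix = surface["type"].upper()
            counters[prefix] += 1
            known[url] = f"{prefix}-{counters[prefix]}"
        result.append({**surface, "id": known[url]})
    return result
```

Because existing URLs keep their IDs, handoffs and coverage rows written in earlier waves keep pointing at the same surface.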
02

Pipeline step

AUTH

Bob tries to get useful testing identities.

Phase 2 of 7. Goal: attacker and victim profiles

Phase 2: AUTH

Bob tries to get useful accounts.

Attacker and victim profiles let Bob test access control with contrast instead of guessing.

API signup, Browser signup, Manual capture, Auth store
Why it matters: IDOR and access-control testing need two identities
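The "contrast instead of guessing" idea can be sketched as a comparison between two responses for the same object: one fetched by its owner and one fetched by the other identity. `classify_access` is a hypothetical helper that works on already-collected `(status, body)` pairs, and its labels are assumptions.

```python
def classify_access(owner_response, other_response):
    """Compare (status, body) pairs for the same object under two identities."""
    owner_status, owner_body = owner_response
    other_status, other_body = other_response
    if owner_status != 200:
        # Without a working baseline from the owner, nothing can be concluded.
        return "inconclusive"
    if other_status == 200 and other_body == owner_body:
        # The non-owner saw the owner's data: a candidate for verification.
        return "candidate"
    return "blocked"
```

With only one identity there is no baseline, so a 200 response alone says nothing about whether access control works.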

Network visibility

Bob records how it touches the target.

HTTP scans go through MCP, which writes redacted audit metadata and egress information.

Example: http-audit.jsonl line (simplified)
{
  "method": "GET",
  "url": "https://api.example.com/api/...",
  "status": 200,
  "auth_profile": "attacker",
  "egress_profile": "default",
  "egress_profile_identity_hash": "..."
}
Important for operators: Audited requests, redacted URLs, visible egress
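A redacted audit line could be produced roughly like this. The redaction rule shown (drop credentials, query string, and fragment) is an assumption for illustration; the deck does not specify what Bob redacts, and `redact_url` and `audit_line` are hypothetical names.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Keep scheme, host, port, and path; drop userinfo, query, and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def audit_line(method, url, status, auth_profile, egress_profile="default"):
    """Serialize one audit record as a single JSONL line (hypothetical shape)."""
    record = {
        "method": method,
        "url": redact_url(url),
        "status": status,
        "auth_profile": auth_profile,
        "egress_profile": egress_profile,
    }
    return json.dumps(record)
```

The audit record names the auth profile but never carries the token or cookie itself.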

Egress control

Bob does not silently rotate networks.

Session state binds the chosen egress profile identity; route drift fails closed, while same-route credential rotation is allowed.

/bob-egress, operator choice, no proxy URLs in audit
Boundary: Routing is controlled by the human

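"Route drift fails closed, credential rotation is allowed" can be sketched by hashing only the route part of a proxy URL. Everything here is an assumption: `route_identity`, `check_egress`, and the choice of scheme, host, and port as the route are illustrative, not the actual identity hash Bob computes.

```python
import hashlib
from typing import Optional
from urllib.parse import urlsplit


class EgressDrift(Exception):
    """Raised when the active route differs from the one bound to the session."""


def route_identity(proxy_url: Optional[str]) -> str:
    """Hash scheme, host, and port only, so credentials can rotate freely."""
    if proxy_url is None:
        route = "direct"
    else:
        parts = urlsplit(proxy_url)
        route = f"{parts.scheme}://{parts.hostname}:{parts.port}"
    return hashlib.sha256(route.encode("utf-8")).hexdigest()


def check_egress(bound_hash: str, proxy_url: Optional[str]) -> None:
    """Fail closed: refuse to continue if the route no longer matches."""
    if route_identity(proxy_url) != bound_hash:
        raise EgressDrift("egress route changed since the session was bound")
```

Storing only the hash also fits the "no proxy URLs in audit" note, since the hash reveals neither host nor credentials.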
03

Pipeline step

EVALUATE

Specialist agents test assigned surfaces in waves.

Phase 3 of 7. Output: findings, coverage, handoffs

Phase 3: EVALUATE

Evaluators work in waves.

One evaluator gets one surface. That keeps parallel testing focused.

Start: Assign surfaces

MCP writes wave assignments.

Run: Spawn evaluators

Agents test in background.

Merge: Validate handoffs

MCP updates session state.

Wave rule: Launch, wait, merge, then decide
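The launch, wait, merge rhythm can be sketched with a thread pool standing in for background agents. `run_wave`, `REQUIRED_KEYS`, and the `evaluate` callable are hypothetical; the handoff field names are borrowed from the deck's handoff example.

```python
from concurrent.futures import ThreadPoolExecutor

REQUIRED_KEYS = {"surface_id", "surface_status", "findings", "lead_surface_ids"}


def run_wave(surface_ids, evaluate):
    """Run one evaluator per surface, wait for all of them, then merge handoffs."""
    merged = {"findings": [], "next_surfaces": []}
    if not surface_ids:
        return merged
    # Launch and wait: the with-block returns only when every evaluator is done.
    with ThreadPoolExecutor(max_workers=len(surface_ids)) as pool:
        handoffs = list(pool.map(evaluate, surface_ids))
    # Merge: validate each handoff before it is allowed to change state.
    for handoff in handoffs:
        missing = REQUIRED_KEYS - handoff.keys()
        if missing:
            raise ValueError(f"invalid handoff, missing: {sorted(missing)}")
        merged["findings"].extend(handoff["findings"])
        for lead in handoff["lead_surface_ids"]:
            if lead not in surface_ids and lead not in merged["next_surfaces"]:
                merged["next_surfaces"].append(lead)
    return merged
```

Deciding the next wave only after the merge means no evaluator acts on a half-updated picture of the session.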

JSON example

An evaluator leaves a handoff.

This tells Bob what happened to the assigned surface.

Example: handoff-w1-a2.json (simplified)
{
  "wave": "w1",
  "agent": "a2",
  "surface_id": "API-1",
  "surface_status": "promising",
  "findings": ["F-1"],
  "lead_surface_ids": ["WEB-3"],
  "dead_ends": ["/old-login"]
}
Why it matters: Parallel agents become structured state

Example: coverage.jsonl line (simplified)
{
  "surface_id": "API-1",
  "endpoint": "/api/invoices/123",
  "bug_class": "idor",
  "auth_profile": "attacker",
  "status": "tested",
  "notes": "Victim invoice blocked"
}

JSON example

Coverage prevents repeated guessing.

Bob records what was tested, blocked, promising, or needs auth.

Coverage is durable: Evaluators log meaningful tests through MCP
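Coverage can prevent repeated work if each test is reduced to a comparable key. In this sketch, `coverage_key` and `already_covered` are hypothetical helpers, and the choice of which statuses count as "done" is an assumption.

```python
def coverage_key(row):
    """Identify a test by surface, endpoint, bug class, and auth profile."""
    return (row["surface_id"], row["endpoint"], row["bug_class"], row["auth_profile"])


def already_covered(coverage, candidate):
    """Return True if an equivalent test was already tested or blocked."""
    done = {
        coverage_key(row)
        for row in coverage
        if row["status"] in {"tested", "blocked"}
    }
    return coverage_key(candidate) in done
```

A row marked as promising or needing auth is not treated as done, so those tests can still be retried later.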

JSON example

A finding is a claim with context.

It is not a report yet. It is a candidate that must survive chaining and verification.

Example: findings.jsonl line (simplified)
{
  "id": "F-1",
  "title": "Invoice IDOR",
  "severity": "HIGH",
  "surface_id": "API-1",
  "auth": "attacker_vs_victim",
  "evidence": "Attacker can read victim invoice"
}
Important distinction: Finding does not mean reportable yet

04

Pipeline step

CHAIN

Bob checks whether separate signals combine into stronger impact.

Phase 4 of 7. Output: chain-attempts.jsonl

Phase 4: CHAIN

Bob asks: can this become worse?

The chain-builder tests whether findings, auth context, traffic, and handoff notes combine into stronger impact.

Output: chain-attempts.jsonl

Example: chain-attempts.jsonl line (simplified)
{
  "id": "C-1",
  "finding_ids": ["F-1", "F-2"],
  "hypothesis": "IDOR exposes billing data",
  "outcome": "confirmed",
  "impact": "Victim invoice and plan data readable"
}

JSON example

Every chain attempt is recorded.

Confirmed, denied, blocked, and not-applicable outcomes all matter.

Why it matters: Bob records negative results too

05

Pipeline step

VERIFY

Bob tries to kill weak bugs before they reach the report.

Phase 5 of 7. Output: verification rounds

Phase 5: VERIFY

Most weak bugs should die here.

Bob argues with itself before it writes anything report-like.

Findings -> Brutalist review -> Balanced review -> Final replay -> Reportable survivors
Verification flow: Skeptical first, balanced second, final replay last

Verification system

Three agents challenge the same findings.

1. Brutalist: Try to kill it

Skeptical replay. Deny weak proof and downgrade loose severity.

2. Balanced: Catch mistakes

Review brutalist decisions for false negatives and over-corrections.

3. Final replay: Fresh proof only

Re-run reportable survivors with fresh requests before evidence collection.

Artifacts: brutalist.json -> balanced.json -> verified-final.json
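The three rounds can be sketched as a pipeline in which each round sees the previous round's decision. The three reviewer callables and `run_verification` are hypothetical stand-ins for the agents; only the output shape follows the deck's verified-final.json example.

```python
def run_verification(findings, brutalist, balanced, final_replay):
    """Run skeptical, balanced, and final rounds; only survivors are replayed."""
    results = []
    for finding in findings:
        # Round 1: the skeptical reviewer tries to reject the finding.
        first = brutalist(finding)
        # Round 2: the balanced reviewer may overturn the first decision.
        second = balanced(finding, first)
        # Round 3: replay runs only if the finding survived round 2.
        reportable = bool(second and final_replay(finding))
        results.append({"finding_id": finding["id"], "reportable": reportable})
    return {"round": "final", "results": results}
```

The short-circuit in round 3 mirrors the slide: fresh requests are spent only on findings that are still standing.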

JSON example

Final verification decides survivors.

Only final reportable findings move into evidence collection.

Example: verified-final.json (simplified)
{
  "round": "final",
  "results": [
    {
      "finding_id": "F-1",
      "status": "confirmed",
      "reportable": true,
      "confidence": "high"
    }
  ]
}
Gate: No final reportable finding, no evidence requirement

Evidence

A report needs receipts.

Bob collects bounded, redacted evidence packs for final reportable findings.

redacted, bounded, per finding, required before grade
Output: evidence-packs.json

Example: evidence-packs.json (simplified)
{
  "packs": [
    {
      "finding_id": "F-1",
      "sample_count": 3,
      "redacted": true,
      "summary": "3 victim invoices readable"
    }
  ]
}

JSON example

Evidence is separate from the finding.

That keeps the report grounded in replayable proof, not just agent notes.

Safety: Secrets and auth material are redacted

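The "required before grade" rule can be sketched as a gate that compares final verification results with the evidence packs. `missing_evidence` and `can_enter_grade` are hypothetical names, and requiring `redacted` to be true is an assumption drawn from the example fields.

```python
def missing_evidence(verified_final, evidence_packs):
    """List reportable finding IDs that have no redacted evidence pack yet."""
    required = {
        result["finding_id"]
        for result in verified_final["results"]
        if result.get("reportable")
    }
    present = {
        pack["finding_id"]
        for pack in evidence_packs["packs"]
        if pack.get("redacted")
    }
    return sorted(required - present)


def can_enter_grade(verified_final, evidence_packs):
    """The gate opens only when nothing reportable lacks evidence."""
    return not missing_evidence(verified_final, evidence_packs)
```

If no finding is reportable, the required set is empty and the gate opens without any packs, which matches the "no evidence requirement" case.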
06

Pipeline step

GRADE

Bob decides whether the result is ready, needs more work, or should be skipped.

Phase 6 of 7. Output: grade.json

Phase 6: GRADE

Bob makes a quality decision.

The verdict is simple: submit, hold, or skip.

SUBMIT: Ready

Verified, evidenced, and worth reporting.

HOLD: Needs more work

Return to the EVALUATE phase with feedback.

SKIP: Not reportable

Close out with durable reasoning.

Output: grade.json

JSON example

The grade explains the decision.

Bob does not just say yes or no. It records why.

Example: grade.json (simplified)
{
  "verdict": "SUBMIT",
  "total_score": 82,
  "finding_ids": ["F-1"],
  "reason": "Confirmed IDOR with evidence"
}
Decision artifact: Used by the report writer

07

Pipeline step

REPORT

Bob writes the final output from verified, evidenced, graded data.

Phase 7 of 7. Output: report.md

Phase 7: REPORT

The report is the last step, not the first draft.

It is built from the final verification, evidence packs, chain attempts, and the grade verdict.

Output: report.md

Why Bob stays coherent

Guardrails are built into the workflow.

Hooks: Write guard

Agents cannot directly edit MCP-owned state files.

Gates: Phase rules

Bob cannot skip required artifacts and evidence.

Telemetry: Debug trail

Status and debug commands show what got stuck.

Reliability model: Constraints make autonomy inspectable
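A write guard of this kind can be sketched as a path check that runs before any agent file write. The `MCP_OWNED` set below is an assumption assembled from artifact names mentioned in this deck, and `is_write_allowed` is a hypothetical helper, not the actual hook.

```python
from pathlib import PurePosixPath

# Assumption: artifact names taken from the examples in this deck.
MCP_OWNED = {
    "state.json", "attack_surface.json", "findings.jsonl", "coverage.jsonl",
    "chain-attempts.jsonl", "verified-final.json", "evidence-packs.json",
    "grade.json", "http-audit.jsonl",
}


def is_write_allowed(path: str) -> bool:
    """Deny direct agent writes to files that only the MCP server should change."""
    return PurePosixPath(path).name not in MCP_OWNED
```

Routing every change to these files through MCP tools is what lets the server validate records and enforce phase gates in one place.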

Operational visibility

Bob can explain where it is stuck.

Status and debug commands read telemetry and artifacts instead of guessing from chat history.

/bob-status: Where are we?

Phase, wave, findings, evidence, grade, and next command.

/bob-debug: What went wrong?

Stale waves, missing handoffs, failed tools, policy stalls, and report trust.

Debug model: Telemetry first, artifacts second

Important boundary

Bob automates work. The human owns authorization.

Bob can send requests, use accounts, and prepare reports. The operator decides scope and submission.

Bob handles

  • Workflow
  • Agents
  • Evidence
  • Reports

Operator owns

  • Authorization
  • Scope rules
  • Data handling
  • Submission
Event wording: Authorized testing only

Set expectations

Bob is not magic authorization.

It is disciplined automation for targets, accounts, and methods the operator is allowed to test.

not auto-submit, not scope approval, not a replacement for judgment
Clear boundary: Human judgment stays in the loop

How it is installed

Bob installs into one Claude Code project.

The package copies commands, agents, hooks, settings, and the local MCP runtime.

Install and run (high level)
npx -y hacker-bob@latest install /path/to/project
hacker-bob doctor /path/to/project

# inside Claude Code
/bob-evaluate target.com
Canonical package: hacker-bob

Recap

6 things to remember

Pipeline, Context budget, MCP memory, Agents, Evidence gates, Human control

github.com/vmihalis/hacker-bob

Closing slide: Scan for the repo and install notes