MVP Execution Plan

5-Hour Hackathon Build | AgentForge SG Super AI Edition

PREFLIGHT: HUMAN-IN-LOOP → RESOLVED

Preflight Review Summary

Four parallel agent reviews (Eng, Design, QA, Security) flagged critical contradictions in the initial plan. All have been resolved. This document reflects the corrected MVP scope.

  • Eng Confidence: 85/100. Architecture sound after CVE fix.
  • Design Confidence: 82/100. Scope revised; interactions defined.
  • QA Confidence: 75/100. Smoke tests; fallbacks pre-tested.
  • Security Confidence: 85/100. Known risks documented.

🔧 Blockers Resolved

  1. Log4Shell contradiction: Log4Shell affects a Java library (Log4j), not Node.js. Replaced with Prototype Pollution RCE, which is native to the Node.js ecosystem.
  2. Sandbox isolation vs. exploit mechanism: The conflict no longer applies. Prototype Pollution can be exploited entirely inside an isolated sandbox.
  3. Claude API fallback: Removed. Nosana-only patch generation. Cold start risk accepted.
  4. Timeline realism: 5h timeline is aggressive; 3-person split mitigates. Prioritization: CLI + Exploit verify → Bot/Dashboard → Polish.
  5. Test coverage: Zero automated tests for MVP. Smoke tests on demo path only. Pre-record fallback video.

✓ Corrected MVP Scope

  • Demo CVE 1: CVE-2023-44487 (HTTP/2 Rapid Reset) — DoS, genuine Node.js vulnerability
  • Demo CVE 2: Prototype Pollution RCE — Code execution, Node.js ecosystem
  • Patch Generation: Nosana LLM only (no Claude fallback). If cold start > 60s, show "Nosana unavailable" gracefully.
  • GitHub Bot: Real OAuth + PR comments + auto-fix PR creation (write access required for branches).
  • Dashboard: Functional React (Technical + Executive views). Unlisted URL (no auth). Known limitation for MVP.
  • Testing: Manual rehearsal + fallback video. No automated test suite.
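
The Nosana-only rule with its 60-second cutoff could be enforced with a small timeout wrapper. The sketch below is an illustration under assumptions: `generatePatch` is a hypothetical placeholder for whatever call reaches the Nosana-hosted LLM (it is not a real Nosana SDK function), and treating a failed request the same as a timeout is a choice made here, not something the plan specifies.

```typescript
// Outcome of a patch request: a patch, or a graceful "unavailable" notice.
type PatchResult =
  | { status: "ok"; patch: string }
  | { status: "unavailable"; message: string };

// Hypothetical signature standing in for the real Nosana call.
type PatchGenerator = (cveId: string) => Promise<string>;

const COLD_START_LIMIT_MS = 60_000; // 60s cutoff from the MVP scope

function generatePatchOrGiveUp(
  cveId: string,
  generatePatch: PatchGenerator,
  limitMs: number = COLD_START_LIMIT_MS,
): Promise<PatchResult> {
  const unavailable: PatchResult = {
    status: "unavailable",
    message: "Nosana unavailable",
  };
  return new Promise<PatchResult>((resolve) => {
    // Whichever settles first wins; later resolve() calls are no-ops.
    const timer = setTimeout(() => resolve(unavailable), limitMs);
    Promise.resolve()
      .then(() => generatePatch(cveId))
      .then(
        (patch) => resolve({ status: "ok", patch }),
        () => resolve(unavailable), // assumption: errors also show "unavailable"
      )
      .finally(() => clearTimeout(timer));
  });
}
```

There is deliberately no second provider in this path, which matches the "no Claude fallback" decision: the caller only ever sees a patch or the "Nosana unavailable" message.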

Problem & Solution (PRD Snapshot)

The Problem

60% of data breaches in 2026 involve vulnerabilities for which a patch was already available. SAST tools like Snyk identify CVEs theoretically present in dependencies, but cannot confirm whether a specific codebase is actually exploitable. Engineers deprioritize patches based on theoretical risk, leading to breaches costing an average of $4.9M per incident.

The Solution

An autonomous AI agent that confirms which CVEs are actually exploitable in a target codebase by:

  • Running known PoC exploits in isolated Daytona sandboxes
  • Verifying success/failure (Confirmed Exploitable vs. Theoretical Risk)
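  • Generating patches using Nosana LLM (the MVP validates patch syntax only; patches are not re-tested against the exploit)
  • Providing results via CLI, GitHub Bot, CI/CD, and MCP (the MVP ships CLI, GitHub Bot, and Dashboard; CI/CD and MCP are deferred)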

Business Impact

Ship security fixes faster with confidence. Each confirmed exploitable CVE that gets fixed closes a breach path with an average cost of $4.9M.

The "Wow Moment" (Demo Day)

2-minute live demo: User runs `codeprobe scan <demo-repo>`

  1. Progress output: "Scraping CVE databases... Found 14 vulnerabilities" (Bright Data + cache)
  2. Sandbox spawning: "Spinning up 2 isolated sandboxes for CRITICAL CVEs..." (Daytona)
  3. Live exploit execution: ✓ CVE-2023-44487: CONFIRMED EXPLOITABLE (DoS in 2.1s)
  4. Patch generation: "Generating patches..." (Nosana LLM)
  5. Report: Risk Score: 8.2/10 | Confirmed: 1 | Theoretical: 13 | Estimated breach cost: $4.9M
  6. Dashboard: Judges see full CVE breakdown + business impact in browser
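
The report line in step 5 can be derived from the per-CVE results rather than hard-coded. A minimal sketch, assuming a hypothetical `CveFinding` shape and a risk score computed elsewhere (the plan does not define the scoring formula, so it is passed in as a number):

```typescript
type ExploitStatus = "confirmed" | "theoretical" | "verification_failed";

// Hypothetical result shape; field names are illustrative only.
interface CveFinding {
  id: string;
  status: ExploitStatus;
}

// Builds the one-line summary shown at the end of the scan.
function formatSummary(
  findings: CveFinding[],
  riskScore: number,
  breachCost: string,
): string {
  const confirmed = findings.filter((f) => f.status === "confirmed").length;
  const theoretical = findings.filter((f) => f.status === "theoretical").length;
  return [
    `Risk Score: ${riskScore.toFixed(1)}/10`,
    `Confirmed: ${confirmed}`,
    `Theoretical: ${theoretical}`,
    `Estimated breach cost: ${breachCost}`,
  ].join(" | ");
}
```

With one confirmed and thirteen theoretical findings, a score of 8.2, and "$4.9M" as the cost string, this produces the same line as the demo script above.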

3-Workstream Split (5-Hour Build)

Each workstream is assigned to one person. Workstream 2 (Verify & Fix) is demo-critical and must not be split. Clock starts at 10:30 AM. Submit by 4:30 PM (6 hours on the clock: the 5-hour build plus a final hour for rehearsal and buffer).

Workstream 1: Find Vulns
Dependency analysis & CVE discovery
Person
Backend engineer (Node.js parser, API design)
Scope
  • Dependency parser (extract versions from package.json, package-lock.json)
  • Bright Data scraper (async CVE database scraping, NVD + Exploit-DB)
  • CVE matcher (semver matching of deps to CVEs)
  • Report builder (JSON output)
  • API: POST /api/scan, GET /api/scan/:id
Deliverable
Node.js module that takes a repo path, returns JSON with CVEs.
Time estimate: 1.5h (core) + 0.5h (testing) = 2h
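
The parser and matcher steps in this workstream could look roughly like the sketch below. It is dependency-free for illustration: the `Advisory` shape and its `fixedIn` field are assumptions, and the version comparison handles plain `MAJOR.MINOR.PATCH` only. A real build would more likely read exact versions from package-lock.json and use a full semver library.

```typescript
// Hypothetical advisory shape; real CVE feeds express affected ranges differently.
interface Advisory {
  id: string;
  packageName: string;
  fixedIn: string; // first patched version, "MAJOR.MINOR.PATCH"
}

type Version = [number, number, number];

// Collect name -> version specifier from a package.json document.
function readDependencies(packageJsonText: string): Map<string, string> {
  const manifest = JSON.parse(packageJsonText) as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  const merged = { ...manifest.devDependencies, ...manifest.dependencies };
  return new Map(Object.entries(merged));
}

// Parse "1.2.3", tolerating a leading ^, ~, = or v. Returns null if unparseable.
function parseVersion(spec: string): Version | null {
  const m = /^[\^~=v]*(\d+)\.(\d+)\.(\d+)/.exec(spec.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// True when version a sorts strictly before version b.
function isBelow(a: Version, b: Version): boolean {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i];
  }
  return false;
}

// Keep advisories whose package is present at a version below the fix.
function matchAdvisories(
  deps: Map<string, string>,
  advisories: Advisory[],
): Advisory[] {
  return advisories.filter((adv) => {
    const spec = deps.get(adv.packageName);
    if (spec === undefined) return false;
    const installed = parseVersion(spec);
    const fixed = parseVersion(adv.fixedIn);
    return installed !== null && fixed !== null && isBelow(installed, fixed);
  });
}
```

Note that a range such as `^1.2.3` is treated here as version 1.2.3, which is only the lower bound of what may actually be installed.
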
Dependencies
Requires:
  • Bright Data API key + sandbox proof
  • Demo repo (Node.js, intentional vulns)
  • Zod for validation
Risks
Bright Data rate-limited. Mitigation: pre-cache CVE data for demo repo.
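
The rate-limit mitigation amounts to "try live, fall back to cache, and say so". A rough sketch, where `fetchLive` and `readCache` are hypothetical callbacks (the first would wrap the Bright Data scrape, the second would read the pre-cached data for the demo repo):

```typescript
interface CveLookup<T> {
  source: "live" | "cache";
  notice?: string; // message to surface in the CLI when the cache was used
  data: T;
}

async function fetchWithCacheFallback<T>(
  fetchLive: () => Promise<T>, // hypothetical: live CVE scrape
  readCache: () => T | undefined, // hypothetical: pre-cached demo data
): Promise<CveLookup<T>> {
  try {
    return { source: "live", data: await fetchLive() };
  } catch (err) {
    const cached = readCache();
    // No cache entry: surface the original failure instead of hiding it.
    if (cached === undefined) throw err;
    return { source: "cache", notice: "Using cached CVE data", data: cached };
  }
}
```
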

Workstream 2: Verify & Fix
Exploit execution + patch generation (DEMO-CRITICAL)
Person
Lead engineer (orchestration, sandbox, exploit verification)
Scope
  • Daytona orchestrator: Spawn isolated Node.js containers, install vulnerable packages, run PoCs
  • Exploit runner: Inject PoC scripts (HTTP/2 Rapid Reset + Prototype Pollution), capture output/errors
  • Verification logic: Success detection (exploit output patterns, exit codes)
  • Nosana integration: Call LLM to generate patches, validate syntax
  • Fallback: Nosana cold start > 60s → show "unavailable" (don't fall back to Claude)
  • Retry logic: Sandbox crash → retry once, mark as "Verification Failed" on 2nd failure
Deliverable
Orchestrator module + exploit runner. Takes CVE list, returns results with exploit evidence + patches.
Time estimate: 2.5h (core exploit) + 1h (Nosana + fallback) = 3.5h
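
The retry rule in the scope above (retry once, then mark as "Verification Failed") can be sketched as a small wrapper. `runInSandbox` is a hypothetical callback that would spawn the Daytona sandbox and run the PoC; no Daytona API is shown or assumed here.

```typescript
type VerificationOutcome<T> =
  | { status: "completed"; result: T; attempts: number }
  | { status: "verification_failed"; attempts: number; lastError: string };

async function runWithOneRetry<T>(
  runInSandbox: () => Promise<T>, // hypothetical: spawn sandbox + run PoC
): Promise<VerificationOutcome<T>> {
  const maxAttempts = 2; // first run plus one retry
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await runInSandbox();
      return { status: "completed", result, attempts: attempt };
    } catch (err) {
      // A crash is recorded and the loop moves on to the retry, if any.
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { status: "verification_failed", attempts: maxAttempts, lastError };
}
```

Returning a value instead of throwing on the second failure lets the scan continue with the remaining CVEs.
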
Demo CVEs (Pre-tested)
CVE-2023-44487 (HTTP/2 Rapid Reset): Node.js http2 module, DoS via stream reset flood. PoC: send rapid RST_STREAM frames. Verify: connection drops or timeout.

Prototype Pollution RCE: Pollution of Object.prototype in Node.js. PoC: pollute the prototype through a vulnerable package's input handling, then reach code execution through a gadget that reads the polluted property. Verify: code execution evidence in stdout.
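
The two "Verify" rules above could be expressed as pure functions over the captured run, which keeps them easy to rehearse without a sandbox. This is a sketch under assumptions: the `PocRun` shape, the `CODEPROBE_RCE_OK` marker string, and the specific error patterns are all hypothetical and would need to match whatever the pre-written PoC scripts actually print.

```typescript
// Captured result of running a PoC (or a follow-up health probe) in the sandbox.
interface PocRun {
  exitCode: number | null; // null when the process was killed on timeout
  stdout: string;
  timedOut: boolean;
}

type Verdict = "CONFIRMED EXPLOITABLE" | "THEORETICAL RISK";

// Hypothetical marker the Prototype Pollution PoC prints once injected code runs.
const RCE_MARKER = "CODEPROBE_RCE_OK";

// HTTP/2 Rapid Reset: confirmed if the target timed out or dropped the connection.
function verifyRapidReset(run: PocRun): Verdict {
  const dropped = /ECONNRESET|socket hang up/i.test(run.stdout);
  return run.timedOut || dropped ? "CONFIRMED EXPLOITABLE" : "THEORETICAL RISK";
}

// Prototype Pollution: confirmed only if the PoC exited cleanly and printed the marker.
function verifyPrototypePollution(run: PocRun): Verdict {
  return run.exitCode === 0 && run.stdout.includes(RCE_MARKER)
    ? "CONFIRMED EXPLOITABLE"
    : "THEORETICAL RISK";
}
```
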
Dependencies
Requires:
  • Daytona API key + sandbox pool tested
  • Nosana GPU container or API key
  • PoC scripts for both CVEs (pre-written)
  • Demo repo with vulnerable packages installed
  • Workstream 1 output (CVE list)
CRITICAL: This is the "wow moment." The demo relies 100% on live exploit execution. Pre-record a 2-minute working video as insurance.

Workstream 3: Surfaces
CLI + Dashboard + GitHub Bot
Person
Full-stack engineer (CLI, React dashboard, GitHub integration)
Scope A: CLI
  • Bun CLI entry point: `codeprobe scan <repo>`
  • Real-time progress output (colors, progress indicator)
  • Terminal table: CVE list with exploit status (✓ Confirmed / ✗ Theoretical)
  • Risk score + business impact messaging
  • Config file: ~/.codeprobe/config (API keys, auth token)
  • Optional: `codeprobe scan --fix` (generate patches, push to branch)
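
The terminal table from the CLI scope can be produced with plain string padding before any colors are layered on. A minimal sketch with a hypothetical `Row` shape; Chalk styling is left out so the example stays self-contained:

```typescript
// Hypothetical row shape for the CLI table.
interface Row {
  cve: string;
  pkg: string;
  confirmed: boolean;
}

// Renders an aligned, plain-text table with one header line.
function renderTable(rows: Row[]): string {
  const header = ["CVE", "Package", "Status"];
  const body = rows.map((r) => [
    r.cve,
    r.pkg,
    r.confirmed ? "✓ Confirmed" : "✗ Theoretical",
  ]);
  const all = [header, ...body];
  // Column width = longest cell in that column.
  const widths = header.map((_, col) =>
    Math.max(...all.map((line) => line[col].length)),
  );
  return all
    .map((line) =>
      line.map((cell, col) => cell.padEnd(widths[col])).join("  ").trimEnd(),
    )
    .join("\n");
}

console.log(
  renderTable([
    { cve: "CVE-2023-44487", pkg: "example-pkg", confirmed: true },
    { cve: "EXAMPLE-0001", pkg: "other-pkg", confirmed: false },
  ]),
);
```

The package names and the second identifier in the usage call are placeholders, not entries from the demo repo.
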
Scope B: Dashboard
  • React + Vite (TailwindCSS)
  • Technical view: CVE table, severity, package version, PoC evidence, patch diffs
  • Executive view: Risk score (0–10 gauge), $4.9M impact callout, count of confirmed/theoretical
  • View switcher (tabs or button)
  • Scan metadata: timestamp, duration, repo URL
  • Known limitation: Unlisted URL, no authentication (MVP)
Scope C: GitHub Bot
  • GitHub App (webhook for PR events)
  • OAuth flow for CLI: `codeprobe init` (opens browser, user authorizes)
  • PR comment: scan results table + business impact
  • "Auto-fix available" button → triggers patch generation → opens new PR
  • New PR title: `[CodeProbe] Fix 2 confirmed CVEs (PR #123)`
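
The PR title format and the comment body can be built by two small pure functions, independent of how the GitHub App posts them. A sketch with hypothetical names (`BotFinding`, `buildFixPrTitle`, `buildPrComment`); the comment uses a Markdown table, which GitHub renders in PR comments:

```typescript
// Hypothetical finding shape used by the bot.
interface BotFinding {
  id: string;
  confirmed: boolean;
}

// Title for the auto-fix PR, referencing the PR that was scanned.
function buildFixPrTitle(confirmedCount: number, sourcePr: number): string {
  const noun = confirmedCount === 1 ? "CVE" : "CVEs";
  return `[CodeProbe] Fix ${confirmedCount} confirmed ${noun} (PR #${sourcePr})`;
}

// Markdown body for the scan-results comment.
function buildPrComment(findings: BotFinding[], impact: string): string {
  const rows = findings.map(
    (f) =>
      `| ${f.id} | ${f.confirmed ? "Confirmed exploitable" : "Theoretical risk"} |`,
  );
  return [
    "### CodeProbe scan results",
    "",
    "| CVE | Status |",
    "| --- | --- |",
    ...rows,
    "",
    `Estimated breach cost: ${impact}`,
  ].join("\n");
}

console.log(buildFixPrTitle(2, 123));
```
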
Deliverable
  • CLI: Bun binary with Chalk colors + terminal output
  • Dashboard: Vite React app (served static or S3)
  • Bot: GitHub App handler (webhook receiver + PR commenter)
Time estimate: 1h (CLI) + 1.5h (Dashboard) + 1h (Bot) = 3.5h
Dependencies
Requires:
  • Workstream 1 + 2 modules (loaded as libraries)
  • GitHub App credentials (pre-created)
  • Demo repo URL (for test PR)
  • S3 bucket or local file storage for scan results
Risks
Dashboard scope creep. Two full React views in 1.5 hours is tight. Prioritize: Technical view (CVE table) first, Executive (gauge + $4.9M message) second. If time runs out, Executive view becomes a static screenshot.

5-Hour Timeline

  • 10:30 – 10:45 (15m) | Kickoff & Setup: Distribute workstreams, verify API keys (Bright Data, Daytona, Nosana), test sandbox spawn, confirm demo repo is ready.
  • 10:45 – 12:15 (1.5h) | Parallel build: WS1 builds parser + Bright Data, WS2 sets up Daytona + PoC injection, WS3 initializes Bun CLI project.
  • 12:15 – 13:45 (1.5h) | Integration: WS1 + WS2 merge. CLI skeleton calls both. Dashboard Vite build starts. Bot OAuth flow tested offline.
  • 13:45 – 14:45 (1h) | Demo path E2E: Full CLI scan on demo repo. Both CVEs execute. Results flow through dashboard. GitHub bot posts comment on test PR.
  • 14:45 – 15:30 (45m) | Polish & Fallbacks: Terminal output formatting. Dashboard styling. Pre-record exploit video backup. Test Nosana cold-start fallback.
  • 15:30 – 16:30 (1h) | Rehearsal & Buffer: Dry-run 2-minute demo 3x. Troubleshoot issues. Prepare backup video. Final checks.
  • 16:30 | SUBMIT

⏱️ Time Crunch Cuts (If Running Behind)

  1. Hour 4.5, 15 min behind: Skip `--fix` flag. Keep `scan` only.
  2. Hour 4.5, 30 min behind: Skip GitHub bot auto-fix PR creation. Keep PR comment only.
  3. Hour 4.5, 45 min behind: Skip Executive view dashboard. Show Technical view only + static $4.9M message.
  4. Hour 5, any time: Use pre-recorded demo video.
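
The cut list above reads as a decision rule, which can be written down so nobody has to reason about it under pressure. The sketch below encodes one interpretation, stated as assumptions: cuts are cumulative (being 45 minutes behind implies the 15- and 30-minute cuts too), nothing is cut before hour 4.5, and at hour 5 any delay at all switches to the video.

```typescript
// Returns the cuts to apply, given hours since the 10:30 start and minutes behind plan.
function plannedCuts(hoursElapsed: number, minutesBehind: number): string[] {
  // Assumption: from hour 5 onward, being behind by any amount means the video.
  if (hoursElapsed >= 5 && minutesBehind > 0) {
    return ["Use pre-recorded demo video"];
  }
  const cuts: string[] = [];
  if (hoursElapsed >= 4.5) {
    // Assumption: cuts accumulate as the delay grows.
    if (minutesBehind >= 15) cuts.push("Skip --fix flag");
    if (minutesBehind >= 30) cuts.push("Skip auto-fix PR creation");
    if (minutesBehind >= 45) cuts.push("Skip Executive view");
  }
  return cuts;
}
```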

Demo Success Criteria

Must Have (Demo Will Not Work Without)

  • Working CLI: `codeprobe scan <demo-repo>` executes end-to-end
  • Live Bright Data CVE scraping (or cached fallback shown)
  • Daytona sandbox spawning + exploit execution visible
  • At least 1 confirmed exploitable CVE shown (✓ status)
  • Patch generated + displayed (from Nosana, or example)
  • Terminal output is colored, readable, impressive

Should Have (Strong Demo)

  • GitHub bot PR comment (real OAuth, real webhook)
  • Dashboard view: both Technical + Executive visible
  • Business impact messaging: "$4.9M breach cost" clearly stated
  • Risk score displayed (0–10 gauge or number)
  • 2 confirmed exploitable CVEs (both demo CVEs)

Nice to Have (Impressive Demo)

  • Auto-fix PR creation (working branch push + PR open)
  • Dashboard responsive on mobile (judges may view on phone)
  • Dark mode polish (smooth transitions, accent colors)
  • Real-time progress animation during scan

Known Limitations (Documented for Judges)

| Aspect | MVP Scope | Reason |
| --- | --- | --- |
| Node.js Only | Python, Rust, Java support cut | 5-hour MVP; N-language support is post-launch work |
| Dashboard Auth | Unlisted URL (no login required) | OAuth adds 1–2 hours; acceptable for internal demo |
| Test Coverage | Manual demo only; no automated tests | Full test suite = 20+ hours; hackathon prioritizes demo |
| Patch Validation | Nosana output trusted; no re-exploit verification | Validation loop adds 2–3 hours; accepted risk for demo |
| MCP Server | Skipped | Nice-to-have; CLI + Bot demonstrate agent integration |
| CI/CD Action | Skipped | Time trade-off; bot + dashboard show multi-interface capability |
| Nosana Cold Start | If > 60s, show "unavailable" (no Claude fallback) | Transparent about limitations; pure Nosana branding |

Risks & Fallback Plans

| Risk | Likelihood | Fallback |
| --- | --- | --- |
| Bright Data rate-limited | Medium | Pre-cache demo repo CVE data. Show "Using cached CVE data" message. |
| Daytona sandbox timeout | Low | Retry once. If it still fails, mark CVE as "Verification Failed". Continue scan. |
| Nosana cold start > 60s | Medium | Show "Nosana unavailable" gracefully. Display example patch instead. |
| GitHub OAuth fails | Low | Manual token input. Users provide PAT (Personal Access Token). |
| Demo CVE PoC fails | Low | Pre-record 2-minute working video. Play video if live fails. |
| Dashboard React build fails | Low | Show static HTML screenshot. Skip interactive demo. |
| Time overrun (30+ min behind) | Medium | Use pre-recorded video + screenshots of full system. Focus demo on exploit execution. |