Preflight Review Summary
Four parallel agent reviews (Eng, Design, QA, Security) flagged critical contradictions in the initial plan. All have been resolved. This document reflects the corrected MVP scope.
🔧 Blockers Resolved
- Log4Shell contradiction: Java library, not Node.js. Replaced with Prototype Pollution RCE (Node.js native).
- Sandbox isolation vs. exploit mechanism: No longer a conflict. The Prototype Pollution PoC runs entirely inside an isolated sandbox.
- Claude API fallback: Removed. Nosana-only patch generation. Cold start risk accepted.
- Timeline realism: 5h timeline is aggressive; 3-person split mitigates. Prioritization: CLI + Exploit verify → Bot/Dashboard → Polish.
- Test coverage: Zero automated tests for MVP. Smoke tests on demo path only. Pre-record fallback video.
✓ Corrected MVP Scope
- Demo CVE 1: CVE-2023-44487 (HTTP/2 Rapid Reset) — DoS, genuine Node.js vulnerability
- Demo CVE 2: Prototype Pollution RCE — Code execution, Node.js ecosystem
- Patch Generation: Nosana LLM only (no Claude fallback). If cold start > 60s, show "Nosana unavailable" gracefully.
- GitHub Bot: Real OAuth + PR comments + auto-fix PR creation (write access required for branches).
- Dashboard: Functional React (Technical + Executive views). Unlisted URL (no auth). Known limitation for MVP.
- Testing: Manual rehearsal + fallback video. No automated test suite.
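
The 60-second cold-start rule in the Patch Generation item could be enforced with a timeout wrapper along these lines. This is a minimal sketch, not the project's implementation: `withTimeout`, `generatePatch`, and the `generate` callback are hypothetical names, and `generate` stands in for the real Nosana call, whose API is not described in this document.

```typescript
// Sentinel that marks "the timer won the race" (hypothetical helper value).
const TIMEOUT = Symbol("timeout");

type PatchResult =
  | { status: "ok"; patch: string }
  | { status: "unavailable"; message: string };

// Resolves with the work's value, or with TIMEOUT if `ms` elapses first.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | typeof TIMEOUT> {
  return new Promise<T | typeof TIMEOUT>((resolve, reject) => {
    const timer = setTimeout(() => resolve(TIMEOUT), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// `generate` is an assumption: any function that asks the LLM for a patch.
async function generatePatch(
  generate: () => Promise<string>,
  timeoutMs = 60_000,
): Promise<PatchResult> {
  try {
    const result = await withTimeout(generate(), timeoutMs);
    if (result === TIMEOUT) {
      return { status: "unavailable", message: "Nosana unavailable" };
    }
    return { status: "ok", patch: result };
  } catch {
    // Errors are reported the same way as a timeout; there is no Claude fallback.
    return { status: "unavailable", message: "Nosana unavailable" };
  }
}
```

Returning a tagged result instead of throwing lets the CLI, bot, and dashboard all render the "unavailable" state without their own error handling.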
Problem & Solution (PRD Snapshot)
The Problem
60% of data breaches in 2026 involve vulnerabilities for which a patch was already available. SAST tools like Snyk identify CVEs theoretically present in dependencies, but cannot confirm whether a specific codebase is actually exploitable. Engineers deprioritize patches based on theoretical risk, leading to breaches costing an average of $4.9M per incident.
The Solution
An autonomous AI agent that confirms which CVEs are actually exploitable in a target codebase by:
- Running known PoC exploits in isolated Daytona sandboxes
- Verifying success/failure (Confirmed Exploitable vs. Theoretical Risk)
- Generating verified patches using Nosana LLM
- Providing results via CLI, GitHub Bot, CI/CD, and MCP
Business Impact
Ship security fixes faster with confidence. Every confirmed exploitable CVE fixed = $4.9M breach cost averted.
The "Wow Moment" (Demo Day)
2-minute live demo: User runs `codeprobe scan <demo-repo>`
- Progress output: "Scraping CVE databases... Found 14 vulnerabilities" (Bright Data + cache)
- Sandbox spawning: "Spinning up 2 isolated sandboxes for CRITICAL CVEs..." (Daytona)
- Live exploit execution: ✓ CVE-2023-44487: CONFIRMED EXPLOITABLE (DoS in 2.1s)
- Patch generation: "Generating patches..." (Nosana LLM)
- Report: Risk Score: 8.2/10 | Confirmed: 1 | Theoretical: 13 | Estimated breach cost: $4.9M
- Dashboard: Judges see full CVE breakdown + business impact in browser
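
The report line in the demo flow above could be assembled from per-CVE results roughly as follows. This is a sketch under assumptions: the `Finding` shape and `summarize` function are hypothetical, and the risk score is passed in rather than computed, because this document does not define a scoring formula.

```typescript
type ExploitStatus = "confirmed" | "theoretical" | "verification_failed";

// Hypothetical shape of one verified CVE result.
interface Finding {
  cveId: string;
  status: ExploitStatus;
}

// Counts confirmed and theoretical findings and formats the one-line summary.
function summarize(findings: Finding[], riskScore: number): string {
  const confirmed = findings.filter((f) => f.status === "confirmed").length;
  const theoretical = findings.filter((f) => f.status === "theoretical").length;
  return `Risk Score: ${riskScore.toFixed(1)}/10 | Confirmed: ${confirmed} | Theoretical: ${theoretical}`;
}
```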
3-Workstream Split (5-Hour Build)
Each workstream is assigned to one person. Workstream 2 (Verify-Fix) is demo-critical and must not be split. The clock starts at 10:30 AM and submission is due by 4:30 PM (6 hours total, including a 30-minute buffer).
- Dependency parser (extract versions from package.json, package-lock.json)
- Bright Data scraper (async CVE database scraping, NVD + Exploit-DB)
- CVE matcher (semver matching of deps to CVEs)
- Report builder (JSON output)
- API: POST /api/scan, GET /api/scan/:id
- Bright Data API key + sandbox proof
- Demo repo (Node.js, intentional vulns)
- Zod for validation
- Daytona orchestrator: Spawn isolated Node.js containers, install vulnerable packages, run PoCs
- Exploit runner: Inject PoC scripts (HTTP/2 Rapid Reset + Prototype Pollution), capture output/errors
- Verification logic: Success detection (exploit output patterns, exit codes)
- Nosana integration: Call LLM to generate patches, validate syntax
- Fallback: Nosana cold start > 60s → show "unavailable" (don't fall back to Claude)
- Retry logic: Sandbox crash → retry once, mark as "Verification Failed" on 2nd failure
Prototype Pollution RCE: overrides properties on the shared Node.js Object/Array prototypes. PoC: pollute the prototype through input to a vulnerable package, then execute code via Object.defineProperty. Verify: evidence of code execution appears in stdout.
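
To illustrate the pollution and verification steps described above, here is a harmless, self-contained sketch. It is not one of the pre-written PoC scripts: `unsafeMerge` is a deliberately vulnerable stand-in for a vulnerable package, and the marker string, probe key, and `classifyRun` rule (marker in stdout plus exit code 0) are assumptions. The sketch stops at detecting pollution and does not attempt code execution.

```typescript
// Deliberately unsafe recursive merge, standing in for a vulnerable package.
function unsafeMerge(target: any, source: any): any {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value !== null && typeof value === "object") {
      if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
      unsafeMerge(target[key], value); // follows "__proto__" into Object.prototype
    } else {
      target[key] = value;
    }
  }
  return target;
}

const MARKER = "CODEPROBE_POLLUTED"; // hypothetical marker the verifier looks for
const PROBE_KEY = "codeprobeProbe"; // hypothetical property name

// Runs the pollution attempt and reports whether a fresh object was affected.
function runPollutionCheck(): { exploitable: boolean } {
  // JSON.parse creates an own "__proto__" key, which the unsafe merge then walks.
  const payload = JSON.parse(`{"__proto__": {"${PROBE_KEY}": "${MARKER}"}}`);
  unsafeMerge({}, payload);
  const fresh: Record<string, unknown> = {};
  const exploitable = fresh[PROBE_KEY] === MARKER;
  delete (Object.prototype as any)[PROBE_KEY]; // clean up the shared prototype
  return { exploitable };
}

// Assumed success rule: marker present in stdout and a clean exit code.
function classifyRun(stdout: string, exitCode: number): "Confirmed Exploitable" | "Theoretical Risk" {
  return exitCode === 0 && stdout.includes(MARKER) ? "Confirmed Exploitable" : "Theoretical Risk";
}
```

In a sandbox run, the PoC script would print the marker only when the probe sees the polluted property, and the exploit runner would pass the captured stdout and exit code to `classifyRun`.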
- Daytona API key + sandbox pool tested
- Nosana GPU container or API key
- PoC scripts for both CVEs (pre-written)
- Demo repo with vulnerable packages installed
- Workstream 1 output (CVE list)
- Bun CLI entry point: `codeprobe scan <repo>`
- Real-time progress output (colors, progress indicator)
- Terminal table: CVE list with exploit status (✓ Confirmed / ✗ Theoretical)
- Risk score + business impact messaging
- Config file: ~/.codeprobe/config (API keys, auth token)
- Optional: `codeprobe scan --fix` (generate patches, push to branch)
- React + Vite (TailwindCSS)
- Technical view: CVE table, severity, package version, PoC evidence, patch diffs
- Executive view: Risk score (0–10 gauge), $4.9M impact callout, count of confirmed/theoretical
- View switcher (tabs or button)
- Scan metadata: timestamp, duration, repo URL
- Known limitation: Unlisted URL, no authentication (MVP)
- GitHub App (webhook for PR events)
- OAuth flow for CLI: `codeprobe init` (opens browser, user authorizes)
- PR comment: scan results table + business impact
- "Auto-fix available" button → triggers patch generation → opens new PR
- New PR title: `[CodeProbe] Fix 2 confirmed CVEs (PR #123)`
- CLI: Bun binary with Chalk colors + terminal output
- Dashboard: Vite React app (served static or S3)
- Bot: GitHub App handler (webhook receiver + PR commenter)
- Workstream 1 + 2 modules (loaded as libraries)
- GitHub App credentials (pre-created)
- Demo repo URL (for test PR)
- S3 bucket or local file storage for scan results
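
The CVE matcher item in Workstream 1 (semver matching of dependencies to CVEs) might look like the sketch below. It is a simplified illustration: the `Advisory` shape, the package name, and the placeholder CVE identifier in the usage note are hypothetical, versions are assumed to be exact `major.minor.patch` strings as found in package-lock.json, and pre-release tags and range operators are ignored. A production build would more likely use an established semver library.

```typescript
type Version = [number, number, number];

// Parses "1.2.3" or "v1.2.3"; anything after the patch number is ignored.
function parseVersion(raw: string): Version {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(raw.trim());
  if (!match) throw new Error(`Unsupported version string: ${raw}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Hypothetical advisory record: affected range is [introduced, fixed).
interface Advisory {
  cveId: string;
  packageName: string;
  introduced: string;
  fixed: string;
}

// Returns the advisories whose affected range contains the installed version.
function matchAdvisories(deps: Record<string, string>, advisories: Advisory[]): Advisory[] {
  return advisories.filter((adv) => {
    const installed = deps[adv.packageName];
    if (installed === undefined) return false;
    const version = parseVersion(installed);
    return (
      compareVersions(version, parseVersion(adv.introduced)) >= 0 &&
      compareVersions(version, parseVersion(adv.fixed)) < 0
    );
  });
}
```

For example, with `deps = { "example-merge": "1.2.3" }` and an advisory for `example-merge` covering `1.0.0` up to (not including) `1.2.4`, the advisory is returned; bumping the installed version to `1.2.4` removes it.
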
5-Hour Timeline
⏱️ Time Crunch Cuts (If Running Behind)
- Hour 4.5, 15 min behind: Skip `--fix` flag. Keep `scan` only.
- Hour 4.5, 30 min behind: Skip GitHub bot auto-fix PR creation. Keep PR comment only.
- Hour 4.5, 45 min behind: Skip Executive view dashboard. Show Technical view only + static $4.9M message.
- Hour 5, any time: Use pre-recorded demo video.
Demo Success Criteria
Must Have (Demo Will Not Work Without)
- Working CLI: `codeprobe scan <demo-repo>` executes end-to-end
- Live Bright Data CVE scraping (or cached fallback shown)
- Daytona sandbox spawning + exploit execution visible
- At least 1 confirmed exploitable CVE shown (✓ status)
- Patch generated + displayed (from Nosana, or example)
- Terminal output is colored, readable, impressive
Should Have (Strong Demo)
- GitHub bot PR comment (real OAuth, real webhook)
- Dashboard view: both Technical + Executive visible
- Business impact messaging: "$4.9M breach cost" clearly stated
- Risk score displayed (0–10 gauge or number)
- 2 confirmed exploitable CVEs (both demo CVEs)
Nice to Have (Impressive Demo)
- Auto-fix PR creation (working branch push + PR open)
- Dashboard responsive on mobile (judges may view on phone)
- Dark mode polish (smooth transitions, accent colors)
- Real-time progress animation during scan
Known Limitations (Documented for Judges)
| Aspect | MVP Scope | Reason |
|---|---|---|
| Node.js Only | Python, Rust, Java support cut | 5-hour MVP; N-language support is post-launch work |
| Dashboard Auth | Unlisted URL (no login required) | OAuth adds 1–2 hours; acceptable for internal demo |
| Test Coverage | Manual demo only; no automated tests | Full test suite = 20+ hours; hackathon prioritizes demo |
| Patch Validation | Nosana output trusted; no re-exploit verification | Validation loop adds 2–3 hours; accepted risk for demo |
| MCP Server | Skipped | Nice-to-have; CLI + Bot demonstrate agent integration |
| CI/CD Action | Skipped | Time trade-off; bot + dashboard show multi-interface capability |
| Nosana Cold Start | If > 60s, show "unavailable" (no Claude fallback) | Transparent about limitations; pure Nosana branding |
Risks & Fallback Plans
| Risk | Likelihood | Fallback |
|---|---|---|
| Bright Data rate-limited | Medium | Pre-cache demo repo CVE data. Show "Using cached CVE data" message. |
| Daytona sandbox timeout | Low | Retry once. If still fails, mark CVE as "Verification Failed". Continue scan. |
| Nosana cold start > 60s | Medium | Show "Nosana unavailable" gracefully. Display example patch instead. |
| GitHub OAuth fails | Low | Manual token input. Users provide PAT (Personal Access Token). |
| Demo CVE PoC fails | Low | Pre-record 2-minute working video. Play video if live fails. |
| Dashboard React build fails | Low | Show static HTML screenshot. Skip interactive demo. |
| Time overrun (30+ min behind) | Medium | Use pre-recorded video + screenshots of full system. Focus demo on exploit execution. |
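
The Daytona fallback in the table above (retry once, then mark the CVE as "Verification Failed" and continue the scan) can be sketched as follows. The `runInSandbox` callback is a hypothetical stand-in for the real sandbox call: it is assumed to resolve to `true` when the exploit succeeds, `false` when it does not, and to reject when the sandbox crashes or times out.

```typescript
type Verification = "Confirmed Exploitable" | "Theoretical Risk" | "Verification Failed";

// Tries the sandbox run up to `maxAttempts` times (default 2 = one retry).
async function verifyWithRetry(
  runInSandbox: () => Promise<boolean>,
  maxAttempts = 2,
): Promise<Verification> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const exploited = await runInSandbox();
      return exploited ? "Confirmed Exploitable" : "Theoretical Risk";
    } catch {
      // Sandbox crash or timeout: fall through and retry if attempts remain.
    }
  }
  // Every attempt crashed; the caller records this and moves on to the next CVE.
  return "Verification Failed";
}
```

Because the function never throws, one unstable sandbox cannot abort the rest of the scan.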