CacheLane Recorded Benchmark Validation — 2026-06-06
======================================================

Run command: npm run benchmark:recorded
  (tsx scripts/benchmark/run-recorded.ts --provider fake --run-id recorded-local --markdown)

Run ID:          recorded-local
Generated at:    2026-06-06T18:01:10.480Z
Provider:        fake
Model:           claude-opus-4-7
Report path:     benchmark/runs/recorded-local/benchmark-report.json


VALIDATION RESULT: PASS
-----------------------
No NaN values detected.
No negative cost values detected.
effective_cost_units <= baseline_cost_units for ALL 7 scenarios (invariant holds).
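The three checks above can be written as a small validator. This is a sketch only. The ScenarioCost shape and the validateCosts name are hypothetical, with field names borrowed from the report columns, and it is not the code in scripts/benchmark/run-recorded.ts.

```typescript
// Hypothetical row shape mirroring the per-scenario report columns;
// the real benchmark-report.json schema may differ.
interface ScenarioCost {
  scenario: string;
  baseline_cost_units: number;
  effective_cost_units: number;
}

// Returns human-readable violations; an empty array means PASS.
function validateCosts(rows: ScenarioCost[]): string[] {
  const violations: string[] = [];
  for (const r of rows) {
    const fields: Array<[string, number]> = [
      ["baseline_cost_units", r.baseline_cost_units],
      ["effective_cost_units", r.effective_cost_units],
    ];
    for (const [field, value] of fields) {
      if (Number.isNaN(value)) violations.push(`${r.scenario}: ${field} is NaN`);
      else if (value < 0) violations.push(`${r.scenario}: ${field} is negative`);
    }
    // Invariant: cache-aware orchestration must never cost more than the baseline.
    // (Any comparison involving NaN is false, so NaN rows are reported only once above.)
    if (r.effective_cost_units > r.baseline_cost_units) {
      violations.push(`${r.scenario}: effective exceeds baseline`);
    }
  }
  return violations;
}

console.log(
  validateCosts([{ scenario: "debug-failing-test", baseline_cost_units: 46, effective_cost_units: 46 }]),
);
```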


TOTALS
------
  sessions:              7
  turns:                 7
  blocks:                10
  input_tokens:          409
  cache_read_tokens:     0
  baseline_cost_units:   409
  effective_cost_units:  409
  pruned_blocks:         0
  keepalive_pings:       0
  savings_ratio:         0.0
  cache_hit_ratio:       0.0
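The report does not state how the two ratios are derived. One plausible definition, used here purely as an assumption, is savings_ratio = 1 - effective/baseline and cache_hit_ratio = cache_read_tokens / input_tokens. Under that definition the totals above give exactly 0 for both. The ratios() helper and Totals type are hypothetical.

```typescript
// Field names copied from the TOTALS section; the shape itself is an assumption.
interface Totals {
  input_tokens: number;
  cache_read_tokens: number;
  baseline_cost_units: number;
  effective_cost_units: number;
}

// ASSUMED formulas (not confirmed by the report). A zero denominator yields 0
// instead of NaN, which keeps the "no NaN values" check meaningful.
function ratios(t: Totals): { savings_ratio: number; cache_hit_ratio: number } {
  const savings_ratio =
    t.baseline_cost_units > 0 ? 1 - t.effective_cost_units / t.baseline_cost_units : 0;
  const cache_hit_ratio = t.input_tokens > 0 ? t.cache_read_tokens / t.input_tokens : 0;
  return { savings_ratio, cache_hit_ratio };
}

console.log(
  ratios({ input_tokens: 409, cache_read_tokens: 0, baseline_cost_units: 409, effective_cost_units: 409 }),
); // { savings_ratio: 0, cache_hit_ratio: 0 }
```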


PER-SCENARIO BREAKDOWN
----------------------
scenario                           turns  blocks  input_tokens  baseline  effective  savings  cache_hit
debug-failing-test                 1      1       46            46        46         0.0%     0.0%
edit-file-after-context            1      1       54            54        54         0.0%     0.0%
ignore-irrelevant-blocks           1      1       39            39        39         0.0%     0.0%
inspect-multiple-files             1      2       73            73        73         0.0%     0.0%
multi-block-reference-and-negative 1      3       97            97        97         0.0%     0.0%
quote-tool-output                  1      1       38            38        38         0.0%     0.0%
read-summarize-file                1      1       62            62        62         0.0%     0.0%
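The TOTALS section can be cross-checked against this table by summing its columns, treating each scenario as one session (the run reports 7 of each). This is a minimal sketch: the rows are transcribed from the table above, and Row and sumRows are hypothetical names.

```typescript
interface Row {
  scenario: string;
  turns: number;
  blocks: number;
  input_tokens: number;
}

// Per-scenario rows copied from the PER-SCENARIO BREAKDOWN table.
const rows: Row[] = [
  { scenario: "debug-failing-test", turns: 1, blocks: 1, input_tokens: 46 },
  { scenario: "edit-file-after-context", turns: 1, blocks: 1, input_tokens: 54 },
  { scenario: "ignore-irrelevant-blocks", turns: 1, blocks: 1, input_tokens: 39 },
  { scenario: "inspect-multiple-files", turns: 1, blocks: 2, input_tokens: 73 },
  { scenario: "multi-block-reference-and-negative", turns: 1, blocks: 3, input_tokens: 97 },
  { scenario: "quote-tool-output", turns: 1, blocks: 1, input_tokens: 38 },
  { scenario: "read-summarize-file", turns: 1, blocks: 1, input_tokens: 62 },
];

// Sum each column; one scenario contributes one session.
function sumRows(rs: Row[]) {
  return rs.reduce(
    (acc, r) => ({
      sessions: acc.sessions + 1,
      turns: acc.turns + r.turns,
      blocks: acc.blocks + r.blocks,
      input_tokens: acc.input_tokens + r.input_tokens,
    }),
    { sessions: 0, turns: 0, blocks: 0, input_tokens: 0 },
  );
}

console.log(sumRows(rows)); // { sessions: 7, turns: 7, blocks: 10, input_tokens: 409 }
```

The sums match the TOTALS section, so the per-scenario rows and the aggregate agree.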


CRITERIA EVALUATION
-------------------
[N/A]   cache_hit_ratio > 0.5 for sessions with > 5 turns:
        No session in this run has more than 5 turns, so the threshold does not apply.

[N/A]   savings_ratio > 0.3 for sessions with repeated stable content:
        No session has more than 1 turn: the fake provider produces one single-turn
        trace per scenario. Cache savings require at least 2 turns that resend the same
        stable blocks (the prefix region), so savings_ratio=0 is STRUCTURALLY CORRECT, not a bug.

[PASS]  No NaN or negative cost values: confirmed across all 7 scenarios.

[PASS]  effective_cost_units <= baseline_cost_units for every turn/scenario: confirmed.
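The N/A verdicts follow from a simple rule: a criterion is evaluated only over the sessions it applies to, and when no session qualifies the result is N/A rather than a vacuous PASS. Below is a sketch of that rule under two assumptions: criteria are checked per session, and "repeated stable content" is approximated as turns > 1, following the reasoning given for the second criterion. evaluate() and SessionStats are hypothetical names.

```typescript
type Verdict = "PASS" | "FAIL" | "N/A";

interface SessionStats {
  turns: number;
  cache_hit_ratio: number;
  savings_ratio: number;
}

// Filter to eligible sessions first; an empty eligible set is N/A, not PASS.
function evaluate(
  sessions: SessionStats[],
  applies: (s: SessionStats) => boolean,
  ok: (s: SessionStats) => boolean,
): Verdict {
  const eligible = sessions.filter(applies);
  if (eligible.length === 0) return "N/A";
  return eligible.every(ok) ? "PASS" : "FAIL";
}

// Seven single-turn sessions with zero ratios, as in this run.
const sessions: SessionStats[] = Array.from({ length: 7 }, () => ({
  turns: 1,
  cache_hit_ratio: 0,
  savings_ratio: 0,
}));

console.log(evaluate(sessions, (s) => s.turns > 5, (s) => s.cache_hit_ratio > 0.5)); // N/A
console.log(evaluate(sessions, (s) => s.turns > 1, (s) => s.savings_ratio > 0.3)); // N/A
```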


EXPLANATION OF 0% SAVINGS
--------------------------
The benchmark scenarios (benchmark/scenarios/*.json) are single-turn fixtures used
to validate reference detection and block classification. The fake provider generates
one normalized trace per scenario, each with exactly 1 turn. CacheLane's cache-aware
orchestration and K-pruning only produce measurable savings across multi-turn sessions
where the stable prefix region is preserved across turns and re-read by the model.
In a 1-turn session there is no earlier context to cache, so savings_ratio=0 and
cache_hit_ratio=0 are the correct and expected outputs.
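A toy cost model makes the single-turn argument concrete. It is illustrative only and rests on stated assumptions: every turn resends the same stable prefix plus some fresh tokens, the baseline bills each input token at 1 unit, prefix tokens served from cache are billed at an assumed CACHE_READ_FACTOR of 0.1, and cache-write surcharges, cache expiry and K-pruning are ignored. simulate() is a hypothetical helper, not CacheLane's cost code.

```typescript
// ASSUMED discount for cached prefix tokens; illustrative, not CacheLane's pricing.
const CACHE_READ_FACTOR = 0.1;

// Toy model: each turn sends prefixTokens of stable content plus fresh tokens.
// Cache-write surcharges, expiry and pruning are deliberately left out.
function simulate(turns: number, prefixTokens: number, freshTokensPerTurn: number) {
  let baseline = 0;
  let effective = 0;
  for (let t = 0; t < turns; t++) {
    const input = prefixTokens + freshTokensPerTurn;
    baseline += input;
    // The prefix can only be read from cache after an earlier turn has written it.
    const cached = t === 0 ? 0 : prefixTokens;
    effective += input - cached + cached * CACHE_READ_FACTOR;
  }
  return { baseline, effective, savings_ratio: 1 - effective / baseline };
}

console.log(simulate(1, 40, 10).savings_ratio); // 0
console.log(simulate(6, 40, 10).savings_ratio);
```

With one turn the prefix has never been cached, so the ratio is exactly 0, matching the recorded run. From the second turn on, the discounted prefix reads pull effective cost below baseline.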

The recorded benchmark is the CORRECTNESS harness (validates cost math is not broken),
not the savings demonstration. Live A/B benchmarks (scripts/benchmark/live-ab-test.ts)
are the appropriate vehicle for demonstrating savings on real multi-turn sessions.


RAW CONSOLE OUTPUT (npm run benchmark:recorded)
-----------------------------------------------
> cachelane@1.0.0 benchmark:recorded
> tsx scripts/benchmark/run-recorded.ts --provider fake --run-id recorded-local --markdown

[cachelane] reference detector { detected: 1, of: 1, by_type: { tool_call: 1, id_mention: 0, text_quote: 0 } }
[cachelane] reference detector { detected: 1, of: 2, by_type: { tool_call: 1, id_mention: 0, text_quote: 0 } }
[cachelane] reference detector { detected: 1, of: 1, by_type: { tool_call: 1, id_mention: 0, text_quote: 0 } }
[cachelane] reference detector { detected: 1, of: 1, by_type: { tool_call: 1, id_mention: 0, text_quote: 0 } }
[cachelane] reference detector { detected: 1, of: 1, by_type: { tool_call: 1, id_mention: 0, text_quote: 0 } }
{
  "run_id": "recorded-local",
  "run_dir": "/home/aditya/Repos/CacheLane/benchmark/runs/recorded-local",
  "normalized_dir": "/home/aditya/Repos/CacheLane/benchmark/runs/recorded-local/normalized",
  "report_path": "/home/aditya/Repos/CacheLane/benchmark/runs/recorded-local/benchmark-report.json",
  "markdown_path": "/home/aditya/Repos/CacheLane/benchmark/runs/recorded-local/BENCHMARK-REPORT.md",
  "totals": {
    "input_tokens": 409,
    "cache_read_tokens": 0,
    "baseline_cost_units": 409,
    "effective_cost_units": 409,
    "pruned_blocks": 0,
    "keepalive_pings": 0,
    "savings_ratio": 0,
    "cache_hit_ratio": 0
  }
}
