# Multi-Round Audit Report: cs-ai/research-harnesses

Generated: 2026-06-05T10:03:39+00:00

## Verdict: needs work

## Submission readiness

- Status: blocked
- Requirement mode: draft audit with final-readiness blockers
- Blocker: 2 evidence rows have unclear source depth; mark each as full-text verified or preliminary.
- Blocker: 2 preliminary-linked claims remain; do not promote them to final support.
- Blocker: Brief mixes supported and preliminary-linked claims; make final vs draft language explicit.
- Blocker: Paper mentions 'peer-reviewed' without an explicit disclaimer that the draft itself is not peer-reviewed.
- Blocker: 1 full-text supported row does not surface its source_quote in the paper body: 2603.28589v1.
- Blocker: Demo proof score 0.75 below required floor 0.8 (6/8 claims independently re-verified against cached PDFs).
- Blocker: Claim not independently re-verified: 2504.08066v1 (overlap=1.0, substring=True).
- Blocker: Claim not independently re-verified: 2506.01372v2 (overlap=1.0, substring=False).
- Blocker: correctness detail: 2504.08066v1: missing source_quote/page/checked_at
- Blocker: correctness detail: 2506.01372v2: missing source_quote/page/checked_at

## Audit artifacts

- Run directory: `workspace/cs-ai/research-harnesses/audit_runs/2026-06-05T10-03-39-00-00`
- Per-round JSON: `round_1.json` ... `round_13.json`
- Input hashes: `input_hashes.json`

| Round | Check | Verdict | Issues | Warnings |
| ---: | --- | --- | ---: | ---: |
| 1 | Ledger integrity | pass | 0 | 0 |
| 2 | Evidence depth and numerical discipline | needs work | 0 | 1 |
| 3 | Paper quality and framing | pass | 0 | 0 |
| 4 | Coverage, taxonomy leakage, and missing-literature risk | pass | 0 | 0 |
| 5 | Claim calibration and submission readiness | needs work | 0 | 2 |
| 6 | Positive-signal floor | pass | 0 | 0 |
| 7 | Academic format and scholarly correctness | needs work | 0 | 2 |
| 8 | Demo and proof (independent re-verification) | needs work | 0 | 3 |
| 9 | Direction coherence (anti-boilerplate-leak) | pass | 0 | 0 |
| 10 | Research-value (gap/contradiction/surprise/recency) | pass | 0 | 0 |
| 11 | System correctness (claim→quote→PDF) | needs work | 0 | 2 |
| 12 | Cross-model reviewer committee | pass | 0 | 0 |
| 13 | Citation integrity (cited→cached metadata) | pass | 0 | 0 |
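
The per-round results in the table above can be rolled up into a single verdict. The following is a minimal sketch, not the harness's actual aggregation logic: the `SEVERITY` ordering and the "worst round wins" rule are assumptions, and `ROUNDS` is copied by hand from the table rather than read from the `round_*.json` files, whose schema this report does not show.

```python
from collections import Counter

# (round, verdict, warnings) copied by hand from the summary table.
ROUNDS = [
    (1, "pass", 0), (2, "needs work", 1), (3, "pass", 0), (4, "pass", 0),
    (5, "needs work", 2), (6, "pass", 0), (7, "needs work", 2),
    (8, "needs work", 3), (9, "pass", 0), (10, "pass", 0),
    (11, "needs work", 2), (12, "pass", 0), (13, "pass", 0),
]

# Assumed ordering: a higher number is a worse verdict.
SEVERITY = {"pass": 0, "needs work": 1, "unsupported": 2}


def overall_verdict(rounds):
    """Return the worst verdict across all rounds (assumed rule)."""
    return max((verdict for _, verdict, _ in rounds), key=SEVERITY.__getitem__)


counts = Counter(verdict for _, verdict, _ in ROUNDS)
total_warnings = sum(warnings for _, _, warnings in ROUNDS)

print(overall_verdict(ROUNDS), dict(counts), total_warnings)
```

Under these assumptions the 13 rounds reduce to `needs work`, and the 10 warnings match the 10 blockers listed under submission readiness.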

## Evidence profile

- Filled evidence rows: 8
- Full-text verified rows: 6
- Preliminary / abstract-derived rows: 0
- Source-depth unclear rows: 2

## Round details

### Round 1: Ledger integrity — pass

**Notes**

- claim_rows=8
- supported_claims=6
- preliminary_claims=2
- filled_evidence_rows=8
- This round checks structure and status calibration: supported means full-text verified; preliminary-linked means traceable draft evidence.

### Round 2: Evidence depth and numerical discipline — needs work

**Warnings**

- 2 evidence rows have unclear source depth; mark each as full-text verified or preliminary.

**Notes**

- full_text_verified=6/8
- preliminary_or_abstract=0/8
- unclear_source_depth=2/8

### Round 3: Paper quality and framing — pass

**Notes**

- paper=workspace/cs-ai/research-harnesses/paper/main.md
- finding_sections=3
- filled_evidence_rows=8

### Round 4: Coverage, taxonomy leakage, and missing-literature risk — pass

**Notes**

- triage_rows=8
- claimed_evidence_rows=8
- target_categories=cs.AI, cs.LG, cs.SE
- Coverage gaps still require human/domain reviewer search beyond arXiv metadata.

### Round 5: Claim calibration and submission readiness — needs work

**Warnings**

- 2 preliminary-linked claims remain; do not promote them to final support.
- Brief mixes supported and preliminary-linked claims; make final vs draft language explicit.

**Notes**

- claim_rows=8
- supported_claims=6
- preliminary_claims=2
- draft_only_claims=0
- unsupported_claims=0

### Round 6: Positive-signal floor — pass

**Notes**

- numeric_result_rows=7/8 (floor=2)
- comparative_rows=8/8 (floor=1)
- unique_cited_papers=8 (floor=3)
- correctness_score=0.793 (floor=0.5)
- novelty_score=0.979 (floor=0.35)

### Round 7: Academic format and scholarly correctness — needs work

**Warnings**

- Paper mentions 'peer-reviewed' without an explicit disclaimer that the draft itself is not peer-reviewed.
- 1 full-text supported row does not surface its source_quote in the paper body: 2603.28589v1.

**Notes**

- paper=workspace/cs-ai/research-harnesses/paper/main.md
- abstract_words=131
- total_words=3836
- references_listed=8
- missing_format_sections=none

### Round 8: Demo and proof (independent re-verification) — needs work

**Warnings**

- Demo proof score 0.75 below required floor 0.8 (6/8 claims independently re-verified against cached PDFs).
- Claim not independently re-verified: 2504.08066v1 (overlap=1.0, substring=True).
- Claim not independently re-verified: 2506.01372v2 (overlap=1.0, substring=False).
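
The warnings above report an `overlap` value and a `substring` flag for each claim. The report does not define either metric, so the sketch below is only one plausible reading: `overlap` as the fraction of quote tokens found anywhere in the source text, and `substring` as whether the normalized quote appears contiguously. The function names and the tokenization rule are hypothetical.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; punctuation and whitespace are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap(quote: str, source: str) -> float:
    """Fraction of quote tokens that occur anywhere in the source (assumed metric)."""
    quote_tokens = tokens(quote)
    if not quote_tokens:
        return 0.0
    source_tokens = set(tokens(source))
    return sum(t in source_tokens for t in quote_tokens) / len(quote_tokens)


def is_substring(quote: str, source: str) -> bool:
    """True if the normalized quote appears contiguously in the normalized source."""
    quote_tokens = tokens(quote)
    if not quote_tokens:
        return False
    # Pad with spaces so matches start and end on token boundaries.
    needle = " " + " ".join(quote_tokens) + " "
    haystack = " " + " ".join(tokens(source)) + " "
    return needle in haystack
```

Under this reading, `overlap=1.0` with `substring=False` is possible: every word of the quote occurs in the source, but not in the same contiguous order.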

**Notes**

- demo=workspace/cs-ai/research-harnesses/paper/demo.py
- proof=workspace/cs-ai/research-harnesses/paper/proof.json
- proof_score=0.75
- passed=6/8
- verdict=pass
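
The arithmetic behind the proof score is simple enough to check by hand. This is a minimal sketch assuming the score is the plain ratio of re-verified claims to total claims, which is consistent with the reported 6/8 and 0.75; the function names are illustrative.

```python
def proof_score(passed: int, total: int) -> float:
    """Ratio of independently re-verified claims to all claims."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


def meets_floor(score: float, floor: float) -> bool:
    """True when the score reaches the required floor."""
    return score >= floor


score = proof_score(6, 8)
print(score, meets_floor(score, 0.8))
```

With 8 claims, at least 7 must be re-verified to reach the 0.8 floor, since 7/8 is 0.875.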

### Round 9: Direction coherence (anti-boilerplate-leak) — pass

**Notes**

- direction_id=unknown
- family=agents
- keywords_checked=0
- keyword_hits=0
- cross_family_leaks=0

### Round 10: Research-value (gap/contradiction/surprise/recency) — pass

**Notes**

- value_score=0.944 threshold=0.35
- gap_count=71 contradictions=30 surprises=23 recent_papers=5/8
- components gap=1.0 contradiction=1.0 surprise=1.0 recency=0.625

### Round 11: System correctness (claim→quote→PDF) — needs work

**Warnings**

- correctness detail: 2504.08066v1: missing source_quote/page/checked_at
- correctness detail: 2506.01372v2: missing source_quote/page/checked_at

**Notes**

- correctness_score=0.793 floor=0.55
- rows_scored=8
- pdfs_missing=0
- quote_in_pdf_avg=1.0
- claim_support_avg=0.478
- locator_present_avg=0.75
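
The two warnings in this round name three locator fields: `source_quote`, `page`, and `checked_at`. The sketch below shows one way such a check could work, assuming each evidence row is a dictionary and that `locator_present_avg` is the share of rows with all three fields filled. That reading matches the reported 0.75 (6 of 8 rows), but the row layout and the placeholder values are hypothetical.

```python
REQUIRED_FIELDS = ("source_quote", "page", "checked_at")


def missing_locators(row: dict) -> list[str]:
    """Names of required locator fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]


def locator_present_avg(rows: list[dict]) -> float:
    """Share of rows with every locator field filled (assumed definition)."""
    if not rows:
        return 0.0
    return sum(not missing_locators(r) for r in rows) / len(rows)


# Six placeholder rows with all fields set, plus the two rows flagged above.
rows = [
    {"id": f"placeholder-{i}", "source_quote": "...", "page": 1,
     "checked_at": "2026-06-05"}
    for i in range(6)
]
rows += [{"id": "2504.08066v1"}, {"id": "2506.01372v2"}]

for row in rows:
    gaps = missing_locators(row)
    if gaps:
        print(f"{row['id']}: missing {'/'.join(gaps)}")
print(locator_present_avg(rows))
```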

### Round 12: Cross-model reviewer committee — pass

**Notes**

- LLM disabled — cross-model jury skipped (deterministic baseline).

### Round 13: Citation integrity (cited→cached metadata) — pass

**Notes**

- citations_checked=8
- fabricated=0 year_mismatch=0 title_drift=0
- cached_corpus_size=8
- This round is deterministic: it cross-checks printed citations against cached arXiv metadata only.
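
A deterministic cross-check of this kind could look like the sketch below. It is an illustration only: the record fields (`arxiv_id`, `year`, `title`), the 0.9 similarity threshold, and the use of `difflib.SequenceMatcher` to detect title drift are all assumptions, not the harness's documented behavior.

```python
from difflib import SequenceMatcher


def check_citation(cited: dict, cache: dict, drift_threshold: float = 0.9) -> str:
    """Classify one printed citation against cached metadata (hypothetical schema)."""
    meta = cache.get(cited["arxiv_id"])
    if meta is None:
        return "fabricated"  # no cached record for this identifier
    if cited["year"] != meta["year"]:
        return "year_mismatch"
    similarity = SequenceMatcher(
        None, cited["title"].lower(), meta["title"].lower()
    ).ratio()
    if similarity < drift_threshold:
        return "title_drift"
    return "ok"


# Hypothetical cache entry; the title is a placeholder, not the real paper title.
cache = {"2504.08066v1": {"year": 2025, "title": "Placeholder Title For Illustration"}}
cited = {"arxiv_id": "2504.08066v1", "year": 2025,
         "title": "Placeholder Title for Illustration"}
print(check_citation(cited, cache))
```

Because the check reads only cached metadata, it can confirm that a citation matches a real record but says nothing about whether the cited paper supports the claim.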

## Interpretation

- `pass` means the deterministic audit found no structural, source-depth, taxonomy, or paper-quality warnings.
- `needs work` means the draft is traceable but still needs full-text verification, source-depth labeling, taxonomy cleanup, or quality cleanup.
- `unsupported` means claims or required artifacts are missing or inconsistent enough to block trust.
- `submission_readiness=blocked` means the draft must not be treated as final or deployed as ready, even if it is useful as a transparent draft.
