# Multi-Round Audit Report: medicine-bio/medical-ai

Generated: 2026-06-06T10:02:53+00:00

## Verdict: needs work

## Submission readiness

- Status: blocked
- Requirement mode: draft audit with final-readiness blockers
- Blocker: 2 evidence row(s) have unclear source depth; mark them full-text verified or preliminary.
- Blocker: 2 preliminary-linked claim(s) remain; do not promote to final support.
- Blocker: Brief mixes supported and preliminary-linked claims; make final vs draft language explicit.
- Blocker: 1 full-text supported row does not surface its source_quote in the paper body: 2109.02722v2.
- Blocker: Demo proof score 0.714 below required floor 0.8 (5/7 claims independently re-verified against cached PDFs).
- Blocker: Claim not independently re-verified: 1803.08691v1 (overlap=1.0, substring=False).
- Blocker: Claim not independently re-verified: 2212.08228v2 (overlap=1.0, substring=False).
- Blocker: correctness detail: 1803.08691v1: missing source_quote/page/checked_at
- Blocker: correctness detail: 2212.08228v2: missing source_quote/page/checked_at

## Audit artifacts

- Run directory: `workspace/medicine-bio/medical-ai/audit_runs/2026-06-06T10-02-53-00-00`
- Per-round JSON: `round_1.json` ... `round_13.json`
- Input hashes: `input_hashes.json`
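
The report names `input_hashes.json` but does not say how the hashes are produced. The following is a minimal sketch of one way such a file could be built and re-checked, assuming SHA-256 over raw file bytes and a flat path-to-digest mapping. The function names and the mapping layout are hypothetical, not the audit tool's actual format.

```python
import hashlib
from pathlib import Path


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def hash_inputs(paths: list[str]) -> dict[str, str]:
    """Map each existing input path to the digest of its contents.

    Assumption: missing files are skipped rather than raising.
    """
    digests = {}
    for p in sorted(paths):
        path = Path(p)
        if path.is_file():
            digests[p] = sha256_bytes(path.read_bytes())
    return digests


def hashes_match(recorded: dict[str, str], current: dict[str, str]) -> bool:
    """True when every recorded input still has the same digest."""
    return all(current.get(key) == digest for key, digest in recorded.items())
```

Comparing a freshly computed mapping against the recorded one with `hashes_match` would show whether any audited input changed after the run.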

| Round | Check | Verdict | Issues | Warnings |
| ---: | --- | --- | ---: | ---: |
| 1 | Ledger integrity | pass | 0 | 0 |
| 2 | Evidence depth and numerical discipline | needs work | 0 | 1 |
| 3 | Paper quality and framing | pass | 0 | 0 |
| 4 | Coverage, taxonomy leakage, and missing-literature risk | pass | 0 | 0 |
| 5 | Claim calibration and submission readiness | needs work | 0 | 2 |
| 6 | Positive-signal floor | pass | 0 | 0 |
| 7 | Academic format and scholarly correctness | needs work | 0 | 1 |
| 8 | Demo and proof (independent re-verification) | needs work | 0 | 3 |
| 9 | Direction coherence (anti-boilerplate-leak) | pass | 0 | 0 |
| 10 | Research-value (gap/contradiction/surprise/recency) | pass | 0 | 0 |
| 11 | System correctness (claim→quote→PDF) | needs work | 0 | 2 |
| 12 | Cross-model reviewer committee | pass | 0 | 0 |
| 13 | Citation integrity (cited→cached metadata) | pass | 0 | 0 |

## Evidence profile

- Filled evidence rows: 7
- Full-text verified rows: 5
- Preliminary / abstract-derived rows: 0
- Source-depth unclear rows: 2

## Round details

### Round 1: Ledger integrity — pass

**Notes**

- claim_rows=7
- supported_claims=5
- preliminary_claims=2
- filled_evidence_rows=7
- This round checks structure and status calibration: supported means full-text verified; preliminary-linked means traceable draft evidence.

### Round 2: Evidence depth and numerical discipline — needs work

**Warnings**

- 2 evidence row(s) have unclear source depth; mark them full-text verified or preliminary.

**Notes**

- full_text_verified=5/7
- preliminary_or_abstract=0/7
- unclear_source_depth=2/7

### Round 3: Paper quality and framing — pass

**Notes**

- paper=workspace/medicine-bio/medical-ai/paper/main.md
- finding_sections=3
- filled_evidence_rows=7

### Round 4: Coverage, taxonomy leakage, and missing-literature risk — pass

**Notes**

- triage_rows=8
- claimed_evidence_rows=7
- target_categories=cs.AI, cs.CV, cs.LG, eess.IV, physics.med-ph
- Coverage gaps still require a search by a human domain reviewer beyond arXiv metadata.

### Round 5: Claim calibration and submission readiness — needs work

**Warnings**

- 2 preliminary-linked claim(s) remain; do not promote to final support.
- Brief mixes supported and preliminary-linked claims; make final vs draft language explicit.

**Notes**

- claim_rows=7
- supported_claims=5
- preliminary_claims=2
- draft_only_claims=0
- unsupported_claims=0

### Round 6: Positive-signal floor — pass

**Notes**

- numeric_result_rows=6/7 (floor=2)
- comparative_rows=7/7 (floor=1)
- unique_cited_papers=7 (floor=3)
- correctness_score=0.794 (floor=0.5)
- novelty_score=0.974 (floor=0.35)
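
The floor check in this round can be sketched as below. The observed values and floors are copied from the notes above. The comparison rule (a metric fails only when it is strictly below its floor) and the helper name `below_floor` are assumptions.

```python
# (observed, floor) pairs copied from the Round 6 notes.
metrics = {
    "numeric_result_rows": (6, 2),
    "comparative_rows": (7, 1),
    "unique_cited_papers": (7, 3),
    "correctness_score": (0.794, 0.5),
    "novelty_score": (0.974, 0.35),
}


def below_floor(metrics):
    """Return the names of metrics whose observed value is under the floor.

    Assumption: meeting the floor exactly counts as passing.
    """
    return [name for name, (observed, floor) in metrics.items() if observed < floor]


failures = below_floor(metrics)
round_verdict = "pass" if not failures else "needs work"
```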

### Round 7: Academic format and scholarly correctness — needs work

**Warnings**

- 1 full-text supported row does not surface its source_quote in the paper body: 2109.02722v2.

**Notes**

- paper=workspace/medicine-bio/medical-ai/paper/main.md
- abstract_words=127
- total_words=3467
- references_listed=7
- missing_format_sections=none

### Round 8: Demo and proof (independent re-verification) — needs work

**Warnings**

- Demo proof score 0.714 below required floor 0.8 (5/7 claims independently re-verified against cached PDFs).
- Claim not independently re-verified: 1803.08691v1 (overlap=1.0, substring=False).
- Claim not independently re-verified: 2212.08228v2 (overlap=1.0, substring=False).

**Notes**

- demo=workspace/medicine-bio/medical-ai/paper/demo.py
- proof=workspace/medicine-bio/medical-ai/paper/proof.json
- proof_score=0.714
- passed=5/7
- verdict=pass
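
The proof score above is the fraction of re-verified claims, 5/7 ≈ 0.714, which falls short of the 0.8 floor. The two failing claims report `overlap=1.0` with `substring=False`. One plausible reading, which the report does not confirm, is that every token of the quote occurs somewhere in the PDF text while the quote is not found as one exact contiguous string (for example because of a line break). A minimal sketch under that assumption follows; the tokenizer, function names, and sample strings are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens (an assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap(quote, source):
    """Fraction of quote tokens that appear anywhere in the source."""
    quote_tokens = tokens(quote)
    if not quote_tokens:
        return 0.0
    source_tokens = set(tokens(source))
    return sum(t in source_tokens for t in quote_tokens) / len(quote_tokens)


def is_substring(quote, source):
    """Exact, contiguous match with no whitespace normalization."""
    return quote in source


def proof_score(results):
    """Share of claims that passed re-verification, rounded to 3 places."""
    return round(sum(results) / len(results), 3)


# Placeholder text, not taken from any cited paper: a line break in the
# extracted source defeats the exact match but not the token overlap.
source = "alpha beta\ngamma delta"
quote = "alpha beta gamma"

overlap = token_overlap(quote, source)   # every quote token is present
exact = is_substring(quote, source)      # newline vs. space breaks the match
score = proof_score([True] * 5 + [False] * 2)
meets_floor = score >= 0.8
```

If this reading holds, normalizing whitespace on both sides before the substring test would be one way to reconcile the two signals.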

### Round 9: Direction coherence (anti-boilerplate-leak) — pass

**Notes**

- direction_id=unknown
- family=medical-ai
- keywords_checked=0
- keyword_hits=0
- cross_family_leaks=0

### Round 10: Research-value (gap/contradiction/surprise/recency) — pass

**Notes**

- value_score=0.869 threshold=0.35
- gap_count=71 contradictions=30 surprises=23 recent_papers=1/8
- components gap=1.0 contradiction=1.0 surprise=1.0 recency=0.125
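
The notes do not state how the four components combine into `value_score`. The sketch below assumes a weighted sum with weights that total 1.0. Because three components equal 1.0, only the recency weight is constrained by the reported numbers: a value of about 0.15 reproduces 0.869. The split of the remaining 0.85 across the other components is arbitrary and purely illustrative.

```python
# Component scores copied from the Round 10 notes; recency is 1 of 8 papers.
components = {"gap": 1.0, "contradiction": 1.0, "surprise": 1.0, "recency": 1 / 8}

# ASSUMPTION: hypothetical weights. Only recency=0.15 is implied by the
# reported score; the other three are an arbitrary split of 0.85.
weights = {"gap": 0.35, "contradiction": 0.25, "surprise": 0.25, "recency": 0.15}


def value_score(components, weights):
    """Weighted sum of component scores, rounded to 3 decimal places."""
    return round(sum(weights[name] * components[name] for name in weights), 3)


score = value_score(components, weights)
passes = score >= 0.35  # threshold copied from the notes
```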

### Round 11: System correctness (claim→quote→PDF) — needs work

**Warnings**

- correctness detail: 1803.08691v1: missing source_quote/page/checked_at
- correctness detail: 2212.08228v2: missing source_quote/page/checked_at

**Notes**

- correctness_score=0.794 floor=0.55
- rows_scored=7
- pdfs_missing=0
- quote_in_pdf_avg=1.0
- claim_support_avg=0.502
- locator_present_avg=0.714
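
The two warnings and `locator_present_avg=0.714` are consistent with 5 of 7 rows carrying a complete locator. A minimal sketch, assuming a row counts as present only when `source_quote`, `page`, and `checked_at` are all filled; the row layout and the placeholder rows are hypothetical, and only the two flagged IDs come from the report.

```python
REQUIRED_FIELDS = ("source_quote", "page", "checked_at")


def missing_locator_fields(row):
    """Return the required locator fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]


def locator_present_avg(rows):
    """Share of rows with every locator field filled, rounded to 3 places."""
    complete = sum(1 for row in rows if not missing_locator_fields(row))
    return round(complete / len(rows), 3)


# Five placeholder rows stand in for the rows that passed (their real IDs
# and contents are not reproduced here); the last two are the flagged rows.
rows = [
    {"id": f"placeholder-{i}", "source_quote": "placeholder", "page": 1,
     "checked_at": "placeholder"}
    for i in range(1, 6)
] + [
    {"id": "1803.08691v1"},
    {"id": "2212.08228v2"},
]

flagged = {row["id"]: missing_locator_fields(row)
           for row in rows if missing_locator_fields(row)}
avg = locator_present_avg(rows)
```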

### Round 12: Cross-model reviewer committee — pass

**Notes**

- LLM disabled — cross-model jury skipped (deterministic baseline).

### Round 13: Citation integrity (cited→cached metadata) — pass

**Notes**

- citations_checked=7
- fabricated=0 year_mismatch=0 title_drift=0
- cached_corpus_size=8
- This round is deterministic: it cross-checks printed citations against cached arXiv metadata only.
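
A deterministic cross-check of this kind can be sketched as follows. The record layout, the similarity measure (`difflib.SequenceMatcher` on normalized titles), and the 0.9 drift threshold are all assumptions; the report names only the three counters.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(title.lower().split())


def check_citations(cited, cache, drift_threshold=0.9):
    """Count citations that disagree with cached metadata.

    cited: list of dicts with "id", "year", "title".
    cache: dict mapping id -> {"year", "title"}.
    """
    counts = {"fabricated": 0, "year_mismatch": 0, "title_drift": 0}
    for citation in cited:
        meta = cache.get(citation["id"])
        if meta is None:
            # No cached record at all: treated as a fabricated citation.
            counts["fabricated"] += 1
            continue
        if citation["year"] != meta["year"]:
            counts["year_mismatch"] += 1
        similarity = SequenceMatcher(
            None, normalize(citation["title"]), normalize(meta["title"])
        ).ratio()
        if similarity < drift_threshold:
            counts["title_drift"] += 1
    return counts


# Placeholder records for illustration only.
cache = {"id-a": {"year": 2020, "title": "Example Title One"}}
clean = check_citations(
    [{"id": "id-a", "year": 2020, "title": "example  title one"}], cache
)
```

A check like this only confirms agreement with the cache; it says nothing about whether the cited paper supports the claim, which is what Rounds 8 and 11 address.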

## Interpretation

- `pass` means the deterministic audit found no structural, source-depth, taxonomy, or paper-quality warnings.
- `needs work` means the draft is traceable but still needs full-text verification, source-depth labeling, taxonomy cleanup, or quality cleanup.
- `unsupported` means claims or required artifacts are missing or inconsistent enough to block trust.
- `submission_readiness=blocked` means the draft must not be treated as final or deployed as ready, even if it is useful as a transparent draft.
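
The report does not state how the per-round verdicts roll up into the overall verdict. One rule consistent with the results above is "worst round wins", with readiness blocked whenever any blocker is listed. The sketch below assumes that rule; the severity ordering and the `ready` label are hypothetical, while the round verdicts and the blocker count are copied from this report.

```python
# Assumed severity ordering, from least to most severe.
SEVERITY = {"pass": 0, "needs work": 1, "unsupported": 2}

# Verdicts copied from the round summary table.
round_verdicts = {
    1: "pass", 2: "needs work", 3: "pass", 4: "pass", 5: "needs work",
    6: "pass", 7: "needs work", 8: "needs work", 9: "pass", 10: "pass",
    11: "needs work", 12: "pass", 13: "pass",
}


def overall_verdict(verdicts):
    """Return the most severe verdict among all rounds."""
    return max(verdicts.values(), key=SEVERITY.__getitem__)


def submission_readiness(blocker_count):
    """Blocked if any blocker exists; 'ready' is a hypothetical label."""
    return "blocked" if blocker_count > 0 else "ready"


verdict = overall_verdict(round_verdicts)
readiness = submission_readiness(9)  # nine blockers are listed in this report
```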
