Evidence-ledger draft

Research paper library

Library of generated evidence-ledger research papers.

Automated research factory

From a research direction to an auditable paper, end-to-end.

Pick a direction (or paste your own). The harness then fetches candidate papers from arXiv, triages them, builds an evidence ledger, verifies every quote against the full-text PDF, runs a 5-round audit, and drafts the paper. Every claim links back to a source paper ID and a verified quote.
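As a rough illustration of what "every claim links back to a source paper ID and a verified quote" could look like in data, here is a minimal sketch of a ledger row. The `EvidenceRow` class, its field names, and the `supported_claims` helper are assumptions made for this example; the site does not publish its actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceRow:
    """One ledger row: a claim tied to a source paper and a quote (hypothetical schema)."""
    claim: str
    paper_id: str               # identifier of the source paper
    quote: str                  # exact text pulled from the PDF
    page: Optional[int] = None  # where the quote was found, if known
    verified: bool = False      # set once the quote is matched against the full text


def supported_claims(rows):
    """Keep only rows with a source ID, a non-empty quote, and a verified match."""
    return [r for r in rows if r.paper_id and r.quote.strip() and r.verified]


rows = [
    EvidenceRow("Claim A", "paper-001", "quoted sentence", page=3, verified=True),
    EvidenceRow("Claim B", "paper-002", "another sentence", verified=False),
    EvidenceRow("Claim C", "", "orphan quote", verified=True),
]
print([r.claim for r in supported_claims(rows)])  # ['Claim A']
```

Under this reading, a claim counts as supported only when all three links (source ID, quote, verification) are present; a row missing any one of them would be held back.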

Papers: 11 (in this release)
Submission ready: 7 (passed 5-round audit)
Supported claims: 47 (linked to source IDs)
Full-text quotes: 51 (verified against PDF)
Avg novelty: 0.97 (0–1, vs prior productions)
Avg correctness: 0.72 (0–1, quote ↔ PDF ↔ claim)
Avg demo proof: 0.77 (0–1, claims independently re-verified)
Avg research value: 0.43 (0–1, gap × contradiction × surprise × recency)
Publishable: 2 (product-health clean)
Avg first TTV: 0s (time to first evidence ledger)
Avg draft TTV: 2s (time to paper draft)
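The four averaged scores in this summary are consistent with a plain arithmetic mean over the eleven cards in the paper library. The sketch below recomputes them from the per-card figures shown on this page. The simple-mean assumption and the `averages` helper are mine; the site may aggregate unrounded values.

```python
from statistics import mean

# (novelty, correctness, proof, value) per card, copied from the paper library.
cards = {
    "auto-architectures": (0.97, 0.89, 1.00, 0.91),
    "deep-llm-layer-redundancy": (0.97, 0.14, 0.00, 0.14),
    "liquid-neural-networks": (0.98, 0.91, 1.00, 0.89),
    "research-harnesses": (0.98, 0.79, 0.75, 0.94),
    "rope-positional-bias-interaction": (0.96, 0.16, 0.00, 0.37),
    "transformer-attention": (0.97, 0.85, 1.00, 0.09),
    "transformer-ffn-moe": (0.97, 0.84, 1.00, 0.06),
    "transformer-normalization": (0.98, 0.88, 1.00, 0.12),
    "transformer-position-encoding": (0.97, 0.86, 1.00, 0.27),
    "transformer-residual": (0.98, 0.85, 1.00, 0.03),
    "medical-ai": (0.97, 0.79, 0.71, 0.87),
}


def averages(cards):
    """Column-wise mean of the score tuples, rounded to two decimals."""
    columns = zip(*cards.values())
    return tuple(round(mean(col), 2) for col in columns)


print(averages(cards))  # (0.97, 0.72, 0.77, 0.43)
```

The result matches the displayed Avg novelty, Avg correctness, Avg demo proof, and Avg research value. The two blocked cards with no verified quotes (correctness 0.14 and 0.16, proof 0.00) account for most of the gap between the average correctness of 0.72 and the 0.84 to 0.91 range of the ready papers.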

What this site automates

  1. Direction: Pick a built-in research direction or submit a custom one with keywords and a quality bar.
  2. Fetch: Query arXiv via direction-specific keywords; cache PDFs locally.
  3. Triage: Score relevance, dedupe by fingerprint, keep top candidates.
  4. Evidence ledger: Extract core claims, key results, and metric snapshots per paper.
  5. Full-text verify: Re-open each PDF, pull the exact source quote and page/section.
  6. 5-round audit: Cross-check claim ↔ evidence ↔ taxonomy fit; block on weak links.
  7. Draft paper: Render markdown paper, claim ledger, brief, and audit report.
  8. Publish: Build static site with reader pages, evidence CSV, and progress timeline.
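The full-text verify step can be sketched as a normalized substring search over page text. This is only an illustration under stated assumptions: the page text is taken as already extracted from the PDF (the extraction tool is not named on this page), and `normalize` and `verify_quote` are hypothetical helper names.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Fold Unicode forms, rejoin words hyphenated across line breaks, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)  # "incor-\nporating" -> "incorporating"
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def verify_quote(quote, pages):
    """Return the 1-based page number containing the quote, or None if not found."""
    needle = normalize(quote)
    if not needle:
        return None
    for number, page_text in enumerate(pages, start=1):
        if needle in normalize(page_text):
            return number
    return None


pages = [
    "Introduction.\nWe study agents.",
    "Future work could explore incor-\nporating noise-handling strategies.",
]
print(verify_quote("explore incorporating noise-handling", pages))  # 2
print(verify_quote("a sentence that is not in the paper", pages))   # None
```

One known limitation of this sketch: a genuinely hyphenated compound that happens to break at a line end would also be joined, so a stricter verifier might try both the joined and the hyphenated form before reporting a miss.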

Paper library

Grouped by discipline / category. Cards sorted by research-value score (highest first), then readiness, then recency.
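The card ordering described above amounts to a three-part sort key. The sketch below assumes that "readiness" means ready cards come before blocked ones and that "recency" means newest first; the dictionary layout and `sort_key` are illustrative, and the four sample cards reuse figures from this page (applied across groups here purely for demonstration).

```python
from datetime import date

cards = [
    {"slug": "research-harnesses", "value": 0.94, "ready": False, "date": date(2026, 6, 1)},
    {"slug": "auto-architectures", "value": 0.91, "ready": True, "date": date(2026, 6, 1)},
    {"slug": "transformer-residual", "value": 0.03, "ready": True, "date": date(2026, 5, 15)},
    {"slug": "liquid-neural-networks", "value": 0.89, "ready": True, "date": date(2026, 6, 1)},
]


def sort_key(card):
    # Negate value and ordinal date so higher/newer sorts first;
    # `not ready` is False for ready cards, which sorts before True (blocked).
    return (-card["value"], not card["ready"], -card["date"].toordinal())


ordered = sorted(cards, key=sort_key)
print([c["slug"] for c in ordered])
# ['research-harnesses', 'auto-architectures', 'liquid-neural-networks', 'transformer-residual']
```

Because research value is the primary key, a blocked draft with a high value score (research-harnesses, 0.94) still sorts ahead of ready papers with lower scores.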

cs-ai/auto-architectures

1 paper · 1 ready for submission

ad-hoc

Automated Evidence-Ledger Production of Research Papers

ready · audit: pass · 6 evidence rows · 6 full-text verified · novelty: 0.97 · correctness: 0.89 · proof: 1.00 (6/6 pass) · value: 0.91 · TTV: 0s · product: publishable · 2026-06-01

Systems in Open Problems in Pre- And Post-Ln Architectures: An Evidence-Ledger Investigation increasingly promise longer-horizon, higher-autonomy workflows, but their outputs are difficult to trust when claims are not explicitly tied to evi…

cs-ai/deep-llm-layer-redundancy

1 paper · 0 ready for submission

auto-deep-llms-layer

Layer Redundancy and Depth Utilization in Deep LLMs: An Evidence Ledger

blocked · audit: needs work · 5 evidence rows · 0 full-text verified · novelty: 0.97 · correctness: 0.14 · proof: 0.00 (0/5 fail) · value: 0.14 · TTV: 1s · product: needs-evidence · 2026-05-21

Modern deep LLMs accumulate dozens of transformer layers, yet evidence repeatedly shows large fractions of layers are skippable, prunable, or contribute marginally — but the conditions, depth-vs-quality tradeoffs, and architecture-specific…

cs-ai/liquid-neural-networks

1 paper · 1 ready for submission

ad-hoc

Automated Evidence-Ledger Production of Research Papers

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.91 · proof: 1.00 (5/5 pass) · value: 0.89 · TTV: 0s · product: publishable · 2026-06-01

Systems in Liquid Neural Networks and Continuous-Time Models increasingly promise longer-horizon, higher-autonomy workflows, but their outputs are difficult to trust when claims are not explicitly tied to evidence. This draft synthesizes ta…

cs-ai/research-harnesses

1 paper · 0 ready for submission

ad-hoc

Automated Evidence-Ledger Production of Research Papers

blocked · audit: needs work · 8 evidence rows · 8 full-text verified · novelty: 0.98 · correctness: 0.79 · proof: 0.75 (6/8 pass) · value: 0.94 · TTV: 0s · product: useful-draft · 2026-06-01

Systems in Autonomous Research Harnesses increasingly promise longer-horizon, higher-autonomy workflows, but their outputs are difficult to trust when claims are not explicitly tied to evidence. This draft synthesizes taxonomy-scoped eviden…

cs-ai/rope-positional-bias-interaction

1 paper · 0 ready for submission

auto-especially-rope

RoPE × Implicit Positional Bias: An Evidence Ledger on Interaction Effects

blocked · audit: needs work · 5 evidence rows · 0 full-text verified · novelty: 0.96 · correctness: 0.16 · proof: 0.00 (0/5 fail) · value: 0.37 · TTV: 1s · product: needs-evidence · 2026-05-21

RoPE has become the default positional encoding for modern LLMs, yet recent papers report that explicit rotary signals interact unpredictably with implicit positional biases from attention, normalization, and massive activations — producing…

cs-ai/transformer-attention

1 paper · 1 ready for submission

custom-transformer-attention-variants

Evidence-Ledger Synthesis of Attention Variants and Linear Attention in LLMs

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.85 · proof: 1.00 (5/5 pass) · value: 0.09 · TTV: 0s · product: useful-draft · 2026-05-15

Attention has fragmented into MHA/MQA/GQA, sparse and windowed variants (DSA/SWA/CSA), linear and state-space variants (Mamba, Delta, KDA), and hybrid attention; claims about quality parity, throughput, and long-context behavior are scattered and not directly comparable…

cs-ai/transformer-ffn-moe

1 paper · 1 ready for submission

custom-transformer-ffn-and-moe

Evidence-Ledger Synthesis of FFN and Mixture-of-Experts in LLMs

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.84 · proof: 1.00 (5/5 pass) · value: 0.06 · TTV: 0s · product: useful-draft · 2026-05-15

FFN dominates parameter count in transformer LLMs, and MoE/DeepSeek MoE/shared-expert variants now claim large efficiency wins, but sparse vs dense comparisons mix training-cost, serving-cost, and quality claims in inconsistent ways. This d…

cs-ai/transformer-normalization

1 paper · 1 ready for submission

custom-transformer-normalization

Evidence-Ledger Synthesis of Transformer Normalization Variants

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.88 · proof: 1.00 (5/5 pass) · value: 0.12 · TTV: 0s · product: useful-draft · 2026-05-15

Transformer normalization has fragmented into LayerNorm, RMSNorm, Pre-Norm, Post-Norm, and QK-Norm variants, with overlapping stability and training-speed claims that are not directly comparable. This draft synthesizes taxonomy-scoped evide…

cs-ai/transformer-position-encoding

1 paper · 1 ready for submission

custom-transformer-position-encoding

Evidence-Ledger Synthesis of Transformer Position Encoding Evolution

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.86 · proof: 1.00 (5/5 pass) · value: 0.27 · TTV: 0s · product: useful-draft · 2026-05-15

Modern LLMs claim long-context capability, but position encoding choices (sinusoidal, RoPE, YaRN, NoPE) are reported with mixed evidence and incompatible benchmarks, making it hard to compare claims about extrapolation and stability. This dra…

cs-ai/transformer-residual

1 paper · 1 ready for submission

custom-transformer-residual-connections

Evidence-Ledger Synthesis of Transformer Residual Connection Variants

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.85 · proof: 1.00 (5/5 pass) · value: 0.03 · TTV: 0s · product: useful-draft · 2026-05-15

Several LLM labs (DeepSeek, Kimi, ByteDance) have recently reported modifications to residual connections in or around the FFN, but the design space is small and the evidence is scattered across system papers and ablations. This draft synthesizes taxonomy-sco…

medicine-bio/medical-ai

1 paper · 0 ready for submission

ad-hoc

Automated Evidence-Ledger Production of Research Papers

blocked · audit: needs work · 7 evidence rows · 7 full-text verified · novelty: 0.97 · correctness: 0.79 · proof: 0.71 (5/7 pass) · value: 0.87 · TTV: 0s · product: useful-draft · 2026-06-01

Systems in Medical AI and Clinical Decision Support increasingly promise longer-horizon, higher-autonomy workflows, but their outputs are difficult to trust when claims are not explicitly tied to evidence. This draft synthesizes taxonomy-sc…

Research frontiers (auto-mined)

Open problems extracted directly from the cached corpus — what the field itself says is unsolved. Highest scientific value lives here.

value: 0.25 · 2 papers admit this is open · 1 gap statement · 0 contradictions

prompt

Future work could explore incorporating common noise-handling strategies into the agent’s prompt to guide its data analysis.

Generated 2026-06-05T10:03:39+00:00. Duplicate production is blocked by direction, normalized title, and evidence-paper fingerprint.
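One way to read the duplicate-blocking rule is as a composite key over the three named fields. The sketch below assumes a production is blocked only when all three match an earlier one; the site may instead block on any single match. The function names, the SHA-256 fingerprint over sorted paper IDs, and the title normalization are all illustrative assumptions.

```python
import hashlib
import re


def normalize_title(title: str) -> str:
    """Lowercase and reduce every run of non-alphanumeric characters to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def evidence_fingerprint(paper_ids) -> str:
    """Order-independent hash of the set of evidence-paper IDs."""
    joined = "\n".join(sorted(set(paper_ids)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def production_key(direction, title, paper_ids):
    """Composite key: direction, normalized title, evidence-paper fingerprint."""
    return (direction.strip().lower(), normalize_title(title), evidence_fingerprint(paper_ids))


seen = set()


def try_register(direction, title, paper_ids) -> bool:
    """Return True if the production is new, False if it duplicates an earlier one."""
    key = production_key(direction, title, paper_ids)
    if key in seen:
        return False
    seen.add(key)
    return True


print(try_register("transformer-residual", "Residual Connection Variants", ["p1", "p2"]))    # True
print(try_register("transformer-residual", "residual  connection variants!", ["p2", "p1"]))  # False
print(try_register("transformer-residual", "Residual Connection Variants", ["p1", "p3"]))    # True
```

Sorting and deduplicating the paper IDs before hashing makes the fingerprint independent of the order in which papers were fetched, so a rerun over the same evidence set produces the same key.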