Evidence-ledger draft

Research paper library

Library of generated evidence-ledger research papers.

Automated research factory

From a research direction to an auditable paper, end-to-end.

Pick a direction (or paste your own) and the harness fetches arXiv papers, triages them, builds an evidence ledger, verifies every quote against the full-text PDF, runs a 5-round audit, and drafts the paper. Every claim links back to a source paper ID and a verified quote.

Papers: 18 (in this release)
Submission ready: 7 (passed 5-round audit)
Supported claims: 51 (linked to source IDs)
Full-text quotes: 86 (verified against PDF)
Avg novelty: 0.97 (0–1, vs prior productions)
Avg correctness: 0.74 (0–1, quote ↔ PDF ↔ claim)
Avg demo proof: 0.71 (0–1, claims independently re-verified)
Avg research value: 0.61 (0–1, gap × contradiction × surprise × recency)
Publishable: 2 (product-health clean)
Avg first TTV: 41s (time to first evidence ledger)
Avg draft TTV: 50s (time to paper draft)
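The proof figure on each card reads as a pass fraction: "0.80 (4/5 pass)" means 4 of 5 claims survived independent re-verification. Recomputing the release average from the 18 cards suggests the headline 0.71 is a per-paper (macro) mean; pooling all 96 re-verified rows would give 69/96 ≈ 0.72 instead. The sketch below checks that arithmetic. `PaperProof`, `macro_average`, and `pooled_ratio` are hypothetical names, and the pass/fail cutoff is an assumption because the page does not state it (any cutoff above 0.20 and at most 0.60 matches every card).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PaperProof:
    """Re-verification outcome for one paper: claims passed vs. claims checked."""
    passed: int
    total: int

    @property
    def score(self) -> float:
        # Proof score = fraction of claims that passed independent re-verification.
        return self.passed / self.total if self.total else 0.0

    def badge(self, threshold: float = 0.5) -> str:
        # ASSUMPTION: the pass/fail cutoff is undocumented; 0.5 is a placeholder
        # that happens to agree with every card in this release.
        status = "pass" if self.score >= threshold else "fail"
        return f"proof: {self.score:.2f} ({self.passed}/{self.total} {status})"


def macro_average(papers: list[PaperProof]) -> float:
    # Each paper counts once, however many claims it has.
    return mean(p.score for p in papers)


def pooled_ratio(papers: list[PaperProof]) -> float:
    # All claims pooled across papers before dividing.
    return sum(p.passed for p in papers) / sum(p.total for p in papers)


# (passed, total) pairs copied from the 18 cards in this release, in page order.
RELEASE = [PaperProof(p, t) for p, t in [
    (6, 6), (1, 5), (4, 5), (3, 5), (4, 5), (0, 5), (5, 5), (4, 5), (3, 5),
    (6, 8), (3, 5), (0, 5), (5, 5), (5, 5), (5, 5), (5, 5), (5, 5), (5, 7),
]]

if __name__ == "__main__":
    print(f"macro average: {macro_average(RELEASE):.2f}")  # 0.71
    print(f"pooled ratio:  {pooled_ratio(RELEASE):.2f}")   # 0.72
    print(RELEASE[1].badge())                               # proof: 0.20 (1/5 fail)
```

The two averages answer different questions: the macro mean asks how a typical paper fares, while the pooled ratio asks how a typical claim fares, and papers with more rows (such as the 8-row harness paper) weigh more in the latter.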

What this site automates

  1. Direction: Pick a built-in research direction or submit a custom one with keywords and a quality bar.
  2. Fetch: Query arXiv via direction-specific keywords; cache PDFs locally.
  3. Triage: Score relevance, dedupe by fingerprint, keep top candidates.
  4. Evidence ledger: Extract core claims, key results, and metric snapshots per paper.
  5. Full-text verify: Re-open each PDF, pull the exact source quote and page/section.
  6. 5-round audit: Cross-check claim ↔ evidence ↔ taxonomy fit; block on weak links.
  7. Draft paper: Render the markdown paper, claim ledger, brief, and audit report.
  8. Publish: Build a static site with reader pages, evidence CSV, and progress timeline.
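Step 5 hinges on matching a stored quote against PDF text despite extraction noise (ligatures, line-break hyphenation, reflowed whitespace). A minimal sketch, assuming the PDF has already been extracted to one string per page (the extraction library is out of scope) and using a hypothetical `LedgerRow` schema; `normalize` and `verify_row` are illustrative names, not the harness's API.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class LedgerRow:
    """One evidence-ledger row (hypothetical schema)."""
    paper_id: str
    claim: str
    quote: str
    page: Optional[int] = None  # 1-based page where the quote was found
    verified: bool = False


def normalize(text: str) -> str:
    """Reduce common PDF-extraction noise before comparing strings."""
    # NFKC folds compatibility characters, e.g. the ligature U+FB01 becomes "fi".
    text = unicodedata.normalize("NFKC", text)
    # Re-join words hyphenated across a line break: "pro-\nposed" -> "proposed".
    # Caveat: this also merges genuine compounds split at a line end.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse whitespace runs and ignore case.
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_row(row: LedgerRow, pages: list[str]) -> LedgerRow:
    """Mark the row verified if its normalized quote occurs on some page."""
    needle = normalize(row.quote)
    row.page, row.verified = None, False
    if not needle:
        return row
    for number, page_text in enumerate(pages, start=1):
        if needle in normalize(page_text):
            row.page, row.verified = number, True
            break
    return row


if __name__ == "__main__":
    # Toy page text, not taken from any real paper.
    pages = ["1 Introduction. We study a toy task.",
             "Our pro-\nposed method converges faster."]
    row = verify_row(LedgerRow("example-paper", "toy claim",
                               "proposed method converges faster"), pages)
    print(row.verified, row.page)  # True 2
```

Exact substring matching after normalization keeps the check strict: a paraphrased quote fails rather than passing on a fuzzy score, which suits an audit step that is meant to block on weak links.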

Paper library

Grouped by discipline / category. Cards sorted by research-value score (highest first), then readiness, then recency.
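That ordering can be read as a single composite sort key. A minimal sketch, assuming "readiness" is the binary ready/blocked status and "recency" is the card date; `Card` and `sort_cards` are hypothetical names, and the sketch orders cards within one category group.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Card:
    title: str
    value: float    # research-value score, 0-1
    ready: bool     # True if the paper passed the 5-round audit
    produced: date  # production date shown on the card


def sort_cards(cards: list[Card]) -> list[Card]:
    """Highest value first; ties go to ready papers, then to the newest."""
    return sorted(
        cards,
        # Negating each component lets one ascending sort express three
        # descending criteria; `not c.ready` puts ready cards (False) first.
        key=lambda c: (-c.value, not c.ready, -c.produced.toordinal()),
    )


if __name__ == "__main__":
    cards = [
        Card("A", 0.88, False, date(2026, 6, 8)),
        Card("B", 0.88, True, date(2026, 6, 1)),
        Card("C", 0.88, False, date(2026, 6, 9)),
        Card("D", 0.97, False, date(2026, 6, 11)),
    ]
    print([c.title for c in sort_cards(cards)])  # ['D', 'B', 'C', 'A']
```

Because scores are shown to two decimals, ties on value are common (several cards in this release share 0.88), so the secondary keys do real work.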

cs-ai/auto-architectures

1 paper · 1 ready for submission

ad-hoc

Automated Evidence-Ledger Production of Research Papers

ready · audit: pass · 6 evidence rows · 6 full-text verified · novelty: 0.97 · correctness: 0.89 · proof: 1.00 (6/6 pass) · value: 0.91 · TTV: 0s · product: publishable · 2026-06-01

Systems in Open Problems in Pre- and Post-LN Architectures: An Evidence-Ledger Investigation increasingly promise longer-horizon, higher-autonomy workflows, but their outputs are difficult to trust when claims are not explicitly tied to evi…

cs-ai/auto-inference

1 paper · 0 ready for submission

auto-during-inference

Open Problems in During Inference: An Evidence-Ledger Investigation

blocked · audit: needs work · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.69 · proof: 0.20 (1/5 fail) · value: 0.88 · TTV: 1m 33s · product: useful-draft · 2026-06-08

Across 41 cached papers, "during inference" is repeatedly flagged as an unresolved area (1 explicit open-problem statement, 2 cross-paper numeric contradictions, 0 surprise/counter-narrative findings). A scoped evidence ledger can separate whi…

cs-ai/auto-optimization

1 paper · 0 ready for submission

auto-optimization

Open Problems in Optimization: An Evidence-Ledger Investigation

blocked · audit: needs work · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.84 · proof: 0.80 (4/5 pass) · value: 0.88 · TTV: 1m 55s · product: useful-draft · 2026-06-09

Across 55 cached papers, "optimization" is repeatedly flagged as an unresolved area (1 explicit open-problem statement, 2 cross-paper numeric contradictions, 1 surprise/counter-narrative finding). A scoped evidence ledger can separate which s…

cs-ai/auto-representation

1 paper · 0 ready for submission

auto-enable-richer-representation

Open Problems in Enable Richer Representation: An Evidence-Ledger Investigation

blocked · audit: needs work · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.76 · proof: 0.60 (3/5 pass) · value: 0.88 · TTV: 1m 22s · product: useful-draft · 2026-06-01

Across 28 cached papers, "enable richer representation" is repeatedly flagged as an unresolved area (2 explicit open-problem statements, 2 cross-paper numeric contradictions, 0 surprise/counter-narrative findings). A scoped evidence ledger can…

cs-ai/auto-scaling

1 paper · 0 ready for submission

auto-would-explore-the-scaling

Open Problems in Would Explore The Scaling: An Evidence-Ledger Investigation

blocked · audit: needs work · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.85 · proof: 0.80 (4/5 pass) · value: 0.97 · TTV: 2m 20s · product: useful-draft · 2026-06-08

Across 3 cached papers, "would explore the scaling" is repeatedly flagged as an unresolved area (6 explicit open-problem statements, 0 cross-paper numeric contradictions, 0 surprise/counter-narrative findings). A scoped evidence ledger can sepa…

cs-ai/deep-llm-layer-redundancy

1 paper · 0 ready for submission

auto-deep-llms-layer

Layer Redundancy and Depth Utilization in Deep LLMs: An Evidence Ledger

blocked · audit: needs work · 5 evidence rows · 0 full-text verified · novelty: 0.97 · correctness: 0.14 · proof: 0.00 (0/5 fail) · value: 0.14 · TTV: 1s · product: needs-evidence · 2026-05-21

Modern deep LLMs accumulate dozens of transformer layers, yet evidence repeatedly shows large fractions of layers are skippable, prunable, or contribute marginally — but the conditions, depth-vs-quality tradeoffs, and architecture-specific…

cs-ai/liquid-neural-networks

1 paper · 1 ready for submission

ad-hoc

Automated Evidence-Ledger Production of Research Papers

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.91 · proof: 1.00 (5/5 pass) · value: 0.89 · TTV: 0s · product: publishable · 2026-06-01

Systems in Liquid Neural Networks and Continuous-Time Models increasingly promise longer-horizon, higher-autonomy workflows, but their outputs are difficult to trust when claims are not explicitly tied to evidence. This draft synthesizes ta…

cs-ai/llm-quantization

1 paper · 0 ready for submission

seed-llm-quantization

Evidence-Ledger Synthesis of Post-Training Quantization for LLMs

blocked · audit: needs work · 5 evidence rows · 5 full-text verified · novelty: 0.96 · correctness: 0.76 · proof: 0.80 (4/5 pass) · value: 0.97 · TTV: 10s · product: useful-draft · 2026-06-11

Post-training quantization (GPTQ, AWQ, weight-only and weight-activation schemes) is widely claimed to compress LLMs to 4 bits with near-lossless accuracy, but papers report quality and speed against incompatible models, calibration sets, and…

cs-ai/peft-methods

1 paper · 0 ready for submission

seed-parameter-efficient-finetuning

Evidence-Ledger Synthesis of Parameter-Efficient Fine-Tuning Methods

blocked · audit: needs work · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.75 · proof: 0.60 (3/5 pass) · value: 0.94 · TTV: 3m 24s · product: useful-draft · 2026-06-07

LLM adaptation increasingly relies on parameter-efficient fine-tuning (LoRA, QLoRA, adapters, prefix/prompt tuning), but papers report accuracy-versus-memory trade-offs against incompatible baselines and benchmarks, making it hard to compare…

cs-ai/research-harnesses

1 paper · 0 ready for submission

ad-hoc

Automated Evidence-Ledger Production of Research Papers

blocked · audit: needs work · 8 evidence rows · 8 full-text verified · novelty: 0.98 · correctness: 0.79 · proof: 0.75 (6/8 pass) · value: 0.94 · TTV: 0s · product: useful-draft · 2026-06-01

Systems in Autonomous Research Harnesses increasingly promise longer-horizon, higher-autonomy workflows, but their outputs are difficult to trust when claims are not explicitly tied to evidence. This draft synthesizes taxonomy-scoped eviden…

cs-ai/retrieval-augmented-generation

1 paper · 0 ready for submission

seed-retrieval-augmented-generation

Evidence-Ledger Synthesis of Retrieval-Augmented Generation

blocked · audit: needs work · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.68 · proof: 0.60 (3/5 pass) · value: 0.88 · TTV: 1m 30s · product: useful-draft · 2026-06-07

Retrieval-augmented generation (RAG) is widely claimed to reduce hallucination and improve factuality of LLMs, but papers report gains against incompatible retrievers, corpora, and evaluation metrics, making it hard to compare which retrieval…

cs-ai/rope-positional-bias-interaction

1 paper · 0 ready for submission

auto-especially-rope

RoPE × Implicit Positional Bias: An Evidence Ledger on Interaction Effects

blocked · audit: needs work · 5 evidence rows · 0 full-text verified · novelty: 0.96 · correctness: 0.16 · proof: 0.00 (0/5 fail) · value: 0.29 · TTV: 1s · product: needs-evidence · 2026-05-21

RoPE has become the default positional encoding for modern LLMs, yet recent papers report that explicit rotary signals interact unpredictably with implicit positional biases from attention, normalization, and massive activations — producing…

cs-ai/transformer-attention

1 paper · 1 ready for submission

custom-transformer-attention-variants

Evidence-Ledger Synthesis of Attention Variants and Linear Attention in LLMs

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.85 · proof: 1.00 (5/5 pass) · value: 0.09 · TTV: 0s · product: useful-draft · 2026-05-15

Attention has fragmented into MHA/MQA/GQA, sparse windowed (DSA/SWA/CSA), linear/state-space (Mamba, Delta, KDA), and hybrid attention; claims about quality parity, throughput, and long-context behavior are scattered and not directly comparable…

cs-ai/transformer-ffn-moe

1 paper · 1 ready for submission

custom-transformer-ffn-and-moe

Evidence-Ledger Synthesis of FFN and Mixture-of-Experts in LLMs

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.84 · proof: 1.00 (5/5 pass) · value: 0.06 · TTV: 0s · product: useful-draft · 2026-05-15

FFN layers dominate the parameter count in transformer LLMs, and MoE/DeepSeek MoE/shared-expert variants now claim large efficiency wins, but sparse-vs-dense comparisons mix training-cost, serving-cost, and quality claims in inconsistent ways. This d…

cs-ai/transformer-normalization

1 paper · 1 ready for submission

custom-transformer-normalization

Evidence-Ledger Synthesis of Transformer Normalization Variants

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.88 · proof: 1.00 (5/5 pass) · value: 0.12 · TTV: 0s · product: useful-draft · 2026-05-15

Transformer normalization has fragmented into LayerNorm, RMSNorm, Pre-Norm, Post-Norm, and QK-Norm variants, with overlapping stability and training-speed claims that are not directly comparable. This draft synthesizes taxonomy-scoped evide…

cs-ai/transformer-position-encoding

1 paper · 1 ready for submission

custom-transformer-position-encoding

Evidence-Ledger Synthesis of Transformer Position Encoding Evolution

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.97 · correctness: 0.86 · proof: 1.00 (5/5 pass) · value: 0.19 · TTV: 0s · product: useful-draft · 2026-05-15

Modern LLMs claim long-context capability, but position encoding choices (sinusoidal, RoPE, YaRN, NoPE) are reported with mixed evidence and incompatible benchmarks, making it hard to compare claims about extrapolation and stability. This dra…

cs-ai/transformer-residual

1 paper · 1 ready for submission

custom-transformer-residual-connections

Evidence-Ledger Synthesis of Transformer Residual Connection Variants

ready · audit: pass · 5 evidence rows · 5 full-text verified · novelty: 0.98 · correctness: 0.85 · proof: 1.00 (5/5 pass) · value: 0.03 · TTV: 0s · product: useful-draft · 2026-05-15

Recent LLM labs (DeepSeek, Kimi, ByteDance) report modifications to residual connections in or around the FFN, but the design space is small and the evidence is scattered across system papers and ablations. This draft synthesizes taxonomy-sco…

medicine-bio/medical-ai

1 paper · 0 ready for submission

ad-hoc

Automated Evidence-Ledger Production of Research Papers

blocked · audit: needs work · 7 evidence rows · 7 full-text verified · novelty: 0.97 · correctness: 0.79 · proof: 0.71 (5/7 pass) · value: 0.87 · TTV: 0s · product: useful-draft · 2026-06-01

Systems in Medical AI and Clinical Decision Support increasingly promise longer-horizon, higher-autonomy workflows, but their outputs are difficult to trust when claims are not explicitly tied to evidence. This draft synthesizes taxonomy-sc…

Research frontiers (auto-mined)

Open problems extracted directly from the cached corpus — what the field itself says is unsolved. Highest scientific value lives here.
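How the miner detects open problems is not documented here. A minimal sketch, assuming simple cue-phrase matching at the sentence level; the cue list, `OPEN_CUES`, and `mine_gaps` are all hypothetical. It returns two of the counts a frontier card displays: papers that admit a topic is open, and total gap statements (contradiction detection is left out).

```python
import re

# ASSUMPTION: illustrative cue phrases only; the real detector is not documented.
OPEN_CUES = re.compile(
    r"\b(remains? (?:an )?open|open (?:problem|question)|remains? unclear|"
    r"not yet (?:well )?understood|left (?:for|to) future work)\b",
    re.IGNORECASE,
)


def mine_gaps(corpus: dict[str, str], topic: str) -> dict[str, int]:
    """Count sentences that mention `topic` alongside an open-problem cue."""
    gap_statements = 0
    admitting_papers = set()
    for paper_id, text in corpus.items():
        # Naive split on sentence-final punctuation; real PDFs need better splitting.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if topic.lower() in sentence.lower() and OPEN_CUES.search(sentence):
                gap_statements += 1
                admitting_papers.add(paper_id)
    return {"papers": len(admitting_papers), "gap_statements": gap_statements}


if __name__ == "__main__":
    corpus = {  # toy text, not taken from any real paper
        "p1": "Hallucination remains an open problem. We propose a metric.",
        "p2": "Why hallucination arises is not yet understood.",
        "p3": "Retrieval reduced hallucination in our tests.",
    }
    print(mine_gaps(corpus, "hallucination"))  # {'papers': 2, 'gap_statements': 2}
```

Cue phrases favor precision over recall: they catch authors who say outright that something is unsolved, which matches the "what the field itself says" framing, but they miss gaps that are only implied.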

value: 0.30 · 2 papers admit this is open · 2 gap statements · 0 contradictions

hallucination

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions.

Generated 2026-06-12T10:06:04+00:00. Duplicate production is blocked by direction, normalized title, and evidence-paper fingerprint.
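The page does not say how the three duplicate checks combine. The library itself lists four ad-hoc productions titled "Automated Evidence-Ledger Production of Research Papers", so a block on the normalized title alone would have stopped three of them; treating direction, title, and evidence fingerprint as one composite key is consistent with that. A minimal sketch under that assumption: `normalize_title`, `evidence_fingerprint`, and `ProductionRegistry` are hypothetical names, and SHA-256 over the sorted paper IDs is an illustrative fingerprint choice.

```python
import hashlib
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, drop accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def evidence_fingerprint(paper_ids: list[str]) -> str:
    """Order-independent hash of the set of source papers."""
    joined = "\n".join(sorted(set(paper_ids)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class ProductionRegistry:
    """Rejects a production whose direction, title, and evidence set all repeat."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, str]] = set()

    def try_register(self, direction: str, title: str, paper_ids: list[str]) -> bool:
        key = (direction, normalize_title(title), evidence_fingerprint(paper_ids))
        if key in self._seen:
            return False  # duplicate: block this production
        self._seen.add(key)
        return True


if __name__ == "__main__":
    registry = ProductionRegistry()
    title = "Automated Evidence-Ledger Production of Research Papers"
    print(registry.try_register("ad-hoc", title, ["paper-a", "paper-b"]))  # True
    # Same title up to casing/punctuation, same papers in another order.
    print(registry.try_register("ad-hoc", title.upper() + "!", ["paper-b", "paper-a"]))  # False
    # Same direction and title, different evidence set.
    print(registry.try_register("ad-hoc", title, ["paper-c"]))  # True
```

Hashing the sorted, deduplicated ID set means reordering or repeating sources cannot slip a duplicate past the check, while a genuinely different evidence base still produces a new key.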