| 2606.03458v1 | End-to-end evaluations of KV-Cache quantization with Variance Normalization (KVarN) on generative benchmarks with substantial improvement over current state-of-the-art in AIME24, MATH500, HumanEval and IFEval. | supported | has evidence row | full-text | At 2.3 average bits per element even with the second scale, KVarN outperforms or matches prior methods, see e.g. | Abstract | in-scope: LLM extractor confirmed direction match | needs work; full-text verified; report=audit_report.md |
| 2512.19206v1 | We find that for effective low-bit KV cache quantization, the precision allocated to a key channel must be determined by two factors: its intrinsic quantization difficulty and its dynamic relevance to the query. | supported | has evidence row | full-text | Compared to BF16 and competitive baselines, MixKVQ pushes the effective bit-width down to 2.70 bits with negligible performance degradation. | 1 Introduction | in-scope: LLM extractor confirmed direction match | needs work; full-text verified; report=audit_report.md |
| 2606.09864v1 | Alignment collapse is real and silent. | preliminary-linked | has evidence row | full-text | Mistral-7B loses 15.2% of its refusals at only 1.03× perplexity, and no universal safe bit-width exists. | Abstract | in-scope: LLM extractor confirmed direction match | needs work; filled but source-depth unclear; report=audit_report.md |
| 2605.17757v1 | We propose OSCAR, an attention-aware calibration framework for ultra-low-bit KV-cache quantization. | supported | has evidence row | full-text | On Qwen3-4B-Thinking-2507 and Qwen3-8B, OSCAR reduces the BF16 accuracy gap to 3.78 and 1.42 points, respectively, while naive rotation INT2 collapses to nearly zero. | Abstract | in-scope: LLM extractor confirmed direction match | needs work; full-text verified; report=audit_report.md |
| 2505.18610v1 | We design progressive quantization and block-wise memory allocation techniques tailored for long-CoT scenarios to fully utilize the memory budget of the target hardware and effectively reduce the cumulative quantization error. | supported | has evidence row | full-text | Extensive experiments on 7B–70B long-CoT LLMs show that PM-KVQ improves reasoning benchmark performance by up to 8% over SOTA baselines under the same memory budget. | 1 | in-scope: LLM extractor confirmed direction match | needs work; full-text verified; report=audit_report.md |