📝 Publications

AAAI 2026 Oral

Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models

Changyue Wang, Weihang Su, Qingyao Ai, Yiqun Liu

Code | HF Model

  • RACE (Reasoning and Answer Consistency Evaluation) is a framework for detecting hallucinations in Large Reasoning Models (LRMs) by jointly analyzing reasoning traces and final answers. Through multi-signal consistency analysis, it achieves robust and generalizable performance across models and datasets. RACE is the first work to show that prior black-box hallucination detection methods break down when applied to LRMs, and it pioneers black-box hallucination detection for this class of models.
EMNLP 2025 Main

Knowledge Editing through Chain-of-Thought

Changyue Wang, Weihang Su, Qingyao Ai, Yichen Tang, Yiqun Liu

Code

  • EditCoT is a novel knowledge editing framework that updates LLMs through iterative chain-of-thought refinement, enabling efficient integration of new knowledge without retraining. It achieves state-of-the-art performance across diverse tasks and languages, offering superior generalization, stability, and effectiveness.
ACL 2025 Findings

Decoupling Reasoning and Knowledge Injection for In-Context Knowledge Editing

Changyue Wang, Weihang Su, Qingyao Ai, Yujia Zhou, Yiqun Liu

Code

  • DecKER is a novel in-context editing framework that decouples reasoning from knowledge injection, mitigating conflicts between updated and original knowledge. By preserving reasoning integrity while efficiently integrating new knowledge, it achieves significant improvements on multi-hop reasoning tasks.
SIGIR-AP 2024

LeKUBE: A Legal Knowledge Update BEnchmark

Changyue Wang, Weihang Su, Yiran Hu, Qingyao Ai, Yueyue Wu, Cheng Luo, Yiqun Liu, Min Zhang, Shaoping Ma

Code

  • LeKUBE is a comprehensive benchmark designed to evaluate knowledge update methods for legal LLMs. It highlights the unique challenges of updating legal knowledge—such as nuanced statutory changes and complex reasoning—revealing a significant gap between current techniques and real-world legal needs.
ACL 2024 Findings

Unsupervised Real-Time Hallucination Detection Based on the Internal States of Large Language Models

Weihang Su*, Changyue Wang*, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, Yiqun Liu

Code

  • MIND is an unsupervised framework that detects hallucinations in LLMs in real time by leveraging their internal states during inference. Alongside MIND, the paper introduces HELM, a comprehensive benchmark for evaluating hallucination detection across diverse models and scenarios.