⛷️ Paper Under Submission
📝 Publications
AAAI 2026 Oral

Changyue Wang, Weihang Su, Qingyao Ai, Yiqun Liu
- RACE (Reasoning and Answer Consistency Evaluation) is a framework for detecting hallucinations in Large Reasoning Models (LRMs) by jointly analyzing both reasoning traces and final answers. It detects inconsistencies and hallucinations through multi-signal analysis, achieving robust and generalizable performance across models and datasets. RACE is the first to reveal that prior black-box hallucination detection methods are fundamentally flawed when applied to Large Reasoning Models (LRMs), and pioneers the direction of black-box hallucination detection for LRMs.
EMNLP 2025 Main

Knowledge Editing through Chain-of-Thought
Changyue Wang, Weihang Su, Qingyao Ai, Yichen Tang, Yiqun Liu
- EditCoT is a novel knowledge editing framework that updates LLMs through iterative chain-of-thought refinement, enabling efficient integration of new knowledge without retraining. It achieves state-of-the-art performance across diverse tasks and languages, offering superior generalization, stability, and effectiveness.
ACL 2025 Findings

Decoupling Reasoning and Knowledge Injection for In-Context Knowledge Editing
Changyue Wang, Weihang Su, Qingyao Ai, Yujia Zhou, Yiqun Liu
- DecKER is a novel in-context editing framework that decouples reasoning from knowledge injection, mitigating conflicts between updated and original knowledge. It achieves significant improvements in multi-hop reasoning by preserving reasoning integrity while efficiently integrating new knowledge.
SIGIR-AP 2024

LeKUBE: A Knowledge Update BEnchmark for Legal Domain
Changyue Wang, Weihang Su, Yiran Hu, Qingyao Ai, Yueyue Wu, Cheng Luo, Yiqun Liu, Min Zhang, Shaoping Ma
- LeKUBE is a comprehensive benchmark designed to evaluate knowledge update methods for legal LLMs. It highlights the unique challenges of updating legal knowledge—such as nuanced statutory changes and complex reasoning—revealing a significant gap between current techniques and real-world legal needs.
ACL 2024 Findings

Unsupervised real-time hallucination detection based on the internal states of large language models
Weihang Su*, Changyue Wang*, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, Yiqun Liu
- MIND is an unsupervised framework that detects hallucinations in LLMs by leveraging their internal states during inference for real-time analysis. Alongside, HELM provides a comprehensive benchmark to evaluate hallucination detection across diverse models and scenarios.
- Parametric Retrieval Augmented Generation, Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, Yiqun Liu. SIGIR 2025
- JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System, Weihang Su, Baoqing Yue, Qingyao Ai, Yiran Hu, Jiaqi Li, Changyue Wang, Kaiyuan Zhang, Yueyue Wu, Yiqun Liu. SIGIR 2025
- Pre-training for Legal Case Retrieval Based on Inter-Case Distinctions, Weihang Su, Qingyao Ai, Yueyue Wu, Anzhe Xie, Changyue Wang, Yixiao Ma, Haitao Li, Zhijing Wu, Yiqun Liu, Min Zhang. ACM TOIS
- Mitigating Entity-Level Hallucination in Large Language Models, Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu. SIGIR-AP 2024