Selected Publications
- Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning
Zidi Xiong, Shan Chen, Himabindu Lakkaraju
ICML 2026
- How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior
Zidi Xiong*, Yuping Lin*, Wenya Xie*, Pengfei He, Zirui Liu, Jiliang Tang, Himabindu Lakkaraju, Zhen Xiang
ACL 2026
Featured by MIT Technology Review China
- Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
Zidi Xiong, Shan Chen, Zhenting Qi, Himabindu Lakkaraju
NeurIPS 2025
- RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li
ICML 2024.
Cited in the Llama Guard 2 model card
- DECODINGTRUST: A Comprehensive Assessment of Trustworthiness in GPT Models
Boxin Wang*, Weixin Chen*, Hengzhi Pei*, Chulin Xie*, Mintong Kang*, Chenhui Zhang*, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li.
NeurIPS 2023.
Oral Presentation
Outstanding Paper Award
For full publications, please see my Google Scholar.