Preprints
Peer-reviewed
- RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li.
ICML 2024.
- BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Zhen Xiang, Fengqing Jiang, Zidi Xiong, Bhaskar Ramasubramanian, Radha Poovendran, Bo Li.
ICLR 2024.
NeurIPS 2023 BUGS workshop Oral Presentation.
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li.
NeurIPS 2023.
Oral Presentation.
Outstanding Paper Award.
- CBD: A Certified Backdoor Detector Based on Local Dominant Probability
Zhen Xiang, Zidi Xiong, Bo Li.
NeurIPS 2023.
- UMD: Unsupervised Model Detection for X2X Backdoor Attacks
Zhen Xiang, Zidi Xiong, Bo Li.
ICML 2023.
- Rethinking the Necessity of Labels in Backdoor Removal
Zidi Xiong, Dongxian Wu, Yifei Wang, Yisen Wang.
ICLR 2023 BANDS workshop.