CMedBench: A Comprehensive Benchmark for Efficient Medical Large Language Models

Authors

  • Shengbo Gao State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China
  • Jinyang Guo State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China; School of Artificial Intelligence, Beihang University, Beijing, China
  • Lixian Su Western University, Ontario, Canada
  • Yifu Ding State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China
  • Shiqiao Gu SenseTime Research, Beijing, China
  • Aishan Liu State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China
  • Yuqing Ma State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China; School of Artificial Intelligence, Beihang University, Beijing, China
  • Zhiwang Zhang NingboTech University, Ningbo, China
  • Xianglong Liu State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i25.39264

Abstract

Large Language Models (LLMs) hold significant potential for enhancing healthcare applications, yet their deployment is hindered by high computational and memory demands. Model compression techniques offer solutions to reduce these demands, but their impact on medical LLMs remains underexplored. In this paper, we introduce CMedBench, the first comprehensive benchmark for evaluating compressed LLMs in medical contexts. CMedBench assesses five core dimensions: Medical Knowledge Ability, Medical Application Ability, Trustworthiness Maintenance, Compression Cross Combination, and Computational Efficiency. Through extensive empirical studies, we analyze the trade-offs between model efficiency and clinical performance across diverse models, datasets, and compression strategies. Our findings highlight critical limitations in current evaluation practices and provide a robust framework for aligning compression strategies with medical requirements. CMedBench serves as a vital resource for researchers and practitioners, guiding the development of efficient, trustworthy, and clinically effective LLMs for healthcare applications.

Published

2026-03-14

How to Cite

Gao, S., Guo, J., Su, L., Ding, Y., Gu, S., Liu, A., … Liu, X. (2026). CMedBench: A Comprehensive Benchmark for Efficient Medical Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(25), 21198–21206. https://doi.org/10.1609/aaai.v40i25.39264

Section

AAAI Technical Track on Machine Learning II