1. Zeng Y, He W, Vasyltsov I, Pang J, Chen L. Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping. AAAI [Internet]. 2023 Jun. 26 [cited 2026 May 25];37(9):11156-63. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/26321