[1] Y. Zeng, W. He, I. Vasyltsov, J. Pang, and L. Chen, “Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping,” AAAI, vol. 37, no. 9, pp. 11156–11163, Jun. 2023.