Zeng, Y., He, W., Vasyltsov, I., Pang, J., & Chen, L. (2023). Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping. Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 11156–11163. https://doi.org/10.1609/aaai.v37i9.26321