Distill BERT to Traditional Models in Chinese Machine Reading Comprehension (Student Abstract)
Recently, unsupervised representation learning has been extremely successful in the field of natural language processing. More and more pre-trained language models are proposed and achieved the most advanced results especially in machine reading comprehension. However, these proposed pre-trained language models are huge with hundreds of millions of parameters that have to be trained. It is quite time consuming to use them in actual industry. Thus we propose a method that employ a distillation traditional reading comprehension model to simplify the pre-trained language model so that the distillation model has faster reasoning speed and higher inference accuracy in the field of machine reading comprehension. We evaluate our proposed method on the Chinese machine reading comprehension dataset CMRC2018 and greatly improve the accuracy of the original model. To the best of our knowledge, we are the first to propose a method that employ the distillation pre-trained language model in Chinese machine reading comprehension.