MONICA2: Mobile Neural Voice Command Assistants towards Smaller and Smarter


  • Yoonseok Hong, Netmarble AI Center
  • Shounan An, Netmarble AI Center
  • Sunwoo Im, Netmarble AI Center
  • Jaegeon Jo, Netmarble AI Center
  • Insoo Oh, Netmarble AI Center



Keywords: On-device ASR, Transformer, Mobile Game, Weight Sharing, Voice Assistants


In this paper, we propose on-device voice command assistants for mobile games that improve the user experience even in hands-busy situations such as driving and cooking. Most current mobile games already consume a large amount of memory (e.g., more than 1GB), so memory usage must be reduced further to integrate voice command systems into mobile clients. This creates a need for an on-device automatic speech recognition system that uses minimal memory and CPU resources. To this end, we apply cross-layer parameter sharing to Conformer and name the resulting model MONICA2, which lowers memory usage for on-device speech recognition. MONICA2 reduces the number of parameters of the deep neural network by 58%, with minimal degradation in recognition accuracy as measured by word error rate on the LibriSpeech benchmark. As an on-device voice command user interface, MONICA2 uses only 12.8MB of mobile memory, and the average inference time for a 3-second voice command is about 30ms, as profiled on a Samsung Galaxy S9. As far as we know, MONICA2 is the most memory-efficient yet accurate on-device speech recognition system, and it could be applied to various applications such as mobile games and IoT devices.
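The idea of cross-layer parameter sharing can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: `ToyLayer`, `build_encoder`, the layer count, the hidden size, and the sharing pattern (layer i reuses block i mod k) are all hypothetical choices made only to show how reusing the same weights at several depths shrinks the number of stored parameters while keeping the network depth unchanged. The toy sizes are not meant to reproduce the 58% figure reported above.

```python
import random


class ToyLayer:
    """Hypothetical stand-in for one encoder block: a dim x dim weight plus a bias."""

    def __init__(self, dim, rng):
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.bias = [0.0] * dim

    def num_params(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

    def forward(self, x):
        # Residual update: y_i = x_i + relu(sum_j W_ij * x_j + b_i)
        out = []
        for row, b, xi in zip(self.weight, self.bias, x):
            h = sum(w * v for w, v in zip(row, x)) + b
            out.append(xi + max(h, 0.0))
        return out


def build_encoder(num_layers, dim, num_unique, seed=0):
    """Stack num_layers blocks backed by only num_unique distinct weight sets."""
    rng = random.Random(seed)
    unique = [ToyLayer(dim, rng) for _ in range(num_unique)]
    # Assumed sharing pattern: layer i reuses unique block i % num_unique.
    return [unique[i % num_unique] for i in range(num_layers)]


def count_unique_params(layers):
    """Count each distinct block once, however many depths reuse it."""
    seen = {}
    for layer in layers:
        seen[id(layer)] = layer.num_params()
    return sum(seen.values())


def encode(layers, x):
    """Run the input through every layer in order; depth is unaffected by sharing."""
    for layer in layers:
        x = layer.forward(x)
    return x


baseline = build_encoder(num_layers=12, dim=8, num_unique=12)  # no sharing
shared = build_encoder(num_layers=12, dim=8, num_unique=4)     # 4 blocks reused 3 times each
reduction = 1 - count_unique_params(shared) / count_unique_params(baseline)
print(f"baseline params: {count_unique_params(baseline)}")
print(f"shared params:   {count_unique_params(shared)}")
print(f"reduction:       {reduction:.1%}")
```

In this sketch both encoders still apply 12 layers per input, so compute per inference is the same; only the number of distinct weight sets held in memory changes. How sharing interacts with accuracy depends on the real architecture and training setup, which this toy does not model.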




How to Cite

Hong, Y., An, S., Im, S., Jo, J., & Oh, I. (2022). MONICA2: Mobile Neural Voice Command Assistants towards Smaller and Smarter. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 13176-13178.