Phoneme Hallucinator: One-Shot Voice Conversion via Set Expansion

Authors

  • Siyuan Shan, Department of Computer Science, University of North Carolina at Chapel Hill
  • Yang Li, Department of Computer Science, University of North Carolina at Chapel Hill
  • Amartya Banerjee, Department of Computer Science, University of North Carolina at Chapel Hill
  • Junier B. Oliva, Department of Computer Science, University of North Carolina at Chapel Hill

DOI:

https://doi.org/10.1609/aaai.v38i13.29411

Keywords:

ML: Deep Generative Models & Autoencoders, ML: Applications, ML: Unsupervised & Self-Supervised Learning, NLP: Speech

Abstract

Voice conversion (VC) aims to alter a person's voice so that it sounds like the voice of another person while preserving the linguistic content. Existing methods face a trade-off between content intelligibility and speaker similarity: methods with higher intelligibility usually have lower speaker similarity, while methods with higher speaker similarity usually require plenty of target speaker voice data to achieve high intelligibility. In this work, we propose a novel method, Phoneme Hallucinator, that achieves the best of both worlds. Phoneme Hallucinator is a one-shot VC model; it adopts a novel model to hallucinate diversified and high-fidelity target speaker phonemes based on just a short target speaker voice sample (e.g., 3 seconds). The hallucinated phonemes are then used to perform neighbor-based voice conversion. Our model is a text-free, any-to-any VC model that requires no text annotations and supports conversion to any unseen speaker. Quantitative and qualitative evaluations show that Phoneme Hallucinator outperforms existing VC methods in both intelligibility and speaker similarity.
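
To make the idea of neighbor-based conversion over an expanded target set concrete, the following is a minimal sketch and not the authors' implementation. It assumes that speech is represented as per-frame feature vectors, that frames are matched by cosine similarity, and that each source frame is replaced by the mean of its k nearest target frames; none of these details are stated in the abstract. The function `expand_set_placeholder` is a hypothetical stand-in that merely resamples and jitters existing frames; the paper's hallucinator is a learned generative model, which is not reproduced here. All names, shapes, and parameter values are illustrative.

```python
import numpy as np

def knn_convert(source_feats, target_set, k=4):
    """Replace each source frame with the mean of its k most similar
    target frames, using cosine similarity (an assumed matching rule)."""
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    tgt = target_set / np.linalg.norm(target_set, axis=1, keepdims=True)
    sim = src @ tgt.T                       # (T, N) cosine similarities
    idx = np.argsort(-sim, axis=1)[:, :k]   # (T, k) indices of nearest target frames
    return target_set[idx].mean(axis=1)     # (T, D) converted frames

def expand_set_placeholder(target_feats, n_new, rng, noise=0.05):
    """HYPOTHETICAL stand-in for the hallucinator: resample existing target
    frames and add small Gaussian jitter. This is NOT the paper's model."""
    picks = rng.integers(0, len(target_feats), size=n_new)
    jitter = noise * rng.standard_normal((n_new, target_feats.shape[1]))
    return np.vstack([target_feats, target_feats[picks] + jitter])

# Synthetic features standing in for real speech representations.
rng = np.random.default_rng(0)
source = rng.standard_normal((50, 16))   # 50 source frames, 16-dim features
target = rng.standard_normal((20, 16))   # a short target utterance: 20 frames

expanded = expand_set_placeholder(target, n_new=200, rng=rng)
converted = knn_convert(source, expanded, k=4)
```

In this sketch the expanded set simply gives the neighbor search more candidate frames to match against, which is the role the abstract assigns to the hallucinated phonemes; converting the resulting features back to a waveform would require a vocoder, which is omitted.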

Published

2024-03-24

How to Cite

Shan, S., Li, Y., Banerjee, A., & Oliva, J. B. (2024). Phoneme Hallucinator: One-Shot Voice Conversion via Set Expansion. Proceedings of the AAAI Conference on Artificial Intelligence, 38(13), 14910-14918. https://doi.org/10.1609/aaai.v38i13.29411

Issue

Section

AAAI Technical Track on Machine Learning IV