Auto Annotation of Linguistic Features for Audio Deepfake Discernment

Authors

  • Kifekachukwu Nwosu (University of Maryland, Baltimore County; Rochester Institute of Technology)
  • Chloe Evered (University of Maryland, Baltimore County; Georgetown University)
  • Zahra Khanjani (University of Maryland, Baltimore County)
  • Noshaba Bhalli (University of Maryland, Baltimore County)
  • Lavon Davis (University of Maryland, Baltimore County)
  • Christine Mallinson (University of Maryland, Baltimore County)
  • Vandana P. Janeja (University of Maryland, Baltimore County)

DOI:

https://doi.org/10.1609/aaaiss.v2i1.27682

Keywords:

Deep Fakes, Auto Annotation, Linguistics

Abstract

We present an innovative approach to auto-annotating Expert Defined Linguistic Features (EDLFs) as subsequences in audio time series to improve audio deepfake discernment. In our prior work, these linguistic features (namely pitch, pause, breath, consonant release bursts, and overall audio quality), labeled by experts on the entire audio signal, have been shown to improve detection of audio deepfakes with AI algorithms. We now expand our approach to pilot a way to auto-annotate the subsequences in the time series that correspond to each EDLF. We developed an ensemble of discords, i.e., anomalies in time series, detected using matrix profiles across multiple discord lengths, to identify multiple types of EDLFs. Working closely with linguistic experts, we evaluated where discords overlapped with EDLFs in the audio signal data. Our ensemble method, which detects discords across multiple discord lengths, achieves much higher accuracy than using any individual discord length to detect EDLFs. With this approach and domain validation, we establish the feasibility of using time series subsequences to capture EDLFs, supplementing annotation by domain experts for improved audio deepfake detection.
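The paper page does not include code, but the core idea in the abstract (discords found with matrix profiles at several subsequence lengths, pooled into an ensemble and compared against expert EDLF labels) can be illustrated with a minimal Python sketch. It uses the stumpy library for matrix profiles; the discord lengths, the top-k count, and the overlap criterion below are illustrative assumptions, not values from the paper.

```python
import numpy as np
import stumpy


def top_discords(signal, m, k=3):
    """Top-k non-overlapping discords for one discord (subsequence) length m.

    A discord is the subsequence whose nearest-neighbor distance in the
    matrix profile is largest, i.e. the most anomalous window.
    """
    mp = stumpy.stump(signal, m)              # column 0 holds matrix profile distances
    dist = mp[:, 0].astype(float)
    picked = []
    for idx in np.argsort(dist)[::-1]:        # most anomalous windows first
        # Enforce an exclusion zone of m samples around already-picked discords.
        if all(abs(int(idx) - p) >= m for p in picked):
            picked.append(int(idx))
        if len(picked) == k:
            break
    return picked


def ensemble_discords(signal, lengths=(1024, 2048, 4096), k=3):
    """Pool discords across several discord lengths into (start, length) spans.

    Lengths are in samples and purely illustrative; the paper's actual
    discord lengths are not given on this page.
    """
    return [(start, m) for m in lengths for start in top_discords(signal, m, k)]


def overlaps(span, label):
    """True if a discord span (start, length) intersects an expert-labeled
    EDLF interval (onset, offset), all in samples."""
    start, m = span
    onset, offset = label
    return start < offset and onset < start + m
```

Under this sketch, a detected span counts as capturing an EDLF when it overlaps an expert-labeled interval, and pooling spans from several lengths lets the ensemble flag both brief events (for example, a consonant release burst) and longer-range anomalies (for example, in pitch or pause) that no single discord length would catch on its own.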

Published

2024-01-22

Section

Assured and Trustworthy Human-centered AI (ATHAI)