Native Speech Processing with LLMs

Aaron Soh

doi:10.1609/aaai.v40i48.42324

Authors

Aaron Soh, Nanyang Technological University, College of Computing and Data Science, Singapore

Abstract

Large Language Models (LLMs) have recently achieved state-of-the-art performance in Automatic Speech Recognition (ASR), surpassing dedicated ASR systems such as Whisper. However, their application to other speech processing tasks, particularly speaker diarisation (SD), remains underexplored. This work proposes extending existing speech-aware LLM architectures with diarisation-specific training and context-based prompting, enabling joint transcription and speaker segmentation of multi-speaker audio. By leveraging the semantic reasoning and multilingual capabilities of pretrained LLMs, the proposed approach aims to improve diarisation accuracy and, in turn, accessibility for assistive technologies and real-time captioning applications that rely on accurate speaker-attributed transcriptions.
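"Joint transcription and segmentation" can be pictured as a single text sequence. In that sequence, every transcribed turn carries a speaker label and time span. The sketch below shows one possible way to serialise speaker-attributed segments into a single target string that an LLM could be trained to emit. It also shows how to parse such output back into segments, and how a context prompt might list the expected speakers. The tag format (`<spk:ID> [start-end] text`), the `Segment` class, and the `build_prompt` wording are hypothetical illustrations. The abstract does not specify the output format or the prompts actually used.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker turn (hypothetical structure, for illustration only)."""
    speaker: str   # speaker identifier, e.g. "A"
    start: float   # turn start time in seconds
    end: float     # turn end time in seconds
    text: str      # transcribed words for this turn


def serialize(segments):
    """Render turns in start-time order as one string in an assumed tag format."""
    parts = []
    for seg in sorted(segments, key=lambda s: s.start):
        parts.append(f"<spk:{seg.speaker}> [{seg.start:.2f}-{seg.end:.2f}] {seg.text}")
    return " ".join(parts)


# Matches one tagged turn; the text runs until the next speaker tag or end of string.
_TURN = re.compile(r"<spk:(\w+)> \[(\d+\.\d+)-(\d+\.\d+)\] (.*?)(?= <spk:|$)")


def parse(output):
    """Recover Segment objects from a string produced in the assumed tag format."""
    return [
        Segment(m.group(1), float(m.group(2)), float(m.group(3)), m.group(4))
        for m in _TURN.finditer(output)
    ]


def build_prompt(expected_speakers):
    """Hypothetical context prompt that tells the model which speaker labels to use."""
    names = ", ".join(expected_speakers)
    return (
        f"Transcribe the audio and label each turn with one of these speakers: {names}. "
        "Format each turn as <spk:ID> [start-end] text."
    )


if __name__ == "__main__":
    turns = [Segment("B", 1.5, 3.0, "hi, how are you?"), Segment("A", 0.0, 1.5, "hello there")]
    target = serialize(turns)
    print(build_prompt(["A", "B"]))
    print(target)
    print(parse(target))
```

Encoding speaker labels and timestamps inline lets a single decoder produce words and speaker segmentation together. Segmentation accuracy can then be scored by parsing the generated text back into segments. This is only one plausible design. Other choices, such as speaker-change tokens without timestamps, would work with the same parse-and-score idea.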

Published

2026-03-14

How to Cite

Soh, A. (2026). Native Speech Processing with LLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41513–41515. https://doi.org/10.1609/aaai.v40i48.42324

Issue

Vol. 40 No. 48: EAAI-26 AI for Education, Model AI Assignments, AAAI-26 Emerging Trends, Doctoral Consortium, Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Undergraduate Consortium