Learning Precise Temporal Point Event Detection with Misaligned Labels

Julien Schroeter; Kirill Sidorov; David Marshall

doi:10.1609/aaai.v35i11.17145

Authors

Julien Schroeter Cardiff University
Kirill Sidorov Cardiff University
David Marshall Cardiff University

DOI:

https://doi.org/10.1609/aaai.v35i11.17145

Keywords:

Time-Series/Data Streams

Abstract

This work addresses the problem of robustly learning precise temporal point event detection when only poorly aligned labels are available for training. While standard (cross-entropy-based) methods work well in noise-free settings, they often fail when labels are unreliable, since they attempt to fit the annotations strictly. A common solution to this drawback is to transform the point prediction problem into a distribution prediction problem. However, we show that this approach raises several issues that negatively affect the robust learning of temporal localization. To overcome these shortcomings, we introduce a simple and versatile training paradigm combining soft localization learning with counting-based sparsity regularization. Unlike its counterparts, our approach allows clear-cut point predictions to be inferred directly in an end-to-end fashion while relaxing the reliance of the training on the exact position of labels. We achieve state-of-the-art performance against standard benchmarks in a number of challenging experiments (e.g., detection of instantaneous events in videos and music transcription) by simply replacing the original loss function with our novel alternative, without any additional fine-tuning.
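The abstract does not give the form of the loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes Gaussian smoothing of both predictions and labels for the soft localization term, and uses a plain squared difference between the predicted mass and the number of labeled events as a stand-in for the counting-based regularizer. All function names, `sigma`, and `alpha` are hypothetical.

```python
import numpy as np


def gaussian_smooth(x, sigma):
    """Convolve a 1-D sequence with a normalized Gaussian kernel."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")


def soft_localization_loss(pred, labels, sigma=2.0):
    """Mean squared error between smoothed predictions and smoothed labels.

    Smoothing both sides makes the loss tolerant to small temporal shifts
    (assumed form, chosen for illustration only).
    """
    diff = gaussian_smooth(pred, sigma) - gaussian_smooth(labels, sigma)
    return float(np.mean(diff ** 2))


def counting_penalty(pred, labels):
    """Squared gap between total predicted mass and the number of events.

    A simplified stand-in for a counting-based regularizer; it does not
    by itself guarantee sparse predictions.
    """
    return float((pred.sum() - labels.sum()) ** 2)


def combined_loss(pred, labels, sigma=2.0, alpha=0.5):
    """Weighted sum of the two terms; alpha is a hypothetical trade-off weight."""
    soft = soft_localization_loss(pred, labels, sigma)
    count = counting_penalty(pred, labels)
    return (1.0 - alpha) * soft + alpha * count


if __name__ == "__main__":
    labels = np.zeros(50)
    labels[10] = 1.0          # annotated event at step 10
    near = np.zeros(50)
    near[11] = 1.0            # prediction off by one step
    far = np.zeros(50)
    far[30] = 1.0             # prediction far from the annotation
    print(combined_loss(near, labels), combined_loss(far, labels))
```

In this sketch a prediction one step away from the annotation is penalized less than one twenty steps away, which is the kind of tolerance to label misalignment the abstract describes.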
