An Application-Agnostic Automatic Target Recognition System Using Vision Language Models

Authors

  • Anthony Palladino, The Charles Stark Draper Laboratory, Inc.
  • Dana Gajewski, The Charles Stark Draper Laboratory, Inc.
  • Abigail Aronica, The Charles Stark Draper Laboratory, Inc.
  • Patryk Deptula, The Charles Stark Draper Laboratory, Inc.
  • Alexander Hamme, The Charles Stark Draper Laboratory, Inc.
  • Seiyoung C. Lee, The Charles Stark Draper Laboratory, Inc.
  • Jeff Muri, The Charles Stark Draper Laboratory, Inc.
  • Todd Nelling, The Charles Stark Draper Laboratory, Inc.
  • Michael A. Riley, The Charles Stark Draper Laboratory, Inc.
  • Brian Wong, The Charles Stark Draper Laboratory, Inc.
  • Margaret Duff, The Charles Stark Draper Laboratory, Inc.

DOI:

https://doi.org/10.1609/aaai.v39i28.35154

Abstract

We present a novel Automatic Target Recognition (ATR) system using open-vocabulary object detection and classification models. A primary advantage of this approach is that target classes can be defined just before runtime by a non-technical end user, using a few natural language text descriptions of the target, a few image exemplars, or both. Nuances in the desired targets can be expressed in natural language, which is useful for unique targets with little or no training data. We also implemented a novel combination of several techniques to improve performance, such as leveraging the additional information in the sequence of overlapping frames to perform tubelet identification (i.e., sequential bounding box matching), bounding box re-scoring, and tubelet linking. Additionally, we developed a technique to visualize the aggregate output of many overlapping frames as a mosaic of the area scanned during the aerial surveillance or reconnaissance, together with a kernel density estimate (or heatmap) of the detected targets. We initially applied this ATR system to the use case of detecting and clearing unexploded ordnance on airfield runways, and we are currently extending our research to other real-world applications.
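
To make the tubelet idea concrete, the following is a minimal sketch of sequential bounding box matching followed by re-scoring. It is not the authors' implementation, which the abstract does not describe in detail. It assumes that detections from overlapping frames have already been registered into a shared coordinate frame, that matching is greedy by intersection-over-union (IoU), and that a tubelet is re-scored as the mean of its per-frame scores. The names `iou`, `Tubelet`, `link_tubelets`, `rescore`, and the `iou_threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical box format: (x1, y1, x2, y2), assumed to be in a shared,
# already-registered coordinate frame across overlapping frames.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass(eq=False)  # compare tubelets by identity, not by contents
class Tubelet:
    boxes: list = field(default_factory=list)
    scores: list = field(default_factory=list)
    last_frame: int = -1


def link_tubelets(frames, iou_threshold=0.5):
    """Greedily chain detections across consecutive frames.

    frames: list (one entry per frame) of lists of (box, score) pairs.
    A detection extends a tubelet only if that tubelet ended in the
    previous frame and the IoU with its last box meets the threshold.
    """
    tubelets = []
    for t, detections in enumerate(frames):
        active = [tb for tb in tubelets if tb.last_frame == t - 1]
        # Give high-confidence detections first pick of the tubelets.
        for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
            best, best_iou = None, iou_threshold
            for tb in active:
                overlap = iou(tb.boxes[-1], box)
                if overlap >= best_iou:
                    best, best_iou = tb, overlap
            if best is None:
                best = Tubelet()  # start a new tubelet
                tubelets.append(best)
            else:
                active.remove(best)  # one detection per tubelet per frame
            best.boxes.append(box)
            best.scores.append(score)
            best.last_frame = t
    return tubelets


def rescore(tubelet: Tubelet) -> float:
    """Assumed re-scoring rule: mean of the per-frame detection scores."""
    return sum(tubelet.scores) / len(tubelet.scores)
```

Under these assumptions, a target seen in several consecutive overlapping frames accumulates evidence in one tubelet, so a single low-confidence frame does not by itself determine its score. A real system would also need to handle frame registration and gaps between sightings, which this sketch leaves out.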

Published

2025-04-11

How to Cite

Palladino, A., Gajewski, D., Aronica, A., Deptula, P., Hamme, A., Lee, S. C., Muri, J., Nelling, T., Riley, M. A., Wong, B., & Duff, M. (2025). An Application-Agnostic Automatic Target Recognition System Using Vision Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 28878-28884. https://doi.org/10.1609/aaai.v39i28.35154