Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks

Authors

  • Alexander Jaus, Karlsruhe Institute of Technology, Karlsruhe, Germany
  • Constantin Marc Seibold, Institute for AI in Medicine (IKIM), University Medicine Essen, Essen, Germany
  • Simon Reiß, Karlsruhe Institute of Technology, Karlsruhe, Germany
  • Zdravko Marinov, Karlsruhe Institute of Technology, Karlsruhe, Germany
  • Keyi Li, Karlsruhe Institute of Technology, Karlsruhe, Germany
  • Zeling Ye, Karlsruhe Institute of Technology, Karlsruhe, Germany
  • Stefan Krieg, Karlsruhe Institute of Technology, Karlsruhe, Germany
  • Jens Kleesiek, Institute for AI in Medicine (IKIM), University Medicine Essen, Essen, Germany
  • Rainer Stiefelhagen, Karlsruhe Institute of Technology, Karlsruhe, Germany

DOI:

https://doi.org/10.1609/aaai.v39i4.32408

Abstract

We present Connected-Component (CC)-Metrics, a novel semantic segmentation evaluation protocol designed to align existing semantic segmentation metrics with a multi-instance detection scenario in which each connected component matters. We motivate this setup with the common medical scenario of semantic metastasis segmentation in full-body PET/CT. We show that existing semantic segmentation metrics are biased towards larger connected components, which contradicts the clinical assessment of scans, in which tumor size and clinical relevance are uncorrelated. To rebalance existing segmentation metrics, we propose evaluating them on a per-component basis, thus giving each tumor the same weight irrespective of its size. To match predictions to ground-truth segments, we employ a proximity-based matching criterion and evaluate common metrics locally at the component of interest. Using this approach, we break free of the biases introduced by large metastases for overlap-based metrics such as Dice or Surface Dice. CC-Metrics also improves distance-based metrics such as Hausdorff Distances, which are uninformative for small changes that do not influence the maximum or 95th percentile, and avoids pitfalls introduced by directly combining counting-based metrics with overlap-based metrics, as is done in Panoptic Quality.
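
The per-component idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading of "proximity-based matching", in which every voxel is assigned to its nearest ground-truth connected component and Dice is then computed inside each resulting region and averaged. The function names `dice` and `cc_dice` and the toy masks are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import ndimage


def dice(pred, gt):
    """Standard Dice score on two boolean arrays (1.0 if both are empty)."""
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom


def cc_dice(pred, gt):
    """Mean Dice over regions, one region per ground-truth component.

    Assumption: each voxel belongs to the region of its nearest
    ground-truth component (Euclidean distance).
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)

    # Label the connected components of the ground truth.
    labels, n_components = ndimage.label(gt)
    if n_components == 0:
        # No ground-truth component: fall back to the global score.
        return float(dice(pred, gt))

    # For every voxel, look up the index of the nearest ground-truth voxel
    # and inherit its component label.
    _, nearest = ndimage.distance_transform_edt(labels == 0, return_indices=True)
    regions = labels[tuple(nearest)]

    # Evaluate Dice locally in each region; every component gets equal weight.
    scores = [
        dice(pred[regions == k], gt[regions == k])
        for k in range(1, n_components + 1)
    ]
    return float(np.mean(scores))


# Toy example: one large lesion (32 pixels) and one small lesion (4 pixels).
gt_demo = np.zeros((6, 20), dtype=bool)
gt_demo[1:5, 1:9] = True
gt_demo[2:4, 15:17] = True

# The prediction finds the large lesion perfectly and misses the small one.
pred_demo = np.zeros_like(gt_demo)
pred_demo[1:5, 1:9] = True

print(round(float(dice(pred_demo, gt_demo)), 3))  # global Dice: 0.941
print(cc_dice(pred_demo, gt_demo))                # per-component Dice: 0.5
```

In this toy case the global Dice stays high although one of two lesions is missed entirely, whereas the per-component average drops to 0.5, which is the size bias the abstract refers to.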

Published

2025-04-11

How to Cite

Jaus, A., Seibold, C. M., Reiß, S., Marinov, Z., Li, K., Ye, Z., … Stiefelhagen, R. (2025). Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks. Proceedings of the AAAI Conference on Artificial Intelligence, 39(4), 3904–3912. https://doi.org/10.1609/aaai.v39i4.32408

Section

AAAI Technical Track on Computer Vision III