Rethinking Quality Assurance for Crowdsourced Multi-ROI Image Segmentation


  • Xiaolu Lu, Microsoft
  • David Ratcliffe, Microsoft
  • Tsu-Ting Kao, Microsoft
  • Aristarkh Tikhonov, Volpara Health Technologies Ltd.
  • Lester Litchfield, Volpara Health Technologies Ltd.
  • Craig Rodger, Microsoft
  • Kaier Wang, Volpara Health Technologies Ltd.



Keywords: Crowdsourcing, Image Segmentation, Quality Assurance, Annotation Workflow


Collecting high-quality annotations to construct an evaluation dataset is essential for assessing the true performance of machine learning models. One popular way of performing data annotation is via crowdsourcing, where quality can be of concern. Despite much prior work addressing the annotation quality problem in crowdsourcing generally, little has been discussed in detail for image segmentation tasks. These tasks often require pixel-level annotation accuracy, and are relatively complex when compared to image classification or object detection with bounding boxes. In this paper, we focus on image segmentation annotation via crowdsourcing, where images may not have been collected in a controlled way. In this setting, the task of annotating may be non-trivial, and annotators may experience difficulty in differentiating between regions-of-interest (ROIs) and background pixels. We implement an annotation process on a medical image annotation task and examine the effectiveness of several in-situ and manual quality assurance and quality control mechanisms. Our observations on this task are three-fold. Firstly, including an onboarding and a pilot phase improves quality assurance as annotators can familiarize themselves with the task, especially when the definition of ROIs is ambiguous. Secondly, we observe high variability of annotation times, leading us to believe they cannot be relied upon as a source of information for quality control. When performing agreement analysis, we also show that global-level inter-rater agreement is insufficient to provide useful information, especially when annotator skill levels vary. Thirdly, we recognize that reviewing all annotations can be time-consuming and often infeasible, and there currently exist no mechanisms to reduce the workload for reviewers.
Therefore, we propose a method to create a priority list of images for review based on inter-rater agreement. Our experiments suggest that this method can improve reviewer efficiency when compared to a baseline approach, especially under a fixed review budget.
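The abstract does not specify the agreement measure used, but the prioritization idea can be illustrated with a minimal sketch: score each image by the mean pairwise intersection-over-union (IoU) across its annotators' masks, then surface the lowest-agreement images first. The function names and the choice of IoU as the agreement statistic are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def pairwise_iou(mask_a, mask_b):
    """Intersection-over-union between two boolean segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0

def review_priority(annotations):
    """Rank image ids by ascending mean pairwise IoU (lowest agreement first).

    `annotations` maps an image id to a list of boolean masks,
    one mask per annotator. Illustrative sketch only.
    """
    scores = {}
    for image_id, masks in annotations.items():
        ious = [pairwise_iou(masks[i], masks[j])
                for i in range(len(masks))
                for j in range(i + 1, len(masks))]
        scores[image_id] = float(np.mean(ious)) if ious else 0.0
    return sorted(scores, key=scores.get)
```

Under a fixed review budget of k images, a reviewer would then inspect only the first k entries of the returned list, concentrating effort where annotators disagree most.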




How to Cite

Lu, X., Ratcliffe, D., Kao, T.-T., Tikhonov, A., Litchfield, L., Rodger, C., & Wang, K. (2023). Rethinking Quality Assurance for Crowdsourced Multi-ROI Image Segmentation. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 11(1), 103-114.