Designing Safety Specifications for Clinical AI: A Case Study

Authors

  • Shibbir Ahmed, Texas State University

DOI:

https://doi.org/10.1609/aaaiss.v7i1.36898

Abstract

Clinical AI models increasingly inform care decisions, yet implicit assumptions about data timing, label semantics, calibration, and operating thresholds are rarely specified or monitored, causing subtle failures that standard metrics do not surface. We present executable safety contracts: lightweight, task-level specifications enforced as runtime checks, applied here to hospital length-of-stay prediction. The specifications capture preconditions (data integrity, index-time alignment, censoring), postconditions (admissible outputs, alert-budget bounds), and invariants (coverage/calibration targets, subgroup equity). We implement these checks in a Python pipeline and evaluate them on a single-center MIMIC-IV cohort and a multi-center eICU-style cohort using simple baselines (logistic regression, gradient boosting) with conformal intervals and post-hoc calibration. The contracts exposed hazards that MAE (Mean Absolute Error), AUC (Area Under the ROC Curve), or ECE (Expected Calibration Error) alone missed: for example, acceptable point error with severe under-coverage in eICU, well-calibrated probabilities that nonetheless violated alert-rate constraints, and dataset-specific fairness gaps. Lightweight remedies, such as conformal radius tuning, threshold/alert-scope selection, and recalibration, often restored compliance without degrading point performance, while clarifying when deeper modeling or policy changes were needed. Overall, the case study shows that Design by Contract principles extend beyond APIs to system-level specifications for clinical ML, providing a practical way to state safety expectations, check them with minimal compute, and make violations actionable.
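To make the precondition/postcondition/invariant structure concrete, here is a minimal Python sketch of the kind of runtime checks the abstract describes. The function names, signatures, and the illustrative thresholds (a 90% coverage target, a fixed alert budget) are assumptions for exposition, not the authors' implementation.

```python
# Illustrative "executable safety contract" checks in the Design-by-Contract
# style the paper describes. All names and thresholds are hypothetical.

def check_precondition_index_alignment(pred_times, feature_times):
    """Precondition: every feature timestamp must be at or before its
    prediction index time (no leakage of future data)."""
    return all(f <= p for f, p in zip(feature_times, pred_times))

def check_postcondition_alert_budget(alerts, budget_rate):
    """Postcondition: the fraction of cases flagged for alert must stay
    within the agreed alert budget."""
    return sum(alerts) / len(alerts) <= budget_rate

def check_invariant_coverage(y_true, lower, upper, target=0.90):
    """Invariant: empirical coverage of the conformal prediction
    intervals must meet the target coverage level."""
    covered = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(covered) / len(covered) >= target
```

In a pipeline, such checks would run before training or inference (preconditions), on each batch of outputs (postconditions), and on a monitoring cadence (invariants), turning a silent metric drift into an explicit, actionable violation.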

Published

2025-11-23

How to Cite

Ahmed, S. (2025). Designing Safety Specifications for Clinical AI: A Case Study. Proceedings of the AAAI Symposium Series, 7(1), 294-302. https://doi.org/10.1609/aaaiss.v7i1.36898

Section

Engineering Safety-Critical AI Systems