Introduction
The prediction of clinical outcomes, such as postoperative mortality, morbidity, and prolonged length of stay (LOS) in the hospital, is one of the most significant challenges in modern medical practice. The development and evaluation of models that can reliably predict these outcomes are essential for improving patient care, guiding therapeutic strategies, and rationally allocating healthcare resources. For this purpose, statistical methods are needed that allow researchers and clinicians to assess the accuracy of the available risk indicators. Among the most useful tools is the Receiver Operating Characteristic (ROC) curve, which, together with the Area Under the Curve (AUC), provides both a quantitative and a visual measure of a model's ability to discriminate between patients who will experience complications and those who will not.
Methodology
To develop and evaluate the predictive models, the association between patient characteristics and outcomes was first examined with univariate analysis, using the chi-square test or, where required, Fisher's exact test. Logistic regression models were then fitted to estimate the probability of mortality, morbidity, or prolonged LOS. Risk scores were encoded into eight distinct categories, which entered the models as independent variables. For the multiparametric risk indices, the original categories proposed by the developers of each index were used rather than the total scores, allowing a more precise representation of the associations.
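A minimal sketch of this step in Python, assuming a patient-level data frame with hypothetical columns risk_category (the eight-level risk score) and mortality (the binary outcome); the file name and column names are illustrative, not details taken from the study.

```python
# Univariate screening and logistic regression with a categorical risk score.
# All names below (cohort.csv, risk_category, mortality) are placeholders.
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency, fisher_exact

df = pd.read_csv("cohort.csv")  # hypothetical file: one row per patient

# Univariate association: chi-square, or Fisher's exact test for 2x2 tables
# with small expected counts.
table = pd.crosstab(df["risk_category"], df["mortality"])
chi2_stat, p_value, dof, expected = chi2_contingency(table)
if table.shape == (2, 2) and (expected < 5).any():
    _, p_value = fisher_exact(table)

# Logistic regression with the risk score entered as a categorical predictor
# (eight categories) rather than as a continuous total score.
model = smf.logit("mortality ~ C(risk_category)", data=df).fit()
print(model.summary())
```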
ROC Curve and AUC
The ROC curve is one of the most widely used tools for evaluating the accuracy of a diagnostic or predictive model. It plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) across all possible classification thresholds. Ideally, the ROC curve approaches the upper left corner of the graph, where the test achieves 100% sensitivity and 100% specificity, whereas a curve that coincides with the diagonal line of chance (AUC = 0.5) indicates that the model has no discriminatory value. The area under the ROC curve is a numerical index that expresses the probability that the model assigns a higher predicted risk to a randomly selected patient who experiences the complication than to one who does not. An AUC equal to 1 represents perfect discrimination, values below 0.7 are generally regarded as indicating limited predictive ability, and values above 0.7 support the usefulness of a risk index as a predictive tool. In this study, ROC curves were calculated for all risk indicators, including the CARE score, and AUC values were compared using the non-parametric method of DeLong et al., which allows statistical comparison of correlated AUCs estimated on the same patients.
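The sketch below illustrates both steps under stated assumptions: y_true is the observed binary outcome, and p_care and p_asa are predicted risks from two fitted models (placeholder names). scikit-learn provides the ROC curve and AUC; the paired DeLong variance estimator is written out explicitly because it is not part of the standard Python libraries.

```python
# ROC/AUC estimation for a risk index and a paired DeLong comparison of two
# indices evaluated on the same patients.
import numpy as np
from scipy.stats import norm
from sklearn.metrics import roc_auc_score, roc_curve

def delong_test(y, score_a, score_b):
    """Two-sided p-value for the difference between two correlated AUCs
    (DeLong et al., 1988), computed on the same set of patients."""
    y = np.asarray(y).astype(bool)
    scores = np.column_stack([score_a, score_b])
    cases, controls = scores[y], scores[~y]   # with / without the outcome
    m, n = len(cases), len(controls)

    # Midrank kernel: 1 if case > control, 0.5 if tied, 0 otherwise.
    psi = (cases[:, None, :] > controls[None, :, :]).astype(float)
    psi += 0.5 * (cases[:, None, :] == controls[None, :, :])

    auc = psi.mean(axis=(0, 1))               # AUC of each index
    v10 = psi.mean(axis=1)                    # structural components over cases
    v01 = psi.mean(axis=0)                    # structural components over controls
    s = np.cov(v10.T) / m + np.cov(v01.T) / n # covariance of the two AUCs
    var_diff = s[0, 0] + s[1, 1] - 2 * s[0, 1]
    z = (auc[0] - auc[1]) / np.sqrt(var_diff)
    return auc, 2 * norm.sf(abs(z))

# Illustrative use with predicted probabilities from two fitted models:
# fpr, tpr, _ = roc_curve(y_true, p_care)      # points of the ROC curve
# print(roc_auc_score(y_true, p_care))         # area under the curve
# aucs, p_value = delong_test(y_true, p_care, p_asa)
```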
Calibration and Goodness-of-Fit
Beyond discrimination, calibration is equally important: it refers to the degree to which the predicted probabilities of an outcome agree with the outcomes actually observed. In the present analysis, calibration was assessed with Pearson's chi-square goodness-of-fit test; a low chi-square value indicates that the numbers of events predicted by the model are close to those observed, and therefore that the model fits the data satisfactorily. In addition, inter-rater agreement on the CARE score was measured using Cohen's kappa coefficient, which quantifies the agreement between independent raters, such as anesthesiologists and researchers. This procedure was applied first to the overall study population and then separately to the reference and validation cohorts, in order to determine whether repeated use of the CARE score led to improved accuracy over time.
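One common way to operationalise such a grouped goodness-of-fit check, together with the kappa computation, is sketched below; the grouping of predicted risk into eight strata and all variable names are assumptions made for illustration, not details taken from the study.

```python
# Grouped Pearson chi-square comparing observed events with those expected from
# the model's predicted probabilities, plus Cohen's kappa for two raters.
import numpy as np
from scipy.stats import chi2
from sklearn.metrics import cohen_kappa_score

def pearson_goodness_of_fit(y_true, p_pred, n_groups=8):
    """Chi-square statistic and p-value over quantile groups of predicted risk."""
    y_true, p_pred = np.asarray(y_true), np.asarray(p_pred)
    edges = np.quantile(p_pred, np.linspace(0, 1, n_groups + 1))
    groups = np.clip(np.searchsorted(edges, p_pred, side="right") - 1, 0, n_groups - 1)
    stat = 0.0
    for g in range(n_groups):
        mask = groups == g
        if not mask.any():
            continue
        observed = y_true[mask].sum()       # events seen in this risk stratum
        expected = p_pred[mask].sum()       # events the model predicts here
        n = mask.sum()
        stat += (observed - expected) ** 2 / expected \
              + ((n - observed) - (n - expected)) ** 2 / (n - expected)
    return stat, chi2.sf(stat, df=n_groups - 2)

# Agreement between two independent CARE ratings
# (e.g., anesthesiologist vs researcher; hypothetical arrays):
# kappa = cohen_kappa_score(care_anesthesiologist, care_researcher)
```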
Results and Comparisons
The predictive models developed were also compared with other clinical indicators commonly used in daily practice: the ASA physical status classification, the New York Heart Association (NYHA) classification of heart failure, the left ventricular ejection fraction, patient age, serum creatinine, the urgency of surgery, and the type of surgical procedure. ROC analysis and the corresponding AUC values allowed a direct comparative evaluation of the predictive power of each indicator. Through this process, the study identified which indices provided higher discriminatory ability and which performed less well, supporting a more rational choice of predictive model for clinical use.
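A sketch of how such a side-by-side comparison can be tabulated, reusing the hypothetical data frame df introduced earlier; the column names, and the assumption that nominal indicators such as the type of procedure have already been encoded as a risk-ordered value, are illustrative.

```python
# Compute and rank the AUC of each candidate indicator against the same outcome.
from sklearn.metrics import roc_auc_score

indicators = ["care_score", "asa_class", "nyha_class", "ejection_fraction",
              "age", "creatinine", "urgency", "procedure_type_risk"]

results = {}
for name in indicators:
    scores = df[name]
    # roc_auc_score expects higher values to mean higher risk; invert protective
    # indicators such as ejection fraction before computing the AUC.
    if name == "ejection_fraction":
        scores = -scores
    results[name] = roc_auc_score(df["mortality"], scores)

for name, auc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} AUC = {auc:.3f}")
```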
Conclusions
The use of the ROC curve and the AUC is a fundamental component of the evaluation of predictive models for clinical outcomes. This methodology offers both a quantitative and a graphical approach to estimating a model's discriminatory ability, while also enabling statistical comparisons across different algorithms. Combined with calibration assessment through the chi-square test, it helps ensure that models are not only discriminative but also well calibrated, and therefore clinically reliable. The application of these methods in large patient cohorts, as in the study under discussion, strengthens the ability of clinicians to make evidence-based decisions, improves the prediction of postoperative outcomes, and ultimately contributes to enhancing the overall quality of care delivered to patients.