The use of prehospital early warning scores (EWS) in ambulance service settings is widely advocated, their aim being to identify patients at risk of clinical deterioration early in their clinical course (Lindskou et al, 2023). Early warning scores allow the clinician to calculate a risk score for an individual patient (Ann and Rupert, 2012). This score is based on the patient's clinical observations and vital signs at the time of assessment, and the resulting score indicates their level of risk (Martín-Rodríguez et al, 2020). Higher scores indicate a higher risk of deterioration and adverse outcome, and serve to identify patients requiring an increased clinical response (Pirneskoski et al, 2019). Early warning scores can be applied across a range of conditions and may be generic, although condition-specific tools also exist, for example for sepsis (Maciver, 2021). Local healthcare systems set threshold values for the resulting score to guide clinical decision-making, triage and response (Goodacre et al, 2023). A balance must be maintained, weighing the risk of overlooking potentially critically unwell patients against the challenge of prioritising too many patients and overwhelming healthcare systems (Goodacre et al, 2023).
Compared with in-hospital ward settings, there is little published evidence on the optimal EWS for use in the emergency department (ED) and in prehospital care. Acknowledging this gap, Guan et al (2022) undertook a systematic review to determine which EWS best predicts in-hospital deterioration when applied in the ED or the prehospital setting. The systematic review and meta-analysis aimed to estimate the pooled diagnostic odds of predicting clinical deterioration in hospitalised patients, stratified by the EWS score as determined in the ED and prehospital settings. Drawing on the current evidence base, the outcomes assessed included short-term (≤3-day) and long-term (≤30-day) mortality, intensive care unit (ICU) admission, length of hospital stay, and cardiac or respiratory arrest.
Aim of commentary
This commentary aims to critically appraise the methods used within the review by Guan et al (2022) and expand upon its findings in the context of clinical practice.
Methods
This preregistered systematic review undertook a comprehensive multi-database search covering February 2006 to February 2021, and the reference lists of all included studies were screened to identify additional papers. Only experimental, quasi-experimental or observational studies published in English that assessed EWS in individuals aged 14 years or older in either the ED or the prehospital setting were included. The five tests of focus included the National Early Warning Score (NEWS), NEWS2 and the Modified Early Warning Score (MEWS).
These tests were assessed for their ability to predict both short-term (up to 3-day) and long-term (up to 30-day) mortality. Screening, data extraction and quality assessment (using the Newcastle-Ottawa Scale) were each undertaken independently by at least two reviewers. A random-effects meta-analysis was used to calculate pooled diagnostic odds ratios (DORs) with corresponding 95% confidence intervals, and heterogeneity was assessed using the I² statistic. Publication bias was assessed by visual inspection of a funnel plot, and a sensitivity analysis evaluated the impact of studies at high risk of bias.
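The commentary does not state which random-effects estimator Guan et al (2022) used. As a minimal sketch of how a random-effects DOR and the I² statistic are typically obtained, the Python below pools hypothetical study 2×2 tables using the DerSimonian-Laird method. This estimator is a common choice assumed here, not one confirmed for this review, and the function names, the 0.5 continuity correction and the example counts are all illustrative assumptions.

```python
import math


def log_dor_and_var(tp, fp, fn, tn, cc=0.5):
    """Return ln(DOR) and its variance for one study's 2x2 table.

    A continuity correction `cc` is added to every cell so that zero
    counts do not produce infinite values (an assumed convention).
    """
    tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def random_effects_dor(tables, z=1.96):
    """Pool study DORs with the DerSimonian-Laird random-effects estimator.

    `tables` is a list of (TP, FP, FN, TN) tuples.
    Returns (pooled DOR, lower 95% limit, upper 95% limit, I^2 in %).
    """
    effects = [log_dor_and_var(*t) for t in tables]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]
    df = len(y) - 1

    # Inverse-variance (fixed-effect) weights give Cochran's Q
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments between-study variance tau^2, truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))

    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return (math.exp(y_re),
            math.exp(y_re - z * se_re),
            math.exp(y_re + z * se_re),
            i2)


# Hypothetical (TP, FP, FN, TN) counts for three studies; not data from the review
studies = [(30, 200, 5, 800), (12, 150, 4, 600), (45, 400, 10, 1500)]
dor, lower, upper, i2 = random_effects_dor(studies)
print(f"DOR {dor:.2f} (95% CI {lower:.2f} to {upper:.2f}), I2 = {i2:.1f}%")
```

Pooling on the log scale and exponentiating back is consistent with the review's reported DORs lying at the geometric, rather than arithmetic, centre of their confidence intervals (for example, √(9.09 × 21.75) ≈ 14.06).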
Results
After duplicate removal, 8972 papers were identified, of which 20 were included in the review after screening. Only seven of these 20 studies were conducted in the prehospital setting; the remainder were carried out in EDs. Two studies were classified as poor quality, and a sensitivity analysis excluding them showed that their removal did not significantly affect any of the results.
For predicting up to 3-day mortality in the prehospital setting, NEWS2 cut-off points of ≥5 (DOR 14.06, 95% CI: 9.09 to 21.75, I²=0%) and ≥7 (DOR 12.26, 95% CI: 8.58 to 17.64, I²=4.4%) produced comparable DORs. A threshold of ≥9 yielded a higher DOR (DOR 20.37, 95% CI: 13.16 to 31.52, I²=0%). However, owing to substantial imprecision in all three estimates and the overlap of their confidence intervals, the differences between the three thresholds were not statistically significant. Similarly, NEWS demonstrated accuracy comparable to NEWS2 at the same cut-off of ≥7 (DOR 11.63, 95% CI: 9.75 to 13.88, I²=0%) in the prehospital setting.
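The commentary reports that the NEWS2 thresholds did not differ significantly but does not describe the test used. A rough back-of-envelope check is possible from the reported figures alone. Assuming each 95% CI is symmetric on the log scale, the standard error of ln(DOR) is the CI width on that scale divided by 2 × 1.96, and two estimates can then be compared with a Wald z-statistic. This sketch (with hypothetical helper names) also assumes the estimates are independent, which they are not when the same studies contribute to several thresholds, so it is indicative only and is not the review's method.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """SE of ln(DOR) implied by a 95% CI assumed symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def naive_z(dor_a, ci_a, dor_b, ci_b):
    """Wald z-statistic for the difference between two ln(DOR) values.

    Assumes the two estimates are independent; this is an approximation
    for illustration, not a formal test of correlated estimates.
    """
    se_a = log_se_from_ci(*ci_a)
    se_b = log_se_from_ci(*ci_b)
    return (math.log(dor_a) - math.log(dor_b)) / math.sqrt(se_a ** 2 + se_b ** 2)


# Prehospital NEWS2, up to 3-day mortality: (DOR, (lower, upper)) as reported
news2 = {
    ">=5": (14.06, (9.09, 21.75)),
    ">=7": (12.26, (8.58, 17.64)),
    ">=9": (20.37, (13.16, 31.52)),
}

for label in (">=5", ">=7"):
    z = naive_z(*news2[">=9"], *news2[label])
    print(f">=9 vs {label}: z = {z:.2f}")
```

With the reported values, both z-statistics fall below 1.96 (about 1.8 for ≥9 versus ≥7 and about 1.2 for ≥9 versus ≥5), in line with the review's conclusion. Because overlapping patients make the estimates positively correlated, the true standard error of the difference would be smaller than this naive one, so the check should not be over-interpreted.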
For predicting up to 30-day mortality in the prehospital setting, a NEWS threshold of ≥7 demonstrated relatively low diagnostic accuracy (DOR 2.58, 95% CI: 0.59 to 11.21, I²=99.5%), with a confidence interval that included 1 (the value indicating no discriminative ability) and very high heterogeneity.
For predicting up to 30-day mortality in the ED, there was no statistically significant difference in diagnostic accuracy between MEWS ≥3 (DOR 4.05, 95% CI: 2.35 to 6.99, I²=73.0%), MEWS ≥4 (DOR 6.48, 95% CI: 1.83 to 22.89, I²=90%) and NEWS ≥6 (DOR 4.92, 95% CI: 2.71 to 8.96, I²=65.5%). Similarly, among patients with sepsis in the ED, there was no statistically significant difference in accuracy for predicting up to 30-day mortality between MEWS ≥5 (DOR 3.05, 95% CI: 2.00 to 4.65, I²=0%) and NEWS ≥7 (DOR 4.74, 95% CI: 4.08 to 5.50, I²=0%). For predicting ICU admission, MEWS ≥3 had a DOR of 5.54 (95% CI: 2.02 to 15.21, I²=50.9%).
A meta-regression was also undertaken for predicting up to 30-day mortality in the ED. Unfortunately, the tool and threshold to which it was applied are not reported. However, 92% of the variance at the assessed threshold could be explained by variation in age. Publication bias was additionally assessed using Deeks' funnel plot asymmetry test, which was not significant at either the highest or the lowest threshold.
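How the 92% figure was derived is not stated. Meta-regression software commonly reports the proportion of between-study variance explained by a covariate as a pseudo-R², comparing the between-study variance estimated without the covariate ($\hat{\tau}^2_0$) with the residual between-study variance once the covariate (here, age) is added ($\hat{\tau}^2_1$). Assuming the reported figure is of this kind, it would correspond to:

$$R^2 = \frac{\hat{\tau}^2_0 - \hat{\tau}^2_1}{\hat{\tau}^2_0} \approx 0.92$$

For illustration only, with invented numbers: a between-study variance of 0.50 on the log-DOR scale would fall to about 0.04 once age is accounted for, since $0.50 \times (1 - 0.92) = 0.04$.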
Commentary
Critical appraisal of the authors' methods was carried out using the Joanna Briggs Institute (JBI) Critical Appraisal Tool for Systematic Reviews (Aromataris et al, 2015). The review met all 11 criteria, indicating a high methodological standard and a robust process. The completeness and quality of the methodology instil confidence that the review provides a comprehensive summary and contextualisation of the published evidence on the topic. Nevertheless, prehospital clinicians should read and interpret the results with an awareness of the limitations identified by the authors. These include insufficient power to evaluate medical versus trauma conditions, limited data on cardiac and/or respiratory arrest outcomes, and the possibility of unknown confounders affecting length of hospital stay. These limitations, together with the fact that only seven of the 20 included studies were conducted in the prehospital setting or used prehospital data, should inform how the findings are interpreted and translated to prehospital or paramedic practice.
Within the included studies, the review demonstrated that the cut-off points applied to EWS in the ED were lower than those used in the prehospital setting. The higher cut-off points reported for the prehospital setting may reflect the need to balance sensitivity and specificity: lower cut-off points would theoretically increase sensitivity but reduce specificity, flagging a larger proportion of patients as high risk. This is compounded by the short duration of the interaction between prehospital clinicians and patients, which may limit the ability to obtain a reliable EWS.
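The direction of this trade-off can be illustrated with a short sketch on invented data. The scores and outcomes below are hypothetical and not drawn from the review, and `sens_spec_at_threshold` is an illustrative helper. Treating "score ≥ threshold" as a positive result, lowering the threshold raises sensitivity, lowers specificity and increases the number of patients flagged.

```python
def sens_spec_at_threshold(scores, outcomes, threshold):
    """Sensitivity and specificity when 'score >= threshold' counts as positive."""
    tp = sum(1 for s, d in zip(scores, outcomes) if s >= threshold and d)
    fn = sum(1 for s, d in zip(scores, outcomes) if s < threshold and d)
    tn = sum(1 for s, d in zip(scores, outcomes) if s < threshold and not d)
    fp = sum(1 for s, d in zip(scores, outcomes) if s >= threshold and not d)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical EWS values and outcomes (True = died); invented for illustration
scores = [1, 2, 3, 4, 5, 5, 6, 7, 7, 8, 9, 10, 11, 3, 6, 9]
outcomes = [False, False, False, False, False, True, False, False,
            True, False, True, True, True, False, True, False]

for cut in (5, 7, 9):
    sens, spec = sens_spec_at_threshold(scores, outcomes, cut)
    flagged = sum(s >= cut for s in scores)
    print(f">={cut}: sensitivity {sens:.2f}, specificity {spec:.2f}, flagged {flagged}")
```

On these invented data, moving from ≥9 down to ≥5 raises sensitivity from 0.50 to 1.00, while specificity falls from 0.90 to 0.50 and the number of flagged patients rises from 4 to 11.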
From a prehospital perspective, the findings suggest that EWS applied in the prehospital setting may not accurately predict longer-term outcomes such as 30-day mortality. This is potentially relevant to prehospital clinicians given the observation that prehospital EWS appear to be more accurate for more critically ill or compromised patients, and may therefore be less applicable to patients outside this cohort. As the balance of presentations to ambulance services shifts towards patients with urgent, rather than emergency, care needs, EWS may be less reliable for a group that potentially makes up a large proportion of the population served by ambulance clinicians (Eaton, 2023).
However, caution must be applied to this inference given the wide confidence intervals, the lack of statistically significant differences and the substantial heterogeneity observed. These issues introduce considerable uncertainty into this result and limit the ability to draw definitive conclusions from the evidence presented in the review. A more focused systematic review of NEWS and NEWS2 in any clinical setting reported similar findings, namely that these tools have poor predictive accuracy for all deaths within 30 days (Holland and Kellett, 2022).
The review did, however, demonstrate that EWS used in the prehospital setting can predict short-term clinical decline (up to 3-day mortality). With NEWS2 now widely adopted across ambulance services in England (NHS England, 2018), it is important to be aware of how its diagnostic accuracy varies across thresholds. Comparisons between NEWS2 thresholds showed no clear difference in their ability to predict up to 3-day mortality, largely because of the wide confidence intervals. Although the findings suggested that a NEWS2 score ≥9 might offer improved diagnostic accuracy, this difference was not statistically significant when compared with the alternative thresholds and tests. Prehospital clinicians should note that the concern about wide confidence intervals still applies here, albeit to a lesser extent than for long-term outcomes, and that it reduces the certainty of the estimates presented.
These NEWS2 findings are consistent with a recent, slightly broader systematic review of the diagnostic accuracy of EWS for predicting short-term mortality in outpatient emergency care (Burgos-Esteban et al, 2022). That review used a different method of assessment: a descriptive analysis of the area under the receiver operating characteristic (ROC) curve. Unfortunately, the DOR reported by Guan et al (2022) provides no separate information on sensitivity or specificity, because it combines both into a single estimate. Nevertheless, the findings of Burgos-Esteban et al (2022) align with the conclusion that NEWS2 is reasonably accurate in predicting short-term mortality.
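Why a single DOR cannot separate these two properties can be shown with a small worked example. The DOR equals (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity), which is also the ratio of the positive to the negative likelihood ratio. In the sketch below, `dor_from_sens_spec` is a hypothetical helper and the sensitivity and specificity values are invented: a test that rarely misses deaths but raises many false alarms, and a test with the opposite profile, both have a DOR of 9 despite implying very different prehospital triage behaviour.

```python
def dor_from_sens_spec(sens, spec):
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec), equivalently LR+ / LR-."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)


# Two hypothetical tests with very different clinical profiles
rule_out_test = dor_from_sens_spec(0.90, 0.50)  # misses few deaths, many false alarms
rule_in_test = dor_from_sens_spec(0.50, 0.90)   # few false alarms, misses half of deaths

print(round(rule_out_test, 2), round(rule_in_test, 2))
```

Reporting sensitivity and specificity alongside the DOR would make this difference visible.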
As this review highlights, substantial uncertainty remains regarding the predictive ability of EWS tools in the prehospital setting. In the ED, the meta-regression suggested that age may moderate these tools' ability to predict mortality. However, owing to the limited number of prehospital studies, this valuable analysis could not be performed for the prehospital setting. Future studies should therefore report and explore moderating factors affecting the long-term predictive ability of these tools in the prehospital setting, and reassess the tools identified in this review at comparable thresholds.
In evaluating long-term predictive ability in the prehospital setting, only the older NEWS tool could be assessed, highlighting the need for future research to examine the long-term predictive accuracy of NEWS2. Additionally, this review presented only the DOR, a combined measure, without exploring how the tools perform in terms of sensitivity and specificity. Future reviews should therefore report sensitivity and specificity alongside the DOR, together with measures derived from them, to provide a comprehensive understanding of each tool's diagnostic performance.
The review found that the use and study of EWS in the ED are well documented, whereas only limited studies and evidence address their applicability in the prehospital setting. This finding, together with the results of the systematic review and particularly the meta-analysis, indicates that caution is needed before drawing definitive conclusions about the use and reliability of EWS in the prehospital context. While future research may lead to further improvements and refinements to EWS for identifying deterioration risk in patients presenting in the prehospital context, scores based on currently measured physiological parameters will need careful consideration of sensitivity and specificity to ensure that clinical cut-offs and decision-making deliver real improvements over currently available EWS.