This study aimed to implement and evaluate machine learning (ML) algorithms for predicting COVID-19 diagnosis, severity, and fatality, and to assess biomarkers potentially associated with these outcomes.
Serum (n = 96) and plasma (n = 96) samples from patients with COVID-19 (acute, severe and fatal illness) from two independent hospitals in China were analyzed by LC-MS. Samples from healthy volunteers and from patients with pneumonia caused by other viruses (i.e. negative RT-PCR for COVID-19) were used as controls. Seven different ML-based models were built: PLS-DA, ANNDA, XGBoostDA, SIMCA, SVM, LREG and KNN.
The PLS-DA model performed best on both datasets, with accuracies of 93%, 94% and 97% for predicting the diagnosis, severity and fatality of COVID-19, respectively. Low levels of the metabolites ribothymidine, 4-hydroxyphenylacetoylcarnitine and uridine were associated with COVID-19 positivity, whereas high levels of N-acetyl-glucosamine-1-phosphate, cysteinylglycine, methyl isobutyrate, L-ornithine and 5,6-dihydro-5-methyluracil were significantly associated with greater severity and fatality from COVID-19.
The PLS-DA model can help to predict COVID-19 diagnosis, severity and fatality in daily practice. Some biomarkers typically increased in COVID-19 patients’ serum or plasma (i.e. ribothymidine, N-acetyl-glucosamine-1-phosphate, L-ornithine, 5,6-dihydro-5-methyluracil) should be further evaluated as prognostic indicators of the disease.