[1] ZHU Liping, WANG Cunze, WANG Ling, et al. Development and validation of a machine learning model to predict the symptomatic status of COVID-19 patients[J]. Fujian Medical Journal, 2023, 45(04): 107-111.

Development and validation of a machine learning model to predict the symptomatic status of COVID-19 patients

Fujian Medical Journal [ISSN: 1002-2600 / CN: 35-1071/R]

Volume:
45
Issue:
No. 04, 2023
Pages:
107-111
Section:
Basic Research
Publication date:
2023-08-15

Article Information

Title:
Development and validation of a machine learning model to predict the symptomatic status of COVID-19 patients
Article ID:
1002-2600(2023)04-0107-05
Author(s):
ZHU Liping, WANG Cunze, WANG Ling
Department of Pharmacy, Fujian Provincial Hospital, Fuzhou, Fujian 350001, China
Keywords:
COVID-19; asymptomatic; machine learning; GBM; prediction
Classification number:
R181.8
Document code:
B
Abstract:
Objective To evaluate how well machine learning algorithms predict whether patients develop symptoms after COVID-19 infection, in order to support the rational allocation of medical resources and improve treatment rates. Methods Clinical information was collected retrospectively from patients with confirmed COVID-19 at a tertiary hospital between December 2022 and February 2023, and the patients were randomly divided into a training set (75%) and a test set (25%). Feature variables were selected using univariate logistic analysis and the least absolute shrinkage and selection operator (LASSO) algorithm. Four machine learning classifiers, namely a fully connected deep neural network (FCDNN), distributed random forest (DRF), gradient boosting machine (GBM), and generalized linear model (GLM), were built on the training set, and the best model was then validated on the validation set. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), logistic regression loss (Logloss), root mean square error (RMSE), and mean square error (MSE). The Gini index was used to assess the importance of the feature variables in the best model. Results A total of 251 patients were included in the analysis, with 154 in the training set and 97 in the validation set. After univariate logistic analysis and LASSO selection, 11 feature variables were retained to build the prediction models: age, history of long-term alcohol consumption, poor sleep, poor appetite, diabetes, hypertension, other diseases, baseline medication use, other medication use, respiratory rate, and the cycle threshold (Ct) value of the SARS-CoV-2 N gene. Among the four models, GBM had the highest AUC and the lowest Logloss, RMSE, and MSE. The AUC of the GBM model was 0.8780 in the training set and 0.7933 in the validation set. Ranked by the Gini index, the feature variables in descending order of importance were the N gene Ct value, age, other diseases, respiratory rate, hypertension or diabetes, history of long-term alcohol consumption, poor appetite, and poor sleep. Conclusion This study developed and validated a GBM prediction model that performed well in predicting whether symptoms occur after COVID-19 infection. The model can serve as an important reference for planning subsequent diagnosis and treatment and for allocating medical resources.
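The four evaluation metrics named in the abstract (AUC, Logloss, MSE, RMSE) can be illustrated with a minimal sketch in plain Python. The labels and predicted probabilities below are invented toy values, not data from the study, and the function names are hypothetical; the abstract does not state which software computed the reported figures.

```python
import math


def auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def mse(y_true, y_prob):
    """Mean squared error between labels and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def rmse(y_true, y_prob):
    """Square root of the mean squared error."""
    return math.sqrt(mse(y_true, y_prob))


# Toy example (assumed values): 1 = symptomatic, 0 = asymptomatic.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print(f"AUC     = {auc(y_true, y_prob):.4f}")
print(f"Logloss = {log_loss(y_true, y_prob):.4f}")
print(f"MSE     = {mse(y_true, y_prob):.4f}")
print(f"RMSE    = {rmse(y_true, y_prob):.4f}")
```

A higher AUC together with lower Logloss, MSE, and RMSE indicates a better classifier, which is the criterion the abstract describes for choosing GBM among the four candidates. AUC depends only on how cases are ranked, while the other three also penalize poorly calibrated probabilities.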

Memo

Funding: General Program of the Natural Science Foundation of Fujian Province (2020J011094)
1 School of Pharmacy, Fujian Medical University; 2 Corresponding author, Email: summerjuling@126.com
Last Update: 2023-08-15