Optimizing Suicide Risk Prediction in Korea: A Comparison of Model Performance Using Resampling Methods and Machine Learning Algorithms
Abstract
Objective
Machine learning (ML) can assist in predicting suicide risk and identifying associated risk factors. Various resampling methods and algorithms must be applied to develop an ML prediction model with better performance. In this study, we developed an optimal Korean suicide prediction model by applying five ML algorithms, unsampled data, and two resampling methods.
Methods
In this study, data from the Korea National Health and Nutrition Examination Survey for 2017, 2019, and 2021 were integrated and analyzed to predict suicidal ideation in subjects aged ≥19 years. Logistic regression, random forest (RF), k-nearest neighbor, gradient boosting, and adaptive boosting were used as ML algorithms. Undersampling and oversampling were used as resampling methods to address the data imbalance problem.
Results
Among the study participants, 16,947 (95.14%) and 866 (4.86%) belonged to the control and suicidal ideation groups, respectively. Among the 15 ML models, the RF model exhibited excellent performance (sensitivity=0.781, area under the curve=0.870) in an algorithm trained with undersampled data.
Conclusion
Developing an optimized Korean suicide prediction model through additional validation based on the ML model developed in this study will help predict suicide risk factors caused by the interaction of individual, social, and environmental factors.
INTRODUCTION
Machine learning (ML) is a field of artificial intelligence (AI) in which software learns based on data, identifies patterns, and automates the creation of models for data analysis [1]. ML-based prediction models are becoming increasingly useful in mental healthcare. A previous study combined linguistic features and social media behavioral data to predict the occurrence of depression with 70% accuracy [2]. In a study targeting older individuals, a prediction model that integrated clinical variables, such as apathy, increased antidepressant resistance, and cognitive and magnetic resonance imaging data, predicted the diagnosis and treatment response of late-life depression [3,4]. Big data analysis combined with ML helps in the early detection and diagnosis of mental illness [5,6] and provides personalized treatment methods by predicting treatment responses [5-8]. Suicide occurs through a complex combination of personal, social, and environmental factors, with prevention being the only effective measure. Considering these characteristics, developing an ML prediction model will help detect and predict risk factors for suicide. Previous suicide-related studies have analyzed clinical and general population-based prospective and retrospective data using general statistical and sampling methods [9,10]. The development of an ML-based prediction model complements the limitations of previous studies. It is an efficient way to develop measures to reduce the high suicide rate, which is a major social issue in South Korea [5].
Representative risk factors for suicide include individual psychological factors [11-15], physical diseases [13,15,16], social and environmental factors [17,18], and lifestyle habits [19,20]. Thus far, prospective and retrospective studies have analyzed risk factors for suicide. Retrospective studies provide limited and insufficient information, because the clinical characteristics of risk factors for suicide depend on self-reports or medical records [21]. Prospective studies require a long follow-up period and consent from the subjects; therefore, it is difficult to study suicide deaths and the subjects are limited to suicide attempters [9,10,21]. ML presents a model based on a prospective prediction model free from the dependence on retrospective information [5]. Furthermore, ML research focuses on prediction (the explanatory power of the model) rather than inference (hypothesis testing), and simultaneously handles many predictors [5]. ML can be used to develop models that predict suicide risk by comprehensively considering various suicide-related factors.
Previous studies have used ML to predict suicide risk and identify risk factors. An ML model to predict suicide attempts among subjects with mood disorders was developed with an accuracy of 64.7%–72% [22]. The ML model was developed to identify suicide risk factors and predict suicide attempts by applying the random forest (RF) algorithm to sociodemographic data and psychiatric scales [23]. In a meta-analysis of 87 studies that used ML to develop a prediction model for suicidal behavior, the AI/ML method predicted suicidal behavior with >90% accuracy [24]. As suicide is due to a complex combination of social and environmental factors as well as individual characteristics [25], it is difficult to apply overseas ML models. Therefore, it is necessary to develop an ML prediction model that uses large-scale domestic data.
In a previous study, suicide risk was predicted using data from the Korea National Health and Nutrition Examination Survey (KNHANES). However, only one ML algorithm was used, making it impossible to compare the model performance across ML algorithms [26,27]. Another study using KNHANES data used six ML algorithms but only one data resampling method, making it impossible to compare the performance of models trained on differently resampled data [28]. ML studies predicting suicide using population-based datasets face a class imbalance problem, in which suicidal cases are far fewer than non-suicidal cases [29]. Therefore, resampling the dataset is required to improve model performance [29]. Resampling adjusts the class distribution of the dataset and can be performed by replicating data from the minority class, creating virtual samples to increase the number of minority samples, or removing some data from the majority class. This study aimed to develop an optimal Korean suicide prediction model by applying various ML algorithms and data resampling methods to the KNHANES data.
METHODS
Data
The Korea Centers for Disease Control and Prevention has been administering the KNHANES since 1998 to assess Korean individuals’ general health and nutritional status. This study used data from the 7th (2016–2018) and the 8th (2019–2021) KNHANES [30-32]. Among the KNHANES data, those from 2017, 2019, and 2021 were analyzed to predict whether participants aged 19 years or older had suicidal ideation. The data used in this study were obtained from the KNHANES website (https://knhanes.kdca.go.kr/knhanes/main.do).
Subjects
This study included 17,813 adults aged ≥19 years. In the KNHANES, those who answered “yes” to the question “Have you ever seriously thought about suicide in the past year?” were classified into the suicidal ideation group, and those who answered “no” were classified into the non-suicidal ideation group.
Exploring risk factors for suicidal ideation
Statistical analysis
KNHANES data used in this study were extracted using two-stage stratified cluster sampling. In data analysis, the complex structure of the sampling procedure was incorporated using weights. A comparative analysis was performed between the suicidal and non-suicidal ideation groups. For continuous variables, weighted means (standard errors) were calculated, and the means between the groups were compared using an independent t-test. Categorical variables were compared by calculating the number of subjects and weighted proportions and applying the Rao–Scott chi-squared test. For statistical analysis, the PROC SURVEY procedure in SAS ver. 9.4 (SAS Institute) was used, and statistical significance was set at p<0.05.
ML analysis
Data processing and set assignment
Of the 17,813 participants, 866 (4.8%) belonged to the suicidal ideation group, representing an imbalanced class. The class that appears infrequently in the dataset is called the minority class, whereas the prevalent class is called the majority class [33]. To address this problem, the raw data were split in a 7:3 ratio into training and test sets, and undersampling and oversampling were performed on the training set. Undersampling randomly selects samples from the majority class (non-suicidal ideation group), whereas oversampling uses the synthetic minority oversampling technique (SMOTE), which generates synthetic samples of the minority class in imbalanced data [33]. To compare the test performances of the models trained with each dataset, the test set was split from the raw data, and the test performance was calculated using the same test data (Figure 1).
Input variables
We used the following variables to develop the ML algorithms: sociodemographic characteristics (age, sex, town, medical care, household income, educational status, housing type, marital status, and job position), health behaviors (subjective health status, stress awareness, depressive mood [defined as having felt depressed continuously for more than 2 weeks in the past year], weight change in the past 1 year, walking, smoking, and drinking), and presence of chronic diseases (hypertension, dyslipidemia, stroke, osteoarthritis, rheumatoid arthritis, osteoporosis, diabetes mellitus, renal failure, and liver cirrhosis).
Feature selection
Logistic regression (LR), RF, k-nearest neighbors (kNN), gradient boosting (GB), and adaptive boosting (AB) were used to select features related to suicidal ideation. Feature importance was measured using permutation importance, which quantifies how much the model’s prediction error increases after the values of a feature are randomly shuffled, breaking the association between that feature and the outcome. Features with higher values are interpreted as more important, and those with lower values as less important. This was implemented using the scikit-learn package (https://scikit-learn.org/stable/).
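A minimal sketch of this procedure using scikit-learn's `permutation_importance`; the data, feature count, and model settings here are illustrative, not the study's actual variables or tuned models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative data: 8 features, of which 3 carry signal
X, y = make_classification(n_samples=1000, n_features=8, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature in turn and measure the drop in held-out score;
# a larger drop means the model relied more on that feature
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```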
ML analysis
All ML models were implemented using Python and the scikit-learn package. An ML approach was used to identify the suicidal ideation group and the risk factors contributing to suicidal ideation. LR, RF, kNN, GB, and AB algorithms were used in the study. LR was applied with regularization, a technique that minimizes the error between predictions and actual observations while imposing a penalty on variables that contribute little to the results. This study used the ridge (L2) and lasso (L1) penalty terms. The L2 penalty adds a penalty corresponding to the square of the coefficients, shrinking the coefficients of low-contribution variables toward zero. The L1 penalty adds a penalty corresponding to the sum of their absolute values, setting the coefficients of low-contribution variables exactly to zero [34]. RF is an ensemble algorithm consisting of multiple decision trees. A decision tree builds a tree-structured model according to decision rules that determine the direction of branching from the features (variables); at each split, the impurity decrease is measured for each feature, and the feature with the greatest decrease is selected. The RF algorithm extends the bootstrap aggregation (bagging) method by ensembling multiple independent decision trees using bagging and feature randomness [35]. kNN groups data points for classification or prediction based on similarity: a distance function identifies the k nearest neighbors of a data point, which is then assigned the class held by the majority of those neighbors [36]. GB and AB are ML methods based on boosting. Unlike bagging, boosting is a sequential learning method that trains several simple tree models (weak learners) one after another. Each round assigns greater weight to the samples misclassified by the previously learned model so that the next learner can classify them better. This process is repeated sequentially to produce a model with strong performance. GB and AB differ in how the weights are applied: GB fits each new learner using gradient descent on the loss, whereas AB assigns larger weights to incorrectly classified samples [37].
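As a sketch, the five algorithms described above map onto scikit-learn estimators as follows; the hyperparameters shown are library defaults rather than the paper's tuned values, and the synthetic data is a stand-in for KNHANES:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

models = {
    # L2 (ridge) shrinks low-contribution coefficients toward zero;
    # penalty="l1" with solver="liblinear" would zero them out entirely
    "LR": LogisticRegression(penalty="l2", max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),      # bagging + feature randomness
    "kNN": KNeighborsClassifier(n_neighbors=5),        # majority vote of neighbors
    "GB": GradientBoostingClassifier(random_state=0),  # boosting via gradient descent
    "AB": AdaBoostClassifier(random_state=0),          # boosting via sample re-weighting
}

# Synthetic stand-in data; the study trained on KNHANES variables
X, y = make_classification(n_samples=500, random_state=0)

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()
    print(f"{name}: mean AUC={scores[name]:.3f}")
```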
Performance evaluation
Each model was evaluated using a test set. The performance of each model was evaluated using the F1 score, precision, sensitivity, specificity, and area under the curve (AUC) (Supplementary Table 1). Positive results correspond to the suicidal ideation group. Sensitivity is the proportion of those who are actually positive that the model predicts to be positive (true positive [TP]/positive [P]). Specificity is the proportion of those who are actually negative that the model predicts to be negative (true negative [TN]/negative [N]). Precision is the proportion of those the model predicts to be positive who are actually positive (TP/predicted positive [PP]). The F1 score is the harmonic mean of precision and sensitivity and evaluates model performance when the data target is unbalanced. The AUC was calculated from a receiver operating characteristic (ROC) curve. An AUC of 0.5 indicates no discriminative ability, whereas a value close to 1 indicates better model prediction [38].
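The metrics above can be computed directly from a confusion matrix. The toy labels and scores below are illustrative stand-ins for a model's test-set outputs:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, roc_auc_score)

# Toy ground truth, hard predictions, and predicted probabilities
y_true  = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred  = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
y_score = np.array([.9, .8, .7, .4, .3, .2, .1, .35, .6, .05])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)          # TP / P (recall for the positive class)
specificity = tn / (tn + fp)          # TN / N
precision   = tp / (tp + fp)          # TP / predicted positive
f1  = f1_score(y_true, y_pred)        # harmonic mean of precision and sensitivity
auc = roc_auc_score(y_true, y_score)  # area under the ROC curve (needs scores)

print(sensitivity, specificity, precision, f1, auc)
```

Note that the AUC is computed from continuous scores, whereas the other four metrics depend on the hard class predictions and thus on the chosen decision threshold.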
All participants provided written informed consent to participate in the KNHANES. The Institutional Review Board of the Gyeongsang National University Changwon Hospital in Korea approved the use of publicly available data for statistical analysis (IRB No. 2024-01-020).
RESULTS
Characteristics of the participants
Among the study participants, 16,947 were in the non-suicidal ideation group and 866 were in the suicidal ideation group. Walking was more common in the non-suicidal ideation group than in the suicidal ideation group. Old age, low household income, low educational status, single detached house or tenement, separated or unmarried status, unemployment, medical care, poor subjective health status, hypertension, dyslipidemia, stroke, osteoarthritis, diabetes mellitus, stress awareness, depressive mood, weight change in the past 1 year, current smoking, and high-risk drinking were more frequent in the suicidal ideation group than in the non-suicidal ideation group (Table 1).
Selected features from the ML models
The features selected using the unsampled data are listed in Supplementary Table 2. Each algorithm adopted 10–40 features to construct a predictive model. The AB algorithm selected the lowest number (n=10), whereas the RF algorithm selected the highest (n=40). Supplementary Table 3 summarizes the features selected using the undersampled data. The AB algorithm selected the lowest number (n=17), whereas the RF algorithm selected the highest (n=40). Supplementary Table 4 summarizes the results of variable selection using the oversampled data. The AB algorithm selected the lowest number (n=31), whereas the LR, RF, and GB algorithms selected the highest (n=39). Several variables associated with suicidal ideation were common to the predictive models developed in this study: age, self-reported health status, stress awareness, depressive mood, weight change over the past year, current smoking, being in the lowest household income quartile, having at least a university degree, and being a permanent employee.
Predictive performance of the ML models
The performance metrics of the models for predicting suicidal ideation using unsampled, undersampled, and oversampled data are presented in Table 2. For unsampled data, GB had the highest AUC (0.876), lowest sensitivity (0.092), and a specificity of 0.995. For the undersampled data, RF had the highest AUC (0.870) and sensitivity (0.781). For the oversampled data, GB had the highest AUC (0.872), sensitivity of 0.312, and specificity of 0.973. Figure 2 shows the ROC curves of LR, RF, kNN, GB, and AB for predicting suicidal ideation. The ROC curve using unsampled data showed that GB had the highest AUC (0.876), followed by AB (0.872). The ROC curve with under-sampled data showed that RF had the highest AUC (0.870), followed by LR (0.869) and GB (0.869). The ROC curve with oversampled data showed that GB had the highest AUC (0.872), followed by AB (0.867).
Performance of the prediction models for suicidal ideation using machine-learning algorithms trained on unsampled, undersampled, and oversampled data
ROC curves for five machine-learning models in identifying suicidal ideation. A: Unsampled data. B: Undersampled data. C: Oversampled data. ROC, receiver operating characteristic; AUC, area under the curve.
Figure 3 illustrates the confusion matrices when the five models with unsampled, undersampled, and oversampled data were applied to the test data. Among the ML models trained with unsampled data, the AB model had the highest sensitivity (55/260=0.212). Among the ML models trained with the undersampled data, the RF model had the highest sensitivity (203/260=0.781), whereas with the oversampled data, the LR model had the highest sensitivity (128/260=0.492). According to the comprehensive performance comparison, the RF model performed best in predicting suicidal ideation using undersampled data (sensitivity=0.781, AUC=0.870).
DISCUSSION
In this study, data representing adults from the general Korean population were analyzed using five ML algorithms and three data sampling methods. Fifteen prediction models were compared to develop an optimal suicide prediction model. As the purpose of this study was to predict suicidal ideation in adults in the general population, it was necessary to develop a model with high sensitivity and AUC. Among the algorithms trained using undersampled data, the RF model, with 87% accuracy (AUC=0.870, 95% confidence interval: 0.841–0.898) and a sensitivity of 0.781, classified the suicidal ideation and non-suicidal ideation groups better than the other models. Previous studies that developed models to predict the risk of suicidal ideation using ML suggested that many input variables or various ML-based algorithms are required to develop a better-performing model [28,39]. Our study complements the limitations of the data sampling methods and ML algorithms used in previous suicide-related studies. First, we developed a prediction model with better performance than the existing models (accuracy: 71.9% to 81.0%) by applying five ML algorithms to a nationally representative dataset containing various variables related to personal, social, and environmental factors. Second, we compared the performances of the prediction models by applying various data resampling methods, namely undersampling and oversampling.
In previous studies that predicted suicidal ideators and suicide attempters using KNHANES data, one ML algorithm and one data resampling method were used [26,27]. Whereas those studies used only one ML algorithm (RF), we used five ML algorithms and compared their performances. In our study, the GB algorithm trained with oversampled and unsampled data showed higher AUC values (0.872 and 0.876, respectively) than the other models; however, its sensitivity (0.312 and 0.092, respectively) was relatively low. When we compared the performances of the various ML algorithms, the best-performing model was the RF model using undersampled data (sensitivity=0.781, AUC=0.870).
This study used two types of data resampling (undersampling and oversampling), whereas previous ML studies used only one type of resampled data [26,27]. Among all participants, there was a substantial difference in size between the suicidal ideation group (n=866) and the non-suicidal ideation group (n=16,947). When such imbalanced data are used directly, models tend to be biased toward the majority class; therefore, data resampling was necessary [29,40]. Previous ML studies employed either SMOTE, an oversampling approach, to attenuate the class imbalance problem [27], or the undersampling method [26]. However, despite the data resampling of the prediction model, there are limitations in evaluating actual data affected by a biased class ratio [27]. Similarly, the prediction model in a previous study, trained with the RF algorithm on undersampled data, had a positive predictive value (PPV) of 0.787 in the test set. However, when applied to the entire population, the PPV was as low as 0.462 [26]. Because the previous study used resampled data as the test set, the PPV may have been inflated and did not reflect the actual phenomenon. In our study, the test set was split from the raw data, and the resampled data were applied only to the training set, yielding a model that could be evaluated on actual data.
In a previous study predicting suicidal ideation, six ML algorithms were used; however, only one data resampling (oversampling) method was applied [28]. The performance of the ML algorithms may have improved because of the lower prevalence of the minority class in the test set (AUC=0.794 to 0.877) [28]. In our study, the ML algorithm with unsampled data showed a high AUC (0.876) despite a low sensitivity (0.092). Therefore, various data resampling methods were applied, and the performance of each prediction model was evaluated using sensitivity and AUC. The sensitivity and AUC of the prediction models trained with the RF algorithm using unsampled, undersampled, and oversampled data were as follows: AUC=0.731 and sensitivity=0.173; AUC=0.870 and sensitivity=0.781; and AUC=0.840 and sensitivity=0.162, respectively. Based on these results, undersampling was the data resampling method that most improved the performance of the suicide prediction model.
This study has several limitations. First, we used data from the KNHANES on suicidal ideation and psychological status with simple questions, which may have affected the model’s performance. Second, because we developed an ML model using cross-sectional data, it was difficult to determine whether similar results could be obtained using longitudinal data. Third, additional analyses are required to compare the performance of the prediction models with those of other ML algorithms, such as artificial neural networks and support vector machines. Despite these limitations, this study is the first to develop an ML model for predicting suicidal ideation in Korean adults by comparing the performances of five ML algorithms and two data resampling methods. The prediction model developed in this study will help predict complex suicide risk factors arising from the interaction of individual, social, and environmental factors and develop interventions to improve them. The results of this study are significant for a sustainable Korean suicidal ideation prediction model that can be developed through ML using integrated data collected annually.
Supplementary Materials
The Supplement is available with this article at https://doi.org/10.30773/pi.2025.0187.
Measuring performance
Features associated with suicidal ideation selected using machine-learning algorithms that use unsampled data
Features associated with suicidal ideation selected using machine-learning algorithms that use undersampled data
Features associated with suicidal ideation selected using machine-learning algorithms that use oversampled data
Notes
Availability of Data and Material
The data are available at https://knhanes.kdca.go.kr/knhanes/main.do.
Conflicts of Interest
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Eunji Lim, Dongyun Lee. Data curation: Sung Hyo Seo. Formal analysis: Eunji Lim, Nuree Kang. Funding acquisition: Dongyun Lee. Investigation: Eunji Lim, Sung Hyo Seo, Dongyun Lee. Methodology: Nuree Kang, Soyoung Park. Project administration: Dongyun Lee. Resources: Jae-Won Choi, Dongyun Lee. Supervision: Sung Hyo Seo, Dongyun Lee. Visualization: Eunji Lim, Sung Hyo Seo. Writing— original draft: Eunji Lim. Writing—review & editing: Bong-Jo Kim, Boseok Cha, So-Jin Lee, Sung Hyo Seo, Dongyun Lee.
Funding Statement
This work was supported by the Biomedical Research Institute Fund (GNUCHBRIF-2025-0005) from Gyeongsang National University Hospital in 2025 and a research grant from Gyeongsang National University in 2023.
Acknowledgments
The authors express their sincere gratitude to all the participants for their participation in this study.
