Simultaneous Utilization of Mood Disorder Questionnaire and Bipolar Spectrum Diagnostic Scale for Machine Learning-Based Classification of Patients With Bipolar Disorders and Depressive Disorders
Abstract
Objective
Bipolar and depressive disorders are distinct disorders with clearly different clinical courses; however, distinguishing between them is often clinically challenging. This study investigates the utility of two self-report questionnaires, the Mood Disorder Questionnaire (MDQ) and the Bipolar Spectrum Diagnostic Scale (BSDS), combined with machine learning-based multivariate analysis, for classifying patients with bipolar and depressive disorders.
Methods
A total of 189 patients with bipolar disorders and depressive disorders were included in the study, and all participants completed both the MDQ and the BSDS. Machine-learning classifiers, including the support vector machine (SVM) and linear discriminant analysis (LDA), were used for multivariate analysis. Classification performance was assessed through cross-validation.
Results
Both MDQ and BSDS demonstrated significant differences in each item and total scores between the two groups. Machine learning-based multivariate analysis, including SVM, achieved excellent discrimination levels with area under the ROC curve (AUC) values exceeding 0.8 for each questionnaire individually. In particular, the combination of MDQ and BSDS further improved classification performance, yielding an AUC of 0.8762.
Conclusion
This study suggests that applying machine learning to the MDQ and BSDS can assist in distinguishing between bipolar and depressive disorders. It also highlights the potential of combining high-dimensional psychiatric data with machine learning-based multivariate analysis as an effective approach to psychiatric disorders.
INTRODUCTION
Bipolar disorder is distinct from depressive disorder: the two differ not only in clinical symptoms but also in the clinical characteristics associated with the longitudinal course of illness [1]. Bipolar depression is difficult to diagnose and is often mistaken for unipolar depression [2]. More than 10% of patients diagnosed with unipolar depression are eventually diagnosed with bipolar disorder [3]. Goodwin et al. [3] emphasized that erroneous diagnosis could lead to inadequate treatment interventions, thereby worsening the long-term prognosis of the patients. Antidepressants, the treatment of choice for unipolar depression, can actually worsen the long-term course of bipolar disorder [4]. Antidepressants in bipolar disorder have been associated with increased risks of manic switching and cycle acceleration, as well as deficiencies in tolerance and withdrawal relapse [5,6]. Mood stabilizing agents are therefore required for the treatment of bipolar disorder, and optimal treatment including mood stabilizers has been shown to improve long-term outcomes [7]. Accordingly, differentiating between the two disorders is essential in the treatment of bipolar disorder.
Several tools have been developed to distinguish between bipolar and unipolar disorder. The Mood Disorder Questionnaire (MDQ) [8] and the Bipolar Spectrum Diagnostic Scale (BSDS) [9] are widely used to assess mood disorders and to distinguish bipolar disorder from unipolar depression. The MDQ is a self-report questionnaire that screens for manic and hypomanic symptoms, while the BSDS is a self-report scale that evaluates a broader range of bipolar spectrum conditions. Several studies have evaluated these tools in different populations. Some were performed to distinguish between patients with bipolar disorder and healthy controls [10,11], while others were performed only in patients with bipolar disorder [8,9,12,13] or in the general population [14]. The Korean versions of these scales have also been well studied. The K-MDQ demonstrated good internal consistency, three main factors explaining nearly 60% of the variance, and a potential cutoff score of 7 for differentiating bipolar disorder from non-clinical participants [10]. Another study investigated the validity of the Korean version of the BSDS as a screening tool for bipolar spectrum disorder. Its findings suggest that the Korean BSDS is effective for this purpose, with a slightly lower optimal cut-off score than the original version while maintaining good sensitivity and specificity [15]. The few studies that differentiated between patients with bipolar disorder and depressive disorder used only one of the questionnaires rather than both [15,16]. In summary, few studies have differentiated patients with bipolar disorder and depressive disorder using both questionnaires simultaneously.
While various questionnaires have been employed for clinical evaluation, most studies have used univariate rather than multivariate analysis. Although two studies have applied both questionnaires in combination to distinguish bipolar and depressive disorder, they were cutoff-based univariate analyses rather than multivariate analyses [17,18]. Machine learning-based multivariate analysis refers to the application of machine learning algorithms and techniques to analyze and interpret datasets with multiple variables or features [19]. Although univariate analysis is effective in validating hypotheses for each specific variable, it does not fully exploit the information carried by each item in the questionnaire. Multivariate statistics represent the general case, while univariate statistics are specific instances within the multivariate model [20]. Given the large number of variables in psychiatric questionnaires, multivariate techniques allow for a comprehensive analysis, replacing the need to conduct multiple separate univariate analyses.
Considering the characteristics of the questionnaire data and the advantages of multivariate analysis, we hypothesized that simultaneous use of the MDQ and BSDS with machine learning classifiers would be useful in discriminating between patients with bipolar disorder and depressive disorder. We first enrolled patients with bipolar disorders and depressive disorders who completed the MDQ and BSDS. Next, we investigated the differences in each item of the two questionnaires between the two groups. Lastly, the classification performance was compared across three sets of questionnaires: the MDQ alone, the BSDS alone, and both questionnaires together.
METHODS
Subjects
This study took a retrospective approach, examining data from previous patient records. Participants were patients with mood symptoms who visited the Mood Disorder Clinic, an outpatient department within Pusan National University Hospital, between November 2011 and February 2021. Through initial interviews with psychiatrists, patients with the following diagnoses were enrolled based on the DSM-IV-TR criteria using the Structured Clinical Interview: bipolar I disorder, bipolar II disorder, bipolar disorder not otherwise specified (NOS), major depressive disorder, depressive disorder NOS, and dysthymia. The final diagnosis was established based on the long-term course observed through outpatient follow-up, thereby minimizing the risk of misdiagnosis. The exclusion criteria were as follows: 1) individuals with intellectual disabilities who were unable to respond to the questionnaire, 2) individuals who were illiterate, and 3) participants who did not want to provide demographic information. The present study was approved by the Institutional Review Board at Pusan National University Hospital (PNUH IRB: No 2310-003-131).
Measurements
MDQ
The MDQ is a screening tool designed to aid in the detection of bipolar spectrum disorders [8]. It consists of a total of 13 items, assessing the presence and severity of symptoms associated with bipolar spectrum disorders. The MDQ is utilized for evaluating the presence and severity of manic and hypomanic symptoms, as well as their impact on daily functioning. By utilizing the MDQ, clinicians can identify individuals who may necessitate further evaluation for bipolar disorder, including those with milder or atypical presentations. The Korean version of the questionnaire has demonstrated good sensitivity and specificity [10].
BSDS
The BSDS is a self-report measure used to comprehensively evaluate a broad range of bipolar spectrum conditions [9]. Ghaemi et al. [9] reported that it can enhance specificity without a significant loss of sensitivity. It consists of a total of 20 items that thoroughly assess various symptoms associated with bipolar disorder, including mood elevation, irritability, changes in sleep patterns, energy levels, cognition, and other psychomotor features. The BSDS aims to provide a comprehensive assessment of symptoms, enabling clinicians to identify individuals who may fall within the broader spectrum of bipolar disorders, extending beyond the classic bipolar I and II diagnoses. The validation study conducted on the Korean version of the questionnaire exhibited good sensitivity and specificity [15].
Classification
To evaluate the performance of the self-report questionnaires in multivariate analysis for classifying the two groups, several machine-learning algorithms were applied. Because the two groups were imbalanced in size, we applied upsampling (also called oversampling), a common technique that increases the representation of the minority class to yield a more balanced dataset [21]; here, samples of the minority class were duplicated to augment the dataset. This method has been used in a previous study by the author [22] and by other researchers [23,24]. Subsequently, a balanced number of subjects from each group was chosen through random selection, and fivefold cross-validation was used to assess classification performance. This process was iterated 1,000 times, and the metrics, including accuracy, area under the ROC curve (AUC), sensitivity, and specificity, were averaged over the iterations to minimize bias. Within the same randomly selected samples, three classifications were conducted separately using the MDQ only, the BSDS only, and both questionnaires. The classifiers were chosen for their stability and simplicity from among established classifiers used in prior studies [25-27]: the support vector machine (SVM) [28], linear discriminant analysis (LDA) [29], random forest [30], and k-nearest neighbor [31]. The procedures were performed using MATLAB® software (MathWorks, Inc., Natick, MA, USA).
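The resampling and cross-validation loop described above can be illustrated with a minimal sketch. It is written in plain Python rather than MATLAB and is not the authors' code. Several details are assumptions made only for illustration: the function names are hypothetical, a small k-nearest-neighbor rule stands in for the full set of classifiers, the minority class is oversampled inside each training fold (the text does not state the order of resampling and splitting), and the default number of repetitions is far below the 1,000 used in the study.

```python
import random
from statistics import mean


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def oversample(indices, labels, rng):
    """Duplicate randomly chosen minority-class indices until classes are equal."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced


def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dist = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_x, train_y)
    )
    votes = [y for _, y in dist[:k]]
    return max(set(votes), key=votes.count)


def repeated_cv_accuracy(features, labels, n_iter=10, k_folds=5, seed=0):
    """Mean test accuracy over n_iter repetitions of k-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        folds = stratified_folds(labels, k_folds, rng)
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            # Assumption: balance only the training portion of each split.
            train_idx = oversample(train_idx, labels, rng)
            train_x = [features[i] for i in train_idx]
            train_y = [labels[i] for i in train_idx]
            hits = sum(
                knn_predict(train_x, train_y, features[i]) == labels[i]
                for i in test_idx
            )
            scores.append(hits / len(test_idx))
    return mean(scores)
```

Under these assumptions, the three questionnaire sets would correspond to three calls to `repeated_cv_accuracy` on the same subjects, with `features` holding the 13 MDQ items, the 20 BSDS items, or all 33 items per subject.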
Statistical analysis
The questionnaire scores of patients with bipolar disorders and depressive disorders were compared using independent t-tests. Cohen’s d, a measure of effect size, was also obtained to assess the magnitude of the group difference for each item. A paired t-test was employed to compare the classification performance between questionnaire sets. All statistical analyses were conducted using MATLAB software.
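As an illustration of the item-level comparison, the sketch below computes Cohen's d and the corresponding independent-samples t statistic for one item. It is a plain-Python sketch, not the MATLAB analysis used in the study, and it assumes the pooled-standard-deviation form of Cohen's d and the equal-variance (Student) t-test; the text does not specify which variants were used, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(a), len(b)
    return sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))


def cohens_d(a, b):
    """Standardized mean difference between groups a and b (pooled SD)."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Equal-variance t statistic, expressed through Cohen's d."""
    return cohens_d(a, b) / sqrt(1 / len(a) + 1 / len(b))
```

With this definition, an item whose effect size "exceeds 0.5" is one for which the group means differ by more than half a pooled standard deviation.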
RESULTS
Clinical characteristics of the subjects
The data analysis framework is presented in Figure 1. A total of 256 participants were included in the study. Among them, 164 were diagnosed with bipolar disorders, and 92 were patients with depressive disorders. After excluding participants who did not complete the questionnaires or met the exclusion criteria, the final subjects for analysis consisted of 122 patients with bipolar disorders and 67 patients with depressive disorders. The majority of patients with bipolar disorders were females (51.44%), with a mean age of 33.4±12.9 years old. For patients with depressive disorders, the majority were males (52.17%), and their mean age was 40.8±15.6 years old. There was a significant difference in age (p<0.001), while sex did not differ between the two groups.
Self-report questionnaires in patients with bipolar disorders and depressive disorders
The results of the MDQ in patients with bipolar disorders and depressive disorders are presented in Table 1. Patients with bipolar disorders exhibit not only higher total scores but also elevated scores in all items of the MDQ. Among the thirteen items, the effect size of ten items exceeds 0.5. The difference in total scores shows an effect size greater than 0.5.
In Table 2, the results of the BSDS between the two groups are displayed. Similar to the results of the MDQ, patients with bipolar disorders also show higher scores in all items and in the total score of the BSDS. The effect size of eleven out of twenty items exceeds 0.5. The effect size of the difference in the total score was greater than 1, which was larger than that of any individual item.
Classification performances using self-report questionnaires
The classification of patients with bipolar disorders and depressive disorders was performed using the self-report questionnaires individually and in combination. The classification performances obtained using only the MDQ are presented in Table 3. With the MDQ, the highest AUC, 0.8393, was obtained with the SVM. The specificity was 0.8015 for the SVM, while LDA achieved accuracy and sensitivity values of 0.7309 and 0.6987, respectively. As shown in Table 4, the highest AUC in classification using the BSDS was 0.8000. The highest values of the other metrics, namely accuracy, sensitivity, and specificity, were 0.6950, 0.7016, and 0.7424, respectively. In Table 5, the classification results obtained by combining the MDQ and BSDS are displayed. Notably, the highest AUC was 0.8762, which differed significantly from that obtained using the MDQ or BSDS alone. The highest accuracy, sensitivity, and specificity achieved were 0.7656, 0.7302, and 0.8405, respectively. These three metrics also differed significantly from those obtained using the MDQ or BSDS alone.
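For reference, the four reported metrics can be computed from true labels, predicted labels, and classifier scores as in the following sketch. It is plain Python with hypothetical function names, not the study's MATLAB code. It assumes that bipolar disorder is coded as the positive class (label 1) and depressive disorder as the negative class (label 0), and it computes the AUC as the proportion of positive-negative pairs in which the positive case receives the higher score, counting ties as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on a single decision threshold, which is one reason it can exceed 0.8 while the thresholded metrics remain lower.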
DISCUSSION
Using self-report questionnaires, we examined how well they could classify patients with bipolar disorders and depressive disorders. After investigating whether there were differences among the questionnaire items in the two groups, classification performances using each questionnaire were obtained.
Our results are in line with previous studies that reported the significance of these questionnaires in various clinical situations. All items on the MDQ differed between the two groups, and the cutoff point of 7, as suggested in previous studies [8,11], was shown to be effective in our investigation. The results of the BSDS in this study supported the validity of the threshold of ‘moderate probability’ suggested by Ghaemi et al. [9]. This is an optimal cutoff that has shown high specificity in another study [15]. In this study, which included over 100 patients and had a larger percentage of individuals with depressive disorders than previous research, the BSDS was also found to be beneficial. In summary, the MDQ and BSDS are valid tools that demonstrate differences between the two groups, both in terms of each item and the total score.
The results of the machine-learning-based multivariate analysis showed that the AUC was at an excellent discrimination level. The classifications using the MDQ and the BSDS each achieved an AUC exceeding 0.8, and the specificity of the MDQ also exceeded 0.8. Unlike previous approaches, this analysis applied a train-test split procedure, yet its performance remained good compared with that of other methods [11,15,16]. In the model that used both the MDQ and BSDS, all metrics improved compared to when either questionnaire was used alone. In particular, its AUC was 0.8762, which suggests that it may serve as a valuable diagnostic support tool for clinicians. The outcomes of this research, which demonstrated good performance on the subtle problem of classifying patients with bipolar and depressive disorders, suggest that using multiple sources of clinical information simultaneously with machine-learning-based multivariate analysis is a very promising approach.
The psychiatric nosology system is based on dimensional approaches [32], where clinicians evaluate various symptoms to diagnose mental illnesses. Although a substantial amount of information is gathered, it is not fully utilized. Diagnosis is determined by the number of symptoms that are satisfied, without considering the combination of symptoms. The MDQ, for example, does not take into account which specific items a patient endorses; instead, it solely assesses whether the patient endorses at least seven items, the cut-off. Present methods of analysis, which rely on univariate approaches, therefore underutilize the gathered patient data.
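The information lost by a count-based cutoff can be seen in a small sketch. The two response patterns below are hypothetical, invented only for illustration, and the sketch considers only the count of endorsed items against the cutoff of 7 discussed above. Both patterns receive the same screening result even though they share only one endorsed item.

```python
def mdq_cutoff_positive(items, cutoff=7):
    """Cutoff rule: positive when the number of endorsed items reaches the cutoff."""
    return sum(items) >= cutoff


# Hypothetical yes/no answers (1 = yes) to the 13 MDQ symptom items.
patient_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
patient_b = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

same_screen = mdq_cutoff_positive(patient_a) == mdq_cutoff_positive(patient_b)
shared_items = sum(a and b for a, b in zip(patient_a, patient_b))
print(same_screen, shared_items)
```

A multivariate classifier receives the full 13-element vector rather than its sum, so it can, in principle, weight the two patterns differently.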
Machine-learning-based multivariate analysis offers several advantages. First, machine learning algorithms can effectively handle high-dimensional data and capture complex relationships among multiple variables simultaneously. These data-driven approaches allow for a more comprehensive understanding of the underlying patterns and interactions within the data [33]. Additionally, machine learning techniques can automatically identify relevant features or predictors, reducing the need for manual selection. They can also adapt and learn from new data, making them suitable for dynamic and evolving datasets [34]. Lastly, machine learning models that enhance reliability and validity have a greater advantage in interpreting and generalizing results, enabling them to provide accurate predictions and classifications. Metrics obtained by dividing the data into training and testing sets for model training and performance evaluation have an advantage in terms of reproducibility. Machine learning with reliability, interpretability, and usability is suitable for psychiatric diagnosis systems based on dimensional approaches [33,35]. In summary, high-dimensional psychiatric datasets can be effectively leveraged to develop improved diagnostic models through the application of machine-learning-based multivariate analysis.
Numerous prior studies have sought to distinguish between patients with bipolar disorders and depressive disorders. They have approached this problem using various data sources, including neuroimaging data [36-39] and peripheral biomarkers [40-43]. The highest accuracy of this study, 0.7518, is valuable considering that it was obtained in a much more challenging task performed only on patients with bipolar disorders and depressive disorders. Given the application of the train-test split procedure and the model’s high degree of generalizability, the results of this study merit a favorable evaluation. Taken together, self-report questionnaires combined with machine-learning-based multivariate analysis can be a valuable model for assessing bipolarity.
Several limitations of this study should be discussed. First, the patients included in this study were admitted to the hospital following treatment at a different hospital, and their symptoms may have been more severe than those of patients in the general population. Consequently, the gender ratio and the distribution of bipolar and unipolar disorder diagnoses among the study participants deviated from the patterns observed in the general population. These variations have the potential to diminish the generalizability of the study findings. To compensate for this limitation, we conducted 1,000 iterations to ensure the reliability of the metrics. Nonetheless, future research across multiple centers with a large sample size is essential. Second, the analysis of individual items was not sufficiently thorough. This study demonstrated the utility of machine learning when simultaneously using the MDQ and BSDS, yet there is potential for more specific analyses with these methods. For example, feature selection can determine the importance of individual items and their relevance to the classification. Further improvements can be made by incorporating additional features such as onset age, extroverted personality traits, frequent relapses, duration of episodes, and accompanying symptoms like hypersomnia during manic episodes. In future studies, these additional analyses should be conducted.
In this study, we demonstrated the usefulness of self-report questionnaires with machine-learning-based multivariate analysis in the classification of patients with bipolar and depressive disorders. Our results showed that the AUC exceeded 0.8. In particular, the highest performance was achieved when both the MDQ and BSDS were used. This suggests that machine learning is a very effective method for high-dimensional psychiatric diagnoses and clinical tools. Future studies should involve larger populations and employ more detailed analysis methods.
Notes
Availability of Data and Material
The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.
Conflicts of Interest
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Kyungwon Kim, Eunsoo Moon. Data curation: Kyungwon Kim, Hyun Ju Lim. Formal analysis: Kyungwon Kim. Funding acquisition: Kyungwon Kim. Investigation: Kyungwon Kim, Hyun Ju Lim, Hwagyu Suh. Methodology: Kyungwon Kim, Hyun Ju Lim, Young-Min Lee. Project administration: Kyungwon Kim. Resources: Hyun Ju Lim, Hwagyu Suh, Eunsoo Moon. Software: Kyungwon Kim, Eunsoo Moon. Supervision: Je-Min Park, Byung-Dae Lee, Eunsoo Moon. Validation: Je-Min Park, Young-Min Lee. Visualization: Kyungwon Kim. Writing – original draft: Kyungwon Kim, Eunsoo Moon. Writing – review & editing: Je-Min Park, Eunsoo Moon.
Funding Statement
This study was supported by Biomedical Research Institute Grant (20210200), Pusan National University Hospital.
Acknowledgements
None