Psychometric Properties of the Mixed State Severity Index for Patients With Mood Disorder
Abstract
Objective
This study aimed to develop a reliable and valid Mixed State Severity Index (MSSI) to assess mood instability in patients with mood disorders and determine cutoff scores.
Methods
Twenty-one items were selected based on Koukopoulos’ criteria for mixed depressive episode, historically referred to as agitated depression, and Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision mixed features criteria. The MSSI was administered to 242 patients (major depressive disorder [n=92], bipolar disorder [BD] I [n=78], and BD II [n=72]) and 726 controls.
Results
The MSSI demonstrated high internal consistency (α=0.78–0.90). Exploratory factor analysis revealed a stable four-factor structure. Based on receiver operating characteristic analysis, optimal cutoff scores were identified to distinguish mood disorder groups from controls, ranging from 19.5 to 27.5 depending on diagnosis.
Conclusion
The MSSI is a reliable and valid instrument for assessing the severity of mixed features in patients with mood disorders. The established cutoff scores enhance its clinical utility, providing robust support for diagnosis and treatment planning.
INTRODUCTION
Mood disorders, including major depressive disorder (MDD) and bipolar disorder (BD), represent a significant global health burden [1]. These conditions affect millions of people worldwide, leading to substantial personal suffering, economic costs, and societal impact [1]. Depression alone is a leading cause of disability globally, affecting over 330 million people [2]. Moreover, the prevalence and impact of BD are substantial, with an estimated 39.5 million people affected [3].
Within the spectrum of mood disorders, the concept of mixed features, particularly in depressive episodes, has gained increasing attention due to its clinical significance and impact on patient outcomes [4]. Mixed features are present in a considerable portion of mood disorder episodes, with studies showing that about 27.8% of cases exhibit three or more features of opposite polarity [5]. These features often appear in both MDD and bipolar depression [5]. Patients with mixed features often experience more severe symptoms, have a poorer response to conventional treatments, and face a higher risk of adverse behaviors, such as suicide, compared to those without mixed features [6].
Having reliable and valid measurement tools is crucial to develop better treatment strategies for patients with mixed features [7]. However, despite the development of several scales to measure the mixed features, such as the Depressive Mixed State (DMX-12), Koukopoulos Mixed Depression Rating Scale (KMDRS), Shahin Mixed Depression Scale (SMDS), and Clinically Useful Depression Outcome Scale supplemented with questions for the DSM-5-TR Mixed subtype [7-10], these instruments have several limitations. First, some existing scales for assessing mixed features in mood disorders often do not fully capture the entire spectrum of symptoms, particularly those that overlap between depression and mania [7,8]. For instance, symptoms such as irritability, distractibility, and psychomotor agitation, which are frequently observed in mixed states, are inconsistently included in these assessment tools [11]. This exclusion may result in inadequately capturing the full spectrum of mixed presentations [12]. Second, the dynamic nature of mood fluctuations in mixed states has not been consistently incorporated into existing assessment tools [13]. Such fluctuations should be addressed, as short-term mood changes can significantly impair patients’ daily functioning and pose challenges for accurate evaluation [13]. Third, existing scales are typically designed to be administered in only one format—either as self-reports or rater-administered interviews [8,10]. There is a need for versatile instruments that can be used in self-report measures and rater-administered interviews. Such tools would enhance not only the comprehensiveness of assessment, but also the ability to capture mixed features across various clinical and research settings [9].
This study primarily aims to develop and validate a versatile scale for assessing mixed features in mood disorders. This scale is designed to capture both shared and unique symptoms of mood disorders, including depression and mania, with the flexibility to be used as either a self-report or rater-administered tool. Unlike most existing scales that assess either symptom severity or duration in isolation, the newly developed scale simultaneously evaluates both dimensions, offering a more comprehensive and nuanced assessment of mixed states and addressing a critical limitation of prior instruments. In addition, cutoff scores for this scale are provided to facilitate its clinical application. These distinct characteristics allow the scale to support clinicians and researchers in accurately diagnosing mixed features, facilitating targeted treatment strategies, and improving patient outcomes. Furthermore, it holds significant potential for aiding clinical trials by providing a reliable measure for evaluating mixed states in mood disorders.
METHODS
Study design and participants
A total of 968 participants aged 18 to 69 years were recruited between December 2022 and 2024. Of the participants, 242 had mood disorders (MDD [n=92], BD I [n=78], and BD II [n=72]) and 726 reported no psychiatric disorders. The mood disorder group comprised patients who visited the Seoul National University Bundang Hospital Mood Disorder Clinic for treatment, including 152 outpatients and 90 inpatients. The control group consisted of individuals who reported never having been diagnosed with any psychiatric disorder by a mental health professional. Diagnoses were made by board-certified psychiatrists (T.H.H. and W.M.) based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR) criteria [14], using structured interviews with the Mini-International Neuropsychiatric Interview (M.I.N.I.) along with reviews of case records [15]. Patient consent was waived for data collected through medical chart review. Consent for the control group was also waived, as researchers did not have direct access to participants’ personal information and only anonymized survey data were used for analysis. However, for participants in the dual-rating group within the patient sample, informed consent was obtained prior to data collection, as personally identifiable information was involved. The present study was approved by the Institutional Review Board of Seoul National University Bundang Hospital (Protocol Code: B-2211-791-303; Approval Date: October 24, 2022).
Measures
The Mixed State Severity Index (MSSI) was developed based on the diagnostic criteria for “mixed depressive episode (historically described as agitated depression)” [12] and the DSM-5-TR criteria for MDD with mixed features [14]. Given that mixed states are characterized by the concurrent presence of depressive and (hypo)manic symptoms, mood instability has been recognized as a core feature necessitating systematic evaluation [16]. A research team consisting of psychiatrists, clinical psychologists, and psychiatric nurses initially developed a set of 23 preliminary items to assess mood instability. These items were constructed to comprehensively reflect the diagnostic criteria for “mixed depressive episode” and the DSM-5-TR criteria for MDD with mixed features. The 23 preliminary items were subsequently evaluated by an expert team (including four psychiatrists, two psychology professors, three clinical psychologists, one research nurse, and five researchers from a mood disorders research team) to determine their suitability for assessing mood instability, and the final selection emphasized the inclusion of items that experts consistently regarded as clinically important. The expert team further reviewed and validated the overall structure of the scale and the content of all preliminary items.
Through this process, the MSSI was finalized, comprising a total of 21 items for assessing mood instability. Of these, 20 items assess the frequency (0: Never, 1: Less than one day [symptoms lasted only a few hours], 2: More than one day but less than one week, 3: More than one week but less than two weeks, 4: Daily) and severity (1: Mild, 2: Moderate, 3: Severe) of mood instability symptoms over the past two weeks, a time frame consistent with the DSM-5-TR diagnostic criteria for major depressive episodes and widely adopted in established mood disorder rating scales (e.g., Beck Depression Inventory–II, Patient Health Questionnaire-9) [14,17,18]. This two-week frame was chosen to capture clinically meaningful persistence of symptoms rather than transient fluctuations. For items 1–20 of the MSSI, if the frequency response is “0: Never,” the severity is not rated. To calculate the item scores, the frequency and severity for each item (1–20) are multiplied and the products are summed. The total score of the MSSI ranges from 0 to 240, with higher scores indicating greater severity of mood instability. This scoring method, modeled after the Neuropsychiatric Inventory, employs a multiplicative approach combining frequency and severity ratings to generate a composite symptom score [19,20]. Such a method enables capturing both the occurrence and intensity of mood instability symptoms, aligning with clinical practice where both dimensions are critical to evaluating functional impairment [21]. By employing this approach, the MSSI is expected to provide a comprehensive profile of mixed affective states, accommodating both frequently occurring mild symptoms and infrequently occurring severe symptoms. In addition to the 20 core items, the MSSI includes an additional item assessing the presence of motor retardation based on Koukopoulos’ diagnostic criteria for mixed depressive episode, where the presence of agitation without motor retardation indicates a mixed state [12]. 
Accordingly, the severity of motor retardation is not scored; rather, its binary presence is recorded to assist clinicians in applying Koukopoulos’ criteria during diagnostic evaluation [12]. Translations to multiple languages (English, Japanese, French, German, Spanish, and Mandarin) are provided in Supplementary Table 1. Supplementary Table 2 provides a scoring guide for the questionnaire.
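The scoring rule described above can be illustrated with a minimal Python sketch. The function name `mssi_total` and the input layout (a list of frequency and severity pairs) are hypothetical and chosen only for illustration; the official scoring guide is the one in Supplementary Table 2.

```python
from typing import Optional, Sequence, Tuple

# One scored item: (frequency 0-4, severity 1-3, or None when frequency is 0).
Item = Tuple[int, Optional[int]]


def mssi_total(items: Sequence[Item]) -> int:
    """Sum frequency x severity over the 20 scored items (range 0-240)."""
    if len(items) != 20:
        raise ValueError("expected exactly 20 scored items")
    total = 0
    for freq, sev in items:
        if not 0 <= freq <= 4:
            raise ValueError("frequency must be between 0 and 4")
        if freq == 0:
            continue  # severity is not rated when the symptom never occurred
        if sev not in (1, 2, 3):
            raise ValueError("severity must be 1, 2, or 3")
        total += freq * sev
    return total
```

Under this rule, a respondent who rates every item as "Daily" and "Severe" reaches the maximum of 20 × 4 × 3 = 240, and item 21 (motor retardation) is recorded separately as present or absent without entering the sum.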
The Hamilton Anxiety Rating Scale (HAM-A) is a rater-administered scale designed to assess the severity of anxiety symptoms. This scale includes 14 items that evaluate both the psychological and physical dimensions of anxiety. Each item is rated on a Likert-like scale from 0 to 4, with the total score ranging from 0 to 56; higher scores reflect greater anxiety levels [22]. In this study, the scale’s internal consistency was α=0.89. The Hamilton Depression Scale (HAM-D) is a widely recognized rater-administered tool for evaluating depression. It comprises 17 items, some scored from 0 to 4 and others from 0 to 2. It has a maximum score of 53, with higher scores indicating more severe depressive symptoms [23]. In the present study, the scale’s internal consistency was α=0.83. The Bipolar Depression Rating Scale (BDRS) was standardized and has been widely used for evaluating bipolar depression [24]. The Korean Version of the Bipolar Depression Rating Scale (K-BDRS) is a semi-structured observation tool specifically designed to assess depressive symptoms in individuals with BD [25]. The scale comprises 20 items, each rated on a 4-point Likert scale: “No symptoms=0,” “Mild symptoms=1,” “Moderate symptoms=2,” and “Severe symptoms=3.” The total score ranges from 0 to 60, with higher scores indicating greater severity of depressive symptoms. In this study, the scale’s internal consistency was α=0.86. The Montgomery–Åsberg Depression Rating Scale (MADRS) is designed to comprehensively assess the cognitive, affective, and biological aspects of depression [26]. It consists of 10 items, each rated on a 7-point Likert scale ranging from 0 to 6. Higher scores indicate more severe depression [27]. In the present study, the scale’s internal consistency was α=0.77. The Young Mania Rating Scale (YMRS) is designed to evaluate manic symptoms in individuals with BD. Clinicians assess 11 core symptoms of mania using this scale [28]. 
Four items—thought content, irritability, speech (rate and amount), and disruptive/aggressive behavior—are scored on a scale from 0 to 8 to capture greater symptom variability. The remaining seven items are scored from 0 to 4. The total score ranges from 0 to 60, and higher scores indicate more severe manic states [29]. In the present study, the scale’s internal consistency demonstrated α=0.54. Cronbach’s alpha for the Korean version of Young Mania Rating Scale (K-YMRS) was 0.74 in the BD I group, 0.32 in the BD II group, and 0.27 in the MDD group. These differences reflect the scale’s specificity for assessing manic symptoms. The Mood Disorder Questionnaire (MDQ) is a screening instrument developed to assist in the identification of bipolar spectrum disorders [26]. Comprising 13 items, the Korean version of the Mood Disorder Questionnaire (K-MDQ) evaluates the presence and severity of symptoms associated with these disorders [30]. It assesses both manic and hypomanic symptoms, along with their impact on daily functioning [30]. In the present study, the internal consistency of the scale demonstrated α=0.88. The Zung Depression Scale (ZDS) was employed to assess depression. This 20-item self-report questionnaire measures the current level of depressive thoughts and behaviors in individuals [31]. Each item is rated on a four-point Likert scale (1–4), with higher scores indicating more severe levels of depression. In the current study, the scale’s internal consistency demonstrated α=0.81. The Korean version of the Beck Anxiety Inventory (K-BAI) is a self-rating tool composed of 21 items, each describing a specific anxiety symptom. It is commonly used for evaluating anxiety levels. Each item is rated on a Likert-like scale from 0 to 3, with the total score ranging from 0 to 63 [32]. 
We used the K-BAI, which classifies scores as follows: 0–7 points indicate minimal anxiety, 8–15 points indicate mild anxiety, 16–25 points indicate moderate anxiety, and 26–63 points indicate severe anxiety [33]. The Cronbach’s alpha for the scale in this study was 0.92. The Short Version of the Temperament Evaluation of Memphis, Pisa, Paris, and San Diego Autoquestionnaire (TEMPS-A-SV) assesses five temperamental traits: Cyclothymic, Depressive, Irritable, Hyperthymic, and Anxious [34]. This scale consists of 39 self-report items, with each item rated dichotomously as “Yes=1” or “No=0.” The subscales are divided as follows: Cyclothymic temperament, 12 items (1–12, α=0.85); Depressive temperament, 8 items (13–20, α=0.71); Irritable temperament, 8 items (21–28, α=0.70); Hyperthymic temperament, 8 items (29–36, α=0.69); and Anxious temperament, 3 items (37–39, α=0.70). The Short Form of the Mood Instability Questionnaire-Trait (MIQ-T-SF) measures mood instability as a stable individual trait, characterized by fluctuations in mood and behavioral patterns [35]. It is a 17-item self-report scale, with each item rated on a 4-point Likert scale. The Cronbach’s alpha for the scale in this study was 0.88.
Statistical analysis
In this study, participants were divided into two groups based on the administration format of the scale. The self-report group (n=152) completed the MSSI as a self-administered questionnaire only, and the dual-rating group (n=90) completed both self-report and rater-administered versions of the MSSI to assess inter-method consistency. To evaluate construct validity with methodological rigor, participants were additionally randomized—regardless of administration format—into either an exploratory factor analysis (EFA) group (n=121) or a confirmatory factor analysis (CFA) group (n=121). A detailed overview of group assignments and analytical procedures is illustrated in Supplementary Figure 1. Participants’ demographic and clinical characteristics were examined. Subsequently, descriptive statistics were conducted for the individual items, and reliability analyses were performed for each subfactor and the total score. To investigate the factor structure of the MSSI, an EFA was conducted on a randomly split half of the total sample (n=121). Prior to conducting the EFA, a parallel analysis was employed to determine the optimal number of factors. This technique involves generating random correlation matrices with dimensions equivalent to the sample correlation matrix, calculating their eigenvalues, and identifying the number of factors based on the observed eigenvalues that exceed the mean eigenvalues derived from the random matrices. The maximum likelihood (ML) estimation and Oblimin rotation were used to obtain the optimal EFA solution, considering the inter-correlations among factors. All items with a factor loading of 0.3 or greater were assigned to the factor on which they loaded most strongly. A one-way analysis of variance was conducted to examine score differences across diagnosis groups for MSSI items 1 to 20, followed by Dunn’s test for post hoc analysis to identify specific between-group differences. 
Moreover, a chi-square analysis was performed to assess differences in responses to the binary MSSI item 21 across diagnostic groups, with Bonferroni correction applied for post hoc analysis. A CFA was conducted on the remaining half of the sample (n=121) to validate the factor structure identified through the EFA. This analysis compared three alternative models with 3, 4, and 5 factors to identify the best-fitting model. Model fit was evaluated using several fit indices, including the comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). The fit indices were assessed against widely accepted criteria: an acceptable model fit is indicated by a normed χ2 (the χ2 value divided by the degrees of freedom) of less than 3, CFI and TLI values greater than 0.95, RMSEA values less than 0.06, and SRMR values less than 0.09 [36]. Based on these criteria, the best-fitting model was selected for interpretation. The Wilcoxon signed-rank test was used to examine score differences between the self-report and rater-administered versions for items 1 to 20. For item 21, which is binary, a chi-square test was conducted to examine differences in response patterns between the two formats. Additionally, Pearson correlation analysis was performed to examine the relationship between the factor scores obtained from the two versions. To evaluate criterion validity, Pearson correlation analyses were conducted using data from the CFA sample, which included a self-report group (n=70) and a dual-rating group (n=51). Correlations were examined between MSSI factor scores and a range of clinical and psychological assessment tools related to mood disorders, including the HAM-A, HAM-D, K-BDRS, K-MADRS, K-YMRS, K-MDQ, ZDS, K-BAI, TEMPS-A-SV, and MIQ-T-SF. 
Receiver operating characteristic (ROC) analysis was performed to evaluate the diagnostic utility of this scale and to determine the optimal cutoff values for the total score in distinguishing each diagnostic group (BD I, BD II, MDD, BD I & BD II, and BD I, BD II & MDD combined) from the normal control group [37]. The optimal cutoff score was determined based on Youden’s index, which maximizes the balance between sensitivity and specificity. According to the index, values were categorized as excellent (≥0.09), good (≥0.08), moderate (≥0.07), and poor (<0.06) [38]. Additionally, the area under the curve (AUC) was calculated to assess the discriminative power of the total score, with values interpreted as excellent (≥0.9), good (0.8–0.9), fair (0.7–0.8), and poor (0.6–0.7) [39].
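As an illustration of cutoff selection with Youden’s index (J = sensitivity + specificity − 1), the following Python sketch scans candidate cutoffs and keeps the one with the largest J. The function name `youden_cutoff`, the rule that a score at or above the cutoff counts as positive, and the use of midpoints between adjacent observed scores as candidates are assumptions made for this example, not details reported in the study.

```python
def youden_cutoff(patient_scores, control_scores):
    """Return (cutoff, J, sensitivity, specificity) for the cutoff maximizing J.

    Assumption: a total score >= cutoff is treated as screen-positive.
    """
    values = sorted(set(patient_scores) | set(control_scores))
    # Candidate cutoffs are midpoints between adjacent observed scores,
    # which yields half-point values when the scores are integers.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for cutoff in candidates:
        sens = sum(s >= cutoff for s in patient_scores) / len(patient_scores)
        spec = sum(s < cutoff for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best
```

With integer total scores, midpoint candidates always fall on half points, which is one way cutoffs of that form can arise; ties in J are resolved here in favor of the lower cutoff, which is likewise only a convention of this sketch.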
RESULTS
Participant characteristics
Table 1 displays demographic and clinical characteristics of the participants. As shown, the study included 242 participants with mood disorders, comprising 78 (32.2%) patients with BD I, 72 (29.8%) with BD II, and 92 (38.0%) with MDD. The mean age of participants was 32.33±11.85 years. Supplementary Table 3 presents the mean scores, standard deviations, skewness, and kurtosis values for items 1 through 20, along with Cronbach’s α values for each subfactor and the total score. The internal consistency of the total scale was excellent, with a Cronbach’s α value of 0.90. The subscales also demonstrated good to excellent internal consistency, with Cronbach’s α values ranging from 0.78 to 0.86.
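For readers less familiar with the internal consistency coefficient reported here, Cronbach’s α for k items is k/(k−1) × (1 − Σ item variances / variance of the total score). A minimal Python sketch follows; the function name `cronbach_alpha` and the row-per-respondent input layout are hypothetical, and the use of sample variances (n−1 denominator) is an assumption of this example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When all items rise and fall together across respondents, the variance of the total score is large relative to the summed item variances and α approaches 1.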
EFA
Prior to conducting the main analysis, the suitability of the data for EFA was assessed using the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity. Results from both tests confirmed that the data from the 20 items of the MSSI were appropriate for EFA. The parallel analysis showed that the four-factor model was the most appropriate, which explains 52.4% of the total variance. Subsequently, an EFA was conducted based on the selected four-factor model, using ML estimation and Oblimin rotation.
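The parallel analysis procedure described in the Methods can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the software used in the study: the function names are hypothetical, the random data are drawn from a standard normal distribution, eigenvalues are taken from the full correlation matrix, and a simple Jacobi rotation routine stands in for a linear algebra library.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix of a list of rows (one row per respondent)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]


def eigenvalues_symmetric(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(i + 1, p))
        if off < tol:
            break
        for i in range(p):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def parallel_analysis(data, n_sims=100, seed=0):
    """Count leading observed eigenvalues exceeding the mean random eigenvalues."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = eigenvalues_symmetric(correlation_matrix(data))
    sums = [0.0] * p
    for _ in range(n_sims):
        sim = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for k, ev in enumerate(eigenvalues_symmetric(correlation_matrix(sim))):
            sums[k] += ev
    mean_random = [s / n_sims for s in sums]
    n_factors = 0
    for obs, rnd in zip(observed, mean_random):
        if obs <= rnd:
            break
        n_factors += 1
    return n_factors, observed, mean_random
```

The returned count is the number of leading factors whose observed eigenvalue exceeds the corresponding mean eigenvalue from random data of the same dimensions. Some implementations compare against a percentile of the random eigenvalues or use a reduced correlation matrix instead, so results may differ slightly between tools.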
Table 2 presents the EFA results based on the four-factor model. As shown in the table, each factor was named as follows, according to the pattern of factor loadings with values exceeding 0.3: affective instability (Factor 1: 8 items), elevated state (Factor 2: 5 items), cognitive hyperactivity (Factor 3: 4 items), and loss of impulsive control (Factor 4: 3 items).
CFA
Supplementary Table 4 reveals that the four-factor model provided the best fit (χ2=210.207, df=164, CFI=0.951, TLI=0.943, RMSEA=0.048, SRMR=0.079), outperforming both three-factor and five-factor models. Moreover, the chi-square difference test comparing the three-factor and four-factor models revealed a significant difference in model fit (Δχ2=77.228, df=3, p<0.001), while the comparison between the four-factor and five-factor models showed no significant difference (Δχ2=2.219, df=3, p=0.429). These results provide strong evidence that the four-factor model fits better than the other two models.
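To make the comparison with the fit criteria stated in the Methods explicit, the short Python sketch below checks the reported four-factor indices against those thresholds. The helper name `check_fit` is hypothetical; the thresholds and the index values are the ones given in the text.

```python
def check_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Compare fit indices with the criteria listed in the Methods."""
    return {
        "normed_chi2": chi2 / df < 3,  # chi-square divided by degrees of freedom
        "CFI": cfi > 0.95,
        "TLI": tli > 0.95,
        "RMSEA": rmsea < 0.06,
        "SRMR": srmr < 0.09,
    }


# Values reported for the four-factor model.
result = check_fit(chi2=210.207, df=164, cfi=0.951, tli=0.943,
                   rmsea=0.048, srmr=0.079)
```

Applied to the reported values, the normed χ2 (about 1.28), CFI, RMSEA, and SRMR meet the stated criteria, while the TLI of 0.943 falls marginally below the 0.95 threshold.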
Mood disorder groups differences in MSSI scores
Supplementary Table 5 shows MSSI item scores across the three diagnosis groups: BD I, BD II, and MDD. The results reveal significant group differences for items 10 (increased goal-directed activity), 11 (pressured speech), 14 (grandiosity), and 21 (no motor retardation), while no significant differences were found for the remaining items.
Consistency between self-report and rater-administered versions
Table 3 compares the self-report and rater-administered versions across all items and factor scores to assess overall consistency between the two formats. The results reveal that significant differences were found for only three items—8 (restlessness), 14 (grandiosity), and 15 (tension)—while no significant differences were observed for the remaining 18 items. Table 4 also shows that Pearson correlations between the self-report and rater-administered versions were significant across all factor scores and the total score, ranging from 0.74 to 0.83 for factor scores and reaching 0.88 for the total score. These results indicate strong consistency between the two formats in both factor-level and overall scores.
Item-level consistency between self-report and rater-administered versions in the dual-rating group (N=90)
Criterion validity
Table 5 presents the results of the Pearson correlation analysis conducted to assess criterion validity. The analysis was performed on the second half of the sample used for the CFA, comprising the self-report group (n=70) and the dual-rating group (n=51). In the self-report group, significant correlations were observed between the depressive subscale of the TEMPS-A-SV and all four MSSI factors. For the K-MDQ, significant correlations were found with all MSSI factors except for affective instability and loss of impulsive control. In contrast, ZDS and K-BAI demonstrated significant correlations with all factors except for elevated state. The cyclothymic subscale of the TEMPS-A-SV showed significant correlations only with elevated state (r=0.49, p<0.01). The irritable subscale of the TEMPS-A-SV exhibited a significant correlation with elevated state and MSSI total score (r=0.48, p<0.01; r=0.26, p<0.05, respectively). The hyperthymic and anxious subscales of the TEMPS-A-SV showed no significant correlations with any of the MSSI factors.
In the dual-rating group, significant correlations were observed between K-BDRS and all four factors. Significant correlations were found for HAM-A, HAM-D and K-MADRS with all MSSI factors except for elevated state. On the other hand, K-YMRS showed significant correlations with all factors except for loss of impulsive control. The MIQ-T-SF total score showed significant correlations with elevated state and cognitive hyperactivity (r=0.57, p<0.01; r=0.34, p<0.05, respectively).
Cutoff score
The ROC curves were generated to determine the optimal cutoff scores for distinguishing the mood disorder groups from the normal group, as presented in Supplementary Figures 2-5. Table 6 shows that all group comparisons (BD I vs. control; BD II vs. control; MDD vs. control; the combined BD I and BD II groups vs. control) yielded AUC values above 0.90, indicating excellent discriminative power. The results also demonstrate that the highest AUC value of 0.96 was observed in the group comparisons that included either BD I, BD II, or both, while the AUC value was slightly lower for the comparison involving the MDD group (MDD vs. control=0.92).
The optimal cutoff scores for each comparison were determined based on Youden’s Index, which ranged from 0.72 to 0.84. Sensitivity ranged from 0.80 to 0.89, and specificity ranged from 0.92 to 0.96, as shown in Table 7. According to the values of Youden’s Index, which maximizes the balance between sensitivity and specificity, the optimal cutoff scores were identified as follows: 27.5 for BD I, 23.5 for BD II, 19.5 for MDD, and 23.5 for the combined BD I and BD II group.
DISCUSSION
This study aimed to develop and validate a versatile scale that can help assess mixed features in mood disorders and to provide cutoff values for identifying mixed features. Notably, the scale is available in two formats, self-report and rater-administered, and assesses mixed features comprehensively by considering both symptom duration and severity. In this study, the MSSI demonstrated good psychometric properties. Reliability ranged from good to excellent.
In previous studies, several factor structures have been postulated using measurement tools developed to examine mixed features [7-10]. Considering this, the present study examined the factor structure of the MSSI through an EFA and a CFA. We identified a four-factor structure (affective instability, elevated state, cognitive hyperactivity, and loss of impulsive control) in the EFA, which was cross-validated in a subsequent CFA in a non-overlapping sample. Because the scale has only recently been developed, comparison with established factor structures of the MSSI itself is not yet possible. However, it is comparable to scales developed for similar purposes. For example, Sani et al. [9] (2018) posited that the KMDRS, developed based on Koukopoulos’ construct of mixed depression, comprises two factors: anger/tension/impulsivity and psychomotor excitation [12]. The SMDS, designed to measure mixed symptoms of depression as proposed by Koukopoulos through self-report, exhibited two factors: psychomotor agitation and mixity without psychomotor agitation [9]. Furthermore, the DMX-12 scale, a tool designed to screen for mixed depression and mixed features, has been identified as having three factors: vulnerable responsiveness, spontaneous instability, and disruptive emotion/behavior [8]. The factors identified in the MSSI are similar to those reported in previous studies. However, the MSSI is more comprehensive in scope, incorporating both the mixed features suggested by the DSM-5-TR and the mixed symptoms of depression proposed by Koukopoulos and Sani [12]. A thorough evaluation using the MSSI is instrumental in accurately diagnosing patients with challenging-to-differentiate mood disorders and in comprehending their mixed states.
Next, we found that the items showing significant differences between the mood disorder groups were increased goal-directed activity, pressured speech, grandiosity, and no motor retardation, all of which relate to the elevated state. Increased goal-directed activity, pressured speech, and grandiosity were effective in differentiating between BD I and MDD. Meanwhile, increased goal-directed activity was useful in differentiating between BD I and BD II, and the absence of motor retardation was effective in distinguishing between BD II and MDD. This finding is consistent with previous studies and with the DSM-5-TR, which identifies increased activity and energy, alongside mood changes, as the cardinal symptoms of mania and hypomania [40,41]. In the present study, “increased goal-directed activity” emerged as the most salient differentiating item between BD I and MDD.
However, the majority of items on this scale, which measure mixed features, did not differ between the mood disorder groups. This suggests that the symptoms of mixed states alone are insufficient to clearly differentiate between BD and depressive disorders in clinical practice. The challenge of distinguishing between BD and MDD has been extensively documented in the literature [42,43]. This difficulty arises in part because individuals diagnosed with BD experience depressive episodes more frequently than manic or hypomanic episodes [44,45]. Additionally, a notable proportion of patients with mood disorders manifest mixed features within episodes, irrespective of whether they are diagnosed with BD or MDD [5,7,8]. Thus, the MSSI is better suited to quantifying the severity of mixed features observed across mood disorders than to serving as a differential diagnostic tool. Accurate differentiation between BD and MDD requires evaluation of the longitudinal course of mood episodes across the lifetime, as symptomatology at a single time point may not sufficiently capture diagnostic distinctions.
Finally, based on the ROC analysis, the optimal cutoff scores for distinguishing mood disorder groups from the normal control group were identified. Group-specific cutoff scores ranged from 19.5 to 27.5 depending on the diagnostic category. Notably, the highest cutoff score was observed in the BD I group (27.5), while the lowest was found in the MDD group (19.5). This suggests that while the MSSI may have limited utility as a diagnostic tool for differentiating between specific mood disorder subtypes, it serves as a clinically useful tool for distinguishing pathological mood states from nonclinical states across the mood disorder spectrum.
Two versions of the scale were developed: a self-report version and a clinician-rated version. We examined whether the reported scores differed between the versions. The results showed no significant differences for most items except for two items (restlessness and tension). This is consistent with previous studies that found both self-report and clinician rating scales to be widely used in assessing mood symptoms, with high correlations between the two methods [46,47]. However, several meta-analyses examining treatment responses in patients with depression have revealed that self-report scales tend to be more conservative and to yield smaller effect sizes than clinician ratings [48,49]. Potential explanations for these discrepancies include the severity of patient symptoms within the studies and the composition of the items within the scales [48]. In our study, restlessness and tension, symptoms often observed through behavior, showed notable discrepancies between self-reported and clinician-rated scores. These differences may reflect the difficulty some patients experience in recognizing and accurately appraising the intensity of their own symptoms. Importantly, these findings highlight the utility of the scale in various clinical contexts. When clinician ratings are feasible, careful behavioral observation may enhance the accuracy of symptom assessment, particularly for externally observable features. Conversely, in settings where clinician rating is not available, the self-report version still provides a reliable alternative, offering scores that closely correlate with clinician ratings in most domains. This flexibility supports the applicability of the MSSI across diverse clinical environments while maintaining validity and clinical relevance.
Moreover, the criterion-related validity was satisfactory: the MSSI correlated significantly with relevant measures in the expected direction. In the self-report group, we examined correlations with other self-report measures of mood and anxiety symptoms (the K-MDQ, ZDS, and K-BAI). In the dual rating group, we examined correlations with the HAM-D, HAM-A, and K-YMRS, which are clinician rating scales commonly used to assess patients with mood disorders. The results provide evidence that the convergent validity of the MSSI factors is acceptable.
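The idea of convergent validity described above can be sketched as a Pearson correlation between a scale score and an established measure. The snippet below is a minimal illustration only: the helper `pearson_r` and the paired scores are hypothetical, and the choice of Pearson's r is an assumption, since this section does not specify the correlation coefficient used.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired totals (a scale factor score vs. a reference scale),
# invented for this example and not taken from the study.
toy_scale = [12, 18, 25, 31, 40]
toy_reference = [8, 11, 17, 19, 27]
print(f"r = {pearson_r(toy_scale, toy_reference):.2f}")
```

A positive coefficient with a measure of the same construct is the pattern described as a correlation "in the expected direction".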
This study has some limitations. First, it was a cross-sectional study. A longitudinal study is required to confirm the long-term stability of the scale. Second, given the wide range of states exhibited by the patients in this study, it would be beneficial to further test the hypothesis that the presence of depressive, manic, or hypomanic states affects the results. Third, as the mood disorder group in this study was recruited from a single medical institution, the findings may be influenced by potential selection bias, which could limit the generalizability of the results. Therefore, further research involving multiple clinical sites and more diverse patient populations is warranted.
This study provides evidence that the MSSI is a reliable and valid instrument for assessing mixed features in mood disorders. The MSSI will allow clinicians to identify mixed features of mood disorders and consider symptom duration and severity, resulting in more accurate and effective decisions from diagnosis to treatment. This is especially important since the scale can be administered by self-report or clinician rating, depending on the clinical setting. The MSSI is expected to be used effectively in patient diagnosis, treatment, and clinical trials.
Supplementary Materials
The Supplement is available with this article at https://doi.org/10.30773/pi.2025.0201.
Multilingual Versions of the MSSI for Cross Cultural Clinical Use
Scoring sheet
Descriptive statistics of items and reliability analysis of the mood disorder group (N=242)
Fit indices of different EFA models in the mood disorder group (N=121)
Differences among mood disorder groups in the Mixed State Severity Index (N=242)
Participant group structure and analytical division for Mixed State Severity Index validation. ANOVA, analysis of variance; ROC, receiver operating characteristic; EFA, exploratory factor analysis; CFA, confirmatory factor analysis.
Receiver operating characteristic (ROC) curve for the bipolar disorder I and normal control groups.
Receiver operating characteristic (ROC) curve for the bipolar disorder II and normal control groups.
Receiver operating characteristic (ROC) curve for the major depressive disorder and normal control groups.
Receiver operating characteristic (ROC) curve for the combined bipolar disorder (BD) I and BD II groups and the normal control group.
Notes
Availability of Data and Material
Data supporting the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Woojae Myung, Hyeona Yu, Hyo Shin Kang, Jungkyu Park. Data curation: Woojae Myung, Hyeona Yu, Daseul Lee, Junwoo Jang, Jakyung Lee, Joohyun Yoon, Yun Seong Park, Hyun A Ryoo, Chan Woo Lee, Yoonjeong Jang. Funding acquisition: Woojae Myung. Investigation: all authors. Supervision: Woojae Myung, Jungkyu Park, Hyo Shin Kang. Writing—original draft: Woojae Myung, Hyeona Yu, Hyo Shin Kang, Jungkyu Park. Writing—review & editing: all authors.
Funding Statement
This research was supported by the 2025 Digital Therapeutics Development and Demonstration Support Program of the Ministry of Science and ICT (MSIT) and the National IT Industry Promotion Agency (NIPA), Republic of Korea. The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the MSIT or the NIPA.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT, RS-2024-00335261 to WM). This study was supported by the NAVER Digital Bio Innovation Research Fund, funded by NAVER Corporation (Grant No. 37-2023-0140 to WM). This research was also supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: RS-2023-KH136934 to WM). This research was funded by the Ministry of Science and ICT (MSIT) and the National IT Industry Promotion Agency (NIPA) as part of the Digital Therapeutics Development and Demonstration Support Program (Grant No. 08-2025-0126). The study’s design, data collection, analysis, and interpretation, as well as the drafting of this report, were all independent of the funding sources. The decision to submit for publication was made by the corresponding authors, who also had complete access to all the study’s data.
Acknowledgments
None
