Validity and Reliability of the Korean Version of Gotland Male Depression Scale
Abstract
Objective
Despite lower depression rates in men than in women, men’s suicide rates are significantly higher, suggesting potential gaps in depression screening. Rutz et al. developed the Gotland Male Depression Scale (GMDS), which includes symptoms commonly associated with male depression. This study was conducted to validate the Korean version of the GMDS (K-GMDS).
Methods
The K-GMDS, Patient Health Questionnaire-9 (PHQ-9), and outpatient records of 233 new patients at the outpatient psychiatry department of Catholic University Hospital in Daegu between February and May 2022 were retrospectively reviewed. Internal consistency was measured using Cronbach’s α, and external validity was tested by analyzing the scale’s correlation with the PHQ-9. The screening capacity of the K-GMDS was tested based on the receiver operating characteristic (ROC) curve, sensitivity, specificity, and overall accuracy.
Results
Of 233 patients, 42.6% (n=98) were classified into the depression group. Cronbach’s α was 0.92, and external validity was established with a Pearson’s correlation coefficient of 0.83 between the total score of the K-GMDS and the PHQ-9. While there were no significant differences in the area under the ROC curve between the K-GMDS and the PHQ-9, the K-GMDS had better sensitivity, specificity, and overall accuracy in screening depressive symptoms among men compared to the PHQ-9.
Conclusion
The K-GMDS exhibits satisfactory reliability and validity in psychiatric outpatient settings and outperforms the PHQ-9 in screening for depression among men. This study will be useful in developing male depression scales that are currently unavailable in South Korea.
INTRODUCTION
The World Health Organization reported that the prevalence of depression worldwide increased by 25% in the first year of the coronavirus disease-2019 outbreak [1]. Since the outbreak, global statistics on depression and suicide rates have varied across countries. Despite these differences, the consistent trend remains that women are more likely to experience depression, whereas men have higher suicide rates [1].
While many diseases can potentially lead to suicide, substantial evidence indicates that depression accounts for most suicide-related deaths [2]. Thus, it is paradoxical that the incidence of depression is higher in women than in men, whereas suicide rates are higher in men. One explanation is that men have a high suicide rate due to diseases other than depression. However, Hawton et al. [3] found that male gender was associated with suicide even when the study was restricted to those with depression.
Studies on gender differences in depressive symptoms have been continuously conducted [4-6]. The recently revised Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR) mentions gender differences in depressive symptoms [7]. Women exhibit more appetite and sleep problems, such as gastrointestinal symptoms. By contrast, men are more likely to express depressive symptoms through maladaptive self-coping and problem-solving strategies, including substance use, risk-taking, and poor impulse control [7].
It has been argued that men need an exclusive scale for screening depressive symptoms, including symptoms of substance abuse and externalization [8]. Starting with the Gotland Male Depression Scale (GMDS) developed by Rutz in 1997 [8], many scales for screening depression in men have been employed, including the Masculine Depression Scale [9], Aging Male Symptoms Scale [10], and Male Depression Risk Scale [11]. The GMDS has been translated into languages other than English and used in male depression-related research and clinical practice worldwide [12-16].
The Korean version of the GMDS (K-GMDS) was translated by this research team for the first time in South Korea [17]. As of the date of this study, no other male depression scales have been translated or developed in Korea. However, the K-GMDS has a limitation in that it has not yet been externally validated and has never been used with psychiatric patients.
The primary aim of this study was to evaluate the internal and external validity of the K-GMDS for psychiatric patients. Furthermore, because we also administered a conventional depression scale, the Patient Health Questionnaire-9 (PHQ-9) [18], a secondary aim was to compare the usefulness of the K-GMDS and PHQ-9 in screening for depression.
METHODS
Participants
This study included 233 patients who visited the psychiatric outpatient department of Daegu Catholic University Hospital between February and May 2022. Prior to commencement, this retrospective study based on medical records was approved by the Institutional Review Board (IRB) of Daegu Catholic University Hospital (IRB approval number: CR-22-163).
Procedures
Upon their initial visit to the psychiatric outpatient department, patients were required to fill out the PHQ-9 and K-GMDS questionnaires, irrespective of their primary concerns. After completing the questionnaires, they were interviewed by a psychiatrist, and the DSM-5 criteria were applied for diagnosis. The inclusion criterion for all patients was age 18 years or older. The “depressed group” was defined as patients who met the diagnostic criteria for a major depressive episode in the DSM-5. The exclusion criteria were as follows: inability to communicate fluently and the presence of psychotic symptoms or cognitive dysfunction.
Measures
K-GMDS
The GMDS is a self-report scale devised to evaluate depressive symptoms in men. The total score ranges from 0 to 39, with each item rated on a 4-point Likert-type scale (0–3 points). Zierau et al. [19] defined a score of 13–26 points as probable depression and 27–39 points as definite depression. In this study, we defined subjects who scored higher than the cutoff score of 12, as suggested by Zierau et al. [19], as belonging to the male depression group. The scale was translated into Korean by three psychiatrists, and back-translation was performed by another psychiatrist, a native English speaker. Cronbach’s α was 0.86 in a previous study conducted by Zierau et al. [19] and 0.92 in the present study.
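The scoring rule described above can be illustrated with a minimal Python sketch. The function name `score_gmds` and the label "below cutoff" are hypothetical and were not part of the study; the item count of 13 follows from the 0–39 range with 0–3 points per item.

```python
from typing import Sequence, Tuple


def score_gmds(item_scores: Sequence[int]) -> Tuple[int, str]:
    """Sum 13 item ratings (0-3 each) and map the total to a score band.

    Bands follow the ranges attributed to Zierau et al. in the text:
    13-26 = probable depression, 27-39 = definite depression.
    The label for totals of 0-12 is an illustrative assumption.
    """
    if len(item_scores) != 13:
        raise ValueError("expected 13 item scores")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be rated 0, 1, 2, or 3")

    total = sum(item_scores)
    if total >= 27:
        band = "definite depression"
    elif total >= 13:
        band = "probable depression"
    else:
        band = "below cutoff"
    return total, band
```

Under this rule, a total of 12 falls below the cutoff, whereas a total of 13 is the lowest score classified as probable depression.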
PHQ-9
The PHQ-9 is a 9-item self-report scale used to screen for major depressive disorder (MDD). Each item is related to one of the nine criteria used to diagnose DSM-5 MDD [18].
The total score ranges from 0 to 27, and each item is rated on a 4-point Likert-type scale (0–3 points). Scores are assigned according to the frequency of symptoms over the preceding 2-week period. A score of 10 or above indicates clinically significant depressive symptoms [20]. The Korean version of this scale has been validated and is now used in clinical and research settings [21]. The internal consistency of the PHQ-9 in the present study for the overall sample was high (α=0.93).
Statistical analysis
All statistical analyses were performed using SPSS version 25 (IBM Corp., Armonk, NY, USA) for Windows, and the significance level was set at p<0.05. For the sociodemographic data, independent t-tests and chi-square tests were used to compare age and the proportion of MDD. Cronbach’s α was used to examine internal consistency, with values above 0.90 deemed reliable. Pearson’s correlation coefficient was used to examine external validity, with values above 0.70 deemed acceptable. Receiver operating characteristic (ROC) curve analysis was performed to assess the ability to screen for MDD, and the area under the ROC curve (AUC) for each scale was compared. R version 4.1.3 (https://www.r-project.org) was used to compare the AUCs of the two scales. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated to assess the screening ability of the scales. Exploratory factor analysis was conducted using SPSS to assess the factor structure of the K-GMDS. The Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity were used to assess the adequacy of the data for factor analysis. Factors were extracted based on eigenvalues greater than 1, and the factor structure was examined using varimax rotation.
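As an illustration of the internal consistency measure, Cronbach’s α can be computed from item-level responses as sketched below. This is a standard-library Python sketch using the usual formula α = k/(k−1) × (1 − Σ item variances / variance of total scores); it is not the SPSS procedure used in the study, and the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table.

    responses[i][j] is the score of respondent i on item j.
    Sample variances (n - 1 denominator) are used throughout.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Transpose rows (respondents) into columns (items).
    items: List[tuple] = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

When all items rise and fall together across respondents, α approaches 1; when items are unrelated, α approaches 0.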
RESULTS
Participants
In total, 233 patients visited the psychiatric outpatient department, completed the questionnaire, and met with the psychiatrists. After reviewing the medical records, 24 patients were excluded from the analysis. Eighteen patients were excluded because of reliability issues such as psychotic symptoms, cognitive dysfunction, and intellectual disability. Six patients were excluded due to incomplete answers (Figure 1). The characteristics of the sample are summarized in Table 1.

Figure 1. Flowchart of screening and classifying patients. K-GMDS, Korean version of the Gotland Male Depression Scale; PHQ-9, Patient Health Questionnaire-9.
Of the final valid sample of 209 patients, 92 were males (44.02%) and 117 were females (55.98%). Age differed significantly between males and females (p=0.01).
A total of 98 patients (33 males [33.67%] and 65 females [66.33%]) were classified into the depressed group and 111 patients (59 males [53.15%] and 52 females [46.85%]) were classified into the other diagnoses group. The distribution of participants’ diagnoses according to DSM-5 criteria is presented in Table 2.
Regarding PHQ-9 scores, the overall patient group scored an average of 13.41±8.26, with no significant difference between males and females. Similarly, within the depressed group and other diagnoses group, PHQ-9 scores did not significantly differ by gender.
The K-GMDS scores showed an overall average of 15.91±9.71. In the depressed group, males had a higher average score (20.82±8.81) than females (17.94±8.27), though this difference was not statistically significant (p=0.11). In the other diagnoses group, there were also no significant differences in scores between genders.
Reliability and validity
In terms of reliability, the Cronbach’s α of the K-GMDS was 0.92 and that of the PHQ-9 was 0.91, which means that both scales are reliable.
The validity of the K-GMDS was determined by its interscale correlation with the PHQ-9 calculated using Pearson’s correlation coefficients. The coefficient was 0.83, which is higher than 0.70, and thus clinically acceptable (Figure 2).
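For illustration, Pearson’s correlation coefficient between paired total scores on two scales can be computed as in the following standard-library Python sketch. The function name `pearson_r` and its arguments are hypothetical; the study itself used SPSS.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation between paired observations."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two pairs")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")

    return sxy / sqrt(sxx * syy)
```

Here `x` would hold each patient’s K-GMDS total and `y` the same patient’s PHQ-9 total, in matching order.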
Factor analysis of the K-GMDS
The KMO measure of sampling adequacy was 0.93, and the p-value of Bartlett’s test of sphericity was less than 0.001; therefore, the survey data in this study met the conditions for factor analysis [22].
Two factors were extracted: Factor 1 consisted of items 1–8 and Factor 2 consisted of items 9–13. All items had factor loading values higher than 0.40, and thus were positively related to Factor 1 or Factor 2. The results are shown in Table 3.
ROC curve of K-GMDS and PHQ-9
To assess the sensitivity and specificity of the K-GMDS for the diagnosis of depression, ROC curves were obtained for all patients and for each gender (Figure 3).

Figure 3. ROC curve for the K-GMDS and the PHQ-9 for all patients and each gender group. ROC, receiver operating characteristic; K-GMDS, Korean version of the Gotland Male Depression Scale; PHQ-9, Patient Health Questionnaire-9.
The p-values for each AUC indicated that both K-GMDS and PHQ-9 had statistically significant discrimination ability in all groups (total: p<0.001 for both scales; males: p<0.001 for K-GMDS, p=0.002 for PHQ-9; females: p=0.008 for K-GMDS, p=0.001 for PHQ-9), suggesting that both scales are valid tools for screening depression.
As all p-values for the difference between AUCs were larger than 0.05, the difference between the AUCs of the K-GMDS and PHQ-9 was not statistically significant in any group. p-values and AUCs are displayed in Table 4.
However, the AUC of the K-GMDS in males was larger than 0.7, indicating acceptable discrimination, whereas the AUCs in the other groups indicated poorer discrimination [23].
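To make the AUC concrete, it can be read as the probability that a randomly chosen depressed patient scores higher on the scale than a randomly chosen patient from the other diagnoses group, with ties counted as one half. The following Python sketch computes the AUC in this way; it is an illustrative equivalent of the area under the empirical ROC curve, not the R or SPSS routine used in the study, and the function name `auc` is hypothetical.

```python
from typing import Sequence


def auc(scores_depressed: Sequence[float],
        scores_other: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (depressed, other) pairs in which the depressed
    patient has the higher score; ties contribute 0.5.
    """
    if not scores_depressed or not scores_other:
        raise ValueError("both groups must be non-empty")

    wins = 0.0
    for d in scores_depressed:
        for o in scores_other:
            if d > o:
                wins += 1.0
            elif d == o:
                wins += 0.5
    return wins / (len(scores_depressed) * len(scores_other))
```

An AUC of 0.5 corresponds to chance-level discrimination, and 1.0 to perfect separation of the two groups.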
Screening ability of K-GMDS and PHQ-9
In the total sample, the PHQ-9 showed slightly higher sensitivity and NPV, lower specificity, and similar PPV compared to the K-GMDS, resulting in marginally higher overall accuracy.
However, this tendency changed when the group was divided according to gender; the K-GMDS showed strength in identifying depressed male patients with higher sensitivity and accuracy compared to the PHQ-9, making it a potentially more effective tool for screening depression in men. The PHQ-9, however, was more effective for female patients, reflected by its higher sensitivity and accuracy.
The initial cut-off score for the K-GMDS was set at 12, as suggested by Zierau et al. [19]. However, recalculating the optimal cut-off score using the Youden index for our clinical population revealed a new cut-off score of 11. Similarly, while the general cut-off score for the PHQ-9 is 10, our analysis confirmed that this remains the optimal score for this clinical sample.
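The Youden index is J = sensitivity + specificity − 1, and the optimal cut-off is the value that maximizes J. The following Python sketch illustrates this selection under two assumptions that are not taken from the study: a score is counted as screen-positive when it is greater than or equal to the candidate threshold, and ties in J are resolved in favor of the lowest threshold. The function name `youden_best_threshold` is hypothetical.

```python
from typing import Sequence, Tuple


def youden_best_threshold(scores: Sequence[float],
                          labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    labels: 1 = depressed (reference diagnosis), 0 = not depressed.
    Assumption: a patient screens positive when score >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one patient in each group")

    best_threshold, best_j = None, float("-inf")
    # Every observed score is a candidate threshold, lowest first.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j
```

Because the cut-off convention affects the reported number, a threshold returned by this sketch should be compared with published cut-offs only after confirming whether those are stated as "greater than" or "greater than or equal to".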
For consistency with previous studies and to allow for direct comparisons, the sensitivity, specificity, PPV, NPV, and overall accuracy presented in this study were calculated using the original cut-off scores (12 for K-GMDS and 10 for PHQ-9) as recommended by previous literature [19,20].
The sensitivity, specificity, PPV, NPV, and overall accuracy of both scales are shown in Table 5.
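For reference, the five screening metrics can be derived from the 2×2 table of screening result against reference diagnosis, as in the Python sketch below. It assumes, for illustration only, that a score greater than or equal to the threshold counts as screen-positive; the function name `screening_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN if the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def screening_metrics(scores: Sequence[float], labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, and accuracy at one threshold.

    labels: 1 = depressed (reference diagnosis), 0 = not depressed.
    Assumption: a patient screens positive when score >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        positive = s >= threshold
        if positive and y == 1:
            tp += 1
        elif positive and y == 0:
            fp += 1
        elif not positive and y == 0:
            tn += 1
        else:
            fn += 1

    return {
        "sensitivity": _ratio(tp, tp + fn),  # depressed correctly flagged
        "specificity": _ratio(tn, tn + fp),  # non-depressed correctly cleared
        "ppv": _ratio(tp, tp + fp),          # flagged who are depressed
        "npv": _ratio(tn, tn + fn),          # cleared who are not depressed
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of depressed patients in the sample, so values obtained in a psychiatric outpatient setting may not carry over to settings with a different prevalence.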
DISCUSSION
This study is the first to validate the Korean version of a male depression scale. In Korea, although several qualitative studies related to male depressive symptoms have been conducted, no evidence has been acquired using methods such as scale development [24].
Of 233 patients, 98 were clinically classified as having major depressive episodes. In terms of reliability, both scales were reliable according to Cronbach’s α. Regarding external validity, the interscale correlation between the K-GMDS and PHQ-9 was 0.83, which is clinically acceptable. The factor structure revealed a two-factor model divided into items 1–8 and 9–13.
The severity of depressive symptoms, as measured by PHQ-9 and K-GMDS scores, did not vary significantly by gender. In addition, according to the results of the ROC analysis, the AUCs of the two scales did not differ significantly, indicating that they had a similar ability to screen for depressive symptoms. This held for both the whole sample and each gender.
This study showed lower AUCs for both scales than previous studies [14,21,25-28]. The AUC was above 0.7 for male patients using the K-GMDS, indicating acceptable discrimination. However, for female patients using the K-GMDS and for both male and female patients using the PHQ-9, the AUC values ranged from 0.6 to 0.7, suggesting poorer discrimination [23]. The PHQ-9 had AUCs larger than 0.7 in most previous studies [21,25-27]. ROC results in previous GMDS studies have varied, and the AUC in this study is among the lower values observed [14,28]. Previous studies [21,28] focused on distinguishing depression in the general population or between general and clinical groups. In contrast, this study involved only psychiatric outpatients, who may have higher baseline levels of depressive symptoms, even without depressive disorders. This could reduce the discrimination ability of the scales, as depressive symptoms are more common across various diagnoses in this clinical setting.
In the factor analysis of the K-GMDS, previous studies have yielded conflicting results [12-14,28]. Our study identified a two-factor model, splitting items into 1–9 and 10–13, mirroring the model used in a prior community sample study of the K-GMDS [29]. This similarity might raise concerns about the face validity of the Korean translation of the GMDS, especially since items 10–13 begin with nearly identical phrases, “Do you and your neighbors think...?,” diverging from the intent of the original items. Such phrasing potentially guides respondents to interpret items 1–9 as self-awareness symptoms and items 10–13 as reflective of their neighbors’ perceptions, which was not the original version’s aim. Consequently, the face validity of some items in the K-GMDS translation may not accurately represent the intended constructs, indicating a need for revision to ensure alignment with the original measure’s intent.
Previous studies validating the GMDS generally used the Beck Depression Inventory (BDI) or BDI-II as a conventional scale, but this was the first study to use the PHQ-9 [12,15,19,27]. Both the BDI-II and PHQ-9 possess adequate reliability, validity, and screening capacity, but the PHQ-9 is shorter, easier, and based on the diagnostic criteria for depression in the DSM-5 [30]. For these reasons, we chose the PHQ-9 as the reference scale.
Strömberg et al. [15] assessed the usefulness of the GMDS by administering the BDI and GMDS to men visiting a family doctor’s drop-in clinic. They reported that the GMDS did not identify as many cases as the BDI. Sigurdsson et al. [28] conducted a community study to validate the GMDS and reported that it was as effective as the BDI for screening male depression based on the results of the ROC curve analysis. However, it could not be said that it was superior to the BDI in its ability to screen for “male depression” because all participants were men. This study reveals that the overall screening efficacy of the K-GMDS and PHQ-9 is similar, yet, when screening for male depression, the K-GMDS demonstrates marginally better performance in certain metrics, including AUC, sensitivity, and overall accuracy.
This study is significant in that it applied the GMDS and a conventional depression scale to both men and women and compared the performance of these scales across the total sample and by gender. While previous studies have administered the GMDS to both men and women, they focused on comparing total scores or item-level scores between genders, or conducted confirmatory factor analyses for men and women [13,14]. However, no study has compared the performance of the scale across genders.
Despite the strengths of this study, it has some limitations. In addition to patient age, demographic variables such as occupation, marital status, and educational background, which were not included in this study, could have affected the results as confounding variables. Further, as the study design was retrospective, diagnoses of MDD were based on the clinical judgement of a board-certified psychiatrist rather than on a structured diagnostic interview. This may result in lower diagnostic reliability compared to studies that utilized structured assessment tools.
In conclusion, this study is the first to validate and assess the screening ability of the K-GMDS compared to a conventional depression scale. The K-GMDS has been proven to be valid and useful in screening for MDD. Therefore, the K-GMDS emerges as a critical tool for depression screening among Korean men, laying groundwork for future studies in male depression in South Korea.
Notes
Availability of Data and Material
The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.
Conflicts of Interest
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Jung Yeon Moon, Seoyoung Yoon. Data curation: Seong Yoon Kim, Seungheon Yang. Formal analysis: Jung Yeon Moon, Seoyoung Yoon. Investigation: all authors. Methodology: Jung Yeon Moon, Seoyoung Yoon. Resources: all authors. Supervision: Seoyoung Yoon. Validation: Jung Yeon Moon, Seoyoung Yoon. Visualization: Jung Yeon Moon, Seoyoung Yoon. Writing—original draft: Jung Yeon Moon. Writing—review & editing: Seoyoung Yoon.
Funding Statement
None
Acknowledgements
None