The 12-Item General Health Questionnaire as an Effective Mental Health Screening Tool for General Korean Adult Population
Abstract
Objective
The 12-item General Health Questionnaire (GHQ-12) has been used extensively in various settings across different cultures. This study was conducted to determine the thresholds associated with optimum sensitivity and specificity for the GHQ-12 in Korean adults.
Methods
Data were acquired from a sample of 6,510 Korean adults (2,581 men and 3,929 women) aged 18 to 64 years, who were selected on the basis of the 2005 Census. Participants completed the GHQ-12 and the Korean Composite International Diagnostic Interview (K-CIDI). Receiver Operating Characteristic (ROC) curve analysis was conducted.
Results
The mean GHQ-12 score for the total sample was 1.63 (SD 1.98). The internal consistency of the GHQ-12 was satisfactory (Cronbach's α=0.72). The ROC curve analysis indicated that the GHQ-12 yielded greater accuracy when identifying mood and anxiety disorders than when identifying all mental disorders as a whole. The optimal threshold of the GHQ-12 was either 1/2 or 2/3 points depending on the disorder, but was mainly 2/3.
Conclusion
The Korean version of the GHQ-12 could be used to screen for individuals at high risk of mental disorders, namely mood and anxiety disorders.
INTRODUCTION
The complexity inherent in identifying mental disorders has been recognized as an important healthcare issue. Mental disorders result in substantial patient suffering as well as growing healthcare costs; they are present in at least 20-36% of outpatients in primary care settings.1-3 Thus, it is important to develop and improve screening tools for mental disorders.
The General Health Questionnaire (GHQ) is a self-administered screening tool designed to detect current mental disturbances and disorders. Since its development by Goldberg and Hillier,4 the GHQ has been translated into 38 languages, a testament to the validity and reliability of the questionnaire.5
The GHQ was originally developed in a 60-item format, but several abridged versions (e.g., the GHQ-30, GHQ-28, GHQ-20, and GHQ-12) are also currently available. The GHQ-12, which has 12 items, is especially attractive for use in busy clinical settings. It was adopted as a screening tool in an international World Health Organization (WHO) study of psychological disorders in primary health care, as it has been deemed the best validated among similar screening tools.6-8
In Korea, reliability and validity studies on several versions of the GHQ have been conducted, but as of this writing (2012) only one has examined the validity of the GHQ-12.9-12 In order to utilize the GHQ-12 as a screening tool, a cut-off score indicating the presence of mental illness should be determined for this instrument. Several studies have offered optimal thresholds for the GHQ-12. Goldberg et al.8 showed that the best threshold varied from 1/2 to 6/7, with the most common cut-off score being 2/3. A review of 17 other GHQ-12 validity studies revealed an equally wide range of ideal threshold scores, varying from 0/1 to 5/6.13
In later studies that replicated this initial research, cut-off scores ranged from 1/2 to 3/4.1,14-16 Differences in cut-off points may be due to diversity in the prevalence rates of psychiatric disorders and comorbid diagnoses.17 Cultural factors could also be related to this disparity. For example, Lewis & Araya18 raised the possibility of a cultural bias, with Chilean participants tending to score higher than their British counterparts. However, this difference was found in only the negative aspects of the GHQ-12.19
It is essential to firmly establish a cut-off point, since the GHQ-12 can be used to effectively identify persons with mental illness only once an appropriate cut-off point is chosen. Further, most researchers drew their samples from patients in primary care settings; a more diverse sample is needed to establish whether the GHQ-12 is a useful screening tool throughout the community.
To address the aforementioned shortcomings, the present study aims to determine the thresholds associated with optimum sensitivity and specificity of the GHQ-12 in Korean adults. Its findings are anticipated to facilitate the identification of Korean adults with mental illness in both community and primary care settings.
METHODS
Participants and procedure
We used data from the Korean Epidemiologic Catchment Area (KECA-R) study, which was conducted from July 2006 to April 2007. Participants were selected using a stratified, multistage, cluster sampling design based on the 2005 population census conducted by community registry offices. One individual per selected household was chosen: the individual born on the earliest day of the month, without consideration of the birth month or year (e.g., someone born on November 2, 1990 would be selected before someone born on January 15, 1970). Of the 7,968 individuals who were initially selected, 6,510 participants between the ages of 18 and 64 were interviewed face-to-face (response rate=81.7%).
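The within-household selection rule and the reported response rate can be illustrated with a minimal sketch. The `HouseholdMember` class and `select_respondent` function are hypothetical names introduced here; the tie-breaking behaviour is an assumption, since the text does not say how ties on the day of the month were resolved, and the age eligibility check (18 to 64) is not modelled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class HouseholdMember:
    name: str          # hypothetical identifier, for illustration only
    birth_date: date


def select_respondent(members):
    """Pick the member born on the earliest day of the month.

    Birth month and year are ignored, as described in the text.
    Ties are broken by list order (an assumption, not from the source).
    """
    return min(members, key=lambda m: m.birth_date.day)


household = [
    HouseholdMember("A", date(1970, 1, 15)),
    HouseholdMember("B", date(1990, 11, 2)),
]
chosen = select_respondent(household)
print(chosen.name)  # "B": day 2 precedes day 15

# Response rate from the figures in the text: 6,510 interviewed of 7,968 selected.
response_rate = 6510 / 7968
print(f"{response_rate:.1%}")
```

The second print reproduces the 81.7% response rate stated above.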
The KECA-R study was conducted jointly by the Seoul National University College of Medicine (SNUCM) and the Ministry of Health and Welfare, and was approved by the institutional review board of the SNUCM.
Data collection
Assessment of psychiatric disorders
The diagnostic validity of the Korean GHQ-12 was assessed in terms of its accuracy in confirming diagnoses made by the Korean Composite International Diagnostic Interview (K-CIDI) version 2.1. The CIDI20 is a fully structured diagnostic interview designed to make psychiatric diagnoses using the definitions and criteria given by the DSM-IV.21 Its Korean version, the K-CIDI, was developed by Cho et al.22 in accordance with WHO23 guidelines. The present study utilized the K-CIDI to assign DSM-IV diagnoses, which were then treated as the gold standard.
General health questionnaire
We translated all items of the GHQ-12 into Korean and then asked a psychologist fluent in both languages to translate them back into English. Based on the success of this back translation, we expected minimal difference between the original and revised versions. Past research suggests that Koreans are less likely to express positive emotions, and tend to give negative answers to questions about positive emotions or characteristics.24,25 Therefore, Item 8 ("been able to face up to your problems?") was translated in such a way as to express the question negatively. Each item of the GHQ-12 was rated on a 4-point scale, with the possible responses being "less than usual," "no more than usual," "rather more than usual," or "much more than usual." We mainly used a bimodal scoring method, whereby "less than usual" and "no more than usual" were both worth 0 points, and "rather more than usual" and "much more than usual" were each worth 1 point. In the Likert-type scoring method, 0="less than usual," 1="no more than usual," 2="rather more than usual," and 3="much more than usual." Positive questions (Items 1, 3, 4, 7, and 12) were scored inversely.
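The two scoring methods described above can be sketched as follows. This is an illustration rather than the study's scoring code: the function names are hypothetical, responses are assumed to be coded 0 to 3 in the order the options are listed, and "scored inversely" is assumed to mean that a positive item's response is reversed (3 minus the response) before either scoring method is applied.

```python
# Response options coded in the order given in the text:
# 0 = "less than usual", 1 = "no more than usual",
# 2 = "rather more than usual", 3 = "much more than usual".
POSITIVE_ITEMS = {1, 3, 4, 7, 12}  # scored inversely, per the text


def likert_item(item_no, response):
    """Likert-type (0-1-2-3) score for a single item."""
    if response not in (0, 1, 2, 3):
        raise ValueError("response must be 0, 1, 2, or 3")
    # Assumption: inverse scoring means 3 - response.
    return 3 - response if item_no in POSITIVE_ITEMS else response


def bimodal_item(item_no, response):
    """Bimodal (0-0-1-1) score: the two lower categories score 0, the two upper score 1."""
    return 1 if likert_item(item_no, response) >= 2 else 0


def score_ghq12(responses):
    """Return (bimodal_total, likert_total).

    responses: dict mapping item number (1..12) to a coded response (0..3).
    """
    if set(responses) != set(range(1, 13)):
        raise ValueError("expected responses for items 1 through 12")
    bimodal = sum(bimodal_item(i, r) for i, r in responses.items())
    likert = sum(likert_item(i, r) for i, r in responses.items())
    return bimodal, likert


# Lowest and highest possible totals under these assumptions.
lowest = {i: (3 if i in POSITIVE_ITEMS else 0) for i in range(1, 13)}
highest = {i: (0 if i in POSITIVE_ITEMS else 3) for i in range(1, 13)}
print(score_ghq12(lowest))   # (0, 0)
print(score_ghq12(highest))  # (12, 36)
```

Under the bimodal method the total therefore ranges from 0 to 12, which is the scale on which the cut-off points in this study are expressed.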
Data analysis
We used SPSS version 18.0 to calculate descriptive statistics. A receiver operating characteristic (ROC) curve analysis was conducted on the data, as this is a useful method for visualizing performance ability and grouping classification.26 This technique plots a test's true positive rate (sensitivity) against its false positive rate (1-specificity).27 The area under the curve (AUC) ranges from 0.5, for models with no discrimination ability, to 1, for models with perfect discrimination ability.28 Superior decision or detection performance is indicated by an ROC curve that lies toward the upper left corner of the ROC space.29 Because it summarizes sensitivity and specificity across all thresholds, the AUC indicates the overall level of classification accuracy. An approximate guide for classifying the accuracy of the AUC is the traditional point system:28,30 0.90-1.00=excellent; 0.80-0.90=good; 0.70-0.80=fair; 0.60-0.70=poor; 0.50-0.60=fail. In addition, the positive predictive value (PPV) describes what fraction of all positive results are correct, while the negative predictive value (NPV) describes what fraction of all negative results are correct. The predictive values are highly dependent on disease prevalence in the study sample.31
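As an illustration of the AUC and the point system described above, the following sketch computes the AUC directly from its probabilistic interpretation (the chance that a randomly chosen case scores higher than a randomly chosen non-case) and labels the result. It is not the SPSS procedure used in the study; the function names are hypothetical, and treating each band as including its lower bound is an assumption, since the quoted bands share their endpoints.

```python
def roc_auc(scores, labels):
    """AUC as P(case score > non-case score), with ties counted as one half.

    scores: screening totals; labels: 1 for a case, 0 for a non-case.
    Pairwise comparison is quadratic in sample size; fine for a sketch.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def classify_auc(auc):
    """Label an AUC with the traditional point system quoted in the text.

    Assumption: each band includes its lower bound (0.90 is "excellent").
    Values below 0.60 are labelled "fail".
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    return "fail"


# Toy data, not study data: two cases (scores 3 and 6) among seven respondents.
toy_scores = [0, 1, 1, 2, 3, 5, 6]
toy_labels = [0, 0, 0, 0, 1, 0, 1]
toy_auc = roc_auc(toy_scores, toy_labels)
print(toy_auc, classify_auc(toy_auc))  # 0.9 excellent
```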
Total GHQ-12 scores were utilized as the test variable. The data from 60 participants were excluded from data analysis because they were incomplete. However, this did not affect the final results. The criterion variable comprised a composite of psychiatric diagnoses according to the K-CIDI. Two-by-two contingency tables were also created by cross tabulating diagnostic outcomes (the presence or absence of any mental disorder according to the K-CIDI) and the GHQ-12 screening outcomes (positive or negative screening on the GHQ-12).
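The cross-tabulation described above can be sketched as follows, using toy data rather than study data. The function names are hypothetical. A cut-off written "2/3" is read in the conventional way for the GHQ: totals of 2 or less screen negative and totals of 3 or more screen positive.

```python
def contingency_table(scores, is_case, cutoff):
    """Cross-tabulate GHQ-12 screening outcome against diagnostic outcome.

    cutoff is the lowest total that screens positive, so the cut-off
    written "2/3" is passed as cutoff=3 and "1/2" as cutoff=2.
    """
    tp = fp = fn = tn = 0
    for score, case in zip(scores, is_case):
        screened_positive = score >= cutoff
        if screened_positive and case:
            tp += 1          # case correctly flagged
        elif screened_positive:
            fp += 1          # non-case flagged
        elif case:
            fn += 1          # case missed
        else:
            tn += 1          # non-case correctly passed
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def sensitivity(table):
    """Proportion of true cases that screen positive."""
    return table["tp"] / (table["tp"] + table["fn"])


def specificity(table):
    """Proportion of non-cases that screen negative."""
    return table["tn"] / (table["tn"] + table["fp"])


# Toy data: eight respondents, four of whom are cases.
toy_scores = [0, 1, 2, 3, 4, 5, 1, 6]
toy_cases = [False, False, False, False, True, True, True, True]
table = contingency_table(toy_scores, toy_cases, cutoff=3)
print(table)                                   # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(sensitivity(table), specificity(table))  # 0.75 0.75
```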
RESULTS
Sample characteristics
Table 1 presents participant demographics (age, marital status, education level), as well as participants' GHQ-12 scores by gender. There were a total of 6,510 participants (2,581 male; 3,929 female). The overall mean score was 1.63 points (SD=1.98). The mean score for males was 1.51 (SD=1.84) and for females was 1.71 (SD=2.05) (t=-3.971, p<0.001).
Reliability
The internal consistency coefficients (Cronbach's alpha) of the GHQ-12 for bimodal scoring (0-0-1-1) and Likert-type scoring (0-1-2-3) were 0.72 and 0.79, respectively, indicating satisfactory internal consistency.
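For readers unfamiliar with the coefficient, Cronbach's alpha for k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below applies that standard formula to toy data; it is not the SPSS routine used in the study, and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent must answer the same number of items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of the total score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: three items that move together perfectly give alpha = 1.
consistent = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Toy data: two unrelated items give alpha = 0.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```

The same function applies to either scoring method: the input would be the 12 bimodal item scores for one coefficient and the 12 Likert-type item scores for the other.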
Validity coefficients and area under ROC curve
As this study aimed to examine the adequacy of the GHQ-12 as a diagnostic tool, lifetime diagnoses were not taken into consideration, and only current mental status was evaluated. Table 2 shows one-month prevalence of psychiatric diagnoses and GHQ-12 mean scores for Korean adults who completed the K-CIDI. The GHQ-12 mean score was 5.33 (SD=3.38) for those with mood disorders (2.3%), 5.39 (SD=3.43) for those with major depressive disorder, and 6.26 (SD=3.25) for those with dysthymic disorder. The GHQ-12 mean score was 3.25 (SD=3.16) for those with anxiety disorders (4.8%), 4.75 (SD=3.63) for those with social phobia, and 6.27 (SD=3.53) for those with generalized anxiety disorder. The GHQ-12 mean encompassing all mental disorders (according to DSM-IV diagnostic criteria, excluding nicotine- and alcohol-related disorders) was 3.50 (SD=3.15).
Table 3 shows mean scores for participants who met DSM-IV diagnostic criteria for mental disorders excluding nicotine- and alcohol-related disorders (cases), as compared to non-cases, who did not meet diagnostic criteria (based on one-month prevalence). The GHQ-12 mean for cases (7.8%) was 3.50 (SD=3.15), and the mean for non-cases (92.2%) was 1.47 (SD=1.75). The cases group had significantly higher mean scores than the non-cases group (t=14.265, df=530.735, p<0.001). The difference in mean scores between males and females within the cases group was not significant (t=1.619, df=503, p=0.106).
The threshold values, sensitivity, specificity, PPV, NPV, and AUC of the GHQ-12 based on one-month prevalence are summarized in Table 4. The ROC analysis showed that the optimal cut-off point for the identification of diagnoses excluding nicotine- and alcohol-related disorders was 1/2, at which sensitivity was 65% and specificity was 64%. However, the traditional point system28-30 specifies that the AUC must be at least 0.70 to indicate fair accuracy. Among specific psychiatric disorders, higher sensitivity and specificity values were found for mood and anxiety disorders, such as generalized anxiety disorder (GAD), for which a cut-off of 2/3 was most often the more appropriate value. The AUC for dysthymic disorder was 0.90, the highest of all mental disorders within the present sample, with a sensitivity of 95% and specificity of 76%. In addition, the AUC for major depressive disorder was 0.82; the sensitivity and specificity were 82% and 63%, respectively, for a cut-off of 1/2, and 74% and 77%, respectively, for a cut-off of 2/3. The AUC for mood disorders was 0.83; the sensitivity and specificity were 84% and 63%, respectively, for a cut-off of 1/2, and 74% and 77%, respectively, for a cut-off of 2/3. In the discrimination of anxiety disorders, the AUC for GAD was 0.88; the sensitivity and specificity were 93% and 62%, respectively, for a cut-off of 1/2, and 84% and 77%, respectively, for a cut-off of 2/3. In addition, the AUC for social phobia was 0.78; sensitivity was 85% and specificity was 62% for a cut-off of 1/2. The ROC curve for mood and anxiety disorders is presented in Figure 1.
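Because predictive values depend on prevalence, it can help to see how they follow from sensitivity, specificity, and prevalence via Bayes' rule. The sketch below plugs in the rounded figures given in the text for all disorders at the 1/2 cut-off (sensitivity 65%, specificity 64%, one-month prevalence 7.8%). The outputs are derived illustrations only and should not be read as the PPV and NPV entries of Table 4; the function name is hypothetical.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    tp = sens * prevalence                  # cases that screen positive
    fp = (1 - spec) * (1 - prevalence)      # non-cases that screen positive
    fn = (1 - sens) * prevalence            # cases that screen negative
    tn = spec * (1 - prevalence)            # non-cases that screen negative
    return tp / (tp + fp), tn / (tn + fn)


# Rounded values quoted in the text for all disorders at the 1/2 cut-off.
ppv, npv = predictive_values(0.65, 0.64, 0.078)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")

# Same test characteristics at a hypothetical 30% prevalence, to show the dependence.
ppv_high, npv_high = predictive_values(0.65, 0.64, 0.30)
print(f"PPV={ppv_high:.3f}, NPV={npv_high:.3f}")
```

With a low prevalence, most positive screens are false positives even when sensitivity and specificity are moderate, which is why a positive GHQ-12 result is a prompt for further assessment rather than a diagnosis.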
DISCUSSION
An optimal standard score for any tool used in screening for mental disorders is necessary to best discriminate between healthy and high-risk groups. To date, many studies have verified the diagnostic validity and optimal thresholds of the GHQ-12, but most are limited to primary care settings.1,13-17,32,33-38 Moreover, the GHQ-12 is rarely used in Korea. Against this background, the current study aimed to provide support for the diagnostic validity of the GHQ-12 for use with Koreans, in a larger community setting. To this end, this study administered the Korean version of the GHQ-12, in addition to a mental illness epidemiological survey, to community-dwelling adults, and used an ROC analysis to verify the effectiveness of the GHQ-12 in screening for psychiatric disorders and to confirm its cut-off point.
The predictive power of the GHQ-12 for all diagnoses excluding nicotine- and alcohol-related disorders (AUC=0.696) fell just short of the 0.70 needed for fair accuracy. When we examined the exploratory predictive power of the GHQ-12 for specific diagnoses, we found that AUC values over 0.70 were mostly associated with mood or anxiety disorders. Notably, the AUC for dysthymic disorder was 0.90, the highest among all diagnoses. Good predictive power was also found for other depressive disorders, with AUC values over 0.80. The AUCs for anxiety disorders were over 0.70, showing at least fair predictive power. These results suggest that the Korean GHQ-12 can be useful in screening for mood or anxiety disorders in both community and primary care settings, although it is more effective for some disorders than for others.
In the next phase of analysis, the threshold point for each disorder was calculated. For mood disorders, the optimal threshold was 2/3 points (sensitivity 74-95%, specificity 76-77%); in cases requiring high sensitivity, the lower threshold of 1/2 can be used. In addition, the optimal threshold for anxiety disorders was 1/2 or 2/3 points, depending on the type of disorder. Naturally, the optimal threshold would vary depending on the reason for using the GHQ-12. If the goal is to comprehensively screen for any diagnosis, even at the risk of a high false positive rate, a 1/2 cut-off point may be established as the screening criterion. However, for better discrimination of mood disorders (e.g., dysthymic disorder and major depressive disorder) and anxiety disorders (e.g., GAD, social phobia, and agoraphobia), it may be more appropriate to adopt the more stringent threshold of 2/3. In studies using samples from various countries, the optimal threshold ranged from 1/2 to 6/7; results were concentrated around 1/2-3/4, with 2/3 being the most frequent cut-off point.13
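The trade-off between the two candidate cut-offs can be made concrete with Youden's J (sensitivity + specificity - 1), a common summary for choosing a threshold on an ROC curve. The text does not state which criterion defined "optimal" in this study, so using J here is an assumption made purely for illustration. The sensitivity and specificity pairs are the rounded values reported in the Results; the variable and function names are hypothetical.

```python
# (sensitivity, specificity) by disorder and cut-off, as reported in the Results.
REPORTED = {
    "major depressive disorder": {"1/2": (0.82, 0.63), "2/3": (0.74, 0.77)},
    "mood disorders": {"1/2": (0.84, 0.63), "2/3": (0.74, 0.77)},
    "generalized anxiety disorder": {"1/2": (0.93, 0.62), "2/3": (0.84, 0.77)},
}


def youden_j(sens, spec):
    """Youden's J statistic; 0 means no better than chance, 1 means perfect."""
    return sens + spec - 1


def best_cutoff(by_cutoff):
    """Return the cut-off label with the largest Youden's J."""
    return max(by_cutoff, key=lambda label: youden_j(*by_cutoff[label]))


for disorder, by_cutoff in REPORTED.items():
    js = {label: round(youden_j(*pair), 2) for label, pair in by_cutoff.items()}
    print(disorder, js, "->", best_cutoff(by_cutoff))
```

Under this criterion the 2/3 cut-off comes out ahead for all three diagnoses, while the 1/2 cut-off retains the higher sensitivity in each case, which matches the purpose-dependent recommendation in the paragraph above.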
This study offers support for an estimated optimal threshold that can be used as a criterion when screening for those at a high risk for mental disorders, including mood and anxiety disorders. These results are in accordance with Park et al.'s study, which reported that the Korean version of the GHQ-12 yields a two-factor structure of depression/anxiety and social dysfunction.12 However, caution should be exercised when using the GHQ-12 to screen for other psychiatric disorders for which its clinical discrimination has not been verified.
This study confirms the efficacy of the Korean version of the GHQ-12. In addition, because it used a large sample of Korean adults, the applicability of the GHQ-12 may be extended to the general population. The Korean GHQ-12 can be completed within two minutes and can be easily understood and scored, making it particularly useful in busy clinical settings. Such screening aims to quickly identify at-risk individuals and direct them to treatment.39 A system of routine screening can facilitate the early identification, intervention, and prevention of depressive and anxiety disorders.40
This study is limited in that participants' ages ranged from 18 to 64, meaning that its results cannot be generalized to the growing population of older adults in Korea. Therefore, a follow-up study would be required to test the diagnostic validity of the Korean GHQ-12 in an elderly population.
Acknowledgments
The authors thank the interviewers and the Korean Ministry of Health and Welfare for their cooperation and support.