# Evaluation of Two Circadian Rhythm Questionnaires for Screening for the Delayed Sleep Phase Disorder

## Abstract

### Objective

Delayed sleep phase disorder (DSPD) is a condition in which patients often fall asleep some hours after midnight and have difficulty waking up in the morning. Circadian chronotype questionnaires such as the Horne-Östberg Morningness-Eveningness Questionnaire (MEQ) and the Basic Language Morningness (BALM) scale have been used to screen for DSPD. This study aimed to evaluate these two chronotype questionnaires as screening tools for DSPD.

### Methods

The study sample comprised 444 DSPD cases and 438 controls. Cronbach's alpha coefficient was calculated to evaluate internal consistency. An exploratory factor analysis was conducted using principal-axis factoring. The diagnostic performance of each questionnaire was evaluated using receiver operating characteristic (ROC) curve analysis. A discriminant function analysis was also performed.

### Results

For internal consistency, Cronbach's alpha of 0.898 for the BALM was higher than the 0.837 for the MEQ, though both had acceptable internal consistency. The BALM had better construct validity than the MEQ because some MEQ items measure different dimensions. However, when we evaluated the efficiency of the two questionnaires for DSPD diagnosis using the ROC curve, the BALM was similar to the MEQ. In a discriminant analysis with the BALM to classify the two groups (DSPD vs. normal), 6 items were identified that resulted in good classification accuracy. Upon examination of the classification procedure, 94.2% of the originally grouped cases were classified correctly.

### Conclusion

These findings suggest that the BALM has better psychometric properties than the MEQ in screening for and discriminating DSPD.

## INTRODUCTION

Chronotype is an individual difference reflecting the time of day at which individuals are "at their best".1,2 Chronotype is often assessed by means of self-rating questionnaires. The first and most widely used chronotype questionnaire is the Morningness-Eveningness Questionnaire (MEQ).3 Among the criticisms of the MEQ are that the total score may not be appropriate for measuring a multidimensional construct,4 and that small subsets of items may convey most of the total variance of the measure.5,6 Similar criticisms have been raised for the Circadian Type Questionnaire (CTQ) developed by Greenwood.7 Smith et al.8 developed a new questionnaire called the Composite Scale of Morningness (CSM) that is composed of the best items of the MEQ (nine items) and the CTQ (four items). Because the CSM was criticized for language difficulty due to its British English, it was simplified into a 'basic language morningness' (BALM) scale at a seventh grade (12-13 years) reading level.9 Both the MEQ and the BALM are widely used to evaluate morningness in the US. The original Horne-Östberg scale has also been rephrased into a simplified American idiom,10 which was used for this study.

Delayed sleep phase disorder (DSPD) is a circadian rhythm sleep disorder. Typical DSPD patients tend to fall asleep some hours after midnight and have difficulty waking up in the morning. A prevalence of DSPD in the general adult population, equally distributed among women and men, has been reported at approximately 0.15%,11,12 but Ando et al.13 found that the prevalence of mild DSPD symptoms might be much higher. Patients with DSPD may be unable to fall asleep early enough to rest adequately before it is time for school or work, or they may be frequently unable to arrive at school or work on time. Consequently, DSPD can be a disabling and socially isolating condition, unless the patients are able to fit their habits into an accommodating social milieu.

Although sleep logs and actigraphy monitoring are helpful, there is no diagnostic test for DSPD, and diagnosis is based on the sleep history. Chronotype questionnaires such as the MEQ and BALM have been used to screen for DSPD as a way of obtaining the history. However, there has been no information on which one is better for the assessment of DSPD. We have performed genetic studies of DSPD and collected large samples of DNA from DSPD volunteers and matched controls. The purpose of this paper was to evaluate the two chronotype questionnaires for screening of DSPD.

To assess the psychometric characteristics of the BALM and MEQ, reliability, validity and diagnostic utility were examined for each questionnaire. Specifically, we compared the two scales on item-total correlation, scale coefficient alpha (internal consistency), construct validity (factor analyses), criterion-related validity and classification accuracy (ROC curve analysis). Discriminant analysis was also used to determine which items were the best predictors of DSPD.

## METHODS

### Participant recruitment

The study participants were recruited for a case-control genetic study of DSPD. The recruitment process was described in previous reports.14,15 A brief description is as follows. In the initial part of the study, recruitment was limited to the Southern California region and was later expanded to other states of the US. Recruitment of the sample took place between June 2004 and May 2010. Some data from the sample were published previously.14,15 Recruitment of DSPD participants utilized contacts with sleep physicians, media contacts, UCSD minority outreach programs, and internet advertising. Only participants 25 years of age or older were accepted (with a few exceptions). DSPD participants completed the questionnaires and wore a wrist actigraph for 2 weeks. Normal control volunteers were recruited by word-of-mouth, community meetings, the internet, and a campus poster. Control recruitment was targeted, so far as possible, to match the ancestry of the case series. Control participants completed the questionnaires and contributed a sample of blood or saliva, but they were not asked to wear the actigraphs or to provide sleep logs. The principal investigator (DFK) reviewed the record of each participant volunteering as a control. Based on their questionnaires, control volunteers were retrospectively rated as 1) certain DSPD, 2) possible DSPD, 3) neither, 4) possible advanced sleep phase syndrome (ASPS), or 5) certain ASPS. Only those with ratings 3 or 4 were included among the normal controls in this analysis. All the subjects received an explanation of the study and signed written informed consent.

### DSPD diagnosis

Once all data were assembled, the principal investigator (DFK) reviewed the record of each participant who had volunteered as a DSPD case and recorded the participant's DSPD classification as 1) absolutely certain, 2) fairly certain, 3) questionable, 4) unlikely, or 5) very doubtful. The initial criterion for classification was the MEQ, recognizing that the criterion for definite evening type of <30 was too strict for the San Diego population.14 Confirmatory classification criteria included the score on the BALM, reported prior-week and adult-life bedtimes and awakening times, the actigraphic recordings, and whether the participant reported going to sleep "somewhat later" or "much later" than most people their age, both as a child and as an adult. Whether the participant reported distress about falling asleep, reported related social or vocational problems, or had sought medical attention for a sleep problem was also considered. The consistency of the data supporting a classification of DSPD was evaluated, together with the presence of depression, other mental illnesses, or other sleep disorders which might confuse the classification. However, if depression or other disorders had their first incidence after the onset of a pattern of delayed sleep and did not appear to be causing the delay, these disorders were not considered exclusionary. As many DSPD patients cannot consistently report to work by 8-9 AM, evening or night shift work was not considered exclusionary if the history indicated that the delay in sleep occurred before shift work was adopted, and the delay tended to persist when the participant was off work.

### Statistical analysis

#### Internal consistency

To determine internal consistency, Cronbach's alpha coefficients were calculated for the BALM and MEQ. Cronbach's alpha measures the degree of homogeneity among the items in a scale. A Cronbach's alpha of 0.70 may be considered acceptable. Between 0.80 and 0.89, the level of item consistency is good; and when it is 0.90 and above, it is excellent.16
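As an illustration of the coefficient described above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy data are hypothetical and are not taken from the study, which used SPSS.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    # Variance of the total (summed) score across respondents
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 5 respondents x 3 items (NOT study data)
toy = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Population variances are used for both the items and the total score; using sample variances throughout would give the same alpha, because the correction factor cancels in the ratio.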

#### Comparison of validity

Two types of validity were examined: construct and criterion validity. First, the internal structure or construct validity of each subscale within the BALM and MEQ was evaluated based on an exploratory factor analysis (EFA). Second, criterion validity was examined by computing correlations between the two scales and DSPD diagnoses.

For construct validity, an EFA was conducted using principal-axis factoring in SPSS (Version 15) on each scale to explore underlying dimensions. To minimize subjectivity, the following decision criteria were applied: 1) Kaiser's eigenvalue rule, which is a psychometric criterion, 2) Cattell's scree plot and 3) the interpretability criterion.17 In EFA, factor loadings are generally considered to be meaningful when they exceed 0.30 or 0.40.17 To determine inclusion of an item in a factor dimension, a primary loading above 0.35 after rotation was used as the cutoff. To show criterion-related validity, each of the two circadian rhythm scale scores (interval variable) was compared to a dichotomous variable, DSPD diagnosis, using an eta (η) correlation coefficient.
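The eta coefficient relating an interval score to a dichotomous grouping can be sketched as the square root of the between-group sum of squares divided by the total sum of squares. The function name `eta_coefficient` and the toy scores below are hypothetical illustrations, not the study's data or its SPSS procedure.

```python
from math import sqrt


def eta_coefficient(scores, groups):
    """Eta between an interval variable and a categorical grouping variable."""
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = 0.0
    for g in set(groups):
        members = [x for x, label in zip(scores, groups) if label == g]
        group_mean = sum(members) / len(members)
        # Each group contributes n_g * (group mean - grand mean)^2
        ss_between += len(members) * (group_mean - grand_mean) ** 2
    return sqrt(ss_between / ss_total)


# Hypothetical scale totals: 1 = DSPD, 0 = control (NOT study data)
scores = [20, 22, 24, 40, 42, 44]
groups = [1, 1, 1, 0, 0, 0]
eta_value = eta_coefficient(scores, groups)
print(round(eta_value, 3))
```

With only two groups, eta equals the absolute value of the point-biserial correlation between the score and group membership.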

#### ROC curve analysis

The diagnostic performance of a test, that is, its accuracy in discriminating diseased cases from normal cases, is evaluated using receiver operating characteristic (ROC) curve analysis. ROC plots provide a pure index of accuracy by demonstrating the limits of a test's ability to discriminate between alternative states of health over the complete spectrum of operating conditions,18 that is, over a range of threshold levels.

With ROC analysis, the effectiveness of an instrument is assessed by evaluating how accurately it discriminates between two groups. An area under the curve (AUC) of 1 defines a perfect test, whereas an area of 0.5 represents a test that performs no better than chance; AUCs of 0.80-0.90 are considered good discriminators, and AUCs of 0.90-1 are considered excellent.19 We sought to evaluate whether the BALM or the MEQ performed better for evaluating and diagnosing DSPD, with the caution that the MEQ was generally more influential a priori on the diagnostic rater, because MEQ ratings were more familiar. The efficiency of the two assessment tools for DSPD diagnosis was estimated by nonparametric ROC curve analysis. The ROC curve, used to assess the accuracy of diagnostic criteria, plots the true positive rate (sensitivity) against the false positive rate (1 - specificity).
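A nonparametric AUC can be sketched as the proportion of case-control pairs in which the case receives the more "positive" score, with ties counted as one half (the Mann-Whitney formulation). The sketch below assumes that lower scale totals point toward DSPD, so scores are negated before comparison; the function names and toy totals are hypothetical and do not reproduce the study's SPSS output.

```python
def auc_mann_whitney(pos, neg):
    """AUC as P(positive score > negative score), ties counted as 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(pos, neg):
    """(false positive rate, true positive rate) at every observed threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(p >= t for p in pos) / len(pos)  # sensitivity
        fpr = sum(n >= t for n in neg) / len(neg)  # 1 - specificity
        points.append((fpr, tpr))
    return points


# Hypothetical scale totals (NOT study data); lower = more evening-type
case_totals = [25, 30, 28, 41]
control_totals = [45, 50, 38, 55]

# Negate so that a higher value means "more likely DSPD"
pos = [-x for x in case_totals]
neg = [-x for x in control_totals]

auc = auc_mann_whitney(pos, neg)
curve = roc_points(pos, neg)
print(auc)
```

The pairwise count is quadratic in sample size, which is acceptable for an illustration; rank-based formulas give the same value more efficiently for large samples.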

#### Discriminant function analysis

A discriminant function analysis (DFA) was performed to determine whether DSPD cases could be reliably classified from a set of predictors (circadian rhythm scale items) and whether any of the scale items demonstrated differential item functioning as a function of group. That is, DFA enabled us to determine the relative importance of the scale items in discriminating between the groups. In this study, the circadian rhythm scale items were treated as predictors in order to examine how well they predicted membership in the two groups (DSPD and normal). Univariate F tests were then calculated to determine the importance of each predictor in forming the discriminant function.

Examining the Wilks' lambda value for each predictor revealed how important that predictor was to the discriminant function, with smaller values representing greater importance.
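For a single predictor, Wilks' lambda reduces to the within-group sum of squares divided by the total sum of squares, so a value near 0 means the groups are well separated on that item and a value near 1 means the item carries no group information. The sketch below illustrates this univariate case only; the function name `univariate_wilks_lambda` and the toy item scores are hypothetical.

```python
def univariate_wilks_lambda(values, groups):
    """Wilks' lambda for one predictor: SS_within / SS_total."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_within = 0.0
    for g in set(groups):
        members = [x for x, label in zip(values, groups) if label == g]
        group_mean = sum(members) / len(members)
        ss_within += sum((x - group_mean) ** 2 for x in members)
    return ss_within / ss_total


# Hypothetical item scores (NOT study data)
groups = ["DSPD"] * 3 + ["control"] * 3
item_a = [1, 1, 2, 4, 4, 5]  # groups clearly separated
item_b = [3, 2, 4, 3, 2, 4]  # identical group means

lambda_a = univariate_wilks_lambda(item_a, groups)
lambda_b = univariate_wilks_lambda(item_b, groups)
print(round(lambda_a, 3), round(lambda_b, 3))
```

In this univariate case lambda equals 1 minus eta squared, which is why smaller values indicate a more important predictor.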

#### Statistical packages

All of the analyses were performed using SPSS for Windows version 15 (SPSS, Chicago, IL, USA), with the cutoff for statistical significance set at p<0.05.

## RESULTS

### Subjects

By the recruitment process described above, we recruited a case series of 444 DSPD participants along with 438 controls. Their mean age was 38.41 years (SD=12.42). The gender distribution was 67.1% female and 32.9% male.

### Internal consistency

Tables 1 and 2 show item-total statistics and Cronbach's alpha for the two scales, respectively. Internal consistency alphas were 0.898 for the BALM and 0.837 for the MEQ. In terms of standards, Nunnally recommends that coefficient alpha be 0.70 or higher in basic research and 0.90 or higher in applied settings where clinical decisions are based on test scores.20 Both scales had acceptable internal consistency. Cronbach's alpha is usually higher for scales with more items, because the number of items affects the magnitude of the coefficient;21 nevertheless, the alpha coefficient of the BALM was higher than that of the MEQ even though the BALM has fewer items. The corrected item-total correlations (homogeneity index) were greater than 0.40 (considered a marker of a good coefficient) in all items of the BALM scale, but for the MEQ, seven items (nos. 3, 6, 8, 12, 13, 14, 16) were lower than 0.40 (Tables 1 and 2).
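The corrected item-total correlation can be sketched as the Pearson correlation between each item and the sum of the remaining items, so that the item does not inflate its own correlation. The helper names and toy responses below are hypothetical; only the 0.40 cutoff comes from the text above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


def corrected_item_total(rows):
    """Correlation of each item with the total of all OTHER items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]  # total score excluding item j
        result.append(pearson(item, rest))
    return result


# Hypothetical toy data: 5 respondents x 4 items (NOT study data)
toy = [
    [1, 2, 1, 3],
    [2, 2, 3, 1],
    [3, 4, 3, 4],
    [4, 4, 5, 2],
    [5, 5, 4, 3],
]
correlations = corrected_item_total(toy)
# 1-based numbers of items falling below the 0.40 cutoff
weak_items = [j + 1 for j, r in enumerate(correlations) if r < 0.40]
print(weak_items)
```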

### Comparison of validity

#### Construct validity

For the EFA, orthogonally rotated factor solutions were used. Tables 3 and 4 give the factor loading matrices for the two scales. As shown in Table 3, three factors with eigenvalues of 1.0 or above were extracted from the BALM by using principal axis factoring with varimax rotation. The 3-factor solution accounted for 55.34% of the variance: Factor I (eigenvalue, 3.18; variance, 24.46%); Factor II (eigenvalue, 2.53; variance, 19.46%), and Factor III (eigenvalue, 1.484; variance, 11.42%). The first factor was defined by 7 items (nos. 1, 6, 8, 9, 10, 11, 13) with factor loadings above 0.50. The items of this factor were congruent with factor 1 (morningness/effort) in the Brown study.9 The second factor was defined by 4 items (nos. 3, 4, 5, 12) with loadings above 0.50, in common with factor 2 (morning alertness) in the Brown study.9

The third factor (evening) had 2 items with loadings above 0.60 that refer to activities and affect in the evening as defined by the Brown study.9 The alpha values for the BALM subscales, factor 1 (alpha=0.880), factor 2 (alpha=0.826), and factor 3 (alpha=0.759), were all above Nunnally's critical value of alpha >0.70.20

From the MEQ, five factors with eigenvalue of 1.0 or above were extracted by using principal axis factoring and orthogonally rotated by using varimax rotation (Table 4). Of the 19 items, 15 items loaded above 0.40 on one of the five extracted factors. A 5-factor solution accounted for 42.40% of the variance: Factor I (eigenvalue, 2.847; variance, 14.98%); Factor II (eigenvalue, 1.938; variance, 10.20%), Factor III (eigenvalue, 1.491; variance, 7.85%), Factor IV (eigenvalue, 1.185; variance, 6.24%), and Factor V (eigenvalue, 0.595; variance, 3.13%).
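The percentages of variance reported for each factor follow from dividing the factor's eigenvalue by the number of items in the scale (13 for the BALM, 19 for the MEQ). The short check below reproduces that arithmetic from the eigenvalues reported in the text; the function name `percent_variance` is a hypothetical helper.

```python
def percent_variance(eigenvalue, n_items):
    """Share of total variance for one factor, in percent."""
    return 100.0 * eigenvalue / n_items


# Eigenvalues as reported in the text
balm_eigenvalues = [3.18, 2.53, 1.484]               # BALM: 13 items
meq_eigenvalues = [2.847, 1.938, 1.491, 1.185, 0.595]  # MEQ: 19 items

balm_pct = [round(percent_variance(e, 13), 2) for e in balm_eigenvalues]
meq_pct = [round(percent_variance(e, 19), 2) for e in meq_eigenvalues]

# Cumulative variance accounted for by each solution
balm_total = percent_variance(sum(balm_eigenvalues), 13)
meq_total = percent_variance(sum(meq_eigenvalues), 19)

print(balm_pct, round(balm_total, 2))
print(meq_pct, round(meq_total, 2))
```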

The first factor was composed of 6 items (nos. 9, 11, 15, 17, 18, 19). Four of the 6 items referred to morning types of activity. The second factor was composed of 3 items (nos. 4, 5, 7) which refer specifically to affect in the morning. The third factor was defined by 3 evening items (nos. 2, 10, 12). The fourth factor was identified by 2 items (nos. 3, 13). The fifth factor had only one item (no. 1) loading above 0.40. Cronbach's alpha values for the MEQ subscales were 0.809 for factor 1, 0.735 for factor 2, 0.623 for factor 3, and 0.634 for factor 4. The alpha of factor 5 could not be calculated because it had only a single item. Only factors 1 and 2 surpassed the commonly accepted threshold of alpha >0.70; factors 3 and 4 did not.

#### Criterion related validity

Criterion validity coefficients between the two circadian rhythm scales and the criterion variable (the dichotomous DSPD diagnosis) were eta=0.838 and eta=0.870 for the BALM and MEQ, respectively. Both coefficients are much greater than 0.30, which provides evidence for criterion validity of the two scales. However, when we tested the difference between the two correlation coefficients, it was not statistically significant (Z=1.75, p>0.05).

#### Classification accuracy

The area under the curve (AUC) shows the efficiency of the tool: the larger the area, the more efficient the evaluation. Figure 1 and Table 5 show the ROC analysis results. The ROC analysis yielded a high AUC estimate of 0.972 for the BALM, and 0.986 for the MEQ.

Further, the AUC values were significant (p<0.001), and the 95% confidence interval (CI) bounds did not include 0.50 (BALM: 95% CI 0.962-0.982; MEQ: 95% CI 0.980-0.992), suggesting diagnostic discrimination that was better than chance alone.

#### Discriminant analysis

We next conducted a discriminant analysis with the BALM to classify the two groups (DSPD vs. normal). In the stepwise solution of the BALM, 6 predictors (items 2, 6, 7, 8, 11, 13) were identified that resulted in good classification accuracy (Tables 6 and 7). Examination of the standardized canonical discriminant function coefficients in Table 6 indicates that the predictor that contributed most to the discriminant function, and thus differentiated uniquely between the DSPD group and the normal group, was BALM item 2 (Thinking only of your own "feeling best" times of day, what time would you go to bed if you were completely free to plan your evening?).

Upon examination of the classification procedure, 94.2% of the originally grouped cases were classified correctly by the discriminant function. Specifically, 93.5% of the DSPD group and 95% of the normal group were correctly classified using participants' scores on BALM items 2, 6, 7, 8, 11, and 13. To cross-validate these results, the leave-one-out classification procedure in SPSS was used to check the predictive accuracy. Results indicated that 93.9% of the cross-validated cases were correctly classified. A discriminant analysis with the MEQ was not done for two reasons: 1) the BALM has better psychometric properties than the MEQ in many respects, and 2) nine items of the MEQ overlap with the BALM.
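The idea behind leave-one-out cross-validation can be sketched with a deliberately simplified classifier. The example below assumes a single predictor, equal priors, and a common variance, in which case a linear discriminant rule reduces to a cutoff at the midpoint of the two group means; it is not the six-item stepwise function fitted in SPSS, and the function name and toy scores are hypothetical.

```python
def loo_accuracy(cases, controls):
    """Leave-one-out accuracy of a midpoint-of-means classifier."""
    data = [(x, 1) for x in cases] + [(x, 0) for x in controls]
    correct = 0
    for i, (x, label) in enumerate(data):
        # Refit the rule without the held-out participant
        train = data[:i] + data[i + 1:]
        case_vals = [v for v, lab in train if lab == 1]
        ctrl_vals = [v for v, lab in train if lab == 0]
        case_mean = sum(case_vals) / len(case_vals)
        ctrl_mean = sum(ctrl_vals) / len(ctrl_vals)
        cutoff = (case_mean + ctrl_mean) / 2
        # Predict "case" when x lies on the case-mean side of the cutoff
        predicted = 1 if (x < cutoff) == (case_mean < ctrl_mean) else 0
        correct += int(predicted == label)
    return correct / len(data)


# Hypothetical scores on one predictor (NOT study data)
cases = [1, 2, 2, 3, 6]
controls = [5, 6, 7, 7, 8]

accuracy = loo_accuracy(cases, controls)
print(accuracy)
```

Because each participant is classified by a rule fitted without that participant, the cross-validated accuracy is typically slightly lower than the accuracy on the originally grouped cases, as in the 93.9% versus 94.2% reported above.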

## DISCUSSION

Although the BALM was similar to the MEQ in ROC analysis, it required fewer items and was better than the MEQ in item reliability and validity. The main goal of the study was to compare the BALM with the MEQ in terms of psychometric properties. Comparing internal consistency, Cronbach's coefficient alpha for the BALM (0.898) was higher than that of the MEQ (0.837). Moreover, the corrected item-total correlations were greater than 0.40 in all items of the BALM scale. For the MEQ, seven of the items scored lower than 0.40. This suggests that the BALM is superior to the MEQ in homogeneity of items. In particular, the α coefficient of 0.898 for the BALM is similar to that reported by Brown (α=0.879 in shift workers, α=0.91 in day workers) and higher than that reported by Pornpitakpan (α=0.781).9,22

Comparing validity, three factors with eigenvalues of 1.0 or above were extracted from the BALM by using principal axis factoring with varimax rotation, showing high consistency in factor structure between Brown9 and the current study done at very different times with very different samples. The alpha values for the subscales of BALM were factor 1 (alpha=0.880), 2 (alpha=0.826), and 3 (alpha=0.759).

The MEQ had considerably lower Cronbach's alpha values, particularly for factors 3, 4, and 5, which had poor consistency. Given the apparent heterogeneity of these 19 items, it is plausible to assume that the items measure different dimensions. These findings suggest that the BALM has better construct validity than the MEQ.

Criterion validity coefficients between the two circadian rhythm scales and the DSPD diagnosis were eta=0.838 and eta=0.870 for the BALM and MEQ, respectively. Both coefficients are greater than 0.30, which provides evidence for criterion validity of the two scales. The difference in eta was not significant according to Fisher's Z transformation test, though the MEQ had a slightly higher coefficient.

The efficiency of the two assessment tools for DSPD diagnosis was estimated by nonparametric ROC curve analysis. The ROC analysis yielded a high AUC estimate of 0.972 for the BALM and 0.986 for the MEQ. The AUC estimates of the two questionnaires were similar and both very good. The MEQ was slightly (but significantly) better at discriminating diagnosis, but this may be partly tautological, since the MEQ had been more influential in arriving at the rater's diagnostic classifications, combined with many other kinds of information integrated in making these diagnostic judgments.

In a discriminant analysis with the BALM to classify the two groups (DSPD vs. normal), 6 items were identified that resulted in good classification accuracy. Upon examination of the classification procedure, 94.2% of the originally grouped cases were classified correctly. Results revealed that 93.5% of the DSPD group and 95% of the normal group were correctly classified using participants' scores on the 6 BALM items. In cross-validation, 93.9% of the cases were also correctly classified. These results suggest that a shortened scale consisting of these 6 items could be developed.

This study provides new evidence that the BALM is a reliable and valid measure of morningness which is psychometrically superior to the MEQ. The present findings indicate that the BALM scale is effective for distinguishing between diagnosed cases of DSPD and controls judged to be without DSPD.

## Acknowledgments

This research was supported by the U.S. National Institutes of Health's NHLBI HL071123. HJL was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2010-0025130).