Reliability and Validity of a Tablet-Based Neuropsychological Test (the Hellocog) for Screening Dementia
Abstract
Objective
To address the gap in timely diagnosis of dementia due to limited screening tools, we investigated the validity and reliability of the Hellocog, a tablet-based computerized neuropsychological test for screening dementia. The higher the probability score on the Hellocog, the higher the likelihood of dementia.
Methods
This study included 100 patients with dementia and 100 individuals with normal cognition who were aged 60 years or older and free of other major psychiatric, neurological, or medical conditions. They self-administered the Hellocog on a tablet under the supervision of a neuropsychologist. To determine test-retest reliability, 20 participants took the Hellocog again after 4 weeks. Diagnostic performance was assessed using receiver operating characteristic (ROC) analysis.
Results
The Hellocog showed adequate internal consistency (Cronbach’s alpha=0.69) and good test-retest reliability (intraclass correlation coefficient=0.86, p<0.001). Participants with dementia scored higher on the Hellocog than those with normal cognition (p<0.001), supporting its criterion validity. Strong correlations with the Mini-Mental State Examination (MMSE) score and the total score of the Consortium to Establish a Registry for Alzheimer’s Disease Neuropsychological Assessment Battery (CERAD-TS) support the concurrent validity of the Hellocog. The area under the ROC curve for dementia of the Hellocog was excellent (0.971) and comparable to that of the MMSE and CERAD-TS. The sensitivity and specificity for dementia were 0.945 and 0.872, respectively, which were slightly better than those of the MMSE and CERAD-TS.
Conclusion
The Hellocog is a valid and reliable tool for self-administered dementia screening, with promise for improving early detection of dementia.
INTRODUCTION
The number of people with dementia is increasing rapidly worldwide [1], including in South Korea [2,3]. Although disease-modifying therapies for dementia have recently been developed, they may be effective only in the prodromal or early stages of the disease [4]. Furthermore, these therapies are specific to Alzheimer’s disease (AD), and there are still no cures for other forms of dementia. Thus, early diagnosis and intervention remain crucial.
However, the current diagnosis rate for dementia is approximately 50% [5], which may be due to a complex interaction of a variety of factors, including low public awareness [6], low accessibility to diagnostic services [7], and lack of post-diagnostic services [8]. Among these, the lack of optimal screening tools has been identified as a key contributing factor [9,10].
An ideal screening test for dementia should be sensitive and specific enough to identify those people with cognitive impairment who need further comprehensive diagnostic evaluation for dementia. In addition, it should be quick and easy for a range of health professionals to administer, or be self-administered without any assistance from a health professional. In this sense, the Mini-Mental State Examination (MMSE), the most popular screening test for dementia in both clinical and research settings [11], has many drawbacks [12]. Although many other cognitive tests have been developed to screen for dementia, their disadvantages are not very different from those of the MMSE [13].
We have therefore developed the Hellocog, which is a brief, self-administered, tablet-based neuropsychological test. We developed the Hellocog based on our previous deep learning model for diagnosing dementia using demographic information, subjective memory complaints, depressive symptoms and the results of comprehensive neuropsychological tests [14]. The Hellocog consists of a questionnaire section (Hellocog-Q) and a cognitive test section (Hellocog-T). The Hellocog-Q consists of five questions about age, years of formal education, presence of subjective memory complaints (yes or no), presence of depressive mood (yes or no), and loss of interest or pleasure (yes or no). The Hellocog-T consists of the time orientation test (month and day of the week), the seven word list memory test (WLMT), the 13-digit trail making test (TMT), the seven word list recall test (WLRT), the seven word list recognition test (WLRcT), the one-minute verbal fluency test (VFT) for animal category, and the five-item confrontational naming test.
The foundation of this study, including the development of the Hellocog, is informed by research conducted by Choi et al. [14], which incorporated the Korean version of the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD-K) neuropsychological assessment. As such, the Hellocog shares some structural similarities with the CERAD-K in terms of the cognitive domains assessed. However, it diverges significantly in its approach and implementation. From the CERAD-K neuropsychological assessment, only items identified by Choi et al. [14] as beneficial for dementia risk screening were selected for inclusion in the Hellocog. Moreover, the specific content of the cognitive task items in the Hellocog differs from that in the CERAD-K, and the Hellocog is designed to be completed in about one-third of the time required for the CERAD-K. Additionally, unlike the comprehensive neuropsychological evaluation provided by the CERAD-K, which offers scores across individual cognitive domains, the Hellocog focuses solely on calculating a weighted total score for the purpose of dementia screening. This streamlined approach, facilitated by digital technology, allows for self-administration, immediate data processing, and an interactive testing experience, making the Hellocog a distinct and efficient tool for dementia screening in contrast to the traditional paper-based CERAD-K.
In this study, we examined the validity and reliability of the Hellocog for screening for dementia and compared its diagnostic performance for dementia with that of the MMSE [15] and the CERAD-K Neuropsychological Assessment Battery total score (CERAD-TS) [16] administered by neuropsychologists.
METHODS
Participants
We enrolled 100 participants with dementia from visitors to the dementia clinic at Seoul National University Bundang Hospital from 2019 to 2021, and 100 participants with normal cognition from participants in the Korean Longitudinal Study on Cognitive Aging and Dementia [17] from 2019 to 2021. However, task failures were observed in 9% of the dementia group and 6% of the control group. Consequently, the final dataset used for data analysis included 91 participants in the dementia group and 94 participants in the control group. No participant dropped out of either group.
All participants were aged 60 years or older, lived in the community, and had normal or corrected-to-normal vision and hearing. All participants were free of major psychiatric, neurological, or medical conditions other than dementia that could affect cognitive function. The study was approved by the Institutional Review Board of the Seoul National University Bundang Hospital (B-1905-540-302). All participants were fully informed and provided written informed consent by themselves or their legal guardians.
Diagnostic assessment
Psychiatrists specializing in geriatric psychiatry and dementia research conducted a standardized face-to-face diagnostic interview, physical and neurological examinations and laboratory tests using the CERAD-K Clinical Assessment Battery [15] and the Korean version of the Mini International Neuropsychiatric Interview [18].
Neuropsychologists or trained research nurses assessed the severity of subjective cognitive complaints and depressive symptoms using the Subjective Memory Complaints Questionnaire [19] and the Korean version of the Geriatric Depression Scale [20]. They assessed cognitive function using the CERAD-K Neuropsychological Assessment Battery [15,21], the Digit Span Test [22], and the Frontal Assessment Battery [23]. The CERAD-K Neuropsychological Assessment Battery consists of nine neuropsychological tests: VFT, 15-item Boston Naming Test, MMSE, WLMT, Constructional Praxis Test (CPT), WLRT, WLRcT, constructional recall test, TMT A and TMT B [15]. We defined objective cognitive impairment as a score of -1.5 standard deviation (SD) or below on any of these neuropsychological tests, except MMSE, compared with age-, sex-, and education-adjusted norms of elderly Koreans [16,21-23]. We obtained the CERAD-TS according to the equation proposed in our previous work [16].
A panel of psychiatrists specializing in geriatric psychiatry and dementia then made a diagnosis of cognitive disorder and determined the global severity of dementia using the Global Deterioration Scale (GDtS) [24] at the consensus diagnostic conference. We diagnosed dementia according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition diagnostic criteria [25]. We diagnosed AD according to the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) diagnostic criteria [26], vascular dementia (VD) according to the National Institute of Neurological Disorders and Stroke-Association Internationale pour la Recherche et l’Enseignement en Neurosciences (NINDS-AIREN) diagnostic criteria [27], dementia with Lewy bodies (DLB) and Parkinson’s disease with dementia (PDD) according to the consensus guideline proposed by McKeith [28], and frontotemporal dementia (FTD) according to the Lund-Manchester consensus diagnostic criteria [29].
Administration of the Hellocog
A neuropsychologist explained to each participant how to take the Hellocog. Each participant then self-administered the Hellocog, which was installed on a tablet, without any assistance. Although the Hellocog was designed to recognize speech automatically, the accuracy of its speech recognition was limited in elderly Koreans. Therefore, in this validation study, research transcribers converted the recorded voices of the participants into text and manually entered them into the Hellocog. To examine the test-retest reliability of the Hellocog, we asked 20 participants (10 with dementia and 10 with normal cognition) to self-administer the Hellocog twice with a test-retest interval of four weeks.
Methodology and scoring of the Hellocog
Choi et al. [14] initially identified 43 variables, of which 27 were common to both the previous research and the current study involving the Hellocog. These 27 variables were used to develop a logistic regression model identical to the one used in Choi et al.’s research to predict the risk of dementia. The Hellocog does not provide separate scores for individual cognitive domains such as memory, language, attention, and executive function. Instead, it calculates a weighted composite score that reflects the overall risk of dementia. The composite score combines responses to the various cognitive tasks with factors such as age and education in the scoring equation. Although the Hellocog does not offer domain-specific scores, its broad range of cognitive tasks is intended to provide a more accurate indication of dementia risk than tests that evaluate a more limited set of cognitive areas, and to minimize the influence of the type or stage of dementia on the test’s accuracy. Interpreting Hellocog scores against validated thresholds facilitates early detection and monitoring of cognitive changes associated with dementia. Based on our previous research [14], the Hellocog score was calculated using the following equation. The higher the Hellocog score, the higher the likelihood of having dementia.
Hellocog score = 4.556 + 0.009×age + 0.19×education + 2.004×SMC - 1.2815×DM2 + 0.8955×DM - 0.7905×LIP2 + 2.1245×LIP - 0.806×TOM - 0.908×TOD - 0.755×WLLR + 0.065×WLLC - 0.126×WLLRE - 3.620×WLLP - 0.312×WLL1 + 0.032×WLL2 - 0.095×WLL3 + 0.006×TMT - 0.703×WLR + 0.140×WLRI + 0.237×WLRCB - 0.321×WLRC - 0.286×VF1 - 0.302×VF2 - 0.205×VF3 - 0.356×VF4 + 0.037×VFC + 0.250×VFIS + 0.104×VFS + 0.043×NT
(SMC, subjective memory complaints; DM, depressive mood; LIP, loss of interest or pleasure; TOM, time orientation to month; TOD, time orientation to day; WLLR, recency index of word list learning; WLLC, consistency index of word list learning; WLLRE, number of repetition errors in word list learning; WLLP, primacy index of word list learning; WLL1, number of correct recalls in the first trial of the word list learning; WLL2, number of correct recalls in the second trial of the word list learning; WLL3, number of correct recalls in the third trial of the word list learning; TMT, time for completing the trail making; WLR, number of correct recalls in the delayed word list recall; WLRI, number of intrusion errors in the delayed word list recall; WLRCB, response bias index of word list recognition; WLRC, total score of word list recognition; VF1, number of correct responses during the first 15 seconds of the categorical verbal fluency; VF2, number of correct responses during the second 15 seconds of the categorical verbal fluency; VF3, number of correct responses during the third 15 seconds of the categorical verbal fluency; VF4, number of correct responses during the fourth 15 seconds of the categorical verbal fluency; VFC, clustering index of the categorical verbal fluency; VFIS, ineffective switching index of the categorical verbal fluency; VFS, switching index of the categorical verbal fluency; NT, number of correct confrontational naming on middle-frequency objects)
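As an illustration only, the weighted sum in the scoring equation above can be sketched in Python. The coefficients are transcribed from the equation; the function name, the dictionary layout, and the example inputs are hypothetical, and the source does not specify how each variable is coded (for example, how DM2 and LIP2 relate to DM and LIP, or the unit of the trail making time), so this is not the Hellocog implementation.

```python
# Intercept and weights transcribed from the scoring equation in the text.
INTERCEPT = 4.556
COEFFICIENTS = {
    "age": 0.009, "education": 0.19, "SMC": 2.004,
    "DM2": -1.2815, "DM": 0.8955, "LIP2": -0.7905, "LIP": 2.1245,
    "TOM": -0.806, "TOD": -0.908,
    "WLLR": -0.755, "WLLC": 0.065, "WLLRE": -0.126, "WLLP": -3.620,
    "WLL1": -0.312, "WLL2": 0.032, "WLL3": -0.095,
    "TMT": 0.006,
    "WLR": -0.703, "WLRI": 0.140, "WLRCB": 0.237, "WLRC": -0.321,
    "VF1": -0.286, "VF2": -0.302, "VF3": -0.205, "VF4": -0.356,
    "VFC": 0.037, "VFIS": 0.250, "VFS": 0.104,
    "NT": 0.043,
}


def hellocog_score(features):
    """Return the intercept plus the weighted sum of all named features.

    `features` maps every variable name in COEFFICIENTS to a number.
    The coding of each variable is an assumption left to the caller.
    """
    missing = set(COEFFICIENTS) - set(features)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    return INTERCEPT + sum(
        weight * features[name] for name, weight in COEFFICIENTS.items()
    )


# Hypothetical input: every variable set to 0 except age and education.
example = {name: 0.0 for name in COEFFICIENTS}
example.update({"age": 70, "education": 10})
print(round(hellocog_score(example), 3))
```

Raising an error on missing variables, rather than silently treating them as zero, keeps an incomplete test record from producing a misleadingly low or high score.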
Statistical analysis
We compared continuous and categorical variables between groups using Student’s t-tests and chi-squared tests, respectively.
We examined the internal consistency of the Hellocog using Cronbach’s alpha. We examined the test-retest reliability of the Hellocog using the intraclass correlation coefficient (ICC) between the test and retest scores. We assessed the concurrent validity of the Hellocog by examining its correlations with the MMSE, CERAD-TS, and GDtS using Pearson’s correlation analysis adjusting for age and education. We examined the criterion validity of the Hellocog by comparing Hellocog scores between participants with dementia and those with normal cognition. We evaluated the diagnostic accuracy for dementia of the Hellocog, MMSE, CERAD-TS, and GDtS using receiver operating characteristic (ROC) analysis and determined the optimal cutoff score for dementia as the score that maximized the Youden index (sensitivity+specificity-1) [30]. We compared the diagnostic accuracy for dementia of the Hellocog with that of the MMSE and CERAD-TS by comparing their areas under the ROC curve (AUC) using the z-test [31].
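The cutoff selection and AUC described above can be sketched in plain Python as follows. This is a minimal illustration on made-up scores, not the MedCalc procedure used in the study; the function names and the toy data are assumptions. It treats a higher score as indicating dementia, as with the Hellocog; for a test such as the MMSE, where lower scores indicate impairment, the scores would need to be negated first.

```python
def roc_points(scores, labels):
    """List (cutoff, sensitivity, specificity) for every observed score.

    A participant is classified as positive when score >= cutoff.
    `labels` uses 1 for dementia and 0 for normal cognition.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Return the (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data for illustration: eight participants, three with dementia.
toy_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]

cutoff, sens, spec = youden_cutoff(toy_scores, toy_labels)
print(cutoff, sens, spec, round(auc(toy_scores, toy_labels), 3))
```

The pairwise form of the AUC is used here because it needs no curve integration and makes the interpretation explicit: the probability that a randomly chosen participant with dementia scores higher than a randomly chosen participant without it.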
We performed ROC analyses using the MedCalc Statistical Software version 19.1 (MedCalc Software, Ostend, Belgium; https://www.medcalc.org; 2019) and all other analyses using the SPSS version 18.0 (SPSS Inc., Chicago, IL, USA).
RESULTS
As summarized in Table 1, of the 200 participants, 185 (91 with dementia and 94 with normal cognition) were included in the final analysis after excluding participants whose voice was not properly recorded due to a program error. Of the 91 patients with dementia, 73 (80.2%) had AD. Of the 18 (19.8%) patients with non-AD dementia, 9, 4, and 5 had VD, DLB/PDD, and FTD, respectively. As summarized in Table 2, participants with dementia were older and less educated than those with normal cognition (p<0.001). However, the distribution of sex was comparable between the two groups (p=0.106). Participants with dementia had lower MMSE scores and CERAD-TS but higher GDtS scores than those with normal cognition (p<0.001).
The Hellocog showed acceptable internal consistency (Cronbach’s alpha=0.69) and good test-retest reliability (ICC=0.86, p<0.001). As shown in Table 1, the participants with dementia scored higher on the Hellocog than those with normal cognition (p<0.001), indicating that the Hellocog has good criterion validity. As shown in Table 2, the Hellocog score correlated well with the MMSE score (r=-0.73, p<0.001), the CERAD-TS (r=-0.82, p<0.001), and the GDtS score (r=0.76, p<0.001), indicating that the Hellocog has good concurrent validity.
As summarized in Table 3, the Hellocog showed excellent diagnostic performance for dementia (AUC=0.972). Its diagnostic performance was better than that of the MMSE and the CERAD-TS, but the differences were not statistically significant (p=0.352 for the MMSE; p=0.504 for the CERAD-TS). At the optimal cut-off score, the sensitivity and specificity of the Hellocog for dementia were 0.945 and 0.872, respectively. When the patients with AD were analyzed separately (Table 4), the results did not change. The Hellocog also showed excellent diagnostic performance for AD (AUC=0.971, 95% confidence interval=0.928–0.992). Although its diagnostic performance for AD was better than that of the MMSE and the CERAD-TS, the differences were not statistically significant (p=0.330 for the MMSE; p=0.846 for the CERAD-TS). At the optimal cut-off score, the sensitivity and specificity of the Hellocog for AD were 0.915 and 0.918, respectively.
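As a worked check of the arithmetic, the Youden index (sensitivity+specificity-1) implied by the sensitivity and specificity reported above can be computed directly. The snippet below is a sketch with a hypothetical helper name; it uses only the four values stated in this paragraph.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


# Sensitivity and specificity of the Hellocog at the optimal cutoff, as reported.
reported = {
    "dementia": (0.945, 0.872),
    "AD": (0.915, 0.918),
}

for target, (sens, spec) in reported.items():
    print(f"{target}: J = {youden_index(sens, spec):.3f}")
# dementia: J = 0.817
# AD: J = 0.833
```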
DISCUSSION
The utilization of mobile technology among older adults presents a promising avenue for facilitating convenient and cost-effective assessments aimed at early detection of dementia. Mobile-based tests offer numerous advantages over traditional pen-and-paper tests, including higher completion rates, lower administration costs, automated scoring, immediate access to results, and effortless tracking of patient data [32-35]. In addition, mobile technology allows monitoring of cognitive function through repeated assessments outside the clinical setting, providing valuable insight into daily cognitive fluctuation and enabling early detection of subtle signs of cognitive decline [36,37].
To improve the accuracy and convenience of mobile cognitive testing, two categories of strategies can be explored: 1) adapting established neuropsychological tests into mobile versions, 2) designing cognitive tests specifically for mobile platforms [38]. The development of the Hellocog aligns with these strategies. First, the Hellocog is built upon our prior research aimed at creating an ideal battery of neuropsychological tests for dementia diagnosis from comprehensive neuropsychological assessments [14]. Second, all test items have been modified for a mobile platform and can be completed using touch screen or voice input.
The results of this study provide evidence that the Hellocog is a promising screening tool for dementia when compared to popular cognitive tests such as the MMSE and CERAD-TS. In addition, the high concurrent validity of the Hellocog with both the CERAD-TS and MMSE supports its validity as a screening tool for dementia. Furthermore, the Hellocog has several additional advantages over traditional pen-and-paper cognitive tests such as the MMSE and the CERAD Neuropsychological Assessment Battery as an early screening tool for dementia. First, its primary advantage is that it can be self-administered at home without a human examiner, which significantly reduces the need for hospital visits, eliminates teaching bias or evaluation disparities, and minimizes infection risks. Second, not only is the administration of the Hellocog automated, but so are the scoring and reporting. Therefore, it is robust to human error and can provide immediate results. Owing to its larger set of questions, the administration time of the Hellocog is around 15 minutes, which is longer than that of other mobile applications used for screening cognitive disorders. The Hellocog mitigates this issue with a highly efficient and user-friendly interface.
Several mobile applications have been developed to assess cognitive function and screen for cognitive disorders (Table 5) [39]. Most of these applications have demonstrated diagnostic performance similar to that of the Hellocog. However, some of them have limitations in terms of the cognitive domains they assess. It is important to emphasize that dementia involves a persistent and progressive decline in multiple cognitive domains, not limited to memory. These domains encompass executive function, complex attention, language, learning, memory, perceptual-motor skills, and social cognition [40,41]. Therefore, a comprehensive evaluation of these cognitive domains is essential when screening for dementia. For example, executive function, which is critical because it involves a wide range of active cognitive processes such as verbal reasoning, problem solving, planning, sustained attention, resistance to distraction, multitasking, cognitive flexibility, and adaptability to novelty, is particularly impaired in the early stages of AD [42]. While instruments such as the Addenbrooke’s Cognitive Examination III provide an evaluation of a wide range of cognitive domains, the need for administration by healthcare professionals may hinder their widespread use in clinical settings [39]. On the other hand, instruments such as the BrainTest lack sufficient evaluation of their psychometric properties and applicability to diverse populations, raising concerns about their reliability and validity for accurate cognitive assessment [43]. As we strive to improve dementia screening and early detection, it is imperative to address these limitations and work towards the development of mobile applications that provide comprehensive cognitive assessments, including executive function, while being easily accessible and validated for use across diverse populations.
This study has several limitations. First, there was a lack of uniformity in age and years of education between the patient and control groups. Although these variables were controlled for when comparing means or variances between groups and when examining correlations between variables, they were not accounted for in the ROC analysis, potentially leading to an overestimation of the diagnostic efficacy of the tests. Second, we did not take into account the participants’ familiarity or skill in using tablets or smartphones. The control group, being younger and more educated, may have performed better on the Hellocog than the patient group, so a difference in experience with tablets or smartphones may have contributed to the difference in test performance between the groups. Third, the current study did not develop norms for the Hellocog; population-specific norms will need to be developed in subsequent studies for wider use of the Hellocog. Fourth, the current study employed a cross-sectional case-control design and is therefore susceptible to selection bias and lack of rater blinding. However, because the Hellocog was self-administered by the participants themselves, the lack of blinding is likely to have had a minimal effect on the results of this study. Fifth, although the current study did not restrict the type of dementia at enrollment, the sample ultimately included in the analysis overrepresented patients with AD by about 10% compared with the prevalence of AD reported in epidemiologic studies. Therefore, the validity of the Hellocog needs to be further examined in a follow-up study of patients with non-AD dementia. Sixth, the number of participants involved in the test-retest reliability assessment of the Hellocog was relatively small (20). DeVet et al. [44] advise using a sample size of 50 “as a starting point for negotiations”. Future research should include a larger number of participants in the test-retest process to further validate these findings.
In conclusion, the Hellocog is a valid and reliable tool for screening for dementia and may help improve early detection of dementia.
Notes
Availability of Data and Material
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Conflicts of Interest
Ki Woong Kim, a contributing editor of the Psychiatry Investigation, was not involved in the editorial evaluation or decision to publish this article. All remaining authors have declared no conflicts of interest.
Author Contributions
Conceptualization: Eun Young Kim, Ki Woong Kim. Data curation: Hee Won Yang, Daniel Hahnsam Seok. Formal analysis: Hee Won Yang, Daniel Hahnsam Seok. Funding acquisition: Ji Won Han, Ki Woong Kim. Investigation: Hee Won Yang, Daniel Hahnsam Seok, Ki Woong Kim. Methodology: Hee Won Yang, Eun Young Kim. Project administration: Hee Won Yang, Eun Young Kim. Resources: Eun Young Kim, Seon Hyeok Kim, Jin Hwan Lim. Software: Seon Hyeok Kim. Supervision: Eun Young Kim, Ki Woong Kim. Validation: Ji Won Han, Ki Woong Kim. Visualization: Daniel Hahnsam Seok. Writing—original draft: Daniel Hahnsam Seok, Hee Won Yang, Ki Woong Kim. Writing—review & editing: all authors.
Funding Statement
This work was supported by the ATC (Advanced Technology Center) Program (10076733, Development of the cognitive rehabilitation solution for patients with cognitive impairment using speech recognition and eye tracking technology) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
Acknowledgements
None