Exploring the Relationships Between Antipsychotic Dosage and Voice Characteristics in Relation to Extrapyramidal Symptoms
Abstract
Objective
Extrapyramidal symptoms (EPS) are common side effects of antipsychotic drugs. Despite the growing interest in exploring objective biomarkers for EPS prevention and the potential use of voice in detecting clinical disorders, no studies have demonstrated the relationships between vocal changes and EPS. Therefore, we aimed to determine the associations between voice changes and antipsychotic dosage, and further investigated whether speech characteristics could be used as predictors of EPS.
Methods
Forty-two patients receiving or expected to receive antipsychotic drugs were recruited. Drug-induced parkinsonism of EPS was evaluated using the Simpson-Angus Scale (SAS). Participants’ voice data consisted of 16 neutral sentences and 2-second-long /Ah/ utterances. Thirteen voice features were extracted from the obtained voice data. Each voice feature was compared between groups defined by an SAS total score below or above 0.6. The associations between antipsychotic dosage and voice characteristics were examined, and vocal trait variations according to the presence of EPS were explored.
Results
Significant associations were observed between specific vocal characteristics and antipsychotic dosage across both datasets (sentences 1 to 16 and /Ah/ utterances). Notably, Mel-Frequency Cepstral Coefficients (MFCC) varied with the presence of EPS. Specifically, among the 13 MFCC coefficients, MFCC1 (t=-4.47, p<0.001), MFCC8 (t=-4.49, p<0.001), and MFCC12 (t=-2.21, p=0.029) showed significant group differences in the overall statistical values.
Conclusion
Our results suggest that MFCC may serve as a predictor for detecting drug-induced parkinsonism of EPS. Further research should address potential confounding factors impacting the relationship between MFCC and antipsychotic dosage, possibly improving EPS detection and reducing antipsychotic medication side effects.
INTRODUCTION
Extrapyramidal symptoms (EPS) result from the excessive blockade of D2 dopamine receptors caused by antipsychotic medication within the nigrostriatal dopamine pathway, which plays a crucial role in motor control [1,2]. The main symptoms of EPS are parkinsonism, akathisia, dystonia, and tardive dyskinesia, which vary from minimal discomfort to permanent involuntary muscle movement [3,4]. Of these four main symptoms, drug-induced parkinsonism occurs in 20%–40% of people taking antipsychotics. The main features of drug-induced parkinsonism include rigidity of the limbs and resistance to passive movement [5].
Symptoms usually appear at the early stage of treatment or when the antipsychotic dosage is increased, often within hours after the first administration of drugs, but may also appear within a few weeks [6,7]. However, the prediction of EPS has posed challenges due to variations in symptom severity and medication frequency observed among patients and differences in drug types [4]. A previous study found the cumulative frequency of EPS ranged from 7.7% (olanzapine) to 32.8% (long-acting typical antipsychotic drugs), and patients treated with second-generation antipsychotic medications had a lower risk of EPS than those treated with haloperidol [8].
Currently, standardized clinical scales, such as the Simpson-Angus Scale (SAS) and the Extrapyramidal Symptoms Rating Scale (ESRS), are widely used in the assessment of drug-induced parkinsonism [9,10]. Despite their common use in clinical practice, these clinical scales have limitations in predicting EPS due to their reliance on the patient’s subjective report and physical examination [6,11], and the possibility of overlooking mild symptoms. Therefore, there has been a growing interest in exploring objective biomarkers for use in the early detection and prevention of EPS.
Recently, potential biomarkers for detecting EPS have been proposed. Previous studies have highlighted the potential of PET scans in predicting and monitoring EPS, as they provide quantitative measurements of dopamine receptor occupancy directly from the brain [12,13]. Notably, a specific level of dopamine D2 receptor occupancy rates has been linked to the occurrence of EPS. However, the routine application of PET measurements faces challenges due to factors such as high cost, long examination time, and the associated risk of radiation exposure, all of which could result in low efficiency [14,15].
Meanwhile, prior research has explored the connections between Parkinson’s disease (PD) and voice characteristics due to their non-invasiveness, convenient monitoring, and easy quantification. Subtle voice changes have been observed both preceding and during disease onset [16-19], contributing to the ongoing development of detection systems through voice feature analysis [20,21]. Additionally, animal studies using the 6-hydroxydopamine rat model, which is widely used to investigate motor dysfunctions of PD, have revealed alterations in voice frequency range and loudness [22-24]. Considering the overlapping clinical characteristics observed in both PD and drug-induced parkinsonism of EPS [25-27], individuals with drug-induced parkinsonism may also exhibit discernible alterations in vocal characteristics.
Therefore, the present study aims to examine the possibility of early detection of drug-induced parkinsonism by analyzing changes in voice characteristics following the administration of antipsychotic drugs. We focus on the dosage of antipsychotic drugs, which are known to cause EPS, and further examine how voice characteristics change according to the presence of EPS. Consequently, this study aims to investigate whether these vocal traits may serve as reliable predictors for early EPS detection, eventually reducing the adverse effects of antipsychotic medications.
METHOD
This study was approved by the Institutional Review Board of Seoul National University Bundang Hospital (IRB no. B-1802-451-305). Patients received a full explanation of the study and provided informed consent for participation.
Participants
Subjects were recruited from the outpatient clinic at Seoul National University Bundang Hospital between March 2018 and March 2019. Patients who met the following inclusion criteria were included in the study: 1) patients aged 19–65 years, 2) patients who met the criteria of the International Classification of Diseases, 10th revision, and 3) patients who were either currently receiving or expected to receive antipsychotic drugs. To exclude subjects whose speech features could be affected regardless of antipsychotic drug use, patients who were unable to read Korean or who had severe cognitive impairment, personality disorders, or a disease or injury that may affect language expression were excluded. Further exclusion criteria included uncontrolled physical diseases and a history of alcohol or substance use disorder. Drug-induced parkinsonism was evaluated using the SAS [28], and the total antipsychotic drug dose was converted to olanzapine equivalent dose using the methods described by Inada and Inagaki [29] and Leucht et al. [30].
Study design
Voice recording process
An EPS speech corpus was constructed, consisting of 16 sentences (Supplementary Table 1) and 2-second-long /Ah/ utterances. Participants were asked to read all 16 sentences as naturally as possible and to sustain each /Ah/ utterance for 2 seconds. Sentences 1 to 16 were composed of emotion-neutral Korean vocabulary items derived from the Positive Affect and Negative Affect Schedule, which quantifies the emotional effects of Korean words based on the Multiple Affect Adjective Checklist-Revised [31]. A total of 1,887 voice segments were obtained through 111 recording sessions. The recordings were conducted in an enclosed treatment room using designated recording equipment. Voice data were collected repeatedly whenever a patient’s antipsychotic medication dose changed or when a patient was clinically determined to have EPS.
Voice feature extraction
A Python-based audio feature extraction package, “Surfboard” [32], which has demonstrated significance in PD classification models, was used for the analysis. A total of 13 voice features were extracted: 3 pitch-related features (F0 contour, F0 statistics, Pitch Period Entropy [PPE]), 4 energy-related features (log energy, log energy sliding-window, loudness, Root Mean Squares [RMS]), 4 frequency-related features (formants, Mel-Frequency Cepstral Coefficients [MFCC], Jitters, Shimmers), Harmonics to Noise Ratio (HNR), and Detrended Fluctuation Analysis (DFA). The fundamental frequency contour (F0 contour) represents changes in the approximate frequency (F0) of quasi-periodic structures in voice speech signals over time, with F0 statistics representing statistical measures derived from F0 [33]. PPE quantifies the uncertainty associated with the pitch period [34], and formant frequencies refer to the prominent peaks found in the power spectral envelope of a sound signal [35]. Jitters measure the relative period-to-period variability of the pitch period, whereas Shimmers measure the relative period-to-period variability of the peak-to-peak amplitude. HNR, influenced by Jitters and Shimmers, indicates the mean ratio of harmonics to non-harmonics [33]. DFA is a signal processing method for analyzing long-range fluctuations in a time series [36].
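The definitions of Jitters and Shimmers given above can be illustrated with a minimal sketch. It assumes one common "local" formulation (mean absolute difference between consecutive cycles, divided by the mean), and the function names and input values are hypothetical. Surfboard computes several variants, and its exact implementation may differ from this sketch.

```python
from statistics import mean


def relative_variability(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    """Relative period-to-period variability of the pitch period (periods in seconds)."""
    return relative_variability(periods_s)


def local_shimmer(peak_to_peak_amplitudes):
    """Relative period-to-period variability of the peak-to-peak amplitude."""
    return relative_variability(peak_to_peak_amplitudes)


# Hypothetical measurements for five consecutive glottal cycles (not study data).
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

print(f"jitter  = {local_jitter(periods):.4f}")
print(f"shimmer = {local_shimmer(amplitudes):.4f}")
```

A perfectly periodic voice with constant amplitude would give zero for both measures; larger values indicate less stable phonation.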
Regarding MFCC, a total of 13 coefficients were extracted. MFCC is a widely used feature in audio and speech signal processing and recognition. In speech processing, the first 13 MFCC coefficients are commonly used as they capture the overall shape of the sound and play a crucial role in automatic speech recognition. MFCC1 is a constant feature that provides overall energy in the speech signal, and MFCC2 is understood as a comparison between low and high-energy regions in the frequency spectrum. Higher-order MFCC coefficients capture faster variations across the frequency spectrum [37]. Sub-parameters for each of the 13 voice features were selected based on methods described by Lenain et al. [32]. Feature vectors for each frame were extracted with a specific window and hop size for all voice features, and 405-dimensional feature vectors were extracted in total.
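To make the MFCC pipeline concrete, the following is a minimal NumPy sketch of a textbook computation: framing, power spectrum, mel filterbank, logarithm, and a discrete cosine transform. It is not the Surfboard implementation. The 25 ms window, 10 ms hop, 26 mel filters, 16 kHz sampling rate, and the synthetic test tone are all assumptions chosen for illustration, since the text does not report the window and hop size used.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (centre - left)
        falling = (right - freqs) / (right - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc(signal, sample_rate, n_mfcc=13, n_filters=26, frame_len=400, hop=160):
    """Return MFCCs with shape (n_frames, n_mfcc); column 0 tracks overall log energy."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_filters, frame_len, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)  # small constant avoids log(0)
    # Unnormalised DCT-II across the filter axis, keeping the first n_mfcc terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T


# Synthetic 2-second tone with light noise as a stand-in for an /Ah/ recording.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(2 * sr) / sr
signal = np.sin(2 * np.pi * 220.0 * t) + 0.01 * rng.standard_normal(t.size)

coeffs = mfcc(signal, sr)
# Per-coefficient summary statistics across frames.
summary = {"mean": coeffs.mean(axis=0), "std": coeffs.std(axis=0)}
print(coeffs.shape)
```

In this sketch the first column corresponds to what the text calls MFCC1: it is the sum of the log mel energies, so it rises and falls with the overall energy of the frame, while later columns weight the mel bands with increasingly rapid cosine oscillations.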
Statistical analysis
The overall study design is illustrated in Figure 1. In the present study, patient record data were categorized into two groups using the SAS score calculation method described by Hawley et al. [28]. The groups were defined by SAS total scores below 0.6 and above 0.6. Notably, a score of 0.6 is indicative of a borderline state that requires consideration for potential treatment change. Each of the 13 voice features was compared between these groups using an independent t-test, and voice features that showed statistically significant differences were selected as “Target Voice Features.” A linear mixed effects model for repeated measures was employed to test the association between total olanzapine equivalent dose and “Target Voice Features.” For “Target Voice Features” that demonstrated a statistically significant association with the total olanzapine equivalent dose, patient record data were divided into two groups according to clinically determined EPS status, and a paired t-test was conducted to examine the differences based on the presence and absence of EPS. The Shapiro–Wilk and Levene tests were employed to assess the normality of the data and the homogeneity of variance, respectively. Outliers were identified and removed from the dataset before statistical analysis to enhance the reliability of the findings and to address voice fluctuation during sentence pronunciation. Outliers were defined as data points that fell more than 1.5 times the interquartile range below the first quartile or above the third quartile. Multiple comparisons were addressed with varying sample sizes, encompassing a larger set of 1,776 (111×16 sentences) and a smaller set of 111 /Ah/ utterances. While applying the Benjamini-Hochberg method for the larger sample ensured robust control over type I errors, explicit corrections for the smaller sample were not conducted, considering the impact on statistical power. Statistical analysis was performed based on the 111 session data points.
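The outlier rule and the Benjamini-Hochberg correction described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the analysis code used in the study, which was run in MATLAB and R. The quartile interpolation method (`"inclusive"`) is an assumption, because statistical packages differ in how they compute quartiles, and all input values are hypothetical.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=1.5):
    """Drop points more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical feature values with one extreme point.
values = [1.0, 1.2, 0.9, 1.1, 1.05, 5.0]
print(remove_iqr_outliers(values))

# Hypothetical raw p-values from several feature comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw_p)])
```

A feature would then be treated as significant when its adjusted p-value falls below the chosen false discovery rate, for example 0.05.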
All analyses were conducted using MATLAB ver.9.12.0 (https://kr.mathworks.com/products/matlab.html) and R ver.4.2.2 (https://www.r-project.org).
RESULTS
Demographic and clinical features of participants
A total of 42 patients were recruited, comprising 25 females and 17 males. The participants were diagnosed with psychotic disorder (n=15), schizophrenia (n=21), schizophreniform disorder (n=1), schizoaffective disorder (n=1), bipolar disorder (n=3), and depressive disorder (n=1). The mean age (±standard deviation) of the patients was 32.43 (±11.38) years. The mean duration of illness was 5.93 (±7.85) years, and the mean SAS total score was 0.69 (±1.16). The mean total olanzapine equivalent dose was 11.10 (±7.29) mg. Demographic characteristics of the participants are provided in Table 1 [29,30,38].
Target Voice Features showing a significant association with the total olanzapine equivalent dose
Voice parameters that exhibited significant differences between groups based on SAS total scores below and above 0.6 were classified as “Target Voice Features,” with different outcomes observed in sentences 1 to 16 and /Ah/ utterances (Table 2, Supplementary Tables 2 and 3). Among the 13 voice parameters, 12 (F0 contour, F0 statistics, log energy, log energy sliding-window, formants, loudness, RMS, MFCC, Jitters, Shimmers, HNR, and DFA) showed significant differences between groups in sentences 1 to 16. In /Ah/ utterances, three features (F0 contour, MFCC, and Shimmers) were found to be statistically different.
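The grouping step behind the “Target Voice Features” can be sketched as follows. The sketch assumes that a score of exactly 0.6 is placed in the upper group, which the text does not specify, and it uses the pooled-variance (Student's) form of the independent t statistic. The records are hypothetical, and only the statistic and degrees of freedom are computed. A p-value would normally come from a statistics library such as `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance

SAS_THRESHOLD = 0.6  # borderline SAS total score used to define the groups


def split_by_sas(records, threshold=SAS_THRESHOLD):
    """Split (sas_total_score, feature_value) pairs into low and high SAS groups."""
    low = [x for score, x in records if score < threshold]
    high = [x for score, x in records if score >= threshold]  # assumption: 0.6 counts as high
    return low, high


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical (SAS total score, voice feature value) records, not study data.
records = [(0.0, 1.0), (0.2, 2.0), (0.5, 3.0), (0.6, 4.0), (1.1, 5.0), (2.0, 6.0)]
low, high = split_by_sas(records)
t_stat, dof = student_t(low, high)
print(f"t = {t_stat:.3f}, df = {dof}")
```

With the low group passed first, a negative t statistic means the feature is larger on average in the high SAS group.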
Among the voice parameters classified as “Target Voice Features,” the features that revealed significant associations with the total olanzapine equivalent dose differed between the sentences and the /Ah/ utterances (Table 2, Supplementary Tables 4 and 5). In sentences 1 to 16, a total of 8 voice features (F0 contour, F0 statistics, log energy, log energy sliding-window, loudness, RMS, MFCC, and Jitters) showed statistical significance, while in /Ah/ utterances, only 1 voice feature (MFCC) demonstrated such significance. As a result, MFCC was the only voice feature showing a statistically significant association with the total olanzapine equivalent dose in both datasets. Given that both words and interjections (e.g., “ah”) are integral phonetic components of human speech, the subsequent analyses were conducted using MFCC, which turned out to be the only shared vocal trait present in both datasets.
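One plausible form of the linear mixed effects model used for these repeated measures is a random-intercept model. The specification below is an assumption for illustration only, since the random-effects structure and any covariates are not reported in the text.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{dose}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is a Target Voice Feature for patient $i$ at recording session $j$, $\mathrm{dose}_{ij}$ is the total olanzapine equivalent dose at that session, and $u_i$ is a patient-specific intercept that accounts for repeated recordings from the same person. Under this form, the association with dose is tested through $\beta_1$.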
Changes in MFCC coefficients according to the presence and absence of EPS
Changes in mean values of MFCC coefficients between groups
Differences in mean values of MFCC coefficients were observed according to the absence and presence of EPS in sentences 1 to 16. However, no differences were observed in the mean values of MFCC coefficients of /Ah/ utterances. Among the 13 MFCC coefficients from the 16 sentences that exhibited significant associations with the total olanzapine equivalent dose, MFCC1, MFCC8, and MFCC12 showed statistical significance between the two groups (Figure 2). Specifically, MFCC1 (t=-4.47, p<0.001) and MFCC8 (t=-4.49, p<0.001) were higher in patients with EPS compared to those without EPS, while MFCC12 (t=-2.21, p=0.029) was lower. MFCC2 (t=0.48, p=0.63), MFCC3 (t=0.03, p=0.98), MFCC6 (t=-0.11, p=0.91), and MFCC7 (t=0.94, p=0.35) did not show statistically significant differences. All the other features, including MFCC4 (t=-3.19, p=0.004), MFCC5 (t=2.57, p=0.01), MFCC9 (t=-3.20, p=0.004), MFCC10 (t=-4.69, p<0.001), and MFCC11 (t=-3.30, p=0.004), demonstrated significant differences between patients with EPS and those without EPS. However, they did not show a notable association with total olanzapine equivalent dose.
Changes in statistical features other than the mean values of MFCC coefficients between groups
MFCC1, MFCC3, MFCC5, MFCC7, MFCC8, MFCC9, MFCC11, and MFCC12 showed statistically significant differences based on the presence of EPS in sentences 1 to 16, but these differences were not observed in /Ah/ utterances. Regarding the remaining statistical parameters, aside from the mean values of each coefficient, which were previously presented, the variables that exhibited significant results varied depending on the coefficient type (Figure 3 and Supplementary Table 6). All statistical features of MFCC1, MFCC3, MFCC7, MFCC8, MFCC9, MFCC11, and MFCC12 were higher in patients with EPS compared to those without EPS. In contrast, the statistical features of MFCC5 tended to be lower in patients with EPS.
DISCUSSION
To the best of our knowledge, this is the first study to elucidate the relationships among antipsychotic drug dosage, voice characteristics, and EPS, which are well-known side effects of antipsychotic drug use. Our study has revealed significant associations between several voice features and antipsychotic drug doses. Also, among the 13 voice features, MFCC emerged as the most distinctive feature, showing significant group differences depending on the presence and absence of EPS. While Sinha et al. [16] explored the relationship between voice factors and the effects of antipsychotic medications on speech production, the study primarily focused on comparing voice performance among groups of patients taking specific antipsychotics, such as risperidone. In contrast, our study takes a broader approach, examining the effects of overall antipsychotic medication on voice changes in relation to EPS by applying the equivalent dose of olanzapine. This approach provides a comprehensive understanding of the impact of antipsychotic dosage on vocal characteristics associated with EPS. Taken together, our results support that certain voice traits, especially MFCC, may serve as a useful biomarker in detecting drug-induced parkinsonism of EPS.
Significant group differences were observed in several voice parameters when patients were divided into groups based on their total SAS scores. Notably, among the 13 voice features, F0 contour, MFCC, and Shimmer exhibited significant distinctions between the two groups across both datasets of 16 sentences and /Ah/ utterances. Prior investigations have consistently reported significant differences in fundamental frequency (F0) between individuals with PD and the control group [39,40]. In our study, F0 contour, which reflects the temporal pattern of F0 changes and provides prosodic information in speech, displayed significant group differences in line with these previous findings. Furthermore, our results align with existing research demonstrating significant variations in MFCC feature values associated with the presence of PD. Studies exploring the detection of PD using voice analysis reported differences in Shimmer values during sustained phonation of vowels in PD patients [41,42], which was also consistent with our findings.
The observed similarities in vocal feature changes between drug-induced parkinsonism categorized by SAS score and PD support the understanding that drug-induced parkinsonism shares clinical characteristics with PD. PD is characterized by the pathological hallmark of dopamine-producing cell loss in the substantia nigra, leading to dysfunction within the basal ganglia-thalamocortical circuitry. This disruption may interfere with the precise regulation of motor movements, including vocalization [43,44]. As a result, irregular muscle activation patterns during speech production may occur. Our study results suggest that the mechanism underlying vocal changes observed in patients with drug-induced parkinsonism may be similar to that observed in PD.
Our study also revealed that higher doses of antipsychotic drugs were associated with noticeable alterations in specific voice features. This finding is consistent with the study conducted by Skodda et al. [45], which reported speech rate abnormalities in patients with Huntington’s disease (HD) receiving antidopaminergic medication. Patients exhibited more severe motor deficits and speech impairments compared to the control group. Similarly, Rusz et al. [34,46] identified noteworthy correlations between the dose of antipsychotic medication, articulation, and the Unified Huntington’s Disease Rating Scale score, which assesses the severity of HD, including motor function. Higher doses of antipsychotic medication are often prescribed to HD patients with more severe chorea, which can exacerbate bradykinesia and rigidity, ultimately leading to increased motor disability. The observed correlation between antipsychotic medication dose and vocal feature changes suggests that the blockade of dopamine receptors in the striatum by antipsychotics may lead to reduced voluntary movements [5,47]. These findings highlight the potential impact of antipsychotic drugs on patients’ voices, underscoring the importance of addressing potential adverse effects on vocal function during treatment.
Distinct statistical differences in MFCC values were observed based on the presence and absence of EPS, particularly in sentences 1 to 16, suggesting its potential as a valuable indicator for detecting drug-induced parkinsonism. Due to its intrinsic ability to reflect vocal tract changes [48,49], MFCC has been recognized as a potential biomarker in numerous mental and neurological disorders characterized by vocal pattern alterations [50,51]. Prior research by Naranjo et al. [52] highlighted the relevance of MFCC3 in reflecting pathological speech patterns associated with PD. Similarly, a study conducted by Taguchi et al. [51] revealed higher values of MFCC2 in patients with depression compared to the control group. Considering that motor function abnormalities are commonly observed in PD, and factors such as psychomotor reduction, fatigue, and reduced energy level in depression may impact motor functions, alterations in MFCC values in these patients may serve as biomarkers reflecting disease characteristics.
Speech difficulties can result from muscle contractions leading to disruptions in voice [53]. Although lower dimensions of MFCC remain unaffected by vocal cord activity, they are known to capture alterations in the vocal tract. The vocal tract, which includes the pharynx, larynx, nasal cavity, and oral cavity from vocal cords to lips, plays a crucial role in determining voice quality [54]. Factors such as teeth, tongue movement, and muscle tone collectively impact vocal tone [55]. Thus, the changes observed in MFCC values may imply motor alterations in the vocal tract as a consequence of increased antipsychotic medication.
While interpreting the findings of the present study, several limitations should be considered. The sample size of this study was relatively small, consisting of 42 patients. To address this constraint, we aggregated data from 16 individually analyzed sentences into a single dataset during the analysis process. Nevertheless, further research with a larger sample size is needed to achieve a more comprehensive understanding of the linguistic characteristics associated with vocal change. Furthermore, the two analyzed datasets exhibited unbalanced sample sizes, with dataset 1 comprising 1,776 samples (111 data points with 16 sentences each) and dataset 2 consisting of 111 data points (/Ah/ utterances). Although we applied the multiple comparisons correction in the dataset with a larger sample size, it is important to note that the inherent limitations of the unbalanced sample sizes across both datasets might impact the robustness of the methods and results.
In addition, the heterogeneity of antipsychotic medications among patients may have influenced the accuracy of our findings. Patients in our study were administered various types of antipsychotic medications. Although we used the olanzapine equivalent dose to mitigate variability, complete control over the potential impact of different drug types was not achieved. Moreover, patients were also prescribed benzodiazepine medications such as clonazepam and lorazepam, commonly used for managing motor-related symptoms. This complicates the attribution of effects solely to antipsychotic medications, as benzodiazepines were administered concurrently to alleviate EPS. Consequently, there is a possibility that any observed voice abnormalities in patients could have been mitigated by benzodiazepines intended to address antipsychotic side effects. However, it is worth noting that most patients were prescribed atypical antipsychotic medications. This suggests that the development of EPS may be primarily influenced by medication dosage rather than specific drug types (e.g., typical antipsychotics, atypical antipsychotics), with the overall impact of medication variability expected to be relatively small.
Despite limitations such as small sample size and medication heterogeneity, our primary aim of examining the early detection of drug-induced parkinsonism of EPS through voice analysis following antipsychotic drug administration remains crucial. Our study not only focused on the associations between vocal features and antipsychotic medication, but also demonstrated that specific voice features revealing significant relationships with dosage exhibit notable differences based on the presence of EPS. This comprehensive approach has provided insights into the connections between antipsychotic dosage, vocal features, and EPS. These results suggest that voice features may offer a reliable indicator for detecting drug-induced parkinsonism of EPS. The convenience and efficiency of requiring simple voice measurements during the treatment process present an opportunity for better detection, potentially addressing the risk of overlooking mild symptoms associated with assessments relying on subjective observations. Further research in larger, homogeneous groups should validate vocal traits as reliable biomarkers for EPS, contributing to the mitigation of adverse effects associated with antipsychotic medications.
In conclusion, the present study revealed several vocal parameters demonstrating significant associations with antipsychotic medication dosage. Among these vocal traits, MFCC displayed notable differences depending on the presence or absence of EPS, suggesting that MFCC may serve as a potential biomarker for early detection and prevention of drug-induced parkinsonism in EPS.
Supplementary Materials
The Supplement is available with this article at https://doi.org/10.30773/pi.2023.0417.
Notes
Availability of Data and Material
The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.
Conflicts of Interest
Euitae Kim, a contributing editor of the Psychiatry Investigation, was not involved in the editorial evaluation or decision to publish this article. All remaining authors have declared no conflicts of interest.
Author Contributions
Conceptualization: Seoyoung Kim, Kyogu Lee, Euitae Kim. Data curation: Hyeyoon Kim, Kyogu Lee. Formal analysis: Hyeyoon Kim. Funding acquisition: Euitae Kim, Kyogu Lee. Investigation: all authors. Methodology: Hyeyoon Kim. Project administration: Euitae Kim, Kyogu Lee. Resources: Seoyoung Kim, Euitae Kim. Software: Hyeyoon Kim. Supervision: Euitae Kim. Validation: Hyeyoon Kim, Subin Lee. Visualization: Hyeyoon Kim. Writing—original draft: Hyeyoon Kim. Writing—review & editing: Hyeyoon Kim, Seoyoung Kim, Subin Lee, Euitae Kim.
Funding Statement
This study was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (No. NRF-2019M3C7A1032472, NRF-2022R1A2B5B02002400).
Acknowledgements
None