Enhancing Electroencephalogram-Based Prediction of Posttraumatic Stress Disorder Treatment Response Using Data Augmentation
Abstract
Objective
This study aimed to improve the prediction of treatment response in patients with posttraumatic stress disorder (PTSD) by applying a variational autoencoder (VAE)-based data augmentation (DA) approach to electroencephalogram (EEG) data.
Methods
EEG spectrograms were collected from patients diagnosed with PTSD. A VAE model was pretrained on the original spectrograms and used to generate augmented data samples. These augmented spectrograms were then utilized to train a deep neural network (DNN) classifier. The performance of the model was evaluated by comparing the area under the receiver operating characteristic curve (AUC) between models trained with and without DA.
Results
The DNN trained with VAE-augmented EEG data achieved an AUC of 0.85 in predicting treatment response, which was 0.11 higher than the model trained without augmentation. This gain indicates improved classification performance and better generalization to unseen patients.
Conclusion
VAE-based DA effectively addresses the challenge of limited EEG data in clinical settings and enhances the performance of DNN models for treatment response prediction in PTSD. This approach presents a promising direction for future EEG-based neuropsychiatric research involving small datasets.
INTRODUCTION
Electroencephalography (EEG) is a non-invasive electrophysiological method that measures brain activity via electrodes attached to the scalp. It is widely used to detect abnormal electrical signals when diagnosing neurological conditions such as epilepsy and sleep disorders and when confirming brain death [1-4]. In neuropsychiatry, EEG also serves as a valuable biomarker to detect and classify brain dysfunctions in conditions such as posttraumatic stress disorder (PTSD) [5], schizophrenia [6], major depressive disorder [7], and Alzheimer’s disease [8].
While EEG holds great promise for identifying neuropsychiatric disorders, predicting treatment response remains a significant clinical challenge. Current treatments often yield highly individualized outcomes [9], and patients may undergo prolonged interventions with uncertain effectiveness. If clinicians could predict treatment responsiveness in advance, it would enable the development of personalized therapeutic strategies and reduce unnecessary clinical burden.
Machine learning (ML) techniques have been employed in predictive modeling for treatment outcomes in psychiatric disorders [10]. With recent advances in artificial intelligence, deep neural networks (DNNs) have emerged as powerful tools for EEG analysis [11]. However, DNNs require large, well-distributed datasets for effective training, which is difficult to achieve in clinical settings due to patient recruitment challenges and cost-intensive data collection processes. As a result, models trained on scarce clinical EEG data are prone to overfitting and weak generalization performance.
To overcome these limitations, data augmentation (DA) techniques have been introduced in EEG-based deep learning studies to increase data diversity and model robustness [12]. Conventional DA methods, including geometric and photometric transformations, have been commonly used. More recently, generative approaches using models such as generative adversarial networks (GANs) and autoencoders have demonstrated improved generalization in EEG classification tasks [13-17]. CNN-based models such as EEGNet [18], Deep ConvNet, and hybrid CNN-LSTM architectures [19-23] have also shown high classification accuracy for tasks like epilepsy detection and user authentication. However, despite these advances, variational autoencoder (VAE)-based augmentation has been underutilized in EEG classification, particularly for spectrogram representations and treatment response prediction.
In this study, we propose a novel EEG DA approach using a VAE [24] to address the issue of limited data in predicting treatment response for PTSD. Our method involves generating synthetic EEG spectrograms via a pretrained VAE, which are then used to train a DNN classifier. By improving generalization performance and reducing overfitting, this approach aims to enhance the accuracy and clinical utility of EEG-based prediction models. We also evaluate the robustness of our framework through subject-wise cross-validation to demonstrate its applicability in real-world neuropsychiatric settings.
METHODS
EEG data acquisition and preprocessing
Resting-state EEG recordings were obtained from 48 patients diagnosed with PTSD, both before and after transcranial direct current stimulation (tDCS) treatment. The recordings were collected using a 62-channel EEG system. Electrooculogram (HEO/VEO) and electrocardiogram (EKG) channels were excluded from the analysis. Preprocessing was conducted using EEGLAB [25] and MATLAB R2020a (MathWorks). The raw EEG signals were referenced to the average reference and bandpass filtered from 1 to 50 Hz using a Butterworth filter. Independent component analysis was performed to identify and remove artifacts. Subsequently, noisy segments were visually inspected and rejected. For each subject, a clean 150-second EEG segment was retained for further analysis.
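The re-referencing and filtering steps described above can be sketched in Python with SciPy. This is an illustrative sketch only, not the EEGLAB/MATLAB pipeline used in the study: the sampling rate (500 Hz), the filter order (4), the zero-phase filtering, and the function name `preprocess` are assumptions that the text does not specify, and the ICA and visual rejection steps are omitted.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess(raw, fs, low_hz=1.0, high_hz=50.0, order=4):
    """raw: array of shape (channels, samples). Returns re-referenced, band-passed data."""
    # Average reference: subtract the instantaneous mean across channels.
    referenced = raw - raw.mean(axis=0, keepdims=True)
    # Butterworth band-pass (assumed order), applied forward and backward for zero phase.
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, referenced, axis=-1)

# Toy input: 4 channels, 10 s at an assumed 500 Hz, a 10 Hz rhythm plus 100 Hz interference.
fs = 500
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
gains = rng.uniform(0.5, 1.5, size=(4, 1))
raw = (np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 100 * t)) * gains
clean = preprocess(raw, fs)
```

After this step the 10 Hz component passes largely unchanged, while the 100 Hz component, which lies outside the 1 to 50 Hz passband, is strongly attenuated.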
Definition of treatment response
Subjects were classified into responders and non-responders based on symptom changes measured by the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5) [17]. Classification was based on two criteria: the total symptom severity score and the total number of PTSD-related symptoms. Patients demonstrating a 50% or greater reduction in both metrics post-treatment were defined as responders. This threshold is commonly used in clinical PTSD research to reflect clinically meaningful improvement [26]. As a result, 17 patients were classified as responders and 31 as non-responders.
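The responder rule can be written as a short function. The sketch below is illustrative: the function name `is_responder`, the patient identifiers, and the scores are hypothetical, and treating a baseline of zero as "no reduction" is an assumption made only to keep the example well defined.

```python
def is_responder(pre_severity, post_severity, pre_count, post_count, threshold=0.5):
    """Responder if both CAPS-5 metrics fall by at least `threshold` (fractional reduction)."""
    def reduction(pre, post):
        # Fractional reduction relative to baseline; a zero baseline counts as no reduction.
        return (pre - post) / pre if pre > 0 else 0.0
    return (reduction(pre_severity, post_severity) >= threshold
            and reduction(pre_count, post_count) >= threshold)

# Hypothetical patients: (pre severity, post severity, pre symptom count, post symptom count)
patients = {
    "p01": (40, 18, 14, 6),   # 55% and 57% reductions: responder
    "p02": (40, 18, 14, 9),   # 55% and 36% reductions: non-responder
    "p03": (36, 30, 12, 11),  # 17% and 8% reductions: non-responder
}
labels = {pid: is_responder(*scores) for pid, scores in patients.items()}
```

Requiring both metrics to cross the threshold makes the definition conservative, which is consistent with the smaller responder group reported above.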
EEG spectrogram construction
EEG signals were segmented into 1-second epochs using a 50% overlap. Each segment was windowed with a Hamming window, and power spectra were computed using fast Fourier transform. The resulting spectrograms were stored as 3D arrays with dimensions [subject (48)×frequency (30)×time (299)].
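A minimal sketch of this construction for a single channel follows. It assumes a 250 Hz sampling rate and that the 30 frequency bins correspond to 1 to 30 Hz at 1 Hz resolution; neither is stated in the text. With a 150-second recording, 1-second windows, and 50% overlap, the number of epochs is (150 - 1) / 0.5 + 1 = 299, which matches the time dimension reported above.

```python
import numpy as np

def spectrogram(signal, fs, win_sec=1.0, overlap=0.5, low_hz=1, high_hz=30):
    """Return a power spectrogram of shape (frequency, time) for one channel."""
    win = int(round(win_sec * fs))             # samples per epoch
    hop = int(round(win * (1.0 - overlap)))    # step between epoch starts
    n_epochs = (len(signal) - win) // hop + 1
    window = np.hamming(win)
    frames = np.stack([signal[i * hop : i * hop + win] * window for i in range(n_epochs)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (time, frequency bins)
    df = fs / win                                      # frequency resolution in Hz
    lo, hi = int(round(low_hz / df)), int(round(high_hz / df))
    return power[:, lo : hi + 1].T                     # assumed 1-30 Hz band

fs = 250  # assumed sampling rate
signal = np.random.default_rng(0).standard_normal(150 * fs)  # stand-in for a clean 150 s segment
spec = spectrogram(signal, fs)
print(spec.shape)  # (30, 299)
```

Stacking one such array per subject gives the [48×30×299] layout described above.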
The structure of the extracted spectrogram dataset is illustrated in Figure 1.
Figure 1. Spectrogram dataset represented as a three-dimensional array extracted from pre-treatment electroencephalogram data.
Based on prior findings [27], only the CZ and O1 channels were used in this study due to their superior discriminative power for treatment response prediction. CZ, located at the vertex, reflects global brain activity, while O1 over the occipital lobe is sensitive to visual and sensory processing.
Spectrograms derived from pre-treatment EEG were used to predict treatment outcomes. The final model input consisted of concatenated spectrograms from CZ and O1, providing both cognitive and affective signal features relevant to PTSD.
DA using VAE
Given the limited sample size, a VAE-based DA strategy was applied to enhance generalizability and mitigate overfitting. VAEs are generative models capable of learning latent feature representations and generating new samples from the same distribution as the original data [24].
In this study, the pretrained VAE was used to synthesize additional EEG spectrograms for each subject. The VAE encoder maps each input spectrogram to the parameters (mean and variance) of a Gaussian distribution over a latent vector z; a value of z is sampled from this distribution and passed to the decoder, which reconstructs a spectrogram. The augmented samples were used to expand the training set, effectively doubling the original data size.
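The sampling step and training objective of a standard VAE [24] can be sketched with NumPy. This is a generic illustration rather than the study's implementation: the encoder and decoder networks are omitted, the squared-error reconstruction term is an assumption, and the function names are hypothetical. The latent size of two follows the configuration reported in the Results.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

def vae_loss(x, x_recon, mu, log_var):
    """Per-sample negative ELBO with an assumed squared-error reconstruction term."""
    recon = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    return recon + kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
mu = np.zeros((4, 2))        # encoder means for 4 spectrograms, 2 latent nodes
log_var = np.zeros((4, 2))   # encoder log-variances (sigma = 1 here)
z = reparameterize(mu, log_var, rng)

# To generate new samples, z would be passed through the trained decoder (not shown).
x = rng.random((4, 30, 299))
loss = vae_loss(x, x, mu, log_var)  # perfect reconstruction and a standard-normal posterior
```

Writing z as a deterministic function of the mean, the variance, and independent noise is what allows gradients to flow through the sampling step during training.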
The overall structure of the dataset and augmentation process is summarized in Table 1.
Classification framework and evaluation
The augmented spectrogram data, along with the original samples, were used to train a DNN based on EEGNet. Hyperparameters such as learning rate, batch size, and number of training epochs were optimized through grid search using a validation split from the training set in each fold. This tuning aimed to balance model complexity and generalization performance. To ensure robust evaluation, subject-wise cross-validation was used. In each fold, all data from a single subject were withheld for testing, preventing data leakage and enhancing generalizability to unseen individuals. This approach mimics real-world clinical scenarios where models are applied to new patients and ensures independence between training and testing sets.
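The subject-wise split can be sketched as a leave-one-subject-out generator. The function name, the subject identifiers, and the two-samples-per-subject layout are hypothetical and serve only to show that no subject appears in both the training and the test indices of a fold.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each unique subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# Hypothetical layout: two samples per subject.
subject_ids = ["s01", "s01", "s02", "s02", "s03", "s03"]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train, test in folds:
    # Any augmentation or hyperparameter tuning would use `train` only.
    print(held_out, train, test)
```

Splitting by subject rather than by sample is what prevents samples from the same individual from leaking across the train/test boundary.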
The overall classification framework is illustrated in Figure 2.
Ethical approval
All participants provided written informed consent. The study protocol was approved by the Institutional Review Board of Inje University, Ilsan Paik Hospital (IRB No. 2015-07-025), and was conducted in accordance with the principles of the Declaration of Helsinki.
RESULTS
Performance comparison with other DA methods
To evaluate the effect of DA on EEG spectrogram classification, we applied a VAE-based DA method and compared it with other commonly used approaches, including noise injection (NI) and time-segmentation. DA was applied only to the training dataset, while the raw, unaugmented data were used for testing. Following augmentation, the size of the training set was doubled, resulting in a total of 78 samples.
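As an illustration of the NI comparison method and of restricting augmentation to the training data, consider the sketch below. The noise level (5% of each sample's standard deviation), the function names, and the toy array sizes are assumptions; the text does not report the NI settings used.

```python
import numpy as np

def noise_injection(spectrograms, rng, relative_std=0.05):
    """Return noisy copies: Gaussian noise scaled to each sample's own standard deviation."""
    scale = relative_std * spectrograms.std(axis=(1, 2), keepdims=True)
    return spectrograms + rng.standard_normal(spectrograms.shape) * scale

def augment_training_set(x_train, y_train, rng):
    """Double the training set; test data are never passed through this function."""
    x_aug = noise_injection(x_train, rng)
    return np.concatenate([x_train, x_aug]), np.concatenate([y_train, y_train])

rng = np.random.default_rng(0)
x_train = rng.random((6, 30, 299))      # toy training spectrograms
y_train = np.array([0, 1, 0, 0, 1, 0])  # toy labels (1 = responder)
x_big, y_big = augment_training_set(x_train, y_train, rng)
print(x_big.shape, y_big.shape)  # (12, 30, 299) (12,)
```

Each augmented sample inherits the label of the original it was derived from, so the class ratio of the training set is unchanged.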
The VAE model was configured with a 2D CNN-based encoder and decoder. Training was performed for up to 500 epochs with a batch size of 32, kernel size of 5, filter size of 16, two latent nodes, and a learning rate of 0.001. Hyperparameters were selected based on prior studies [18,24] and empirical tuning, and optimization was conducted using the Adam optimizer.
The classification model trained with the VAE-augmented dataset achieved an area under the receiver operating characteristic curve (AUC) of 0.81±0.16, with sensitivity of 0.48±0.23, specificity of 0.88±0.15, and balanced accuracy of 68.2%±15.1%. This performance was compared against three other augmentation settings, as summarized in Table 2.
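For reference, the reported metrics can be computed as in the following sketch, where balanced accuracy is the mean of sensitivity and specificity and AUC is computed from pairwise score comparisons. The labels and scores are toy values, not study data, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity, balanced accuracy) for binary labels (1 = responder)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec, (sens + spec) / 2.0

def auc(y_true, scores):
    """AUC as the probability that a random responder outscores a random non-responder."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions: 4 responders and 8 non-responders.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec, bal_acc = sensitivity_specificity(y_true, y_pred)  # 0.5, 0.875, 0.6875

toy_auc = auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.1])     # 5 of 6 pairs ranked correctly
```

Because AUC depends only on how scores are ranked, it can be high even when sensitivity at a fixed decision threshold is low.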
Training performance and validation trends for each DA method are illustrated in Figure 3.
Figure 3. EEGNet model performance under different data augmentation (DA) conditions. A: Training accuracy comparison among baseline, time-segmentation, noise injection (NI), and variational autoencoder (VAE) augmentation. B: Training loss across the same augmentation methods. C: Training and validation accuracy for baseline (none) versus VAE-based augmentation. D: Training and validation accuracy for NI versus VAE-based augmentation.
Although the NI-based model demonstrated slightly higher training accuracy and lower training loss, the VAE-based model achieved better validation performance, indicating improved generalization. In contrast, NI-based models showed limited improvement in validation accuracy after early epochs, suggesting possible overfitting.
Performance with N-fold VAE-based augmentation
To determine the optimal augmentation scale, we empirically evaluated model performance using different VAE-based augmentation factors (N). As shown in Figure 4, performance improved with increased augmentation, reaching the best results at 45-fold augmentation: AUC of 0.85, sensitivity of 0.85±0.08, specificity of 0.81±0.17, and balanced accuracy of 64.9%±9.4%.
Figure 4. Classification performance at different levels of VAE-based augmentation (N-fold), showing peak performance at 45-fold augmentation. AUC, area under the receiver operating characteristic curve; VAE, variational autoencoder; DA, data augmentation.
Beyond 45-fold augmentation, performance declined, indicating a threshold beyond which additional augmentation no longer improves and may degrade performance.
To ensure model generalizability, subject-wise cross-validation was employed, where each test fold included data from previously unseen individuals.
DISCUSSION
Treatment response prediction
In comparison to conventional approaches such as clinical interviews or neuroimaging, our EEG-based classification framework offers several advantages for predicting PTSD treatment response. Clinical assessments like the CAPS-5 are subject to interviewer bias and patient self-report limitations, while neuroimaging methods such as fMRI or PET, although informative, are expensive and logistically demanding.
In contrast, EEG is non-invasive, cost-effective, and offers high temporal resolution, making it well-suited for clinical environments.
Our study demonstrated that combining EEG with deep learning and VAE-based DA enables reliable prediction even with limited sample sizes, especially when using subject-wise cross-validation to ensure generalizability. This approach provides a scalable and objective tool that can complement traditional assessments in identifying likely responders at an early stage.
However, individual variability in EEG patterns remains a limitation. Differences in baseline neural activity may affect model robustness. Future work should incorporate longitudinal EEG data (pre-, mid-, and post-treatment) to capture treatment-induced changes and further reduce inter-individual variability. Testing the framework in more diverse populations will also enhance its clinical applicability.
Limitations
Our results suggest that EEG-based classification, enhanced by VAE augmentation, is a viable approach for predicting PTSD treatment response despite a relatively small dataset. The VAE-augmented model showed improved generalization performance, supported by subject-wise cross-validation.
However, some limitations remain. First, the model exhibited higher specificity than sensitivity, raising concerns about false negatives in clinical use. This imbalance suggests that responders may be misclassified, which is problematic in treatment planning. Future studies should explore decision threshold adjustments or architectural modifications to improve sensitivity. Including metrics such as the F1 score may also provide a more balanced performance evaluation. Second, although hyperparameter tuning was performed, details were not explicitly presented. To avoid data leakage, it is important to ensure that tuning is restricted to the training set. Third, while qualitative comparisons were provided, direct benchmarking against alternative generative models—such as GANs, diffusion models, or transformer-based architectures—was not conducted. Future research should include such comparisons to better contextualize model performance. Fourth, although performance peaked at a 45-fold augmentation, the effect of increasing augmentation scale on overfitting was not systematically analyzed. Further studies should examine whether high augmentation levels risk introducing non-physiological artifacts. Finally, the sample was limited to PTSD patients, restricting generalizability. Expanding to more diverse diagnostic groups and increasing the sample size will be essential for broader validation.
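The suggested decision threshold adjustment and F1 reporting can be illustrated with a small sketch. The scores and labels are toy values chosen for illustration, and the function name is hypothetical; in practice the threshold would need to be chosen on training or validation data only.

```python
def recall_and_f1(y_true, scores, threshold):
    """Return (recall, F1) when predicting 'responder' for scores >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return recall, f1

# Toy classifier outputs: 3 responders and 5 non-responders.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.80, 0.45, 0.35, 0.40, 0.30, 0.20, 0.10, 0.05]

default = recall_and_f1(y_true, scores, threshold=0.5)  # recall 1/3, F1 0.50
lowered = recall_and_f1(y_true, scores, threshold=0.3)  # recall 1.0, F1 0.75
```

In this toy case, lowering the threshold recovers the missed responders at the cost of two false positives, and the F1 score makes that trade-off visible in a single number.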
Response threshold selection
A 50% reduction in PTSD symptoms was used to define responders, consistent with clinical standards and prior studies [26,27]. This threshold reflects clinically meaningful improvement and allows comparability with existing literature.
While we did not perform formal sensitivity analysis on the threshold, future work should explore how altering the threshold (e.g., 40% or 60%) impacts classification outcomes and whether subgroup-specific thresholds enhance model personalization.
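Such a sensitivity analysis would begin by relabeling patients at each candidate threshold, as in the sketch below. The fractional reductions are hypothetical values, not study data, and the function name is an assumption.

```python
def count_responders(reductions, threshold):
    """Count patients whose severity and symptom-count reductions both reach the threshold."""
    return sum(1 for severity, count in reductions
               if severity >= threshold and count >= threshold)

# Hypothetical fractional reductions (severity score, symptom count) for five patients.
reductions = [(0.65, 0.62), (0.55, 0.52), (0.45, 0.48), (0.42, 0.30), (0.10, 0.05)]
sweep = {thr: count_responders(reductions, thr) for thr in (0.4, 0.5, 0.6)}
print(sweep)  # {0.4: 3, 0.5: 2, 0.6: 1}
```

Because the class balance shifts with the threshold, the classifier would need to be retrained and re-evaluated under each labeling for the comparison to be meaningful.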
Clinical validity of augmented data
Validating the clinical plausibility of the VAE-generated data is an important next step. We propose two complementary approaches: 1) qualitative expert review by neurologists to assess the physiological realism of generated EEG spectrograms and 2) the use of discriminator-based validation techniques, similar to those used in GAN frameworks [15,16], to detect non-physiological artifacts.
Although our results suggest that VAE-augmented data contributed meaningfully to model performance, no explicit validation was performed to confirm the integrity of the generated signals. Future studies should incorporate expert evaluations and statistical similarity assessments, as well as visual comparisons between real and synthetic spectrograms, to verify the fidelity of the augmentation process.
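One simple form of statistical similarity assessment is to compare the mean power spectra of real and synthetic spectrograms. The sketch below uses Pearson correlation as the similarity measure; this choice, the function names, and the toy data are assumptions, and a high correlation would be only a coarse check, not evidence of physiological realism.

```python
import numpy as np

def mean_spectrum(spectrograms):
    """Average over samples and time: one value per frequency bin."""
    return spectrograms.mean(axis=(0, 2))

def spectrum_similarity(real, synthetic):
    """Pearson correlation between the mean spectra of the real and synthetic sets."""
    return float(np.corrcoef(mean_spectrum(real), mean_spectrum(synthetic))[0, 1])

rng = np.random.default_rng(0)
profile = 1.0 / np.arange(1, 31)   # toy 1/f-like spectrum over 30 bins
real = profile[None, :, None] * rng.uniform(0.8, 1.2, (10, 30, 299))
synthetic = profile[None, :, None] * rng.uniform(0.8, 1.2, (10, 30, 299))
shuffled = synthetic[:, rng.permutation(30), :]  # frequency structure destroyed

matched = spectrum_similarity(real, synthetic)
mismatched = spectrum_similarity(real, shuffled)
```

Synthetic data that share the spectral profile of the real data score close to 1, whereas data with scrambled frequency structure score lower, so the measure can flag gross distortions before expert review.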
In conclusion, deep learning applications in clinical neuropsychiatry are often constrained by limited sample sizes and class imbalances, which can hinder model generalization. To address these challenges, this study proposed a VAE-based DA strategy for EEG spectrogram classification to predict PTSD treatment response. By generating synthetic data intended to preserve the distribution of the original EEG signals, the proposed method improved the model’s robustness against noise and variability. When trained on the VAE-augmented dataset, the classification model achieved an AUC of 0.85 in the delta band, representing an improvement of approximately 0.11 compared to models trained without augmentation. These findings highlight VAE-based augmentation as an effective strategy to enhance deep learning performance in clinical EEG analysis, especially when working with small and imbalanced datasets. This approach may serve as a practical tool for improving early prediction of treatment outcomes in PTSD and potentially other psychiatric disorders.
Notes
Availability of Data and Material
The datasets generated or analyzed during the study are not publicly available due to patient privacy concerns and institutional data protection policies, but are available from the corresponding author on reasonable request.
Conflicts of Interest
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Seung-Hwan Lee. Data curation: Chaeyeon Yang. Methodology: Suh-Yeon Dong. Supervision: Seung-Hwan Lee, Suh-Yeon Dong. Writing—original draft: Sangha Kim, Suh-Yeon Dong. Writing—review & editing: all authors.
Funding Statement
This work was supported by the National Research Foundation of Korea’s Brain Korea 21 FOUR Program at the Sookmyung Women’s University and by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) program (IITP-2025-RS-2022-00156299) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation).
Acknowledgments
None
