### INTRODUCTION

^{1,2,3} signal processing,^{4,5,6,7,8} image processing,^{9,10,11} text categorization,^{12} data mining,^{13} pattern recognition^{14,15,16,17,18} and medical diagnosis.^{19,20} The aim of feature selection is to choose a subset of the available features by eliminating less important or unnecessary ones. To extract as much information as possible from a given set while using fewer features, features with little or no predictive information are eliminated, and strongly correlated, redundant features are ignored.^{21} Thus, a valuable subset can save a large amount of computation time. The selected subset of features used to represent a classification function influences several aspects of classification, including the time required to learn the classification function, the accuracy of the learned classifier, the time-space cost associated with the features, and the number of samples required for training.

Major depressive disorder (MDD) is considered to be a chronic, relapsing and remitting illness, and early medical diagnosis is important for the subsequent treatment process. Many patients (30-50%) fail to respond to initial antidepressant treatment.^{22} There is therefore a clear need for methods that select the right treatment for the right patient.^{23} Repetitive transcranial magnetic stimulation (rTMS) has been proposed as an alternative^{24} because it is less invasive and less painful than electrical brain stimulation.^{25} In the light of the "Personalized Medicine" perspective on depression, both ACO and neuroimaging biomarkers have recently been studied and show promising results in aiding treatment prediction using pre-treatment measures.^{26,27,28,29}

^{30,31} and functional neuroimaging biomarkers^{32,33} which focused on the predictive effect of the change in frontal quantitative EEG (QEEG) cordance in the theta and delta frequency bands. In one study,^{34} EEG data were analyzed to compare normal subjects with subjects suffering from various mental disorders. It was found that a change in delta or theta band EEG power can be evaluated as a specific marker of brain dysfunction.^{35} A considerable number of applications underline that the effects of antidepressant (AD) medication are physiologically detectable in the EEG, and QEEG cordance is one of the promising biomarkers used to predict treatment response, which has generated research interest. In addition to their valuable contribution as a biomarker, EEG patterns have been classified with a feature subset optimized using ACO to minimize the number of features while maximizing classification performance.^{36} ACO was used as a feature selection method to classify hand motion surface electromyography signals in another study.^{27} Another feature selection application using ACO was applied to images from the Mammographic Image Analysis Society database.^{28} The ACO method was also tested on one of the most important biosignal-driven applications, the brain-computer interface (BCI) problem, with 56 EEG channels.^{29} In a pilot study, the algorithm was first used to select genes relevant to cancers; multilayer perceptron neural network and support vector machine classifiers were then used for cancer classification.^{37}

The main goal for clinical research in MDD is to predict the response of MDD subjects to rTMS therapy using their pre-treatment QEEG cordance and to enhance diagnostic accuracy. Both are crucially important for proper medical treatment and for slowing the progress of the illness. An artificial neural network (ANN) based model combined with an optimization algorithm was designed as a tool to reduce the number of features while increasing prediction accuracy.

### METHODS

### Participants

### EEG recordings

^{38} A subsequent study corroborated the method by comparing EEG cordance with simultaneously recorded PET scans reflecting perfusion.^{39}

### rTMS session procedures and ratings

^{40}Table 1 gives the HAMD scores for each group before and after rTMS treatment.

### BP neural network

^{41} There are various types and architectures of neural networks, which differ fundamentally in the way they learn; the details are well documented in the literature.^{42,43}

^{44} Because of these advantages, the BP neural network is more appropriate for processing EEG data, which may be noisy, unstable and nonlinear. In this study, a feed-forward neural network trained by a backpropagation algorithm is used for the modeling process. The network is based on a supervised procedure, i.e. it constructs a model from examples of data with known outputs. The architecture is a layered feed-forward neural network in which the non-linear elements (neurons) are arranged in successive layers, and information flows from the input layer to the output layer through the hidden layer(s).^{45} Input data are received from 6 electrodes as QEEG cordance; 10 neurons were used in the hidden layer, with a sigmoid transfer function in each neuron because of its nonlinear behavior. To minimize the error between the model output and a reference value, the mean square error (MSE) is used as the cost function, given in equation 1. The cost function is minimized by ACO.

In equation 1, the model output for index $k$ is compared with the reference output $z_k$.
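As a minimal sketch of the architecture described above, the following NumPy code builds a 6-10-1 feed-forward network with sigmoid units and evaluates the mean square error, taken here in its usual form as the average of $(y_k - z_k)^2$ over the samples. The symbol $y_k$ for the model output, the single output neuron, the weight initialization and all function names are assumptions made for illustration; they are not taken from the study, and training is left out.

```python
import numpy as np


def sigmoid(x):
    """Logistic transfer function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def init_network(n_inputs=6, n_hidden=10, n_outputs=1, seed=0):
    """Random weights for a one-hidden-layer network (initialization is assumed)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, n_outputs)),
        "b2": np.zeros(n_outputs),
    }


def forward(params, x):
    """Propagate inputs through the hidden layer to the output layer."""
    hidden = sigmoid(x @ params["W1"] + params["b1"])
    return sigmoid(hidden @ params["W2"] + params["b2"])


def mse(y, z):
    """Mean square error between model outputs y and reference outputs z."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    return float(np.mean((y - z) ** 2))
```

Calling `forward(init_network(), X)` on an array `X` with one row per subject and six cordance values per row returns one value in (0, 1) per subject, which `mse` can then compare with the reference outputs.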

### Feature selection with ACO algorithm

^{46}

^{47} In such a problem, a set of cities (nodes) is given and the distance between each pair is known. The aim is to find the shortest path that allows each city to be visited just once. Alternative paths are generated on the basis of a probabilistic model. In the ACO metaphor, these paths are said to be constructed by artificial ants walking on the graph that encodes the problem, in which each vertex represents a city and each edge represents a connection between two cities. Initial attempts at building an ACO algorithm were not satisfactory until the algorithm was coupled with a local optimizer.^{48} One problem is early convergence to a less than optimal solution because too much virtual pheromone is laid quickly. To avoid this problem, pheromone evaporation is implemented; in other words, the pheromone associated with a solution disappears after a period of time. In the construction of a solution, ants select the next city to be visited through a stochastic mechanism. When ant $k$ is in city $i$ and has so far constructed the partial solution $s^{p}$, the probability of going to city $j$ is given as:

$$p_{ij}^{k} = \begin{cases} \dfrac{\tau_{ij}^{\sigma}\,\eta_{ij}^{\upsilon}}{\sum_{l \in N(s^{p})} \tau_{il}^{\sigma}\,\eta_{il}^{\upsilon}} & \text{if } j \in N(s^{p}) \\ 0 & \text{otherwise} \end{cases}$$

where $N(s^{p})$ represents the set of feasible nodes. $\sigma$ and $\upsilon$ are constants that control the relative importance of the pheromone versus the heuristic information $\eta_{ij}$, which is given as:

$$\eta_{ij} = \frac{1}{d_{ij}}$$

where $d_{ij}$ is the distance between city $i$ and city $j$.

The pheromone $\tau_{ij}$, associated with the edge joining cities $i$ and $j$, is updated as follows:

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k}$$

where $\rho$ is the evaporation rate, $m$ is the number of ants, and $\Delta\tau_{ij}^{k}$ is the quantity of pheromone laid on edge $(i, j)$ by ant $k$:

$$\Delta\tau_{ij}^{k} = \begin{cases} 1 / L_{k} & \text{if ant } k \text{ used edge } (i, j) \text{ in its tour} \\ 0 & \text{otherwise} \end{cases}$$

where $L_{k}$ is the length of the tour constructed by ant $k$.^{29}
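The construction and update steps described above can be sketched as follows for a small travelling-salesman instance. This is an illustrative sketch, not the implementation used in the study: the function names, the default values of `sigma`, `upsilon` and `rho`, the closed tour, and the deposit of $1/L_k$ on every edge an ant traverses are assumptions.

```python
import numpy as np


def heuristic(dist):
    """Heuristic information eta_ij = 1 / d_ij (zero where the distance is zero)."""
    eta = np.zeros_like(dist, dtype=float)
    positive = dist > 0
    eta[positive] = 1.0 / dist[positive]
    return eta


def transition_probabilities(tau, eta, current, feasible, sigma=1.0, upsilon=2.0):
    """Probability of moving from `current` to each city; zero outside `feasible`."""
    feasible = np.asarray(feasible)
    weights = tau[current, feasible] ** sigma * eta[current, feasible] ** upsilon
    probs = np.zeros(tau.shape[0])
    probs[feasible] = weights / weights.sum()
    return probs


def build_tour(tau, eta, rng, start=0, sigma=1.0, upsilon=2.0):
    """One ant builds a tour by repeatedly sampling the next unvisited city."""
    n = tau.shape[0]
    tour = [start]
    while len(tour) < n:
        feasible = [c for c in range(n) if c not in tour]
        probs = transition_probabilities(tau, eta, tour[-1], feasible, sigma, upsilon)
        tour.append(int(rng.choice(n, p=probs)))
    return tour


def tour_length(dist, tour):
    """Length of the closed tour (returning to the start is an assumption)."""
    n = len(tour)
    return float(sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n)))


def update_pheromone(tau, tours, dist, rho=0.5):
    """Evaporate all pheromone, then let each ant deposit 1 / L_k on its edges."""
    new_tau = (1.0 - rho) * tau
    for tour in tours:
        deposit = 1.0 / tour_length(dist, tour)
        n = len(tour)
        for i in range(n):
            a, b = tour[i], tour[(i + 1) % n]
            new_tau[a, b] += deposit
            new_tau[b, a] += deposit  # symmetric distances assumed
    return new_tau
```

One iteration then consists of calling `build_tour` once per ant and passing the resulting tours to `update_pheromone`; shorter tours deposit more pheromone, so their edges become more likely in later iterations.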

$\tau_{ij}$ is the pheromone value between nest ($i$) and ($j$) at the $n$th iteration, $\theta$ is the general pheromone updating coefficient, and $J$ is the cost function for the tour travelled by the ant.

$J_{best}$ and $J_{worst}$ are the lowest cost value and the highest cost value in one iteration, respectively. The pheromone evaporation, given in equation 8, decreases the pheromone density of the visited paths so that the ants also visit low-density paths, assuring diversity.
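To make the interplay between pheromone deposit and evaporation concrete, a hypothetical feature-selection loop is sketched below. It is not the update rule of the study: treating each feature's pheromone as an inclusion probability, depositing `theta / (1 + J)` on the features an ant selected, evaporating all pheromone uniformly at rate `rho`, the clipping bounds, and the names `aco_feature_selection` and `cost_fn` are all assumptions. In the setting described here, `cost_fn` would return the MSE of the network trained on the selected features.

```python
import numpy as np


def aco_feature_selection(cost_fn, n_features, n_ants=10, n_iter=30,
                          theta=0.1, rho=0.1, seed=0):
    """Search for a low-cost boolean feature mask with a simple ant colony loop."""
    rng = np.random.default_rng(seed)
    tau = np.full(n_features, 0.5)  # pheromone per feature, used as inclusion probability
    best_mask, best_cost = None, np.inf
    for _ in range(n_iter):
        masks, costs = [], []
        for _ in range(n_ants):
            mask = rng.random(n_features) < tau
            if not mask.any():  # every ant must select at least one feature
                mask[rng.integers(n_features)] = True
            masks.append(mask)
            costs.append(float(cost_fn(mask)))
        tau *= (1.0 - rho)  # evaporation keeps early choices from dominating
        for mask, cost in zip(masks, costs):
            tau[mask] += theta / (1.0 + cost)  # lower cost J gives a larger deposit
        tau = np.clip(tau, 0.05, 0.95)  # keep every feature reachable
        i_best = int(np.argmin(costs))  # ant with the lowest cost in this iteration
        if costs[i_best] < best_cost:
            best_cost, best_mask = costs[i_best], masks[i_best].copy()
    return best_mask, best_cost
```

With a toy cost such as the number of positions in which the mask differs from a known target mask, the function returns the best mask found together with its cost; how close that is to the optimum depends on the seed and on the assumed parameter values.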

### RESULTS

^{23,26,49} The results of our study support previous clinical research and focus on the prefrontal region and the theta frequency band for MDD patients.

### DISCUSSION

^{50} Alzheimer's patients,^{51} depression patients^{52} and patients suffering from epilepsy^{53} were also studied and contributed to the combination of optimization algorithms and neural networks to increase classification performance.^{54,55} The machine learning paradigm has been applied in a study using an ANN fed with EEG data to differentiate three classes of subjects: those with schizophrenia, those with depression, and healthy subjects.^{56} Combining various biomarkers, statistical methods were also used to predict^{23,40,57} treatment results. To increase prediction performance, various feature selection methods have been proposed for multi-channel EEG data.^{58,59} Some other studies underlined the performance of ACO as a feature selection method compared with principal component analysis, genetic algorithm, random tree generation and differential evolution methods.^{27,28,36,54,60}