Classification of motor imagery in EEG using random forest classifier

Although motor-imagery-based brain-computer interfaces (BCIs) have become popular in recent years, their practical application is limited by the classification accuracy of existing methods. In this study, a new scheme is proposed for classifying multi-class motor imagery in EEG using a random forest (RF) classifier. In the proposed scheme, a three-stage binary classification tree is constructed, and an RF model is trained for each stage of the tree using features extracted from the EEG channels. The extracted features include the band powers of each EEG channel. The proposed scheme is applied to the BCI Competition IV dataset 2a recordings. The EEG data were acquired from nine subjects, and the scheme is applied to each subject independently. The kappa values of the proposed scheme are calculated to compare the results with methods in the literature, and the proposed scheme is shown to achieve higher kappa values than those methods.


Introduction
People who suffer from tetraplegia or amyotrophic lateral sclerosis (ALS) can lose their motor functions. As a result, they cannot control external devices with their muscles. Brain-computer interfaces (BCIs) offer an alternative way to control electronic devices: they interpret brain activity and convert it into control commands [1]. When a person imagines moving different parts of the body, spatially localised brain activity arises [2]. Junhua et al. [3] developed a motor-imagery-based BCI system. In their experiments, subjects were asked to perform three cognitive tasks: imagined left-hand, right-hand and right-foot movement. EEG data were collected with an amplifier that has 118 EEG and 10 EOG channels, a sampling frequency of 1 kHz and an internal 0.05-200 Hz bandpass filter. The delta-band (0-4 Hz) energy of the EEG data and common spatial pattern (CSP)-based features were extracted as features, and a Fisher discriminant classifier was used to classify the EEG data. The authors tested the scheme on EEG data from two subjects and reported sensitivities of 93.45% and 91.88%.
Izguerdo et al. [4] developed a BCI system based on three cognitive tasks: imagined left-hand movement, imagined right-hand movement and constructing sentences with a keyword. Their classification scheme was applied to the BCI Competition III dataset V benchmark data. In the feature extraction step, the band energies of the EEG data were used as features, and a hybrid classifier combining an artificial neural network (ANN) with fuzzy logic was used for classification. The authors reported classification accuracies of 87.21%, 82.26% and 58.72% for the three tasks, respectively.
This study uses the benchmark data of BCI Competition IV dataset 2a. The EEG data were recorded from nine healthy subjects in two sessions. The first 5 min of each session, which were recorded to capture EOG activity, are discarded to reduce the effect of EOG on the EEG signal. This segment is divided into three parts: 2 min with eyes open, 2 min with eyes closed and 1 min with eye movements. The EOG parts of the experiment are shown in Figure 1. During the experiment, subjects were asked to perform four motor imagery tasks: imagined movement of the left hand, right hand, both feet and tongue. The sampling frequency and sensitivity of the EEG amplifier were set to 250 Hz and 100 μV, respectively. The data consist of 22 EEG channels and 3 EOG channels. A 50-Hz digital notch filter is applied to the EEG to suppress power-line interference, and the EEG data are also filtered with a 20th-order bandpass filter (8-30 Hz).
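The preprocessing described above can be sketched with SciPy. This is a minimal sketch under assumptions the text does not state. First, the 20th-order band-pass is taken to be a Butterworth design, obtained with `butter(10, ...)` because a band-pass doubles the prototype order. Second, the notch quality factor `Q=30` is only an illustrative value. Third, both filters are applied forwards and backwards (zero-phase), which squares their magnitude response. The function name `preprocess` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 250.0  # sampling frequency of BCI Competition IV dataset 2a (Hz)


def preprocess(eeg, fs=FS, notch_q=30.0):
    """Suppress 50 Hz line noise, then band-pass 8-30 Hz.

    eeg: array of shape (channels, samples). Returns a filtered copy.
    """
    eeg = np.asarray(eeg, dtype=float)
    # 50 Hz power-line notch (notch_q is an assumed, illustrative value)
    b, a = iirnotch(50.0, notch_q, fs=fs)
    eeg = filtfilt(b, a, eeg, axis=-1)
    # Assumed Butterworth design: butter(10, band) yields a 20th-order band-pass
    sos = butter(10, [8.0, 30.0], btype="bandpass", output="sos", fs=fs)
    # Zero-phase (forward-backward) filtering avoids phase distortion
    return sosfiltfilt(sos, eeg, axis=-1)
```

For a single trial of shape (22, n_samples), `preprocess(trial)` returns an array of the same shape. Second-order sections (`output="sos"`) are used because a 20th-order filter in transfer-function form is numerically fragile.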

Common Spatial Pattern
The CSP method was applied by H. Ramoser et al. [5] to classify multi-channel EEG signals. The goal of CSP is to find a linear transformation that maximises the variance of the signals of one class while minimising the variance of the other. Each row of the resulting transformation matrix holds a set of channel weights (a spatial filter). The raw EEG data are represented as an N × M matrix $X$, where N is the number of channels and M is the number of samples per channel. The normalised spatial covariance of $X$ is calculated as

$$C = \frac{X X^{T}}{\operatorname{trace}(X X^{T})}$$

where the trace of a matrix is the sum of its diagonal elements. The spatial covariances $\bar{C}_l$ and $\bar{C}_r$ of the two classes are obtained by averaging $C$ over the EEG data matrices $X_l$ and $X_r$ of the left- and right-hand motor imagery trials, respectively. The composite spatial covariance is then

$$C_c = \bar{C}_l + \bar{C}_r$$

The $C_c$ matrix can be factored by eigenvalue decomposition as

$$C_c = U_c \Lambda_c U_c^{T}$$

where $U_c$ is the eigenvector matrix and $\Lambda_c$ is the diagonal matrix of eigenvalues. The whitening transformation matrix $P$ is

$$P = \Lambda_c^{-1/2} U_c^{T}$$

and it equalises the variances in the space spanned by $U_c$:

$$S_l = P \bar{C}_l P^{T}, \qquad S_r = P \bar{C}_r P^{T}$$

Because $P C_c P^{T} = I$, it follows that $S_l + S_r = I$. Hence $S_l$ and $S_r$ share common eigenvectors, and their corresponding eigenvalues sum to one:

$$S_l = B \Lambda_l B^{T}, \qquad S_r = B \Lambda_r B^{T}, \qquad \Lambda_l + \Lambda_r = I$$

The eigenvectors with the largest eigenvalues for $S_l$ therefore have the smallest eigenvalues for $S_r$, and vice versa, which makes the eigenvectors in $B$ suitable for two-class classification. Projecting the whitened EEG data onto the first and last eigenvectors in $B$ gives feature vectors that are optimal for discriminating the two populations of EEG in the least-squares sense.

The projection matrix $W$ is defined as

$$W = B^{T} P$$

and the EEG data are decomposed with $W$ as

$$Z = W X$$

The columns of $W^{-1}$ are called the common spatial patterns (CSPs).
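The CSP steps above map onto a few NumPy calls. The following is a minimal sketch that assumes each trial is a (channels × samples) NumPy array, and the function names are hypothetical. The paper does not state how features are computed from $Z$. The `csp_features` helper therefore uses the common log-variance feature of the first and last $m$ components, which is an assumption.

```python
import numpy as np


def normalised_cov(X):
    """Trace-normalised spatial covariance of one (channels x samples) trial."""
    C = X @ X.T
    return C / np.trace(C)


def csp(trials_l, trials_r):
    """Return the projection matrix W (rows are spatial filters) and the
    eigenvalues lambda_l, both ordered so that lambda_l decreases."""
    C_l = np.mean([normalised_cov(X) for X in trials_l], axis=0)
    C_r = np.mean([normalised_cov(X) for X in trials_r], axis=0)
    C_c = C_l + C_r                          # composite covariance
    lam_c, U_c = np.linalg.eigh(C_c)         # C_c = U_c Lambda_c U_c^T
    P = np.diag(lam_c ** -0.5) @ U_c.T       # whitening: P C_c P^T = I
    S_l = P @ C_l @ P.T                      # S_r = I - S_l, same eigenvectors
    lam_l, B = np.linalg.eigh(S_l)
    order = np.argsort(lam_l)[::-1]          # largest lambda_l first
    B, lam_l = B[:, order], lam_l[order]
    W = B.T @ P                              # projection matrix, Z = W X
    return W, lam_l


def csp_features(W, X, m=1):
    """Assumed feature: log normalised variance of the first/last m rows of Z."""
    Z = W @ X
    rows = np.r_[0:m, len(Z) - m:len(Z)]
    var = Z[rows].var(axis=1)
    return np.log(var / var.sum())
```

`np.linalg.eigh` is used because both covariance matrices are symmetric. It returns orthonormal eigenvectors, so $W C_c W^{T} = I$ holds up to rounding.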

Random Forest
The decision tree technique was initially proposed by Morgan and Sonquist in 1963 [x]. It became widely used for classification after a study published by Breiman in 1984 [6,7]. The classification algorithm starts by splitting the training data into left and right nodes using a threshold on the first feature. The threshold is chosen as the boundary value that minimises the residual sum of squares (the regression criterion):

$$RSS = \sum_{i \in L} (y_i - \bar{y}_L)^2 + \sum_{i \in R} (y_i - \bar{y}_R)^2$$

where $L$ and $R$ are the sets of samples in the left and right nodes, and $\bar{y}_L$ and $\bar{y}_R$ are their mean responses. The splitting operation is applied repeatedly to each node, using the next feature at each level of the tree, until nodes containing samples of a single class are obtained. Therefore, the size and computational cost of the decision tree depend on the number of features. A sample flowchart of the random forest (RF) algorithm is shown in Figure 2. The size of a decision tree can be reduced by pruning, which removes branches (and hence the features they test) that do not improve classification. The Gini index, a measure of statistical dispersion, can be used to decide whether a node is promising, and can therefore also be used for feature selection. For a node with class proportions $p_k$ over $K$ classes, it is

$$G = 1 - \sum_{k=1}^{K} p_k^2$$

A Gini index of zero indicates a pure node. The classification accuracy of a single binary decision tree depends strongly on its training data. To reduce this dependence, randomly selected sub-datasets are drawn from the training data and a decision tree is built for each. The output of the RF classifier is the average of the trees' outputs:

$$\hat{y} = \frac{1}{M} \sum_{i=1}^{M} T_i$$

where $M$ is the number of decision trees and $T_i$ is the output of the $i$th tree.
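The split criterion, node impurity and ensemble averaging described above can be illustrated with a minimal NumPy sketch. It assumes 0/1 labels for a binary node, and all helper names are hypothetical. The text only says the sub-datasets are random. `bootstrap_indices` therefore draws them with replacement, which is the usual random forest choice and is assumed here.

```python
import numpy as np


def rss_split(x, y):
    """Threshold on one feature that minimises the residual sum of squares."""
    order = np.argsort(x)
    xs, ys = np.asarray(x, float)[order], np.asarray(y, float)[order]
    best_rss, best_thr = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                      # no boundary between equal values
        left, right = ys[:i], ys[i:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_rss, best_thr = rss, (xs[i - 1] + xs[i]) / 2  # midpoint
    return best_thr, best_rss


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of the labels in one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def bootstrap_indices(n, n_trees, seed=0):
    """Assumed sub-dataset scheme: one bootstrap sample of size n per tree."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n, size=(n_trees, n))


def forest_output(tree_outputs):
    """RF output: mean of the M tree outputs T_i (axis 0 indexes trees)."""
    return np.mean(tree_outputs, axis=0)
```

With 0/1 tree outputs, thresholding `forest_output` at 0.5 is equivalent to a majority vote of the trees.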

Feature Extraction
Feature extraction, which maps the measurement space into the feature space, is the first step in any pattern recognition problem. In this study, both time-domain and frequency-domain features were extracted from the EEG signals. In addition to CSP-based features, the band energies (alpha, beta and theta) of the EEG signal are included in the feature vector as frequency-domain features. Offline analysis of the EEG data showed that including the delta band energy decreased classification performance, so the delta band is not included in the feature vector. Because CSP is defined for two classes, multi-class problems are handled in the literature with either a one-versus-rest or a one-versus-one strategy. In this study, the one-versus-rest strategy is used to extract CSP features.
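The band-energy features and the one-versus-rest labelling can be sketched as follows. This is a minimal sketch that assumes a plain FFT periodogram and conventional band limits (theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz), because the text names the bands but not their edges. The helper names are hypothetical.

```python
import numpy as np

# Assumed band limits in Hz; delta is omitted, as in the feature vector above.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_powers(eeg, fs=250.0, bands=BANDS):
    """Periodogram band power per channel.

    eeg: array of shape (channels, samples).
    Returns an array of shape (channels, len(bands)).
    """
    eeg = np.asarray(eeg, dtype=float)
    n = eeg.shape[-1]
    power = np.abs(np.fft.rfft(eeg, axis=-1)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    cols = [power[:, (freqs >= lo) & (freqs < hi)].sum(axis=-1)
            for lo, hi in bands.values()]
    return np.stack(cols, axis=-1)


def one_vs_rest_labels(y, positive):
    """Binary labels for one-versus-rest CSP: 1 for `positive`, 0 otherwise."""
    return (np.asarray(y) == positive).astype(int)
```

The flattened output of `band_powers` can be concatenated with the CSP features to form one feature vector per time window.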

Signal Classification
This study develops a multi-stage decision-making algorithm to classify motor imagery EEG data. The proposed classification system consists of three decision stages. In the first stage, the algorithm decides whether an EEG data sample belongs to the left-hand class. If it does not, the sample passes to the second stage, which decides between the right-hand class and the remaining classes. In the final stage, the sample is assigned to either the feet or the tongue class. An RF classifier is used at each stage, and the number of trees in each classifier was set to 100 by trial and error. The classification scheme is shown in Figure 3.
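The three-stage cascade can be sketched with scikit-learn's `RandomForestClassifier` using 100 trees per stage. This is a minimal sketch, not the authors' code. The class name `CascadeRF` and the label strings are hypothetical, and training each later stage only on the samples of classes not yet peeled off is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Order in which the cascade separates classes (label names are assumed).
STAGE_ORDER = ["left", "right", "feet", "tongue"]


class CascadeRF:
    """Stage 1: left vs rest; stage 2: right vs rest; stage 3: feet vs tongue."""

    def __init__(self, n_trees=100, seed=0):
        self.n_trees = n_trees
        self.seed = seed
        self.stages = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.stages = []
        remaining = np.ones(len(y), dtype=bool)
        for k, target in enumerate(STAGE_ORDER[:-1]):
            rf = RandomForestClassifier(n_estimators=self.n_trees,
                                        random_state=self.seed + k)
            # Binary task: is the sample `target`, among classes still left?
            rf.fit(X[remaining], y[remaining] == target)
            self.stages.append((target, rf))
            remaining &= y != target
        return self

    def predict(self, X):
        X = np.asarray(X)
        # Samples rejected by every stage fall through to the last class.
        out = np.full(len(X), STAGE_ORDER[-1], dtype=object)
        undecided = np.ones(len(X), dtype=bool)
        for target, rf in self.stages:
            if not undecided.any():
                break
            idx = np.flatnonzero(undecided)
            hit = rf.predict(X[idx]).astype(bool)
            out[idx[hit]] = target
            undecided[idx[hit]] = False
        return out
```

Usage: `CascadeRF().fit(X_train, y_train).predict(X_test)`, where each row of `X_train` is one time window's feature vector.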

Results
In the literature, classifier performance is commonly compared using two criteria, sensitivity and specificity, which are calculated as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Here, TN, TP, FN and FP denote true negatives, true positives, false negatives and false positives, respectively. For a given class, TP samples are the samples of that class that are correctly classified, and TN samples are the samples of the other classes that are not assigned to that class. FN samples belong to the given class but are incorrectly assigned to one of the other classes, while FP samples belong to other classes but are incorrectly assigned to the given class. For the analysis, the EEG data are divided into 1.25-s time windows, giving 144 samples per class for each subject. Half of the data are used for training and the remaining half for testing the proposed classifier. In total, each class has 324 test samples. The confusion table of the classifier is given in Table 1. Of the 324 left-hand test samples, 200 are correctly classified by the proposed scheme (sensitivity: 0.617). The sensitivities for the right-hand, foot and tongue classes are 0.638, 0.654 and 0.660, respectively. The class specificities, calculated from the confusion table, are given in Table 2. The RF classifier has its highest specificity on the tongue class.
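The per-class sensitivity and specificity can be computed from a confusion matrix in one-versus-rest fashion. The sketch below also computes Cohen's kappa, the metric quoted in the abstract. The helper names are hypothetical. The 4 × 4 matrix in the test is made up for illustration and is not Table 1.

```python
import numpy as np


def per_class_metrics(cm):
    """cm[i, j] = number of class-i samples predicted as class j.

    Returns (sensitivity, specificity), one value per class.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # class-i samples sent to other classes
    fp = cm.sum(axis=0) - tp          # other-class samples sent to class i
    tn = cm.sum() - tp - fn - fp      # everything else
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                                   # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    return (p_o - p_e) / (1 - p_e)
```

For the left-hand class in the text, the sensitivity is 200 / 324 ≈ 0.617, which matches the value reported above.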

Discussion
In this study, a three-stage RF-based scheme is proposed for classifying motor imagery in EEG signals. The scheme is tested on the four-class BCI Competition IV dataset 2a benchmark data. Both time-domain (CSP) and frequency-domain (EEG band energy) features are extracted from the EEG data. The proposed scheme achieves its highest sensitivity and specificity on the tongue class.

Figure 1. EOG recording segments at the beginning of each session

Figure 2. Sample flowchart for RF classifier

Table 1. Confusion table of the RF classifier

Table 2. Class specificities of the RF classifier