Evaluating Linear and Non-linear Dimensionality Reduction Approaches for Deep Learning-based Network Intrusion Detection Systems
View/ Open
Date
2023-08-08Author
Wanjau, Stephen K.
Wambugu, Geoffrey M.
Oirere, Aaron M.
Metadata
Show full item recordAbstract
Dimensionality reduction is an essential ingredient of machine learning modelling that seeks to improve the performance of such models by extracting better quality features from data while removing irrelevant and redundant ones. The technique aids reduce computational load, avoiding data over-fitting, and increasing model interpretability. Recent studies have revealed that dimensionality reduction can benefit from labeled information, through joint approximation of predictors and target variables from a low-rank representation. A multiplicity of linear and non-linear dimensionality reduction techniques are proposed in the literature contingent on the nature of the domain of interest. This paper presents an evaluation of the performance of a hybrid deep learning model using feature extraction techniques while being applied to a benchmark network intrusion detection dataset. We compare the performance of linear and non-linear feature extraction methods namely, the Principal Component Analysis and Isometric Feature Mapping respectively. The Principal Component Analysis is a non-parametric classical method normally used to extract a smaller representative dataset from high-dimensional data and classifies data that is linear in nature while preserving spatial characteristics. In contrast, Isometric Feature Mapping is a representative method in manifold learning that maps high-dimensional information into a lower feature space while endeavouring to maintain the neighborhood for each data point as well as the geodesic distances present among all pairs of data points. These two approaches were applied to the CICIDS 2017 network intrusion detection benchmark dataset to extract features. The extracted features were then utilized in the training of a hybrid deep learning-based intrusion detection model based on convolutional and a bidirection long short term memory architecture and the model performance results were compared. The empirical results demonstrated the dominance of the Principal Component Analysis as compared to Isometric Feature Mapping in improving the performance of the hybrid deep learning model in classifying network intrusions. The suggested model attained 96.97% and 96.81% in overall accuracy and F1-score, respectively, when the PCA method was used for dimensionality reduction. The hybrid model further achieved a detection rate of 97.91% whereas the false alarm rate was reduced to 0.012 with the discriminative features reduced to 48. Thus the model based on the principal component analysis extracted salient features that improved detection rate and reduced the false alarm rate.
Collections
- Journal Articles (CI) [106]