Improving Detection Accuracy for Imbalanced Network Intrusion Classification using Cluster-based Under-sampling with Random Forests
Network intrusion classification in imbalanced big data environments has become a significant issue in information and communications technology (ICT) in this digital era. Intrusion detection systems (IDSs) are now commonly used tools for detecting and preventing internal and external network attacks/intrusions. IDSs are broadly divided into host-based and network-based systems, and those that use pattern-matching techniques to detect intrusions are known as misuse-based intrusion detection systems. Machine learning (ML) and data mining (DM) algorithms have been widely used for classifying intrusions in IDSs over the last few decades. One of the major challenges in building an IDS with machine learning and data mining algorithms is improving intrusion classification accuracy while also reducing the false-positive rate. In this paper, we introduce a new method for improving the detection rate of minority-class network attacks/intrusions using cluster-based under-sampling with a Random Forest classifier. The proposed method is a multi-layer classification approach that can process highly imbalanced big data to correctly identify minority/rare-class intrusions. First, the proposed method classifies an incoming data point as an attack/intrusion or not (i.e., normal behaviour); if it is an attack, the method then tries to classify the attack type and, subsequently, the sub-attack type. We use a cluster-based under-sampling technique to deal with the class-imbalance problem and the popular ensemble classifier Random Forest to address the overfitting problem. We use the KDD99 intrusion detection benchmark dataset for experimental analysis and compare the performance of the proposed method with existing machine learning algorithms: Artificial Neural Network (ANN), naïve Bayes (NB) classifier, Random Forest, and Bagging.
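To make the idea of cluster-based under-sampling concrete, here is a minimal, dependency-free Python sketch. It is not the code from this repository: it shows one common variant, in which the majority class (normal traffic) is clustered with plain k-means and then sampled from every cluster in proportion to cluster size. The function names, the parameters `k` and `n_keep`, and the synthetic 2-feature data are all assumptions made for illustration.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority samples, drawn from every cluster
    in proportion to that cluster's share of the majority class."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


# Synthetic stand-in data (NOT KDD99): 300 "normal" records in three groups
# and 30 "attack" records, each with two numeric features.
data_rng = random.Random(1)
normal = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(100)]
attacks = [(data_rng.gauss(9, 0.3), data_rng.gauss(0, 0.3)) for _ in range(30)]

# Shrink the majority class to about the size of the minority class.
reduced_normal = cluster_undersample(normal, n_keep=len(attacks), k=3)
balanced = [(p, "normal") for p in reduced_normal] + [(p, "attack") for p in attacks]
```

The `balanced` list could then be passed to any classifier, for example a Random Forest. Sampling per cluster, rather than uniformly at random, is meant to keep every region of normal behaviour represented after the majority class is reduced.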
The source code for the proposed method is publicly available at https://github.com/MdOchiuddinMiah/Network-Intrusion-Classification.
It can be downloaded directly by clicking the link.
Note: The package downloads in zip format (.zip) and is named Network-Intrusion-Classification.zip.
Cloning the repository copies it to the local machine (the example below is for a Linux-based OS). After cloning, we can add and edit files and then push and pull updates.
- Clone over HTTPS:
user@machine:~$ git clone https://github.com/MdOchiuddinMiah/Network-Intrusion-Classification
- Clone over SSH:
user@machine:~$ git clone git@github.com:MdOchiuddinMiah/Network-Intrusion-Classification.git
[Figure: Normal or attack detection]

[Figure: Main attack types detection]

[Figure: Final attack/intrusion detection]
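The three detection stages named above (normal or attack, main attack type, final attack/intrusion) can be read as a cascade in which each record only moves to the next stage if the previous one flagged it. The sketch below shows that routing only, under stated assumptions: `classify_cascade` is a hypothetical helper, and the three "models" are toy threshold rules on a 2-feature record, standing in for trained classifiers such as Random Forests. The category names (DoS, Probe) and sub-attack names (smurf, neptune, portsweep) are labels from the KDD99 dataset.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Record = Sequence[float]
Model = Callable[[Record], str]  # stand-in for a trained classifier's predict step


def classify_cascade(
    x: Record,
    stage1: Model,             # normal vs. attack
    stage2: Model,             # main attack type
    stage3: Dict[str, Model],  # one sub-attack model per main attack type
) -> Tuple[str, Optional[str], Optional[str]]:
    """Route one record through the three stages and return
    (normal/attack, main attack type, sub-attack type)."""
    if stage1(x) == "normal":
        return ("normal", None, None)  # later stages are skipped
    main_type = stage2(x)
    sub_model = stage3.get(main_type)
    sub_type = sub_model(x) if sub_model is not None else None
    return ("attack", main_type, sub_type)


# Toy rule-based stand-ins with hypothetical thresholds (illustration only).
def is_attack(x: Record) -> str:
    return "attack" if x[0] > 0.5 else "normal"


def main_attack_type(x: Record) -> str:
    return "DoS" if x[1] > 0.5 else "Probe"


sub_attack_models: Dict[str, Model] = {
    "DoS": lambda x: "smurf" if x[1] > 0.8 else "neptune",
    "Probe": lambda x: "portsweep",
}

records = [(0.1, 0.9), (0.9, 0.9), (0.9, 0.6), (0.9, 0.2)]
results = [classify_cascade(r, is_attack, main_attack_type, sub_attack_models)
           for r in records]
```

Keeping one sub-attack model per main attack type means each later-stage model only has to separate the classes that can actually reach it, which is one way to read the multi-layer design described in the abstract.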
The datasets are available for download from the open-source repository.
The source code of the machine learning model is available for download from the open-source repository.


