G.A.M.I.T: Graphing and Machine-Learning Investigation Toolkit

Team members

Chow Ray Jia Rachel (ESD), Khairunnisa Bte Kunhimohamed N (ISTD), Gabriel Chan Zheng Yong (ISTD), Lu Lu (ESD), Shreya Prasad (ESD), Ivan Christian (ISTD)

Instructors:

Costas Courcoubetis, Keegan Kang, Lim Kwan Hui

Writing Instructors:

Pang Yoke Kian Rachel

Introduction

Classifying cyber attacks requires complex and lengthy data, and there is a shortage of cybersecurity experts to investigate each attack thoroughly. Automating this process is therefore needed to increase classification accuracy and reduce manual labor.

Hence, our team built a stand-alone web UI that visualises cyber-attack datasets and applies machine learning to them. The resulting graphs and machine learning output can reaffirm an investigator's initial suspicion of the type of crime, and suggest missing evidence and interesting insights about the crime.

The following is the journey a user experiences when using this product. The ideal user is a cybersecurity professional who has an inkling of the type of attack and wants to reaffirm it. They first upload the dataset, choose the suspected attack type, then select a criterion to filter the dataset if necessary. At this point the web application outputs the source-destination visualisation of the attack. If the graph matches the hypothesised attack, the user continues and the machine learning module generates a confidence score as well as insights about the attack. If the graph does not match the hypothesised attack, the user may go back and reselect the suspected attack type and the filter criteria.

The machine learning model generates a confidence score and a set of insights about the network traffic dataset provided by the security analyst, indicating whether the traffic contains a botnet attack.

The main function of the machine learning module is to analyse the network traffic provided by the analyst using a pre-trained Convolutional Neural Network model. The model was trained on the CTU-13 dataset to detect botnet network transactions based on the features it learned during training.

The model also returns insights based on the model parameters, in the form of a table listing the botnet transactions flagged by the model's prediction. Security analysts can use this table to reaffirm their hypothesis and gain better insight into the network traffic uploaded to the analyser.
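The step from per-transaction predictions to a confidence score and a table of flagged transactions can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `FlowPrediction` record, the 0.5 threshold, and in particular the definition of the confidence score as the mean predicted probability over flagged flows. The project's actual scoring rule is not described in this text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowPrediction:
    """Hypothetical record: one network flow plus the model's output for it."""
    src: str            # source address of the flow
    dst: str            # destination address of the flow
    botnet_prob: float  # assumed per-flow botnet probability in [0, 1]


def summarise_predictions(preds: List[FlowPrediction],
                          threshold: float = 0.5) -> Dict:
    """Build a flagged-transaction table and an overall confidence score.

    Assumption: confidence is the mean probability of the flagged flows,
    and 0.0 when nothing is flagged.
    """
    flagged = [p for p in preds if p.botnet_prob >= threshold]
    confidence = (
        sum(p.botnet_prob for p in flagged) / len(flagged) if flagged else 0.0
    )
    # Most suspicious flows first, so analysts see them at the top of the table.
    table = [
        {"src": p.src, "dst": p.dst, "botnet_prob": p.botnet_prob}
        for p in sorted(flagged, key=lambda p: p.botnet_prob, reverse=True)
    ]
    return {
        "botnet_detected": bool(flagged),
        "confidence": confidence,
        "flagged": table,
    }
```

Sorting the table by probability is a design choice made for this sketch: it puts the flows the model is most certain about in front of the analyst first.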

The final model architecture can be seen in Model Architecture.

The graph visualisation module generates graphs from the uploaded dataset to give users a visual representation of the data.
The module uses the graph database Neo4j together with libraries such as the JavaScript library d3.js, integrated into our backend, to plot nodes and edges that visualise the attack datasets. This enables users to form a more accurate hypothesis of whether a cyber-attack is taking place.

These features save computation time and memory, remove noise from the dataset and, most importantly, enable the user to zoom in on specific areas of interest.
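The idea of turning filtered traffic into nodes and edges can be sketched as follows. This is a minimal illustration and not the project's implementation. The function name `build_graph`, the `keep` filter callback and the `nodes`/`links` output shape (a layout commonly fed to d3.js force-directed graphs) are all assumptions, and the Neo4j storage step is left out.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

Flow = Tuple[str, str]  # (source address, destination address)


def build_graph(flows: Iterable[Flow],
                keep: Optional[Callable[[str, str], bool]] = None) -> Dict:
    """Aggregate flows into a node/edge structure for plotting.

    `keep` is a hypothetical filter criterion; flows it rejects are dropped
    before aggregation, which shrinks the graph and removes noise.
    """
    edge_counts: Counter = Counter()
    for src, dst in flows:
        if keep is not None and not keep(src, dst):
            continue
        edge_counts[(src, dst)] += 1  # repeated flows collapse into one edge

    # Every address that appears on a surviving edge becomes a node.
    addresses = sorted({ip for edge in edge_counts for ip in edge})
    return {
        "nodes": [{"id": ip} for ip in addresses],
        "links": [
            {"source": s, "target": d, "count": c}
            for (s, d), c in sorted(edge_counts.items())
        ],
    }
```

For example, `build_graph(flows, keep=lambda s, d: d == "10.0.0.2")` would keep only traffic aimed at one host, which is one way to zoom in on an area of interest.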

An example of the plotted graphs can be seen alongside.

We collected feedback on the final prototype regarding its user experience and the four customer needs. The survey targeted security analysts from related fields, and the questions were designed to evaluate the performance of each module as well as the overall user interface.

In terms of user experience, the figures show the percentage of ratings higher than 4 on a 5-point satisfaction scale. The result is positive: for every module and for the overall experience, the percentage is higher than 75%.

In terms of how well the customers' needs are met for each module, our solution addresses the primary needs (Time Reduction, Manpower) with the highest scores, followed by Ease of Use and Insights.

In addition, respondents showed a preference for the Clean and Intuitive presentation, as the instructions are clear and easy to follow. Users also have the flexibility to change parameters to get their desired results.