Combining program slicing and graph neural networks to detect software vulnerabilities

  • WJC de Kraker

Student thesis: Master's Thesis


Recent studies show that deep-learning models outperform software vulnerability detection tools that rely on human-defined rules. Deep learning focuses on neural networks that can automatically improve at a given task using data. According to prior work, a specific type of neural network is well suited for detecting software vulnerabilities. This type of network is called a graph neural network (GNN). GNN’s are specifically designed to learn from graph data. Learning from this type of data is ideal for vulnerability detection because program syntax and semantics can be well expressed in graph structures. Examples of graphs that represent program syntax/semantics are abstract syntax trees, control flow graphs, and data flow graphs.
Although GNN’s outperform previous approaches due to their high F1 score, there is still roomfor improvement. One of the areas of improvement is that these GNN studies are limited to detecting vulnerabilities at the function level. Detection at this level means that the neural network performs predictions on a function and ignores the rest of the program. As a result, these approaches are unable to detect vulnerabilities caused by the interaction of multiple functions.
In this study we investigate the effect of detecting vulnerabilities across function boundaries. To achieve this goal, we extend prior work using a technique called inter-procedural program slicing. This technique is designed to extract a subset of a potentially large program based on a specific criterion. By choosing a security criterion, slicing results in a subprogram that preserves behavior related to the existence of a vulnerability. This subprogram can consist of multiple functions, which allows the detection of vulnerabilities across function boundaries. Although program slicing has already been used for vulnerability
detection, we distinguish ourselves in two ways. We are one of the first studies to
combine slicing with GNN’s and we included more language constructs in the slices.
To train and evaluate our deep-learning model, we collected a data set of both vulnerable and non-vulnerable C/C++ code. This data set combined samples from the Software Assurance Reference Dataset (SARD) and National Vulnerability Database (NVD). To limit the scope of our research efforts, we focus on two prevalent types of vulnerabilities: out-ofbounds writes (CWE-787) and out-of-bounds reads (CWE-125).
Using this data set, we first tried to find the optimal target depth for our program slicing algorithm. The term target depth refers to the maximum depth of the call tree. We experimented with a target depth ranging from zero to the max depth of our data set. Evaluating the accuracy and F1 score shows an increase in performance as the function depth increases. We conclude that limiting the function depth loses key information required to classify our samples.
To compare our program slicing approach with function-level analysis, we reproduced the results of a previous study and made several key enhancements to their model. These enhancements included bug fixes and a new word embedding strategy. Next, we ran the model on our new data set and compared the F1 score of both approaches. The program slicing approach performed significantly better on the data set because a third of the vulnerabilities required inter-procedural analysis. This confirmed our hypothesis that buffer vulnerabilities can occur across a series of functions and thus that program slicing outperforms function-level analysis in certain scenarios. However, this improvement in F1 score did have a drawback. Calculating a single epoch took longer, but was still feasible in a limited amount of time.
Date of Award20 Jul 2022
Original languageEnglish
SupervisorHarald Vranken (Examiner) & Arjen Hommersom (Co-assessor)

Master's Degree

  • Master Software Engineering

Cite this