Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

We present a novel method for static analysis in which we combine data-flow analysis with machine learning to detect SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities in PHP applications. We assembled a dataset from the National Vulnerability Database and the SAMATE project, containing vulnerable PHP code samples and their patched versions in which the vulnerability has been fixed. We extracted features from the code samples by applying data-flow analysis techniques, including reaching definitions analysis, taint analysis, and reaching constants analysis. We used these features to train various probabilistic classifiers. To demonstrate the effectiveness of our approach, we built a tool called WIRECAML and compared it to other tools for vulnerability detection in PHP code. Our tool performed best for detecting both SQLi and XSS vulnerabilities. We also tried our approach on a number of open-source software applications and found a previously unknown vulnerability in a photo-sharing web application.
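The feature-extraction step described in the abstract can be illustrated with a minimal sketch. This is not WIRECAML's code: the toy control-flow graph, the node fields (`defs`, `uses`, `succ`, `source`, `sanitizer`), and the function names are all hypothetical, chosen only to show how reaching definitions and taint analysis might be turned into numeric features for a classifier.

```python
# Hypothetical toy control-flow graph for a PHP-like snippet (illustrative only):
#   1: $id = $_GET['id'];          (taint source)
#   2: if (...)                    (branch)
#   3:     $id = intval($id);      (sanitizer)
#   4: $q = "SELECT ... " . $id;
#   5: mysql_query($q);            (sink)
NODES = {
    1: {"defs": "id", "uses": [], "succ": [2], "source": True},
    2: {"defs": None, "uses": [], "succ": [3, 4]},
    3: {"defs": "id", "uses": ["id"], "succ": [4], "sanitizer": True},
    4: {"defs": "q", "uses": ["id"], "succ": [5]},
    5: {"defs": None, "uses": ["q"], "succ": []},
}


def reaching_definitions(nodes):
    """Iterate to a fixed point; a definition is a (variable, node) pair."""
    preds = {n: [] for n in nodes}
    for n, info in nodes.items():
        for s in info["succ"]:
            preds[s].append(n)

    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes):
            # IN[n] is the union of OUT[p] over all predecessors p.
            in_n = set().union(*(out_sets[p] for p in preds[n]))
            var = nodes[n]["defs"]
            if var is None:
                out_n = set(in_n)
            else:
                # Kill earlier definitions of var, then add this one.
                out_n = {d for d in in_n if d[0] != var} | {(var, n)}
            if in_n != in_sets[n] or out_n != out_sets[n]:
                in_sets[n], out_sets[n] = in_n, out_n
                changed = True
    return in_sets, out_sets


def tainted_definitions(nodes, in_sets):
    """Mark definitions that come from a source or from a tainted use."""
    tainted = set()
    changed = True
    while changed:
        changed = False
        for n, info in nodes.items():
            var = info["defs"]
            if var is None or (var, n) in tainted or info.get("sanitizer"):
                continue
            from_source = info.get("source", False)
            from_use = any(
                d in tainted for d in in_sets[n] if d[0] in info["uses"]
            )
            if from_source or from_use:
                tainted.add((var, n))
                changed = True
    return tainted


def features_for(node, nodes, in_sets, tainted):
    """Count reaching and tainted definitions of the variables a node uses."""
    relevant = [d for d in in_sets[node] if d[0] in nodes[node]["uses"]]
    return {
        "reaching_defs": len(relevant),
        "tainted_defs": sum(d in tainted for d in relevant),
    }


if __name__ == "__main__":
    in_sets, _ = reaching_definitions(NODES)
    tainted = tainted_definitions(NODES, in_sets)
    print(features_for(5, NODES, in_sets, tainted))
```

In a pipeline of the kind the abstract outlines, feature dictionaries like these would be computed per statement and passed, together with vulnerable/patched labels, to a probabilistic classifier. The two counts used here are an assumed, deliberately small feature set.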
Original language: English
Title of host publication: Proceedings of the 13th International Conference on Availability, Reliability and Security
Place of Publication: New York, NY, USA
Publisher: ACM
Number of pages: 10
ISBN (Print): 978-1-4503-6448-5
DOIs: https://doi.org/10.1145/3230833.3230856
Publication status: Published - 2018
Event: 13th International Conference on Availability, Reliability and Security - Hamburg, Germany
Duration: 27 Aug 2018 to 30 Aug 2018
https://dl.acm.org/citation.cfm?doid=3230833.3230856

Conference

Conference: 13th International Conference on Availability, Reliability and Security
Abbreviated title: ARES 2018
Country: Germany
City: Hamburg
Period: 27/08/18 to 30/08/18

Fingerprint

Data flow analysis
Learning systems
Static analysis
Classifiers

Keywords

  • Software security, data-flow analysis, machine learning, static code analysis, vulnerability detection

Cite this

Kronjee, J., Hommersom, A., & Vranken, H. (2018). Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning. In Proceedings of the 13th International Conference on Availability, Reliability and Security [6]. New York, NY, USA: ACM. https://doi.org/10.1145/3230833.3230856