Discovering Software Vulnerabilities Using Data-flow Analysis and Machine Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

We present a novel method for static analysis in which we combine data-flow analysis with machine learning to detect SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities in PHP applications. We assembled a dataset from the National Vulnerability Database and the SAMATE project, containing vulnerable PHP code samples and their patched versions in which the vulnerability has been fixed. We extracted features from the code samples by applying data-flow analysis techniques, including reaching definitions analysis, taint analysis, and reaching constants analysis. We used these features to train various probabilistic classifiers. To demonstrate the effectiveness of our approach, we built a tool called WIRECAML and compared it to other tools for vulnerability detection in PHP code. Among the compared tools, ours performed best at detecting both SQLi and XSS vulnerabilities. We also applied our approach to a number of open-source software applications and found a previously unknown vulnerability in a photo-sharing web application.
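
As a rough illustration of the kind of data-flow features the abstract mentions, the following Python sketch runs a reaching definitions analysis and a simple taint propagation over a tiny hand-written control-flow graph. It is not WIRECAML's implementation: the `Stmt` record, the PHP snippet in the comments, the function names, and the two output features are all assumptions made for this example. In the approach described above, feature vectors of this general kind would be passed to a probabilistic classifier.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Stmt(NamedTuple):
    """One statement of a toy program (hypothetical representation)."""
    defines: Optional[str]          # variable assigned here, if any
    uses: frozenset = frozenset()   # variables read here
    is_source: bool = False         # reads user input, e.g. $_GET
    is_sanitizer: bool = False      # e.g. intval(...)


# Hypothetical PHP snippet, modelled by hand:
#   1: $id = $_GET['id'];
#   2: if (...)
#   3:     $id = intval($id);
#   4: $q = "SELECT ... WHERE id = " . $id;
#   5: mysql_query($q);              // the sink
STMTS = {
    1: Stmt("id", is_source=True),
    2: Stmt(None),
    3: Stmt("id", frozenset({"id"}), is_sanitizer=True),
    4: Stmt("q", frozenset({"id"})),
    5: Stmt(None, frozenset({"q"})),
}
BRANCHING_EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]  # sanitizer may be skipped
STRAIGHT_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5)]           # sanitizer always runs


def reaching_definitions(stmts, edges):
    """Return, per statement, the set of definitions that reach its entry."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    in_sets = {n: set() for n in stmts}
    out_sets = {n: set() for n in stmts}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for n, stmt in stmts.items():
            new_in = set().union(*(out_sets[p] for p in preds[n]))
            if stmt.defines is None:
                new_out = set(new_in)
            else:
                # Kill earlier definitions of the same variable, then add this one.
                new_out = {d for d in new_in if stmts[d].defines != stmt.defines} | {n}
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets


def tainted_definitions(stmts, reach_in):
    """Propagate taint from sources through definitions; sanitizers stop it."""
    tainted = {n for n, s in stmts.items() if s.is_source}
    changed = True
    while changed:
        changed = False
        for n, s in stmts.items():
            if n in tainted or s.is_sanitizer or s.defines is None:
                continue
            if any(d in tainted and stmts[d].defines in s.uses for d in reach_in[n]):
                tainted.add(n)
                changed = True
    return tainted


def sink_features(edges, sink=5, stmts=STMTS):
    """Build a small feature dict for the statement treated as the sink."""
    reach_in = reaching_definitions(stmts, edges)
    tainted = tainted_definitions(stmts, reach_in)
    relevant = [d for d in reach_in[sink] if stmts[d].defines in stmts[sink].uses]
    return {
        "reaching_defs": len(relevant),
        "tainted_defs": sum(d in tainted for d in relevant),
    }


if __name__ == "__main__":
    print("branching:", sink_features(BRANCHING_EDGES))
    print("straight: ", sink_features(STRAIGHT_EDGES))
```

When the sanitizing assignment sits on only one branch, the unsanitized definition of `$id` still reaches the query string, so the sink sees a tainted definition. When the sanitizer always runs, it kills the earlier definition and no taint reaches the sink.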
Original language: English
Title of host publication: Proceedings of the 13th International Conference on Availability, Reliability and Security
Place of Publication: New York, NY, USA
Publisher: ACM
Number of pages: 10
ISBN (Print): 978-1-4503-6448-5
Publication status: Published - 2018
Event: 13th International Conference on Availability, Reliability and Security - Hamburg, Germany
Duration: 27 Aug 2018 to 30 Aug 2018
https://dl.acm.org/citation.cfm?doid=3230833.3230856

Conference

Conference: 13th International Conference on Availability, Reliability and Security
Abbreviated title: ARES 2018
Country: Germany
City: Hamburg
Period: 27/08/18 to 30/08/18

Keywords

  • Software security, data-flow analysis, machine learning, static code analysis, vulnerability detection
