Detection of DGA-Generated Domain Names with TF-IDF

Harald Vranken, Hassan Alizadeh

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Botnets often apply domain name generation algorithms (DGAs) to evade detection by generating large numbers of pseudo-random domain names, of which only a few are registered by cybercriminals. In this paper, we address how DGA-generated domain names can be detected by means of machine learning and deep learning. We first present an extensive literature review of recent work in which machine learning and deep learning have been applied to detect DGA-generated domain names. We observe that a common methodology is still missing, and that the use of different datasets makes experimental results hard to compare. We next propose using TF-IDF to measure the frequencies of the most relevant n-grams in domain names, and using these as features in learning algorithms. We perform experiments with various machine-learning and deep-learning models using TF-IDF features, of which a deep MLP model yields the best results. For comparison, we also apply an LSTM model with an embedding layer that converts domain names from a sequence of characters into a vector representation. The performance of our LSTM and MLP models is rather similar, achieving 0.994 and 0.995 AUC, and average F1-scores of 0.907 and 0.891, respectively.
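To make the TF-IDF feature idea concrete, the following is a minimal sketch of how character n-grams of a domain name could be weighted with TF-IDF. It is not the authors' implementation: the n-gram range (2 to 3), the smoothed IDF variant, the L2 normalisation, the function names, and the toy domain labels are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter


def char_ngrams(name, n_min=2, n_max=3):
    """Return all character n-grams of a domain label for n in [n_min, n_max]."""
    name = name.lower()
    return [name[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(name) - n + 1)]


def fit_idf(names, n_min=2, n_max=3):
    """Compute smoothed inverse document frequencies over a corpus of names."""
    doc_freq = Counter()
    for name in names:
        # Count each n-gram once per name (document frequency).
        doc_freq.update(set(char_ngrams(name, n_min, n_max)))
    total = len(names)
    # Assumed variant: idf = ln((1 + N) / (1 + df)) + 1.
    return {g: math.log((1 + total) / (1 + df)) + 1.0
            for g, df in doc_freq.items()}


def tfidf_vector(name, idf, n_min=2, n_max=3):
    """Map one name to an L2-normalised sparse TF-IDF vector (a dict)."""
    # N-grams never seen while fitting the IDF table are ignored.
    counts = Counter(g for g in char_ngrams(name, n_min, n_max) if g in idf)
    weights = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {g: w / norm for g, w in weights.items()} if norm else {}


# Hypothetical toy corpus: three benign-looking and two random-looking labels.
corpus = ["google", "wikipedia", "openuniversity", "xkqzvbwj", "qpzmxnrt"]
idf = fit_idf(corpus)
vec = tfidf_vector("google", idf)
```

Vectors such as `vec` could then be fed to a classifier. In this sketch, n-grams that occur in many names receive a lower weight than n-grams that occur in few, which is the property that lets TF-IDF emphasise the more distinctive n-grams.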
Original language: English
Article number: 414
Number of pages: 28
Journal: Electronics
Volume: 11
Issue number: 3
Publication status: Published - 29 Jan 2022

Keywords

  • DGA
  • TF-IDF
  • botnet
  • deep learning
  • machine learning
