Abstract
Automated browsers (web bots) are an invaluable tool for studying the web. However, research has shown that web bots can be distinguished from regular browsers and that they may be served different content as a consequence. This undermines their utility as a measurement tool. So far, three methods have been used to detect web bots: browser fingerprint, order of site traversal, and aspects of page interaction.
While site traversal depends on the study being executed, the other two aspects can be controlled in a generic fashion. Whereas the identifiability of web bot fingerprints has been studied in the past, how to alter the fingerprint has received less attention. In this paper, we first study which method of altering the fingerprint incurs the fewest side effects. Second, we provide an initial investigation of how the interaction API of Selenium differs from human interaction. We incorporate the latter results into HLISA, an API that simulates human-like interaction. Finally, we discuss the conceptual arms race between simulators and detectors and find that, conceptually, detecting HLISA requires modelling human interaction.
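To illustrate the kind of difference between scripted and human interaction that the abstract refers to, the following is a minimal sketch, not HLISA's actual implementation. It contrasts an evenly spaced straight-line cursor path with a curved, jittered path that accelerates and decelerates. The function names, the Bezier curve, the easing function, and all numeric parameters (bend of up to 20% of the distance, jitter of 0.5 px) are assumptions chosen for illustration only.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def straight_path(start: Point, end: Point, steps: int) -> List[Point]:
    """Evenly spaced points on a straight line (bot-like movement)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [
        (start[0] + (end[0] - start[0]) * i / steps,
         start[1] + (end[1] - start[1]) * i / steps)
        for i in range(steps + 1)
    ]


def humanlike_path(start: Point, end: Point, steps: int,
                   rng: random.Random) -> List[Point]:
    """Curved path with slow start/end and small jitter (illustrative only)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    dx, dy = end[0] - start[0], end[1] - start[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return [start] * (steps + 1)

    # Control point: midpoint shifted perpendicular to the line by a random
    # amount (assumed: up to 20% of the travelled distance).
    bend = rng.uniform(-0.2, 0.2) * dist
    cx = (start[0] + end[0]) / 2 + (-dy / dist) * bend
    cy = (start[1] + end[1]) / 2 + (dx / dist) * bend

    points: List[Point] = []
    for i in range(steps + 1):
        t = i / steps
        # Ease-in-out: small steps near both ends, larger steps in the middle.
        s = 0.5 - 0.5 * math.cos(math.pi * t)
        # Quadratic Bezier curve through the control point.
        x = (1 - s) ** 2 * start[0] + 2 * (1 - s) * s * cx + s ** 2 * end[0]
        y = (1 - s) ** 2 * start[1] + 2 * (1 - s) * s * cy + s ** 2 * end[1]
        if 0 < i < steps:
            # Jitter interior points only, so the endpoints stay exact.
            x += rng.gauss(0, 0.5)
            y += rng.gauss(0, 0.5)
        points.append((x, y))
    return points


def step_lengths(path: List[Point]) -> List[float]:
    """Distance between each pair of consecutive points."""
    return [math.hypot(b[0] - a[0], b[1] - a[1])
            for a, b in zip(path, path[1:])]
```

Comparing `step_lengths` for the two paths shows the property a detector could exploit: the straight path has constant step lengths, while the human-like path does not. In a real crawler, consecutive points could be replayed as relative mouse moves through the browser automation API in use.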
Original language | English |
---|---|
Title of host publication | IMC '21 |
Subtitle of host publication | Proceedings of the 21st ACM Internet Measurement Conference |
Pages | 380–389 |
Number of pages | 10 |
Publication status | Published - Nov 2021 |
Event | The 21st ACM Internet Measurement Conference - Online, ACM, New York, United States. Duration: 2 Nov 2021 → 4 Nov 2021. https://conferences.sigcomm.org/imc/2021/ |
Conference
Conference | The 21st ACM Internet Measurement Conference |
---|---|
Abbreviated title | IMC '21 |
Country/Territory | United States |
City | New York |
Period | 2 Nov 2021 → 4 Nov 2021 |
Keywords
- behavioural recognition
- browser fingerprinting
- reliability
- web bot detection
- web studies