How gullible are web measurement tools? A case study analysing and strengthening OpenWPM's reliability

Research output: Chapter in Book/Report/Conference proceeding › Conference Article in proceeding › Academic › peer-review

Abstract

Automated browsers are widely used to study the web at scale. Their premise is that they measure what regular browsers would encounter on the web. In practice, deviations due to detection of automation have been found. To what extent automated browsers can be improved to reduce such deviations has so far not been investigated in detail. In this paper, we investigate this for a specific web automation framework: OpenWPM, a popular research framework specifically designed to study web privacy. We analyse (1) detectability of OpenWPM, (2) resilience of OpenWPM's data recording, and (3) prevalence of OpenWPM detection. Our analysis (1) reveals OpenWPM is easily detectable. Our investigation of OpenWPM's data recording integrity (2) identifies novel evasion techniques and previously unknown attacks against OpenWPM's instrumentation. We investigate and develop mitigations to address the identified issues. Finally, in a scan of 100,000 sites (3), we observe that OpenWPM is commonly detected (∼14% of front pages). Moreover, we discover integrated routines in scripts specifically to detect OpenWPM clients. In conclusion, our case study shows that even the most popular web measurement framework, OpenWPM, is more gullible than expected, and this gullibility is rarely accounted for in studies.

Original language: English
Title of host publication: CoNEXT '22: Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies
Editors: Giuseppe Bianchi, Alessandro Mei
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 171-186
Number of pages: 16
ISBN (Electronic): 9781450395083
Publication status: Published - 30 Nov 2022
Event: 18th International Conference on emerging Networking EXperiments and Technologies - Roma, Italy
Duration: 6 Dec 2022 to 9 Dec 2022
Conference number: 18
https://conferences2.sigcomm.org/co-next/2022/#!/home

Conference

Conference: 18th International Conference on emerging Networking EXperiments and Technologies
Abbreviated title: CoNEXT '22
Country/Territory: Italy
City: Roma
Period: 6/12/22 to 9/12/22

Keywords

  • bot detection
  • privacy
  • reliability
  • security
  • web bots
  • web measurements
