TICK TOCK, THE CLOCK IS TICKING: ON THE FINE-TUNING OF MACHINE LEARNING MODELS FOR OFFENSIVE CONTENT CLASSIFICATION ON TIKTOK

  • K Cools

Student thesis: Master's Thesis

Abstract

The rise of social media and the prevalence of technology have changed how young people, particularly Generation Z, interact and consume information. Social media platforms like TikTok have also given extremist groups an opportunity to exploit these platforms to disseminate their propaganda and recruit new members. As a result, online propaganda, recruitment, and radicalization tactics have become a challenge for governments, organizations, and social media platforms. These actors must now devise ways to identify and block radicalizing content to prevent the spread of these dangerous ideologies.
Brenton Tarrant, a self-radicalized individual, carried out a deadly attack in 2019, killing 51 people and injuring 40 others at two mosques in New Zealand. Tarrant, known as the Christchurch attacker, broadcast his shooting spree via a livestream on Facebook and promoted it in his manifesto (“The Great Replacement”).
Despite TikTok’s efforts to remove videos that contain or promote violent extremism, white supremacist content remains prevalent on the platform. A prime example is the frequent reference to Tarrant’s manifesto and ideology, through which TikTok users allude to the Christchurch terrorist attack.
This research addresses the ongoing challenges that social media platforms such as TikTok face in detecting radicalizing content. Its scope has been broadened from extremist content to offensive content in order to encompass the overarching process of radicalization. The research outlines the process of compiling a TikTok-specific dataset, as well as the development of a series of computational machine learning models to detect offensive language on the platform.
This research outlines the process of fine-tuning large language classification models and machine learning models to detect offensive language on TikTok. Several models with different fine-tuning configurations were developed, and the performance of both the large language classification models and the machine learning classification models was compared against their respective baseline models. The results show that the fine-tuned large language classification models outperformed the fine-tuned machine learning models. Additionally, the generalization capacity of the models is assessed by evaluating their performance on an unseen dataset derived from previous research by Waseem and Hovy [2016a] and Davidson et al. [2017b].
The collected data and the scripts for training the computational models associated with this research have been made available on GitHub in an effort to contribute to this area of research.
Date of Award: 9 Jun 2023
Original language: English
Supervisor: Gideon Maillette de Buij Wenniger (Examiner) & Clara Maathuis (Co-assessor)

Master's Degree

  • Master Computer Science
