Active Learning for Reducing Labeling Effort in Text Classification Tasks

Pieter Floris Jacobs*, Gideon Maillette de Buy Wenniger*, Marco Wiering*, Lambert Schomaker*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference Article in proceeding; Academic; peer-review

Abstract

Labeling data can be an expensive task, as it is usually performed manually by domain experts. This is a burden for deep learning, which depends on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by labeling only the data that the model deems most informative. Little research has been done on AL in a text classification setting, and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms, with BERTbase as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL, namely that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Although the proposed heuristics did not improve the performance of AL, our results show that uncertainty-based AL with BERTbase outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger.
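The selection step that the abstract contrasts with random sampling can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes maximum predictive entropy as the uncertainty measure (the abstract does not say which uncertainty-based algorithms were compared), and the names `predictive_entropy`, `select_queries`, and the probability values in `pool` are hypothetical. In practice the probabilities would come from the classifier's softmax output on the unlabeled pool.

```python
import math
import random


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, query_size, strategy="entropy", seed=0):
    """Return sorted indices of unlabeled pool items to send for labeling.

    pool_probs: per-example class-probability lists from the classifier.
    query_size: number of examples labeled per round (the query-pool size).
    strategy:   "entropy" for uncertainty sampling, "random" for the baseline.
    """
    indices = list(range(len(pool_probs)))
    if strategy == "random":
        # Baseline: draw examples uniformly, ignoring the model's predictions.
        return sorted(random.Random(seed).sample(indices, query_size))
    # Uncertainty sampling: rank by predictive entropy, most uncertain first.
    ranked = sorted(
        indices,
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return sorted(ranked[:query_size])


# Hypothetical softmax outputs for five unlabeled sentences (binary sentiment).
pool = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
    [0.90, 0.10],
]

chosen = select_queries(pool, query_size=2)
print(chosen)  # indices of the two predictions closest to 50/50
```

With a query size of 2, the sketch picks the two examples whose predictions are closest to uniform. Raising `query_size` toward the size of the pool makes the entropy-based selection and the random baseline overlap more and more, which is one intuitive reading of why the gap between them can shrink for larger query pools.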

Original language: English
Title of host publication: Artificial Intelligence and Machine Learning
Subtitle of host publication: 33rd Benelux Conference on Artificial Intelligence, BNAIC/Benelearn 2021, Esch-sur-Alzette, Luxembourg, November 10–12, 2021, Revised Selected Papers
Editors: Luis A. Leiva, Cédric Pruski, Réka Markovich, Amro Najjar, Christoph Schommer
Publisher: Springer, Cham
Pages: 3-29
Number of pages: 27
Edition: 1
ISBN (Electronic): 978-3-030-93842-0
ISBN (Print): 9783030938413
Publication status: Published - 12 Jan 2022
Event: 33rd Benelux Conference on Artificial Intelligence, BNAIC/BENELEARN 2021 - Esch-sur-Alzette, Luxembourg
Duration: 10 Nov 2021 – 12 Nov 2021
https://bnaic2021.uni.lu/bnaic-benelearn/

Publication series

Series: Communications in Computer and Information Science
Volume: 1530 CCIS
ISSN: 1865-0929

Conference

Conference: 33rd Benelux Conference on Artificial Intelligence, BNAIC/BENELEARN 2021
Country/Territory: Luxembourg
City: Esch-sur-Alzette
Period: 10/11/21 – 12/11/21

Keywords

  • Active Learning
  • BERT
  • Deep Learning
  • Text classification
