Coding energy knowledge in constructed responses with explainable NLP models

Sebastian Gombert*, Daniele Di Mitri, Onur Karademir, Marcus Kubsch, Hannah Kolbe, Simon Tautz, Adrian Grimm, Isabell Bohm, Knut Neumann, Hendrik Drachsler

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Background: Formative assessments are needed to monitor how student knowledge
develops throughout a unit. Constructed response items, which require learners to
formulate their own free-text responses, are well suited for testing their active
knowledge. However, assessing such constructed responses automatically is a
complex task and requires natural language processing methodology. In this article,
we implement and evaluate multiple machine learning models for coding energy
knowledge in free-text responses of German K-12 students to items in formative
science assessments conducted during synchronous online learning sessions.
Dataset: The dataset we collected for this purpose consists of German constructed
responses to 38 different items dealing with aspects of energy such as its
manifestation and transformation. The units and items were developed using
project-based pedagogy and evidence-centered design, and the responses were coded
for seven core ideas concerning the manifestation and transformation of energy. The
data was collected from students in seventh, eighth and ninth grade.
Methodology: We train various transformer- and feature-based models and compare
their ability to recognize the respective ideas in students' writing. Moreover, as
domain knowledge and its development can be formally modeled through knowledge
networks, we evaluate how well the detection of ideas within responses translates
into accurate co-occurrence-based knowledge networks. Finally, in terms of the
descriptive accuracy of our models, we inspect which features played a role for which
prediction outcome and whether the models pick up on undesired shortcuts. In
addition, we analyze how closely the models match human coders in which evidence
within responses they consider important for their coding decisions.
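The co-occurrence-based construction mentioned above is not spelled out in the abstract; the following is a minimal sketch of one common approach, assuming each coded response is represented as the set of idea labels detected in it, so that every pair of ideas appearing in the same response contributes one unit of edge weight. All idea labels in the example are hypothetical placeholders, not the paper's actual coding scheme.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_network(coded_responses):
    """Aggregate per-response idea codes into weighted network edges.

    coded_responses: iterable of sets, each holding the idea labels
    detected in one student response.
    Returns a Counter mapping sorted idea pairs to the number of
    responses in which both ideas occur together.
    """
    edges = Counter()
    for ideas in coded_responses:
        # every unordered pair of ideas in one response adds weight 1
        for pair in combinations(sorted(ideas), 2):
            edges[pair] += 1
    return edges

# Hypothetical idea labels for illustration only
responses = [
    {"kinetic", "thermal"},
    {"kinetic", "thermal", "transformation"},
    {"transformation"},
]
network = cooccurrence_network(responses)
# edge ("kinetic", "thermal") has weight 2
```

The resulting edge weights can then be compared between networks built from model predictions and from human codings to judge how well automated coding preserves the network structure.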
Results: A model based on a modified GBERT-large achieves the overall most
promising results, although descriptive accuracy varies much more than predictive
accuracy across the different ideas assessed. For reasons of comparability, we also
evaluate the same machine learning architecture on the SciEntsBank 3-Way benchmark.
Original language: English
Pages (from-to): 767-786
Number of pages: 20
Journal: Journal of Computer Assisted Learning
Issue number: 3
Early online date: 15 Dec 2022
Publication status: Published - Jun 2023


  • automated coding
  • constructed response assessment
  • energy didactics
  • energy transformation
  • knowledge networks
  • short answer scoring


