This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learner’s vocal intonations and facial expressions in order to foster their learning. Whereas the facial emotion recognition part has been successfully tested in a previous study, the here presented study describes the development and testing of FILTWAM's vocal emotion recognition software artefact. The main goal of this study was to show the valid use of computer microphone data for real-time and adequate interpretation of vocal intonations into extracted emotional states. The software that was developed was tested in a study with twelve participants. All participants individually received the same computer-based tasks in which they were requested eighty times to mimic specific vocal expressions (960 occurrences in total). Each individual session was recorded on video. For the validation of the voice emotion recognition software artefact, two experts annotated and rated participants' recorded behaviours. Expert findings were then compared with the software recognition results and showed an overall accuracy of Kappa of 0.743. The overall accuracy of the voice emotion recognition software artefact is 67% based on the requested emotions and the recognized emotions. Our FILTWAM-software allows to continually and unobtrusively observing learners’ behaviours and transforms these behaviours into emotional states. This paves the way for unobtrusive and real-time capturing of learners' emotional states for enhancing adaptive e-learning approaches.
- Speech interaction
- Affective computing
- Speech emotion recognition
- Real-time software development
- Evaluation methodology
- Empirical study of user behaviour