Time and Semantic Similarity – What is the Best Alternative to Capture Implicit Links in CSCL Conversations?

Gabriel Gutu, Mihai Dascalu, Traian Rebedea, Stefan Trausan-Matu

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

The goal of our research is to compare novel semantic techniques for identifying implicit links between utterances in multi-participant CSCL chat conversations. Cohesion, reflected by the strength of the semantic relations behind the automatically identified links, is assessed using WordNet-based semantic distances, as well as unsupervised semantic models, namely Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). The analysis is built on top of the ReaderBench framework, and multiple identification heuristics were compared, including semantic cohesion metrics, normalized cohesion measures, and Mihalcea’s formula. Validation used a corpus of 55 conversations in which participants added explicit links between utterances wherever they considered them necessary for clarity. Our study represents an in-depth analysis of multiple methods used to identify implicit links and reveals the accuracy of each technique in terms of capturing the explicit references made by users. Statistical similarity measures ensured the best overall identification accuracy when using Mihalcea’s formula, while WordNet-based techniques provided the best results for un-normalized similarity scores applied on a window of 5 utterances and a time frame of 1 minute.
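The window-and-time-frame heuristic mentioned in the abstract can be illustrated with a minimal sketch. This is not the ReaderBench implementation: the `Utterance` class, the `find_implicit_link` function, and the `bow_cosine` similarity (a plain bag-of-words cosine standing in for the WordNet, LSA, or LDA measures) are all hypothetical names introduced for illustration. The defaults of 5 utterances and 60 seconds mirror the window and time frame named in the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Optional


@dataclass
class Utterance:
    timestamp: float  # seconds since the start of the chat (assumed unit)
    text: str


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over raw word counts.

    Assumption: a simple stand-in for a real semantic model
    (WordNet distance, LSA, or LDA), used only to keep the sketch runnable.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def find_implicit_link(
    utterances: List[Utterance],
    current: int,
    similarity: Callable[[str, str], float] = bow_cosine,
    max_window: int = 5,
    max_seconds: float = 60.0,
) -> Optional[int]:
    """Return the position of the most similar earlier utterance, or None.

    Only the previous `max_window` utterances that were posted at most
    `max_seconds` before the current one are considered as candidates.
    """
    target = utterances[current]
    best_position, best_score = None, 0.0
    for position in range(max(0, current - max_window), current):
        candidate = utterances[position]
        if target.timestamp - candidate.timestamp > max_seconds:
            continue  # outside the time frame
        score = similarity(target.text, candidate.text)
        if score > best_score:
            best_position, best_score = position, score
    return best_position


chat = [
    Utterance(0.0, "wikis are good for collaborative editing"),
    Utterance(20.0, "i prefer blogs for personal posts"),
    Utterance(45.0, "hello everyone"),
    Utterance(55.0, "collaborative editing in wikis can be messy"),
]
print(find_implicit_link(chat, 3))
```

Passing a different `similarity` callable is how one would swap in another measure; comparing the predicted position against a participant's explicit link is, roughly, the kind of accuracy check the abstract describes.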
Original language: English
Title of host publication: Making a Difference: Prioritizing Equity and Access in CSCL
Subtitle of host publication: 12th International Conference on Computer Supported Collaborative Learning, Volume 1
Editors: Brian K. Smith, Marcela Borge, Emma Mercier, Kyu Yon Lim
Publisher: International Society of the Learning Sciences
Pages: 223-230
ISBN (Print): 978-0-9903550-0-7
Publication status: Published - 22 Jun 2017
Externally published: Yes
Event: Making a Difference: Prioritizing Equity and Access in CSCL: 12th International Conference on Computer Supported Collaborative Learning - Philadelphia, United States
Duration: 18 Jun 2017 to 21 Jun 2017
https://cscl17.wordpress.com/

Conference

Conference: Making a Difference: Prioritizing Equity and Access in CSCL
Abbreviated title: CSCL 2017
Country: United States
City: Philadelphia
Period: 18/06/17 to 21/06/17

Cite this

Gutu, G., Dascalu, M., Rebedea, T., & Trausan-Matu, S. (2017). Time and Semantic Similarity – What is the Best Alternative to Capture Implicit Links in CSCL Conversations? In B. K. Smith, M. Borge, E. Mercier, & K. Y. Lim (Eds.), Making a Difference: Prioritizing Equity and Access in CSCL: 12th International Conference on Computer Supported Collaborative Learning, Volume 1 (pp. 223-230). International Society of the Learning Sciences.
@inproceedings{386f7ecddb954a89bcb66a2e51c353e5,
title = "Time and Semantic Similarity – What is the Best Alternative to Capture Implicit Links in CSCL Conversations?",
author = "Gabriel Gutu and Mihai Dascalu and Traian Rebedea and Stefan Trausan-Matu",
year = "2017",
month = "6",
day = "22",
language = "English",
isbn = "978-0-9903550-0-7",
pages = "223--230",
editor = "Smith, {Brian K.} and Marcela Borge and Emma Mercier and Lim, {Kyu Yon}",
booktitle = "Making a Difference: Prioritizing Equity and Access in CSCL",
publisher = "International Society of the Learning Sciences",

}


TY - GEN

T1 - Time and Semantic Similarity – What is the Best Alternative to Capture Implicit Links in CSCL Conversations?

AU - Gutu, Gabriel

AU - Dascalu, Mihai

AU - Rebedea, Traian

AU - Trausan-Matu, Stefan

PY - 2017/6/22

Y1 - 2017/6/22

M3 - Conference article in proceeding

SN - 978-0-9903550-0-7

SP - 223

EP - 230

BT - Making a Difference: Prioritizing Equity and Access in CSCL

A2 - Smith, Brian K.

A2 - Borge, Marcela

A2 - Mercier, Emma

A2 - Lim, Kyu Yon

PB - International Society of the Learning Sciences

ER -
