Unlocking the Power of Word2Vec for Identifying Implicit Links

Gabriel Gutu, Mihai Dascalu, Stefan Ruseti, Traian Rebedea, Stefan Trausan-Matu

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

This paper presents research on using Word2Vec to determine implicit links in multi-participant Computer-Supported Collaborative Learning chat conversations. Word2Vec is a powerful and recent Natural Language Processing semantic model used for computing text cohesion and similarity between documents. This research treats the cohesion score as the strength of the semantic relation established between two utterances: the higher the score, the stronger the similarity between the utterances. An implicit link is established, based on cohesion, to the most similar previous utterance within an imposed window. Three similarity formulas were used to compute the cohesion score: an unnormalized score, a score normalized by distance, and Mihalcea’s formula. Our corpus of conversations incorporated explicit references provided by the authors, which were used for validation. A window of 5 utterances and a 1-minute time frame provided the highest detection rate, both for exact matching and for matching a block of continuous utterances belonging to the same speaker. Moreover, the unnormalized score correctly identified the largest number of implicit links.
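The linking procedure the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy word vectors, the averaging of word vectors into an utterance vector, and all helper names are assumptions made here; only the cosine-based (unnormalized) cohesion score, the 5-utterance window, and the 1-minute time frame come from the paper.

```python
import math

# Toy 3-dimensional word vectors standing in for a trained Word2Vec model
# (the values below are invented purely for illustration).
VECTORS = {
    "hello":    [0.9, 0.1, 0.1],
    "chat":     [1.0, 0.2, 0.0],
    "learning": [0.1, 1.0, 0.3],
    "model":    [0.1, 0.4, 0.9],
    "word2vec": [0.0, 0.3, 1.0],
}

def utterance_vector(tokens):
    """Average the vectors of known tokens (a common way to embed an utterance)."""
    vecs = [VECTORS[t] for t in tokens if t in VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cohesion(tokens_a, tokens_b):
    """Unnormalized cohesion score: cosine similarity of the utterance vectors."""
    a, b = utterance_vector(tokens_a), utterance_vector(tokens_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def implicit_link(utterances, i, window=5, time_frame=60.0):
    """Link utterance i to its most cohesive predecessor inside the window.

    `utterances` is a list of (timestamp_seconds, tokens) pairs; the defaults
    mirror the paper's best setting (5 utterances, 1-minute time frame).
    """
    t_i, tokens_i = utterances[i]
    best, best_score = None, -1.0
    for j in range(max(0, i - window), i):
        t_j, tokens_j = utterances[j]
        if t_i - t_j > time_frame:
            continue  # predecessor falls outside the time frame
        score = cohesion(tokens_i, tokens_j)
        if score > best_score:
            best, best_score = j, score
    return best, best_score
```

For example, in a three-utterance chat, `implicit_link(chat, 2)` links the utterance `["word2vec", "model"]` back to `["learning", "model"]` rather than to an unrelated greeting, because their averaged vectors are more similar.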
Original language: English
Title of host publication: 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT)
Subtitle of host publication: Advanced Technologies for Supporting Open Access to Formal and Informal Learning
Publisher: IEEE
Pages: 199-200
Publication status: Published - Jul 2017
Externally published: Yes
Event: 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT), Timisoara, Romania
Duration: 3 Jul 2017 - 7 Jul 2017
Internet address: https://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?&filter=issueId%20EQ%20%228001692%22&searchWithin=unlocking&pageNumber=1&resultAction=REFINE

Conference

Conference: 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT)
Abbreviated title: ICALT 2017
Country: Romania
City: Timisoara
Period: 3/07/17 - 7/07/17

Keywords

  • implicit links
  • CSCL
  • Word2Vec
  • text cohesion
  • semantic models

Cite this

Gutu, G., Dascalu, M., Ruseti, S., Rebedea, T., & Trausan-Matu, S. (2017). Unlocking the Power of Word2Vec for Identifying Implicit Links. In 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT): Advanced Technologies for Supporting Open Access to Formal and Informal Learning (pp. 199-200). IEEE.
Gutu, Gabriel ; Dascalu, Mihai ; Ruseti, Stefan ; Rebedea, Traian ; Trausan-Matu, Stefan. / Unlocking the Power of Word2Vec for Identifying Implicit Links. 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT): Advanced Technologies for Supporting Open Access to Formal and Informal Learning. IEEE, 2017. pp. 199-200
@inproceedings{0f8e4fe350ae48868407512eea1bf5c8,
title = "Unlocking the Power of Word2Vec for Identifying Implicit Links",
abstract = "This paper presents research on using Word2Vec to determine implicit links in multi-participant Computer-Supported Collaborative Learning chat conversations. Word2Vec is a powerful and recent Natural Language Processing semantic model used for computing text cohesion and similarity between documents. This research treats the cohesion score as the strength of the semantic relation established between two utterances: the higher the score, the stronger the similarity between the utterances. An implicit link is established, based on cohesion, to the most similar previous utterance within an imposed window. Three similarity formulas were used to compute the cohesion score: an unnormalized score, a score normalized by distance, and Mihalcea’s formula. Our corpus of conversations incorporated explicit references provided by the authors, which were used for validation. A window of 5 utterances and a 1-minute time frame provided the highest detection rate, both for exact matching and for matching a block of continuous utterances belonging to the same speaker. Moreover, the unnormalized score correctly identified the largest number of implicit links.",
keywords = "implicit links, CSCL, Word2Vec, text cohesion, semantic models",
author = "Gabriel Gutu and Mihai Dascalu and Stefan Ruseti and Traian Rebedea and Stefan Trausan-Matu",
year = "2017",
month = "7",
language = "English",
pages = "199--200",
booktitle = "2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT)",
publisher = "IEEE",
address = "United States",

}

Gutu, G, Dascalu, M, Ruseti, S, Rebedea, T & Trausan-Matu, S 2017, Unlocking the Power of Word2Vec for Identifying Implicit Links. in 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT): Advanced Technologies for Supporting Open Access to Formal and Informal Learning. IEEE, pp. 199-200, 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT), Timisoara, Romania, 3/07/17.

Unlocking the Power of Word2Vec for Identifying Implicit Links. / Gutu, Gabriel; Dascalu, Mihai; Ruseti, Stefan; Rebedea, Traian; Trausan-Matu, Stefan.

2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT): Advanced Technologies for Supporting Open Access to Formal and Informal Learning. IEEE, 2017. p. 199-200.

TY - GEN

T1 - Unlocking the Power of Word2Vec for Identifying Implicit Links

AU - Gutu, Gabriel

AU - Dascalu, Mihai

AU - Ruseti, Stefan

AU - Rebedea, Traian

AU - Trausan-Matu, Stefan

PY - 2017/7

Y1 - 2017/7

N2 - This paper presents research on using Word2Vec to determine implicit links in multi-participant Computer-Supported Collaborative Learning chat conversations. Word2Vec is a powerful and recent Natural Language Processing semantic model used for computing text cohesion and similarity between documents. This research treats the cohesion score as the strength of the semantic relation established between two utterances: the higher the score, the stronger the similarity between the utterances. An implicit link is established, based on cohesion, to the most similar previous utterance within an imposed window. Three similarity formulas were used to compute the cohesion score: an unnormalized score, a score normalized by distance, and Mihalcea’s formula. Our corpus of conversations incorporated explicit references provided by the authors, which were used for validation. A window of 5 utterances and a 1-minute time frame provided the highest detection rate, both for exact matching and for matching a block of continuous utterances belonging to the same speaker. Moreover, the unnormalized score correctly identified the largest number of implicit links.

KW - implicit links

KW - CSCL

KW - Word2Vec

KW - text cohesion

KW - semantic models

M3 - Conference article in proceeding

SP - 199

EP - 200

BT - 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT)

PB - IEEE

ER -

Gutu G, Dascalu M, Ruseti S, Rebedea T, Trausan-Matu S. Unlocking the Power of Word2Vec for Identifying Implicit Links. In 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT): Advanced Technologies for Supporting Open Access to Formal and Informal Learning. IEEE. 2017. p. 199-200