Combining Taxonomies using Word2vec

Tobias Swoboda, Matthias Hemmje, Mihai Dascalu, Stefan Trausan-Matu

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceeding; Academic; peer-review

Abstract

Taxonomies have gained broad usage in a variety of fields because they are extensible and well suited to classification and knowledge organization. Of particular interest is the digital document management domain, in which their hierarchical structure can be used to organize documents into content-specific categories. Common or standard taxonomies (e.g., the ACM Computing Classification System) contain concepts that are too general for conceptualizing specific knowledge domains. In this paper we introduce a novel automated approach that uses Natural Language Processing techniques to combine sub-trees from general taxonomies with specialized seed taxonomies. We provide an extensible and generalizable model for combining taxonomies in the practical context of two very large European research projects. Because manually combining taxonomies is a highly time-consuming task for domain experts, our model instead measures the semantic relatedness between concept labels in CBOW or skip-gram Word2vec vector spaces. Topic labels are matched and combined by a greedy algorithm with incremental thresholds, and a preliminary quantitative evaluation of the resulting taxonomies is reported.
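The label-matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors in `EMBEDDINGS` are invented stand-ins for a trained CBOW or skip-gram model, multi-word labels are assumed to be represented by the average of their word vectors, and the helpers `label_vector`, `cosine` and `greedy_match` are hypothetical names. For a given threshold, the matcher accepts the most similar remaining seed/general label pair first, uses each label at most once, and ignores pairs whose cosine similarity falls below the threshold; running it over a series of thresholds mimics the incremental-threshold evaluation.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]

# Hypothetical toy vectors standing in for a trained CBOW or skip-gram model.
EMBEDDINGS: Dict[str, Vector] = {
    "machine": [1.0, 0.0, 0.0],
    "learning": [0.8, 0.2, 0.0],
    "algorithms": [0.9, 0.1, 0.1],
    "document": [0.0, 1.0, 0.0],
    "information": [0.1, 0.9, 0.0],
    "retrieval": [0.0, 0.8, 0.2],
    "computer": [0.3, 0.3, 0.3],
    "graphics": [0.0, 0.0, 1.0],
}
# Hypothetical concept labels from a seed taxonomy and a general taxonomy.
SEED_LABELS = ["machine learning", "document retrieval"]
GENERAL_LABELS = ["learning algorithms", "information retrieval", "computer graphics"]


def label_vector(label: str, embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of a label's in-vocabulary words (assumed representation)."""
    vecs = [embeddings[w] for w in label.lower().split() if w in embeddings]
    if not vecs:
        return None  # no known words, so the label cannot be compared
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, a common relatedness score in Word2vec spaces."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_match(
    seed: Sequence[str],
    general: Sequence[str],
    embeddings: Dict[str, Vector],
    threshold: float,
) -> Dict[str, Tuple[str, float]]:
    """Hypothetical greedy matcher: best-scoring pairs first, each label used once."""
    vecs = {label: label_vector(label, embeddings) for label in (*seed, *general)}
    candidates = []
    for s in seed:
        for g in general:
            vs, vg = vecs[s], vecs[g]
            if vs is None or vg is None:
                continue  # skip labels with no known words
            sim = cosine(vs, vg)
            if sim >= threshold:
                candidates.append((sim, s, g))
    matches: Dict[str, Tuple[str, float]] = {}
    used = set()
    # Highest similarity first; a pair is accepted only if both labels are still free.
    for sim, s, g in sorted(candidates, reverse=True):
        if s not in matches and g not in used:
            matches[s] = (g, sim)
            used.add(g)
    return matches


if __name__ == "__main__":
    # Sweep increasing thresholds and report how many label pairs survive each one.
    for t in (0.5, 0.9, 0.997, 0.999):
        found = greedy_match(SEED_LABELS, GENERAL_LABELS, EMBEDDINGS, t)
        print(t, len(found), {s: g for s, (g, _) in found.items()})
```

With these toy vectors the sweep reports 2, 2, 1 and 0 matches: both seed labels find their obvious counterparts at lenient thresholds, and the stricter thresholds progressively reject them. The sketch covers only label matching; grafting the matched general sub-trees into the seed taxonomy is not shown.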
Original language: English
Title of host publication: Proceedings of the 2016 ACM Symposium on Document Engineering
Publisher: Association for Computing Machinery (ACM)
Pages: 131-134
ISBN (Print): 978-1-4503-4438-8
DOI: 10.1145/2960811.2967151
Publication status: Published - 27 Sep 2016
Externally published: Yes
Event: 2016 ACM Symposium on Document Engineering - Vienna, Austria
Duration: 13 Sep 2016 to 16 Sep 2016
https://dl.acm.org/citation.cfm?id=2960811

Conference

Conference: 2016 ACM Symposium on Document Engineering
Country: Austria
City: Vienna
Period: 13/09/16 to 16/09/16

Keywords

  • Word2Vec
  • taxonomy integration
  • ontology alignment
  • automated semantic integration

Cite this

Swoboda, T., Hemmje, M., Dascalu, M., & Trausan-Matu, S. (2016). Combining Taxonomies using Word2vec. In Proceedings of the 2016 ACM Symposium on Document Engineering (pp. 131-134). Association for Computing Machinery (ACM). https://doi.org/10.1145/2960811.2967151
@inproceedings{f4160dbf621a464ca8821696f2c9fe25,
title = "Combining Taxonomies using Word2vec",
keywords = "Word2Vec, taxonomy integration, ontology alignment, automated semantic integration",
author = "Tobias Swoboda and Matthias Hemmje and Mihai Dascalu and Stefan Trausan-Matu",
year = "2016",
month = "9",
day = "27",
doi = "10.1145/2960811.2967151",
language = "English",
isbn = "978-1-4503-4438-8",
pages = "131--134",
booktitle = "Proceedings of the 2016 ACM Symposium on Document Engineering",
publisher = "Association for Computing Machinery (ACM)",
address = "United States",

}