Abstract
Taxonomies have gained broad usage in a variety of fields due to their extensibility, as well as their use for classification and knowledge organization. Of particular interest is the digital document management domain, in which their hierarchical structure can be effectively employed to organize documents into content-specific categories. Common or standard taxonomies (e.g., the ACM Computing Classification System) contain concepts that are too general for conceptualizing specific knowledge domains. In this paper we introduce a novel automated approach that combines sub-trees from general taxonomies with specialized seed taxonomies by using specific Natural Language Processing techniques. We provide an extensible and generalizable model for combining taxonomies in the practical context of two very large European research projects. Because the manual combination of taxonomies by domain experts is a highly time-consuming task, our model automates it by measuring the semantic relatedness between concept labels in CBOW or skip-gram Word2vec vector spaces. A preliminary quantitative evaluation of the resulting taxonomies is performed after applying a greedy algorithm with incremental thresholds used for matching and combining topic labels.
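The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the algorithm's details, so the step-by-step relaxation of the threshold, the function names `cosine` and `greedy_match`, the labels, and the three-dimensional `TOY_VECTORS` (a stand-in for real CBOW or skip-gram Word2vec embeddings) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match(seed_labels, general_labels, vectors, thresholds):
    """Greedily pair seed labels with general labels.

    Assumed reading of "incremental thresholds": thresholds are tried
    from strictest to loosest, and at each step the most similar
    still-unmatched pairs at or above the threshold are accepted first.
    """
    matches = []
    free_seed = set(seed_labels)
    free_general = set(general_labels)
    for t in sorted(thresholds, reverse=True):
        # Score every remaining pair, best first.
        candidates = sorted(
            ((cosine(vectors[s], vectors[g]), s, g)
             for s in free_seed for g in free_general),
            reverse=True,
        )
        for sim, s, g in candidates:
            if sim < t:
                break  # all remaining pairs are below this threshold
            if s in free_seed and g in free_general:
                matches.append((s, g, round(sim, 3), t))
                free_seed.discard(s)
                free_general.discard(g)
    return matches


# Hypothetical toy embeddings; real ones would come from a trained model.
TOY_VECTORS = {
    "serious games": [0.9, 0.1, 0.0],
    "learning analytics": [0.1, 0.9, 0.2],
    "computer games": [0.8, 0.2, 0.1],
    "data mining": [0.3, 0.7, 0.6],
    "hardware": [0.0, 0.1, 0.9],
}

matches = greedy_match(
    seed_labels=["serious games", "learning analytics"],
    general_labels=["computer games", "data mining", "hardware"],
    vectors=TOY_VECTORS,
    thresholds=[0.95, 0.9, 0.8],
)
for seed, general, sim, threshold in matches:
    print(f"{seed} -> {general} (similarity {sim}, threshold {threshold})")
```

With these toy vectors, the closest pair is accepted at the strictest threshold, a weaker pair is accepted only once the threshold is relaxed, and a general label with no sufficiently similar seed label stays unmatched. In a real setting each multi-word label would also need its own vector, for example by averaging its word vectors, which is a further assumption not covered by the abstract.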
Original language | English |
---|---|
Title of host publication | Proceedings of the 2016 ACM Symposium on Document Engineering |
Publisher | Association for Computing Machinery (ACM) |
Pages | 131-134 |
ISBN (Print) | 978-1-4503-4438-8 |
Publication status | Published - 27 Sept 2016 |
Externally published | Yes |
Event | 2016 ACM Symposium on Document Engineering, Vienna, Austria. Duration: 13 Sept 2016 → 16 Sept 2016. https://dl.acm.org/citation.cfm?id=2960811 |
Conference
Conference | 2016 ACM Symposium on Document Engineering |
---|---|
Country/Territory | Austria |
City | Vienna |
Period | 13/09/16 → 16/09/16 |
Keywords
- Word2Vec
- taxonomy integration
- ontology alignment
- automated semantic integration
Fingerprint
Dive into the research topics of 'Combining Taxonomies using Word2vec'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Rage: Realising an Applied Gaming Eco-system
  Westera, W. (PI), Georgiadis, K. (CoI), Saveski, G. (CoI), van Lankveld, G. (CoI), Bahreini, K. (CoI), van der Vegt, W. (CoI), Berkhout, J. (CoI), Nyamsuren, E. (CoI), Kluijfhout, E. (CoI) & Nadolski, R. (CoI)
  1/02/15 → 31/07/19
  Project: Research