Classifying Written Texts Through Rhythmic Features

Mihaela Balint, Stefan Trausan-Matu, Mihai Dascalu

Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceeding > Academic > peer-review

Abstract

Rhythm analysis of written texts has focused on literary analysis and mainly considers poetry. In this paper we investigate the relevance of rhythmic features for categorizing prose texts that belong to different genres. Our contribution is threefold. First, we define a set of rhythmic features for written texts. Second, we extract these features from three corpora: speeches, essays, and newspaper articles. Third, we perform feature selection by means of statistical analyses and determine a subset of features that efficiently discriminates between the three genres. We find that, using as few as eight rhythmic features, documents can be assigned to a genre with an accuracy of around 80%, significantly higher than the 33% baseline that results from random assignment.
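The pipeline described in the abstract (define rhythmic features, extract them per document, then compare classification accuracy against random assignment) can be illustrated with a minimal sketch. The abstract does not list the paper's actual features, so everything below is an assumption made for illustration: the function names, the punctuation-based definition of a "rhythmic unit", and the four features computed are hypothetical and are not the authors' feature set. Only the three-genre random baseline of about 33% comes from the text.

```python
import re
from statistics import mean, pstdev


def rhythmic_units(text):
    """Split text into punctuation-delimited units (an assumed definition,
    not taken from the paper). Each unit is returned as a list of words."""
    parts = re.split(r"[.,;:!?]+", text)
    return [p.split() for p in parts if p.split()]


def rhythm_features(text):
    """Compute a few hypothetical rhythm-related features for one document."""
    units = rhythmic_units(text)
    lengths = [len(u) for u in units]
    # Count consecutive units that open with the same word (a simple
    # stand-in for anaphora-style repetition).
    repeats = sum(
        1 for a, b in zip(units, units[1:]) if a[0].lower() == b[0].lower()
    )
    return {
        "unit_count": len(units),
        "mean_unit_length": mean(lengths) if lengths else 0.0,
        "unit_length_std": pstdev(lengths) if lengths else 0.0,
        "anaphora_rate": repeats / (len(units) - 1) if len(units) > 1 else 0.0,
    }


def random_baseline(num_genres):
    """Expected accuracy of uniform random assignment among num_genres classes."""
    return 1.0 / num_genres


if __name__ == "__main__":
    sample = (
        "We shall fight on the beaches, we shall fight on the landing "
        "grounds, we shall never surrender."
    )
    print(rhythm_features(sample))
    print(round(random_baseline(3), 2))
```

Feature dictionaries of this kind, one per document, would then be passed to a feature-selection step and a classifier; which statistical tests and which classifier the authors used is not stated in the abstract, so that step is left out here.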
Original language: English
Title of host publication: Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2016
Editors: C. Dichev, G. Agre
Publisher: Springer
Pages: 121-129
ISBN (Electronic): 978-3-319-44748-3
ISBN (Print): 978-3-319-44747-6
DOI: https://doi.org/10.1007/978-3-319-44748-3_12
Publication status: Published - 18 Aug 2016
Externally published: Yes
Event: International Conference on Artificial Intelligence: Methodology, Systems, and Applications - Varna, Bulgaria
Duration: 7 Sep 2016 to 10 Sep 2016
https://link.springer.com/book/10.1007/978-3-319-44748-3
https://www.springer.com/la/book/9783319447476

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Publisher: Springer
Volume: 9883
Name: Lecture Notes in Artificial Intelligence (LNAI)
Volume: 9883

Conference

Conference: International Conference on Artificial Intelligence: Methodology, Systems, and Applications
Abbreviated title: AIMSA 2016
Country: Bulgaria
City: Varna
Period: 7/09/16 to 10/09/16

Fingerprint

Literary Analysis
Poetry
Feature Selection
Rhythm
Assignment
Newspaper Articles

Keywords

  • rhythm
  • text classification
  • natural language processing
  • discourse analysis

Cite this

Balint, M., Trausan-Matu, S., & Dascalu, M. (2016). Classifying Written Texts Through Rhythmic Features. In C. Dichev, & G. Agre (Eds.), Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2016 (pp. 121-129). (Lecture Notes in Computer Science (LNCS); Vol. 9883), (Lecture Notes in Artificial Intelligence (LNAI); Vol. 9883). Springer. https://doi.org/10.1007/978-3-319-44748-3_12
@inproceedings{727a3c5b2f434b159bc140746908a650,
title = "Classifying Written Texts Through Rhythmic Features",
keywords = "rhythm, text classification, natural language processing, discourse analysis",
author = "Mihaela Balint and Stefan Trausan-Matu and Mihai Dascalu",
year = "2016",
month = "8",
day = "18",
doi = "10.1007/978-3-319-44748-3_12",
language = "English",
isbn = "978-3-319-44747-6",
series = "Lecture Notes in Computer Science (LNCS)",
publisher = "Springer",
pages = "121--129",
editor = "C. Dichev and G. Agre",
booktitle = "Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2016",

}
