Mining for constructions in texts using N-gram and network analysis

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Mining for constructions in texts using N-gram and network analysis. / Shibuya, Yoshikata; Jensen, Kim Ebensgaard.

In: Globe: A Journal of Language, Culture and Communication, Vol. 2, 02.10.2015, p. 23-54.

Harvard

Shibuya, Y & Jensen, KE 2015, 'Mining for constructions in texts using N-gram and network analysis', Globe: A Journal of Language, Culture and Communication, vol. 2, pp. 23-54. https://doi.org/10.5278/ojs.globe.v2i0.1113

APA

Shibuya, Y., & Jensen, K. E. (2015). Mining for constructions in texts using N-gram and network analysis. Globe: A Journal of Language, Culture and Communication, 2, 23-54. https://doi.org/10.5278/ojs.globe.v2i0.1113

Vancouver

Shibuya Y, Jensen KE. Mining for constructions in texts using N-gram and network analysis. Globe: A Journal of Language, Culture and Communication. 2015 Oct 2;2:23-54. https://doi.org/10.5278/ojs.globe.v2i0.1113

Author

Shibuya, Yoshikata ; Jensen, Kim Ebensgaard. / Mining for constructions in texts using N-gram and network analysis. In: Globe: A Journal of Language, Culture and Communication. 2015 ; Vol. 2. pp. 23-54.

Bibtex

@article{9cf4b294c57144b5a22e50efcd0c9ac9,
title = "Mining for constructions in texts using N-gram and network analysis",
abstract = "In constructionist theory, constructions are functional entities that pair form and conventionalized semantic and/or discourse-pragmatic function. One of the main tasks of the construction grammarian is thus to identify and document constructions. Seeing that it is unlikely that this can be done satisfactorily via introspection, there is a need for different ways of identifying constructions in language use. In this paper, we will explore the extent to which the N-gram information retrieval technique – which has seen use in phraseological analysis, discourse analysis, register characterization, and corpus stylistics – is applicable in the identification of constructions and their functionality in discourse. An N-gram is a constellation of a specified number (N = number) of entities that frequently (co)occur in a data population. In this paper we will report on an exploratory study in which we apply N-gram analysis to Lewis Carroll's novel Alice's Adventures in Wonderland and Mark Twain's novel The Adventures of Huckleberry Finn and extrapolate a number of likely constructional phenomena from recurring N-gram patterns in the two texts. In addition to simple N-gram analysis, the following will be applied: comparative N-gram analysis which draws on a slightly adjusted distinctive collexeme analysis, hierarchical agglomerative cluster analysis, and N-gram-based network analysis. The latter is explored as a way to capture different N-gram types, and underlying constructions, in one representation. The main premise is that, if constructions are functional units, then configurations of words that tend to recur together in discourse are likely to have some sort of function that speakers utilize in discourse. Writers of fiction, for instance, may use constructions in characterizations, mind-styles, text-world construction and specification of narrative temporality.
In this paper, our special interest lies in the relationship between constructions and the discourse of fiction. As the study reported in this article is exploratory, it serves just as much to test the methods mentioned above as to analyze and characterize the two novels.",
keywords = "Faculty of Humanities, construction grammar, corpus stylistics, corpus linguistics, corpus methodology, cognitive poetics, cognitive stylistics, functionality of language, literary language, N-gram, network analysis, network science, node centrality, text mining, Alice's Adventures in Wonderland, The Adventures of Huckleberry Finn",
author = "Yoshikata Shibuya and Jensen, {Kim Ebensgaard}",
year = "2015",
month = oct,
day = "2",
doi = "10.5278/ojs.globe.v2i0.1113",
language = "English",
volume = "2",
pages = "23--54",
journal = "Globe: A Journal of Language, Culture and Communication",
issn = "2246-8838",
publisher = "Aalborg Universitetsforlag",
}

RIS

TY - JOUR

T1 - Mining for constructions in texts using N-gram and network analysis

AU - Shibuya, Yoshikata

AU - Jensen, Kim Ebensgaard

PY - 2015/10/2

Y1 - 2015/10/2

N2 - In constructionist theory, constructions are functional entities that pair form and conventionalized semantic and/or discourse-pragmatic function. One of the main tasks of the construction grammarian is thus to identify and document constructions. Seeing that it is unlikely that this can be done satisfactorily via introspection, there is a need for different ways of identifying constructions in language use. In this paper, we will explore the extent to which the N-gram information retrieval technique – which has seen use in phraseological analysis, discourse analysis, register characterization, and corpus stylistics – is applicable in the identification of constructions and their functionality in discourse. An N-gram is a constellation of a specified number (N = number) of entities that frequently (co)occur in a data population. In this paper we will report on an exploratory study in which we apply N-gram analysis to Lewis Carroll's novel Alice's Adventures in Wonderland and Mark Twain's novel The Adventures of Huckleberry Finn and extrapolate a number of likely constructional phenomena from recurring N-gram patterns in the two texts. In addition to simple N-gram analysis, the following will be applied: comparative N-gram analysis which draws on a slightly adjusted distinctive collexeme analysis, hierarchical agglomerative cluster analysis, and N-gram-based network analysis. The latter is explored as a way to capture different N-gram types, and underlying constructions, in one representation. The main premise is that, if constructions are functional units, then configurations of words that tend to recur together in discourse are likely to have some sort of function that speakers utilize in discourse. Writers of fiction, for instance, may use constructions in characterizations, mind-styles, text-world construction and specification of narrative temporality.
In this paper, our special interest lies in the relationship between constructions and the discourse of fiction. As the study reported in this article is exploratory, it serves just as much to test the methods mentioned above as to analyze and characterize the two novels.

AB - In constructionist theory, constructions are functional entities that pair form and conventionalized semantic and/or discourse-pragmatic function. One of the main tasks of the construction grammarian is thus to identify and document constructions. Seeing that it is unlikely that this can be done satisfactorily via introspection, there is a need for different ways of identifying constructions in language use. In this paper, we will explore the extent to which the N-gram information retrieval technique – which has seen use in phraseological analysis, discourse analysis, register characterization, and corpus stylistics – is applicable in the identification of constructions and their functionality in discourse. An N-gram is a constellation of a specified number (N = number) of entities that frequently (co)occur in a data population. In this paper we will report on an exploratory study in which we apply N-gram analysis to Lewis Carroll's novel Alice's Adventures in Wonderland and Mark Twain's novel The Adventures of Huckleberry Finn and extrapolate a number of likely constructional phenomena from recurring N-gram patterns in the two texts. In addition to simple N-gram analysis, the following will be applied: comparative N-gram analysis which draws on a slightly adjusted distinctive collexeme analysis, hierarchical agglomerative cluster analysis, and N-gram-based network analysis. The latter is explored as a way to capture different N-gram types, and underlying constructions, in one representation. The main premise is that, if constructions are functional units, then configurations of words that tend to recur together in discourse are likely to have some sort of function that speakers utilize in discourse. Writers of fiction, for instance, may use constructions in characterizations, mind-styles, text-world construction and specification of narrative temporality.
In this paper, our special interest lies in the relationship between constructions and the discourse of fiction. As the study reported in this article is exploratory, it serves just as much to test the methods mentioned above as to analyze and characterize the two novels.

KW - Faculty of Humanities

KW - construction grammar

KW - corpus stylistics

KW - corpus linguistics

KW - corpus methodology

KW - cognitive poetics

KW - cognitive stylistics

KW - functionality of language

KW - literary language

KW - N-gram

KW - network analysis

KW - network science

KW - node centrality

KW - text mining

KW - Alice's Adventures in Wonderland

KW - The Adventures of Huckleberry Finn

U2 - 10.5278/ojs.globe.v2i0.1113

DO - 10.5278/ojs.globe.v2i0.1113

M3 - Journal article

VL - 2

SP - 23

EP - 54

JO - Globe: A Journal of Language, Culture and Communication

JF - Globe: A Journal of Language, Culture and Communication

SN - 2246-8838

ER -
