Problems in the electronic processing, especially the lemmatization, of older language stages without a fixed linguistic norm, illustrated by Leonora Christina's French autobiography from 1673

Lecture by Lene Schøsler.

The lecture is in Danish (with an English slideshow) and is open to all.

Abstract 

My paper (presented in Danish with slides in English) is based on research carried out in collaboration with Gilles Souvay (ATILF, Nancy) and presented in two joint papers at a conference on Historical Lexicography in a Digital Age in November 2022. Our goal is to present the challenges of lemmatizing old texts and to show our solutions.

Our research concerns an earlier stage of the French language, namely 17th-century French. The main challenge posed by old texts is the instability of the language: there is considerable, sometimes unpredictable, variation at the levels of orthography, morphology, lexicon, and syntax.

Language, especially older stages of a language, varies according to well-known parameters. However, these parameters are not always adequately described, and their relative importance depends on the period studied. Our goal is to provide tools for analyzing classical French, including its many possibilities for variation.

Our joint papers draw on a common corpus: a 70-page text written in 1673, when the standardization of French was under way but not yet complete, which makes the text particularly interesting. To lemmatize a language state without a fixed norm, the technical tools developed for modern French must be adapted so that they can handle the many graphic and morphological variants of a lemma that no longer exist in modern French.

Variation in 17th-century French is known to depend above all on the writer's geographical and social origin, but the communication situation and the medium also matter; for old texts, the relevant distinction of medium is between narration and fictitious direct speech. In our text, variation is especially marked because the author is a Dane whose mother tongue is not French. Nevertheless, her command of the language is such that the characteristic features of so-called classical French are found in her text. Developing an analysis tool adapted to this text is worthwhile because such a tool can also be applied to other texts from the same period.
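The abstract does not describe the authors' actual tooling. Purely as an illustration of what "treating the graphic variants of a lemma" can mean, here is a minimal sketch assuming a hypothetical ordered list of spelling-rewrite rules (`RULES`) and a toy modern lexicon (`LEXICON`); neither is taken from the papers, and real classical French needs far richer, context-sensitive handling. A form is first looked up as-is, and only if that fails is it rewritten toward a modern spelling and looked up again.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical ordered rewrite rules mapping some classical spellings onto
# modern ones. Illustrative only: simplified and far from exhaustive.
RULES = [
    (re.compile(r"oi(t|s|ent)$"), r"ai\1"),           # estoit -> estait, avoient -> avaient
    (re.compile(r"^es(?=t)"), "é"),                    # estait -> était (word-initial only)
    (re.compile(r"sç"), "s"),                          # sçavoir -> savoir
    (re.compile(r"ct$"), "t"),                         # faict -> fait
    (re.compile(r"(?<=[aeiou])u(?=[aeiou])"), "v"),    # auait -> avait (u/v alternation)
]

# Hypothetical toy lexicon: modern inflected form -> lemma (ambiguity ignored).
LEXICON = {
    "était": "être",
    "est": "être",
    "avait": "avoir",
    "avais": "avoir",
    "avaient": "avoir",
    "savoir": "savoir",
    "fait": "faire",
    "voit": "voir",
}


def _key(form: str) -> str:
    """Lower-case and NFC-normalize so accented letters compare consistently."""
    return unicodedata.normalize("NFC", form.lower())


def normalize(form: str) -> str:
    """Apply every rewrite rule in order to the lower-cased form."""
    out = _key(form)
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out


def lemmatize(form: str) -> Optional[str]:
    """Return the lemma of a (possibly classical) spelling, or None if unknown."""
    key = _key(form)
    if key in LEXICON:  # attested modern form: do not rewrite it
        return LEXICON[key]
    return LEXICON.get(normalize(form))


if __name__ == "__main__":
    for word in ["Estoit", "avoient", "sçavoir", "faict", "auoit", "voit"]:
        print(word, "->", lemmatize(word))
```

Looking the form up before rewriting matters because such rules over-apply: without that step, the modern form "voit" would be rewritten to the non-word "vait". Ordering also matters, since "auoit" only reaches "avait" after the ending rule has produced "auait".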