CORPUS17: a philological corpus for 17th c. French - École nationale des chartes
Conference paper. Year: 2020

CORPUS17: a philological corpus for 17th c. French

French title: CORPUS17: un corpus philologique pour le XVIIe siècle français

Abstract

We investigate the creation of a literary corpus of 17th-century French. We present the main options regarding available standards, the training data we created, and the performance of the models we produced for OCR, spelling normalisation and lemmatisation, always relying on open-source solutions. We also present our encoding choices and the overall logic of a corpus designed as a virtuous circle, in which the corpus automatically improves the tools used to build it.
Main file
CORPUS17.pdf (767.16 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03041871, version 1 (11 December 2020)

Citation

Simon Gabay, Alexandre Bartz, Yohann Deguin. CORPUS17: a philological corpus for 17th c. French. Proceedings of the 2nd International Digital Tools & Uses Congress (DTUC ’20), Oct 2020, Hammamet, Tunisia. ⟨10.1145/3423603.3424002⟩. ⟨hal-03041871⟩