Handwritten Text Recognition for Documentary Medieval Manuscripts - École nationale des chartes
Preprint, working paper. Year: 2022

Handwritten Text Recognition for Documentary Medieval Manuscripts

French title: La reconnaissance de l'écriture pour les manuscrits documentaires du Moyen Âge

Sergio Torres Aguilar, Vincent Jolivet

Abstract

Handwritten text recognition (HTR) techniques aim to recognize the sequence of characters in an input manuscript image by training an artificial intelligence model on the features of historical handwriting. Efficient HTR models will help transform digitized manuscript collections into indexed and citable corpora able to provide meaningful clues to historical research questions. Before that, several issues must be addressed, such as the lack of relevant training corpora, the large number of variations introduced by each scribal hand and each script, and complex page layouts. This paper presents two models and one cross-model aiming to automatically transcribe Latin and French medieval documentary manuscripts, mostly charters and registers, produced between the 12th and 15th centuries and classified into two major scripts: Textualis (from the late 11th to the 13th century) and Cursiva (from the 13th to the 15th century). The models' architecture is based on a CRNN (convolutional recurrent neural network) coupled with a CTC (connectionist temporal classification) loss. Training and evaluation, covering 120k lines of text and almost 1M tokens, were conducted on three ready-to-use ground-truth corpora: the Alcar-HOME database, the e-NDP corpus and the Himanis project. We describe the training architecture and corpora, and we discuss the main training problems, the results and the perspectives opened by HTR techniques for medieval documentary manuscripts.
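To make the CRNN + CTC pairing more concrete: a CRNN reads a text-line image as a left-to-right sequence of frames and outputs, for each frame, a probability distribution over the character set plus a special "blank" symbol. CTC trains this output without any frame-level alignment by summing the probability of every frame path that collapses (merge repeated symbols, then drop blanks) to the target transcription. The minimal pure-Python sketch below illustrates that forward computation and greedy decoding. It is not the authors' implementation: the blank index 0, the toy alphabet `"-ab"` and the hand-written probabilities are assumptions for illustration. Real training works in log space on batches, for example with PyTorch's `torch.nn.CTCLoss`.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path decoding: argmax per frame, merge repeats, then drop blanks.

    probs[t][k] is the model's probability of class k at frame t;
    alphabet[k] is the character for class k (alphabet[BLANK] is never emitted).
    """
    best = [max(range(len(frame)), key=lambda k: frame[k]) for frame in probs]
    chars: List[str] = []
    prev = None
    for k in best:
        # emit a symbol only if it is not blank and differs from the previous frame
        if k != BLANK and k != prev:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)


def ctc_neg_log_likelihood(probs: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """CTC loss -log p(target | probs) computed with the forward (alpha) recursion.

    Probability-space version for readability only; it underflows on long
    lines, which is why practical implementations work in log space.
    """
    # Extended label sequence with blanks interleaved: [blank, y1, blank, y2, ..., blank]
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    S = len(ext)

    # alpha[s]: total probability of all partial paths ending on ext[s] at the current frame
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new_alpha = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # skip over a blank, allowed only between two different labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[ext[s]]
        alpha = new_alpha

    # valid paths end either on the last label or on the trailing blank
    p = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return -math.log(p)


if __name__ == "__main__":
    # Toy output for 3 frames over the classes [blank, "a", "b"] (made-up numbers)
    probs = [[0.6, 0.3, 0.1], [0.2, 0.1, 0.7], [0.5, 0.2, 0.3]]
    print(ctc_greedy_decode(probs, "-ab"))  # b
    print(round(math.exp(-ctc_neg_log_likelihood(probs, [2])), 3))  # 0.438
```

Greedy decoding only keeps the single most likely frame path, whereas the CTC loss sums over all paths that collapse to the transcription; that sum is what makes alignment-free training of a line-level recognizer possible.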
Main file: HTR_medieval_latin_french.pdf (2.79 MB)
Origin: files produced by the author(s)

Dates and versions

hal-03892163, version 1 (09-12-2022)
hal-03892163, version 2 (20-06-2023)
hal-03892163, version 3 (16-12-2023)

Identifiers

  • HAL Id: hal-03892163, version 1

Cite

Sergio Torres Aguilar, Vincent Jolivet. Handwritten Text Recognition for Documentary Medieval Manuscripts. 2022. ⟨hal-03892163v1⟩
