Automated Alignment of Medieval Text Versions based on Word Embeddings

Title: Automated Alignment of Medieval Text Versions based on Word Embeddings
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Meinecke, Christofer; Wrisley, David Joseph; Jänicke, Stefan
Conference Name: LEVIA’19: Leipzig Symposium on Visualization in Applications 2019
Conference Location: Leipzig
Keywords: Digital Humanities, Sentence Alignment, Visualization, Word Embedding
Abstract: Medieval textuality is characterized by instability in text structure and length that varies according to the text tradition. This instability in the versions, otherwise known as “mouvance”, is characterized by dialectal difference, traces of orality, the modification of wording, and even the rewriting and rearrangement of large parts of the text. To help humanities scholars in the exploratory analysis of such complex text collections, the visual analytics system iteal was initially proposed. The system aligns similar phrases at the line level on the basis of string similarity and word n-grams. We propose an extension of this system that replaces the parameter-based approach with an automatic one using word embeddings, thereby adding a semantic component. The benefit of the new visualization system is shown through a comparison of different versions of medieval French texts. Additionally, a domain expert compared the parameter-based approach with the approach based on word embeddings to outline the similarities and differences in the alignments.