Historians rely on different sources to reconstruct the thought, society and history of past civilizations. Many of these sources are text-based – whether written on scrolls or carved into stone, the preserved records of the past help shed light on ancient societies.
However, these records of our ancient cultural heritage are often incomplete due to deliberate destruction, or erosion and fragmentation over time. This is the case for inscriptions: texts written on a durable surface (such as stone, ceramic, metal) by individuals, groups and institutions of the past, and which are the focus of the discipline called epigraphy. Thousands of inscriptions have survived to our day; but the majority have suffered damage over the centuries, and parts of the text are illegible or lost. The reconstruction ("restoration") of these documents is complex and time consuming, but necessary for a deeper understanding of civilizations past.
One of the issues with discerning meaning from incomplete fragments of text is that there are often multiple possible solutions. In many word games and puzzles, players guess letters to complete a word or phrase – the more letters that are specified, the more constrained the possible solutions become. But unlike these games, where players have to guess a phrase in isolation, historians restoring a text can estimate the likelihood of different possible solutions based on other context clues in the inscription – such as grammatical and linguistic considerations, layout and shape, textual parallels, and historical context.
Now, by using machine learning trained on ancient texts, DeepMind and Yiannis Assael, a Greek IT scientist built a system that can furnish a more complete and systematically ranked list of possible solutions, which will hopefully augment historians’ understanding of a text.
Pythia – which takes its name from the priestess who delivered the god Apollo's oracular responses at the Greek sanctuary of Delphi – is the first ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. Bringing together the disciplines of ancient history and deep learning, the present work offers a fully automated aid to the text restoration task, providing ancient historians with multiple textual restorations, as well as the confidence level for each hypothesis. Pythia takes a sequence of damaged text as input, and is trained to predict character sequences comprising hypothesized restorations of ancient Greek inscriptions (texts written in the Greek alphabet dating between the 7th century BCE and the 5th century CE). The architecture works at both the character- and word-level, thereby effectively handling long-term context information, and dealing efficiently with incomplete word representations. This makes it applicable to all disciplines dealing with ancient texts (philology, papyrology, codicology) and applies to any language (ancient or modern).