AI algorithms can restore damaged old texts • The Register

According to a new article published in Nature.

A team of computer scientists and classical studies experts led by DeepMind and Ca’ Foscari University of Venice trained a transformer-based neural network to restore inscriptions written in ancient Greek between the 7th century BC and the 5th century AD. The model, named “Ithaka” after the home of legendary Greek king Odysseus, can also estimate when the text was written and where it might have come from.

By recovering fragments of text from broken pieces of pottery or fuzzy writing, for example, researchers can begin to translate them and learn more about ancient civilizations. Thea Sommerschield, co-author of the paper and an epigraph from ancient Greek and Roman, says The register that inscriptions are vital records, everything from sacred calendars to laws to leases can be preserved.

Why Ancient Greek? The researchers said the variable content and context available in Greek epigraphic records make it an “excellent challenge” for language processing, as well as the large number of (digitized) written texts currently available – essential for model training. .

“These documents are one of the most important sets of evidence of the history, language, religion, politics and mentality of the ancient world,” she said. Sommerschield hopes Ithaca will pave the way for researchers to study history with new AI techniques.

“Just as microscopes and telescopes have expanded the range of what scientists can do – providing historians with additional tools to aid their discoveries and improve our collective understanding of history and culture. We hope this work will establish a new standard for the field of digital epigraphy, using advanced deep learning architectures to support the work of ancient historians,” she told us.

First, the text must be transcribed by scanning an image of an old object or script. The text is then introduced into Ithaca for analysis. It works by predicting lost or blurry characters to restore words as outputs. The software generates and ranks a list of its best predictions; epigraphists can then skim through them and judge whether or not the model’s guesses seem accurate.

The best results are achieved when man and machine work together. When the experts worked alone, they were 25% accurate in reconstructing ancient artifacts, but when they collaborated with Ithaca, the accuracy level jumped to 72%. Ithaca’s performance alone is around 62%, for comparison. It is also 71% for locating where the text was written and can date works within 30 years of their creation between 800 BC and 800 AD.

Ithaca was trained on over 63,000 Greek inscriptions containing over three million words from the Packard Humanities Institute’s Searchable Greek Inscriptions public dataset. The team masked out parts of the text and loaded the template to fill in the blanks. Ithaca analyzes other words in a given sentence for context when generating characters.

For example, when restoring the ancient Greek word for “covenant,” he looked at the words “Athenians” and “Thessalians,” describing people from two ancient peoples who banded together to repel the Spartans. It is likely that these three words appeared together in the same sentence in previous uncorrupted entries that the model saw in her training phase.

“Ithaca is trained to restore up to half of the missing text. In our experiments, and in the case of restoration in particular, we report results with up to 10 missing characters,” Sommerschield told us.

“Ithaka has been used to re-date key texts from classical Athens, thus contributing to topical debates in ancient history. We hope that many more such discoveries will follow and that historians will include Ithaca in their workflow. -party historians have been very positive and enthusiastic.”

DeepMind researchers are now adjusting their model to accommodate other types of ancient writing systems, such as Akkadian developed in Mesopotamia, Demotic from ancient Egypt, Maya originating from Central America, and Arabic. ancient Hebrew. “We hope that models like Ithaca can unlock the potential for cooperation between AI and the humanities, transformatively impacting the way we study and write about some of the most important periods in human history,” said he declared.

You can see a demo of Ithaca here and find the code for Ithaca here. ®

Sharon D. Cole