Even the Pope Uses AI. Really.

Colm Maye tells us why the digitisation of the Vatican’s library could mean a fundamental change in the practices of 1.2 billion Catholics around the world. 


A musty labyrinth lies under the Vatican. The Vatican Apostolic Library, established in 1451, comprises 85 kilometres of shelves laden with treasures: the Papal Bull that excommunicated Martin Luther, the court proceedings against Galileo Galilei and one of the oldest Bibles in the world (I mean they would, wouldn’t they?). It also contains priceless primary sources on live theological and moral questions that matter to over a billion people globally.


What does all of this have to do with AI? Rather a lot, actually.


Only 60 scholars per day are permitted to access the archives, which means that the collection has until recently been, as Sam Kean of The Atlantic put it, quite ‘useless’. However, 2018 marked the kickoff of ‘In Codice Ratio’, an ambitious project that aims to digitise the collection using the combined powers of artificial intelligence, optical character recognition (OCR) software and (nerdy) high schoolers. Teaching an AI to transcribe ancient manuscripts is no easy thing; the challenges include inconsistent calligraphy and spacing, and the innate similarities between certain letters. Traditional OCR identifies individual letters by recognising the gaps between them, comparing each one with the characters in a database, and entering the best match into the new document. But how to tackle irregular gaps between letters (cheekily called ‘dirty spacing’) or s(h)nakey cursive script?


The handwriting is chopped into a series of vertical and horizontal lines. The software treats the ‘local minimums’ in this mishmash, the points on the page with the least ink, as the likely spaces between letters. What results is a series of chunky jigsaw pieces which the computer must assemble into letters. An excellent starting point, but how to teach the computer to assemble the jigsaws correctly? Over 500 high schoolers were recruited to judge the AI’s attempts to recreate letters, checking them against a selection of acceptable characters chosen by paleographers (specialists in ancient handwriting) to guide the AI towards autonomy. In 2018 the AI was achieving 96% accuracy on handwritten letters, making the documents far easier to read. The work is ongoing. Secondary school teachers wait with bated breath.
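For the curious, the ‘least ink’ trick can be sketched in a few lines of Python. This is only a toy illustration, not the project’s actual software: the tiny image, the function names and the decision to look at vertical columns alone are all assumptions made for the example.

```python
# Toy sketch of cutting handwriting at "least ink" columns.
# Hypothetical example: the image and all function names are invented here,
# and only vertical columns are examined (the real system is more elaborate).

def column_ink(image):
    """Count the inked pixels ('#') in each column of a text image."""
    return [sum(1 for pixel in column if pixel == "#") for column in zip(*image)]

def local_minima(profile):
    """Return the columns that hold strictly less ink than both neighbours.

    Flat stretches of equal ink are ignored to keep the sketch simple.
    """
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]
    ]

def split_pieces(profile):
    """Turn the cut points into (start, end) column ranges: the jigsaw pieces."""
    edges = [0] + local_minima(profile) + [len(profile)]
    return list(zip(edges[:-1], edges[1:]))

# Three rows of "handwriting"; '#' is ink and '.' is blank parchment.
image = [
    "##.###.##",
    "#####.###",
    "##.###.##",
]

profile = column_ink(image)
print(profile)                # ink per column
print(local_minima(profile))  # candidate cuts between letters
print(split_pieces(profile))  # jigsaw pieces as column ranges
```

Each piece is only a candidate fragment. Deciding which fragments belong together as a real letter is the part that needed the paleographers and the high schoolers.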


As Orwell said, “Who controls the past controls the future”. The Apostolic Library is a funny kind of past, though. It dictates the practices of hundreds of millions of people in the present. It’s a trove of information relating to cultures from the Romans to the Incas, a vast repository of physical information about the past which can be used to uncover new narratives and challenge received wisdom.


This draws attention. The library’s digitised collection, which so far holds scanned copies of 25% of its total documentation, faces around 100 cybersecurity threats every month. There’s a digital arms race against hackers who could compromise the archive, whether by editing its contents to manipulate information or by using ‘ransomware’, which takes targeted databases hostage by locking the files and threatening deletion unless a steep ransom is paid promptly. While the Catholic Church is not known to be poor, a successful attack of this nature could sap funding for the project and set it back by years.


To fend off these attacks as you would a rabid dog, the Vatican has employed pioneering cybersecurity firm Darktrace, which fights fire with fire by neutralising AI cyber threats using its own AI modelled on the human immune system. The system observes normal cyber-activity within an organisation and becomes familiar with its patterns, in much the same way that the immune system monitors normal biological activity. A cyberattack could be compared with a hostile viral antigen, which the system recognises as alien. In short: AI being used to protect the AI digitising the archive to protect the faith.
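The ‘learn what normal looks like, then flag the alien’ idea can be illustrated with a very small Python sketch. To be clear, this is not Darktrace’s method, which is not described here; it is a simple statistical stand-in, and the traffic figures, function names and threshold are all invented for the example.

```python
# Toy "immune system": learn a baseline of normal activity, then flag outliers.
# Hypothetical stand-in only; every number and name here is invented.
from statistics import mean, stdev

def learn_baseline(samples):
    """Summarise normal activity as its average and its typical spread."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a reading that sits more than `threshold` spreads from the average."""
    average, spread = baseline
    if spread == 0:
        return value != average
    return abs(value - average) / spread > threshold

# Invented figures: requests per hour during an ordinary week.
normal_traffic = [98, 102, 100, 97, 103, 101, 99, 100]
baseline = learn_baseline(normal_traffic)

print(is_anomalous(104, baseline))  # a slightly busy hour
print(is_anomalous(450, baseline))  # a sudden flood of requests
```

A real system would watch many signals at once and keep updating its picture of normal, but the principle is the same: anything far enough from the learned pattern gets treated as a foreign body.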


For more information, visit the archive’s website.