Webinar Summary: Machine Learning and AI in Libraries, Archives and Museums

In this blog post, we summarize a recent talk by Dr. Mia Ridge from the British Library, discussing the role of AI and its potential applications in galleries, libraries, archives, and museums (GLAMs).

A recording of the webinar is available via this link.

Implications of AI and Machine Learning for GLAMs

Dr. Ridge began by defining AI as the emulation of human thought processes by computers, enabling them to learn and make decisions without explicit instructions. Machine learning, a subset of AI, involves developing algorithms that identify patterns in data and make predictions based on statistical analysis. These algorithms can be applied to a wide range of tasks, such as image labeling, content generation, and natural language processing. The potential applications of AI in GLAMs are correspondingly varied, and advances in the technology offer new opportunities for innovation and collaboration.

The British Library and the Living with Machines Project

The British Library is a large research library with up to 200 million items in its collections, and over 3 million new physical and born-digital items are added every year. One of the challenges in managing this vast collection is that the items sit in separate silos, which makes it difficult to search across all collections at once.

Dr. Ridge then introduced the Living with Machines project, a collaboration between the British Library, the Alan Turing Institute, and several partner universities. The project aims to harness the combined power of massive digitized historical collections and computational analytical tools to examine the impact of machines and technology on life during the long nineteenth century (c.1780-1920).

Reading Maps with Computer Vision Models

The team has developed MapReader, a computer vision tool that breaks Ordnance Survey maps down into small patches. Researchers annotate a sample of these patches by hand, and a model trained on those annotations then identifies and highlights evidence of railway infrastructure across the remaining patches. This has opened up new avenues for historical research, as it is now possible to ask questions about the impact of railway infrastructure on the lives of people living near rail lines.

British ‘railspace’ and buildings as predicted by the MapReader computer vision programme

Image source
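The patching step described above can be illustrated with a short sketch. This is not MapReader's own code or API: the function name `slice_into_patches`, the toy raster, and the label names are all assumptions made for illustration, and a real map would be loaded from an image file rather than built by hand.

```python
from typing import Dict, List, Tuple

# A grayscale raster stored as rows of pixel values (toy stand-in for a scanned map).
Image = List[List[int]]


def slice_into_patches(image: Image, patch_size: int) -> Dict[Tuple[int, int], Image]:
    """Cut an image into square patches keyed by (patch_row, patch_col).

    Patches on the right and bottom edges are smaller when the image
    dimensions are not a multiple of patch_size.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    height = len(image)
    width = len(image[0]) if image else 0
    patches: Dict[Tuple[int, int], Image] = {}
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches[(top // patch_size, left // patch_size)] = patch
    return patches


# Toy 4x6 "map" whose pixel values are simply 0..23, so patches are easy to inspect.
toy_map = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = slice_into_patches(toy_map, patch_size=2)

# Hypothetical hand annotations for a sample of patches. A classifier would be
# trained on labels like these and then applied to the unlabeled patches.
annotations = {(0, 0): "no_rail", (1, 2): "railspace"}
unlabeled = [key for key in patches if key not in annotations]
```

Working at patch level means a person only needs to label a small sample by hand, while the trained model handles the rest of the map sheets.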

DeezyMatch for Fuzzy String Matching

The Living with Machines team has also developed DeezyMatch, a flexible deep neural network approach to fuzzy string matching. It can be used for tasks such as candidate ranking and selection, query expansion, and toponym matching, and it is designed to find matches even when the text is noisy because of poor automatic transcription (OCR) and digitization carried out decades ago.
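To make the idea of candidate ranking concrete, here is a minimal sketch that ranks place names against an OCR-damaged query. It deliberately swaps in Python's standard-library `difflib.SequenceMatcher` as a simple stand-in; DeezyMatch itself uses trained neural networks, and nothing below reflects its actual API. The function name `rank_candidates` and the small gazetteer are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def rank_candidates(query: str, candidates: List[str], top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top_n candidates most similar to query, best first.

    Similarity is the SequenceMatcher ratio on lowercased strings
    (0.0 = nothing in common, 1.0 = identical).
    """
    scored = [
        (SequenceMatcher(None, query.lower(), candidate.lower()).ratio(), candidate)
        for candidate in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(candidate, round(score, 2)) for score, candidate in scored[:top_n]]


# Hypothetical gazetteer of place names to match against.
gazetteer = ["Manchester", "Winchester", "Colchester", "Leicester"]

# "Manchefter" imitates a common OCR error, where a long s is read as "f".
best = rank_candidates("Manchefter", gazetteer, top_n=1)
```

A character-overlap score like this copes with single-letter OCR errors, but a learned model can additionally pick up systematic confusions and spelling variation, which is the motivation for a neural approach.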

Dr. Ridge also pointed out the challenges the team encountered during the project. Researchers struggle to understand the scale and complexity of the collections, including the long and varied history of previous collecting and digitization decisions, as well as the impact of copyright and publishers’ rights on what the library can lawfully provide access to.

Applications of AI at Other GLAM Organizations

Dr. Ridge highlighted several initiatives at other GLAMs that show the potential of machine learning and AI to enhance access to, organization of, and preservation of cultural heritage materials. These include:

Yale University’s PixPlot, which uses machine learning and data visualization to group images by similarity.
The partnership between the National Library of Sweden and Nvidia to train AI models on 500 years of Swedish text to support humanities research. The models are shared on Hugging Face.
Berlin State Library’s Human Machine Culture, which aims to use AI for document analysis and image similarity search while involving specialists’ expertise.
Annif, a Finnish project for automatic subject indexing and classification.
Matt Miller’s project on producing structured data from unstructured text, which uses ChatGPT to extract people, places, and dates from diary entries even though the dates in the journal were written inconsistently.
The Library of Congress’s Humans-in-the-Loop initiative, which aims to improve the accuracy of machine learning models used to transcribe historical texts by incorporating human expertise and feedback.

These projects highlight the importance of ethical considerations, involving domain experts, and managing collaborations on the terms of the GLAMs.

Challenges and the Way Ahead

Towards the end of the talk, Dr. Ridge summarized the challenges GLAMs face when using AI, including questions of accuracy and reliability as well as ethical and legal concerns about the ownership and representation of data used to train AI models. However, she emphasized that AI also offers opportunities for GLAMs to improve search and discovery tools, redress biases, and link multiple tools together to create a better user experience.

To start using AI in GLAMs, it’s important to identify the problems that need to be solved and start with a pilot or minimum viable project. GLAMs can work with students or researchers to explore new technologies and increase AI literacy across the organization. Finally, it’s important to keep up with changes in the AI landscape and discuss ideas and challenges as a community.

– By Jennifer Gu, Library

Tags: British Library, digital humanities, digital scholarship, GLAMs, machine learning

published May 13, 2023
last modified January 22, 2024