Project Background
The Library maintains a site, “HKUST Digital Images”, which serves as a valuable archive of the University’s history. It comprises over 30,000 photos of the University’s early events from 1988 to the early 2000s, covering diverse scenes including campus construction, teaching and learning activities, conferences and seminars, congregations, VIP visits, student life, and staff events.
This site, which was developed over a decade ago, relied on crowdsourcing for manual tagging and user-contributed comments to provide contextual information for each photo. A presentation slide deck created in 2012 contains the details of the background and system design of the site at that time.
With the rapid advancement of artificial intelligence (AI), numerous models have emerged in recent years that can perform tasks such as object detection, image tagging, and face recognition. In light of this, we would like to explore the use of various computer vision models to generate more metadata for our photos, eliminating the reliance on manual tagging and commenting through crowdsourcing.
Project Goal
The aim of this project is to explore the feasibility of using various computer vision models to generate metadata for images. We provided the photos and existing data from “HKUST Digital Images” as sample data to our two student developers, Jack and Eric, with the purpose of enhancing the associated metadata, making it more informative and providing richer context for each photo.
Methodologies
Keywords Generation for Image Tagging
Various computer vision models are explored, including GroundingDINO and Recognize Anything Model (RAM), for automated object detection and tagging, generating keywords for each photo.
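As a rough illustration of the last step, turning model output into keywords, the sketch below assumes each model returns (label, score) pairs for a photo. The function name merge_keywords, the input shape, and the 0.35 score cut-off are hypothetical choices made for this example; they are not taken from the project code or from the GroundingDINO or RAM APIs.

```python
def merge_keywords(detections, min_score=0.35):
    """Combine (label, score) pairs from one or more models into keywords.

    Labels are lower-cased so "Person" and "person" count as one keyword.
    Pairs scoring below min_score (an assumed cut-off) are dropped.
    Keywords are returned with the highest-scoring ones first.
    """
    best = {}
    for label, score in detections:
        key = label.strip().lower()
        if score >= min_score:
            # Keep the best score seen for each normalized label.
            best[key] = max(score, best.get(key, 0.0))
    # Sort by descending score, then alphabetically to break ties.
    return sorted(best, key=lambda k: (-best[k], k))


# Toy input: made-up labels and scores for a single photo.
sample = [("Person", 0.9), ("person", 0.5), ("tree", 0.2), ("Podium", 0.6)]
keywords = merge_keywords(sample)
```

Lower-casing labels before merging is one simple way to keep two models from producing near-duplicate keywords for the same photo.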
Image Narrative Generation
The feasibility of using Large Language and Vision Assistant (LLaVA) to generate image descriptions is investigated, providing a narrative for each photo and enhancing the understanding of the scene captured.
Face Recognition
The use of facial recognition tools is explored, including face_recognition, deepface and a clustering method, to identify individuals in the photos. By comparing AI-generated outputs with the existing crowdsourced tags, we aim to identify important individuals who are not yet tagged, such as prominent faculty members, staff or council members.
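The idea of clustering faces and then checking the clusters against crowdsourced tags can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: face embeddings are assumed to be already computed by some face model, the helper names cluster_faces and untagged_clusters are hypothetical, and the 0.6 distance threshold and the toy 2-D vectors are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_faces(faces, threshold=0.6):
    """Greedy clustering of (photo_id, embedding) pairs.

    A face joins the first cluster whose representative (the first face
    placed in it) is within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for photo_id, embedding in faces:
        for cluster in clusters:
            if euclidean(cluster["rep"], embedding) <= threshold:
                cluster["photos"].append(photo_id)
                break
        else:
            clusters.append({"rep": embedding, "photos": [photo_id]})
    return clusters


def untagged_clusters(clusters, person_tags, min_photos=2):
    """Return photo lists for recurring faces that nobody has tagged.

    person_tags maps photo_id -> set of crowdsourced person names.
    A cluster is reported when it spans at least `min_photos` photos
    and none of those photos carries a person tag.
    """
    result = []
    for cluster in clusters:
        photos = set(cluster["photos"])
        has_tag = any(person_tags.get(p) for p in photos)
        if len(photos) >= min_photos and not has_tag:
            result.append(sorted(photos))
    return result


# Toy data: 2-D vectors stand in for real face embeddings.
faces = [
    ("p1", [0.0, 0.0]), ("p2", [0.1, 0.0]),
    ("p3", [5.0, 5.0]), ("p4", [5.1, 5.0]),
    ("p5", [9.0, 0.0]),
]
person_tags = {"p1": {"Prof. A"}}  # hypothetical crowdsourced tag
candidates = untagged_clusters(cluster_faces(faces), person_tags)
```

In this toy run, the face appearing in p3 and p4 is flagged as a recurring but untagged individual, which is the kind of candidate a human reviewer could then try to identify.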
Visualization
One of our student developers, Jack, also made use of VIKUS Viewer and the keywords he generated to visualize the HKUST Digital Images photos, facilitating the exploration of their thematic and temporal characteristics.
Download our code and try it out!
URL: https://github.com/hkust-lib-ds/P002-PUBLIC_ObjectDetection-ImageTagging
Words from students
LAU Ming Kit, Jack
BEng in Computer Engineering
Year 4
Throughout this project, I gained a deeper understanding of computer vision, particularly how models utilize embeddings to detect objects within an image and generate object masks. This experience significantly enhanced my programming skills and expanded my knowledge in the field of computer vision…
ZHANG Ka Ho Eric
BEng in Electronic Engineering
Year 2
This project provided me with invaluable hands-on experience in applying computer vision and machine learning techniques to solve real-world problems. I am filled with gratitude for the opportunity to contribute to a project that sits at the intersection of technology and practical utility…
Project Team
Developers
- LAU Ming Kit, Jack ◇ Year 4 student, BEng in Computer Engineering
- ZHANG Ka Ho Eric ◇ Year 2 student, BEng in Electronic Engineering
Advisers
- Holly CHAN ◇ Assistant Manager (Digital Humanities)
- Leo WONG ◇ Librarian (Systems & Digital Services)
- Aster ZHAO ◇ Librarian (Research Support)
- Jennifer GU ◇ Librarian (Research Support)
Presentation
Lau, Jack M.K., & Chan, Holly H.Y. (2024, June 18). Beyond Pixels: AI-driven Image Processing for Enhanced Contextualization of HKUST’s Digital Images (1988-2000s) through the Applications of AI Models for Image Tagging, Object Detection, and Facial Recognition [Webinar presentation]. AI and the Future of Digital Preservation, International Federation of Library Associations and Institutions (IFLA). https://www.ifla.org/events/artificial-intelligence-and-the-future-of-digital-preservation/ [slides]
Publication
Paper to be released soon. Stay tuned!