Project 002 – Object Detection and Image Tagging

HKUST Library Digital Scholarship DS CoLab Project 002

Project Background

The Library hosts a site, “HKUST Digital Images”, which serves as a valuable archive of the University’s history. It comprises over 30,000 photos of the University’s early events from 1988 to the early 2000s, covering diverse scenes such as campus construction, teaching and learning activities, conferences and seminars, congregations, VIP visits, student life, and staff events.

This site, which was developed over a decade ago, relied on crowdsourcing for manual tagging and user-contributed comments to provide contextual information for each photo. This presentation slide deck, created in 2012, contains details of the background and system design of the site at that time.

With the rapid advancement of artificial intelligence (AI), numerous models have emerged in recent years that can perform tasks such as object detection, image tagging, and face recognition. We would therefore like to explore the use of various computer vision models to generate more metadata for our photos, reducing the reliance on manual tagging and commenting through crowdsourcing.

Project Goal

The aim of this project is to explore the feasibility of using various computer vision models to generate metadata for images. We provided the photos and existing data from “HKUST Digital Images” as sample data to our two student developers, Jack and Eric, with the purpose of enhancing the associated metadata, making it more informative and providing richer context for each photo.

Methodologies

Keywords Generation for Image Tagging

Various computer vision models are explored, including GroundingDINO and Recognize Anything Model (RAM), for automated object detection and tagging, generating keywords for each photo.
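As a rough illustration of how the outputs of a detector and a tagger could be merged into one keyword list per photo, the sketch below filters labels by confidence and removes duplicates. The `(label, score)` input format, the `merge_keywords` helper, and the `min_score` threshold are assumptions made for this example; they are not the actual output formats of GroundingDINO or RAM, nor necessarily what the project code does.

```python
from collections import defaultdict


def merge_keywords(model_outputs, min_score=0.35):
    """Merge (label, score) pairs from several models into one keyword list.

    model_outputs: dict mapping a model name to a list of (label, score) pairs.
    Returns keywords sorted by best score (descending), then alphabetically.
    """
    best = defaultdict(float)
    for pairs in model_outputs.values():
        for label, score in pairs:
            key = label.strip().lower()  # normalize so "Person" == "person"
            if key and score >= min_score:
                best[key] = max(best[key], score)
    return sorted(best, key=lambda k: (-best[k], k))


# Hypothetical outputs for one photo; real models return richer structures.
example = {
    "detector": [("Person", 0.91), ("podium", 0.48), ("flag", 0.20)],
    "tagger": [("person", 0.88), ("ceremony", 0.75), ("Podium ", 0.52)],
}
keywords = merge_keywords(example)
print(keywords)
```

Keeping the highest score per normalized label means a keyword reported by both models appears only once, while low-confidence labels such as `flag` are dropped.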

Image Narrative Generation

The feasibility of using Large Language and Vision Assistant (LLaVA) to generate image descriptions is investigated, providing a narrative for each photo and enhancing the understanding of the scene captured.
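One way to steer a vision-language model toward a useful narrative is to compose the text instruction from metadata that already exists for the photo. The sketch below only builds such an instruction string; the `build_caption_prompt` helper, its wording, and the idea of passing keywords and a year are assumptions for illustration, and the model-specific prompt template and inference call are deliberately left out.

```python
def build_caption_prompt(keywords, year=None, max_keywords=8):
    """Compose a text instruction asking a vision-language model to describe a photo."""
    parts = ["Describe this archival photo in two or three sentences."]
    if year is not None:
        parts.append(f"It was taken around {year}.")
    if keywords:
        # Limit the number of keywords so the instruction stays short.
        shown = ", ".join(keywords[:max_keywords])
        parts.append(f"Detected keywords: {shown}.")
    parts.append("Only describe what is visible; do not guess names.")
    return " ".join(parts)


# Hypothetical keywords and year for one photo.
prompt = build_caption_prompt(["person", "ceremony", "podium"], year=1992)
print(prompt)
```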

Face Recognition

The use of facial recognition tools is explored, including face_recognition, deepface, and a clustering method, to identify individuals in the photos. By comparing AI-generated outputs with the existing crowdsourced tags, we aim to identify important individuals who are not yet tagged, such as prominent faculty members, staff, or council members.
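The comparison between AI-generated outputs and crowdsourced tags can be pictured as a per-photo set difference. The following is a minimal sketch under stated assumptions: the `find_untagged_people` helper, the `(name, confidence)` input format, the `min_confidence` threshold, and the photo identifiers and placeholder names are all hypothetical, and the recognition step itself is not shown.

```python
def find_untagged_people(recognized, crowd_tags, min_confidence=0.6):
    """Return names recognized by a model but absent from the crowdsourced tags.

    recognized: dict photo_id -> list of (name, confidence) pairs.
    crowd_tags: dict photo_id -> list of tag strings.
    """
    missing = {}
    for photo_id, candidates in recognized.items():
        # Compare case-insensitively, ignoring surrounding whitespace.
        existing = {t.strip().lower() for t in crowd_tags.get(photo_id, [])}
        new_names = sorted(
            {
                name
                for name, conf in candidates
                if conf >= min_confidence and name.strip().lower() not in existing
            }
        )
        if new_names:
            missing[photo_id] = new_names
    return missing


# Hypothetical identifiers and placeholder names.
recognized = {
    "photo-001": [("Person A", 0.82), ("Person B", 0.55)],
    "photo-002": [("Person C", 0.91), ("Person A", 0.77)],
}
crowd_tags = {
    "photo-001": ["person a", "Congregation"],
    "photo-002": ["Person C"],
}
untagged = find_untagged_people(recognized, crowd_tags)
print(untagged)
```

In this example only `photo-002` is reported, because `Person A` is already tagged on `photo-001` and `Person B` falls below the confidence threshold. Candidates surfaced this way would still need human review before being added as tags.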

Visualization


One of our student developers, Jack, also used VIKUS Viewer and the keywords he generated to visualize the HKUST Digital Images photos, facilitating the exploration of their thematic and temporal characteristics.
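To give a sense of how generated keywords could feed such a viewer, the sketch below writes a small metadata table with `id`, `keywords`, and `year` columns. This column layout reflects our understanding of the CSV that VIKUS Viewer reads and should be checked against its documentation; the `to_viewer_csv` helper and the sample records are hypothetical.

```python
import csv
import io


def to_viewer_csv(photos):
    """Serialize photo records to CSV text with id, keywords, and year columns."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["id", "keywords", "year"], lineterminator="\n"
    )
    writer.writeheader()
    for photo in photos:
        writer.writerow(
            {
                "id": photo["id"],
                # Keywords are joined into a single comma-separated cell.
                "keywords": ",".join(photo["keywords"]),
                "year": photo["year"],
            }
        )
    return out.getvalue()


# Hypothetical sample records.
photos = [
    {"id": "photo-001", "keywords": ["person", "ceremony"], "year": 1992},
    {"id": "photo-002", "keywords": ["construction"], "year": 1990},
]
csv_text = to_viewer_csv(photos)
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(csv_text)
```

The `csv` module quotes the keywords cell automatically when it contains commas, so the table parses back into the same rows.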

Download our code and try it out!

URL: https://github.com/hkust-lib-ds/P002-PUBLIC_ObjectDetection-ImageTagging

Words from students

LAU Ming Kit, Jack

BEng in Computer Engineering
Year 4


Throughout this project, I gained a deeper understanding of computer vision, particularly how models utilize embeddings to detect objects within an image and generate object masks. This experience significantly enhanced my programming skills and expanded my knowledge in the field of computer vision…

ZHANG Ka Ho Eric

BEng in Electronic Engineering
Year 2


This project provided me with invaluable hands-on experience in applying computer vision and machine learning techniques to solve real-world problems. I am filled with gratitude for the opportunity to contribute to a project that sits at the intersection of technology and practical utility…

Project Team

Developers

  • LAU Ming Kit, Jack ◇ Year 4 student, BEng in Computer Engineering
  • ZHANG Ka Ho Eric ◇ Year 2 student, BEng in Electronic Engineering

Advisers

  • Holly CHAN ◇ Assistant Manager (Digital Humanities)
  • Leo WONG ◇ Librarian (Systems & Digital Services)
  • Aster ZHAO ◇ Librarian (Research Support)
  • Jennifer GU ◇ Librarian (Research Support)

Presentation

This project was presented at the webinar “AI and the Future of Digital Preservation”, organized by the International Federation of Library Associations and Institutions (IFLA) on 18 June 2024, under the title “Beyond Pixels: AI-driven Image Processing for Enhanced Contextualization of HKUST’s Digital Images (1988-2000s) through the Applications of AI Models for Image Tagging, Object Detection, and Facial Recognition”. (slides can be found here)

Publication

Paper to be released soon. Stay tuned!