Project Background
Named-Entity Recognition (NER) is a natural language processing technique that can automatically identify and categorize key elements such as people, organizations, locations, dates, and other important concepts within a large amount of text. This technique enables researchers to conduct insightful analysis of textual information relevant to their research efficiently.
Many NER models and tools are available online, but most require coding and time-consuming customization to produce the desired results. This technical complexity can be an obstacle for researchers without strong coding skills.
To address this issue, our student developers, Sherry and Berry, conducted an extensive environmental scan of existing NER tools, identified their notable features, and developed an NER tool with a particular focus on the Chinese language.
Project Goal
The aim of this project is to develop a user-friendly tool that streamlines the Chinese NER process, so that non-technical researchers, particularly those in the Chinese language research community, can overcome the technical barriers and use NER more conveniently in their work.
Feature highlights
Auto Annotation
Our tool is built on Academia Sinica’s CKIP models, which perform automated Chinese NER for annotation.
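As a rough illustration of what automated annotation with a CKIP model can look like, the sketch below uses the open-source `ckip-transformers` package. This is an assumption: the page does not say which CKIP package or model the tool actually uses, and the `Annotation` record and both function names are hypothetical. The model is loaded only when `annotate_with_ckip` is called, since the first call downloads model weights.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """Hypothetical record for one recognized entity."""
    text: str              # surface form of the entity
    label: str             # entity class reported by the model, e.g. PERSON
    span: Tuple[int, int]  # (start, end) character offsets in the sentence


def to_annotations(tokens: Iterable) -> List[Annotation]:
    """Convert (word, ner, idx) triples into Annotation records."""
    return [Annotation(word, ner, tuple(idx)) for word, ner, idx in tokens]


def annotate_with_ckip(sentences: List[str]) -> List[List[Annotation]]:
    """Run CKIP NER over a list of sentences (assumes ckip-transformers is installed)."""
    from ckip_transformers.nlp import CkipNerChunker  # imported lazily

    ner_driver = CkipNerChunker(model="bert-base")
    # The driver returns one list of (word, ner, idx) tokens per input sentence.
    return [to_annotations(sentence) for sentence in ner_driver(sentences)]
```

Calling `annotate_with_ckip(["孫悟空離開了花果山。"])` would return one list of annotations per input sentence; the labels depend on the model chosen.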
Add Entity Manually
Accurately identifying every entity remains a challenge for current natural language processing technologies. When the model misses an entity, you can add it manually. Entities can also be edited or deleted as necessary.
Group Entities by Group, Class, and Alias
You can group different recognized entities by customizable groups, classes, and aliases, and view the frequency of entities accordingly.
The table above explains the definition of the four terms – instance, class, group, alias – that allow you to categorize the entities in different ways.
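To make the idea of counting entities under different categorizations concrete, here is a minimal sketch. It assumes that a class is the label attached to an entity and that an alias maps a variant name to one canonical name; the tool's own definitions are those given in its table, and the sample data and function names here are hypothetical.

```python
from collections import Counter

# Hypothetical annotations: (entity text, class label) pairs.
annotations = [
    ("孫悟空", "PERSON"),
    ("悟空", "PERSON"),
    ("唐僧", "PERSON"),
    ("孫悟空", "PERSON"),
    ("長安", "GPE"),
]

# Hypothetical alias table: variant name -> canonical name.
aliases = {"悟空": "孫悟空"}


def count_by_alias(annotations, aliases):
    """Count entities after mapping each variant to its canonical name."""
    return Counter(aliases.get(text, text) for text, _ in annotations)


def count_by_class(annotations):
    """Count entities per class label."""
    return Counter(label for _, label in annotations)


alias_counts = count_by_alias(annotations, aliases)
class_counts = count_by_class(annotations)
```

With the alias applied, the two spellings of the same character are counted together, which is the kind of frequency view the grouping features provide.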
Visualizations
Several charts show the frequency of the selected entities at a glance and support further analysis.
Upload Multiple Files for Annotation
You can upload multiple text files for analysis, and a chart shows the frequency of entities across the files. For example, if each file is a chapter, the chart works as a trend analysis of how often certain entities occur from chapter to chapter.
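The cross-file comparison can be sketched as a per-file count of one entity. The file names, entity lists, and function name below are hypothetical sample data, not output from the tool.

```python
from typing import Dict, List


def frequency_by_file(entities_per_file: Dict[str, List[str]], entity: str) -> Dict[str, int]:
    """Return how often `entity` occurs in each file, keeping file order."""
    return {name: found.count(entity) for name, found in entities_per_file.items()}


# Hypothetical recognized entities, one list per uploaded chapter file.
entities_per_file = {
    "chapter01.txt": ["孫悟空", "花果山", "孫悟空"],
    "chapter02.txt": ["唐僧", "孫悟空"],
    "chapter03.txt": ["唐僧", "長安"],
}

trend = frequency_by_file(entities_per_file, "孫悟空")
```

Plotting the values of `trend` in file order gives the chapter-by-chapter trend described above.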
Export Data
You can export the entity data in CSV format. The export contains the frequency of all recognized entities, the grouped entities and their frequencies, and the assigned alias groups and their frequencies.
Each entity, group, and alias is assigned a unique ID, and the relationships between them are indicated, so you can conduct further analysis using these detailed CSV files.
For more features and explanations, please see our user manual.
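A minimal sketch of how ID-linked CSV data of this kind could be written and read back for further analysis. The column names (`entity_id`, `text`, `frequency`, `alias_id`) and the sample rows are assumptions for illustration; the tool's actual export layout may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical rows: each entity has its own ID and points to an alias ID.
FIELDS = ["entity_id", "text", "frequency", "alias_id"]
rows = [
    {"entity_id": "E1", "text": "孫悟空", "frequency": 2, "alias_id": "A1"},
    {"entity_id": "E2", "text": "悟空", "frequency": 1, "alias_id": "A1"},
    {"entity_id": "E3", "text": "唐僧", "frequency": 1, "alias_id": "A2"},
]


def export_csv(rows, fieldnames) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def frequency_per_alias(csv_text: str) -> Counter:
    """Sum entity frequencies per alias ID by following the ID relationship."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["alias_id"]] += int(row["frequency"])
    return totals


csv_text = export_csv(rows, FIELDS)
alias_totals = frequency_per_alias(csv_text)
```

Because every row carries IDs rather than only names, the same join can be repeated in a spreadsheet or a data analysis library.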
Download our tool and try it out!
URL: https://github.com/hkust-lib-ds/P001-PUBLIC_Chinese-NER-Tool
We would be delighted to learn how our tool supports your research. If you use it in your work, we would greatly appreciate it if you could share your experience with us!
Words from students
YIP Sau Lai, Sherry
BSc in Data Science and Technology
Year 3
I realized the importance of dividing the project into smaller, independent modules and prioritizing them. This approach facilitated task division and procedural planning, making the implementation process more manageable and efficient…
HAN Liuruo, Berry
BSc in Data Science and Technology
Year 2
Throughout this project, I have had the opportunity to expand my knowledge and skills in various areas, such as website functionality planning, user-oriented design, collaborative workflow with GitHub, etc…
Project Team
Developers
- YIP Sau Lai, Sherry ◇ Year 3 student, BSc in Data Science and Technology
- HAN Liuruo, Berry ◇ Year 2 student, BSc in Data Science and Technology
Advisers
- Holly CHAN ◇ Assistant Manager (Digital Humanities)
- Leo WONG ◇ Librarian (Systems & Digital Services)
- Jennifer GU ◇ Librarian (Research Support)
- Aster ZHAO ◇ Librarian (Research Support)
Presentation
Yip, Sherry S.L., Han, Berry L., & Chan, Holly H.Y. (2024, November 27). From ideation to implementation: Develop a Chinese NER tool to enrich literary experiences (從構思到實現的全過程:製作中文自動實體標注工具,豐富文學體驗) [Conference presentation]. The 15th Conference on Cooperative Development and Sharing of Chinese Resources (CCDSCR), Hong Kong. https://www.hkpl.gov.hk/en/extension-activities/ccdscr2024/schedule.html
Publication
Paper to be released soon. Stay tuned!