Research Bridge
Transcribe Cantonese Speech to Text: With Code Samples and Automated Batch Processing Techniques
AI in Research & Learning
Digital Humanities

In this post, Holly Chan, our Assistant Manager of Digital Humanities, demonstrates how to perform Cantonese speech-to-text transcription with code. The code she shares speeds up batch processing and further enhances your ability to work with audio data in humanities research.

While the examples in this article use Cantonese speech, the same techniques can be applied to transcribe speech in other languages. 
 

Code

You can find all of the code shown below in this Jupyter Notebook (.ipynb file). If this is your first time using Jupyter Notebook, please see our previous tutorial article “How to open .ipynb file (Jupyter Notebook)”.

Below is one recommended overall workflow:

[Figure: recommended speech-to-text workflow]
 

Install and Import Necessary Packages

[Screenshot: installing and importing the necessary packages]
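For reference, here is a minimal sketch of a notebook cell that installs and imports these packages, assuming Method 1 and the OpenAI-based steps below are used (the exact package list in the notebook may differ):

```python
# Install the packages used in this tutorial (run once in a notebook cell):
# %pip install SpeechRecognition transformers torch openai

import speech_recognition as sr    # Google Speech Recognition wrapper (Method 1)
from transformers import pipeline  # Hugging Face Whisper models (Method 2 and Whisper V3)
from openai import AzureOpenAI     # Azure OpenAI for punctuation, conversion, and summaries
import csv                         # CSV export for batch processing
import glob                        # finding WAV files in a directory
import os                          # file name handling
```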
 

1. Speech (Cantonese) to Text (Spoken Cantonese)

The first step is to transcribe the speech directly into text that captures the spoken language. This can be achieved using Automatic Speech Recognition (ASR) models.
 

Method 1: Using Google Speech Recognition

The code below uses the speech_recognition library to transcribe the Cantonese audio file into spoken Cantonese text. The recognize_google() function uses Google Speech Recognition to perform this task. For Cantonese, the language code “yue-HK” is used. If you need to work with other languages, please refer to the language codes available here: https://cloud.google.com/text-to-speech/docs/voices
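A minimal sketch of this step, assuming a Cantonese recording saved as sample.wav in the working directory (the file name is a placeholder):

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load the audio file and read its full content into memory
with sr.AudioFile("sample.wav") as source:
    audio = recognizer.record(source)

# Send the audio to Google Speech Recognition; "yue-HK" selects Cantonese (Hong Kong)
try:
    cantonese_text = recognizer.recognize_google(audio, language="yue-HK")
    print(cantonese_text)
except sr.UnknownValueError:
    print("Speech could not be recognized.")
except sr.RequestError as e:
    print(f"Could not reach the Google Speech Recognition service: {e}")
```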

[Screenshot: code using Google Speech Recognition]

[Screenshot: speech-to-text output from Google Speech Recognition]
 

As you can see from the screenshot above, the speech-to-text output from Google Speech Recognition does not include punctuation. To add punctuation to the transcribed text, you could make use of OpenAI.

All HKUST staff and students have monthly free credits for using Azure OpenAI service. Please refer to this article “How to use HKUST Azure OpenAI API key with Python (with sample code and use case examples)” to learn how you can obtain your Azure OpenAI API key. 
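A minimal sketch of the punctuation step with the openai Python package (v1+) and an Azure OpenAI deployment; the endpoint, API version, deployment name, and prompt wording below are placeholders to adapt to your own setup:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",  # placeholder endpoint
    api_key="YOUR_AZURE_OPENAI_API_KEY",                            # placeholder key
    api_version="2024-02-01",                                       # adjust to your deployment
)

cantonese_text = "..."  # the transcript returned by Google Speech Recognition

response = client.chat.completions.create(
    model="YOUR_DEPLOYMENT_NAME",  # placeholder deployment name
    messages=[
        {"role": "system",
         "content": "Add appropriate Chinese punctuation to the following Cantonese transcript. "
                    "Do not change or translate the wording."},
        {"role": "user", "content": cantonese_text},
    ],
)
print(response.choices[0].message.content)
```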

[Screenshot: code adding punctuation to the transcribed text with OpenAI]

[Screenshot: output of adding punctuation to the transcribed text]

 

Method 2: Using Whisper model

The code below uses the Hugging Face Transformers library to load a pre-trained model (https://huggingface.co/alvanlii/whisper-small-cantonese) that transcribes Cantonese speech to written Chinese text. However, it takes longer to process the task than Method 1 does (see the comparison table below).
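A minimal sketch of loading this model with the Transformers pipeline, assuming ffmpeg is available for decoding the audio; sample.wav is again a placeholder file name:

```python
from transformers import pipeline

# Load the pre-trained Cantonese Whisper model from the Hugging Face Hub
asr = pipeline(
    "automatic-speech-recognition",
    model="alvanlii/whisper-small-cantonese",
    chunk_length_s=30,  # split long recordings into 30-second chunks
)

result = asr("sample.wav")
print(result["text"])
```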

[Screenshot: code using the Whisper model for speech to text]

[Screenshot: output of the Whisper model]

 

Comparison Between Method 1 and 2

[Table: comparison between Method 1 and Method 2]
 

2. Speech (Cantonese) to Text (Written Chinese) Using Whisper V3 Model

We can also use OpenAI’s Whisper V3 model (https://huggingface.co/openai/whisper-large-v3) to transcribe Cantonese speech to written Chinese text directly.
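A minimal sketch using the same Transformers pipeline, assuming a recent version of transformers and a GPU when available; the language and task settings below steer Whisper V3 towards Cantonese transcription:

```python
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # GPU if available, otherwise CPU

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    chunk_length_s=30,
    device=device,
)

# "cantonese" maps to Whisper's "yue" language token; "transcribe" keeps the output in Chinese
result = asr("sample.wav", generate_kwargs={"language": "cantonese", "task": "transcribe"})
print(result["text"])
```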

[Screenshot: code transcribing Cantonese speech to written Chinese text with the Whisper V3 model]

[Screenshot: output of transcribing Cantonese speech to written Chinese text with Whisper]
 

3. Text (Spoken Cantonese) to Text (Written Chinese) Using OpenAI

Since the existing ASR models are not fully accurate, we recommend improving the Cantonese text output first, and then using that refined Cantonese text as the basis for converting it to written Chinese. The code below uses OpenAI to translate the spoken Cantonese text to written Chinese text. You can adjust the prompt message according to your needs.
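A minimal sketch of this conversion step; as before, the Azure OpenAI endpoint, deployment name, and prompt wording are placeholders you can adjust:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",  # placeholder endpoint
    api_key="YOUR_AZURE_OPENAI_API_KEY",                            # placeholder key
    api_version="2024-02-01",
)

refined_cantonese = "..."  # the manually checked / punctuated Cantonese transcript

response = client.chat.completions.create(
    model="YOUR_DEPLOYMENT_NAME",  # placeholder deployment name
    messages=[
        {"role": "system",
         "content": "Convert the following spoken Cantonese text into standard written Chinese. "
                    "Keep the meaning unchanged."},
        {"role": "user", "content": refined_cantonese},
    ],
)
written_chinese = response.choices[0].message.content
print(written_chinese)
```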

[Screenshot: code converting text (spoken Cantonese) to text (written Chinese) using OpenAI]

[Screenshot: output of converting spoken Cantonese text to written Chinese text using OpenAI]
 

4. Summarize Using OpenAI

To further enhance your research capabilities, you can generate summaries of the transcribed speech or text. This can be particularly useful when dealing with long passages or multiple documents – the summary can give you the key points in a nutshell. You may adjust the prompt message and add more criteria, such as keeping the response within a certain word limit.
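A minimal sketch of the summarization step, with a word-limit criterion added to the prompt as an example; the endpoint, deployment name, and prompt wording are again placeholders:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",  # placeholder endpoint
    api_key="YOUR_AZURE_OPENAI_API_KEY",                            # placeholder key
    api_version="2024-02-01",
)

written_chinese = "..."  # the written Chinese text produced in the previous step

response = client.chat.completions.create(
    model="YOUR_DEPLOYMENT_NAME",  # placeholder deployment name
    messages=[
        {"role": "system",
         "content": "Summarize the following text in Chinese in no more than 100 words, "
                    "listing the key points."},
        {"role": "user", "content": written_chinese},
    ],
)
print(response.choices[0].message.content)
```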

[Screenshot: code summarizing with OpenAI]

[Screenshot: output of OpenAI summarization]
 

5. Batch Processing and Export to CSV

The code below allows you to process a batch of audio files in WAV format and export the results (filename, duration, and Cantonese text) to a CSV file. Simply place all your WAV audio files in the same directory as the Jupyter Notebook, and run the code.
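A minimal sketch of this batch step, assuming Method 1 (Google Speech Recognition) is used for the transcription; the column and output file names are placeholders:

```python
import csv
import glob
import os

import speech_recognition as sr

recognizer = sr.Recognizer()
rows = []

# Transcribe every WAV file in the current directory
for path in sorted(glob.glob("*.wav")):
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
        duration = source.DURATION  # length of the recording in seconds
    try:
        text = recognizer.recognize_google(audio, language="yue-HK")
    except (sr.UnknownValueError, sr.RequestError):
        text = ""  # leave blank if the file could not be transcribed
    rows.append({
        "filename": os.path.basename(path),
        "duration_seconds": round(duration, 2),
        "cantonese_text": text,
    })

# Export the results; utf-8-sig keeps the Chinese characters readable in Excel
with open("transcriptions.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=["filename", "duration_seconds", "cantonese_text"])
    writer.writeheader()
    writer.writerows(rows)
```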

[Screenshot: code for batch processing and export to CSV]

Then, manually check and refine the Cantonese text. Afterward, perform the Canto_to_Chi_OpenAI and summarize tasks as follows.
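A minimal sketch of this second pass, assuming the CSV produced above has been manually checked; ask_openai() is a hypothetical helper standing in for the notebook's Canto_to_Chi_OpenAI and summarize functions, and the deployment name, prompts, and file names are placeholders:

```python
import csv
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",  # placeholder endpoint
    api_key="YOUR_AZURE_OPENAI_API_KEY",                            # placeholder key
    api_version="2024-02-01",
)

def ask_openai(instruction, text):
    """Send one instruction + text pair to the chat model and return the reply."""
    response = client.chat.completions.create(
        model="YOUR_DEPLOYMENT_NAME",  # placeholder deployment name
        messages=[{"role": "system", "content": instruction},
                  {"role": "user", "content": text}],
    )
    return response.choices[0].message.content

# Read the manually checked transcripts
with open("transcriptions.csv", newline="", encoding="utf-8-sig") as f:
    rows = list(csv.DictReader(f))

# Add written Chinese and summary columns for each recording
for row in rows:
    row["written_chinese"] = ask_openai(
        "Convert the following spoken Cantonese text into standard written Chinese.",
        row["cantonese_text"])
    row["summary"] = ask_openai(
        "Summarize the following text in Chinese in no more than 100 words.",
        row["written_chinese"])

with open("transcriptions_processed.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```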

[Screenshot: batch processing the Canto_to_Chi_OpenAI and summarize tasks]
 

Conclusion

By automating the process of transcribing speech to text and organizing the results in a structured CSV format, researchers can save time and effort and focus on the analysis and interpretation of the data. We hope this post has provided a useful solution to enhance your humanities research projects that involve audio data.
