
Gale Digital Scholar Lab and Constellate for Text Analysis

Interested in doing text analysis but daunted about where to start? Fear not! Two new library resources, Gale Digital Scholar Lab and Constellate, offer tools, data resources, and user-friendly interfaces to help you begin.

Gale Digital Scholar Lab

Gale Digital Scholar Lab (Gale DSL) is a platform for text analysis, data mining, and data visualization. With access to the Library’s licensed content in Gale Primary Sources, you can create and analyze content sets using the digital humanities tools provided.

To get started, log in with your Microsoft account (your HKUST SSO should work). After agreeing to the terms of use and privacy policy (they promise never to sell your data), you can create a username, select an avatar, and begin exploring.

Build, Clean, and Analyze

Gale DSL provides students and scholars with text and data mining resources, visualization tools, and methodology suggestions. The platform offers self-learning materials to guide you through various stages of a digital scholarship pathway.

The “Build” stage involves creating and managing your content sets from our licensed content in the Gale Primary Sources collections. For example, you can analyze papers from government archives about China and Western countries from the 1800s to the 1990s, or explore the full text of The Economist (1843-2020), among other options. Alternatively, you can upload and use your own text files.

 

In the “Clean” stage, you can refine your content sets by applying stop words and text correction techniques. The data cleaning configurations can be reused across content sets.
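To give a feel for what stop-word cleaning does, here is a minimal Python sketch. It is not how Gale DSL works internally; the stop-word list, the function name `clean_text`, and the sample sentence are all made up for illustration.

```python
import re

# Hypothetical, very small stop-word list (real lists are much longer)
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}

def clean_text(text, stop_words=STOP_WORDS):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]

sample = "The price of tea in London is rising."
print(clean_text(sample))  # ['price', 'tea', 'london', 'rising']
```

Because the stop-word list is passed in as a parameter, the same settings can be applied to several texts, which loosely mirrors the idea of reusing a cleaning configuration across content sets.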

 

The “Analyze” stage assists you in selecting the right tools and methods, such as document clustering, named entity recognition, or n-grams. Gale DSL also provides sample projects for self-guided learning.

 

N-gram analysis shows which sequences of words occur most often in a content set (N is the number of words in each sequence). For example, in content sets built around two 18th-century commodities, “coffee” and “tea”, the word “young” and words for types of people (men, women, lady, etc.) appear more frequently in the N-gram results for the Coffee content set (Figure a) than for the Tea content set (Figure b). While Coffee’s N-grams mention good and evil, liquor, and common sense, Tea’s N-grams focus on the otherness of the product itself.

Figure a. Ngrams – Coffee content set in the 18th century

Figure b. Ngrams – Tea content set in the 18th century
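If you are curious how N-gram counts like those in the figures can be produced, the following Python sketch counts word sequences of length N in a piece of text. It is only an illustration under simple assumptions (lowercased, alphabetic tokens); the function name `ngram_counts` and the sample sentence are invented here and do not come from Gale DSL or its content sets.

```python
import re
from collections import Counter

def ngram_counts(text, n=2):
    """Count every run of n consecutive words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Line up n shifted copies of the token list to form the n-grams
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Made-up sentence, not taken from any Gale collection
sample = "young men drink coffee and young men talk"
print(ngram_counts(sample, n=2).most_common(1))  # [('young men', 2)]
```

Changing `n` to 1 gives single-word frequencies, and larger values give longer phrases.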

 

Constellate

Constellate is a platform that supports text analysis of content from leading archival repositories of scholarly and primary source material, including JSTOR, Portico, and other Ithaka collections.

As with Gale DSL, you need to register before using Constellate, in this case by creating a JSTOR user account. Once you’ve agreed to the terms of use and privacy policy (which also assure data protection), you can begin using the platform. For seamless access to all the available data, we recommend creating your account on campus at HKUST, so that your IP address is recognized. If this isn’t possible, follow the instructions provided here: https://constellate.org/docs/log-in#pair

Dataset Builder and the Constellate Lab

Like Gale DSL, Constellate offers an integrated text analysis platform that grants access to scholarly content. It also brings rich open educational resources into a cloud-based lab where you can use Constellate Notebooks and other Jupyter Notebooks, run working code, and even share your own notebooks with other Constellate users. A beta version of an R environment is also available.

The Tutorials section offers written and video lessons ranging from beginner topics (the basics of text analysis and introductory coding in R and Python) to more advanced methods such as tokenization, topic modelling, and sentiment analysis.

By leveraging the capabilities of Gale Digital Scholar Lab and Constellate, you can embark on an enriching text analysis journey, gaining valuable insights from a wide range of scholarly resources.

– By Victoria Caplan, Library

published October 12, 2023