🚧 This is a working draft and will change often. Do not cite! Use the latest published version instead. 🚧

27.3. Tutorials and examples

This page includes information on tutorials and examples to help you work with text from Trove.

Tutorials

Analysing keywords in Trove’s digitised newspapers

You want to explore differences in language use across a collection of digitised newspaper articles. The Australian Text Analytics Platform provides a Keywords Analysis tool that helps you examine whether particular words are over- or under-represented across collections of text. But how do you get data from Trove’s newspapers to the keyword analysis tool?

Examples from the GLAM Workbench

Exploring text files harvested with the Trove Harvester

This notebook suggests some ways to aggregate and analyse the individual OCRd text files for each article, such as looking at word frequencies and calculating TF-IDF values.
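As a rough illustration of the two measures mentioned above, here is a minimal sketch using only the Python standard library. It is not the notebook's own code: the `articles` dictionary is a hypothetical stand-in for harvested text files, the tokeniser is deliberately crude, and the TF-IDF formula (term frequency multiplied by log(N / document frequency)) is one common variant among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(docs):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts


def tf_idf(docs):
    """Return {doc_name: {word: score}} where score = tf * log(N / df)."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word appears in
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


# Hypothetical sample standing in for harvested article text files
articles = {
    "article-1.txt": "The flood waters rose and the river broke its banks",
    "article-2.txt": "The council met to discuss the new bridge over the river",
    "article-3.txt": "The cricket match was abandoned because of rain",
}

frequencies = word_frequencies(articles)
scores = tf_idf(articles)

print(frequencies.most_common(3))
top_terms = sorted(scores["article-3.txt"].items(), key=lambda kv: kv[1], reverse=True)
print(top_terms[:3])
```

Words that appear in every document, such as "the", score zero under this formula, which is why TF-IDF tends to surface the terms that distinguish one article from the rest.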

Finding non-English newspapers in Trove

A growing number of non-English newspapers have been digitised in Trove. However, if you’re only searching using English keywords, you might never know that they’re there. This notebook analyses the language of a sample of articles from each newspaper to create a list of non-English newspapers.
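The general idea of sampling articles and tallying their languages can be sketched as follows. This is only an illustration under stated assumptions: the notebook's actual detection method is not described here, so the sketch swaps in a very simple stopword-overlap heuristic, and the newspaper titles, sample texts, and stopword lists are all hypothetical. A dedicated language detection library would be far more reliable, especially on noisy OCR.

```python
import re
from collections import Counter

# Tiny illustrative stopword lists (an assumption for this sketch only)
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "was", "for"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "von", "mit"},
    "italian": {"il", "la", "di", "che", "per", "non", "una", "sono"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def non_english_titles(samples):
    """Map each title to its most common guessed language, keeping non-English ones."""
    result = {}
    for title, texts in samples.items():
        guesses = (guess_language(text) for text in texts)
        votes = Counter(lang for lang in guesses if lang is not None)
        if votes:
            top_lang, _ = votes.most_common(1)[0]
            if top_lang != "english":
                result[title] = top_lang
    return result


# Hypothetical samples: a few article texts per newspaper title
samples = {
    "Hypothetical Gazette": [
        "The meeting of the council was held in the town hall.",
        "Rain is expected for the weekend.",
    ],
    "Hypothetical Zeitung": [
        "Der Zug ist nicht mit der Post gekommen.",
        "Die Ernte und das Wetter.",
    ],
    "Hypothetical Giornale": [
        "La festa di ieri non era una sorpresa.",
        "Il pane che mangiamo è per tutti.",
    ],
}

non_english = non_english_titles(samples)
print(non_english)
```

Taking a majority vote over several articles, rather than trusting a single guess, helps when individual articles are short or badly OCRd.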

Counting words and phrases in digitised books

This notebook provides a simple example of extracting word and ngram frequencies from the OCRd text of a digitised book using TextBlob and Wordcloud.
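To show what word and ngram frequencies look like in practice, here is a minimal sketch that uses only the Python standard library in place of TextBlob, so that it runs without extra installs. The sample sentence is a hypothetical stand-in for the OCRd text of a book, and the tokeniser simply keeps runs of the letters a-z.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample standing in for a digitised book's OCRd text
text = "Take a pound of flour and a pound of butter, and rub the butter into the flour."

tokens = tokenize(text)
word_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))

print(word_counts.most_common(5))
print(bigram_counts.most_common(3))
```

The resulting word counts are the kind of word-to-frequency mapping that a word cloud is typically drawn from.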

Recipe generator

In this notebook we use TextBlob to extract nouns, verbs, and sentences from the OCRd text of a 19th-century cookery book. We try to clean things up a bit, using regular expressions to discard likely OCR errors. Then we recombine the various parts in random combinations to create delicious recipes for all occasions. Enjoy!
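The clean-then-recombine idea can be sketched as below. This is a simplified illustration, not the notebook's code: the word lists are hypothetical and hand-written rather than extracted with TextBlob, the regular expression is just one plausible filter for OCR noise, and the sentence template is invented for the example.

```python
import random
import re

# Keep words of three or more letters a-z that contain a vowel and do not
# repeat any character three times in a row (all assumptions of this sketch)
LIKELY_WORD = re.compile(r"^(?!.*(.)\1\1)(?=.*[aeiou])[a-z]{3,}$")


def clean(words):
    """Discard tokens that look like OCR errors according to LIKELY_WORD."""
    return [word for word in words if LIKELY_WORD.match(word.lower())]


def make_recipe_step(verbs, nouns, rng):
    """Combine one random verb with two different random nouns (needs 2+ nouns)."""
    verb = rng.choice(verbs)
    first, second = rng.sample(nouns, 2)
    return f"{verb.capitalize()} the {first} with the {second}."


# Hypothetical word lists, including some OCR-style noise
raw_verbs = ["boil", "stir", "b0il", "simmer", "mmm"]
raw_nouns = ["butter", "fl0ur", "eggs", "onions", "xzq", "iii"]

verbs = clean(raw_verbs)
nouns = clean(raw_nouns)

rng = random.Random(42)  # seeded so repeated runs give the same steps
for _ in range(3):
    print(make_recipe_step(verbs, nouns, rng))
```

A filter like this will still let through plausible-looking misreadings such as "tbe", which is part of why the results stay entertainingly odd.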

Other examples