24.1. Overview of oral histories
The National Library of Australia holds over 55,000 hours of oral history and folklore recordings dating back to the 1950s. This collection is being made available online, and many recordings can now be listened to using Trove’s audio player.
24.1.1. Finding oral histories
Items from the NLA’s oral history collection can be found in Trove’s Music, Audio, & Video category. If you’re only interested in what’s available online, the standard approach to finding digitised resources seems to work effectively – search in the Music, Audio, & Video category for `"nla.obj"`, with the `availability` facet set to `y`, and the `format` facet set to `Sound/Interview, lecture, talk`.
If you’re also interested in oral histories that aren’t yet online, you can use the `nuc` index instead of `"nla.obj"` to find all oral histories in the NLA collection – search in the Music, Audio, & Video category for `nuc:ANL OR nuc:"ANL:DL"` with the `format` facet set to `Sound/Interview, lecture, talk`.
This search will probably return some items that aren’t from the NLA’s own oral history collection. But the alternatives I’ve tried miss some oral histories, so I think it’s better to be inclusive and weed the results as necessary.
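The two searches described above can also be written down as sets of query parameters. This is a minimal sketch, assuming version 3 of the Trove API, its `/result` endpoint, and `music` as the API name of the Music, Audio, & Video category. The `build_search_url` helper is hypothetical. It only constructs a URL and doesn't send a request, which would also need an API key.

```python
from urllib.parse import urlencode

# Assumed endpoint for version 3 of the Trove API
API_URL = "https://api.trove.nla.gov.au/v3/result"


def build_search_url(online_only=True):
    """Build a search URL for NLA oral histories (hypothetical helper)."""
    params = {
        # Assumption: "music" is the API name of the Music, Audio, & Video category
        "category": "music",
        "l-format": "Sound/Interview, lecture, talk",
    }
    if online_only:
        # Digitised items only
        params["q"] = '"nla.obj"'
        params["l-availability"] = "y"
    else:
        # Everything held by the NLA, online or not
        params["q"] = 'nuc:ANL OR nuc:"ANL:DL"'
    return f"{API_URL}?{urlencode(params)}"


print(build_search_url(online_only=True))
print(build_search_url(online_only=False))
```

Keeping both searches in one function makes the difference between them explicit: only the query and the `availability` facet change.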
24.1.2. Licensing of oral histories
If you click through to a digitised copy of an oral history in Trove, you’ll be presented with a licence agreement that you’ll need to accept before using the recording. The agreement notes:
You are seeking access to an oral history recording. Oral history is by its nature spoken memory. It is a personal opinion and is not intended to present the final verified or complete narrative of events.
The following end user licence agreement is intended to preserve both the rights of the interviewee as well as protecting the reputation of individuals and the Library. It describes the obligations of anyone who accesses the material in the collection. It is a requirement of use that you comply with these conditions.
You can download and view the full licence agreement as a PDF.
This section of the Trove Data Guide documents methods for accessing data relating to the oral histories which, by their nature, bypass the licence agreement screen. If you’re intending to work with the oral histories, I’d strongly suggest you take time to read and consider the licence agreement before proceeding.
24.1.3. How many oral histories are there?
I’ve attempted to harvest details of all the NLA’s oral histories described in Trove, both online and not online. The results are available as a CSV file in the GLAM Workbench. Using this data you can explore the shape of the collection.
import re

import altair as alt
import pandas as pd

df = pd.read_csv(
    "https://raw.githubusercontent.com/GLAM-Workbench/trove-oral-histories-data/main/trove-oral-histories.csv",
    keep_default_na=False,
)
df["online_status"] = df.apply(
    lambda x: "online" if x["fulltext_url"] != "" else "not online", axis=1
)
df["online_status"].value_counts().to_frame().reset_index().style.format(
    thousands=","
).hide().hide(axis=1)
not online | 13,881 |
online | 6,202 |
24.1.4. How are the interviews distributed over time?
The `date` field tells you when each interview was recorded.
df["year"] = df["date"].str.extract(r"\b((?:19|20)\d{2})\b")
years = df.value_counts(["year", "online_status"]).to_frame().reset_index()
chart_online_years = (
    alt.Chart(years)
    .mark_bar(size=4)
    .encode(
        x="year:T",
        y=alt.Y("count:Q"),
        color="online_status",
        tooltip=["online_status", alt.Tooltip("year:T", format="%Y"), "count"],
    )
    .properties(width=600)
)
chart_online_years
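To see which date values the year pattern picks up, here is a standalone sketch that applies the same regular expression with Python's `re` module. The `extract_year` helper and the sample date strings are made up for illustration; they are not taken from the dataset.

```python
import re

# Same pattern as used above: a standalone four-digit year starting with 19 or 20
YEAR_PATTERN = re.compile(r"\b((?:19|20)\d{2})\b")


def extract_year(date_value):
    """Return the first four-digit year found in a date string, or None."""
    match = YEAR_PATTERN.search(date_value)
    return match.group(1) if match else None


# Made-up date strings for illustration
for value in ["1975-03-12", "2011 May 4", "ca. 1960s", ""]:
    print(repr(value), "->", extract_year(value))
```

Because of the word boundaries, a value like `1960s` doesn't match, so records dated only by decade, or with no date at all, end up without a year.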
24.1.5. How many hours of recordings are available online?
Information about the `duration` of each audio file can be extracted from the audio player. Adding the values together gives us a total for all the oral histories online.
print(f"Hours: {df['duration'].sum() / 60 / 60:,}")
Hours: 15,107.1775
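The calculation divides by 60 twice, which implies that `duration` is recorded in seconds. As a small sketch of the same conversion, using made-up durations rather than values from the dataset, you could also round the result with a format specifier:

```python
def seconds_to_hours(total_seconds):
    """Convert a duration in seconds to hours."""
    return total_seconds / 60 / 60


# Made-up durations in seconds, for illustration only
durations = [3600, 5400, 1800]
total_hours = seconds_to_hours(sum(durations))

# The ",.1f" format adds thousands separators and rounds to one decimal place
print(f"Hours: {total_hours:,.1f}")
```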
24.1.6. How many oral histories have transcripts or summaries you can download?
Many of the oral histories online have summaries or transcripts that you can download.
print(f"Summaries: {df['summary'].sum():,}")
print(f"Transcripts: {df['transcript'].sum():,}")
Summaries: 3,784
Transcripts: 1,781
24.1.7. What are the oral histories about?
The `subject` field contains standard(ish) subject headings that provide an insight into the topics of oral history interviews.
Here are the top ten subjects of oral histories that are not online. The formatting of the `--` separators has been normalised, and final full stops removed.
def split_and_clean(value):
    values = value.split(" | ")
    return list(
        set([re.sub(r"(\w)--(\w)", r"\1 -- \2", v).strip(".") for v in values if v])
    )


df["subject"] = df["subject"].apply(split_and_clean)
subjects = df[["online_status", "subject"]].explode("subject")
subjects.loc[(subjects["online_status"] == "not online")][
    "subject"
].value_counts().to_frame().reset_index()[:10].style.hide()
subject | count |
---|---|
Folk musicians | 376 |
Academics | 316 |
Authors, Australian -- Interviews | 269 |
Politicians -- Australia -- Interviews | 264 |
Painters -- Australia -- Interviews | 212 |
Australian poetry | 203 |
Folk music -- Australia | 201 |
Aboriginal Australians -- Interviews | 196 |
Authors | 174 |
Musicians | 170 |
Here are the top ten subjects of oral histories that are online. The formatting of the `--` separators has been normalised, and final full stops removed.
subjects.loc[subjects["online_status"] == "online"][
    "subject"
].value_counts().to_frame().reset_index()[:10].style.hide()
subject | count |
---|---|
Painters -- Australia -- Interviews | 193 |
Politicians -- Australia | 192 |
Prime ministers -- Australia -- Quotations | 188 |
Older people -- New South Wales -- Biography | 187 |
Menzies, Robert, Sir, 1894-1978. Speeches | 185 |
Federal politicians | 184 |
Politicians -- Australia -- Quotations | 183 |
Australia -- Politics and government -- 1945-1965 | 172 |
Politicians -- Australia -- Interviews | 171 |
Academics | 126 |
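To see what the normalisation of subject headings does, here is a standalone sketch of the cleaning step applied to a made-up value. It follows the same logic as the function used to build the tables, except that it returns a sorted list so the output is predictable. The sample string is invented for illustration.

```python
import re


def split_and_clean(value):
    """Split a pipe-delimited string, normalise '--' spacing, strip full stops, and de-duplicate."""
    values = value.split(" | ")
    cleaned = {
        re.sub(r"(\w)--(\w)", r"\1 -- \2", v).strip(".") for v in values if v
    }
    # Sorted here (unlike the version above) so results are deterministic
    return sorted(cleaned)


# Made-up subject value with inconsistent separators and a duplicate heading
sample = "Folk music--Australia. | Folk music -- Australia | Academics."
print(split_and_clean(sample))
```

The two variants of the folk music heading collapse into a single value once the separators and full stops are normalised, which stops them being counted separately.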
24.1.8. What collections do the oral histories belong to?
You can use the `isPartOf` field in the record metadata to examine thematic collections within the larger oral history collection. Here are the top twenty `series` values from the `isPartOf` field. The values have been normalised by removing the final full stops.
df["is_part_of"] = df["is_part_of"].apply(split_and_clean)
series = df["is_part_of"].explode().value_counts().to_frame().reset_index()
# Show series only (not publication)
series.dropna().loc[series["is_part_of"].str.startswith("series")][:20].style.hide()
is_part_of | count |
---|---|
series: National Press Club luncheon address | 909 |
series: Rob and Olya Willis folklore collection | 763 |
series: Cultural context of unemployment oral history project | 496 |
series: Hazel de Berg collection | 405 |
series: Menzies MS 4936 collection | 347 |
series: Bringing them home oral history project | 336 |
series: Australian generations oral history project | 296 |
series: Chris Sullivan folklore collection | 262 |
series: Seven years on : continuing life histories of Aboriginal leaders oral history project | 214 |
series: Forgotten Australians and Former Child Migrants oral history project | 208 |
series: John Meredith folklore collection | 207 |
series: NSW Bicentennial oral history collection | 203 |
series: Australia 1938 oral history project | 187 |
series: Alex Hood folklore collection | 186 |
series: Australian Antarctic Division oral history collection | 168 |
series: Drovers oral history project | 147 |
series: Voices of the bush oral history project | 147 |
series: John Gorton collection | 137 |
series: Ten Pound Poms collection | 135 |
series: Australian response to AIDS oral history project | 108 |
A complete list of series values is available in this text file. You can use these values with the `series` index to find all the oral histories within a collection. For example, searching for `series:"Hazel de Berg collection"` will find all the interviews in the Hazel de Berg collection.
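If you want to generate these searches from the list of series values, something like the following might help. It's a sketch only: the `series_query` and `search_url` helpers are hypothetical, and the web search URL pattern is an assumption based on how Trove search links usually look.

```python
from urllib.parse import urlencode

# Assumed URL pattern for a Trove web search in the Music, Audio, & Video category
SEARCH_URL = "https://trove.nla.gov.au/search/category/music"


def series_query(series_value):
    """Turn a value like 'series: Hazel de Berg collection' into a series index query."""
    name = series_value.strip()
    prefix = "series:"
    # Drop the 'series:' label used in the isPartOf values, if present
    if name.startswith(prefix):
        name = name[len(prefix):].strip()
    return f'series:"{name}"'


def search_url(query):
    """Build a search URL for a query (hypothetical helper, doesn't send a request)."""
    return f"{SEARCH_URL}?{urlencode({'keyword': query})}"


query = series_query("series: Hazel de Berg collection")
print(query)
print(search_url(query))
```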
24.1.9. Which countries do the oral histories relate to?
The `spatial` metadata field is used to relate oral history interviews to specific geographic locations. The values used in this field are mostly codes from the MARC list of geographic areas. For example, Australia is represented by the code `u-at` (although Trove often includes trailing dashes to make the code a fixed width, so in this case the value would be `u-at---`). The MARC codes are not very useful on their own, as there are no links to other sources of geospatial information. I’ve created a dataset that maps the MARC codes to Wikidata entries. It includes geospatial coordinates, ISO country codes, and GeoNames identifiers. Using this dataset, you can link the Trove `spatial` values to ISO country codes, and create a choropleth map that shows the number of oral history records associated with each country.
import altair as alt
import geopandas as gpd

# Split apart the spatial values
places = df["spatial"].str.split(" | ", regex=False).explode().to_frame()

# Remove empty values and values that aren't MARC codes (those starting with an uppercase letter)
places = places.loc[
    (~places["spatial"].str.contains(r"^[A-Z]{1}")) & (places["spatial"] != "")
]

# Remove trailing dashes
places["spatial"] = places["spatial"].str.rstrip("-").str.strip()

# Load mappings from MARC to ISO
place_codes = pd.read_csv(
    "https://gist.github.com/wragge/7389bf347fb1b7e82011e5ddcb4b44dc/raw/a2cf41ce976714a54f96364c631d78469a6896aa/marc_geographicareas.csv"
)

# Merge spatial data with ISO mappings linked on MARC code
labelled_places = pd.merge(
    places, place_codes, how="left", left_on="spatial", right_on="code"
)

# Each MARC code can be associated with multiple country codes, split them into separate rows
labelled_places["iso_country_code"] = labelled_places["iso_country_code"].str.split(
    "|", regex=False
)
labelled_places = labelled_places.explode("iso_country_code")

# Get the counts of each country code
country_counts = (
    labelled_places.value_counts(["iso_country_code"]).to_frame().reset_index()
)

# Load the Natural Earth countries dataset using Geopandas
url = "https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip"
gdf_ne = gpd.read_file(url)
gdf_ne = gdf_ne[["NAME", "ISO_A2", "geometry"]]

# Countries without records won't appear on the map, so to avoid holes we create a background map with all countries
# and a foreground map with the data, then combine the two.

# The foreground map contains the data
foreground = (
    alt.Chart(gdf_ne)
    .mark_geoshape(stroke="black")
    .encode(
        color=alt.Color(
            "count:Q",
            # Log scale because Australia dominates
            scale=alt.Scale(type="log", scheme="tealblues"),
            title="Number of records",
        ),
        tooltip=[
            alt.Tooltip("NAME", title="country"),
            alt.Tooltip("count:Q", title="number of records", format=","),
        ],
    )
    # The lookup links the map data with the Trove spatial data via the ISO code
    .transform_lookup(
        lookup="ISO_A2",
        from_=alt.LookupData(country_counts, "iso_country_code", ["count"]),
    )
    .project(type="naturalEarth1")
    .properties(
        width=600, height=400, title="Countries associated with NLA oral histories"
    )
)

# The background map displays the borders of all countries
background = (
    alt.Chart(gdf_ne)
    .mark_geoshape(stroke="black", fill="white")
    .properties(width=600, height=400)
    .project("naturalEarth1")
)

oh_countries_chart = (background + foreground).properties(padding=20)
display(oh_countries_chart)
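The clean-up of `spatial` values is easier to follow on a single record. This is a minimal sketch in plain Python, assuming that values which aren't MARC codes are place names starting with an uppercase letter. The `clean_marc_codes` helper and the sample value are hypothetical.

```python
def clean_marc_codes(spatial_value):
    """Split a Trove spatial value and return normalised MARC geographic area codes."""
    codes = []
    for part in spatial_value.split(" | "):
        part = part.strip()
        # Skip empty values and place names (assumed to start with an uppercase letter)
        if not part or part[0].isupper():
            continue
        # Remove the trailing dashes Trove adds to pad codes to a fixed width
        codes.append(part.rstrip("-"))
    return codes


# Made-up spatial value mixing padded MARC codes and a place name
print(clean_marc_codes("u-at--- | Australia | e-uk---"))
```

Only trailing dashes are removed, so a code with internal dashes such as `u-at-ne` is left unchanged and can still be matched against the mapping dataset.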