23.1. Overview of Parliamentary Papers
23.1.1. What are Parliamentary Papers?
Parliamentary Papers are documents presented to the Australian Parliament. Sometimes this is required by law. Other times it's just for information. The Parliament of Australia website notes:
Documents presented include the annual reports of all government agencies, reports of royal commissions and other government inquiries, parliamentary committee reports, and a wide variety of other material.
As well as Trove, Parliamentary Papers can be found through ParlInfo, Parliament's own online database.
Here are a few randomly selected examples:
title | contributor | date | fulltext_url |
---|---|---|---|
Annual report | Australian National Gallery. 4281 91a28c5a-87ef-58bb-b90e-108fcce1c04a | 1992 | https://nla.gov.au/nla.obj-2064284044 |
Business of the Senate | Australia. Parliament. Senate. 265 eb3a91ad-da7d-5b95-a8fc-88f84f43067f | 2011 | https://nla.gov.au/nla.obj-793216811 |
Payments to or for the states, the Northern Territory, and local government authorities | Australia. | 1982 | https://nla.gov.au/nla.obj-1613445385 |
23.1.2. How many Parliamentary Papers are digitised in Trove?
Many Commonwealth Parliamentary Papers have been digitised and made available through Trove. But, because of the way they're arranged and described, it's difficult to know exactly how many there are. I've attempted to harvest details of all the Parliamentary Papers in Trove using a combination of techniques. Based on this dataset, it seems there are currently 24,991 digitised Parliamentary Papers in Trove. Here are some more statistics from this dataset:
import pandas as pd

# Load the harvested dataset of Parliamentary Papers in Trove
df = pd.read_csv(
    "https://github.com/GLAM-Workbench/trove-parliamentary-papers-data/raw/main/trove-parliamentary-papers.csv",
    keep_default_na=False,
)
# Summary statistics for the dataset
stats = [
    ["Number of digitised Parliamentary Papers", df.shape[0]],
    ["Total number of pages", df["pages"].sum()],
    ["Median number of pages per publication", df["pages"].median()],
]
stats_df = pd.DataFrame(stats)
# Display as a simple left-aligned table without index or column headers
stats_df.style.format(thousands=",", precision=0).hide().hide(axis=1).set_properties(
    **{"text-align": "left"}
)
Statistic | Value |
---|---|
Number of digitised Parliamentary Papers | 24,991 |
Total number of pages | 2,448,576 |
Median number of pages per publication | 60 |
Most of the Parliamentary Papers in Trove were published before 2013. If you search in ParlInfo for Parliamentary Papers published before 2013, the total number of results is 25,853. That's close, but not exactly the same. There could be publications missing from Trove, or duplicates in the ParlInfo results.
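To put the gap in perspective, here's the arithmetic on the two totals quoted above. It's only a rough comparison: the Trove figure also includes the papers published after 2013, and the numbers alone can't tell you whether the difference comes from missing publications or from duplicates.

```python
# Totals quoted in the text above
trove_total = 24_991     # digitised Parliamentary Papers in the harvested Trove dataset
parlinfo_total = 25_853  # ParlInfo search results for papers published before 2013

difference = parlinfo_total - trove_total
share = difference / parlinfo_total

print(f"Difference: {difference:,}")            # Difference: 862
print(f"As a share of ParlInfo: {share:.1%}")   # As a share of ParlInfo: 3.3%
```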
23.1.3. When were the Parliamentary Papers published?
The date metadata is not always accurate, but it seems good enough to explore the distribution of Trove's Parliamentary Papers over time.
import altair as alt
df["year"] = df["date"].str.extract(r"\b(\d{4})$")
years = df["year"].value_counts().to_frame().reset_index()
chart_dates = (
alt.Chart(years)
.mark_bar(size=3)
.encode(
x="year:T", y="count:Q", tooltip=[alt.Tooltip("year:T", format="%Y"), "count:Q"]
)
.properties(width="container")
)
display(chart_dates)
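The years in the chart come from a regular expression that grabs a four-digit run at the very end of the `date` string. Anything that doesn't end with four digits simply drops out of the counts. Here's a minimal sketch of that behaviour using a handful of made-up date strings (they're hypothetical, not values taken from the dataset). Note that the `year` and `count` column names the chart encodes assume pandas 2.0 or later; earlier versions name the columns differently after `reset_index()`.

```python
import pandas as pd

# Hypothetical date strings, for illustration only
dates = pd.Series(["1992", "2011", "12 May 1901", "1880s", ""])

# Same pattern as the chart cell: four digits anchored to the end of the string
years = dates.str.extract(r"\b(\d{4})$")[0].rename("year")
print(years.tolist())  # ['1992', '2011', '1901', nan, nan]

# Count papers per year the same way the chart data is built;
# value_counts() drops the missing (unmatched) years
counts = years.value_counts().to_frame().reset_index()
print(counts.columns.tolist())  # pandas 2.x: ['year', 'count']
```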
From the chart above it looks like the earliest Parliamentary Paper pre-dates the Commonwealth Parliament. What is it?
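from IPython.display import HTML, display

# Convert years to pandas' nullable integer type so missing values are allowed
df["year"] = df["year"].astype("Int64")
# Find the row with the earliest year (missing years are skipped)
earliest = df.loc[df["year"].idxmin()]
display(
    HTML(
        f"<a href='{earliest['fulltext_url']}'>{earliest['title']} / {earliest['alternative_title']}</a>"
    )
)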
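Finding the earliest paper relies on two things: a nullable integer column (so rows without a year become `<NA>` rather than breaking the conversion), and `idxmin()`, which skips missing values by default and returns the row label of the minimum. Here's a small sketch on a hypothetical three-row table. As a defensive variant, it converts the strings with `pd.to_numeric(..., errors="coerce")` before casting to `Int64`; that step is my own choice for the example.

```python
import pandas as pd

# Hypothetical mini-dataset; titles and years are invented for illustration
toy = pd.DataFrame(
    {
        "title": ["Report A", "Report B", "Report C"],
        "year": ["1905", None, "1899"],
    }
)

# Turn the strings into numbers (unparseable/missing -> NaN), then use the
# nullable Int64 type so the missing year becomes <NA>
toy["year"] = pd.to_numeric(toy["year"], errors="coerce").astype("Int64")

# idxmin() ignores <NA> by default and returns the label of the smallest year
earliest = toy.loc[toy["year"].idxmin()]
print(earliest["title"], earliest["year"])  # Report C 1899
```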
23.1.4. Titles and topics of Parliamentary Papers
What are all these Parliamentary Papers about? You can use the title, subject, and contributor fields to explore their content.
Here, for example, is a word cloud generated from the title field. There are a lot of annual reports, and many of the titles include the abbreviation "PP", so I've excluded the words "report", "annual", "PP", and "AR".
from wordcloud import STOPWORDS, WordCloud
# Add to the list of standard stopwords
stopwords = ["report", "annual", "pp", "AR"] + list(STOPWORDS)
titles = " ".join(df["title"].to_list())
wc = WordCloud(stopwords=stopwords, width=800, height=300)
wc.generate(titles).to_image()
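A word cloud is essentially a picture of word frequencies after the stopwords have been removed. The sketch below shows that idea with plain Python: it's a simplified stand-in, not WordCloud's own tokeniser, and both the titles and the short stopword list are hypothetical. Everything is lower-cased first, so the stopwords are written in lower case too.

```python
import re
from collections import Counter

# Hypothetical titles, for illustration only
titles = [
    "Annual report 1992",
    "Tariff Board report on motor vehicles",
    "Tariff Board report on textiles",
    "Annual report PP no. 42",
]

# A tiny stand-in for a standard stopword list, plus the extra words excluded above
stopwords = {"on", "no", "the", "of"} | {"report", "annual", "pp", "ar"}

# Lower-case, pull out alphabetic words (numbers are ignored), drop stopwords, count
words = re.findall(r"[a-z]+", " ".join(titles).lower())
counts = Counter(w for w in words if w not in stopwords)
print(counts.most_common(2))  # [('tariff', 2), ('board', 2)]
```

Without the extra exclusions, "report" would appear four times and dominate the counts, which is exactly the effect the custom stopwords are there to avoid.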
The subject field contains a list of standard(ish) subject headings. Here are the top twenty values:
import re
def split_and_clean(value):
    values = value.split("|")
    return list(
        set([re.sub(r"(\w)--(\w)", r"\1 -- \2", v).strip(".") for v in values if v])
    )
df["subject"] = df["subject"].apply(split_and_clean)
subjects = df["subject"].explode().to_frame()
# Remove trailing full stops
subjects["subject"] = subjects["subject"].str.strip(".")
subjects["subject"].value_counts().to_frame().reset_index()[:20].style.format(
thousands=","
).hide()
subject | count |
---|---|
Australian | 7,321 |
Australia | 6,833 |
Tariff -- Australia | 1,575 |
Finance, Public -- Australia -- Accounting -- Periodicals | 1,568 |
Federal issue | 1,265 |
Administrative agencies -- Australia -- Auditing -- Periodicals | 1,165 |
Finance, Public -- Australia -- Auditing | 1,150 |
Finance, Public -- Auditing | 1,140 |
Executive departments -- Australia -- Auditing -- Periodicals | 1,135 |
Tariff Australia | 1,111 |
Legislative auditing -- Australia -- Periodicals | 1,106 |
Australia -- Appropriations and expenditures -- Periodicals | 1,035 |
Public works -- Australia -- Periodicals | 947 |
Public buildings -- Australia -- Periodicals | 862 |
Finance, Public -- Australia -- Periodicals | 765 |
Industries -- Australia -- Periodicals | 760 |
Australia -- Industries -- Periodicals | 686 |
Periodicals | 553 |
Tariff -- Australia -- Periodicals | 551 |
Key item | 501 |
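To see what `split_and_clean()` does to a single value, here it is applied to some hypothetical subject strings. It splits on `|`, drops empty strings, puts spaces around `--`, strips full stops, and removes duplicates within a record, so a paper listing the same heading twice is only counted once. `explode()` then gives one row per heading, and an empty list becomes a missing value that `value_counts()` ignores.

```python
import re

import pandas as pd


def split_and_clean(value):
    # Same logic as the cell above
    values = value.split("|")
    return list(
        set([re.sub(r"(\w)--(\w)", r"\1 -- \2", v).strip(".") for v in values if v])
    )


# Hypothetical subject strings, for illustration only
raw = pd.Series(["Tariff--Australia.|Periodicals|", "Periodicals|Periodicals", ""])

cleaned = raw.apply(split_and_clean)
print(sorted(cleaned[0]))  # ['Periodicals', 'Tariff -- Australia']

# One row per heading; the empty record contributes nothing to the counts
print(cleaned.explode().value_counts().to_dict())
```

The `--` normalisation only kicks in when the separator is actually there, which is why "Tariff -- Australia" and "Tariff Australia" still show up as separate headings in the table above.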
The name of the agency that created a particular publication can also give an indication of its content. Here are the top twenty contributing organisations:
def clean_contributor(value):
    # Remove a trailing "<number> <identifier>" suffix, then strip full stops from the ends
    if cleaned := re.search(r"(.*?) [0-9]+ [0-9a-z\-]+$", str(value)):
        return cleaned.group(1).strip(".")
    else:
        return str(value).strip(".")


# One row per contributor (records can list several, separated by "|")
contributors = df["contributor"].str.split("|").explode().to_frame()
contributors["cleaned name"] = contributors["contributor"].apply(clean_contributor)
contributors.dropna()["cleaned name"].value_counts().to_frame().reset_index()[
:20
].style.format(thousands=",").hide()
cleaned name | count |
---|---|
Australia. Tariff Board | 3,802 |
Australia. Parliament | 3,284 |
Australian National Audit Office | 3,111 |
Australia. Parliament. Standing Committee on Public Works | 2,059 |
(blank) | 1,560 |
Australia. Industries Assistance Commission | 1,053 |
Australia. Parliament. Joint Committee of Public Accounts | 825 |
Australia. Parliament. issuing body | 787 |
Australia | 415 |
Australia. Parliament. Senate. Committee of Privileges | 399 |
Australia. Parliament. Joint Standing Committee on Treaties | 348 |
Australia. Parliament. House of Representatives, issuing body | 305 |
Australia. Parliament, | 294 |
Australia. Royal Commission into Aboriginal Deaths in Custody | 282 |
Australia. Inter-State Commission | 276 |
Australia. Special Advisory Authority | 240 |
Australia. Inter-state Commission | 239 |
Australia. Treasury | 236 |
Australia. Parliament. Senate. Standing Committee on Regulations and Ordinances | 219 |
Australia. Parliament. The Senate, issuing body | 212 |
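Here's `clean_contributor()` applied to a few contributor values. The first two come from the sample table at the top of this page; the others are there to show edge cases. It strips the trailing number and identifier, then removes full stops from either end. It doesn't touch commas or capitalisation, which is why variants such as "Australia. Parliament," and "Australia. Parliament", or "Inter-State" and "Inter-state", are still counted separately. An empty contributor field survives as an empty string, which is likely what the row with 1,560 papers and no name represents. The function needs Python 3.8+ for the `:=` operator.

```python
import re


def clean_contributor(value):
    # Same logic as the cell above: drop a trailing "<number> <identifier>" suffix,
    # then strip full stops from the start and end of the name
    if cleaned := re.search(r"(.*?) [0-9]+ [0-9a-z\-]+$", str(value)):
        return cleaned.group(1).strip(".")
    return str(value).strip(".")


examples = [
    "Australian National Gallery. 4281 91a28c5a-87ef-58bb-b90e-108fcce1c04a",
    "Australia. Parliament. Senate. 265 eb3a91ad-da7d-5b95-a8fc-88f84f43067f",
    "Australia.",
    "Australia. Parliament,",
    "",
]
for name in examples:
    print(repr(clean_contributor(name)))
# 'Australian National Gallery'
# 'Australia. Parliament. Senate'
# 'Australia'
# 'Australia. Parliament,'
# ''
```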