🚧 This is a working draft and will change often. Do not cite!
Use the latest published version instead.
🚧

14. Trove API introduction

Use the Trove Application Programming Interface (API) to get direct access to Trove data. Just make a request and get back data in a predictable, structured format that computers can understand.

14.1. Why use the API?

The API is not the only way of getting data from Trove, but it’s the most flexible, reliable and scalable. You can give it a search query and work your way through the complete set of results, downloading every record. By comparison, the Trove web interface displays a maximum of 2,000 results, and even the Bulk Export feature is limited to one million.

The data you get back from the API is in a structured form that can be read and manipulated by computers. This means you can take advantage of a wide range of existing tools and libraries to build reusable pipelines for data analysis and visualisation.

You can use the Trove API to:

  • create datasets containing metadata and text

  • build new applications to visualise, analyse, or annotate Trove data

  • integrate Trove data into existing tools or interfaces

The Trove API allows you to:

  • download a complete set of search results (no matter the size)

  • control the amount and type of metadata retrieved, including user annotations

  • download the full text content of many digitised resources

Limitations:

  • no direct, or consistent, method for requesting images

14.2. API requests, endpoints, and responses

API documentation typically includes references to things like requests, endpoints, and responses. Put simply, requests are the questions you ask; endpoints are the addresses you send your questions to; and responses are the answers you get back.

API requests are just normal urls. They encode your questions using query parameters. For example, the Trove API uses the q parameter for search terms, so including q=wragge in your API request will ask Trove to search for resources that include the word ‘wragge’.

There are three main types of data you can request from the Trove API:

  • search results – a list of results matching a supplied search query

  • individual records – a single record retrieved using a unique identifier

  • lists of records – a list of system-controlled values, such as newspaper titles or Trove contributors

Each type of data has its own address, or endpoint, that you need to include in the request url so that your parameters get sent to the right place. The addresses all share the same base url (the v3 at the end is the current version of the Trove API):

https://api.trove.nla.gov.au/v3

You can ask for a search using the /result endpoint, so you just add /result to the base url:

https://api.trove.nla.gov.au/v3/result

You separate the endpoint from your query using a ?, so to search for articles including the word ‘wragge’ in the newspaper category you’d use a url like this (click the link to view the results):

https://api.trove.nla.gov.au/v3/result?q=wragge&category=newspaper

Note

Most of the time you’ll be using a software library like Python Requests to construct your API requests, so you don’t need to worry about manually formatting the url.
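
As a rough illustration of how query parameters end up in a request url, the sketch below rebuilds the search url shown above. It uses `urlencode` from Python's standard library rather than Requests, so it runs without making a network request. The `build_url` helper is a hypothetical name used for illustration only; it is not part of the Trove API or of any library.

```python
from urllib.parse import urlencode

# Base url of the Trove API (v3), as given above
BASE_URL = "https://api.trove.nla.gov.au/v3"


def build_url(endpoint, params):
    """Join the base url, an endpoint, and url-encoded query parameters."""
    return f"{BASE_URL}{endpoint}?{urlencode(params)}"


# Search for 'wragge' in the newspaper category
url = build_url("/result", {"q": "wragge", "category": "newspaper"})
print(url)
```

Values containing spaces or punctuation are encoded for you, which is the main reason to leave this step to a library.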

API responses are highly structured pieces of text with clearly identified labels and values. Don’t be intimidated – they’re intended to be read by computers rather than humans! The Trove API provides a choice of two response formats: XML and JSON. Both formats include the same data, but while XML uses lots of angle brackets to identify data fields, JSON uses a combination of curly and square brackets, colons and commas. Here’s a newspaper article in XML:

<article id="61389505" url="https://api.trove.nla.gov.au/v3/newspaper/61389505">
    <heading>MR. WRAGGE'S "WRAGGE."</heading>
    <category>Article</category>
    <title id="64">
        <title>
        Clarence and Richmond Examiner (Grafton, NSW : 1889 - 1915)
        </title>
    </title>
    <date>1902-07-15</date>
    <page>4</page>
    <pageSequence>4</pageSequence>
    <relevance score="280.02325439453125">
        <value>very relevant</value>
    </relevance>
    <snippet>
        Mr. Wragge is going to issue a "Wragge." This is the title of his paper to be, as Mr. Wragge, having weathered Sproule, Drake and other extraordinarily named storms on
    </snippet>
    <troveUrl>
        https://nla.gov.au/nla.news-article61389505?searchTerm=wragge
    </troveUrl>
</article>

And in JSON:

{
    "id" : "61389505",
    "url" : "https://api.trove.nla.gov.au/v3/newspaper/61389505",
    "heading" : "MR. WRAGGE'S \"WRAGGE.\"",
    "category" : "Article",
    "title" : {
      "id" : "64",
      "title" : "Clarence and Richmond Examiner (Grafton, NSW : 1889 - 1915)"
    },
    "date" : "1902-07-15",
    "page" : "4",
    "pageSequence" : "4",
    "relevance" : {
      "score" : 280.02325439453125,
      "value" : "very relevant"
    },
    "snippet" : "Mr. Wragge is going to issue a \"Wragge.\" This is the title of his paper to be, as Mr. Wragge, having weathered Sproule, Drake and other extraordinarily named storms on",
    "troveUrl" : "https://nla.gov.au/nla.news-article61389505?searchTerm=wragge"
}
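
To make the "same data, different brackets" point concrete, here is a minimal sketch that parses trimmed copies of the two examples above and compares a few fields. It uses only Python's standard library (`xml.etree.ElementTree` and `json`). The trimmed strings and the variable names are illustrative; a real response contains more fields.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed copies of the XML and JSON examples above
xml_text = """
<article id="61389505" url="https://api.trove.nla.gov.au/v3/newspaper/61389505">
    <heading>MR. WRAGGE'S "WRAGGE."</heading>
    <date>1902-07-15</date>
    <page>4</page>
</article>
"""

json_text = """
{
    "id": "61389505",
    "url": "https://api.trove.nla.gov.au/v3/newspaper/61389505",
    "heading": "MR. WRAGGE'S \\"WRAGGE.\\"",
    "date": "1902-07-15",
    "page": "4"
}
"""

# In XML the identifier is an attribute and the other values are child elements
article_xml = ET.fromstring(xml_text.strip())
from_xml = {
    "id": article_xml.get("id"),
    "heading": article_xml.findtext("heading"),
    "date": article_xml.findtext("date"),
    "page": article_xml.findtext("page"),
}

# In JSON every value is reached through a key
article_json = json.loads(json_text)
from_json = {key: article_json[key] for key in from_xml}

# Both formats yield the same values
print(from_xml == from_json)
```

The JSON version maps directly onto Python dictionaries and lists, which is one practical reason it tends to be easier to work with.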

Nowadays, JSON is more widely used for moving data around, so the examples in this guide all use JSON. You can tell the Trove API which format you’d like by adding the encoding parameter to your request:

https://api.trove.nla.gov.au/v3/result?q=wragge&category=newspaper&encoding=json

Note

The default encoding format is XML, so if you don’t specify json you’ll get XML!

The actual data fields contained within an API response vary according to your query and the endpoint used. Specific types of responses are discussed further below.

14.3. Authorising your requests with an API key

The Trove API lets you make a limited number of requests without any authorisation. This is handy for quick testing or experimentation, but for most uses you’ll need to authorise your requests with an API key. Trove API keys are free and, for non-commercial uses, can be obtained instantly.

There are two ways of adding your key to an API request:

  • using the key parameter

  • adding it to your request’s header values

Using the key parameter is easy, but can be insecure. If your key was mySeCReTkEy, your request url would look like this:

https://api.trove.nla.gov.au/v3/result?q=wragge&category=newspaper&encoding=json&key=mySeCReTkEy

A more secure and future-proof method is adding the key to the X-API-KEY field in your request’s headers. If you’re using a library like Python Requests to access the API, it’s easy to set header values. See below for a full example.
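
The difference between the two methods can be sketched without sending anything over the network. The example below uses Python's standard library (`urllib.request.Request`) purely to inspect where the key ends up; it is not the approach used elsewhere in this guide, which relies on Requests. The key is the placeholder from the example above. One likely reason the url method is less secure is that urls are often saved in logs, browser histories, and shared links.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_KEY = "mySeCReTkEy"  # placeholder from the example above, not a real key
ENDPOINT = "https://api.trove.nla.gov.au/v3/result"
params = {"q": "wragge", "category": "newspaper", "encoding": "json"}

# Option 1: the key is a query parameter, so it becomes part of the url
keyed_params = {**params, "key": API_KEY}
url_with_key = f"{ENDPOINT}?{urlencode(keyed_params)}"

# Option 2: the key travels in the X-API-KEY header, so the url stays key-free
# (constructing the Request object does not send it)
request = Request(f"{ENDPOINT}?{urlencode(params)}", headers={"X-API-KEY": API_KEY})

print("Key visible in url (parameter method):", API_KEY in url_with_key)
print("Key visible in url (header method):", API_KEY in request.full_url)
```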

14.4. A simple API request

Here’s an example of making a simple API request using the Python Requests library. You’ll see many examples like this throughout this guide. The basic steps are:

  • import the Requests library

  • define your query parameters

  • add your API key to the request headers

  • make the request

# Import the Requests library
import requests

# Define the query parameters
params = {"q": "wragge", "category": "newspaper", "encoding": "json"}

# Add your key to the request headers like this
# headers = {"X-API-KEY": "mySeCReTkEy"}
# Here I'm using a real key that I've already imported as `API_KEY`
headers = {"X-API-KEY": API_KEY}

# Make the request using the endpoint, parameters, and headers,
# saving the response as `response`
response = requests.get(
    "https://api.trove.nla.gov.au/v3/result", params=params, headers=headers
)

The response object contains the JSON-formatted search results. The example below loads the JSON data into a variable called data. It then retrieves the total number of results returned by the query, by drilling down through the JSON hierarchy to get to the data["category"][0]["records"]["total"] value. Finally it displays the first article record.

# Load the JSON data
data = response.json()

# Get the total number of results
total = data["category"][0]["records"]["total"]
print(f"There are {total:,} results!\n")

# Display the first article
data["category"][0]["records"]["article"][0]

There are 143,975 results!

{'id': '61389505',
 'url': 'https://api.trove.nla.gov.au/v3/newspaper/61389505',
 'heading': 'MR. WRAGGE\'S "WRAGGE."',
 'category': 'Article',
 'title': {'id': '64',
  'title': 'Clarence and Richmond Examiner (Grafton, NSW : 1889 - 1915)'},
 'date': '1902-07-15',
 'page': '4',
 'pageSequence': '4',
 'relevance': {'score': 280.9134521484375, 'value': 'very relevant'},
 'snippet': 'Mr. Wragge is going to issue a "Wragge." This is the title of his paper to be, as Mr. Wragge, having weathered Sproule, Drake and other extraordinarily named storms on',
 'troveUrl': 'https://nla.gov.au/nla.news-article61389505?searchTerm=wragge'}
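
Drilling down through the JSON hierarchy is something you'll do repeatedly, so it can help to wrap it in a small function. The sketch below is one possible approach, run against a cut-down copy of the response shown above so that it works without an API key. The `summarise` function and the `sample` variable are hypothetical names used for illustration, and a real response includes more fields than are shown here.

```python
# A cut-down copy of the response structure shown above
sample = {
    "category": [
        {
            "records": {
                "total": 143975,
                "article": [
                    {
                        "id": "61389505",
                        "heading": 'MR. WRAGGE\'S "WRAGGE."',
                        "title": {
                            "id": "64",
                            "title": "Clarence and Richmond Examiner (Grafton, NSW : 1889 - 1915)",
                        },
                        "date": "1902-07-15",
                    }
                ],
            }
        }
    ]
}


def summarise(data):
    """Return the total number of results and a (date, newspaper, heading) row per article."""
    records = data["category"][0]["records"]
    rows = [
        (article["date"], article["title"]["title"], article["heading"])
        for article in records.get("article", [])
    ]
    return records["total"], rows


total, rows = summarise(sample)
print(f"{total:,} results")
for date, newspaper, heading in rows:
    print(date, "|", newspaper, "|", heading)
```

With a live request you would pass `response.json()` to the function in place of `sample`.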

14.5. Endpoints

Search results

  • /result – search across all categories (except archived websites)

Individual records

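To request an individual record you need to know its numeric identifier. Then you add the identifier to the endpoint url. So to request the record of the newspaper article with the identifier 61389505, you’d use https://api.trove.nla.gov.au/v3/newspaper/61389505. The endpoints for individual records are: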

  • /newspaper/[id]

  • /work/[id]

  • /people/[id]

  • /list/[id]

  • /contributor/[id]

  • /newspaper/title/[id]

  • /gazette/title/[id]

  • /magazine/title/[id]

Lists of records

  • /newspaper/titles

  • /gazette/titles

  • /magazine/titles

  • /contributor
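
The endpoint patterns above can be turned into request urls with a couple of small helpers. This is a minimal sketch under the assumption that every individual record url is simply the base url, the endpoint, and the identifier joined together, as in the newspaper article example earlier. The `record_url` and `list_url` functions are hypothetical names, not part of the Trove API.

```python
BASE_URL = "https://api.trove.nla.gov.au/v3"

# Endpoints for individual records, copied from the list above
RECORD_ENDPOINTS = {
    "/newspaper",
    "/work",
    "/people",
    "/list",
    "/contributor",
    "/newspaper/title",
    "/gazette/title",
    "/magazine/title",
}

# Endpoints that return lists of records
LIST_ENDPOINTS = {
    "/newspaper/titles",
    "/gazette/titles",
    "/magazine/titles",
    "/contributor",
}


def record_url(endpoint, record_id):
    """Build the url for a single record, e.g. /newspaper/[id]."""
    if endpoint not in RECORD_ENDPOINTS:
        raise ValueError(f"Unknown record endpoint: {endpoint}")
    return f"{BASE_URL}{endpoint}/{record_id}"


def list_url(endpoint):
    """Build the url for a list of records, e.g. /newspaper/titles."""
    if endpoint not in LIST_ENDPOINTS:
        raise ValueError(f"Unknown list endpoint: {endpoint}")
    return f"{BASE_URL}{endpoint}"


print(record_url("/newspaper", "61389505"))
print(record_url("/newspaper/title", "64"))
print(list_url("/newspaper/titles"))
```

Checking the endpoint against a known set is a design choice that catches typos before a request is sent; you could drop the check if you prefer to let the API report the error.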