Scrapers

A set of classes for scraping data about items, series, and agencies from the National Archives of Australia’s online database, RecordSearch.

The main entities described within RecordSearch are Items, Series, and Agencies. Put simply, items are contained within series, and series are created and controlled by agencies. But the Series System, on which RecordSearch is based, allows a much more complex range of relationships between entities to be documented.

Base


make_browser

 make_browser ()

make_session

 make_session ()

RSSearch

 RSSearch (results_per_page=20, sort=None, record_detail='brief',
           **kwargs)

Base class for an advanced search in RecordSearch. There are different search forms for the different RecordSearch entities, so don’t use this directly. Use one of the subclasses.


RSEntity

 RSEntity (identifier=None, cache=True, **kwargs)

Base class for individual RecordSearch entities – item, series, or agency.

Not for direct use – use the appropriate subclass instead.


RSBase

 RSBase ()

Base class with utility methods.

Items


RSItem

 RSItem (identifier=None, cache=True, details=None)

Class used for extracting data about an individual item (usually a file, but can be a volume, box, photograph etc) from RecordSearch.

You need to supply one of the following parameters:

  • identifier – the Item ID (aka barcode)
  • details – the BeautifulSoup HTML element containing the item details

You’d only use details if you already have a RecordSearch page and want to extract item data from it. (There’s an example of this in the RSItemSearch class.)

The item data is obtained by accessing the item’s .data attribute.

Items in RecordSearch are usually paper files, but can be other things like volumes, boxes, videos, or digital objects. Each item has a unique identifier called the ‘Item ID’, previously referred to as the item’s ‘barcode’. The RSItem class extracts information about an individual item from RecordSearch, using its Item ID.

Here are the fields returned:

  • title (string)
  • identifier (string)
  • series (string)
  • control_symbol (string)
  • digitised_status (boolean) – True if the item has been digitised
  • digitised_pages (integer) – number of pages in the digitised file
  • access_status (string) – one of ‘Open’, ‘OWE’, ‘Closed’, ‘NYE’ (see access status options)
  • access_decision_reasons (list) – a list of reasons why material has been withheld from public access (if CLOSED or OWE)
  • location (string)
  • contents_date_str (string)
  • contents_start_date (ISO formatted date)
  • contents_end_date (ISO formatted date)
  • access_decision_date_str (string)
  • access_decision_date (ISO formatted date)
  • retrieved (ISO formatted datetime) – when this record was scraped

Note that digitised_pages is not part of the metadata presented in RecordSearch’s item description. This value is obtained from the digitised file viewer using get_digitised_pages().

To retrieve information about an item, just give RSItem() the Item ID (also known as the barcode).

# Get an item
item = RSItem('3445411')

You can then access the item data using the .data attribute.

display(item.data)
{'title': 'WRAGGE Clement Lionel Egerton : SERN 647 : POB Cheadle England : POE Enoggera QLD : NOK  (Father) WRAGGE Clement Lindley',
 'identifier': '3445411',
 'series': 'B2455',
 'control_symbol': 'WRAGGE C L E',
 'note': '',
 'digitised_status': True,
 'digitised_pages': 47,
 'access_status': 'Open',
 'access_decision_reasons': [],
 'location': 'Canberra',
 'retrieved': '2023-01-20T10:47:44.524487+11:00',
 'contents_date_str': '1914 - 1920',
 'contents_start_date': '1914',
 'contents_end_date': '1920',
 'access_decision_date_str': '12 Apr 2001',
 'access_decision_date': '2001-04-12'}

Use item.data[FIELD NAME] to access individual fields. The series value of this item should be ‘B2455’.

assert item.data['series'] == 'B2455'
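
The date fields are ISO-style strings rather than date objects, and a value such as contents_start_date may hold only a year, or be None when no date could be parsed. Here is a minimal sketch of one way to handle them with the standard library. parse_partial_date is a hypothetical helper, not part of the scraper, and the sample values are copied from the item data shown above.

```python
from datetime import datetime


def parse_partial_date(value):
    """Split an ISO-style date string into (year, month, day).

    Missing parts are returned as None, so '1914' becomes (1914, None, None).
    Returns None when there is no value at all.
    """
    if not value:
        return None
    parts = [int(part) for part in value.split("-")]
    # Pad year-only or year-month values out to three elements
    parts += [None] * (3 - len(parts))
    return tuple(parts)


# Sample values copied from the item data shown above
sample = {
    "contents_start_date": "1914",
    "access_decision_date": "2001-04-12",
    "retrieved": "2023-01-20T10:47:44.524487+11:00",
}

start = parse_partial_date(sample["contents_start_date"])
decision = parse_partial_date(sample["access_decision_date"])
# The retrieved value is a full ISO datetime with a UTC offset
retrieved = datetime.fromisoformat(sample["retrieved"])

print(start, decision, retrieved.utcoffset())
```

Keeping partial dates as tuples avoids inventing a month or day that RecordSearch never supplied.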

As an added bonus, the string representation of the item is also its brief citation.

str(item)
'NAA: B2455, WRAGGE C L E'
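
If you only have a dictionary of item data, such as a row from a set of search results, a similar citation can be assembled by hand. This sketch assumes the citation is always 'NAA: ' followed by the series and control symbol, as in the example above. brief_citation is a hypothetical helper, not part of the scraper.

```python
def brief_citation(data):
    """Build a citation string of the form 'NAA: <series>, <control symbol>'."""
    return f"NAA: {data['series']}, {data['control_symbol']}"


# Values copied from the item shown above
sample = {"series": "B2455", "control_symbol": "WRAGGE C L E"}
citation = brief_citation(sample)
print(citation)  # NAA: B2455, WRAGGE C L E
```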

Not all items have notes, but if they do, they should be included.

# This item has a note
item = RSItem(324717)

assert 'note' in item.data

# Display the note
item.data['note']
'Summary heading Negative of photograph of Poon Gooey available Descriptive note Photograph of Poon Gooey used in National Archives of Australia exhibition – Alien Edwardians – Chinese Immigrants and Commonwealth Government. Duplicating negative of photograph available – please see National Archives staff [ B6416 b/c #10551458]'

The extracted data is saved into a simple key-value cache to speed up repeat requests. If you want to scrape a fresh version, use .refresh_cache().


RSItem.refresh_cache

 RSItem.refresh_cache ()

Delete data for this entity from the cache, then extract a fresh version from RecordSearch.

We can check that this has worked by comparing the value of retrieved, which is the date/time the data was scraped.

old_retrieved_date = item.data['retrieved']

item.refresh_cache()

new_retrieved_date = item.data['retrieved']

assert old_retrieved_date != new_retrieved_date
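
The caching behaviour can be pictured with a toy model. This is only a sketch of the general idea of a key-value cache keyed by identifier. CachedEntity and fake_fetch are hypothetical names, and the scraper's real cache may well be implemented differently.

```python
class CachedEntity:
    """Toy model of an entity whose data lives in a shared key-value cache."""

    _cache = {}  # shared by all instances: identifier -> data

    def __init__(self, identifier, fetch):
        self.identifier = identifier
        self._fetch = fetch  # function that does the (slow) retrieval

    @property
    def data(self):
        # Only fetch when the identifier is not already cached
        if self.identifier not in self._cache:
            self._cache[self.identifier] = self._fetch(self.identifier)
        return self._cache[self.identifier]

    def refresh_cache(self):
        # Drop the cached copy, then fetch a fresh one
        self._cache.pop(self.identifier, None)
        return self.data


calls = []


def fake_fetch(identifier):
    """Stand-in for a scrape; records each call so they can be counted."""
    calls.append(identifier)
    return {"identifier": identifier, "retrieved": len(calls)}


entity = CachedEntity("3445411", fake_fetch)
first = entity.data["retrieved"]
second = entity.data["retrieved"]  # served from the cache, no new fetch
entity.refresh_cache()
third = entity.data["retrieved"]   # fetched again after the refresh
print(first, second, third, len(calls))
```

In this model the retrieved value only changes after refresh_cache() is called, which mirrors the check on old_retrieved_date and new_retrieved_date above.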

RSItemSearch

 RSItemSearch (results_per_page=20, sort=9, record_detail='brief',
               **kwargs)

Search for items in RecordSearch.

Supply any of the item search parameters as kwargs to initialise the search.

Optional parameters:

  • results_per_page (default: 20)
  • sort (default: 9)
  • page – to retrieve a specific page of results
  • record_detail – amount of detail to include, options are:
    • ‘brief’ (default) – just the info in the search results
    • ‘digitised’ – add the number of pages if the file is digitised (slower)
    • ‘full’ – get the full individual record for each result (slowest)

To access a page of results, use the .get_results() method. This method increments the results page, so you can call it in a loop to retrieve the complete result set.

Useful attributes:

  • .total_results – the total number of results in the results set
  • .total_pages – the total number of result pages
  • .kwargs – a dict containing the supplied search parameters
  • .params – a dict containing the values of the optional parameters

Item search parameters

These are the parameters you can supply as keyword arguments to RSItemSearch.

Parameter           Input type  Values
kw                  Text        Keywords or phrase to search for
kw_options          Select      How to combine the keywords – see keyword options
kw_exclude          Text        Keywords or phrase to exclude
kw_exclude_options  Select      How to combine the keywords – see keyword options
search_notes        Checkbox    Set to True to search notes as well as titles
series              Text        Limit to items from this series – eg ‘A1’
series_exclude      Text        Exclude items from this series
control             Text        Limit to items with this control symbol (use * for wildcards) – eg ‘1947/2*’
control_exclude     Text        Exclude items with this control symbol
item_id             Text        Get the item with this identifier (no wildcards allowed)
date_from           Text        Include items with content after this date (year only) – eg ‘1925’
date_to             Text        Include items with content before this date (year only) – eg ‘1945’
formats             Select      Limit to items in this format – see format options
formats_exclude     Select      Exclude items in this format – see format options
locations           Select      Limit to items held in this location – see list of locations
locations_exclude   Select      Exclude items held in this location – see list of locations
access              Select      Limit to items with this access status – see access status options
access_exclude      Select      Exclude items with this access status – see access status options
digital             Checkbox    Limit to digitised items – set to True

Keyword options

Use one of the following values to specify how keywords or phrases should be treated using the kw_options parameter. The default is ‘ALL’.

  • ‘ALL’ (default) – must include all keywords
  • ‘ANY’ – must include at least one of the keywords
  • ‘EXACT’ – treat the keywords as a phrase

Format options

Use one of the following values with the formats and formats_exclude parameters to limit your results to items in that format. The default is to include all formats.

  • ‘Paper files and documents’
  • ‘Index cards’
  • ‘Bound volumes’
  • ‘Cartographic records’
  • ‘Photographs’
  • ‘Microforms’
  • ‘Audio-visual records’
  • ‘Audio records’
  • ‘Electronic records’
  • ‘3-dimensional records’
  • ‘Scientific specimens’
  • ‘Textiles’

Location options

Use one of the following values with the locations and locations_exclude parameters to limit your results to items held in that location. The default is to include all locations.

  • ‘NAT,ACT’ – National office (ACT)
  • ‘AWM’ – Australian War Memorial
  • ‘NSW’
  • ‘NT’
  • ‘QLD’
  • ‘SA’
  • ‘TAS’
  • ‘VIC’
  • ‘WA’

Access status options

Use one of the following values with the access and access_exclude parameters to limit your results to items with this access examination status. The default is to include all.

  • ‘OPEN’ – available for public access
  • ‘OWE’ – open with exceptions (eg it might have pages withheld or redactions applied)
  • ‘CLOSED’ – withheld completely from public access
  • ‘NYE’ – not yet examined (no access decision has been made)
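
The parameter table and option lists above can be turned into a small pre-flight check that catches misspelt keyword arguments before a search is sent. This is a sketch only. check_item_search_kwargs and the constants are hypothetical names, not part of the scraper, and the scraper itself may accept values (or letter cases) that this check rejects.

```python
# Parameter names copied from the item search parameters table
ITEM_SEARCH_PARAMS = {
    "kw", "kw_options", "kw_exclude", "kw_exclude_options", "search_notes",
    "series", "series_exclude", "control", "control_exclude", "item_id",
    "date_from", "date_to", "formats", "formats_exclude", "locations",
    "locations_exclude", "access", "access_exclude", "digital",
}
KEYWORD_OPTIONS = {"ALL", "ANY", "EXACT"}
ACCESS_OPTIONS = {"OPEN", "OWE", "CLOSED", "NYE"}
LOCATION_OPTIONS = {"NAT,ACT", "AWM", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"}

# Which option list applies to which parameter
CHOICES = {
    "kw_options": KEYWORD_OPTIONS,
    "kw_exclude_options": KEYWORD_OPTIONS,
    "access": ACCESS_OPTIONS,
    "access_exclude": ACCESS_OPTIONS,
    "locations": LOCATION_OPTIONS,
    "locations_exclude": LOCATION_OPTIONS,
}


def check_item_search_kwargs(**kwargs):
    """Raise ValueError for names or option values not listed above."""
    for name, value in kwargs.items():
        if name not in ITEM_SEARCH_PARAMS:
            raise ValueError(f"Unknown item search parameter: {name}")
        if name in CHOICES and value not in CHOICES[name]:
            raise ValueError(f"Unexpected value for {name}: {value!r}")
    return kwargs


params = check_item_search_kwargs(
    kw="wragge", series="B2455", access="OPEN", digital=True
)
print(params)
```

The checked dictionary could then be unpacked into a search, for example RSItemSearch(**params).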

Examples

Here’s a basic keyword search for items.

item_results = RSItemSearch(kw='wragge')

Initialising the RSItemSearch class sets up the search and retrieves some information about the results set. For example, to see the total number of results, we just access the .total_results attribute.

item_results.total_results
212

RSItemSearch.get_results

 RSItemSearch.get_results (page=None)

Return a list of results from a search results page.

The page value is incremented with each request, so you can call this method in a loop to retrieve the complete results set. When you reach the end of the results, this method will return an empty list.

Optional parameter:

  • page – request a specific page from the results set

item_results.get_results()
{'total_results': 212,
 'page': 1,
 'number_of_results': 20,
 'results': [{'series': 'A2479',
   'control_symbol': '17/1306',
   'title': 'The Wragge Estate. Property for sale.',
   'identifier': '149309',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1917 - 1917',
   'contents_start_date': '1917',
   'contents_end_date': '1917',
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'A2487',
   'control_symbol': '1919/8962',
   'title': '[Application for free passage - Rupert Lindley Wragge]',
   'identifier': '156686',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1918 - 1919',
   'contents_start_date': '1918',
   'contents_end_date': '1919',
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'B1535',
   'control_symbol': '736/23/341',
   'title': '[Applications for Commissions: K S Wragge, R A Fry, V J T Sharpe, J G Cameron]',
   'identifier': '377317',
   'access_status': 'Open',
   'location': 'Melbourne',
   'contents_date_str': '1939 - 1939',
   'contents_start_date': '1939',
   'contents_end_date': '1939',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'AWM54',
   'control_symbol': '1010/4/150',
   'title': '[War Crimes and Trials - Affidavits and Sworn Statements:] Statements by No number RW Woodhouse; VX39749 Pte RC Woodman; 23663 Able Seaman CD Woodman; NX58655 WO/1 WJ Woodward; VX37644 Pte CA Woodward; QX15620 Pte HA Woodward; TX3959 Pte BJ Woodward; No number Capt EJ Wooldridge; No number AN Wooton; NX501953 Spr GA Worland; QX259 Capt NA Worthington; QX14395 Gnr RL Wragge; NX12233 Lt-Col JW Wright; NX70664 Capt RG Wright; VX35065 L/Bdr JH Wright; SX8474 Pte RR Wright; NX56234 Cpl AG Wright; NX27630 L/Sgt L Wrightson; NX69819 Dvr RP Wyatt; WX12593 Pte RW Wyllie',
   'identifier': '479150',
   'access_status': 'Open',
   'location': 'Australian War Memorial',
   'contents_date_str': '1945 - 1947',
   'contents_start_date': '1945',
   'contents_end_date': '1947',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'A1716',
   'control_symbol': '240',
   'title': '"Wragge" [NOTE: Registration and exhibit]',
   'identifier': '659953',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1902 - 1902',
   'contents_start_date': '1902',
   'contents_end_date': '1902',
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'A1716',
   'control_symbol': '241',
   'title': '"Wragge" [NOTE: Registration and exhibit]',
   'identifier': '659955',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1902 - 1902',
   'contents_start_date': '1902',
   'contents_end_date': '1902',
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'A1716',
   'control_symbol': '243',
   'title': '"Wragge" [NOTE: Registration and exhibit]',
   'identifier': '659958',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1902 - 1902',
   'contents_start_date': '1902',
   'contents_end_date': '1902',
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'AWM93',
   'control_symbol': '22/2/83',
   'title': '[Australian War Memorial registry files, first series] Staff - Personal files: Mr W G Wragge',
   'identifier': '1015137',
   'access_status': 'Not yet examined',
   'location': 'Australian War Memorial',
   'contents_date_str': '1924 - 1925',
   'contents_start_date': '1924',
   'contents_end_date': '1925',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'B503',
   'control_symbol': 'Q2018',
   'title': 'Wragge, Raymond Lindley QX14395 [Prisoners of War Trust Fund application]',
   'identifier': '1031372',
   'access_status': 'Not yet examined',
   'location': 'Melbourne',
   'contents_date_str': '1955 - 1977',
   'contents_start_date': '1955',
   'contents_end_date': '1977',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'B503',
   'control_symbol': 'V3600',
   'title': 'Wragge, Leslie James VX25877 [Prisoners of War Trust Fund application]',
   'identifier': '1033806',
   'access_status': 'Not yet examined',
   'location': 'Melbourne',
   'contents_date_str': '1955 - 1977',
   'contents_start_date': '1955',
   'contents_end_date': '1977',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'SP459/1',
   'control_symbol': '429/8/5434',
   'title': 'Injuries - VX112202 Captain WRAGGE, H S [Box 64]',
   'identifier': '1365125',
   'access_status': 'Not yet examined',
   'location': 'Sydney',
   'contents_date_str': '1947 - 1947',
   'contents_start_date': '1947',
   'contents_end_date': '1947',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'A2478',
   'control_symbol': 'TRZECIAK H',
   'title': 'TRZECIAK Helmut born 17 February 1920; frida (nee Wragge) born 13 November 1921; Monika born 9 January 1944; Rosemarie born 30 June 1945; Hartmut born 19 January 1952 - German - travelled per ship SKAUBRYN departing in 1954',
   'identifier': '1438793',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1954 - 1954',
   'contents_start_date': '1954',
   'contents_end_date': '1954',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'A9951',
   'control_symbol': '71',
   'title': 'Nominal Roll - Original 3501 - 3542. Wragge, Keith Clement - Zusman, Solomon',
   'identifier': '1751559',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1949 - 1949',
   'contents_start_date': '1949',
   'contents_end_date': '1949',
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'A9951',
   'control_symbol': '72',
   'title': 'Nominal Roll - Duplicate 3501 - 3542. Wragge, Keith Clement - Zusman, Solomon',
   'identifier': '1751560',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1949 - 1949',
   'contents_start_date': '1949',
   'contents_end_date': '1949',
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'J1193',
   'control_symbol': 'QX14395',
   'title': 'Members folders, Second World War Queensland army personnel - Wragge Raymond Lindley',
   'identifier': '1908313',
   'access_status': 'Open',
   'location': 'Brisbane',
   'contents_date_str': '1940 - 1949',
   'contents_start_date': '1940',
   'contents_end_date': '1949',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'A10074',
   'control_symbol': '1926/13',
   'title': 'WRAGGE Thomas William Eric versus COLLINS James; COLLINS Edward',
   'identifier': '3141698',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': '1926 - 1926',
   'contents_start_date': '1926',
   'contents_end_date': '1926',
   'digitised_status': False,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'B2455',
   'control_symbol': 'WRAGGE A C P',
   'title': 'Wragge Alfred Charles Peter : SERN Depot : POB Rockhampton QLD : POE Brisbane QLD : NOK W Wragge F E',
   'identifier': '3445406',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': 'circa1914 - circa1920',
   'contents_start_date': None,
   'contents_end_date': None,
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'B2455',
   'control_symbol': 'WRAGGE C L E',
   'title': 'WRAGGE Clement Lionel Egerton : SERN 647 : POB Cheadle England : POE Enoggera QLD : NOK  (Father) WRAGGE Clement Lindley',
   'identifier': '3445411',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': 'circa1914 - circa1920',
   'contents_start_date': None,
   'contents_end_date': None,
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'B2455',
   'control_symbol': 'WRAGGE G S',
   'title': 'Wragge George Stanley : SERN 5580 : POB Croydon QLD : POE Cairns QLD : NOK Rivers Mrs Elizabeth',
   'identifier': '3445416',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': 'circa1914 - circa1920',
   'contents_start_date': None,
   'contents_end_date': None,
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'},
  {'series': 'B2455',
   'control_symbol': 'WRAGGE J H',
   'title': 'Wragge John Henry : SERN 6922 : POB Richmond VIC : POE Melbourne VIC : NOK W Wragge Lillian Maud',
   'identifier': '3445422',
   'access_status': 'Open',
   'location': 'Canberra',
   'contents_date_str': 'circa1914 - circa1920',
   'contents_start_date': None,
   'contents_end_date': None,
   'digitised_status': True,
   'retrieved': '2023-01-20T10:47:46.524370+11:00'}],
 'retrieved': '2023-01-20T10:47:46.530565+11:00'}

item_results.params
{'results_per_page': 20, 'sort': 9, 'record_detail': 'brief'}
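
Because .get_results() moves on to the next page each time it is called, a complete harvest is just a loop that stops when a page comes back empty. The sketch below runs offline: harvest_all is a hypothetical helper and FakeSearch is a stand-in that imitates the paging behaviour, so neither is part of the scraper. The loop accepts either an empty list or a dict with an empty 'results' list as the end-of-results signal, since the exact shape of the final response is an assumption here.

```python
def harvest_all(search, max_pages=None):
    """Collect the results from every page of a search-like object."""
    harvested = []
    pages = 0
    while max_pages is None or pages < max_pages:
        data = search.get_results()
        # A page may be a dict with a 'results' list, or a bare list
        results = data.get("results", []) if isinstance(data, dict) else data
        if not results:
            break  # no more results
        harvested.extend(results)
        pages += 1
    return harvested


class FakeSearch:
    """Offline stand-in that mimics paging through a results set."""

    def __init__(self, records, results_per_page=20):
        self.records = records
        self.results_per_page = results_per_page
        self.page = 0

    def get_results(self):
        start = self.page * self.results_per_page
        self.page += 1  # each call advances to the next page
        return {"results": self.records[start:start + self.results_per_page]}


records = [{"identifier": str(number)} for number in range(45)]
harvested = harvest_all(FakeSearch(records))
print(len(harvested))
```

With a real search you would pass the RSItemSearch object instead of FakeSearch. The max_pages argument is a simple guard for trying things out on large result sets.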

This example returns a result set with a single item. In this case full data will be retrieved even if record_detail is set to ‘brief’.

item_results = RSItemSearch(series='B2455', control='HART G H*', record_detail='brief')
assert item_results.total_results == 1
data = item_results.get_results()

# Page layout is different for a single result, so check it's being parsed
assert data['results'][0]['digitised_status'] == True

# The number of digitised pages is included even though record_detail is set to brief
assert data['results'][0]['digitised_pages'] > 0

Calling .refresh_cache() will remove all of the data for this search from the cache, and set the results page back to 1.


RSItemSearch.refresh_cache

 RSItemSearch.refresh_cache ()

Delete data for this search from the cache, then retrieve a fresh version from RecordSearch.

Series


RSSeries

 RSSeries (identifier=None, cache=True, details=None,
           include_number_digitised=True, include_access_status=True)

Class used for extracting data about an individual series. You need to supply the following parameter:

  • identifier – the series number, eg ‘A1’, ‘B2455’

Optional parameters:

  • include_number_digitised (boolean, default: True) – include the number of items in this series that have been digitised.
  • include_access_status (boolean, default: True) – include the number of items in this series in each of the access status categories.

The series data is obtained by accessing the series’ .data attribute.

A series is a group of records that have something in common, for example, they might have been part of the same filing system. Series can be related to other series, and to agencies. A single series can also be held across multiple locations. All this means the data can be quite complex.

Note that, as well as the standard RecordSearch metadata, the scraper can also extract some extra information about the series, such as the number of items digitised, and the access status of items in the series.

Here are the fields returned:

  • identifier (string)
  • title (string)
  • physical_format (string)
  • arrangement (string)
  • control_symbols (string)
  • locations (list) – a list of locations, each with the fields:
    • quantity (string)
    • location (string)
  • recording_agencies – a list of agencies, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • controlling_agencies – a list of agencies, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • previous_series – a list of series, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • subsequent_series – a list of series, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • controlling_series – a list of series, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • related_series (list) – a list of series, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • items_described (integer)
  • items_described_note (string)
  • contents_date_str (string)
  • contents_start_date (ISO formatted date)
  • contents_end_date (ISO formatted date)
  • accumulation_date_str (string)
  • accumulation_start_date (ISO formatted date)
  • accumulation_end_date (ISO formatted date)
  • items_digitised (integer) – the number of items in this series that have been digitised
  • access_status_totals (dict) – the number of items in each of the access status categories, OPEN, OWE, CLOSED, and NYE.

To retrieve information about a series, just give RSSeries() the series number.

series = RSSeries('A863')

You can then access the series data using the .data attribute.

display(series.data)
{'identifier': 'A863',
 'title': 'Correspondence files, single number series relating to Civil Defence matters',
 'physical_format': 'PAPER FILES AND DOCUMENTS',
 'arrangement': 'Single number',
 'control_symbols': '1 - 468 (with gaps)',
 'locations': [{'quantity': 0.72, 'location': 'ACT'}],
 'note': 'Function and purpose This series consists of correspondence files covering a wide number of topics generally relating to Civil Defence matters. Included are reports from a wide range of sources, discussions on shelters, blood transfusion and medical services to mention a few. System of arrangement and control This series is controlled by a single number system. There are many gaps in the numbering. Finding aids All items in this series have been entered onto the National Archives of Australia item level database. Series history National Archives staff reviewed this series in 2001. Some records transferred into the National Archives of Australia’s custody were assessed as no longer justifying retention in the collection. These records were disposed of in accordance with records disposal authorities approved by the National Archives of Australia and by the Commonwealth agency responsible for the functions to which the records related. Disposal History A863/3 was withdrawn by the agency on 21 September 2016.',
 'recording_agencies': [{'date_str': '01 Jan 1944 - 31 Dec 1948',
   'start_date': '1944-01-01',
   'end_date': '1948-12-31',
   'identifier': 'CA 31',
   'title': 'Department of the Interior [II], Central Office'},
  {'date_str': '01 Jan 1949 - 31 Dec 1961',
   'start_date': '1949-01-01',
   'end_date': '1961-12-31',
   'identifier': 'CA 541',
   'title': 'Directorate of Civil Defence'}],
 'controlling_agencies': [{'date_str': '1944 -',
   'start_date': '1944',
   'end_date': None,
   'identifier': 'CA 46',
   'title': 'Department of Defence [III], Central Office'}],
 'previous_series': [],
 'subsequent_series': [{'date_str': '31 Dec 1961',
   'start_date': '1961-12-31',
   'end_date': None,
   'identifier': 'A5518',
   'title': 'General correspondence files, annual single number series with "CD" or "NDO" prefix'}],
 'controlling_series': [],
 'related_series': [],
 'availability': None,
 'retrieved': '2023-01-20T10:47:48.175830+11:00',
 'items_described': 50,
 'items_described_note': "Click to see items listed on RecordSearch. Please contact the National Reference Service if you can't find the record you want as not all items from the series may be on RecordSearch.",
 'contents_date_str': '1944 - 1961',
 'contents_start_date': '1944',
 'contents_end_date': '1961',
 'accumulation_date_str': '1944 - 1961',
 'accumulation_start_date': '1944',
 'accumulation_end_date': '1961',
 'items_digitised': 0,
 'access_status_totals': {'OPEN': 50, 'OWE': 0, 'CLOSED': 0, 'NYE': 0}}

You can find out how many items within the series are closed to public access. In this case, it should be none.

assert series.data['access_status_totals']['CLOSED'] == 0

You can access both the number of items described and digitised within each series. We’d expect the number described to be greater than or equal to the number digitised.

assert series.data['items_described'] >= series.data['items_digitised']
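
The access_status_totals counts are easier to compare across series when expressed as percentages. A minimal sketch follows. access_percentages is a hypothetical helper, not part of the scraper, and the sample counts are copied from the series data shown above.

```python
def access_percentages(totals):
    """Convert access status counts into percentages of all items counted."""
    total = sum(totals.values())
    if total == 0:
        # Avoid dividing by zero when no items have been counted
        return {status: 0.0 for status in totals}
    return {
        status: round(count / total * 100, 1)
        for status, count in totals.items()
    }


# Counts copied from the access_status_totals value shown above
sample_totals = {"OPEN": 50, "OWE": 0, "CLOSED": 0, "NYE": 0}
shares = access_percentages(sample_totals)
print(shares)
```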

By default, the scraper adds some extra information to the basic metadata – items_digitised and access_status_totals. To obtain these values, the scraper runs item searches – one to find digitised files, and another four to find all the access status values. This can slow things down considerably. If you want a quick response and don’t care about these values, you can set include_number_digitised and/or include_access_status to False.

In the case below, the series data should not include a value for items_digitised.

series = RSSeries('A3', include_number_digitised=False)

assert 'items_digitised' not in series.data
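
To get a feel for the cost of the extra values, here is a sketch of the five item searches described above, written as the keyword arguments each one might use. The parameter names (series, digital, access) come from the item search parameters table, but how the scraper actually builds these searches is an assumption. extra_search_kwargs is a hypothetical helper.

```python
ACCESS_STATUSES = ["OPEN", "OWE", "CLOSED", "NYE"]


def extra_search_kwargs(series_id, include_number_digitised=True,
                        include_access_status=True):
    """List the keyword arguments for each extra item search a series needs."""
    searches = []
    if include_number_digitised:
        # One search limited to digitised items in the series
        searches.append({"series": series_id, "digital": True})
    if include_access_status:
        # One search for each access status category
        searches.extend(
            {"series": series_id, "access": status} for status in ACCESS_STATUSES
        )
    return searches


all_searches = extra_search_kwargs("A863")
quick = extra_search_kwargs(
    "A863", include_number_digitised=False, include_access_status=False
)
print(len(all_searches), len(quick))
```

With both options left on, each series costs five additional requests on top of the series page itself, which is why switching them off speeds things up.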

The extracted data is saved into a simple key-value cache to speed up repeat requests. If you want to scrape a fresh version, use .refresh_cache().


RSSeries.refresh_cache

 RSSeries.refresh_cache ()

Delete data for this series from the cache, then extract a fresh version from RecordSearch.

We can check that this has worked by comparing the value of retrieved, which is the date/time the data was scraped.

old_retrieved_date = series.data['retrieved']

series.refresh_cache()

new_retrieved_date = series.data['retrieved']

assert old_retrieved_date != new_retrieved_date

RSSeriesSearch

 RSSeriesSearch (results_per_page=20, sort=1, record_detail='brief',
                 **kwargs)

Search for series in RecordSearch.

Supply any of the series search parameters as kwargs to initialise the search.

Optional parameters:

  • results_per_page (default: 20)
  • sort (default: 1 – order by id)
  • page – to retrieve a specific page of results
  • record_detail – amount of detail to include, options are:
    • ‘brief’ (default) – just the info in the search results
    • ‘full’ – get the full individual record for each result (slow)

To access a page of results, use the .get_results() method. This method increments the results page, so you can call it in a loop to retrieve the complete result set.

Useful attributes:

  • .total_results – the total number of results in the results set
  • .total_pages – the total number of result pages
  • .kwargs – a dict containing the supplied search parameters
  • .params – a dict containing the values of the optional parameters

Series search parameters

These are the parameters you can supply as keyword arguments to RSSeriesSearch.

Parameter           Input type  Values
kw                  Text        Keywords or phrase to search for
kw_options          Select      How to combine the keywords – see keyword options
kw_exclude          Text        Keywords or phrase to exclude
kw_exclude_options  Select      How to combine the keywords – see keyword options
search_notes        Checkbox    Set to True to search notes as well as titles
series_id           Text        Search for this series identifier
date_from           Text        Include series with content after this date (year only) – eg ‘1925’
date_to             Text        Include series with content before this date (year only) – eg ‘1945’
formats             Select      Limit to series with items in this format – see format options
formats_exclude     Select      Exclude series with items in this format – see format options
locations           Select      Limit to series held in this location – see list of locations
locations_exclude   Select      Exclude series held in this location – see list of locations
agency_recording    Select      Limit to series created by this agency or person
agency_controlling  Select      Limit to series controlled by this agency or person

Series keyword options

Use one of the following values to specify how keywords or phrases should be treated using the kw_options parameter. The default is ‘ALL’.

  • ‘ALL’ (default) – must include all keywords
  • ‘ANY’ – must include at least one of the keywords
  • ‘EXACT’ – treat the keywords as a phrase

Series location options

Use one of the following values with the locations and locations_exclude parameters to limit your results to series held in that location. The default is to include all locations.

  • ‘NAT,ACT’ – National office (ACT)
  • ‘AWM’ – Australian War Memorial
  • ‘NSW’
  • ‘NT’
  • ‘QLD’
  • ‘SA’
  • ‘TAS’
  • ‘VIC’
  • ‘WA’

Series format options

Use one of the following values with the formats and formats_exclude parameters to limit your results to series containing that format. The default is to include all formats.

  • ‘Paper files and documents’
  • ‘Index cards’
  • ‘Bound volumes’
  • ‘Cartographic records’
  • ‘Photographs’
  • ‘Microforms’
  • ‘Audio-visual records’
  • ‘Audio records’
  • ‘Electronic records’
  • ‘3-dimensional records’
  • ‘Scientific specimens’
  • ‘Textiles’

Examples

Initialise a search.

series_results = RSSeriesSearch(agency_recording='CA 1196')

You can access the .total_results attribute to find out how many results there are.

series_results.total_results
100

Naturally enough, the .total_results value should be an integer, as should .total_pages.

assert isinstance(series_results.total_results, int)
assert isinstance(series_results.total_pages, int)

series_results.params
{'results_per_page': 20, 'sort': 1, 'record_detail': 'brief'}
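
The two totals are presumably related in the obvious way: the number of pages is the number of results divided by results_per_page, rounded up. The sketch below shows that arithmetic. It is an assumption about how .total_pages is derived, not something the scraper guarantees, so treat the attribute itself as the value to trust. expected_pages is a hypothetical helper.

```python
import math


def expected_pages(total_results, results_per_page=20):
    """Number of result pages needed to hold total_results."""
    return math.ceil(total_results / results_per_page)


# 100 series results and 212 item results, as in the examples above
print(expected_pages(100), expected_pages(212))
```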

RSSeriesSearch.get_results

 RSSeriesSearch.get_results (page=None)

Return a list of results from a search results page.

The page value is incremented with each request, so you can call this method in a loop to retrieve the complete results set. When you reach the end of the results, this method will return an empty list.

Optional parameter:

  • page – request a specific page from the results set

series_results.get_results(2)
{'total_results': 100,
 'page': 2,
 'number_of_results': 20,
 'results': [{'identifier': 'A1644',
   'title': "'[Master] subject index' [list of indexable headings] for correspondence files, annual single number series",
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': 'circa1967 - circa1973',
   'accumulation_start_date': None,
   'accumulation_end_date': None,
   'contents_date_str': 'circa1967 - circa1973',
   'contents_start_date': None,
   'contents_end_date': None,
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1645',
   'title': 'Subject index cards for correspondence files, annual single number series',
   'locations': [{'location': 'ACT', 'quantity': '1.35m'}],
   'items_described': 6,
   'accumulation_date_str': '1967 - 1973',
   'accumulation_start_date': '1967',
   'accumulation_end_date': '1973',
   'contents_date_str': '1967 - 1973',
   'contents_start_date': '1967',
   'contents_end_date': '1973',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1646',
   'title': 'Movement cards for annual single number series',
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': 'circa1967 - circa1970',
   'accumulation_start_date': None,
   'accumulation_end_date': None,
   'contents_date_str': 'circa1967 - circa1970',
   'contents_start_date': None,
   'contents_end_date': None,
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1647',
   'title': 'Correspondence files, DES series (Cabinet)',
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1967 - ',
   'accumulation_start_date': '1967',
   'accumulation_end_date': None,
   'contents_date_str': '1967 - ',
   'contents_start_date': '1967',
   'contents_end_date': None,
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1648',
   'title': 'Cabinet file register',
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1967 - ',
   'accumulation_start_date': '1967',
   'accumulation_end_date': None,
   'contents_date_str': '1967 - ',
   'contents_start_date': '1967',
   'contents_end_date': None,
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1649',
   'title': 'Subject index cards, Cabinet files',
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1967 - ',
   'accumulation_start_date': '1967',
   'accumulation_end_date': None,
   'contents_date_str': '1967 - ',
   'contents_start_date': '1967',
   'contents_end_date': None,
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1873',
   'title': "Staffing files [individual staff files], single number series with 'S' prefix",
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1949 - 1972',
   'accumulation_start_date': '1949',
   'accumulation_end_date': '1972',
   'contents_date_str': '1930 - 1983',
   'contents_start_date': '1930',
   'contents_end_date': '1983',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1904',
   'title': "Name index cards, DES [Department of Education and Science] State Offices (formerly 'COE [Commonwealth Office of Education] All States')",
   'locations': [{'location': 'ACT', 'quantity': '0.18m'}],
   'items_described': 0,
   'accumulation_date_str': '1965 - 1973',
   'accumulation_start_date': '1965',
   'accumulation_end_date': '1973',
   'contents_date_str': '1961 - 1972',
   'contents_start_date': '1961',
   'contents_end_date': '1972',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1905',
   'title': 'Movement Cards for Interior Files',
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1968 - 1968',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1968',
   'contents_date_str': '1963 - 1968',
   'contents_start_date': '1963',
   'contents_end_date': '1968',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1906',
   'title': 'Name index cards, Commonwealth Co-operation in Education, alphabetical series',
   'locations': [{'location': 'ACT', 'quantity': '0.18m'}],
   'items_described': 0,
   'accumulation_date_str': '1968 - 1973',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1973',
   'contents_date_str': '1966 - 1972',
   'contents_start_date': '1966',
   'contents_end_date': '1972',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1907',
   'title': 'Name index cards, Anzac Fellowship, alphabetical series',
   'locations': [{'location': 'ACT', 'quantity': '0.18m'}],
   'items_described': 0,
   'accumulation_date_str': '1967 - 1973',
   'accumulation_start_date': '1967',
   'accumulation_end_date': '1973',
   'contents_date_str': '1967 - 1972',
   'contents_start_date': '1967',
   'contents_end_date': '1972',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1908',
   'title': 'Name index cards, Commonwealth Scholarship and Fellowship Plan',
   'locations': [{'location': 'ACT', 'quantity': '0.18m'}],
   'items_described': 0,
   'accumulation_date_str': '1968 - 1973',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1973',
   'contents_date_str': '1966 - 1972',
   'contents_start_date': '1966',
   'contents_end_date': '1972',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1909',
   'title': 'Name index cards, Scholarships by Foreign Governments, alphabetical series',
   'locations': [{'location': 'ACT', 'quantity': '0.18m'}],
   'items_described': 0,
   'accumulation_date_str': '1967 - 1973',
   'accumulation_start_date': '1967',
   'accumulation_end_date': '1973',
   'contents_date_str': '1966 - 1972',
   'contents_start_date': '1966',
   'contents_end_date': '1972',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1910',
   'title': 'Name index cards, CBI Scholarships (Confederation of British Industries) alphabetical series',
   'locations': [{'location': 'ACT', 'quantity': '0.09m'}],
   'items_described': 0,
   'accumulation_date_str': '1968 - 1973',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1973',
   'contents_date_str': '1968 - 1972',
   'contents_start_date': '1968',
   'contents_end_date': '1972',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1911',
   'title': 'Name index cards, Australian American Educational Foundation, alphabetical series',
   'locations': [{'location': 'ACT', 'quantity': '0.18m'}],
   'items_described': 1,
   'accumulation_date_str': '1968 - 1973',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1973',
   'contents_date_str': '1965 - 1972',
   'contents_start_date': '1965',
   'contents_end_date': '1972',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1912',
   'title': "Correspondence files, single number series with '68' prefix (12, 000 block) (School Libraries)",
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1968 - 1973',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1973',
   'contents_date_str': '1968 - 1973',
   'contents_start_date': '1968',
   'contents_end_date': '1973',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1913',
   'title': "Register book for correspondence files, single number series with '68' prefix (12,000 block) ('School Libraries')",
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1968 - 1973',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1973',
   'contents_date_str': '1968 - 1973',
   'contents_start_date': '1968',
   'contents_end_date': '1973',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1914',
   'title': 'Movement cards for correspondence files (School Libraries)',
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1968 - 1973',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1973',
   'contents_date_str': '1968 - 1973',
   'contents_start_date': '1968',
   'contents_end_date': '1973',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A1915',
   'title': 'Name index cards, School Libraries',
   'locations': [{'location': 'ACT', 'quantity': '0.18m'}],
   'items_described': 0,
   'accumulation_date_str': '1968 - 1973',
   'accumulation_start_date': '1968',
   'accumulation_end_date': '1973',
   'contents_date_str': '1968 - 1972',
   'contents_start_date': '1968',
   'contents_end_date': '1972',
   'retrieved': '2023-01-20T10:48:04.337177+11:00'},
  {'identifier': 'A2102',
   'title': 'Paid Claims (Treasury Form 12)',
   'locations': [],
   'items_described': 0,
   'accumulation_date_str': '1967 - ',
   'accumulation_start_date': '1967',
   'accumulation_end_date': None,
   'contents_date_str': None,
   'contents_start_date': None,
   'contents_end_date': None,
   'retrieved': '2023-01-20T10:48:04.337177+11:00'}],
 'retrieved': '2023-01-20T10:48:04.347385+11:00'}
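
Because the page value is incremented with each call, a complete results set can be gathered by calling .get_results() until nothing more comes back. The sketch below shows one way to write that loop. harvest_all and FakeSearch are hypothetical names: FakeSearch is only an offline stand-in that mimics the documented behaviour, so the example runs without contacting RecordSearch. The loop accepts either a dict with a 'results' list (as in the output above) or a bare empty list (as described for the end of the results set).

```python
def harvest_all(search):
    """Call search.get_results() repeatedly and collect every result."""
    records = []
    while True:
        page = search.get_results()
        # Handle both shapes: a dict holding a 'results' list, or a bare list
        results = page.get("results", []) if isinstance(page, dict) else page
        if not results:
            break
        records.extend(results)
    return records


class FakeSearch:
    """Hypothetical offline stand-in for a search object such as RSSeriesSearch."""

    def __init__(self, pages):
        self._pages = pages
        self._page = 0

    def get_results(self):
        # Return an empty list once every page has been served
        if self._page >= len(self._pages):
            return []
        results = self._pages[self._page]
        self._page += 1
        return {
            "page": self._page,
            "number_of_results": len(results),
            "results": results,
        }


fake = FakeSearch(
    [
        [{"identifier": "A1644"}, {"identifier": "A1645"}],
        [{"identifier": "A1646"}],
    ]
)
all_series = harvest_all(fake)
print([series["identifier"] for series in all_series])
```

With the library installed, the same function could be given a real search object, for example harvest_all(series_results). For large results sets it may be worth pausing between requests.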

Calling .refresh_cache will remove all of the data for this search from the cache, and set the results page back to 1.


source

RSSeriesSearch.refresh_cache

 RSSeriesSearch.refresh_cache ()

Delete data for this search from the cache, then retrieve a fresh version from RecordSearch.

Agencies


source

RSAgency

 RSAgency (identifier=None, cache=True, details=None,
           include_series_count=True)

Class used for extracting data about an individual agency. You need to supply the following parameter:

  • identifier – the agency identifier, eg ‘CA 343’

The agency data is obtained by accessing the agency’s .data attribute.

Here are the fields returned:

  • identifier (string)
  • title (string)
  • institution_title (string, or None if not recorded)
  • agency_status (string)
  • location (string)
  • note (string)
  • functions – a list of functions performed by this agency, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • controlling_organisation – a list of organisations, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • previous_agencies – a list of agencies, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • subsequent_agencies – a list of agencies, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • superior_agencies – a list of agencies, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • controlled_agencies – a list of agencies, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • associated_people – a list of people, each with the fields:
    • identifier (string)
    • title (string)
    • date_str (string)
    • start_date (ISO formatted date)
    • end_date (ISO formatted date)
  • date_str (string)
  • start_date (ISO formatted date)
  • end_date (ISO formatted date)
  • number_of_series (integer) – number of series created by this agency
  • retrieved (ISO formatted date)

Examples

To retrieve information about an agency, just give RSAgency the agency identifier.

agency = RSAgency('CA 343')

You can then access the agency data using the .data attribute.

agency.data
{'identifier': 'CA 343',
 'title': 'Industrial Atomic Energy Policy Committee',
 'institution_title': None,
 'agency_status': 'Head Office',
 'location': 'New South Wales',
 'functions': [{'date_str': '01 Jan 1949 - 30 Apr 1952',
   'start_date': '1949-01-01',
   'end_date': '1952-04-30',
   'identifier': 'ENERGY',
   'title': 'ENERGY'}],
 'note': 'On 19 August, 1949 the Commonwealth government decided to establish an Industrial Atomic Energy Policy Committee. The Committee was set up before the end of the year. Its function was to study possible industrial applications of atomic energy and to recommend a possible programme of development. The Committee consisted of Professor Marcus Oliphant (Chairman), Dr F.W.G. White, Chief Executive Officer of the Commonwealth Scientific and Industrial Research Organization (Deputy Chairman); Professor J.P. Baxter, Professor of Chemical Engineering, New South Wales University of Technology, Mr H.P. Breen, Secretary, Department of Supply and Development, Mr H.J. Goodes, representing the Treasury, Professor L.H. Martin, Professor of Physics, University of Melbourne and Commonwealth Defence Scientific Adviser and Dr H.G. Raggett, Director of the Bureau of Mineral Resources, Geology and Geophysics. (Australian Atomic Energy Commission, First Annual Report, 1953, p.7) The Committee held a succession of meetings collecting and studying information on technical aspects of atomic energy and arrangements overseas for atomic energy development and in the light of these studies, prepared basic recommendations for an Australian programme. The Committee also prepared basic proposals for the training of staff for a research and development programme. By 1952 the Committee came to the conclusion that the problems referred to it required consideration under wider terms of reference. Accordingly it made a recommendation that it be disbanded and replaced by a new committee with revised functions. This recommendation was accepted and in April 1952 the Atomic Energy Policy Committee (CA 332) was established to replace the Industrial Atomic Energy Policy Committee. Historical agency address Subject to research Sydney, NSW',
 'controlling_organisation': [{'date_str': '01 Jan 1949 - 30 Apr 1952',
   'start_date': '1949-01-01',
   'end_date': '1952-04-30',
   'identifier': 'CO 1',
   'title': 'COMMONWEALTH OF AUSTRALIA'}],
 'previous_agencies': [],
 'subsequent_agencies': [{'date_str': '',
   'start_date': None,
   'end_date': None,
   'identifier': 'CA 332',
   'title': 'Atomic Energy Policy Committee'}],
 'superior_agencies': [{'date_str': '01 Jan 1949 - 16 Mar 1950',
   'start_date': '1949-01-01',
   'end_date': '1950-03-16',
   'identifier': 'CA 54',
   'title': 'Department of Supply and Development [II]'},
  {'date_str': '16 Mar 1950 - 30 Apr 1952',
   'start_date': '1950-03-16',
   'end_date': '1952-04-30',
   'identifier': 'CA 57',
   'title': 'Department of Supply, Central Office'}],
 'controlled_agencies': [],
 'associated_people': [],
 'retrieved': '2023-01-20T10:48:05.099605+11:00',
 'date_str': '01 Jan 1949 -  30 Apr 1952',
 'start_date': '1949-01-01',
 'end_date': '1952-04-30',
 'number_of_series': 0}

Use agency.data[FIELD NAME] to access individual fields. The agency_status value of this agency should be ‘Head Office’.

assert agency.data['agency_status'] == 'Head Office'
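
The relationship fields are lists of dicts that share the same date fields, so they can be processed with one small function. The sketch below works on a trimmed copy of the agency.data output shown above and calculates how long each relationship lasted. parse_iso and summarise_relations are hypothetical helpers, not part of the library. Dates that are missing or not full ISO dates are treated as unknown.

```python
from datetime import date

# Trimmed copy of the agency.data output for 'CA 343'
agency_data = {
    "identifier": "CA 343",
    "subsequent_agencies": [
        {"date_str": "", "start_date": None, "end_date": None,
         "identifier": "CA 332", "title": "Atomic Energy Policy Committee"},
    ],
    "superior_agencies": [
        {"date_str": "01 Jan 1949 - 16 Mar 1950", "start_date": "1949-01-01",
         "end_date": "1950-03-16", "identifier": "CA 54",
         "title": "Department of Supply and Development [II]"},
        {"date_str": "16 Mar 1950 - 30 Apr 1952", "start_date": "1950-03-16",
         "end_date": "1952-04-30", "identifier": "CA 57",
         "title": "Department of Supply, Central Office"},
    ],
}


def parse_iso(value):
    """Return a date for a full ISO date string, or None if it can't be parsed."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def summarise_relations(data, field):
    """List (identifier, title, days) for each related entity in a field."""
    rows = []
    for related in data.get(field, []):
        start = parse_iso(related["start_date"])
        end = parse_iso(related["end_date"])
        # Duration is only known when both dates are present
        days = (end - start).days if start and end else None
        rows.append((related["identifier"], related["title"], days))
    return rows


for row in summarise_relations(agency_data, "superior_agencies"):
    print(row)
```

The same function can be pointed at other relationship fields, such as previous_agencies or controlled_agencies.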

The extracted data is saved into a simple key-value cache to speed up repeat requests. If you want to scrape a fresh version, use .refresh_cache().


source

RSAgency.refresh_cache

 RSAgency.refresh_cache ()

Delete data for this entity from the cache, then extract a fresh version from RecordSearch.


source

RSAgencySearch

 RSAgencySearch (results_per_page=20, sort=1, record_detail='brief',
                 **kwargs)

Search for agencies in RecordSearch.

Supply any of the agency search parameters as kwargs to initialise the search.

Optional parameters:

  • results_per_page (default: 20)
  • sort (default: 1 – order by id)
  • page – to retrieve a specific page of results
  • record_detail – amount of detail to include, options are:
    • ‘brief’ (default) – just the info in the search results
    • ‘full’ – get the full individual record for each result (slow)

To access a page of results, use the .get_results() method. This method increments the results page, so you can call it in a loop to retrieve the complete result set.

Useful attributes:

  • .total_results – the total number of results in the results set
  • .total_pages – the total number of result pages
  • .kwargs – a dict containing the supplied search parameters
  • .params – a dict containing the values of the optional parameters

Agency search parameters

These are the parameters you can supply as keyword arguments to RSAgencySearch.

Parameter Input type Values
kw Text Keywords or phrase to search for
kw_options Select How to combine the keywords – see keyword options
kw_exclude Text Keywords or phrase to exclude
kw_exclude_options Select How to combine the keywords – see keyword options
function Text Limit to agencies that performed this function – see note
date_from Text Include agencies that existed after this date (year only) – eg ‘1925’
date_to Text Include agencies that existed before this date (year only) – eg ‘1945’
locations Select Limit to agencies in this location – see list of locations
locations_exclude Select Exclude agencies in this location – see list of locations
agency_status Select Limit to agencies with this status – see list of possible values
agency_status_exclude Select Exclude agencies with this status – see list of possible values

Agency keyword options

Use one of the following values to specify how keywords or phrases should be treated using the kw_options parameter. The default is ‘ALL’.

  • ‘ALL’ (default) – must include all keywords
  • ‘ANY’ – must include at least one of the keywords
  • ‘EXACT’ – treat the keywords as a phrase

Agency function note

In theory, functions are a controlled, hierarchical list, but previous examinations have shown that the use of functions in RecordSearch can be inconsistent. Here’s a list of functions extracted from the RecordSearch interface that you can use as values with the function parameter.

Agency location options

Use one of the following values with the locations and locations_exclude parameters to limit your results to agencies in that location. The default is to include all locations.

  • ‘NAT,ACT’
  • ‘COCOS OR CHRISTMAS ISLAND’
  • ‘NSW’
  • ‘NT’
  • ‘OVERSEAS’
  • ‘PNG’ – Papua New Guinea
  • ‘QLD’
  • ‘SA’
  • ‘TAS’
  • ‘VIC’
  • ‘WA’

Agency status options

Use one of the following values with the agency_status and agency_status_exclude parameters to limit your results to agencies with that status. The default is to include all status values.

  • ‘DOS’ – Department of State
  • ‘HO’ – Head Office
  • ‘RO’ – Regional or State Office
  • ‘INTGOV’ – Intergovernmental agency
  • ‘COURT’ – Judicial Court or Tribunal
  • ‘LO’ – Local Office
  • ‘NONEX’ – Non-Executive government agency (Courts, Parliament)
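
Since the select-type parameters only accept the values listed above, it can be handy to check them before starting a search. The sketch below is one way to do that. check_agency_search_kwargs is a hypothetical helper, not part of the library, and it assumes that each select-type parameter takes a single string value.

```python
# Values copied from the option lists above
KEYWORD_OPTIONS = {"ALL", "ANY", "EXACT"}
AGENCY_LOCATIONS = {
    "NAT,ACT", "COCOS OR CHRISTMAS ISLAND", "NSW", "NT", "OVERSEAS",
    "PNG", "QLD", "SA", "TAS", "VIC", "WA",
}
AGENCY_STATUSES = {"DOS", "HO", "RO", "INTGOV", "COURT", "LO", "NONEX"}

# Map each select-type parameter to its documented values
ALLOWED_VALUES = {
    "kw_options": KEYWORD_OPTIONS,
    "kw_exclude_options": KEYWORD_OPTIONS,
    "locations": AGENCY_LOCATIONS,
    "locations_exclude": AGENCY_LOCATIONS,
    "agency_status": AGENCY_STATUSES,
    "agency_status_exclude": AGENCY_STATUSES,
}


def check_agency_search_kwargs(**kwargs):
    """Return kwargs unchanged, or raise ValueError for an undocumented value."""
    for name, value in kwargs.items():
        allowed = ALLOWED_VALUES.get(name)
        # Text parameters such as kw or function are not checked
        if allowed is not None and value not in allowed:
            raise ValueError(
                f"{value!r} is not a documented value for {name}; "
                f"expected one of {sorted(allowed)}"
            )
    return kwargs


params = check_agency_search_kwargs(
    function="science", locations="NSW", agency_status="HO"
)
print(params)
# With the library installed, the checked values could then be passed on:
# agency_search = RSAgencySearch(**params)
```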

Examples

Search for all agencies that have performed the ‘SCIENCE’ function.

agency_search = RSAgencySearch(function='science')

Initialising the RSAgencySearch class sets up the search and retrieves some information about the results set. For example, to see the total number of results, we just access the .total_results attribute.

agency_search.total_results
60

source

RSAgencySearch.get_results

 RSAgencySearch.get_results (page=None)

Return a list of results from a search results page.

The page value is incremented with each request, so you can call this method in a loop to retrieve the complete results set. When you reach the end of the results, this method will return an empty list.

Optional parameter:

  • page – request a specific page from the results set

agency_search.get_results()
{'total_results': 60,
 'page': 1,
 'number_of_results': 20,
 'results': [{'identifier': 'CA 49',
   'title': 'Department of Post-War Reconstruction, Central Office',
   'date_str': '1942 - 1950',
   'start_date': '1942',
   'end_date': '1950',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 333',
   'title': 'Defence Scientific Advisory Committee',
   'date_str': '1947 - 1948',
   'start_date': '1947',
   'end_date': '1948',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 429',
   'title': 'Scientific Advisory Committee, Foodstuffs',
   'date_str': '1943 - 1947',
   'start_date': '1943',
   'end_date': '1947',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 1936',
   'title': 'Senate Standing Committee on Education, Science and the Arts',
   'date_str': '1971 - 1976',
   'start_date': '1971',
   'end_date': '1976',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 2423',
   'title': 'Australian Science, Technology and Engineering Council (ASTEC)',
   'date_str': '1977 - 1989',
   'start_date': '1977',
   'end_date': '1989',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 3277',
   'title': 'Senate Standing Committee on Science and the Environment/ (from 1983) Senate Standing Committee on Science, Technology and the Environment',
   'date_str': '1976 - 1987',
   'start_date': '1976',
   'end_date': '1987',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 4136',
   'title': 'Department of Science [III], Central Office',
   'date_str': '1984 - 1987',
   'start_date': '1984',
   'end_date': '1987',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 6703',
   'title': 'Senate Standing Committee on Industry, Science, Technology, Transport, Communications and Infrastructure',
   'date_str': '1993 - 1994',
   'start_date': '1993',
   'end_date': '1994',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 6944',
   'title': 'House of Representatives Standing Committee on Industry, Science and Resources',
   'date_str': '1987 - ',
   'start_date': '1987',
   'end_date': None,
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 7780',
   'title': 'Questacon – The National Science and Technology Centre',
   'date_str': '1985 - ',
   'start_date': '1985',
   'end_date': None,
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 7902',
   'title': 'Department of Industry, Science and Technology, Central Office',
   'date_str': '1994 - 1996',
   'start_date': '1994',
   'end_date': '1996',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 7923',
   'title': 'Department of Industry, Science and Technology, State Office, New South Wales',
   'date_str': '1994 - 1996',
   'start_date': '1994',
   'end_date': '1996',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 7924',
   'title': 'Department of Industry, Science and Technology, State Office, Western Australia',
   'date_str': '1994 - 1996',
   'start_date': '1994',
   'end_date': '1996',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 7925',
   'title': 'Department of Industry, Science and Technology, State Office, Queensland',
   'date_str': '1994 - 1996',
   'start_date': '1994',
   'end_date': '1996',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 7926',
   'title': 'Department of Industry, Science and Technology, State Office, South Australia',
   'date_str': '1994 - 1996',
   'start_date': '1994',
   'end_date': '1996',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 7927',
   'title': 'Department of Industry, Science and Technology, State Office, Tasmania',
   'date_str': '1994 - 1996',
   'start_date': '1994',
   'end_date': '1996',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 7930',
   'title': 'Department of Industry, Science and Technology, State Office, Victoria',
   'date_str': '1994 - 1996',
   'start_date': '1994',
   'end_date': '1996',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 8247',
   'title': 'Department of Industry, Science and Tourism, Central Office',
   'date_str': '1996 - 1998',
   'start_date': '1996',
   'end_date': '1998',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 8267',
   'title': 'Department of Industry, Science and Tourism, State Office, Queensland',
   'date_str': '1996 - 1998',
   'start_date': '1996',
   'end_date': '1998',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'},
  {'identifier': 'CA 8268',
   'title': 'Department of Industry, Science and Tourism, State Office, New South Wales',
   'date_str': '1996 - 1998',
   'start_date': '1996',
   'end_date': '1998',
   'retrieved': '2023-01-20T10:48:06.515238+11:00'}],
 'retrieved': '2023-01-20T10:48:06.520278+11:00'}
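
The 'results' list is a plain list of dicts, so it can be filtered and sorted with ordinary Python. The sketch below uses a few rows copied from the output above to pick out agencies with a start date but no end date. open_ended is a hypothetical helper, not part of the library. Note that a missing end_date is treated here as 'no recorded end', but it could also mean that the date couldn't be parsed.

```python
# A few rows copied from the results above
sample_results = [
    {"identifier": "CA 49",
     "title": "Department of Post-War Reconstruction, Central Office",
     "date_str": "1942 - 1950", "start_date": "1942", "end_date": "1950"},
    {"identifier": "CA 6944",
     "title": "House of Representatives Standing Committee on Industry, Science and Resources",
     "date_str": "1987 - ", "start_date": "1987", "end_date": None},
    {"identifier": "CA 7780",
     "title": "Questacon – The National Science and Technology Centre",
     "date_str": "1985 - ", "start_date": "1985", "end_date": None},
]


def open_ended(results):
    """Return results with a start date but no end date, oldest first."""
    matches = [r for r in results if r["start_date"] and r["end_date"] is None]
    # Start dates in these rows are four-digit year strings, so they sort correctly as text
    return sorted(matches, key=lambda r: r["start_date"])


for agency in open_ended(sample_results):
    print(agency["identifier"], agency["date_str"])
```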
# This returns a single agency that created a single series
# It should not cause any exceptions
agency_search = RSAgencySearch(function='consular services')
agency_search.get_results()
{'total_results': 1,
 'page': 1,
 'number_of_results': 1,
 'results': [{'identifier': 'CA 9343',
   'title': 'Australian Embassy, East Timor [Dili]',
   'institution_title': None,
   'agency_status': 'Head Office',
   'location': 'Overseas',
   'functions': [{'date_str': '25 Oct 1999 -',
     'start_date': '1999-10-25',
     'end_date': None,
     'identifier': 'CONSULAR SERVICES',
     'title': 'CONSULAR SERVICES'},
    {'date_str': '25 Oct 1999 -',
     'start_date': '1999-10-25',
     'end_date': None,
     'identifier': 'GOVERNMENT REPRESENTATION OVERSEAS',
     'title': 'GOVERNMENT REPRESENTATION OVERSEAS'},
    {'date_str': '25 Oct 1999 -',
     'start_date': '1999-10-25',
     'end_date': None,
     'identifier': 'INTERNATIONAL RELATIONS',
     'title': 'INTERNATIONAL RELATIONS'}],
   'note': 'Summary heading Agency registration notes Abolition Creation Functions and activities An Embassy, headed by an Ambassador, is the highest form of diplomatic mission. An Australian Ambassador carries Letters of Credence from the Governor-General addressed to the Head of State of the country to which Australia is represented. Accreditation of an Ambassador comprises the formal presentation and acceptance of these credentials. An Australian Embassy’s broad function is to represent to the country to which it is accredited the interests of the Commonwealth of Australia. This involves a variety of tasks including - negotiating with governments and international organisations on issues of concern to Australia; - observing and reporting on events and developments in the host country and other countries or organisations for which it has representational, visiting, reporting or consular responsibilities; - carrying out various duties on behalf of other Australian government agencies; - promoting and facilitating trade relations with Australia; - disseminating information about Australia and its policies; and - assisting Australian citizens residing in or visiting the country to which the embassy is accredited. The Australian Embassy to East Timor was officially opened as an Embassy on 20 May 2002, following the granting of independence to East Timor from Indonesia on 20 May 2002. Prior to this, Australia was officially represented in East Timor by an Australian Mission from 25 October 1999, headed by James F Batley, (PSM 2000) until the opening of the embassy. The date of Presentation of the Credentials was 9 July 2002. The Australian Government had previously been represented in Timor by an Australian Consulate, Dili , which commenced on 1 January 1946 with the appointment of Mr Charles Eaton, OBE as Consul. The last Consul was Mr M F Berman, who closed the post on 31 August 1971 (see CA 2766 – Australian Consulate, Dili [Portuguese Timor]. 
Ambassadors 21 May 2002 James F Batley, PSM, Amb . 08 July 2002 G Paul Foley, Amb . 13 August 2004 Margaret E Twomey , Amb . 09 January 2008 Peter M Heyward, Amb Legislation administered Administrative structure Historical agency address 1999 - present Address: Avenida dos Martires da Patria, Dili PO Box 332 Ph : (670) 3322111 Fax: (670) 3322247 State/regional structure An Australian Embassy’s broad function is to represent to the country to which it is accredited the interests of the Commonwealth of Australia, as directed by the Department of Foreign Affairs and Trade Central Office. Records created by the agency The Australian Embassy in East Timor is scheduled to transition from primarily using a paper-based recordkeeping system, managed in TRIM Captura , to using an electronic document and records management system (TRIM Context) in 2011. Additional information End notes Sources Department of Foreign Affairs and Trade (2009). Statement of Service: appointments and biographies – 15 th edition. Commonwealth Government of Australia.',
   'controlling_organisation': [{'date_str': '25 Oct 1999 -',
     'start_date': '1999-10-25',
     'end_date': None,
     'identifier': 'CO 1',
     'title': 'COMMONWEALTH OF AUSTRALIA'}],
   'previous_agencies': [{'date_str': '',
     'start_date': None,
     'end_date': None,
     'identifier': 'CA 3217',
     'title': 'Official Representative, Portuguese Timor [Dili]'},
    {'date_str': '',
     'start_date': None,
     'end_date': None,
     'identifier': 'CA 2766',
     'title': 'Australian Consulate, Dili [Portuguese Timor]'}],
   'subsequent_agencies': [],
   'superior_agencies': [{'date_str': '25 Oct 1999 -',
     'start_date': '1999-10-25',
     'end_date': None,
     'identifier': 'CA 5987',
     'title': 'Department of Foreign Affairs and Trade, Central Office'}],
   'controlled_agencies': [],
   'associated_people': [],
   'retrieved': '2023-01-20T10:48:07.708766+11:00',
   'date_str': '25 Oct 1999 -',
   'start_date': '1999-10-25',
   'end_date': None,
   'number_of_series': 1}],
 'retrieved': '2023-01-20T10:48:08.597801+11:00'}

Calling .refresh_cache will remove all of the data for this search from the cache, and set the results page back to 1.


source

RSAgencySearch.refresh_cache

 RSAgencySearch.refresh_cache ()

Delete data for this search from the cache, then retrieve a fresh version from RecordSearch.