Scrapers

A set of classes for scraping data about items, series, and agencies from the National Archives of Australia’s online database, RecordSearch.

The main entities described within RecordSearch are Items, Series, and Agencies. Put simply, items are contained within series, and series are created and controlled by agencies. But the Series System, on which RecordSearch is based, allows a much more complex range of relationships between entities to be documented.
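The simple version of these relationships can be sketched as a small data model. This is only an illustration: the class and field names below (`Agency`, `Series`, `Item`, `recording_agencies`, `controlling_agencies`) and the placeholder identifiers are hypothetical, and are not the structures returned by the scraper classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agency:
    identifier: str
    title: str


@dataclass
class Series:
    identifier: str
    title: str
    # Agencies that created (recorded) the series.
    recording_agencies: List[Agency] = field(default_factory=list)
    # Agencies that control the series.
    controlling_agencies: List[Agency] = field(default_factory=list)


@dataclass
class Item:
    identifier: str
    title: str
    # Every item is contained within exactly one series in this simplified model.
    series: Series


def agencies_for_item(item):
    """Return identifiers of agencies linked to an item via its series, without duplicates."""
    seen = []
    for agency in item.series.recording_agencies + item.series.controlling_agencies:
        if agency.identifier not in seen:
            seen.append(agency.identifier)
    return seen


# Placeholder values, made up for illustration only.
example_agency = Agency("CA-EXAMPLE", "Example agency")
example_series = Series("A1", "Example series", [example_agency], [example_agency])
example_item = Item("ITEM-EXAMPLE", "Example item", example_series)
print(agencies_for_item(example_item))
```

The Series System allows richer relationships than this (the model above ignores them), so treat it as a reading aid rather than a schema.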

Item search parameters

Parameter	Input type	Values
kw	Text	Keywords or phrase to search for
kw_options	Select	How to combine the keywords – see keyword options
kw_exclude	Text	Keywords or phrase to exclude
kw_exclude_options	Select	How to combine the excluded keywords – see keyword options
search_notes	Checkbox	Set to `True` to search notes as well as titles
series	Text	Limit to items from this series – eg ‘A1’
series_exclude	Text	Exclude items from this series
control	Text	Limit to items with this control symbol (use * for wildcards) – eg ‘1947/2*’
control_exclude	Text	Exclude items with this control symbol
item_id	Text	Get the item with this identifier (no wildcards allowed)
date_from	Text	Include items with content after this date (year only) – eg ‘1925’
date_to	Text	Include items with content before this date (year only) – eg ‘1945’
formats	Select	Limit to items in this format – see format options
formats_exclude	Select	Exclude items in this format – see format options
locations	Select	Limit to items held in this location – see list of locations
locations_exclude	Select	Exclude items held in this location – see list of locations
access	Select	Limit to items with this access status – see access status options
access_exclude	Select	Exclude items with this access status – see access status options
digital	Checkbox	Limit to digitised items – set to `True`
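As a rough sketch of how the item parameters above might be assembled before a search, the helper below checks names against the table and rejects a wildcard in `item_id`. The helper `build_item_search_params` is hypothetical, and passing the result to `RSItemSearch` as keyword arguments is an assumption rather than something the table confirms.

```python
# Parameter names copied from the item search table.
ITEM_SEARCH_PARAMS = {
    "kw", "kw_options", "kw_exclude", "kw_exclude_options", "search_notes",
    "series", "series_exclude", "control", "control_exclude", "item_id",
    "date_from", "date_to", "formats", "formats_exclude", "locations",
    "locations_exclude", "access", "access_exclude", "digital",
}


def build_item_search_params(**params):
    """Validate parameter names and return only the filters that were set."""
    unknown = sorted(set(params) - ITEM_SEARCH_PARAMS)
    if unknown:
        raise ValueError(f"Unknown item search parameter(s): {', '.join(unknown)}")
    item_id = params.get("item_id")
    if item_id is not None and "*" in str(item_id):
        # The table says item_id does not allow wildcards.
        raise ValueError("item_id does not accept wildcards")
    # Drop parameters left as None so only real filters remain.
    return {name: value for name, value in params.items() if value is not None}


# Digitised items from series A1 whose control symbol starts with 1947/2.
params = build_item_search_params(series="A1", control="1947/2*", digital=True)
print(params)

# Assumption: the search class takes these names as keyword arguments.
# search = RSItemSearch(**params)
```

Checking names locally catches typos such as `serie` before any request is made.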

Series search parameters

Parameter	Input type	Values
kw	Text	Keywords or phrase to search for
kw_options	Select	How to combine the keywords – see keyword options
kw_exclude	Text	Keywords or phrase to exclude
kw_exclude_options	Select	How to combine the excluded keywords – see keyword options
search_notes	Checkbox	Set to `True` to search notes as well as titles
series_id	Text	Search for this series identifier
date_from	Text	Include series with content after this date (year only) – eg ‘1925’
date_to	Text	Include series with content before this date (year only) – eg ‘1945’
formats	Select	Limit to series with items in this format – see format options
formats_exclude	Select	Exclude series with items in this format – see format options
locations	Select	Limit to series held in this location – see list of locations
locations_exclude	Select	Exclude series held in this location – see list of locations
agency_recording	Select	Limit to series created by this agency or person
agency_controlling	Select	Limit to series controlled by this agency or person
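The `date_from` and `date_to` parameters take a year only. A minimal sketch of a pre-flight check is shown below; the helper `check_year_range` is hypothetical, and both the four-digit requirement and the rule that `date_from` should not be later than `date_to` are assumptions, not documented behaviour of RecordSearch.

```python
import re

# Assumption: a year is written as exactly four ASCII digits, as in '1925'.
YEAR_PATTERN = re.compile(r"[0-9]{4}")


def check_year_range(date_from=None, date_to=None):
    """Return the date filters as strings, raising ValueError on malformed input."""
    checked = {}
    for name, value in (("date_from", date_from), ("date_to", date_to)):
        if value is None:
            continue
        text = str(value)
        if not YEAR_PATTERN.fullmatch(text):
            raise ValueError(f"{name} must be a year only, for example '1925'")
        checked[name] = text
    if len(checked) == 2 and int(checked["date_from"]) > int(checked["date_to"]):
        # A reversed range is probably a mistake, so flag it early.
        raise ValueError("date_from is later than date_to")
    return checked


date_filters = check_year_range(date_from="1925", date_to=1945)
print(date_filters)
```

The same check would apply to the item and agency searches, since they use the same two parameter names.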

Agency search parameters

Parameter	Input type	Values
kw	Text	Keywords or phrase to search for
kw_options	Select	How to combine the keywords – see keyword options
kw_exclude	Text	Keywords or phrase to exclude
kw_exclude_options	Select	How to combine the excluded keywords – see keyword options
function	Text	Limit to agencies that performed this function – see note
date_from	Text	Include agencies that existed after this date (year only) – eg ‘1925’
date_to	Text	Include agencies that existed before this date (year only) – eg ‘1945’
locations	Select	Limit to agencies in this location – see list of locations
locations_exclude	Select	Exclude agencies in this location – see list of locations
agency_status	Select	Limit to agencies with this status – see list of possible values
agency_status_exclude	Select	Exclude agencies with this status – see list of possible values

Base

make_browser

make_session

RSSearch

RSEntity

RSBase

Items

RSItem

RSItem.refresh_cache

RSItemSearch

Item search parameters

Keyword options

Format options

Location options

Access status options

Examples

RSItemSearch.get_results

RSItemSearch.refresh_cache

Series

RSSeries

RSSeries.refresh_cache

RSSeriesSearch

Series search parameters

Series keyword options

Series location options

Series format options

Examples

RSSeriesSearch.get_results

RSSeriesSearch.refresh_cache

Agencies

RSAgency

Examples

RSAgency.refresh_cache

RSAgencySearch

Agency search parameters

Agency keyword options

Agency function note

Agency location options

Agency status options

Examples

RSAgencySearch.get_results

RSAgencySearch.refresh_cache