Gateway to Research (UK publicly funded research)

Generic pipeline (i.e. not project-specific) that collects all GtR data, discovering every entity by crawling the official API. The routine then geocodes the data and loads it into MySQL.

Root Task (generic)

Luigi routine to collect all GtR data, geocode it and load it into MySQL.

class RootTask(*args, **kwargs)

Bases: luigi.task.WrapperTask

A dummy root task, which collects the database configurations and executes the central task.

Parameters: date (datetime) – Date used to label the outputs
date = <luigi.parameter.DateParameter object>
page_size = <luigi.parameter.IntParameter object>
production = <luigi.parameter.BoolParameter object>
requires()

Collects the database configurations and executes the central task.

Data collection

Discover all GtR data via the API.
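
Discovery by crawling can be pictured as a breadth-first walk over paged results. The sketch below is only an illustration under stated assumptions: `fetch_page`, the record layout (`id`, `links`) and the in-memory `FAKE_DATA` are hypothetical and do not reflect the real GtR API or this package's code.

```python
from collections import deque


def crawl(fetch_page, start, page_size):
    """Breadth-first discovery of every entity reachable from `start`.

    `fetch_page(entity_type, page, page_size)` is a hypothetical callable
    returning a list of records; each record is a dict with an "id" and a
    "links" list naming other entity types.
    """
    seen_types = {start}
    queue = deque([start])
    found = {}  # (entity_type, id) -> record
    while queue:
        entity_type = queue.popleft()
        page = 1
        while True:
            records = fetch_page(entity_type, page, page_size)
            for record in records:
                found[(entity_type, record["id"])] = record
                for linked in record["links"]:
                    if linked not in seen_types:
                        seen_types.add(linked)
                        queue.append(linked)
            if len(records) < page_size:
                break  # a short page means nothing is left for this type
            page += 1
    return found


# In-memory stand-in for the remote API, for illustration only.
FAKE_DATA = {
    "projects": [{"id": i, "links": ["organisations"]} for i in range(5)],
    "organisations": [{"id": i, "links": []} for i in range(3)],
}


def fake_fetch(entity_type, page, page_size):
    rows = FAKE_DATA[entity_type]
    offset = (page - 1) * page_size
    return rows[offset:offset + page_size]


entities = crawl(fake_fetch, "projects", page_size=2)
```

Starting from a single entity type, the walk picks up the linked `organisations` records without being told about them in advance, which is the sense in which entities are "discovered".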

class GtrTask(*args, **kwargs)

Bases: nesta.core.luigihacks.autobatch.AutoBatchTask

Get all GtR data

date = <luigi.parameter.DateParameter object>
page_size = <luigi.parameter.IntParameter object>
output()

Points to the input database target

prepare()

Prepare the batch job parameters

combine(job_params)

Combine the outputs from the batch jobs
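
The prepare/combine split can be sketched with plain functions. This is a hedged illustration and not the `AutoBatchTask` implementation: the keys in each parameter dict (`first_page`, `last_page`, `done`) and the `pages_per_job` argument are assumptions made for the example.

```python
import math


def prepare(total_records, page_size, pages_per_job):
    """Split the page range into one parameter dict per batch job."""
    total_pages = math.ceil(total_records / page_size)
    job_params = []
    for first in range(1, total_pages + 1, pages_per_job):
        last = min(first + pages_per_job - 1, total_pages)
        job_params.append({"first_page": first, "last_page": last,
                           "page_size": page_size, "done": False})
    return job_params


def combine(job_params):
    """Succeed only if every batch job reported completion."""
    unfinished = [p for p in job_params if not p["done"]]
    if unfinished:
        raise RuntimeError(f"{len(unfinished)} batch job(s) did not finish")
    return len(job_params)


jobs = prepare(total_records=95, page_size=10, pages_per_job=4)
```

Here 95 records at 10 per page give 10 pages, which are split into three jobs covering pages 1 to 4, 5 to 8 and 9 to 10. Each job can then run independently, and `combine` acts as the final check once all of them have finished.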

class GtrOnlyRootTask(*args, **kwargs)

Bases: luigi.task.WrapperTask

A dummy root task, which collects the database configurations and executes the central task.

Parameters: date (datetime) – Date used to label the outputs
date = <luigi.parameter.DateParameter object>
page_size = <luigi.parameter.IntParameter object>
production = <luigi.parameter.BoolParameter object>
requires()

Collects the database configurations and executes the central task.

Geocode

Apply geocoding to the collected GtR data, adding the country name, ISO codes and continent.
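
The enrichment step can be illustrated with a small lookup. This is a minimal sketch, not the routine's actual code: the field names, the `add_country_details` helper and the three-row `COUNTRY_LOOKUP` table are assumptions chosen for the example.

```python
# Tiny illustrative lookup; a real pipeline would use a full country dataset.
COUNTRY_LOOKUP = {
    "united kingdom": ("United Kingdom", "GB", "GBR", "826", "Europe"),
    "france": ("France", "FR", "FRA", "250", "Europe"),
    "united states": ("United States", "US", "USA", "840", "North America"),
}

# Hypothetical output field names, in the same order as the tuples above.
FIELDS = ("country_name", "country_alpha_2", "country_alpha_3",
          "country_numeric", "continent")


def add_country_details(org):
    """Return a copy of `org` with country name, ISO codes and continent.

    Unknown or missing countries leave the new fields set to None.
    """
    key = (org.get("country") or "").strip().lower()
    values = COUNTRY_LOOKUP.get(key, (None,) * len(FIELDS))
    return {**org, **dict(zip(FIELDS, values))}


orgs = [
    {"id": "org-1", "country": " United Kingdom "},
    {"id": "org-2", "country": "Atlantis"},
]
geocoded = [add_country_details(org) for org in orgs]
```

Returning a copy rather than mutating the input keeps the raw record intact, so records that could not be matched are easy to identify and retry later.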

class GtrGeocode(*args, **kwargs)

Bases: luigi.task.Task

Perform geocoding on the collected GtR organisations data

Parameters:
  • _routine_id (str) – String used to label the AWS task
  • db_config_path (str) – The output database configuration
date = <luigi.parameter.DateParameter object>
test = <luigi.parameter.BoolParameter object>
db_config_env = <luigi.parameter.Parameter object>
page_size = <luigi.parameter.IntParameter object>
requires()

Collects the database configurations and executes the central task.

output()

Points to the output database engine

run()

Collect and process organisations, categories and long descriptions.