Gateway to Research (UK publicly funded research)
Generic (i.e. not project-specific) pipeline that collects all Gateway to Research (GtR) data, discovering all entities by crawling the official API. The routine then geocodes the data and loads it into MySQL.
Root Task (generic)
Luigi routine to collect all GtR data, geocode it and load it into MySQL.
class RootTask(*args, **kwargs)
Bases: luigi.task.WrapperTask
A dummy root task, which collects the database configurations and executes the central task.
Parameters: date (datetime) – Date used to label the outputs
- date = <luigi.parameter.DateParameter object>
- page_size = <luigi.parameter.IntParameter object>
- production = <luigi.parameter.BoolParameter object>
Data collection
Discover all GtR data via the API.
class GtrTask(*args, **kwargs)
Bases: nesta.core.luigihacks.autobatch.AutoBatchTask
Get all GtR data
- date = <luigi.parameter.DateParameter object>
- page_size = <luigi.parameter.IntParameter object>
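To make the role of `page_size` concrete, the following sketch shows one generic way to walk a paginated API until every record has been seen. It does not reproduce the GtR API or the pipeline's crawler: `fake_fetch_page`, `crawl`, and the `items` / `total_pages` response fields are hypothetical, and an in-memory list stands in for the HTTP requests.

```python
from typing import Callable, Dict, Iterator

# Hypothetical in-memory "API": seven fake project records.
FAKE_RECORDS = [{"id": f"project-{i}"} for i in range(7)]


def fake_fetch_page(page: int, page_size: int) -> Dict:
    """Stand-in for an HTTP request; pages are numbered from 1."""
    start = (page - 1) * page_size
    items = FAKE_RECORDS[start:start + page_size]
    total_pages = -(-len(FAKE_RECORDS) // page_size)  # ceiling division
    return {"items": items, "total_pages": total_pages}


def crawl(fetch_page: Callable[[int, int], Dict], page_size: int) -> Iterator[Dict]:
    """Yield every record by requesting pages until the last one is reached."""
    page = 1
    while True:
        response = fetch_page(page, page_size)
        yield from response["items"]
        if page >= response["total_pages"]:
            break
        page += 1


records = list(crawl(fake_fetch_page, page_size=3))
print(len(records))  # 7
```

A larger page size means fewer requests per crawl, at the cost of larger responses.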
class GtrOnlyRootTask(*args, **kwargs)
Bases: luigi.task.WrapperTask
A dummy root task, which collects the database configurations and executes the central task.
Parameters: date (datetime) – Date used to label the outputs
- date = <luigi.parameter.DateParameter object>
- page_size = <luigi.parameter.IntParameter object>
- production = <luigi.parameter.BoolParameter object>
Geocode
Apply geocoding to the collected GtR data, adding the country name, ISO codes and continent.
class GtrGeocode(*args, **kwargs)
Bases: luigi.task.Task
Perform geocoding on the collected GtR organisations data
Parameters:
- _routine_id (str) – String used to label the AWS task
- db_config_path (str) – The output database configuration
- date = <luigi.parameter.DateParameter object>
- test = <luigi.parameter.BoolParameter object>
- db_config_env = <luigi.parameter.Parameter object>
- page_size = <luigi.parameter.IntParameter object>
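The enrichment step described in this section (adding country name, ISO codes and continent) can be pictured with the sketch below. It is illustrative only: how `GtrGeocode` actually derives these fields is not shown on this page, and `COUNTRY_LOOKUP`, `add_country_details` and the field names are hypothetical. The lookup table is a tiny hand-written sample, not a complete reference dataset.

```python
from typing import Dict, Optional

# Hand-written sample for illustration; a real routine would rely on a
# complete country reference dataset. Field names are assumptions.
COUNTRY_LOOKUP = {
    "united kingdom": {"country_name": "United Kingdom", "country_alpha_2": "GB",
                       "country_alpha_3": "GBR", "continent": "Europe"},
    "france": {"country_name": "France", "country_alpha_2": "FR",
               "country_alpha_3": "FRA", "continent": "Europe"},
    "japan": {"country_name": "Japan", "country_alpha_2": "JP",
              "country_alpha_3": "JPN", "continent": "Asia"},
}
FIELDS = ("country_name", "country_alpha_2", "country_alpha_3", "continent")


def add_country_details(org: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return a copy of `org` with country name, ISO codes and continent added."""
    key = (org.get("country") or "").strip().lower()
    details = COUNTRY_LOOKUP.get(key)
    enriched = dict(org)  # copy, so the input record is left untouched
    if details is None:
        # Unknown country: leave the new fields empty rather than guess.
        enriched.update(dict.fromkeys(FIELDS))
    else:
        enriched.update(details)
    return enriched


org = {"name": "Example University", "country": " United Kingdom "}
print(add_country_details(org)["country_alpha_3"])  # GBR
```

Returning a copy keeps the raw collected record available for comparison, and leaving unmatched records with empty fields makes them easy to find and review later.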