CORDIS (EU-funded research)
Generic (i.e. not project-specific) pipeline to collect all CORDIS data, discovering all entities by crawling an unofficial API.
H2020 and FP7 Data Collection
Collection of H2020 and FP7 projects, organisations, publications and topics from the unofficial API.
class CordisCollectTask(*args, **kwargs)
Bases: nesta.core.luigihacks.autobatch.AutoBatchTask
process_batch_size = <luigi.parameter.IntParameter object>
intermediate_bucket = <luigi.parameter.Parameter object>
db_config_path = <luigi.parameter.Parameter object>
db_config_env = <luigi.parameter.Parameter object>
routine_id = <luigi.parameter.Parameter object>
prepare()
You should implement a method which returns a list of dict, where each dict corresponds to the inputs to one batchable job. Each row of the output must contain at least the following keys:
- done (bool): indicates whether the job has already finished.
- outinfo (str): text indicating e.g. the location of the output, for use in the batch job and in the combine method.
Returns: list of dict
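The shape of the list returned by prepare can be sketched in plain Python. This is a minimal illustration only, not the CordisCollectTask implementation: the function signature, the batch_id key, the bucket name and the s3:// path layout are all hypothetical; only the done and outinfo keys come from the description above.

```python
from typing import Dict, List, Set


def prepare(batch_ids: List[str], finished: Set[str], bucket: str) -> List[Dict]:
    """Build one parameter dict per batch job (hypothetical stand-in)."""
    job_params = []
    for batch_id in batch_ids:
        job_params.append({
            # Extra, illustrative key: not required by the contract above.
            "batch_id": batch_id,
            # Required: has this job already been finished?
            "done": batch_id in finished,
            # Required: where the job's output can be found (assumed layout).
            "outinfo": f"s3://{bucket}/{batch_id}.json",
        })
    return job_params


params = prepare(["batch-0", "batch-1"], finished={"batch-0"}, bucket="example-bucket")
for row in params:
    print(row)
```

Each row carries at least done and outinfo; any further keys are whatever the batchable needs as input.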
combine(job_params)
You should implement a method which collects the outputs specified by the outinfo key of job_params, which is the output from the prepare method. This method should finally write to the luigi.Target output.
Parameters: job_params (list of dict): the batchable job parameters, as returned from the prepare method.
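The collect-then-write pattern described for combine can be sketched as follows. This is a sketch under stated assumptions, not the actual task code: outinfo is assumed here to be a local JSON file path, and the final output is a plain file standing in for the luigi.Target; the output_path argument is hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List


def combine(job_params: List[Dict], output_path: str) -> int:
    """Gather every batch output named by 'outinfo' and write one combined file."""
    rows = []
    for params in job_params:
        # Assumption: each outinfo is a local path to a JSON list of records.
        with open(params["outinfo"]) as f:
            rows.extend(json.load(f))
    # Stand-in for writing to the task's luigi.Target output.
    with open(output_path, "w") as f:
        json.dump(rows, f)
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    # Fake two finished batch jobs so the sketch runs on its own.
    job_params = []
    for i, batch_rows in enumerate([[{"id": 1}], [{"id": 2}, {"id": 3}]]):
        path = os.path.join(tmp, f"batch-{i}.json")
        with open(path, "w") as f:
            json.dump(batch_rows, f)
        job_params.append({"done": True, "outinfo": path})

    target = os.path.join(tmp, "combined.json")
    n_rows = combine(job_params, target)
    with open(target) as f:
        combined = json.load(f)

print(n_rows, combined)
```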
class RootTask(*args, **kwargs)
Bases: luigi.task.WrapperTask
production = <luigi.parameter.BoolParameter object>
date = <luigi.parameter.DateParameter object>
requires()
The Tasks that this Task depends on.
A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
See Task.requires
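The three return forms allowed for requires (a single Task, a list of Tasks, or a dict whose values are Tasks) can be illustrated without luigi installed. StubTask and flatten_requirements below are hypothetical stand-ins written for this sketch; they are not luigi classes or functions, and luigi's own handling may differ in detail.

```python
from typing import Any, List


class StubTask:
    """Hypothetical stand-in for a luigi Task, used only for illustration."""

    def __init__(self, name: str) -> None:
        self.name = name

    def requires(self) -> Any:
        # Default: no dependencies, so nothing needs overriding.
        return []


def flatten_requirements(required: Any) -> List[StubTask]:
    """Normalise any of the three allowed return forms into a flat list."""
    if required is None:
        return []
    if isinstance(required, dict):
        return list(required.values())
    if isinstance(required, (list, tuple)):
        return list(required)
    return [required]  # a single task


collect = StubTask("collect")
clean = StubTask("clean")

single = flatten_requirements(collect)
as_list = flatten_requirements([collect, clean])
as_dict = flatten_requirements({"raw": collect, "tidy": clean})
none = flatten_requirements(StubTask("leaf").requires())

print([t.name for t in as_dict])
```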