EURITO (piping data to Elasticsearch)
Pipeline specific to EURITO for piping existing data to Elasticsearch. A recent “EU” cut of PATSTAT data is transferred from the “main” PATSTAT database to Nesta’s central database.
Preprocess PATSTAT data
Select the EU subset of PATSTAT by doc family ID. This will significantly speed up the transfer to ES.
class PreprocessPatstatTask(*args, **kwargs)
Bases: luigi.task.Task
- date = <luigi.parameter.DateParameter object>
- test = <luigi.parameter.BoolParameter object>
class PatstatPreprocessRootTask(*args, **kwargs)
Bases: luigi.task.WrapperTask
- date = <luigi.parameter.DateParameter object>
- production = <luigi.parameter.BoolParameter object>
- requires()
  The Tasks that this Task depends on.
  A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
  See Task.requires
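The rule described for requires() (a task runs only once everything it requires is complete, and requires() may return a single task, a list, or a dict) can be modelled in a few lines. This is a minimal standard-library sketch, not Luigi's implementation: SketchTask, SketchRootTask, flatten_requirements and can_run are hypothetical names introduced here for illustration, and only one level of nesting is handled.

```python
from typing import Dict, List, Union


class SketchTask:
    """Hypothetical stand-in for a Luigi task (this is NOT luigi.Task)."""

    def __init__(self, name: str, done: bool = False):
        self.name = name
        self.done = done

    def requires(self):
        # No dependencies by default, mirroring "you don't need to override".
        return None

    def complete(self) -> bool:
        return self.done


class SketchRootTask(SketchTask):
    """Hypothetical wrapper-style task that only declares dependencies."""

    def __init__(self, name: str, deps):
        super().__init__(name)
        self.deps = deps

    def requires(self):
        # May be a single task, a list of tasks, or a dict of tasks.
        return self.deps


Required = Union[None, SketchTask, List[SketchTask], Dict[str, SketchTask]]


def flatten_requirements(required: Required) -> List[SketchTask]:
    """Normalise the three allowed return shapes into a flat list."""
    if required is None:
        return []
    if isinstance(required, SketchTask):
        return [required]
    if isinstance(required, dict):
        return list(required.values())
    return list(required)


def can_run(task: SketchTask) -> bool:
    """A task may run only if every required task reports complete."""
    return all(dep.complete() for dep in flatten_requirements(task.requires()))


if __name__ == "__main__":
    finished = SketchTask("preprocess", done=True)
    pending = SketchTask("transfer", done=False)
    print(can_run(SketchRootTask("root", [finished, pending])))
```

In this model a wrapper-style task such as SketchRootTask does no work of its own; it exists only to group the tasks returned by requires().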
Root Task (EURITO)
Pipe data from MySQL to Elasticsearch, for use with clio-lite.
class RootTask(*args, **kwargs)
Bases: luigi.task.WrapperTask
- process_batch_size = <luigi.parameter.IntParameter object>
- production = <luigi.parameter.BoolParameter object>
- date = <luigi.parameter.DateParameter object>
- drop_and_recreate = <luigi.parameter.BoolParameter object>
- requires()
  The Tasks that this Task depends on.
  A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.
  See Task.requires