EURITO

Batchables for processing data (which has already been collected elsewhere within this codebase) for the EURITO project. All of these batchables pipe the data into an Elasticsearch database, which is then cloned by EURITO.

run.py (arxiv_eu)

Transfer pre-collected arXiv data from MySQL to Elasticsearch, whilst labelling arXiv articles as being EU or not. This differs slightly from the arXlive pipeline by reflecting the EURITO project more specifically, and by allowing more in-depth analysis of MAG fields of study.
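
As a rough illustration of the EU-labelling step, the sketch below flags an article as EU if any of its institute countries is an EU member state. The field names (`institute_countries`, `is_eu`), the function name `label_eu` and the abbreviated country set are assumptions made for illustration; they are not taken from the pipeline's actual schema or logic.

```python
# Hypothetical, deliberately incomplete set of ISO 3166-1 alpha-2 codes
# for EU member states; a real pipeline would need the full list.
EU_COUNTRY_CODES = {"DE", "FR", "IT", "ES", "NL", "PL", "SE"}


def label_eu(article, eu_codes=EU_COUNTRY_CODES):
    """Return a copy of the article with a boolean 'is_eu' field added.

    Assumption: an article counts as EU if at least one of its
    institute countries is in `eu_codes`.
    """
    countries = article.get("institute_countries", [])
    labelled = dict(article)  # shallow copy, so the input is not mutated
    labelled["is_eu"] = any(code in eu_codes for code in countries)
    return labelled


# Made-up example records
articles = [
    {"id": "1901.00001", "institute_countries": ["DE", "US"]},
    {"id": "1901.00002", "institute_countries": ["US"]},
]
labelled_articles = [label_eu(a) for a in articles]
```

Under this assumed rule the first record is labelled EU and the second is not. A stricter rule (for example, requiring all institutes to be in the EU) would only change the `any` to `all`.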

run()

run.py (crunchbase_eu)

Transfer pre-collected Crunchbase data from MySQL to Elasticsearch.
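
One way to picture the MySQL-to-Elasticsearch transfer is as a conversion of database rows into bulk-indexing actions. The sketch below builds action dictionaries with `_index`, `_id` and `_source` keys, which is the shape accepted by the `bulk` helper of the `elasticsearch` Python client. The function name `to_bulk_actions`, the index name, the row fields and the choice to drop null values are all assumptions for illustration, not the batchable's actual behaviour.

```python
def to_bulk_actions(rows, index_name, id_field="id"):
    """Yield one bulk-indexing action per row.

    Assumptions: each row is a dict, `id_field` holds a unique key,
    and null (None) values are dropped rather than indexed.
    """
    for row in rows:
        source = {
            key: value
            for key, value in row.items()
            if key != id_field and value is not None
        }
        yield {"_index": index_name, "_id": row[id_field], "_source": source}


# Made-up rows standing in for the result of a MySQL query
rows = [
    {"id": "org-1", "name": "Example Ltd", "country": "France", "parent_id": None},
    {"id": "org-2", "name": "Sample GmbH", "country": "Germany", "parent_id": "org-1"},
]
actions = list(to_bulk_actions(rows, index_name="companies_example"))
```

Because `to_bulk_actions` is a generator, rows can be streamed to the indexing client without holding the whole table in memory.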

run()

run.py (patstat_eu)

Transfer pre-collected PATSTAT data from MySQL to Elasticsearch. Only EU patents since the year 2000 are considered. The patents are grouped by patent families.
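
The year filter and family grouping described above can be sketched as follows. The keys `appln_id`, `appln_filing_year` and `docdb_family_id` follow PATSTAT column naming, but whether the batchable uses these particular columns is an assumption, as is treating "since the year 2000" as inclusive. The EU criterion is not specified here, so it is left out of the sketch, and `group_by_family` is a hypothetical name.

```python
from collections import defaultdict


def group_by_family(applications, min_year=2000):
    """Map each patent family ID to the application IDs it contains.

    Assumption: applications filed in `min_year` or later are kept.
    """
    families = defaultdict(list)
    for appln in applications:
        if appln["appln_filing_year"] >= min_year:
            families[appln["docdb_family_id"]].append(appln["appln_id"])
    return dict(families)


# Made-up application records
applications = [
    {"appln_id": 1, "docdb_family_id": 10, "appln_filing_year": 2003},
    {"appln_id": 2, "docdb_family_id": 10, "appln_filing_year": 2005},
    {"appln_id": 3, "docdb_family_id": 20, "appln_filing_year": 1998},
    {"appln_id": 4, "docdb_family_id": 20, "appln_filing_year": 2000},
]
families = group_by_family(applications)
```

In this sketch a family keeps only its post-cutoff applications; a pipeline could equally decide to keep or drop whole families based on their earliest filing year.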

select_text(objs, lang_field, text_field)
metadata(orm, session, appln_ids, field_selector=None)
run()