Crunchbase data (private companies)

NB: The Crunchbase pipeline may not work until this issue has been resolved.

Batchables for the collection and processing of Crunchbase data. As documented under packages and routines, the pipeline is executed in the following order. Documentation for the run.py files is given below, but it is not very informative; you are better off looking under packages and routines.

The data is collected from proprietary data dumps, parsed into MySQL (tier 0) and then, after processing, piped into Elasticsearch (tier 1).
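The two-tier flow described above can be illustrated with a minimal sketch. This is not the pipeline's actual code. It assumes a tiny made-up CSV in place of the proprietary dump, uses the standard-library sqlite3 module as a stand-in for MySQL, and only builds the documents that would be sent to Elasticsearch rather than connecting to a cluster. The table name, column names, index name and the functions collect and to_bulk_actions are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a slice of the data dump (invented rows, not real data).
DUMP = """id,name,country_code
org-1,Example Ltd,GBR
org-2,Sample Inc,USA
"""


def collect(conn, dump_text):
    """Tier 0: parse the dump and load it into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS organizations "
        "(id TEXT PRIMARY KEY, name TEXT, country_code TEXT)"
    )
    rows = [
        (r["id"], r["name"], r["country_code"])
        for r in csv.DictReader(io.StringIO(dump_text))
    ]
    # INSERT OR REPLACE keeps the load idempotent if the same dump is re-run.
    conn.executemany("INSERT OR REPLACE INTO organizations VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def to_bulk_actions(conn, index="companies_demo"):
    """Tier 1: read rows back and shape them as bulk-index actions."""
    cursor = conn.execute(
        "SELECT id, name, country_code FROM organizations ORDER BY id"
    )
    for id_, name, country_code in cursor:
        yield {
            "_index": index,  # assumed index name
            "_id": id_,
            "_source": {"name": name, "country_code": country_code},
        }


conn = sqlite3.connect(":memory:")  # in-memory database, assumed stand-in for MySQL
n_rows = collect(conn, DUMP)
actions = list(to_bulk_actions(conn))
print(n_rows, actions[0]["_id"])
```

Dictionaries of this shape (with _index, _id and _source keys) are what the bulk helper in the Python Elasticsearch client accepts, so in a real setting the generator could be handed to that helper. Whether the Crunchbase batchables do it this way is not stated here.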

run.py (crunchbase_collect)

Collect Crunchbase data from the proprietary data dump and pipe into the MySQL database.

run()

run.py (crunchbase_elasticsearch)

Pipe Crunchbase data from MySQL to Elasticsearch.

run()