Routines

All of our pipelines are implemented as Luigi routines. Some of these pipelines rely, at least in part, on batch computing via AWS Batch; the ‘batched’ scripts (run.py modules) are described in core.batchables. Apart from luigihacks.autobatch, which has its own documentation, the routines follow the procedure described in the core Luigi documentation.

  • Examples
    • S3 Example
    • Database example
    • Batch Example
  • arXiv data (technical research)
    • Root task (arXlive)
    • Collection task
    • Date task
    • arXiv enriched with MAG (API)
    • arXiv enriched with MAG (SPARQL)
    • arXiv enriched with GRID
    • [AutoML] Topic modelling (CorEx)
    • Pipe to Elasticsearch
    • Elasticsearch tokenize
    • Estimate novelty (lolvelty)
    • “Deep learning, Deep Change” analysis
  • CORDIS (EU funded research)
    • H2020 and FP7 Data Collection
  • Crunchbase (private sector companies)
    • Root task (HealthMosaic)
    • Get organisations
    • Non-organisation collection
    • Geocoding
    • Organisation health labeling
    • Merge in parent organisations
    • Apply MeSH terms
    • Pipe data to Elasticsearch
    • Novelty score (lolvelty)
  • EURITO (piping data to Elasticsearch)
    • Preprocess PATSTAT data
    • Root Task (EURITO)
  • Gateway to Research (UK publicly funded research)
    • Root Task (generic)
    • Data collection
    • Geocode
  • NiH (health research)
    • Root Task (HealthMosaic)
    • Data collection
    • Pipe to Elasticsearch
    • Assign MeSH terms to abstracts
    • Deduplication of near duplicates
  • Meetup (social networking data)
    • Collect data
    • Topic discovery
    • Pipe to Elasticsearch
    • Novelty score (lolvelty)

© Copyright 2018, nesta Revision 8bb8d8b5.
