Meetup (social networking data)

NB: The Meetup pipeline will not work until this issue has been resolved.

Data collection and processing pipeline for Meetup data, principally for the healthMosaic platform.

Collect data

Starting with a seed country (and Meetup category), extract all groups in that country, then find every group that any member of those original groups belongs to.
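The expansion described above can be sketched in plain Python. This is only an illustration of the idea, not the pipeline's implementation: the lookup tables and the function name expand_groups are hypothetical stand-ins for the Meetup API calls that the batch tasks make.

```python
# Hypothetical in-memory stand-ins for Meetup API responses (not real data).
GROUPS_BY_COUNTRY = {
    ("GB", "health"): ["g1", "g2"],
}
MEMBERS_BY_GROUP = {
    "g1": ["m1", "m2"],
    "g2": ["m2", "m3"],
    "g3": ["m3"],
}
GROUPS_BY_MEMBER = {
    "m1": ["g1"],
    "m2": ["g1", "g2", "g3"],
    "m3": ["g2", "g4"],
}


def expand_groups(iso2, category):
    """Return the seed groups plus every group their members belong to."""
    seed_groups = GROUPS_BY_COUNTRY.get((iso2, category), [])
    # Step 1: collect the members of each seed group.
    members = set()
    for group in seed_groups:
        members.update(MEMBERS_BY_GROUP.get(group, []))
    # Step 2: collect every group those members belong to.
    all_groups = set(seed_groups)
    for member in members:
        all_groups.update(GROUPS_BY_MEMBER.get(member, []))
    return sorted(all_groups)


print(expand_groups("GB", "health"))
```

In this toy data the seed groups g1 and g2 lead, via their members, to the additional groups g3 and g4.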

chunks(l, n)[source]

Yield successive n-sized chunks from l.
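A minimal sketch of a generator matching this description is shown below. It assumes l supports len() and slicing (for example a list); the actual source may differ in detail.

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    # Step through the sequence n items at a time; the last chunk may be shorter.
    for i in range(0, len(l), n):
        yield l[i:i + n]


group_ids = list(range(7))
print(list(chunks(group_ids, 3)))
```

A helper like this is typically used to split a long list of IDs into batches of bounded size.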

class CountryGroupsTask(*args, **kwargs)[source]

Bases: nesta.core.luigihacks.autobatch.AutoBatchTask

Extract all groups with corresponding category for this country.

Args:

iso2 = <luigi.parameter.Parameter object>
category = <luigi.parameter.Parameter object>
output()[source]

Points to the input database target

prepare()[source]

Prepare the batch job parameters

combine(job_params)[source]

Combine the outputs from the batch jobs
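The prepare/combine pair that recurs in the batch tasks follows a split-then-merge pattern. The sketch below illustrates that pattern only: ToyBatchTask, its run_all method and the job parameter keys are hypothetical, and it deliberately does not subclass or reproduce the AutoBatchTask API.

```python
class ToyBatchTask:
    """Hypothetical stand-in for the pattern; this is not AutoBatchTask."""

    def __init__(self, group_ids, batch_size):
        self.group_ids = group_ids
        self.batch_size = batch_size
        self.outputs = {}  # job_id -> output of that job

    def prepare(self):
        """Split the work into one parameter dict per batch job."""
        return [
            {"job_id": i // self.batch_size,
             "group_ids": self.group_ids[i:i + self.batch_size]}
            for i in range(0, len(self.group_ids), self.batch_size)
        ]

    def run_all(self, job_params):
        """Pretend to run each job remotely; here a job just doubles each ID."""
        for params in job_params:
            self.outputs[params["job_id"]] = [2 * g for g in params["group_ids"]]

    def combine(self, job_params):
        """Merge the per-job outputs into a single result, in job order."""
        combined = []
        for params in job_params:
            combined.extend(self.outputs[params["job_id"]])
        return combined


task = ToyBatchTask(group_ids=[1, 2, 3, 4, 5], batch_size=2)
job_params = task.prepare()
task.run_all(job_params)
print(len(job_params), task.combine(job_params))
```

The point of the split is that each parameter dict is small and independent, so the jobs can run in parallel before their outputs are merged.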

class GroupsMembersTask(*args, **kwargs)[source]

Bases: nesta.core.luigihacks.autobatch.AutoBatchTask

Parameters:
  • date (datetime) – Date used to label the outputs
  • batchable (str) – Path to the directory containing the run.py batchable
  • job_def (str) – Name of the AWS job definition
  • job_name (str) – Name given to this AWS batch job
  • job_queue (str) – AWS batch queue
  • region_name (str) – AWS region from which to batch
  • poll_time (int) – Time between querying the AWS batch job status
iso2 = <luigi.parameter.Parameter object>
category = <luigi.parameter.Parameter object>
requires()[source]

Gets the input data

output()[source]

Points to the DB target

prepare()[source]

Prepare the batch job parameters

combine(job_params)[source]

Combine the outputs from the batch jobs

class MembersGroupsTask(*args, **kwargs)[source]

Bases: nesta.core.luigihacks.autobatch.AutoBatchTask

Parameters:
  • date (datetime) – Date used to label the outputs
  • batchable (str) – Path to the directory containing the run.py batchable
  • job_def (str) – Name of the AWS job definition
  • job_name (str) – Name given to this AWS batch job
  • job_queue (str) – AWS batch queue
  • region_name (str) – AWS region from which to batch
  • poll_time (int) – Time between querying the AWS batch job status
iso2 = <luigi.parameter.Parameter object>
category = <luigi.parameter.Parameter object>
requires()[source]

Gets the input data

output()[source]

Points to the DB target

prepare()[source]

Prepare the batch job parameters

combine(job_params)[source]

Combine the outputs from the batch jobs

class GroupDetailsTask(*args, **kwargs)[source]

Bases: nesta.core.luigihacks.autobatch.AutoBatchTask

Collect the details of each group identified by the preceding batch task.

Parameters:
  • date (datetime) – Date used to label the outputs
iso2 = <luigi.parameter.Parameter object>
category = <luigi.parameter.Parameter object>
requires()[source]

Get the output from the batchtask

output()[source]

Points to the DB target

prepare()[source]

Prepare the batch job parameters

combine(job_params)[source]

Combine the outputs from the batch jobs

class RootTask(*args, **kwargs)[source]

Bases: luigi.task.WrapperTask

A dummy root task, which collects the database configurations and executes the central task.

Parameters:
  • date (datetime) – Date used to label the outputs
date = <luigi.parameter.DateParameter object>
iso2 = <luigi.parameter.Parameter object>
category = <luigi.parameter.Parameter object>
production = <luigi.parameter.BoolParameter object>
requires()[source]

Collects the database configurations and executes the central task.

Topic discovery

Task to automatically discover relevant topics from Meetup data, defined as the most frequently occurring topics within a set of categories.

class TopicDiscoveryTask(*args, **kwargs)[source]

Bases: luigi.task.Task

Task to automatically discover relevant topics from Meetup data, defined as the most frequently occurring topics within a set of categories.

Parameters:
  • db_config_env (str) – Environment variable pointing to the path of the DB config.
  • routine_id (str) – The routine UID.
  • core_categories (list) – A list of category_shortnames from which to identify topics.
  • members_perc (int) – A percentile to evaluate the minimum number of members.
  • topic_perc (int) – A percentile to evaluate the most frequent topics.
  • test (bool) – Test mode.
db_config_env = <luigi.parameter.Parameter object>
routine_id = <luigi.parameter.Parameter object>
core_categories = <luigi.parameter.ListParameter object>
members_perc = <luigi.parameter.IntParameter object>
topic_perc = <luigi.parameter.IntParameter object>
test = <luigi.parameter.BoolParameter object>
output()[source]

Points to the S3 Target

run()[source]

Extract the topics of interest
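One plausible reading of how members_perc and topic_perc interact is sketched below. This is an assumption, not the task's actual logic: the function names, the nearest-rank percentile, the example category name and the group fields members and topics are all hypothetical; only category_shortname comes from the parameter description above.

```python
import math
from collections import Counter


def percentile(values, perc):
    """Nearest-rank percentile of a non-empty list (perc in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(perc / 100 * len(ordered)))
    return ordered[rank - 1]


def discover_topics(groups, core_categories, members_perc, topic_perc):
    """Return the most frequent topics among the larger core-category groups."""
    # Keep only groups in the core categories.
    core = [g for g in groups if g["category_shortname"] in core_categories]
    if not core:
        return []
    # Drop groups whose membership falls below the members_perc percentile.
    min_members = percentile([g["members"] for g in core], members_perc)
    large = [g for g in core if g["members"] >= min_members]
    # Count topics over the remaining groups and keep the most frequent ones.
    counts = Counter(t for g in large for t in g["topics"])
    if not counts:
        return []
    min_count = percentile(list(counts.values()), topic_perc)
    return sorted(t for t, c in counts.items() if c >= min_count)


# Hypothetical example data.
groups = [
    {"category_shortname": "health-wellbeing", "members": 10, "topics": ["yoga", "running"]},
    {"category_shortname": "health-wellbeing", "members": 200, "topics": ["yoga", "meditation"]},
    {"category_shortname": "health-wellbeing", "members": 500, "topics": ["yoga", "running"]},
    {"category_shortname": "tech", "members": 900, "topics": ["python"]},
]
print(discover_topics(groups, ["health-wellbeing"], members_perc=50, topic_perc=90))
```

Under this reading, raising members_perc restricts the analysis to larger groups, while raising topic_perc keeps fewer, more common topics.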

Pipe to Elasticsearch

Luigi routine to load the Meetup Group data from MySQL into Elasticsearch.

class MeetupHealthSql2EsTask(*args, **kwargs)[source]

Bases: nesta.core.luigihacks.sql2estask.Sql2EsTask

Task to pipe Meetup data to Elasticsearch. For other arguments, see Sql2EsTask.

Parameters:
  • core_categories (list) – A list of category_shortnames from which to identify topics.
  • members_perc (int) – A percentile to evaluate the minimum number of members.
  • topic_perc (int) – A percentile to evaluate the most frequent topics.
core_categories = <luigi.parameter.ListParameter object>
members_perc = <luigi.parameter.IntParameter object>
topic_perc = <luigi.parameter.IntParameter object>
requires()[source]

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

class RootTask(*args, **kwargs)[source]

Bases: luigi.task.WrapperTask

production = <luigi.parameter.BoolParameter object>
date = <luigi.parameter.DateParameter object>
core_categories = <luigi.parameter.ListParameter object>
members_perc = <luigi.parameter.IntParameter object>
topic_perc = <luigi.parameter.IntParameter object>
db_config_env = <luigi.parameter.Parameter object>
drop_and_recreate = <luigi.parameter.BoolParameter object>
requires()[source]

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires

Novelty score (lolvelty)

Apply “lolvelty” score to Meetup data.

class MeetupLolveltyRootTask(*args, **kwargs)[source]

Bases: luigi.task.WrapperTask

Apply Lolvelty score to Meetup data.

Parameters:
  • production (bool) – Running in full production mode?
  • index (str) – Elasticsearch index to append Lolvelty score to.
  • date (datetime) – Date for timestamping this routine.
production = <luigi.parameter.BoolParameter object>
index = <luigi.parameter.Parameter object>
date = <luigi.parameter.DateParameter object>
requires()[source]

The Tasks that this Task depends on.

A Task will only run if all of the Tasks that it requires are completed. If your Task does not require any other Tasks, then you don’t need to override this method. Otherwise, a subclass can override this method to return a single Task, a list of Task instances, or a dict whose values are Task instances.

See Task.requires