Meetup (social networking data)

NB: The Meetup pipeline will not work until this issue has been resolved.

Pipeline for collecting and processing Meetup data, principally for the healthMosaic platform.

Topic discovery

Task to automatically discover relevant topics from Meetup data, where relevant topics are those occurring most frequently within a set of categories.

class TopicDiscoveryTask(*args, **kwargs)

Bases: luigi.task.Task

Task to automatically discover relevant topics from Meetup data, where relevant topics are those occurring most frequently within a set of categories.

Parameters:
  • db_config_env (str) – Environment variable pointing to the path of the DB config.
  • routine_id (str) – The routine UID.
  • core_categories (list) – A list of category_shortnames from which to identify topics.
  • members_perc (int) – Percentile used to determine the minimum number of members.
  • topic_perc (int) – Percentile used to determine the most frequent topics.
  • test (bool) – Test mode.
db_config_env = <luigi.parameter.Parameter object>
routine_id = <luigi.parameter.Parameter object>
core_categories = <luigi.parameter.ListParameter object>
members_perc = <luigi.parameter.IntParameter object>
topic_perc = <luigi.parameter.IntParameter object>
test = <luigi.parameter.BoolParameter object>
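The db_config_env parameter names an environment variable rather than a file path. The sketch below shows one way that indirection could be resolved, assuming the DB config is an INI-style file. The helper name load_db_config, the file format and any section names are assumptions made for illustration, not the task's confirmed behaviour.

```python
import configparser
import os


def load_db_config(db_config_env):
    """Read the DB config file whose path is stored in an environment variable.

    Hypothetical helper: assumes an INI-style config file.
    """
    path = os.environ.get(db_config_env)
    if path is None:
        raise KeyError(f"Environment variable {db_config_env!r} is not set")
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it parsed successfully.
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read DB config at {path!r}")
    return parser
```

Passing the variable name instead of the path keeps machine-specific locations and credentials out of the task parameters.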
output()

Points to the S3 target.

run()

Extracts the topics of interest.
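The selection logic described for this task can be illustrated with a minimal, self-contained sketch. It is not the task's actual implementation: the record fields members and topics, the nearest-rank percentile, and the function names are assumptions. Only category_shortname, members_perc and topic_perc are taken from the parameter list above.

```python
import math
from collections import Counter


def percentile(values, perc):
    """Nearest-rank percentile of a non-empty list (perc in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(perc / 100 * len(ordered)))
    return ordered[rank - 1]


def discover_topics(groups, core_categories, members_perc, topic_perc):
    """Return the most frequent topics among large groups in the core categories.

    Hypothetical sketch: each group is a dict with the assumed keys
    "category_shortname", "members" and "topics".
    """
    # Keep only groups that belong to one of the core categories.
    core = [g for g in groups if g["category_shortname"] in core_categories]
    if not core:
        return []
    # Minimum group size: the members_perc-th percentile of member counts.
    min_members = percentile([g["members"] for g in core], members_perc)
    large = [g for g in core if g["members"] >= min_members]
    # Count how often each topic occurs among the remaining groups.
    counts = Counter(t for g in large for t in g["topics"])
    # Keep topics whose frequency reaches the topic_perc-th percentile.
    min_count = percentile(list(counts.values()), topic_perc)
    return sorted(t for t, n in counts.items() if n >= min_count)
```

Under these assumptions, raising topic_perc narrows the result to the most common topics, while raising members_perc restricts the count to larger groups.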