Health data

These procedures outline the collection of health-specific data, initially for our project with the Robert Wood Johnson Foundation (RWJF).

Collect NIH

Extract all of the NIH World RePORTER data via its static data dump. N_TABS outputs are produced in CSV format (concatenated across all years), where N_TABS corresponds to the number of tabs in the main table found at:

The data is transferred to the Nesta intermediate data bucket.

get_data_urls(tab_index)

Get all CSV URLs from the tab_index-th tab of the main table found at TOP_URL.

Parameters: tab_index (int) – Tab number (0-indexed) of table to extract CSV URLs from.
Returns: title (str) – Title of the tab in the table.
         hrefs (list) – List of URLs pointing to data CSVs.
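
The link-gathering step can be illustrated with the standard library alone. This is a minimal sketch and not the project's implementation: the class name ZipLinkCollector, the sample HTML, the example.org base URL and the rule of keeping only links ending in .zip are all assumptions made for illustration. Selecting a particular tab and reading its title depend on the real page layout, so they are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkCollector(HTMLParser):
    """Collect absolute URLs of all <a href="..."> links ending in .zip."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: data files are linked as zipped CSVs.
        if href and href.lower().endswith(".zip"):
            self.hrefs.append(urljoin(self.base_url, href))


# Hypothetical page fragment standing in for one tab of the table.
SAMPLE_HTML = """
<table>
  <tr><td><a href="csv/projects_2017.zip">2017</a></td></tr>
  <tr><td><a href="csv/projects_2018.zip">2018</a></td></tr>
  <tr><td><a href="/about.html">About</a></td></tr>
</table>
"""

collector = ZipLinkCollector("https://example.org/data/index.html")
collector.feed(SAMPLE_HTML)
hrefs = collector.hrefs
```

Resolving each link against the page URL means the returned list can be passed directly to a downloader, whichever directory the page itself lives in.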

iterrows(url)

Yield rows from the CSV found at url, each as a dict object.

Parameters: url (str) – The URL at which a zipped-up CSV is found.
Yields: dict object representing one row of the CSV.
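
The row-yielding pattern might look roughly like the following. This is a sketch under assumptions rather than the module's actual code: the download is left out so that the example runs offline, the function name iter_csv_rows_from_zip is hypothetical, and the column names PROJECT_ID and ORG_CITY are invented sample data.

```python
import csv
import io
import zipfile


def iter_csv_rows_from_zip(zip_bytes):
    """Yield each row of every CSV inside a zip archive as a dict."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip any non-CSV members
            with archive.open(name) as raw:
                # Decode the binary stream lazily so rows are read one at a time.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    yield dict(row)


def make_example_zip():
    """Build a small in-memory zip holding one CSV (invented sample data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("projects.csv", "PROJECT_ID,ORG_CITY\n1,BOSTON\n2,LONDON\n")
    return buffer.getvalue()


rows = list(iter_csv_rows_from_zip(make_example_zip()))
```

In practice the bytes would come from an HTTP download of url. Because the function is a generator, each row is decoded only when the consumer asks for it.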

Process NIH

Data cleaning and processing procedures for the NIH World RePORTER data. Specifically, a lat/lon is generated for each city/country, and the formatting of date fields is unified.
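
As an illustration of the date-unification step, one simple approach is to try a short list of known formats and emit ISO 8601 dates. This is a hedged sketch only: the function name unify_date and the candidate formats are assumptions, and the formats that actually occur in the NIH data may differ.

```python
from datetime import datetime

# Assumed input formats; extend to match whatever the raw data contains.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S")


def unify_date(value):
    """Return value as an ISO 8601 date string, or None if unparseable."""
    if value is None or not value.strip():
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


examples = [unify_date(v) for v in ("09/30/2017", "2017-09-30T00:00:00", "n/a")]
```

Returning None for unparseable values keeps bad inputs visible downstream instead of silently coercing them to a wrong date.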