Health data

Initially developed for our project with the Robert Wood Johnson Foundation (RWJF), these procedures outline the collection of health-specific data.

Collect NIH

Extract all of the NIH World RePORTER data via their static data dump. N_TABS outputs are produced in CSV format (concatenated across all years), where N_TABS corresponds to the number of tabs in the main table found at:

https://exporter.nih.gov/ExPORTER_Catalog.aspx

The data is transferred to the Nesta intermediate data bucket.

get_data_urls(tab_index)

Get all CSV URLs from the tab_index-th tab of the main table found at TOP_URL.

Parameters:	tab_index (int) – Tab number (0-indexed) of table to extract CSV URLs from.
Returns:	title (str) – Title of the tab in the table. hrefs (list) – List of URLs pointing to data CSVs.
Return type:	tuple of (str, list)
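
The link-collection step can be illustrated with a minimal sketch. This is not the module's actual implementation: the class CsvLinkParser, the function extract_csv_links, the sample HTML and the rule that a CSV link contains "csv" in its href are all assumptions made for illustration, and tab selection is left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkParser(HTMLParser):
    """Hypothetical helper: collect hrefs that look like CSV downloads."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: a CSV download can be recognised by "csv" in its link.
        if href and "csv" in href.lower():
            # Resolve relative links against the page URL.
            self.hrefs.append(urljoin(self.base_url, href))


def extract_csv_links(html, base_url):
    """Return absolute URLs of all CSV-like links found in an HTML string."""
    parser = CsvLinkParser(base_url)
    parser.feed(html)
    return parser.hrefs


# Made-up fragment standing in for one tab of the main table.
SAMPLE_HTML = """
<table>
  <tr><td><a href="CSVs/final/RePORTER_PRJ_C_FY2018.zip">CSV</a></td></tr>
  <tr><td><a href="XMLData/final/RePORTER_PRJ_X_FY2018.zip">XML</a></td></tr>
</table>
"""
links = extract_csv_links(
    SAMPLE_HTML, "https://exporter.nih.gov/ExPORTER_Catalog.aspx"
)
```

Only the CSV link survives the filter, resolved to an absolute URL; the XML link is dropped.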

iterrows(url)

Yield rows from the zipped CSV (found at URL url) as dict objects.

Parameters:	url (str) – The URL at which a zipped-up CSV is found.
Yields:	`dict` object, representing one row of the CSV.
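
As a rough sketch of how rows might be streamed out of a zipped CSV, the example below uses only the standard library. The names iter_zipped_csv_rows and iter_rows_from_url are hypothetical, as are the UTF-8 default encoding and the choice to read every member of the archive; the real iterrows may differ on all of these.

```python
import csv
import io
import zipfile
from urllib.request import urlopen


def iter_zipped_csv_rows(zip_bytes, encoding="utf-8"):
    """Yield one dict per CSV row for every file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            with archive.open(name) as raw:
                # Decode the binary member so csv can read it as text.
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                for row in csv.DictReader(text):
                    yield dict(row)


def iter_rows_from_url(url):
    """Download a zipped CSV and yield its rows (not called in this demo)."""
    with urlopen(url) as response:
        yield from iter_zipped_csv_rows(response.read())


# Offline demonstration: build a small zip in memory with made-up rows.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("projects.csv", "id,city\n1,Boston\n2,London\n")

rows = list(iter_zipped_csv_rows(buffer.getvalue()))
```

Because the function is a generator, a caller can process a large dump row by row without holding the whole table in memory.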

Process NIH

Data cleaning and processing procedures for the NIH World RePORTER data. Specifically, a lat/lon is generated for each city/country, and the formatting of date fields is unified.
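
The date-unification step might look something like the sketch below. The function unify_date, the list of candidate input formats and the choice of ISO 8601 as the target format are assumptions for illustration; the formats actually present in the NIH data are not stated here. The lat/lon step is not covered.

```python
from datetime import datetime

# Assumption: these are the input formats to try, in order.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")


def unify_date(value):
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed."""
    if value is None or not value.strip():
        return None
    cleaned = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


# Made-up inputs showing the same date in three formats, plus bad values.
examples = ["09/30/2018", "2018-09-30", " 2018/09/30 ", "", "not a date"]
unified = [unify_date(v) for v in examples]
```

Returning None for unparseable values, rather than raising, keeps a single malformed field from halting a whole batch; whether that suits a given pipeline is a design choice.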