Health data
Originally developed for our project with the Robert Wood Johnson Foundation (RWJF), these procedures describe the collection of health-specific data.
Collect NIH
Extract all of the NIH World RePORTER data via their static data dump. N_TABS outputs are produced in CSV format (concatenated across all years), where N_TABS corresponds to the number of tabs in the main table found at TOP_URL.
The data is transferred to the Nesta intermediate data bucket.
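The per-tab concatenation across years could look roughly like the following minimal sketch. It assumes each year's file has already been downloaded and decoded to text, and that all years within one tab share an identical header row. `concatenate_csvs` is a hypothetical helper for illustration, not the project's own code.

```python
import csv
import io
from typing import Iterable


def concatenate_csvs(csv_texts: Iterable[str]) -> str:
    """Concatenate several CSV documents (e.g. one per year) into one.

    ASSUMPTION: every non-empty input has the same header row. The header
    is written once and data rows are appended in input order.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for text in csv_texts:
        rows = csv.reader(io.StringIO(text))
        this_header = next(rows, None)
        if this_header is None:
            continue  # skip empty files
        if header is None:
            header = this_header
            writer.writerow(header)
        elif this_header != header:
            raise ValueError(f"Header mismatch: {this_header} != {header}")
        writer.writerows(rows)  # remaining rows are data rows
    return out.getvalue()


# Usage: two hypothetical yearly extracts from the same tab.
combined = concatenate_csvs(["id,title\n1,Alpha\n", "id,title\n2,Beta\n"])
```

Raising on a header mismatch is a deliberate choice in this sketch: if real yearly files add or drop columns, an outer join of columns (as a DataFrame library would do) may be more appropriate.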
get_data_urls(tab_index)
Get all CSV URLs from the tab_index-th tab of the main table found at TOP_URL.
Parameters: tab_index (int) – Tab number (0-indexed) of the table to extract CSV URLs from.
Returns: title (str) – Title of the tab in the table; hrefs (list) – List of URLs pointing to data CSVs.
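To illustrate what get_data_urls returns, here is a minimal standard-library sketch. It is not the project's implementation: it takes the page HTML as a string instead of fetching TOP_URL, and it assumes a HYPOTHETICAL markup in which each tab is a `<div class="tab" data-title="...">` containing `<a href="....csv">` links. The real RePORTER page structure may differ, and `get_data_urls_from_html` is an illustrative name.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _TabLinkParser(HTMLParser):
    """Collect (title, csv_hrefs) for each tab container on the page.

    HYPOTHETICAL markup: <div class="tab" data-title="..."> ... </div>
    with <a href="...csv"> links anywhere inside it.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tabs: list[tuple[str, list[str]]] = []
        self._depth = 0  # <div> nesting depth inside the current tab (0 = outside)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth:
                self._depth += 1  # nested div inside a tab
            elif "tab" in (attrs.get("class") or "").split():
                self.tabs.append((attrs.get("data-title") or "", []))
                self._depth = 1
        elif tag == "a" and self._depth:
            href = attrs.get("href") or ""
            if href.lower().endswith(".csv"):  # keep only CSV links
                self.tabs[-1][1].append(href)

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


def get_data_urls_from_html(html: str, tab_index: int) -> tuple[str, list[str]]:
    """Return (title, hrefs) for the tab_index-th (0-indexed) tab."""
    parser = _TabLinkParser()
    parser.feed(html)
    parser.close()
    title, hrefs = parser.tabs[tab_index]
    return title, hrefs
```

Separating parsing from fetching keeps the logic testable offline; a caller would download the page once and then request each tab index in turn.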
Process NIH
Data cleaning and processing procedures for the NIH World RePORTER data. Specifically, a latitude/longitude is generated for each city/country, and the formatting of date fields is unified.
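Both processing steps can be sketched as follows, under stated assumptions. The accepted date formats in `DATE_FORMATS` are ASSUMED (US month/day order is tried first) and may not match those actually present in the data. Geocoding is delegated to a caller-supplied `geocode(city, country)` function, a HYPOTHETICAL stand-in for whatever geocoding service the pipeline uses. Each unique, normalised (city, country) pair is geocoded only once.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

LatLon = Tuple[float, float]

# ASSUMED input formats; extend to match the formats found in the raw data.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def unify_date(raw: Optional[str]) -> Optional[str]:
    """Return an ISO 8601 date (YYYY-MM-DD), or None if empty/unparseable."""
    if not raw or not raw.strip():
        return None
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def add_lat_lon(
    rows: List[dict],
    geocode: Callable[[str, str], Optional[LatLon]],  # HYPOTHETICAL geocoder
) -> List[dict]:
    """Attach 'lat'/'lon' to each row, geocoding each unique city/country once."""
    cache: Dict[Tuple[str, str], Optional[LatLon]] = {}
    for row in rows:
        key = (
            (row.get("city") or "").strip().lower(),
            (row.get("country") or "").strip().lower(),
        )
        if key not in cache:
            cache[key] = geocode(*key)
        coords = cache[key]
        row["lat"], row["lon"] = coords if coords else (None, None)
    return rows
```

Caching on the normalised pair matters because grant records repeat the same institutions many times, so the number of geocoder calls scales with distinct locations rather than with rows.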