ORMs

SQLAlchemy ORMs for the routines, which allow easy integration with testing (including automatic setup of test databases and tables).
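As a rough illustration of the automatic test setup mentioned above, the sketch below creates every table registered on a declarative base in a throwaway in-memory SQLite database. The `ExampleGroup` class, its table name, its column types and the `make_test_session` helper are hypothetical stand-ins for this example, not the definitions used by the routines.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class ExampleGroup(Base):
    """Hypothetical ORM used only for this sketch."""
    __tablename__ = "example_groups"  # assumed table name
    id = Column(Integer, primary_key=True, autoincrement=False)
    urlname = Column(String(200))  # assumed column type


def make_test_session():
    """Create an in-memory database containing all tables; return a session."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)  # creates every table known to Base
    return sessionmaker(bind=engine)()


session = make_test_session()
session.add(ExampleGroup(id=1, urlname="example-group"))
session.commit()
print(session.query(ExampleGroup).count())
```

Because the tables are derived from the ORM classes themselves, a test can build a fresh schema without any separate SQL scripts.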

Meetup

class Group(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

id

urlname

category_name

category_shortname

city

country

created

description

lat

lon

members

name

topics

category_id

country_name

timestamp

class GroupMember(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Note: there is no foreign key constraint on the group, since unknown groups will be found in the member expansion phase.

group_id

group_urlname

member_id
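The note on the missing foreign key can be illustrated with a minimal sketch. It assumes hypothetical table names, column types and a composite primary key on (`group_id`, `member_id`); none of these are taken from the real `Group` and `GroupMember` definitions.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class GroupSketch(Base):
    """Hypothetical stand-in for Group."""
    __tablename__ = "sketch_groups"  # assumed table name
    id = Column(Integer, primary_key=True, autoincrement=False)
    urlname = Column(String(200))


class GroupMemberSketch(Base):
    """Hypothetical stand-in for GroupMember."""
    __tablename__ = "sketch_groups_members"  # assumed table name
    # Plain columns: deliberately no ForeignKey pointing at sketch_groups.id
    group_id = Column(Integer, primary_key=True, autoincrement=False)
    group_urlname = Column(String(200))
    member_id = Column(Integer, primary_key=True, autoincrement=False)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Group 999 has not been collected yet, but the membership row is still accepted.
session.add(GroupMemberSketch(group_id=999, group_urlname="not-yet-seen",
                              member_id=42))
session.commit()

# Groups that members refer to but that are missing from the group table.
known_ids = {group.id for group in session.query(GroupSketch)}
member_group_ids = {member.group_id for member in session.query(GroupMemberSketch)}
unknown_ids = sorted(member_group_ids - known_ids)
print(unknown_ids)
```

With a declared foreign key, a backend that enforces constraints would reject the membership row until the group itself had been stored, which is why the constraint is left out here.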

NIH schema

The schema for the World RePORTER data.

getattr_(entity, attribute)

If the entity is a list, unpack the attribute from every item in it; otherwise return the attribute from the entity itself. Returns None if the entity is either None or empty.
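The behaviour described for `getattr_` could be sketched as follows. This is a hypothetical re-implementation based only on the description above, not the actual source; the name `getattr_sketch` is made up, and "empty" is assumed to mean an empty list.

```python
from types import SimpleNamespace


def getattr_sketch(entity, attribute):
    """Hypothetical re-implementation of the behaviour described for getattr_."""
    # Assumption: "empty" means an empty list.
    if entity is None or (isinstance(entity, list) and not entity):
        return None
    if isinstance(entity, list):
        # Unpack the attribute from every item in the list.
        return [getattr(item, attribute) for item in entity]
    return getattr(entity, attribute)


single = SimpleNamespace(pmid=1)
many = [SimpleNamespace(pmid=1), SimpleNamespace(pmid=2)]

print(getattr_sketch(single, "pmid"))
print(getattr_sketch(many, "pmid"))
print(getattr_sketch([], "pmid"))
```

A helper of this shape is convenient when a relationship may hold either a single related row or a list of them.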

class Projects(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

application_id

activity

administering_ic

application_type

arra_funded

award_notice_date

base_core_project_num

budget_start

budget_end

cfda_code

core_project_num

ed_inst_type

foa_number

full_project_num

funding_ics

funding_mechanism

fy

ic_name

org_city

org_country

org_dept

org_district

org_duns

org_fips

org_ipf_code

org_name

org_state

org_zipcode

phr

pi_ids

pi_names

program_officer_name

project_start

project_end

project_terms

project_title

serial_number

study_section

study_section_name

suffix

support_year

direct_cost_amt

indirect_cost_amt

total_cost

subproject_id

total_cost_sub_project

nih_spending_cats

abstract

publications

patents

clinicalstudies

abstract_text

patent_ids

patent_titles

pmids

clinicaltrial_ids

clinicaltrial_titles

class Abstracts(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

application_id

abstract_text

class Publications(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

pmid

author_name

affiliation

author_list

country

issn

journal_issue

journal_title

journal_title_abbr

journal_volume

lang

page_number

pub_date

pub_title

pub_year

pmc_id

class Patents(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

patent_id

patent_title

project_id

patent_org_name

class LinkTables(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

pmid

project_number

class ClinicalStudies(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

clinicaltrials_gov_id

core_project_number

study

study_status

class PhrVector(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Document vectors for NIH Public Health Relevance (PHR) statements.

application_id

vector

class AbstractVector(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Document vectors for NIH abstracts.

application_id

vector

class TextDuplicate(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Link table describing NIH text-field duplicates, which probably imply that projects are related, either formally (if weight > 0.8, they are normally almost exact duplicates of each other) or contextually (if weight > 0.5, they are normally in the same general subject area).

The cut-off for inclusion in this table is a weight of 0.5, because the core interest in using this method is to identify texts which are near duplicates. Texts which are contextually similar can also be found by other metrics (topic modelling, etc.), and using BERT for this can have some weird side-effects, e.g. finding texts with a similar writing style rather than a similar topic.

application_id_1

application_id_2

text_field

weight
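The two weight thresholds described for `TextDuplicate` can be read as a simple classification rule. The sketch below is only an illustration: the function name, the labels and the handling of weights exactly at 0.5 or 0.8 are assumptions, not part of the schema.

```python
INCLUSION_CUTOFF = 0.5  # pairs at or below this weight are assumed not to be stored
NEAR_DUPLICATE = 0.8    # above this, texts are normally almost exact duplicates


def describe_weight(weight):
    """Return a rough label for a similarity weight, or None if below the cut-off."""
    if weight > NEAR_DUPLICATE:
        return "near-duplicate"
    if weight > INCLUSION_CUTOFF:
        return "contextually similar"
    return None


for weight in (0.95, 0.65, 0.3):
    print(weight, describe_weight(weight))
```

Since the labels are only typical interpretations ("normally"), a rule like this is best treated as a first filter rather than a definitive judgement on whether two projects are related.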