ORMs
SQLAlchemy ORMs for the routines, which allow easy integration of testing (including automatic setup of test databases and tables).
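As a minimal sketch of what automatic setup of a test database and its tables can look like, the snippet below creates every table registered on a declarative base inside an in-memory SQLite database. The model `DemoProject`, the table name `demo_projects` and the helper `make_test_session` are hypothetical and are not part of this package; only the SQLAlchemy calls themselves are real.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class DemoProject(Base):
    """Hypothetical stand-in for an ORM class such as Projects."""
    __tablename__ = "demo_projects"
    application_id = Column(Integer, primary_key=True, autoincrement=False)
    project_title = Column(String(200))


def make_test_session():
    """Create a throwaway in-memory database with all tables on Base."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)  # emits CREATE TABLE for each model
    return sessionmaker(bind=engine)()


session = make_test_session()
session.add(DemoProject(application_id=1, project_title="Example"))
session.commit()
titles = [row.project_title for row in session.query(DemoProject).all()]
```

Because the tables are derived from the ORM classes, a test never needs a hand-written schema file; the database disappears when the process exits.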
Meetup
NIH schema
The schema for the World RePORTER data.
getattr_(entity, attribute)
If the entity is a list, unpack the attribute from every item in it; otherwise return the attribute from the entity itself. Returns None if the entity is either None or empty.
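The behaviour described for getattr_ can be illustrated with a short sketch. This is not the package's implementation, only one reading of the description above; the name `getattr_sketch` and the example objects are hypothetical.

```python
from types import SimpleNamespace


def getattr_sketch(entity, attribute):
    """Illustrative re-statement of the documented getattr_ behaviour."""
    if not entity:  # covers None and empty containers (assumed meaning of "empty")
        return None
    if isinstance(entity, list):
        # Unpack the attribute from every item in the list
        return [getattr(item, attribute) for item in entity]
    # Single entity: return its attribute directly
    return getattr(entity, attribute)


single = SimpleNamespace(pmid=101)
many = [SimpleNamespace(pmid=101), SimpleNamespace(pmid=202)]

from_single = getattr_sketch(single, "pmid")
from_list = getattr_sketch(many, "pmid")
from_none = getattr_sketch(None, "pmid")
from_empty = getattr_sketch([], "pmid")
```

A helper of this shape lets calling code treat one-to-one and one-to-many relationships uniformly without checking for None first.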
class Projects(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
- application_id
- activity
- administering_ic
- application_type
- arra_funded
- award_notice_date
- base_core_project_num
- budget_start
- budget_end
- cfda_code
- core_project_num
- ed_inst_type
- foa_number
- full_project_num
- funding_ics
- funding_mechanism
- fy
- ic_name
- org_city
- org_country
- org_dept
- org_district
- org_duns
- org_fips
- org_ipf_code
- org_name
- org_state
- org_zipcode
- phr
- pi_ids
- pi_names
- program_officer_name
- project_start
- project_end
- project_terms
- project_title
- serial_number
- study_section
- study_section_name
- suffix
- support_year
- direct_cost_amt
- indirect_cost_amt
- total_cost
- subproject_id
- total_cost_sub_project
- nih_spending_cats
- abstract
- publications
- patents
- clinicalstudies
- abstract_text
- patent_ids
- patent_titles
- pmids
- clinicaltrial_ids
- clinicaltrial_titles
class Abstracts(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
- application_id
- abstract_text
class Publications(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
- pmid
- affiliation
- country
- issn
- journal_issue
- journal_title
- journal_title_abbr
- journal_volume
- lang
- page_number
- pub_date
- pub_title
- pub_year
- pmc_id
class Patents(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
- patent_id
- patent_title
- project_id
- patent_org_name
class LinkTables(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
- pmid
- project_number
class ClinicalStudies(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
- clinicaltrials_gov_id
- core_project_number
- study
- study_status
class PhrVector(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
Document vectors for NIH Public Health Relevance (PHR) statements.
- application_id
- vector
class AbstractVector(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
Document vectors for NIH abstracts.
- application_id
- vector
class TextDuplicate(**kwargs)
Bases: sqlalchemy.ext.declarative.api.Base
Link table describing NIH text-field duplicates, which probably imply that projects are related, either formally (if weight > 0.8, the texts are normally almost exact duplicates of each other) or contextually (if weight > 0.5, they are normally in the same general subject area).
The cut-off for inclusion in this table is a weight of 0.5, because the core interest in using this method is to identify near-duplicate texts: contextually similar texts can also be found by other metrics (topic modelling, etc.), and using BERT for this can have odd side effects, e.g. finding texts with a similar writing style rather than a similar topic.
- application_id_1
- application_id_2
- text_field
- weight
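The weight bands described for TextDuplicate can be summarised in a small helper. This is only a sketch: the function name `describe_weight` and the returned labels are hypothetical, and treating 0.8 and 0.5 as strict lower bounds is an assumption taken from the "weight > 0.8" and "weight > 0.5" wording above.

```python
def describe_weight(weight):
    """Map a similarity weight to the relationship it normally implies."""
    if weight > 0.8:
        return "near-duplicate"      # formally related, almost exact copies
    if weight > 0.5:
        return "same subject area"   # contextually related
    return None                      # below the cut-off for this table


labels = [describe_weight(w) for w in (0.95, 0.65, 0.5, 0.2)]
```

Pairs for which the helper returns None would, under this reading, not appear in the table at all.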