ORMs

SQLAlchemy ORMs for the routines, which allow easy integration of testing (including automatic setup of test databases and tables).
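As a rough illustration of the automatic test setup mentioned above, the sketch below creates every table registered on a declarative base in a throwaway in-memory SQLite database. The names DemoGroup, demo_groups and make_test_session are hypothetical, and the in-memory SQLite backend is an assumption; the project's own test fixtures may work differently.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class DemoGroup(Base):
    """Hypothetical stand-in for an ORM class (not the real Group model)."""
    __tablename__ = "demo_groups"

    id = Column(Integer, primary_key=True)
    urlname = Column(String(200))


def make_test_session():
    """Create a fresh in-memory database, build all tables, return a session."""
    engine = create_engine("sqlite://")  # assumption: in-memory SQLite for tests
    Base.metadata.create_all(engine)     # creates every table known to Base
    return sessionmaker(bind=engine)()


session = make_test_session()
session.add(DemoGroup(id=1, urlname="demo-group"))
session.commit()
n_rows = session.query(DemoGroup).count()
session.close()
```

Because the tables are derived from the ORM classes, a test only needs a connection string to get a schema that matches production.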

Meetup

class Group(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

id
urlname
category_name
category_shortname
city
country
created
description
lat
lon
members
name
topics
category_id
country_name
timestamp

class GroupMember(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Note: there is no foreign key constraint, since unknown groups will be found in the member expansion phase.

group_id
group_urlname
member_id

NIH schema

The schema for the World RePORTER data.

getattr_(entity, attribute)

If the entity is a list, unpack the attribute from every item in it; otherwise return the attribute from the entity itself. Returns None if the entity is None or empty.
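The behaviour described above can be sketched as follows. This is a hypothetical re-implementation under the name getattr_sketch, written only from the description; the real getattr_ may differ in detail (for example in what it treats as "empty").

```python
from types import SimpleNamespace


def getattr_sketch(entity, attribute):
    """Hypothetical sketch of the documented behaviour of getattr_."""
    # None or an empty list yields None.
    if entity is None or (isinstance(entity, list) and len(entity) == 0):
        return None
    # A list is unpacked item by item.
    if isinstance(entity, list):
        return [getattr(item, attribute) for item in entity]
    # Anything else is treated as a single object.
    return getattr(entity, attribute)


# Illustrative objects standing in for related ORM rows.
publications = [SimpleNamespace(pmid=101), SimpleNamespace(pmid=202)]
abstract = SimpleNamespace(abstract_text="An example abstract.")

pmids = getattr_sketch(publications, "pmid")
text = getattr_sketch(abstract, "abstract_text")
missing = getattr_sketch(None, "abstract_text")
```

A helper of this shape lets one-to-one and one-to-many relationships be flattened with the same call.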

class Projects(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

application_id
activity
administering_ic
application_type
arra_funded
award_notice_date
base_core_project_num
budget_start
budget_end
cfda_code
core_project_num
ed_inst_type
foa_number
full_project_num
funding_ics
funding_mechanism
fy
ic_name
org_city
org_country
org_dept
org_district
org_duns
org_fips
org_ipf_code
org_name
org_state
org_zipcode
phr
pi_ids
pi_names
program_officer_name
project_start
project_end
project_terms
project_title
serial_number
study_section
study_section_name
suffix
support_year
direct_cost_amt
indirect_cost_amt
total_cost
subproject_id
total_cost_sub_project
nih_spending_cats
abstract
publications
patents
clinicalstudies
abstract_text
patent_ids
patent_titles
pmids
clinicaltrial_ids
clinicaltrial_titles

class Abstracts(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

application_id
abstract_text

class Publications(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

pmid
author_name
affiliation
author_list
country
issn
journal_issue
journal_title
journal_title_abbr
journal_volume
lang
page_number
pub_date
pub_title
pub_year
pmc_id

class Patents(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

patent_id
patent_title
project_id
patent_org_name

class LinkTables(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

pmid
project_number

class ClinicalStudies(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

clinicaltrials_gov_id
core_project_number
study
study_status

class PhrVector(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Document vectors for NIH Public Health Relevance (PHR) statements.

application_id
vector

class AbstractVector(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Document vectors for NIH abstracts.

application_id
vector

class TextDuplicate(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Link table describing NIH text-field duplicates, which probably imply that projects are related, either formally (if weight > 0.8 they are normally almost exact duplicates of each other) or contextually (if weight > 0.5 they are normally in the same general subject area).

The cut-off for inclusion in this table is a weight of 0.5, because the core interest in using this method is to identify texts which are near duplicates. Texts which are contextually similar can also be found by other metrics (topic modelling, etc.), and using BERT for this can have some weird side-effects, e.g. finding texts with a similar writing style rather than a similar topic.

application_id_1
application_id_2
text_field
weight
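The weight thresholds described above can be read as a simple classification rule. The sketch below is illustrative only: the function name describe_duplicate and the label strings are hypothetical, and treating both thresholds as strict inequalities is an assumption taken from the "weight > 0.8" and "weight > 0.5" wording.

```python
NEAR_DUPLICATE_WEIGHT = 0.8  # above this, texts are normally almost exact duplicates
INCLUSION_WEIGHT = 0.5       # cut-off for a pair to be stored at all


def describe_duplicate(weight):
    """Return a label for a pair of texts, or None if the pair falls below the cut-off."""
    if weight > NEAR_DUPLICATE_WEIGHT:
        return "near-duplicate"
    if weight > INCLUSION_WEIGHT:
        return "same subject area"
    return None  # assumption: such pairs are not stored in the table


labels = [describe_duplicate(w) for w in (0.95, 0.65, 0.30)]
```

Keeping the cut-offs as named constants makes it easy to tighten the rule if only near duplicates are of interest.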