Establish a connection specifying the footprint database of your interest to query.

library(RPostgreSQL)
## Loading required package: DBI
database.host <- "bddsrds.globusgenomics.org"
dbname <- "brain_hint_20"

db <- dbConnect(PostgreSQL(), user="trena", password="trena", port=5432, 
                host=database.host, dbname=dbname)
summary(db)
## <PostgreSQLConnection:(22026,0)> 
##   User: trena 
##   Host: bddsrds.globusgenomics.org 
##   Dbname: brain_hint_20 
##   No resultSet available
dbListTables(db)
## [1] "hits"    "regions"

We have successfully connected to the footprint database that returned two tables - regions and hits

In general, “regions” is used to query for a genomic region of interest; then, execute a second query for the possibly many overlapping footprint/motif calls in each region. We use a very conservative proximal promoter for MEF2C, for example, which is encoded on the minus strand.

chromosome <- "chr5"
tss <- 88904257         # hg38 coordinates
query.regions <- sprintf("select * from regions where chrom='%s' and start > %d and endpos < %d",
                      chromosome, tss - 500, tss + 2000)
tbl.regions <- dbGetQuery(db, query.regions)
dim(tbl.regions)
## [1] 43  4
head(tbl.regions, n=5)

The “loc” column in this data.frame corresponds to the primary key of the hits table, making for fast retrieval of the footprints annotated to motif.

loc.set <- sprintf("('%s')", paste(tbl.regions$loc, collapse="','"))
query.hits <- sprintf("select * from hits where loc IN %s", loc.set)
tbl.hits <- dbGetQuery(db, query.hits)
dim(tbl.hits)
## [1] 56 16
head(tbl.hits, n=5)