This Biological Entity Dictionary (BED) has been developed to address three main challenges. The first one is related to the completeness of identifier mappings. Indeed, the direct mapping information provided by the different systems is not always complete and can be enriched by mappings provided by other resources. More interestingly, direct mappings not identified by any of these resources can be indirectly inferred by using mappings to a third reference. For example, many human Ensembl gene IDs are not directly mapped to any Entrez gene ID, but such mappings can be inferred using their respective mappings to HGNC IDs. The second challenge is related to the mapping of deprecated identifiers. Indeed, entity identifiers can change from one resource release to another. The identifier history is provided by some resources, such as Ensembl or the NCBI, but it is generally not used by mapping tools. The third challenge is related to the automation of the mapping process according to the relationships between the biological entities of interest. Indeed, mapping between gene and protein ID scopes should not be done the same way as mapping between two gene ID scopes. Also, converting identifiers from different organisms should be possible using gene ortholog information.
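The idea of inferring a mapping through a third reference can be illustrated with a small base R sketch. It is only a toy illustration of the principle, not the way BED implements it (BED relies on a Neo4j graph database). All identifiers and table names below are made-up placeholders, not real Ensembl, HGNC or Entrez IDs.

```r
# Toy mapping tables: all identifiers are hypothetical placeholders.
ens_to_hgnc <- data.frame(
  ensembl = c("ENS_A", "ENS_B", "ENS_C"),
  hgnc = c("HGNC_1", "HGNC_2", "HGNC_3"),
  stringsAsFactors = FALSE
)
hgnc_to_entrez <- data.frame(
  hgnc = c("HGNC_1", "HGNC_2", "HGNC_4"),
  entrez = c("ENTREZ_10", "ENTREZ_20", "ENTREZ_40"),
  stringsAsFactors = FALSE
)

# Direct Ensembl-to-Entrez mappings: only ENS_A is directly mapped here.
direct <- data.frame(
  ensembl = "ENS_A",
  entrez = "ENTREZ_10",
  stringsAsFactors = FALSE
)

# Infer indirect mappings by joining the two tables on the shared HGNC ID.
indirect <- merge(ens_to_hgnc, hgnc_to_entrez, by = "hgnc")
indirect <- indirect[, c("ensembl", "entrez")]

# Combine direct and inferred mappings and drop duplicated pairs.
all_mappings <- unique(rbind(direct, indirect))
all_mappings
```

In this toy data, ENS_B has no direct Entrez mapping but gets one through its HGNC ID, whereas ENS_C stays unmapped because its HGNC ID has no Entrez counterpart.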
This document shows how to use the BED (Biological Entity Dictionary) R package to get and explore mappings between identifiers of biological entities (BE). This package provides a way to connect to a BED Neo4j database in which the relationships between the identifiers from different sources are recorded.
This package and the underlying research have been published in this peer-reviewed article:
This BED package depends on the following packages available in the CRAN repository:
All these packages must be installed before installing BED.
devtools::install_github("patzaw/BED")
If you get an error like the following…
Error: package or namespace load failed for ‘BED’:
.onLoad failed in loadNamespace() for 'BED', details:
call: connections[[connection]][["cache"]]
error: subscript out of bounds
… remove the BED folder located here:
file.exists(file.path(Sys.getenv("HOME"), "R", "BED"))
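The removal step described above could be scripted as follows. This is a minimal sketch that only uses base R functions. It permanently deletes the folder, so it may be worth checking its content first.

```r
# Path of the BED folder, as given in the text above.
bed_dir <- file.path(Sys.getenv("HOME"), "R", "BED")

# Delete the folder only if it exists.
# recursive = TRUE is required by unlink() to remove a directory.
if (file.exists(bed_dir)) {
  unlink(bed_dir, recursive = TRUE)
}

# Should now return FALSE.
file.exists(bed_dir)
```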
Before using BED, the connection needs to be established with the underlying Neo4j DB. url, username and password should be adapted.
library(BED)
connectToBed(url="localhost:5454", remember=FALSE, useCache=FALSE)
The remember parameter can be set to TRUE in order to save connection information that will be automatically used the next time the connectToBed() function is called or the next time the BED library is loaded. By default, this parameter is set to FALSE to comply with CRAN policies. Saved connections can be managed with the lsBedConnections() and the forgetBedConnection() functions.
The useCache parameter is by default set to FALSE to comply with CRAN policies. However, it is recommended to set it to TRUE to improve the speed of recurrent queries: the results of some large queries are saved locally in a file.
The connection can be checked the following way.
checkBedConn(verbose=TRUE)
## http://bel040344:5454
## BED
## UCB-Human-Internal
## 2022.04.25
## Cache ON
## [1] TRUE
## attr(,"dbVersion")
## name instance version
## 1 BED UCB-Human-Internal 2022.04.25
If the verbose parameter is set to TRUE, the URL and the content version are displayed as messages.
lsBedConnections()
## [[1]]
## [[1]]$url
## [1] "bel040344:5454"
##
## [[1]]$username
## [1] NA
##
## [[1]]$password
## [1] NA
##
## [[1]]$cache
## [1] TRUE
##
## [[1]]$name
## [1] "BED"
##
## [[1]]$instance
## [1] "UCB-Human-Internal"
##
## [[1]]$version
## [1] "2021.12.16"
##
##
## [[2]]
## [[2]]$url
## [1] "localhost:5420"
##
## [[2]]$username
## [1] NA
##
## [[2]]$password
## [1] NA
##
## [[2]]$cache
## [1] TRUE
##
## [[2]]$name
## [1] "BED"
##
## [[2]]$instance
## [1] "UCB-Human-Internal"
##
## [[2]]$version
## [1] "2021.12.16"
##
##
## [[3]]
## [[3]]$url
## [1] "localhost:5410"
##
## [[3]]$username
## [1] NA
##
## [[3]]$password
## [1] NA
##
## [[3]]$cache
## [1] TRUE
##
## [[3]]$name
## [1] "BED"
##
## [[3]]$instance
## [1] "UCB-Human"
##
## [[3]]$version
## [1] "2021.12.16"
##
##
## [[4]]
## [[4]]$url
## [1] "localhost:5454"
##
## [[4]]$username
## [1] NA
##
## [[4]]$password
## [1] NA
##
## [[4]]$cache
## [1] TRUE
##
## [[4]]$name
## [1] "BED"
##
## [[4]]$instance
## [1] "UCB-Human"
##
## [[4]]$version
## [1] "2020.05.03"
The connection parameter of the connectToBed() function can be used to connect to a saved connection other than the last one.
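The connection options described in this section might be combined as in the sketch below. It assumes that a BED Neo4j instance is reachable at localhost:5454 (adapt the URL to your own setup). It also assumes that the value given to the connection parameter is the position of the saved connection in the list returned by lsBedConnections().

```r
library(BED)

# Assumption: a BED Neo4j instance is reachable at this URL.
# remember = TRUE saves the connection information for later sessions;
# useCache = TRUE stores the results of some large queries locally.
connectToBed(url = "localhost:5454", remember = TRUE, useCache = TRUE)

# List the saved connections.
saved <- lsBedConnections()
length(saved)

# Assumption: connection = 2 selects the second saved connection.
# Only attempted when at least two connections have been saved.
if (length(saved) >= 2) {
  connectToBed(connection = 2)
}

# Check which database is currently in use.
checkBedConn(verbose = TRUE)
```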
The BED underlying data model can be shown at any time using the following command.
showBedDataModel()