Updating gene length data using ensembldb
Here is a small R function to download up-to-date gene lengths for a set of Ensembl gene IDs. I use it to obtain gene lengths from the zebrafish Zv10 (GRCz10) assembly when running gene ontology enrichment on RNA-seq data with goseq.
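# Function downloadGeneLengthData.
# Input: the AnnotationHub ID of an EnsDb record (e.g. 'AH57746').
# Output: a named vector of gene lengths, named by Ensembl gene ID.
downloadGeneLengthData <- function(id) {
  library(ensembldb)
  library(AnnotationHub)
  ah <- AnnotationHub()
  # Restrict the hub to EnsDb records, then fetch the requested one
  ahDb <- query(ah, "EnsDb")
  EnsDb <- ahDb[[id]]
  # lengthOf() with of = "gene" returns one length per gene
  return(lengthOf(EnsDb, of = "gene"))
}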
We can then create a named vector containing only the gene lengths for our universe IDs (in my case, stored in the UniverseIDs object):
geneLengthData <- downloadGeneLengthData('AH57746')
geneLengthData <- geneLengthData[names(geneLengthData) %in% UniverseIDs]
and use that vector in, for example, goseq to supply the gene length data it uses to correct for length bias.
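As far as I understand, goseq's nullp() pairs bias.data with the gene vector by position, so the lengths should follow the same order as the genes. Below is a minimal sketch of that alignment step using toy data: the gene names, lengths, deIDs and gene2cat are made up for illustration and are not from my analysis. The goseq calls are left as comments because they need real data to run.

```r
# Toy stand-ins for the objects used above; IDs and lengths are made up.
geneLengthData <- c(geneA = 1200, geneB = 3400, geneC = 560, geneD = 2100)
UniverseIDs <- c("geneC", "geneA", "geneX", "geneB")  # geneX has no length
deIDs <- c("geneA", "geneX")                          # hypothetical DE genes

# goseq expects a named 0/1 vector: 1 = differentially expressed.
genes <- as.integer(UniverseIDs %in% deIDs)
names(genes) <- UniverseIDs

# Reorder the lengths to follow `genes`; IDs without a length become NA.
biasData <- unname(geneLengthData[match(names(genes), names(geneLengthData))])
names(biasData) <- names(genes)

# With real data (not run here; gene2cat is a hypothetical gene-to-GO mapping):
# library(goseq)
# pwf <- nullp(DEgenes = genes, bias.data = biasData)
# res <- goseq(pwf, gene2cat = gene2cat)
```

Using match() instead of %in% keeps one entry per universe gene, in the universe's order, rather than the order in which the genes appear in the EnsDb.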
The AH57746 object is the Ensembl 90 EnsDb for Danio rerio. You can find more information about the ensembldb package, and how to download DBs for other species and assembly versions, in its vignettes.
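To look up the AnnotationHub ID for another species or Ensembl release, one option is to pass several search terms to query(). The sketch below wraps that in findEnsDbs, a hypothetical helper name of my own (it is not part of AnnotationHub or ensembldb). The version term is matched as plain text, so it may also hit unrelated metadata fields; check the titles of the returned records before picking an ID.

```r
# findEnsDbs is a hypothetical helper, not part of AnnotationHub or ensembldb.
findEnsDbs <- function(species, version) {
  ah <- AnnotationHub::AnnotationHub()
  # query() keeps the records whose metadata match every search term
  AnnotationHub::query(ah, c("EnsDb", species, as.character(version)))
}

# Example (needs network access on first use):
# findEnsDbs("Danio rerio", 90)
```

Printing the result lists the matching records with their AH IDs, any of which can then be passed to downloadGeneLengthData().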