Updating gene length data using ensembldb

Here is a small R function to download updated gene lengths for a set of Ensembl gene IDs. I use it to obtain gene lengths from the zebrafish GRCz10 (Zv10) assembly when running gene ontology enrichment on RNA-seq data with goseq.

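# Function downloadGeneLengthData.
# Input: the AnnotationHub ID of an EnsDb object (e.g. 'AH57746').
# Output: a named vector of gene lengths; names are Ensembl gene IDs.
downloadGeneLengthData <- function(id) {
  library(ensembldb)
  library(AnnotationHub)
  ah <- AnnotationHub()
  # Restrict the hub to EnsDb resources, then fetch the requested one.
  ahDb <- query(ah, "EnsDb")
  # Named edb rather than EnsDb to avoid masking ensembldb's EnsDb().
  edb <- ahDb[[id]]
  return(lengthOf(edb, of = "gene"))
}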

We can then build a named vector holding only the gene lengths for the IDs in our universe (in my case, stored in the UniverseIDs object):

geneLengthData <- downloadGeneLengthData('AH57746')
geneLengthData <- geneLengthData[names(geneLengthData) %in% UniverseIDs]

and use that vector in, for example, goseq to specify gene length bias.
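One detail to watch: goseq's nullp() takes the bias data as a plain vector that is expected to line up with the DE vector, and filtering with %in% keeps the order of the EnsDb rather than the order of the universe. The sketch below shows one way to align the two. All IDs, lengths and the deIDs object are made-up toy values, and the goseq calls are left commented out because they need goseq installed, a real data set and your own gene-to-GO mapping (called gene2cat here, a hypothetical name).

```r
# Toy stand-ins (hypothetical values, not real Ensembl data).
geneLengthData <- c(ENSDARG00000000003 = 1500L,
                    ENSDARG00000000001 = 2300L,
                    ENSDARG00000000002 = 800L)
UniverseIDs <- c("ENSDARG00000000001", "ENSDARG00000000002", "ENSDARG00000000004")
deIDs <- c("ENSDARG00000000002")  # hypothetical list of DE genes

# Reorder the lengths to follow UniverseIDs; IDs absent from the EnsDb become NA.
biasData <- geneLengthData[match(UniverseIDs, names(geneLengthData))]
names(biasData) <- UniverseIDs

# 0/1 vector named by gene ID: 1 = differentially expressed.
deVector <- as.integer(UniverseIDs %in% deIDs)
names(deVector) <- UniverseIDs

# For simplicity, drop genes without a length from both vectors.
keep <- !is.na(biasData)
biasData <- biasData[keep]
deVector <- deVector[keep]

# With real data the goseq step would then look roughly like this:
# pwf <- goseq::nullp(deVector, bias.data = biasData)
# goResults <- goseq::goseq(pwf, gene2cat = gene2cat)
```

Indexing through match() keeps the two vectors in the same order, so each length is paired with the right gene.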

The AH57746 object is the Ensembl 90 EnsDb for Danio rerio. The ensembldb vignettes explain the package in more detail, including how to download databases for other species and assembly versions.
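To look up the ID for another species or release, you can search AnnotationHub with several terms at once. This is a rough sketch: findEnsDb is a hypothetical helper name, and the query needs the AnnotationHub package and a network connection, so the example call is commented out.

```r
# Hypothetical helper: list EnsDb resources matching a species and an Ensembl release.
# Requires the AnnotationHub package and internet access when called.
findEnsDb <- function(species, release) {
  ah <- AnnotationHub::AnnotationHub()
  # query() keeps only the records that match every search term.
  AnnotationHub::query(ah, c("EnsDb", species, as.character(release)))
}

# Example call (not run here):
# findEnsDb("Danio rerio", 90)
```

The printed result lists the matching records with their AH identifiers, any of which can be passed to downloadGeneLengthData().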