<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[bits and genes]]></title><description><![CDATA[Thoughts, stories and ideas.]]></description><link>https://bitsandgen.es/</link><image><url>https://bitsandgen.es/favicon.png</url><title>bits and genes</title><link>https://bitsandgen.es/</link></image><generator>Ghost 5.77</generator><lastBuildDate>Sun, 22 Mar 2026 08:42:37 GMT</lastBuildDate><atom:link href="https://bitsandgen.es/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[RainCloudPlots: shiny app]]></title><description><![CDATA[I created an interactive click interface to easily customize RainCloud plots without writing code. ]]></description><link>https://bitsandgen.es/my-first-shiny-app-raincloudplots/</link><guid isPermaLink="false">65b80e50fc60ec721007a9bf</guid><dc:creator><![CDATA[Gabriel Forn-Cuni]]></dc:creator><pubDate>Mon, 15 Oct 2018 13:02:56 GMT</pubDate><media:content url="https://bitsandgen.es/content/images/2018/10/rainCloudPlot.png" medium="image"/><content:encoded><![CDATA[<img src="https://bitsandgen.es/content/images/2018/10/rainCloudPlot.png" alt="RainCloudPlots: shiny app"><p>It should provide a good head start for learning how the R package ggplot2 works or, alternatively, a point-and-click interface for making nice plots.</p><p>In the last few weeks, I have been working on my first public Shiny app. I have been using R quite a bit lately and wanted to try building web applications with it. 
The objective was to lower the entry barrier to R and ggplot2 for some of my colleagues while keeping it interesting, so I decided to create an app that shows a plot side by side with the key sections of code needed to make it.</p><p>After a long internal debate, I decided to tweet about it and open-source it. The app is now available <a href="//gabrifc.shinyapps.io/raincloudplots/">here</a>, and I published the source code on GitHub <a href="https://github.com/gabrifc/raincloud-shiny?ref=bitsandgen.es">here</a>.</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">My coworkers and I found Raincloud Plots by <a href="https://twitter.com/micahgallen?ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">@micahgallen</a> really interesting but did not have enough experience in R to comfortably use the code, so I made a Shiny app. Hopefully you also find it useful. <a href="https://twitter.com/hashtag/shiny?src=hash&amp;ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">#shiny</a> <a href="https://twitter.com/hashtag/rstats?src=hash&amp;ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">#rstats</a> <a href="https://twitter.com/hashtag/dataviz?src=hash&amp;ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">#dataviz</a> <a href="https://t.co/nvwP0Zj2ry?ref=bitsandgen.es">https://t.co/nvwP0Zj2ry</a> <a href="https://t.co/MXL7inquBZ?ref=bitsandgen.es">pic.twitter.com/MXL7inquBZ</a></p>&#x2014; Gabriel Forn-Cun&#xED; (@furniest) <a href="https://twitter.com/furniest/status/1049585220888989697?ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">October 9, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure><p>The app was well received, even by the authors of the original idea, and the overall feedback has been positive, so I am sure that updates and new apps will come in the near future :)</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Fantastic! We&apos;ll definitely be adding a link to this in the manuscript! <a href="https://t.co/6WxbBwymfG?ref=bitsandgen.es">https://t.co/6WxbBwymfG</a></p>&#x2014; neuroconscience (@neuroconscience) <a href="https://twitter.com/neuroconscience/status/1049742287083843585?ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">October 9, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">&#x1F42D; quite a week for R &#x2229; clicky-viz!<br>&#x2614; &quot;raincloud-shiny: shiny app for customizing Raincloud plots&quot; by <a href="https://twitter.com/furniest?ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">@furniest</a><a href="https://t.co/9JU2x2WZl5?ref=bitsandgen.es">https://t.co/9JU2x2WZl5</a> <a href="https://twitter.com/hashtag/rstats?src=hash&amp;ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">#rstats</a> <a href="https://twitter.com/hashtag/dataviz?src=hash&amp;ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">#dataviz</a> <a href="https://twitter.com/hashtag/rshiny?src=hash&amp;ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">#rshiny</a> <a href="https://t.co/Oy9kg2lU8x?ref=bitsandgen.es">pic.twitter.com/Oy9kg2lU8x</a></p>&#x2014; Mara Averick (@dataandme) <a href="https://twitter.com/dataandme/status/1050755775579258881?ref_src=twsrc%5Etfw&amp;ref=bitsandgen.es">October 12, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</figure><p>Please let me know if you use it, if you find it useful, if you think something is missing, or if you have any other kind of feedback.</p>]]></content:encoded></item><item><title><![CDATA[A tree 400MYA in the making: choosing species for a phylogenetic analysis between zebrafish and human genes]]></title><description><![CDATA[As of June 2018, this is the list of species with an (at least partially) sequenced genome that I use in phylogenetic and reconciliation analyses focused on investigating gene evolution between zebrafish and mammals:]]></description><link>https://bitsandgen.es/bridging-the-phylogenetic-gap-between-zebrafish-and-mammals/</link><guid isPermaLink="false">65b80e50fc60ec721007a9be</guid><dc:creator><![CDATA[Gabriel Forn-Cuni]]></dc:creator><pubDate>Fri, 15 Jun 2018 11:42:40 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>One of the analyses that I like to do when trying to understand the function of a specific gene is to recapitulate its evolution since the last common ancestor of zebrafish and humans. This raises a problem: which species do I need to search and download data from to convincingly fill the gap across these divergence times? How many data points do we need to bridge a 400-million-year gap that includes (at least) one full whole-genome duplication?</p>
<p><img src="https://bitsandgen.es/content/images/2018/06/simpleTree_species.png" alt="Species Tree from TimeTree.org" loading="lazy"><br>
<small>Species tree created with <a href="http://timetree.org/?ref=bitsandgen.es">TimeTree</a></small></p>
<p>While there are countless publications with phylogenetic trees built from zebrafish, mouse, human, and either chicken or frog sequences, I found these datasets to be skewed towards tetrapods, with insufficient coverage of fish species, and they usually lead to wrong conclusions.</p>
<p>As of 2018, this is the list of species with an (at least partially) sequenced genome that I use in phylogenetic and reconciliation analyses focused on investigating gene evolution between zebrafish and mammals:</p>
<table>
    <thead><tr><th>Last common ancestor with zebrafish (taxon)</th><th>Species</th></tr></thead>
    <tbody>
        <tr><td><i>Vertebrata</i></td><td>Lampreys</td></tr>
        <tr><td><i>Gnathostomata</i></td><td>Sharks</td></tr>
        <tr><td><i>Teleostomi</i></td><td>&#xA0;</td></tr>
        <tr><td><i>Euteleostomi</i></td><td><i>Latimeria</i>, lungfishes &amp; Tetrapods</td></tr>
        <tr><td><i>Actinopterygii</i></td><td>&#xA0;</td></tr>
        <tr><td><i>Actinopteri</i></td><td>&#xA0;</td></tr>
        <tr><td><i>Neopterygii</i></td><td><i>Lepisosteus oculatus</i></td></tr>
        <tr><td><i>Teleostei</i></td><td><i>Anguilla anguilla</i></td></tr>
        <tr><td><i>Osteoglossocephalai</i></td><td><i>Scleropages formosus</i>, <i>Paramormyrops kingsleyae</i></td></tr>
        <tr><td><i>Clupeocephala</i></td><td>Great majority of fishes with a sequenced genome:<br><i>Takifugu</i>, <i>Tetraodon</i>, <i>Oryzias</i>, <i>Gasterosteus</i>, <i>Poecilia</i>, <i>Xiphophorus</i>, ...</td></tr>
        <tr><td><i>Otomorpha</i></td><td><i>Clupea harengus</i></td></tr>
        <tr><td><i>Ostariophysi</i></td><td>&#xA0;</td></tr>
        <tr><td><i>Otophysi</i></td><td><i>Astyanax mexicanus</i>, <i>Electrophorus electricus</i>, <i>Ictalurus punctatus</i>, <br><i>Pygocentrus nattereri</i></td></tr>
        <tr><td><i>Cypriniphysae</i></td><td>&#xA0;</td></tr>
        <tr><td><i>Cypriniformes</i></td><td>&#xA0;</td></tr>
        <tr><td><i>Cyprinoidea</i></td><td>&#xA0;</td></tr>
        <tr><td><i>Cyprinidae</i></td><td><i>Cyprinus carpio</i>, <i>Squalius pyrenaicus</i>, <i>Sinocyclocheilus rhinocerous</i>, <br><i>S. anshuiensis</i> and <i>S. grahami</i>, <i>Leuciscus waleckii</i>, <i>Pimephales promelas</i></td></tr>
        <tr><td><i>Danio</i></td><td><i>Danio rerio</i></td></tr>
    </tbody>
</table>
<p>I often use sequences from lampreys and sharks to root the trees; the coelacanth and lungfishes help join the tetrapod lineage with the often divergent fish lineages; and the spotted gar is the only non-teleost ray-finned fish with a sequenced genome that I know of, which makes it useful for pinpointing duplications that arose from the teleost whole-genome duplication.</p>
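<p>As a minimal sketch of the rooting step, assuming the <code>ape</code> package and a made-up gene tree (the tip labels are hypothetical and only name the species each sequence comes from), the lamprey and shark sequences can be passed together as the outgroup:</p>
<pre><code class="language-r">library(ape)

# Hypothetical unrooted gene tree (basal trichotomy), one sequence per species
geneTree &lt;- read.tree(text = paste0(
  "((zebrafish_geneX,medaka_geneX),gar_geneX,",
  "((human_geneX,mouse_geneX),(lamprey_geneX,shark_geneX)));"))

# Root on the branch that separates the lamprey + shark sequences from the
# bony vertebrates; resolve.root = TRUE gives a bifurcating root
outgroups &lt;- c("lamprey_geneX", "shark_geneX")
rootedTree &lt;- root(geneTree, outgroup = outgroups, resolve.root = TRUE)

is.rooted(rootedTree)
# plot(rootedTree) to inspect the result
</code></pre>
<p>In this toy tree, the rooted ingroup resolves as tetrapods versus (teleosts + gar). As far as I know, <code>root()</code> stops with an error if the outgroup sequences do not form a clade in the unrooted tree, which is itself a useful warning about the gene tree.</p>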
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Should we trust the zebrafish as a model for human inflammatory diseases?]]></title><description><![CDATA[Zebrafish is emerging as a suitable research animal for modelling human diseases, but what is the translational potential of its research to humans?]]></description><link>https://bitsandgen.es/should-we-trust-the-zebrafish-as-a-model-for-human-inflammatory-diseases/</link><guid isPermaLink="false">65b80e50fc60ec721007a9bd</guid><dc:creator><![CDATA[Gabriel Forn-Cuni]]></dc:creator><pubDate>Sat, 10 Mar 2018 08:51:00 GMT</pubDate><media:content url="https://bitsandgen.es/content/images/2018/03/zebrafish.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://bitsandgen.es/content/images/2018/03/zebrafish.jpg" alt="Should we trust the zebrafish as a model for human inflammatory diseases?"><p>This is an excerpt from the talk I gave last week at the Leiden University IBL &quot;In The Spotlight&quot; Biology Seminars. The zebrafish is emerging as a suitable research animal for modelling human diseases, but what is the translational potential of zebrafish research for humans?</p>
<h2 id="introduction">Introduction</h2>
<p>In 2011, I started my PhD working with zebrafish to study innate immunity and inflammation at the <a href="http://www.iim.csic.es/index.php/inmunologia-y-genomica/?lang=en&amp;ref=bitsandgen.es">Immunology and Genomics group of the Institute of Marine Research</a> in Vigo, Spain. While I was there, a paper published in PNAS (&quot;Genomic responses in mouse models poorly mimic human inflammatory diseases&quot;, by Seok J <em>et al</em>.; <a href="http://www.pnas.org/content/110/9/3507?ref=bitsandgen.es">link</a>) received huge media coverage and went viral through news outlets around the world. The article criticized the lack of correlation between human and mouse gene regulation after an acute inflammatory stimulus, and it sparked a lively discussion about the appropriateness of animal models in research (see the direct reply: &quot;Genomic responses in mouse models greatly mimic human inflammatory diseases&quot;, by Takao K &amp; Miyakawa T; <a href="http://www.pnas.org/content/112/4/1167?ref=bitsandgen.es">link</a>).</p>
<p>My opinion on the matter is complex and may warrant its own post in the future, but in this talk I wanted to address a question that directly affected my PhD: if mice, the cornerstone of biomedical research, are being challenged as a suitable model for human inflammatory diseases... what the hell are we even trying to do with the zebrafish, which is even more distantly related to us? How dare we even propose it as a model?</p>
<blockquote>
<p>If the use of murine models for studying human inflammation processes is questioned, how relevant can a more evolutionarily distant species such as zebrafish be for modelling human diseases? (<a href="https://www.nature.com/articles/srep41905?ref=bitsandgen.es">Forn-Cuni G <em>et al</em>., 2017</a>)</p>
</blockquote>
<p>As it turns out, I had just published my first first-author paper (a characterization in zebrafish of a central protein in the inflammatory response, the <em>C3</em> gene of the complement system), and it had a major impact on how I approached zebrafish research from then on.</p>
<h2 id="immunerelatedgeneduplicationinzebrafish">Immune-related Gene Duplication in Zebrafish</h2>
<p>C3 is a central protein in the <a href="https://en.wikipedia.org/wiki/Complement_system?ref=bitsandgen.es">complement system cascade</a>. It interacts with pathogens, opsonizes them for easier phagocytosis, and can lead downstream to the formation of pores in the pathogen membrane through the Membrane Attack Complex. It had been known <a href="http://onlinelibrary.wiley.com/doi/10.1046/j.1365-3083.1998.00457.x/abstract;jsessionid=DDCC574A1A9ECCCC52585C4D3AAE9593.f04t03?ref=bitsandgen.es">for a while</a> that zebrafish have three independent <em>c3</em> genes instead of one, but when my supervisors took a closer look and searched the genome, they found eight. The duplication of immune-related genes is a story that keeps repeating when working with zebrafish.</p>
<p>So I started to wonder... Why are there so many duplicated immune genes in zebrafish? What is the origin of all these genes, and how do they relate to the ones in humans? Are they regulated similarly to their human counterparts? Do they even have the same functions?</p>
<p>Teleost fish underwent one more round of whole-genome duplication than tetrapods: the teleost-specific genome duplication, sometimes wrongly called the fish-specific genome duplication (salmonids even had a fourth one). This is one reason why most of the zebrafish genes we work on that may be under strong evolutionary pressure (such as those directly interacting with pathogens) are duplicated. It is so common that we have <a href="https://wiki.zfin.org/display/general/ZFIN+Zebrafish+Nomenclature+Conventions?ref=bitsandgen.es#ZFINZebrafishNomenclatureConventions-1.2">official nomenclature rules for naming these genes</a>.</p>
<p>However, the origin of duplicated genes in zebrafish does not stop there. According to Lu <em>et al</em>., 2012 (<a href="https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-246?ref=bitsandgen.es">link</a>), the zebrafish lineage also shows an increased capacity to create and retain tandem gene duplications. Furthermore, the tandemly duplicated genes they analysed were enriched in immune functions. Consequently, it is not unusual to find multiple copies of immune-related genes in the zebrafish genome.</p>
<p>The problem arises when these gene copies diversify: sometimes they specialize for different tissues or pathogens, sometimes they evolve faster and gain new functions, and sometimes they gain regulatory or inhibitory functions. Going back to our paper, two of the <em>c3</em> genes that we characterized, <em>c3b.1</em> and <em>c3b.2</em>, have regulatory functions during inflammation that are opposite to those of the typical <em>c3</em> genes, possibly through competitive inhibition or by acting as scavenger receptors (who knows).</p>
<blockquote>
<p>Immune-related genes that interact with pathogens are often duplicated in zebrafish (because of the additional whole-genome duplication and the teleost tendency to duplicate genes in tandem), and these copies have evolved new regulation and functions.</p>
</blockquote>
<p>While <a href="https://www.nature.com/articles/nature12111?ref=bitsandgen.es">about 70% of human genes have at least one obvious zebrafish ortholog</a>, immune-related genes are vastly diversified and most of them don&apos;t have 1-to-1 orthologs. In that context, how do we compare the functions and regulation during inflammation of genes with no clear evolutionary conservation? And therefore, if most of the proteins that interact with pathogens and activate or take part in the inflammatory response are not evolutionarily conserved, how good a model can the zebrafish actually be for these processes?</p>
<h2 id="comparisonofgeneregulationduringacuteinflammation">Comparison of Gene Regulation During Acute Inflammation</h2>
<p>To find out, we compared the transcriptomic response to an acute inflammatory stimulus between zebrafish and mammals... with a little twist. In most vertebrates, <a href="https://en.wikipedia.org/wiki/Lipopolysaccharide?ref=bitsandgen.es">bacterial LPS</a> is recognized by the <a href="https://en.wikipedia.org/wiki/TLR4?ref=bitsandgen.es">innate immunity receptor complex TLR4/CD14/MD2</a>. However, fish have classically been regarded as &quot;LPS-resistant&quot;, mostly because most species lack MD2, <a href="http://www.jimmunol.org/content/183/9/5896?ref=bitsandgen.es">and the TLR4 ohnolog present in some of them is not evolutionarily conserved and does not recognize LPS</a>. At the time of writing, the LPS-sensing mechanism in fish is unknown. Despite that, stimulating zebrafish with LPS produces an acute inflammatory response, and we wanted to characterize it.</p>
<p>As expected, this analysis was not free of technical (and philosophical) difficulties. How can we compare zebrafish gene regulation to that of mice or humans when only about 10% of the genes regulated in the inflammatory response have unquestionable homologs? And if there are two paralogous genes with different expression patterns, which one do we use for the comparison? Bluntly, if we apply the same 1-to-1 correlation method that we use in murine models to compare genes between zebrafish and humans, we are going to have a bad time.</p>
<p>Moreover, the small fraction of conserved genes (whose zebrafish-to-mammal correlation values are similar to the mouse-to-human ones) consists of the typical, already known inflammatory markers, so there is not much new information to learn from it.</p>
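<p>To make the 1-to-1 problem concrete, here is a minimal base R sketch on a made-up orthology table (all gene names and fold changes are hypothetical). Only pairs in which the zebrafish gene has exactly one human ortholog, and vice versa, survive the filter, so paralogs such as the hypothetical <code>zf_a1</code>/<code>zf_a2</code> and one-to-many genes such as <code>zf_c</code> drop out of a gene-by-gene correlation entirely:</p>
<pre><code class="language-r"># Keep only ortholog pairs that are 1-to-1 in both directions
oneToOneOrthologs &lt;- function(orthologs) {
  zfMulti &lt;- orthologs$zebrafish %in%
    orthologs$zebrafish[duplicated(orthologs$zebrafish)]
  hsMulti &lt;- orthologs$human %in%
    orthologs$human[duplicated(orthologs$human)]
  orthologs[!zfMulti &amp; !hsMulti, , drop = FALSE]
}

# Hypothetical orthology table: one row per zebrafish-human ortholog pair
orthologs &lt;- data.frame(
  zebrafish = c("zf_a1", "zf_a2", "zf_b", "zf_c", "zf_c", "zf_d", "zf_e"),
  human     = c("HS_A", "HS_A", "HS_B", "HS_C1", "HS_C2", "HS_D", "HS_E"),
  stringsAsFactors = FALSE
)

# Hypothetical log2 fold changes after an inflammatory stimulus
zfLFC &lt;- c(zf_a1 = 1.5, zf_a2 = -0.8, zf_b = 2.0, zf_c = 0.3,
           zf_d = -1.0, zf_e = 0.5)
hsLFC &lt;- c(HS_A = 1.2, HS_B = 1.8, HS_C1 = 0.1, HS_C2 = 0.4,
           HS_D = -0.7, HS_E = 0.9)

pairs &lt;- oneToOneOrthologs(orthologs)
nrow(pairs)  # 3: only three of the seven pairs can be correlated
cor(zfLFC[pairs$zebrafish], hsLFC[pairs$human])
</code></pre>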
<blockquote>
<p>1-to-1 gene correlation between zebrafish and humans is difficult to establish, and due to evolutionary pressure, most genes are not functional homologues. But the cellular systems and pathways themselves during the inflammatory response are strikingly well conserved.</p>
</blockquote>
<p>On the other hand, when comparing the enrichment of the whole transcriptomic response, the similarities are obvious. When sensing inflammatory stimuli, zebrafish activate the same response framework as we do: immune response activators (<em>nfkb</em>, <em>ap1</em> components, ...); immune response inhibitors; pro-apoptotic and anti-apoptotic genes; inflammatory mediators (such as cytokines and chemokines); IFN-stimulated, antigen presentation, tissue invasion and autophagy-related genes, etc. The whole framework is the same; it&apos;s just that the specific tools are different and specialized.</p>
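<p>Comparing at the level of pathways usually means running an enrichment test separately in each species, against that species&apos; own genes, and then comparing which gene sets come up. As a minimal sketch (not necessarily the exact test we used), a one-sided hypergeometric over-representation test for a single gene set can be written in base R; the genes and sets below are hypothetical:</p>
<pre><code class="language-r"># One-sided hypergeometric test: is geneSet over-represented among deGenes,
# given the universe of all genes tested in that species?
geneSetEnrichment &lt;- function(deGenes, geneSet, universe) {
  deGenes &lt;- intersect(deGenes, universe)
  geneSet &lt;- intersect(geneSet, universe)
  overlap &lt;- length(intersect(deGenes, geneSet))
  # P(X &gt;= overlap); phyper(lower.tail = FALSE) returns P(X &gt; q)
  pValue &lt;- phyper(overlap - 1,
                   m = length(geneSet),                    # genes in the set
                   n = length(universe) - length(geneSet), # genes outside it
                   k = length(deGenes),                    # DE genes drawn
                   lower.tail = FALSE)
  list(overlap = overlap, pValue = pValue)
}

# Hypothetical example: 100 genes, a 10-gene "pathway", 10 DE genes
universe &lt;- paste0("gene", 1:100)
pathway  &lt;- paste0("gene", 1:10)
deGenes  &lt;- c(paste0("gene", 1:8), "gene50", "gene51")
res &lt;- geneSetEnrichment(deGenes, pathway, universe)
res$overlap  # 8, versus 1 expected by chance
res$pValue
</code></pre>
<p>Because each species is tested on its own genes and annotation, the comparison happens between pathway-level results, which sidesteps the need for 1-to-1 orthologs.</p>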
<h2 id="discussion">Discussion</h2>
<p>The zebrafish provides a vast toolset to study innate immunity and inflammation. But it is a relatively recent addition to the biomedical research community, and we have tons of work to do in order to learn the specific nuances of its immune responses and its interactions with pathogens. This is a crucial step, not only to better understand our model and its limitations, but also because it can lead to potential therapeutic breakthroughs on its own.</p>
<p>But in the meantime, from my point of view, if we want our research to have translational impact on human inflammatory disease and the clinic, we should forget about specific genes, whose function may (or may not) be conserved and which may be better studied in other models, and stick to what our model does well: reproducing whole pathways, systems, and cellular dynamics. In other words, we should move away from lists of significant differentially expressed genes towards more systems-biology-oriented approaches.</p>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/jpTK2jZgLhMwEx" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> 
<small>These are the slides of the talk.</small><!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Updating Gene Ontology mappings using biomaRt]]></title><description><![CDATA[This is another small R function to download updated Gene Ontology mappings for a set of Ensembl Gene IDs. I use it to obtain the Gene Ontology categories linked to each gene from the zebrafish Zv10 assembly when performing enrichment analysis of an RNA-seq experiment in goseq.]]></description><link>https://bitsandgen.es/updating-gene-ontology-mappings-using-biomart/</link><guid isPermaLink="false">65b80e50fc60ec721007a9b9</guid><dc:creator><![CDATA[Gabriel Forn-Cuni]]></dc:creator><pubDate>Mon, 19 Feb 2018 13:14:04 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>This is another small R function to download updated Gene Ontology mappings for a set of Ensembl Gene IDs. I use it to obtain the Gene Ontology categories linked to each gene from the zebrafish Zv10 assembly when performing enrichment analysis of an RNA-seq experiment in <code>goseq</code>.</p>
<p>I use this function to update the GO mappings of Ensembl Gene IDs because the data in the amazing <code>org.Dr.eg.db</code> package is not up to date, and the <code>EnsDb</code> for zebrafish used to update the gene lengths in the <a href="https://bitsandgen.es/updating-gene-length-data-using-ensembldb/">previous post</a> does not include GO data.</p>
<pre><code class="language-r"># Function downloadGOMapping
# Input: a vector of Ensembl Gene IDs.
# Output: a data frame with one row per gene / GO term pair.
downloadGOMapping &lt;- function(vectorOfIDs) {
  library(&quot;biomaRt&quot;)
  # Connect to the zebrafish gene dataset of the Ensembl BioMart
  zfishMart &lt;- useMart(&quot;ensembl&quot;, dataset = &quot;drerio_gene_ensembl&quot;)
  goMapping &lt;- getBM(attributes = c(&apos;ensembl_gene_id&apos;, &apos;go_id&apos;),
                     filters = &apos;ensembl_gene_id&apos;,
                     values = vectorOfIDs,
                     mart = zfishMart)
  # Remove genes returned without any GO annotation (empty go_id)
  goMapping &lt;- goMapping[goMapping$go_id != &quot;&quot;, ]
  return(goMapping)
}
</code></pre>
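<p>The <code>zfishMart</code> object here is the Ensembl BioMart connection for <em>Danio rerio</em>. To work with another species, change the <code>dataset</code> argument of <code>useMart()</code> to that species&apos; Ensembl dataset (for example, <code>hsapiens_gene_ensembl</code> for human).</p>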
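<p><code>goseq</code> takes this mapping through the <code>gene2cat</code> argument which, as far as I know, accepts either a two-column data frame like the one returned above or a named list with one vector of categories per gene. If you prefer the list form, a minimal base R sketch (the gene IDs are placeholders and the GO IDs are only example strings) is:</p>
<pre><code class="language-r"># Convert a two-column gene/GO data frame into a named list:
# one character vector of unique GO IDs per gene
goMappingToList &lt;- function(goMapping) {
  gene2cat &lt;- split(goMapping$go_id, goMapping$ensembl_gene_id)
  lapply(gene2cat, unique)
}

# Placeholder input with the same columns as downloadGOMapping() returns
goMapping &lt;- data.frame(
  ensembl_gene_id = c("gene1", "gene1", "gene2", "gene2"),
  go_id = c("GO:0006954", "GO:0006955", "GO:0006954", "GO:0006954"),
  stringsAsFactors = FALSE
)
gene2cat &lt;- goMappingToList(goMapping)
str(gene2cat)
</code></pre>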
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Updating gene length data using ensembldb]]></title><description><![CDATA[Here is a small R function to download updated gene lengths for a set of Ensembl Gene IDs. I use it to obtain the length of the genes from the zebrafish Zv10 assembly when performing gene ontology enrichment of an RNA-seq experiment in goseq.]]></description><link>https://bitsandgen.es/updating-gene-length-data-using-ensembldb/</link><guid isPermaLink="false">65b80e50fc60ec721007a9ba</guid><dc:creator><![CDATA[Gabriel Forn-Cuni]]></dc:creator><pubDate>Sat, 10 Feb 2018 10:39:28 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>Here is a small R function to download updated gene lengths for a set of Ensembl Gene IDs. I use it to obtain the length of the genes from the zebrafish Zv10 assembly when performing gene ontology enrichment of an RNA-seq experiment in <code>goseq</code>.</p>
<pre><code class="language-r"># Function downloadGeneLengthData.
# Input: the AnnotationHub ID of the EnsDb object (e.g. &apos;AH57746&apos;).
# Output: a named numeric vector of gene lengths (names are Ensembl Gene IDs).
downloadGeneLengthData &lt;- function(id) {
  library(ensembldb)
  library(AnnotationHub)
  ah &lt;- AnnotationHub()
  ahDb &lt;- query(ah, &quot;EnsDb&quot;)
  # Retrieve (and cache locally) the EnsDb object for the requested ID
  edb &lt;- ahDb[[id]]
  return(lengthOf(edb, of = &quot;gene&quot;))
}
</code></pre>
<p>We can then create a named vector with only the gene lengths of our universe IDs (in my case, stored in the <code>UniverseIDs</code> object):</p>
<pre><code class="language-r">geneLengthData &lt;- downloadGeneLengthData(&apos;AH57746&apos;)
geneLengthData &lt;- geneLengthData[names(geneLengthData) %in% UniverseIDs]
</code></pre>
<p>and use that vector in, for example, <code>goseq</code> to account for gene length bias.</p>
<p>The <code>AH57746</code> object is the Ensembl 90 EnsDb for <em>Danio rerio</em>. You can find more information about the <code>ensembldb</code> package and how to download different DBs for other species and assembly versions <a href="https://bioconductor.org/packages/3.7/bioc/vignettes/ensembldb/inst/doc/ensembldb.html?ref=bitsandgen.es">in its vignette</a>.</p>
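<p>One detail worth checking: as far as I know, <code>goseq</code>&apos;s <code>nullp()</code> expects its <code>bias.data</code> argument to be in the same order as the <code>DEgenes</code> vector, while filtering with <code>%in%</code> keeps the order of <code>geneLengthData</code> rather than that of the universe. A minimal sketch that indexes by name instead (the gene names are placeholders) is:</p>
<pre><code class="language-r"># Return gene lengths in the order of universeIDs; genes missing from the
# EnsDb get NA instead of being silently dropped
alignGeneLengths &lt;- function(geneLengthData, universeIDs) {
  lengths &lt;- geneLengthData[universeIDs]
  # Indexing with an unknown name yields an NA name, so reset the names
  names(lengths) &lt;- universeIDs
  lengths
}

# Placeholder data; in the post this would be
# alignGeneLengths(geneLengthData, UniverseIDs)
geneLengthData &lt;- c(geneB = 1500, geneA = 900, geneC = 3000)
universeIDs &lt;- c("geneA", "geneB", "geneD")
aligned &lt;- alignGeneLengths(geneLengthData, universeIDs)
aligned
</code></pre>
<p>Any <code>NA</code> entries point to universe genes without a length in that EnsDb release, and they are worth inspecting before running <code>goseq</code>.</p>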
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Preparing a zebrafish RNAseq for GSEA]]></title><description><![CDATA[Recently I had to annotate an RNA-seq analysis that was only mapped to Ensembl IDs for an enrichment analysis in GSEA, and this is the way in which I am getting the most insightful results in my research. ]]></description><link>https://bitsandgen.es/annotating-a-zebrafish-rnaseq-for-gsea/</link><guid isPermaLink="false">65b80e50fc60ec721007a9b8</guid><dc:creator><![CDATA[Gabriel Forn-Cuni]]></dc:creator><pubDate>Thu, 08 Feb 2018 10:35:22 GMT</pubDate><media:content url="https://bitsandgen.es/content/images/2018/02/rstudio.PNG" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://bitsandgen.es/content/images/2018/02/rstudio.PNG" alt="Preparing a zebrafish RNAseq for GSEA"><p>Recently I had to annotate an RNA-seq analysis that was only mapped to Ensembl IDs for an enrichment analysis in GSEA. That meant mapping as many zebrafish Ensembl gene IDs as possible to the Gene Symbols of their human homologs/orthologs (you know the drill with zebrafish). So I spent some time comparing different mapping approaches to lose the least amount of information possible while still preserving the variability of zebrafish duplicates.</p>
<p>Some options that I tried:</p>
<ol>
<li>Use the GSEA-provided zebrafish.chip</li>
<li>Simply convert all zebrafish gene symbols to uppercase and keep only the direct matches with human gene symbols</li>
<li>Get homologs through:
<ol>
<li>bioDBnet</li>
<li>The R <code>AnnotationDbi</code> package (<code>org.Dr.eg.db</code> in this case)</li>
<li><code>biomaRt</code></li>
</ol>
</li>
<li>A bunch of other crazy stuff</li>
</ol>
<p>In the end, I found that using a two-step process would yield the best results for the time spent:</p>
<ol>
<li>Map the Ensembl IDs to ZFIN IDs</li>
<li>Use the ZFIN database as a chip for the conversion</li>
</ol>
<h2 id="1fromzebrafishensemblgeneidstozfinids">1. From zebrafish Ensembl Gene IDs to ZFIN IDs</h2>
<p>Thankfully, mapping Ensembl IDs to ZFIN IDs is easy and straightforward. Here are two R functions that get the job done through the <code>org.Dr.eg.db</code> and <code>biomaRt</code> packages, plus a small helper to remove duplicated rows:</p>
<pre><code class="language-r"># Function txdbAnnot.
# Input: a vector of Ensembl Gene IDs.
# Return: an annotated data frame (select() returns every 1:many mapping).
txdbAnnot &lt;- function(listOfIDS,
                      attributes = &quot;ZFIN&quot;,
                      keys = &quot;ENSEMBL&quot;) {
  library(&quot;org.Dr.eg.db&quot;)
  # Call AnnotationDbi::select() explicitly, since dplyr also exports select()
  txdbAnnotDF &lt;- AnnotationDbi::select(org.Dr.eg.db,
                                       keys = listOfIDS,
                                       columns = attributes,
                                       keytype = keys)
  return(txdbAnnotDF)
}

# Function bioMartConversion.
# Input: a vector of Ensembl Gene IDs.
# Return: an annotated data frame retrieved from the Ensembl BioMart.
bioMartConversion &lt;- function(listOfIDS,
                              attributes = c(&quot;ensembl_gene_id&quot;, &quot;zfin_id_id&quot;),
                              filters = &quot;ensembl_gene_id&quot;) {
  library(&quot;biomaRt&quot;)
  zfishMart &lt;- useMart(&quot;ensembl&quot;, dataset = &quot;drerio_gene_ensembl&quot;)
  geneAnnot &lt;- getBM(attributes = attributes,
                     filters = filters,
                     values = listOfIDS,
                     mart = zfishMart)
  return(geneAnnot)
}

# Function deleteDuplicatesDataFrame.
# Input: a data frame and the name of a column with duplicates.
# Output: the data frame keeping only the first row for each value of col.
deleteDuplicatesDataFrame &lt;- function(df, col) {
  # Logical indexing is safe when there are no duplicates, unlike
  # df[-which(...), ], which would return zero rows in that case
  return(df[!duplicated(df[[col]]), , drop = FALSE])
}
</code></pre>
<p>And then annotate using one of the methods:</p>
<pre><code class="language-r"># To annotate with other IDs, pass for example
# attributes &lt;- c(&quot;ENTREZID&quot;, &quot;SYMBOL&quot;, &quot;ZFIN&quot;, &quot;GENENAME&quot;)
zfinTxdb &lt;- txdbAnnot(listOfIDS, 
                      attributes = &quot;ZFIN&quot;,
                      keys = &quot;ENSEMBL&quot;)
# Clean up the duplicates
zfinTxdb &lt;- deleteDuplicatesDataFrame(zfinTxdb, &quot;ZFIN&quot;)
</code></pre>
<p>or</p>
<pre><code class="language-r"># To annotate with other IDs, pass for example
# attributes &lt;- c(&quot;ensembl_gene_id&quot;, &quot;entrezgene&quot;, &quot;external_gene_name&quot;,
#                 &quot;zfin_id_id&quot;, &quot;description&quot;)
zfinBiomaRt &lt;- bioMartConversion(listOfIDS, 
                                 attributes = c(&quot;ensembl_gene_id&quot;,&quot;zfin_id_id&quot;),
                                 filters = &quot;ensembl_gene_id&quot;)
# Clean up the duplicates
zfinBiomaRt &lt;- deleteDuplicatesDataFrame(zfinBiomaRt, &quot;ensembl_gene_id&quot;)
</code></pre>
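<p>I personally prefer using <code>AnnotationDbi</code> because it&apos;s faster: <code>org.Dr.eg.db</code> is a local database, whereas <code>biomaRt</code> has to query the Ensembl servers.</p>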
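<p>Once the mapping is ready, the expression table itself needs ZFIN IDs as identifiers so that they match the chip. A minimal base R sketch, assuming a hypothetical expression data frame with an <code>ensembl_gene_id</code> column and a mapping with the <code>ENSEMBL</code> and <code>ZFIN</code> columns that <code>select()</code> returns (all IDs and values below are placeholders):</p>
<pre><code class="language-r"># Replace Ensembl Gene IDs by ZFIN IDs in an expression table.
# Genes without a ZFIN ID are dropped by the inner join.
addZfinIDs &lt;- function(expr, mapping) {
  mapping &lt;- mapping[!is.na(mapping$ZFIN), c("ENSEMBL", "ZFIN")]
  merged &lt;- merge(mapping, expr, by.x = "ENSEMBL", by.y = "ensembl_gene_id")
  merged$ENSEMBL &lt;- NULL
  merged
}

# Placeholder IDs and expression values
expr &lt;- data.frame(ensembl_gene_id = c("ENS1", "ENS2", "ENS3"),
                   ctrl = c(10.2, 5.1, 7.7),
                   lps = c(12.9, 4.8, 7.5),
                   stringsAsFactors = FALSE)
mapping &lt;- data.frame(ENSEMBL = c("ENS1", "ENS2", "ENS3"),
                      ZFIN = c("ZDB-GENE-1", "ZDB-GENE-2", NA),
                      stringsAsFactors = FALSE)
zfinExpr &lt;- addZfinIDs(expr, mapping)
zfinExpr
</code></pre>
<p>If an Ensembl ID maps to several ZFIN IDs, <code>merge()</code> repeats its expression row once per ZFIN ID, so deduplicating the mapping first (as above) keeps the table tidy.</p>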
<h2 id="2thezfinchipforgsea">2. The ZFIN.chip for GSEA</h2>
<p><a href="https://zfin.org/?ref=bitsandgen.es">ZFIN</a> provides data dumps for most of its data, and one really useful table is the Human and Zebrafish Orthology table. You can find it at the <a href="https://zfin.org/downloads?ref=bitsandgen.es">downloads page</a>.</p>
<p>Just create a mapping chip without duplicates using the columns ZFIN ID as <code>Probe</code>, Human gene Symbol as <code>Gene Symbol</code>, and the zebrafish gene name as <code>Gene Title</code>. That way the 1-to-many orthologs can be treated as different (technically, as different probes for the same gene) but we don&apos;t lose info or pick just one randomly.</p>
<p><img src="https://bitsandgen.es/content/images/2018/02/Captura.PNG" alt="Preparing a zebrafish RNAseq for GSEA" loading="lazy"></p>
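<p>As a minimal sketch of that chip-building step, assuming the orthology table has already been read into R with the hypothetical column names <code>zfinID</code>, <code>humanSymbol</code> and <code>zebrafishName</code> (I have not reproduced the exact layout of the ZFIN download here, so rename its columns accordingly), the file could be written as below. The header uses the <code>Probe Set ID</code>, <code>Gene Symbol</code> and <code>Gene Title</code> columns of the GSEA CHIP format:</p>
<pre><code class="language-r"># Build a GSEA chip: ZFIN ID as probe, human symbol as gene symbol,
# zebrafish gene name as title. Zebrafish paralogs keep their own "probe"
# for the shared human symbol.
buildZfinChip &lt;- function(orthos, file) {
  chip &lt;- data.frame("Probe Set ID" = orthos$zfinID,
                     "Gene Symbol" = orthos$humanSymbol,
                     "Gene Title" = orthos$zebrafishName,
                     check.names = FALSE, stringsAsFactors = FALSE)
  # Drop rows without a human symbol, then exact duplicate rows
  hasSymbol &lt;- !is.na(chip[["Gene Symbol"]]) &amp; chip[["Gene Symbol"]] != ""
  chip &lt;- unique(chip[hasSymbol, ])
  write.table(chip, file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(chip)
}

# Hypothetical orthology rows (placeholder IDs and names)
orthos &lt;- data.frame(
  zfinID = c("ZDB-GENE-A", "ZDB-GENE-A", "ZDB-GENE-B1", "ZDB-GENE-B2"),
  humanSymbol = c("HSA", "HSA", "HSB", "HSB"),
  zebrafishName = c("gene a", "gene a", "gene b1", "gene b2"),
  stringsAsFactors = FALSE
)
chipFile &lt;- tempfile(fileext = ".chip")
chip &lt;- buildZfinChip(orthos, chipFile)
chip
</code></pre>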
<p>That&apos;s it. I know that it&apos;s not perfect, but this is the way in which I am getting the most insightful results on my research.</p>
<p>What are your thoughts? Get in touch and discuss it!</p>
<!--kg-card-end: markdown-->]]></content:encoded></item></channel></rss>