One of the analyses I like to do when trying to understand the function of a specific gene is to recapitulate its evolution since the last common ancestor of zebrafish and humans. This raises a problem: which species should I search and download data from to convincingly bridge these divergence times? How many data points do we need to span a 400-million-year gap that includes (at least) one full whole-genome duplication?
Species tree created with TimeTree
While there are countless publications with phylogenetic trees built from zebrafish, mouse, and human sequences, plus either chicken or frog, I find these datasets skewed towards tetrapods. Their fish coverage is insufficient, and in my experience this usually leads to wrong conclusions.
As of 2018, these are the species with (at least partially) sequenced genomes that I use in phylogenetic and reconciliation analyses of gene evolution between zebrafish and mammals:
| Last common clade with zebrafish | Species |
|---|---|
| Euteleostomi | Latimeria, lungfishes & tetrapods |
| Osteoglossocephalai | Scleropages formosus, Paramormyrops kingsleyae |
| Clupeocephala | The great majority of fishes with a sequenced genome: Takifugu, Tetraodon, Oryzias, Gasterosteus, Poecilia, Xiphophorus, ... |
| Otophysi | Astyanax mexicanus, Electrophorus electricus, Ictalurus punctatus |
| Cyprinidae | Cyprinus carpio, Squalius pyrenaicus, Sinocyclocheilus rhinocerous, S. anshuiensis and S. grahami, Leuciscus waleckii, Pimephales promelas |
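To make the "skewed towards tetrapods" problem concrete, the table can be read as a checklist: every clade on the path from zebrafish to humans should contribute at least one species. The sketch below is a hypothetical helper, not part of any tool. It keys clade membership on genus only. Homo, Mus, Gallus, Xenopus and Protopterus are assumed examples of tetrapods and lungfishes that the table does not name.

```python
"""Check which clades on the zebrafish lineage a species sample covers.

Hypothetical helper: membership is keyed on genus, taken from the species
table; Homo, Mus, Gallus, Xenopus and Protopterus are illustrative additions.
"""

# Clades ordered from the one nearest zebrafish to the deepest (Euteleostomi).
CLADE_GENERA = {
    "Cyprinidae": {"Cyprinus", "Squalius", "Sinocyclocheilus", "Leuciscus", "Pimephales"},
    "Otophysi": {"Astyanax", "Electrophorus", "Ictalurus"},
    "Clupeocephala": {"Takifugu", "Tetraodon", "Oryzias", "Gasterosteus", "Poecilia", "Xiphophorus"},
    "Osteoglossocephalai": {"Scleropages", "Paramormyrops"},
    # Coelacanth, lungfish and tetrapods (the last five genera are assumed examples).
    "Euteleostomi": {"Latimeria", "Protopterus", "Homo", "Mus", "Gallus", "Xenopus"},
}


def clade_coverage(species: list[str]) -> dict[str, list[str]]:
    """Group sampled species by the clade where they join the zebrafish lineage."""
    coverage = {clade: [] for clade in CLADE_GENERA}
    for name in species:
        genus = name.split()[0]
        for clade, genera in CLADE_GENERA.items():
            if genus in genera:
                coverage[clade].append(name)
                break
    return coverage


def missing_clades(species: list[str]) -> list[str]:
    """Clades on the zebrafish-to-human path with no sampled species."""
    return [clade for clade, members in clade_coverage(species).items() if not members]


if __name__ == "__main__":
    # A tetrapod-heavy sample, typical of many published trees.
    sample = ["Danio rerio", "Mus musculus", "Homo sapiens", "Gallus gallus"]
    print(missing_clades(sample))
```

With the tetrapod-heavy sample, every fish clade between zebrafish and Euteleostomi comes back as missing. This is the gap the table is meant to fill.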
I often use sequences from lampreys and sharks to root the trees. The coelacanth and lungfishes help bridge the tetrapod lineage and the often rapidly diverging fish lineages. The spotted gar is the only non-teleost ray-finned fish with a sequenced genome that I know of, which makes it useful for pinpointing duplications that arose from the teleost-specific whole-genome duplication.
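The role of the gar as a reference point can be illustrated with the simple species-overlap rule. Under this rule, a gene-tree node is called a duplication when its two child subtrees share a species. If such a duplication contains only teleosts, with the gar copy branching outside it, that is consistent with the teleost-specific whole-genome duplication. This is a minimal sketch and a simplification of full reconciliation against a species tree. The nested-tuple tree format, the "Genus_copy" leaf labels and the teleost genus set are all assumptions for illustration.

```python
"""Flag gene duplications with the species-overlap rule, using the spotted gar
(Lepisosteus) as the non-teleost reference point.

Minimal sketch: leaves are hypothetical "Genus_copy" labels and the teleost set
is an illustrative assumption, not a curated list.
"""

# Teleost genera (zebrafish plus some from the species table); Lepisosteus and Homo are not teleosts.
TELEOSTS = {"Danio", "Cyprinus", "Astyanax", "Ictalurus", "Oryzias", "Takifugu",
            "Gasterosteus", "Scleropages", "Paramormyrops"}


def species_of(tree) -> set[str]:
    """Genera under a node; a tree is a leaf string or a (left, right) tuple."""
    if isinstance(tree, str):
        return {tree.split("_")[0]}
    left, right = tree
    return species_of(left) | species_of(right)


def find_duplications(tree) -> list[tuple[frozenset[str], bool]]:
    """Return (genera under node, teleost-only?) for each inferred duplication.

    A node is a duplication when its two children share a genus. Teleost-only
    duplications that exclude the gar are candidates for the teleost-specific
    whole-genome duplication.
    """
    found = []

    def walk(node):
        if isinstance(node, str):
            return
        left, right = node
        if species_of(left) & species_of(right):
            genera = frozenset(species_of(node))
            found.append((genera, genera <= TELEOSTS))
        walk(left)
        walk(right)

    walk(tree)
    return found


if __name__ == "__main__":
    # Human and gar keep one copy each; zebrafish and medaka each keep two paralogs.
    gene_tree = ("Homo_1",
                 ((("Danio_a", "Oryzias_a"), ("Danio_b", "Oryzias_b")), "Lepisosteus_1"))
    print(find_duplications(gene_tree))
```

In this toy tree, the only duplication groups zebrafish and medaka paralogs with the gar outside, so it is flagged as teleost-only. A teleost-only flag is consistent with the teleost-specific duplication, but a later lineage-specific duplication would be flagged the same way. Without the gar, neither case could be separated from an older, pre-teleost duplication.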