A tree 400MYA in the making: choosing species for a phylogenetic analysis between zebrafish and human genes
One of the analyses that I like to do when trying to understand the function of a specific gene is to recapitulate its evolution since the last common ancestor of zebrafish and humans. This raises a problem: for which species do I need to search and download data to convincingly fill the gap across this divergence time? How many data points do we need to bridge a 400-million-year gap that includes (at least) one whole-genome duplication?
Species tree created with TimeTree
While there are countless publications with phylogenetic trees made from the sequences of zebrafish, mouse, humans, and either chicken or frogs, I find these datasets skewed towards tetrapods: their coverage of fish species is insufficient, which usually leads to wrong conclusions.
As of 2018, this is the list of species that have a sequenced genome (at least partially) and that I use in phylogenetic and reconciliation analyses investigating genetic evolution between zebrafish and mammals:
| Last common taxonomy | Species |
|---|---|
| Vertebrata | Lampreys |
| Gnathostomata | Sharks |
| Teleostomi | |
| Euteleostomi | Latimeria, lungfishes & Tetrapods |
| Actinopterygii | |
| Actinopteri | |
| Neopterygii | Lepisosteus oculatus |
| Teleostei | Anguilla anguilla |
| Osteoglossocephalai | Scleropages formosus, Paramormyrops kingsleyae |
| Clupeocephala | Great majority of fishes with sequenced genome: Takifugu, Tetraodon, Oryzias, Gasterosteus, Poecilia, Xiphophorus, ... |
| Otomorpha | Clupea harengus |
| Ostariophysi | |
| Otophysi | Astyanax mexicanus, Electrophorus electricus, Ictalurus punctatus, Pygocentrus nattereri |
| Cypriniphysae | |
| Cypriniformes | |
| Cyprinoidea | |
| Cyprinidae | Cyprinus carpio, Squalius pyrenaicus, Sinocyclocheilus rhinocerous, anshuiensis, and grahami, Leuciscus waleckii, Pimephales promelas |
| Danio | Danio rerio |
I often use sequences from lampreys and sharks to root the trees. The coelacanth and lungfishes help to join the tetrapod lineage with the often divergent fish lineages. The spotted gar is the only non-teleost ray-finned fish with a sequenced genome that I know of, which makes it useful for pinpointing duplications that arose from the teleost whole-genome duplication.
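
To make the table easier to reuse, here is a minimal sketch that stores the species named in full above as a Python mapping and flattens it into a plain list, one name per line. The names `SPECIES_BY_CLADE`, `species_list`, and `write_species_file` are my own for illustration. Rows that only name a group or genus (lampreys, sharks, the Clupeocephala genera, and so on) are left out, because the table does not say which species to pick there. The one-name-per-line output is an assumption about what your downstream tool expects, so check its input format first.

```python
# Species named in full in the table above, keyed by the clade they share
# with zebrafish. Rows listing only groups or genera are deliberately omitted.
SPECIES_BY_CLADE = {
    "Neopterygii": ["Lepisosteus oculatus"],
    "Teleostei": ["Anguilla anguilla"],
    "Osteoglossocephalai": ["Scleropages formosus", "Paramormyrops kingsleyae"],
    "Otomorpha": ["Clupea harengus"],
    "Otophysi": [
        "Astyanax mexicanus",
        "Electrophorus electricus",
        "Ictalurus punctatus",
        "Pygocentrus nattereri",
    ],
    "Cyprinidae": [
        "Cyprinus carpio",
        "Squalius pyrenaicus",
        "Sinocyclocheilus rhinocerous",
        "Sinocyclocheilus anshuiensis",
        "Sinocyclocheilus grahami",
        "Leuciscus waleckii",
        "Pimephales promelas",
    ],
    "Danio": ["Danio rerio"],
}


def species_list(by_clade):
    """Flatten the mapping into a de-duplicated list, keeping table order."""
    seen = set()
    ordered = []
    for names in by_clade.values():
        for name in names:
            if name not in seen:
                seen.add(name)
                ordered.append(name)
    return ordered


def write_species_file(path, by_clade=SPECIES_BY_CLADE):
    """Write one species name per line (hypothetical helper, format assumed)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(species_list(by_clade)) + "\n")


if __name__ == "__main__":
    for name in species_list(SPECIES_BY_CLADE):
        print(name)
```

Keeping the clade as the key makes it easy to see at a glance which nodes between Vertebrata and Danio still have no representative in a given dataset.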