publications
2024
- Complete sequencing of ape genomes. DongAhn Yoo, Arang Rhie, Prajna Hebbar, and 120 more authors. bioRxiv 2024
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives.
- A multi-million-year natural experiment: comparative genomics on a massive scale and its implications for human health. Iker Rivas-González and Jenny Tung. Evolution, Medicine, and Public Health 2024
Improving the diversity and quality of genome assemblies for non-human mammals has been a long-standing goal of comparative genomics. The last year saw substantial progress towards this goal, including the release of genome alignments for 240 mammals and nearly half the primate order. These resources have increased our ability to identify evolutionarily constrained regions of the genome, and together strongly support the importance of these regions to biomedically relevant trait variation in humans. They also provide new strategies for identifying the genetic basis of changes unique to individual lineages, illustrating the value of evolutionary comparative approaches for understanding human health.
- A region of suppressed recombination misleads neoavian phylogenomics. Siavash Mirarab, Iker Rivas-González, Shaohong Feng, and 16 more authors. Proceedings of the National Academy of Sciences 2024
Genomes are typically mosaics of regions with different evolutionary histories. When speciation events are closely spaced in time, recombination makes the regions sharing the same history small, and the evolutionary history changes rapidly as we move along the genome. When examining rapid radiations such as the early diversification of Neoaves 66 Mya, typically no consistent history is observed across segments exceeding kilobases of the genome. Here, we report an exception. We found that a 21-Mb region in avian genomes, mapped to chicken chromosome 4, shows an extremely strong and discordance-free signal for a history different from that of the inferred species tree. Such a strong discordance-free signal, indicative of suppressed recombination across many millions of base pairs, is not observed elsewhere in the genome for any deep avian relationships. Although long regions with suppressed recombination have been documented in recently diverged species, our results pertain to relationships dating circa 65 Mya. We provide evidence that this strong signal may be due to an ancient rearrangement that blocked recombination and remained polymorphic for several million years prior to fixation. We show that the presence of this region has misled previous phylogenomic efforts with lower taxon sampling, showing the interplay between taxon and locus sampling. We predict that similar ancient rearrangements may confound phylogenetic analyses in other clades, pointing to a need for new analytical models that incorporate the possibility of such events.
- Complexity of avian evolution revealed by family-level genomes. Josefin Stiller, Shaohong Feng, Al-Aabid Chowdhury, and 49 more authors. Nature 2024
Despite tremendous efforts in the past decades, relationships among main avian lineages remain heavily debated without a clear resolution. Discrepancies have been attributed to diversity of species sampled, phylogenetic method, and the choice of genomic regions. Here, we address these issues by analyzing genomes of 363 bird species (218 taxonomic families, 92% of total). Using intergenic regions and coalescent methods, we present a well-supported tree but also a remarkable degree of discordance. The tree confirms that Neoaves experienced rapid radiation at or near the Cretaceous–Paleogene (K–Pg) boundary. Sufficient loci rather than extensive taxon sampling were more effective in resolving difficult nodes. Remaining recalcitrant nodes involve species that challenge modeling due to extreme GC content, variable substitution rates, incomplete lineage sorting, or complex evolutionary events such as ancient hybridization. Assessment of the impacts of different genomic partitions showed high heterogeneity across the genome. We discovered sharp increases in effective population size, substitution rates, and relative brain size following the K–Pg extinction event, supporting the hypothesis that emerging ecological opportunities catalyzed the diversification of modern birds. The resulting phylogenetic estimate offers novel insights into the rapid radiation of modern birds and provides a taxon-rich backbone tree for future comparative studies.
- Phase-type distributions in mathematical population genetics: An emerging framework. Asger Hobolth, Iker Rivas-González, Mogens Bladt, and 1 more author. Theoretical Population Biology 2024
A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the ‘phases’ in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.
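As a small illustration of the matrix calculations mentioned in the abstract above, the following Python sketch computes the mean and variance of the time to the most recent common ancestor, and the mean total branch length, for the standard coalescent with a sample of size n. It is not the authors' R code; the function names are hypothetical, time is assumed to be measured in coalescent units (rate k(k-1)/2 with k lineages), and the initial distribution is assumed to put all mass on the state with n lineages.

```python
def kingman_subintensity(n):
    """Sub-intensity matrix S of the block-counting process for a sample of n.

    Transient states are n, n-1, ..., 2 lineages; one lineage is absorbing.
    """
    ks = list(range(n, 1, -1))
    m = len(ks)
    S = [[0.0] * m for _ in range(m)]
    for i, k in enumerate(ks):
        rate = k * (k - 1) / 2.0
        S[i][i] = -rate
        if i + 1 < m:
            S[i][i + 1] = rate  # move from k to k-1 lineages
    return S, ks


def solve_neg_S(S, r):
    """Return x = (-S)^{-1} r by back-substitution (S is upper triangular here)."""
    m = len(S)
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(-S[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (r[i] - tail) / (-S[i][i])
    return x


def ph_mean(alpha, S, reward=None):
    """Mean alpha (-S)^{-1} r; with r = 1 this is the mean time to absorption."""
    r = [1.0] * len(S) if reward is None else [float(v) for v in reward]
    x = solve_neg_S(S, r)
    return sum(a * v for a, v in zip(alpha, x))


def ph_variance(alpha, S):
    """Variance of the absorption time, using E[T^2] = 2 alpha (-S)^{-2} 1."""
    ones = [1.0] * len(S)
    y = solve_neg_S(S, ones)   # (-S)^{-1} 1
    z = solve_neg_S(S, y)      # (-S)^{-2} 1
    m1 = sum(a * v for a, v in zip(alpha, y))
    m2 = 2.0 * sum(a * v for a, v in zip(alpha, z))
    return m2 - m1 ** 2


n = 4
S, ks = kingman_subintensity(n)
alpha = [1.0] + [0.0] * (len(ks) - 1)   # start with all n lineages

tmrca_mean = ph_mean(alpha, S)              # tree height
tmrca_var = ph_variance(alpha, S)
tree_length_mean = ph_mean(alpha, S, ks)    # reward k while k lineages exist
```

For n = 4 these values can be checked against the classical results: the mean tree height is 2(1 - 1/n) and the mean total branch length is twice the harmonic sum 1 + 1/2 + ... + 1/(n-1). Using the number of lineages as the reward vector is the simplest case of the reward transformations the abstract refers to.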
- TRAILS: Tree reconstruction of ancestry using incomplete lineage sorting. Iker Rivas-González, Mikkel H. Schierup, John Wakeley, and 1 more author. PLOS Genetics 2024
Genome-wide genealogies of multiple species carry detailed information about demographic and selection processes on individual branches of the phylogeny. Here, we introduce TRAILS, a hidden Markov model that accurately infers time-resolved population genetics parameters, such as ancestral effective population sizes and speciation times, for ancestral branches using a multi-species alignment of three species and an outgroup. TRAILS leverages the information contained in incomplete lineage sorting fragments by modelling genealogies along the genome as rooted three-leaved trees, each with a topology and two coalescent events happening in discretized time intervals within the phylogeny. Posterior decoding of the hidden Markov model can be used to infer the ancestral recombination graph for the alignment and details on demographic changes within a branch. Since TRAILS performs posterior decoding at the base-pair level, genome-wide scans based on the posterior probabilities can be devised to detect deviations from neutrality. Using TRAILS on a human-chimp-gorilla-orangutan alignment, we recover speciation parameters and extract information about the topology and coalescent times at high resolution.
2023
- Phylogenomic analyses provide insights into primate evolution. Yong Shao, Long Zhou, Fang Li, and 40 more authors. Science 2023
Comparative analysis of primate genomes within a phylogenetic context is essential for understanding the evolution of human genetic architecture and primate diversity. We present such a study of 50 primate species spanning 38 genera and 14 families, including 27 genomes first reported here, with many from previously less well represented groups, the New World monkeys and the Strepsirrhini. Our analyses reveal heterogeneous rates of genomic rearrangement and gene evolution across primate lineages. Thousands of genes under positive selection in different lineages play roles in the nervous, skeletal, and digestive systems and may have contributed to primate innovations and adaptations. Our study reveals that many key genomic innovations occurred in the Simiiformes ancestral node and may have had an impact on the adaptive radiation of the Simiiformes and human evolution.
- Pervasive incomplete lineage sorting illuminates speciation and selection in primates. Iker Rivas-González, Marjolaine Rousselle, Fang Li, and 7 more authors. Science 2023
Incomplete lineage sorting (ILS) causes the phylogeny of some parts of the genome to differ from the species tree. In this work, we investigate the frequencies and determinants of ILS in 29 major ancestral nodes across the entire primate phylogeny. We find up to 64% of the genome affected by ILS at individual nodes. We exploit ILS to reconstruct speciation times and ancestral population sizes. Estimated speciation times are much more recent than genomic divergence times and are in good agreement with the fossil record. We show extensive variation of ILS along the genome, mainly driven by recombination but also by the distance to genes, highlighting a major impact of selection on variation along the genome. In many nodes, ILS is reduced more on the X chromosome compared with autosomes than expected under neutrality, which suggests higher impacts of natural selection on the X chromosome. Finally, we show an excess of ILS in genes with immune functions and a deficit of ILS in housekeeping genes. The extensive ILS in primates discovered in this study provides insights into the speciation times, ancestral population sizes, and patterns of natural selection that shape primate evolution.
- PhaseTypeR: an R package for phase-type distributions in population genetics. Iker Rivas-González, Lars Nørvang Andersen, and Asger Hobolth. Journal of Open Source Software 2023
Phase-type distributions describe the time until absorption of a continuous or discrete-time Markov chain. The probabilistic properties of phase-type distributions (i.e., the probability density function, cumulative distribution function, quantile function, moments and generating functions) are well-described and analytically tractable using matrix manipulations. Phase-type distributions have traditionally been used in actuarial sciences and queuing theory, and, more recently, in population genetics. In order to facilitate the use of phase-type theory in population genetics, we present PhaseTypeR, a general-purpose and user-friendly R package which contains all key functions —mean, (co)variance, probability density function, cumulative distribution function, quantile function and random sampling— for both continuous and discrete phase-type distributions. Importantly, univariate and multivariate reward transformations are implemented for continuous and discrete phase-type distributions. Multivariate reward transformations have great potential for applications in population genetics, and we have included two examples. The first is concerned with the easy calculation of the variance-covariance matrix for the site frequency spectrum (SFS) of the n-coalescent, and the second is concerned with the correlation between tree heights in the two-locus ancestral recombination graph.
2022
- Incomplete lineage sorting and phenotypic evolution in marsupials. Shaohong Feng, Ming Bai, Iker Rivas-González, and 8 more authors. Cell 2022
Incomplete lineage sorting (ILS) makes ancestral genetic polymorphisms persist during rapid speciation events, inducing incongruences between gene trees and species trees. ILS has complicated phylogenetic inference in many lineages, including hominids. However, we lack empirical evidence that ILS leads to incongruent phenotypic variation. Here, we performed phylogenomic analyses to show that the South American monito del monte is the sister lineage of all Australian marsupials, although over 31% of its genome is closer to the Diprotodontia than to other Australian groups due to ILS during ancient radiation. Pervasive conflicting phylogenetic signals across the whole genome are consistent with some of the morphological variation among extant marsupials. We detected hundreds of genes that experienced stochastic fixation during ILS, encoding the same amino acids in non-sister species. Using functional experiments, we confirm how ILS may have directly contributed to hemiplasy in morphological traits that were established during rapid marsupial speciation ca. 60 mya.