publications
2024
- Complete sequencing of ape genomes. DongAhn Yoo, Arang Rhie, Prajna Hebbar, and 120 more authors. bioRxiv 2024
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives.
@article{yoo2024apet2t, author = {Yoo, DongAhn and Rhie, Arang and Hebbar, Prajna and Antonacci, Francesca and Logsdon, Glennis A. and Solar, Steven J. and Antipov, Dmitry and Pickett, Brandon D. and Safonova, Yana and Montinaro, Francesco and Luo, Yanting and Malukiewicz, Joanna and Storer, Jessica M. and Lin, Jiadong and Sequeira, Abigail N. and Mangan, Riley J. and Hickey, Glenn and Anez, Graciela Monfort and Balachandran, Parithi and Bankevich, Anton and Beck, Christine R. and Biddanda, Arjun and Borchers, Matthew and Bouffard, Gerard G. and Brannan, Emry and Brooks, Shelise Y. and Carbone, Lucia and Carrel, Laura and Chan, Agnes P. and Crawford, Juyun and Diekhans, Mark and Engelbrecht, Eric and Feschotte, Cedric and Formenti, Giulio and Garcia, Gage H. and Gennaro, Luciana de and Gilbert, David and Green, Richard E. and Guarracino, Andrea and Gupta, Ishaan and Haddad, Diana and Han, Junmin and Harris, Robert S. and Hartley, Gabrielle A. and Harvey, William T. and Hiller, Michael and Hoekzema, Kendra and Houck, Marlys L. and Jeong, Hyeonsoo and Kamali, Kaivan and Kellis, Manolis and Kille, Bryce and Lee, Chul and Lee, Youngho and Lees, William and Lewis, Alexandra P. and Li, Qiuhui and Loftus, Mark and Loh, Yong Hwee Eddie and Loucks, Hailey and Ma, Jian and Mao, Yafei and Martinez, Juan F. I. and Masterson, Patrick and McCoy, Rajiv C. and McGrath, Barbara and McKinney, Sean and Meyer, Britta S. and Miga, Karen H. and Mohanty, Saswat K. and Munson, Katherine M. and Pal, Karol and Pennell, Matt and Pevzner, Pavel A. and Porubsky, David and Potapova, Tamara and Ringeling, Francisca R. and Rocha, Joana L. and Ryder, Oliver A. and Sacco, Samuel and Saha, Swati and Sasaki, Takayo and Schatz, Michael C. and Schork, Nicholas J. and Shanks, Cole and Smeds, Linn{\'e}a and Son, Dongmin R. and Steiner, Cynthia and Sweeten, Alexander P. and Tassia, Michael G. 
and Thibaud-Nissen, Fran{\c c}oise and Torres-Gonz{\'a}lez, Edmundo and Trivedi, Mihir and Wei, Wenjie and Wertz, Julie and Yang, Muyu and Zhang, Panpan and Zhang, Shilong and Zhang, Yang and Zhang, Zhenmiao and Zhao, Sarah A. and Zhu, Yixin and Jarvis, Erich D. and Gerton, Jennifer L. and Rivas-Gonz{\'a}lez, Iker and Paten, Benedict and Szpiech, Zachary A. and Huber, Christian D. and Lenz, Tobias L. and Konkel, Miriam K. and Yi, Soojin V. and Canzar, Stefan and Watson, Corey T. and Sudmant, Peter H. and Molloy, Erin and Garrison, Erik and Lowe, Craig B. and Ventura, Mario and O{\textquoteright}Neill, Rachel J. and Koren, Sergey and Makova, Kateryna D. and Phillippy, Adam M. and Eichler, Evan E.}, title = {Complete sequencing of ape genomes}, elocation-id = {2024.07.31.605654}, year = {2024}, doi = {10.1101/2024.07.31.605654}, publisher = {Cold Spring Harbor Laboratory}, journal = {bioRxiv}, }
- A multi-million-year natural experiment: comparative genomics on a massive scale and its implications for human health. Iker Rivas-González and Jenny Tung. Evolution, Medicine, and Public Health 2024
Improving the diversity and quality of genome assemblies for non-human mammals has been a long-standing goal of comparative genomics. The last year saw substantial progress towards this goal, including the release of genome alignments for 240 mammals and nearly half the primate order. These resources have increased our ability to identify evolutionarily constrained regions of the genome, and together strongly support the importance of these regions to biomedically relevant trait variation in humans. They also provide new strategies for identifying the genetic basis of changes unique to individual lineages, illustrating the value of evolutionary comparative approaches for understanding human health.
@article{rivas-gonzalez2024comparative, author = {Rivas-González, Iker and Tung, Jenny}, title = {{A multi-million-year natural experiment: comparative genomics on a massive scale and its implications for human health}}, journal = {Evolution, Medicine, and Public Health}, pages = {eoae006}, year = {2024}, month = apr, issn = {2050-6201}, doi = {10.1093/emph/eoae006}, }
- A region of suppressed recombination misleads neoavian phylogenomics. Siavash Mirarab, Iker Rivas-González, Shaohong Feng, and 16 more authors. Proceedings of the National Academy of Sciences 2024
Genomes are typically mosaics of regions with different evolutionary histories. When speciation events are closely spaced in time, recombination makes the regions sharing the same history small, and the evolutionary history changes rapidly as we move along the genome. When examining rapid radiations such as the early diversification of Neoaves 66 Mya, typically no consistent history is observed across segments exceeding kilobases of the genome. Here, we report an exception. We found that a 21-Mb region in avian genomes, mapped to chicken chromosome 4, shows an extremely strong and discordance-free signal for a history different from that of the inferred species tree. Such a strong discordance-free signal, indicative of suppressed recombination across many millions of base pairs, is not observed elsewhere in the genome for any deep avian relationships. Although long regions with suppressed recombination have been documented in recently diverged species, our results pertain to relationships dating circa 65 Mya. We provide evidence that this strong signal may be due to an ancient rearrangement that blocked recombination and remained polymorphic for several million years prior to fixation. We show that the presence of this region has misled previous phylogenomic efforts with lower taxon sampling, showing the interplay between taxon and locus sampling. We predict that similar ancient rearrangements may confound phylogenetic analyses in other clades, pointing to a need for new analytical models that incorporate the possibility of such events.
@article{mirarab2024region, volume = {121}, issn = {0027-8424, 1091-6490}, doi = {10.1073/pnas.2319506121}, language = {en}, number = {15}, urldate = {2024-04-02}, journal = {Proceedings of the National Academy of Sciences}, author = {Mirarab, Siavash and Rivas-González, Iker and Feng, Shaohong and Stiller, Josefin and Fang, Qi and Mai, Uyen and Hickey, Glenn and Chen, Guangji and Brajuka, Nadolina and Fedrigo, Olivier and Formenti, Giulio and Wolf, Jochen B. W. and Howe, Kerstin and Antunes, Agostinho and Schierup, Mikkel H. and Paten, Benedict and Jarvis, Erich D. and Zhang, Guojie and Braun, Edward L.}, year = {2024}, pages = {e2319506121}, }
- Complexity of avian evolution revealed by family-level genomes. Josefin Stiller, Shaohong Feng, Al-Aabid Chowdhury, and 49 more authors. Nature 2024
Despite tremendous efforts in the past decades, relationships among main avian lineages remain heavily debated without a clear resolution. Discrepancies have been attributed to diversity of species sampled, phylogenetic method, and the choice of genomic regions 1–3. Here, we address these issues by analyzing genomes of 363 bird species 4 (218 taxonomic families, 92% of total). Using intergenic regions and coalescent methods, we present a well-supported tree but also a remarkable degree of discordance. The tree confirms that Neoaves experienced rapid radiation at or near the Cretaceous–Paleogene (K–Pg) boundary. Sufficient loci rather than extensive taxon sampling were more effective in resolving difficult nodes. Remaining recalcitrant nodes involve species that challenge modeling due to extreme GC content, variable substitution rates, incomplete lineage sorting, or complex evolutionary events such as ancient hybridization. Assessment of the impacts of different genomic partitions showed high heterogeneity across the genome. We discovered sharp increases in effective population size, substitution rates, and relative brain size following the K–Pg extinction event, supporting the hypothesis that emerging ecological opportunities catalyzed the diversification of modern birds. The resulting phylogenetic estimate offers novel insights into the rapid radiation of modern birds and provides a taxon-rich backbone tree for future comparative studies.
@article{stiller2024avian, title = {Complexity of avian evolution revealed by family-level genomes}, issn = {1476-4687}, doi = {10.1038/s41586-024-07323-1}, journal = {Nature}, author = {Stiller, Josefin and Feng, Shaohong and Chowdhury, Al-Aabid and Rivas-González, Iker and Duchêne, David A. and Fang, Qi and Deng, Yuan and Kozlov, Alexey and Stamatakis, Alexandros and Claramunt, Santiago and Nguyen, Jacqueline M. T. and Ho, Simon Y. W. and Faircloth, Brant C. and Haag, Julia and Houde, Peter and Cracraft, Joel and Balaban, Metin and Mai, Uyen and Chen, Guangji and Gao, Rongsheng and Zhou, Chengran and Xie, Yulong and Huang, Zijian and Cao, Zhen and Yan, Zhi and Ogilvie, Huw A. and Nakhleh, Luay and Lindow, Bent and Morel, Benoit and Fjeldså, Jon and Hosner, Peter A. and da Fonseca, Rute R. and Petersen, Bent and Tobias, Joseph A. and Székely, Tamás and Kennedy, Jonathan David and Reeve, Andrew Hart and Liker, Andras and Stervander, Martin and Antunes, Agostinho and Tietze, Dieter Thomas and Bertelsen, Mads and Lei, Fumin and Rahbek, Carsten and Graves, Gary R. and Schierup, Mikkel H. and Warnow, Tandy and Braun, Edward L. and Gilbert, M. Thomas P. and Jarvis, Erich D. and Mirarab, Siavash and Zhang, Guojie}, year = {2024}, }
- Phase-type distributions in mathematical population genetics: An emerging framework. Asger Hobolth, Iker Rivas-González, Mogens Bladt, and 1 more author. Theoretical Population Biology 2024
A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the ‘phases’ in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this review is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference. In particular, we show the relation between classical first-step analysis of coalescent models and phase-type calculations. We also show how reward transformations in phase-type theory lead to easy calculation of covariances and correlation coefficients between e.g. tree height, tree length, external branch length, and internal branch length. Furthermore, we discuss how these quantities can be used for statistical inference based on estimating equations. Providing an alternative to previous work based on the Laplace transform, we derive likelihoods for small-size coalescent trees based on phase-type theory. Overall, our main aim is to demonstrate that phase-type distributions provide a convenient general set of tools to understand aspects of coalescent models that are otherwise difficult to derive. 
Throughout the review, we emphasize the versatility of the phase-type framework, which is also illustrated by our accompanying R-code. All our analyses and figures can be reproduced from code available on GitHub.
@article{hobolth2024phasetype, title = {Phase-type distributions in mathematical population genetics: An emerging framework}, journal = {Theoretical Population Biology}, year = {2024}, issn = {0040-5809}, doi = {10.1016/j.tpb.2024.03.001}, author = {Hobolth, Asger and Rivas-González, Iker and Bladt, Mogens and Futschik, Andreas}, keywords = {Coalescent, Laplace transform, Likelihood inference, Phase–type theory, Population genetics, Reward transformation}, }
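The matrix formulation described in the abstract above can be made concrete with a small sketch. It assumes Kingman's standard coalescent for n sampled lineages, with time in coalescent units and coalescence rate k(k-1)/2 while k lineages remain. Under that assumption, the expected tree height and expected total branch length both follow from E = alpha (-S)^{-1} r, where S is the sub-intensity matrix, alpha the initial distribution, and r a reward vector (all ones for tree height, the lineage count for tree length). The function names are illustrative and are not taken from the authors' R code; exact fractions and a hand-written solver are used only to keep the example dependency-free.

```python
from fractions import Fraction


def kingman_subintensity(n):
    """Sub-intensity matrix S of the block-counting process for n lineages.

    Transient state i (0-based) holds k = n - i lineages; k = 1 is absorbing.
    """
    size = n - 1
    S = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        k = n - i
        rate = Fraction(k * (k - 1), 2)  # coalescence rate with k lineages
        S[i][i] = -rate
        if i + 1 < size:
            S[i][i + 1] = rate
    return S


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination on exact fractions."""
    m = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[-1] for row in M]


def expected_reward(n, reward):
    """Expected accumulated reward until absorption: alpha (-S)^{-1} r."""
    S = kingman_subintensity(n)
    neg_S = [[-v for v in row] for row in S]
    r = [Fraction(reward(n - i)) for i in range(n - 1)]
    x = solve(neg_S, r)  # x = (-S)^{-1} r
    return x[0]          # alpha = (1, 0, ..., 0): start with n lineages


def expected_tmrca(n):
    """Expected time to the most recent common ancestor (reward 1 per state)."""
    return expected_reward(n, lambda k: 1)


def expected_total_branch_length(n):
    """Expected total branch length (reward k while k lineages remain)."""
    return expected_reward(n, lambda k: k)
```

For n = 4 this gives an expected tree height of 3/2 and an expected total branch length of 11/3, in agreement with the classical closed forms 2(1 - 1/n) and 2(1 + 1/2 + ... + 1/(n-1)).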
- TRAILS: Tree reconstruction of ancestry using incomplete lineage sorting. Iker Rivas-González, Mikkel H. Schierup, John Wakeley, and 1 more author. PLOS Genetics 2024
Genome-wide genealogies of multiple species carry detailed information about demographic and selection processes on individual branches of the phylogeny. Here, we introduce TRAILS, a hidden Markov model that accurately infers time-resolved population genetics parameters, such as ancestral effective population sizes and speciation times, for ancestral branches using a multi-species alignment of three species and an outgroup. TRAILS leverages the information contained in incomplete lineage sorting fragments by modelling genealogies along the genome as rooted three-leaved trees, each with a topology and two coalescent events happening in discretized time intervals within the phylogeny. Posterior decoding of the hidden Markov model can be used to infer the ancestral recombination graph for the alignment and details on demographic changes within a branch. Since TRAILS performs posterior decoding at the base-pair level, genome-wide scans based on the posterior probabilities can be devised to detect deviations from neutrality. Using TRAILS on a human-chimp-gorilla-orangutan alignment, we recover speciation parameters and extract information about the topology and coalescent times at high resolution.
@article{rivas-gonzalez2024trails, title = {TRAILS: Tree reconstruction of ancestry using incomplete lineage sorting}, volume = {20}, issn = {1553-7404}, doi = {10.1371/journal.pgen.1010836}, number = {2}, journal = {PLOS Genetics}, publisher = {Public Library of Science (PLoS)}, author = {Rivas-Gonz{\'a}lez, Iker and Schierup, Mikkel H. and Wakeley, John and Hobolth, Asger}, editor = {Palamara, Pier Francesco}, year = {2024}, pages = {e1010836}, }
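Posterior decoding, as mentioned in the abstract above, assigns to each alignment position the probability of every hidden state given all observations. The sketch below shows the generic forward-backward computation on a toy hidden Markov model. It is not the TRAILS model, whose hidden states are topologies with discretized coalescent times; the two states, the integer-coded observations, and all parameter values here are hypothetical.

```python
def posterior_decoding(observations, initial, transition, emission):
    """Return P(state at position t | all observations) for every t.

    initial[s]        : probability of starting in state s
    transition[r][s]  : probability of moving from state r to state s
    emission[s][o]    : probability of observing symbol o in state s
    """
    n_states = len(initial)
    n_obs = len(observations)

    # Forward pass, rescaled at every position to avoid numerical underflow.
    forward, scales = [], []
    current = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for t in range(n_obs):
        if t > 0:
            previous = forward[-1]
            current = [
                emission[s][observations[t]]
                * sum(previous[r] * transition[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        scale = sum(current)
        forward.append([v / scale for v in current])
        scales.append(scale)

    # Backward pass, using the same scaling constants.
    backward = [[1.0] * n_states for _ in range(n_obs)]
    for t in range(n_obs - 2, -1, -1):
        for s in range(n_states):
            backward[t][s] = sum(
                transition[s][r] * emission[r][observations[t + 1]] * backward[t + 1][r]
                for r in range(n_states)
            ) / scales[t + 1]

    # Combine both passes and normalise each position.
    posteriors = []
    for t in range(n_obs):
        row = [forward[t][s] * backward[t][s] for s in range(n_states)]
        total = sum(row)
        posteriors.append([v / total for v in row])
    return posteriors


# Toy example: two "sticky" hidden states, each favouring one observed symbol.
toy_posteriors = posterior_decoding(
    observations=[0, 0, 1, 1],
    initial=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    emission=[[0.9, 0.1], [0.1, 0.9]],
)
```

Each row of `toy_posteriors` sums to one, so a genome-wide scan of the kind the abstract describes could, in principle, track how the probability of a given state changes along the alignment.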
2023
- Phylogenomic analyses provide insights into primate evolution. Yong Shao, Long Zhou, Fang Li, and 40 more authors. Science 2023
Comparative analysis of primate genomes within a phylogenetic context is essential for understanding the evolution of human genetic architecture and primate diversity. We present such a study of 50 primate species spanning 38 genera and 14 families, including 27 genomes first reported here, with many from previously less well represented groups, the New World monkeys and the Strepsirrhini. Our analyses reveal heterogeneous rates of genomic rearrangement and gene evolution across primate lineages. Thousands of genes under positive selection in different lineages play roles in the nervous, skeletal, and digestive systems and may have contributed to primate innovations and adaptations. Our study reveals that many key genomic innovations occurred in the Simiiformes ancestral node and may have had an impact on the adaptive radiation of the Simiiformes and human evolution.
@article{shao2023phylogenomic, author = {Shao, Yong and Zhou, Long and Li, Fang and Zhao, Lan and Zhang, Bao-Lin and Shao, Feng and Chen, Jia-Wei and Chen, Chun-Yan and Bi, Xupeng and Zhuang, Xiao-Lin and Zhu, Hong-Liang and Hu, Jiang and Sun, Zongyi and Li, Xin and Wang, Depeng and Rivas-González, Iker and Wang, Sheng and Wang, Yun-Mei and Chen, Wu and Li, Gang and Lu, Hui-Meng and Liu, Yang and Kuderna, Lukas F. K. and Farh, Kyle Kai-How and Fan, Peng-Fei and Yu, Li and Li, Ming and Liu, Zhi-Jin and Tiley, George P. and Yoder, Anne D. and Roos, Christian and Hayakawa, Takashi and Marques-Bonet, Tomas and Rogers, Jeffrey and Stenson, Peter D. and Cooper, David N. and Schierup, Mikkel Heide and Yao, Yong-Gang and Zhang, Ya-Ping and Wang, Wen and Qi, Xiao-Guang and Zhang, Guojie and Wu, Dong-Dong}, title = {Phylogenomic analyses provide insights into primate evolution}, journal = {Science}, volume = {380}, number = {6648}, pages = {913-924}, year = {2023}, doi = {10.1126/science.abn6919}, }
- Pervasive incomplete lineage sorting illuminates speciation and selection in primates. Iker Rivas-González, Marjolaine Rousselle, Fang Li, and 7 more authors. Science 2023
Incomplete lineage sorting (ILS) causes the phylogeny of some parts of the genome to differ from the species tree. In this work, we investigate the frequencies and determinants of ILS in 29 major ancestral nodes across the entire primate phylogeny. We find up to 64% of the genome affected by ILS at individual nodes. We exploit ILS to reconstruct speciation times and ancestral population sizes. Estimated speciation times are much more recent than genomic divergence times and are in good agreement with the fossil record. We show extensive variation of ILS along the genome, mainly driven by recombination but also by the distance to genes, highlighting a major impact of selection on variation along the genome. In many nodes, ILS is reduced more on the X chromosome compared with autosomes than expected under neutrality, which suggests higher impacts of natural selection on the X chromosome. Finally, we show an excess of ILS in genes with immune functions and a deficit of ILS in housekeeping genes. The extensive ILS in primates discovered in this study provides insights into the speciation times, ancestral population sizes, and patterns of natural selection that shape primate evolution.
@article{rivas-gonzalez2023pervasive, author = {Rivas-González, Iker and Rousselle, Marjolaine and Li, Fang and Zhou, Long and Dutheil, Julien Y. and Munch, Kasper and Shao, Yong and Wu, Dongdong and Schierup, Mikkel H. and Zhang, Guojie}, title = {Pervasive incomplete lineage sorting illuminates speciation and selection in primates}, journal = {Science}, volume = {380}, number = {6648}, pages = {eabn4409}, year = {2023}, doi = {10.1126/science.abn4409}, }
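The link between ILS and speciation parameters described in the abstract above can be illustrated with the textbook three-species result under the neutral multispecies coalescent. If two speciation events are separated by T generations and the ancestral population has diploid effective size N, a gene tree disagrees with the species tree with probability (2/3) exp(-T / (2N)). The sketch below implements only this simplified relationship and its inverse. It is not the inference method used in the paper, and the function and parameter names are hypothetical.

```python
import math


def discordance_probability(branch_generations, effective_size):
    """Probability that a three-species gene tree differs from the species tree.

    branch_generations : generations between the two speciation events (T)
    effective_size     : diploid effective size of the ancestral population (N)
    """
    t = branch_generations / (2.0 * effective_size)  # branch length in coalescent units
    return (2.0 / 3.0) * math.exp(-t)


def branch_length_from_discordance(p_discordant):
    """Invert the formula: internal branch length in units of 2N generations."""
    if not 0.0 < p_discordant <= 2.0 / 3.0:
        raise ValueError("discordance must lie in (0, 2/3] under this model")
    return -math.log(1.5 * p_discordant)
```

Because only the ratio T / (2N) enters the formula, the discordance proportion alone cannot separate the speciation interval from the ancestral population size; additional information, such as coalescent times, is needed to estimate both.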
- PhaseTypeR: an R package for phase-type distributions in population genetics. Iker Rivas-González, Lars Nørvang Andersen, and Asger Hobolth. Journal of Open Source Software 2023
Phase-type distributions describe the time until absorption of a continuous or discrete-time Markov chain. The probabilistic properties of phase-type distributions (i.e., the probability density function, cumulative distribution function, quantile function, moments and generating functions) are well-described and analytically tractable using matrix manipulations. Phase-type distributions have traditionally been used in actuarial sciences and queuing theory, and, more recently, in population genetics. In order to facilitate the use of phase-type theory in population genetics, we present PhaseTypeR, a general-purpose and user-friendly R package which contains all key functions —mean, (co)variance, probability density function, cumulative distribution function, quantile function and random sampling— for both continuous and discrete phase-type distributions. Importantly, univariate and multivariate reward transformations are implemented for continuous and discrete phase-type distributions. Multivariate reward transformations have great potential for applications in population genetics, and we have included two examples. The first is concerned with the easy calculation of the variance-covariance matrix for the site frequency spectrum (SFS) of the n-coalescent, and the second is concerned with the correlation between tree heights in the two-locus ancestral recombination graph.
@article{rivas-gonzalez2023phasetyper, author = {Rivas-González, Iker and Andersen, Lars Nørvang and Hobolth, Asger}, title = {{PhaseTypeR}: an {R} package for phase-type distributions in population genetics}, journal = {Journal of Open Source Software}, volume = {8}, number = {82}, pages = {5054}, year = {2023}, publisher = {The Open Journal}, doi = {10.21105/joss.05054}, }
2022
- Incomplete lineage sorting and phenotypic evolution in marsupials. Shaohong Feng, Ming Bai, Iker Rivas-González, and 8 more authors. Cell 2022
Incomplete lineage sorting (ILS) makes ancestral genetic polymorphisms persist during rapid speciation events, inducing incongruences between gene trees and species trees. ILS has complicated phylogenetic inference in many lineages, including hominids. However, we lack empirical evidence that ILS leads to incongruent phenotypic variation. Here, we performed phylogenomic analyses to show that the South American monito del monte is the sister lineage of all Australian marsupials, although over 31% of its genome is closer to the Diprotodontia than to other Australian groups due to ILS during ancient radiation. Pervasive conflicting phylogenetic signals across the whole genome are consistent with some of the morphological variation among extant marsupials. We detected hundreds of genes that experienced stochastic fixation during ILS, encoding the same amino acids in non-sister species. Using functional experiments, we confirm how ILS may have directly contributed to hemiplasy in morphological traits that were established during rapid marsupial speciation ca. 60 mya.
@article{feng2022incomplete, title = {Incomplete lineage sorting and phenotypic evolution in marsupials}, author = {Feng, Shaohong and Bai, Ming and Rivas-Gonz{\'a}lez, Iker and Li, Cai and Liu, Shiping and Tong, Yijie and Yang, Haidong and Chen, Guangji and Xie, Duo and Sears, Karen E and others}, journal = {Cell}, volume = {185}, number = {10}, pages = {1646--1660}, year = {2022}, publisher = {Elsevier}, doi = {10.1016/j.cell.2022.03.034}, }