Incomplete lineage sorting


Figure 1. The pre-transmission interval and incomplete lineage sorting in the phylogeny of a human-transmissible virus. The shaded tree represents a transmission chain in which each region represents the pathogen population in one of three patients. The width of each shaded region corresponds to the genetic diversity. In this scenario, A infects B with an imperfect transmission bottleneck, and then B infects C. The genealogy at the bottom is reconstructed from a sample of a single lineage from each patient at three distinct time points. When diversity exists in donor A, a pre-transmission interval will occur at each inferred transmission event (MRCA(A,B) precedes transmission from A to B), and the order of transmission events may become randomized in the virus genealogy. Note that the pre-transmission interval is also a random variable determined by the donor’s diversity at the time of each transmission. Terminal branch lengths are also elongated by these processes.

Incomplete lineage sorting,[1][2][3] also termed deep coalescence, retention of ancestral polymorphism, or trans-species polymorphism, describes a phenomenon in population genetics in which ancestral gene copies fail to coalesce (looking backwards in time) into a common ancestral copy until further back in time than earlier speciation events.[4] In other words, the tree produced by a single gene differs from the population or species level tree, producing a discordant tree. As a result, an inferred species level tree may differ depending on which genes are selected for analysis.[5][6] This is in contrast to complete lineage sorting, where the tree produced by the gene is the same as the population or species level tree. Both are common results in phylogenetic analysis, although how often each occurs depends on the gene, organism, and sampling technique.


The concept of incomplete lineage sorting has important implications for phylogenetic techniques, and it relies on the persistence of polymorphisms across successive speciation events. Suppose two successive speciation events occur: the ancestral species first gives rise to species A, and the remaining lineage later splits into species B and C. A single gene can exist as multiple haplotypes (a polymorphism), and a haplotype can be lost or fixed in a species by genetic drift. If the ancestral species has two haplotypes, 1 and 2, species A initially contains both, and through genetic drift and divergence by further mutation it can fix a derived haplotype, 1a. The lineage ancestral to species B and C still contains haplotypes 1 and 2; its gene lineages are thus incompletely sorted. Haplotype 2 can then become fixed in species B, whereas a derived haplotype, 1b, can become fixed in species C. A phylogeny of these species based on this gene will not represent the actual relationships between the species, because it groups species A with species C (haplotypes 1a and 1b) rather than B with C. In other words, the most closely related species do not necessarily inherit the most closely related haplotypes. This is a simplified example; in real research the situation is usually more complex, involving more genes and/or species.[7]
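The three-species scenario can be made concrete with a small simulation. The sketch below is an illustration only, not taken from the cited sources. It assumes one gene copy sampled per species, the species tree ((B,C),A), and the standard coalescent model, in which two lineages sharing a population coalesce after an exponentially distributed waiting time measured in coalescent units. The function names and the parameter internal_branch are hypothetical.

```python
import random

def simulate_gene_tree(internal_branch, rng):
    """Return the pair of species whose gene copies coalesce first.

    internal_branch is the length of the branch ancestral to B and C only,
    in coalescent units (an assumed input for this sketch).
    """
    # Waiting time for the B and C lineages to coalesce: exponential, rate 1.
    if rng.expovariate(1.0) < internal_branch:
        return ("B", "C")  # sorted inside the B-C ancestor: concordant
    # Otherwise all three lineages enter the root population unsorted, and
    # each of the three possible pairs is equally likely to coalesce first.
    return rng.choice([("B", "C"), ("A", "B"), ("A", "C")])

def discordance_rate(internal_branch, n_loci=20000, seed=1):
    """Fraction of simulated loci whose gene tree conflicts with ((B,C),A)."""
    rng = random.Random(seed)
    discordant = sum(
        simulate_gene_tree(internal_branch, rng) != ("B", "C")
        for _ in range(n_loci)
    )
    return discordant / n_loci

if __name__ == "__main__":
    for branch in (0.1, 0.5, 2.0, 5.0):
        print(branch, discordance_rate(branch))
```

Under these assumptions the discordant fraction shrinks as the interval between the two speciation events grows, which mirrors the verbal example: the longer the B-C ancestor exists, the more likely its haplotypes are sorted before B and C split.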

Among primates, chimpanzees and bonobos are more closely related to each other than to any other taxon and are thus sister taxa. Still, for 1.6% of the bonobo genome, sequences are more closely related to their human homologues than to those of chimpanzees, which is probably a result of incomplete lineage sorting.[5]


Incomplete lineage sorting has important implications for phylogenetic research: a phylogenetic tree inferred from a gene may not reflect the actual relationships among species. However, gene flow between lineages through hybridization or horizontal gene transfer can produce the same conflicting phylogenetic tree. Distinguishing these processes can be difficult, but statistical approaches have been, and continue to be, developed to gain greater insight into these evolutionary dynamics.[8] One way to reduce the effect of incomplete lineage sorting is to use multiple genes when building species or population phylogenies. The more genes used, the more reliable the phylogeny becomes.[7]
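The benefit of using multiple genes can be illustrated with a simple majority vote over gene trees. The following is a minimal sketch under stated assumptions, not a method described in the cited sources: three species with true tree ((B,C),A), independent loci, and the standard three-taxon coalescent result that a gene tree is discordant with probability (2/3)·exp(-T), where T is the internal branch length in coalescent units. All names (TOPOLOGIES, sample_gene_tree, majority_vote_accuracy) are hypothetical. For more than three species the most frequent gene tree is not guaranteed to match the species tree, so this vote is only an illustration.

```python
import math
import random
from collections import Counter

# The first entry is the assumed true species tree; the other two are discordant.
TOPOLOGIES = ["((B,C),A)", "((A,B),C)", "((A,C),B)"]

def sample_gene_tree(internal_branch, rng):
    """Draw one gene-tree topology under the three-taxon coalescent."""
    p_discordant = (2.0 / 3.0) * math.exp(-internal_branch)
    if rng.random() >= p_discordant:
        return TOPOLOGIES[0]
    # The two discordant topologies are equally likely.
    return rng.choice(TOPOLOGIES[1:])

def majority_vote_accuracy(n_genes, internal_branch, replicates=2000, seed=7):
    """Fraction of replicates in which the most common gene tree is the species tree."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(replicates):
        counts = Counter(
            sample_gene_tree(internal_branch, rng) for _ in range(n_genes)
        )
        # Ties are broken by first-seen order, a simplification of this sketch.
        most_common_topology = counts.most_common(1)[0][0]
        if most_common_topology == TOPOLOGIES[0]:
            correct += 1
    return correct / replicates

if __name__ == "__main__":
    for n_genes in (1, 5, 25, 200):
        print(n_genes, majority_vote_accuracy(n_genes, internal_branch=0.2))
```

With a short internal branch a single gene recovers the species tree less than half the time in this model, while the vote over many genes becomes increasingly reliable.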

In diploid organisms

Incomplete lineage sorting commonly occurs in sexually reproducing species because a species cannot be traced back to a single individual or breeding pair. When populations are large (i.e. thousands of individuals), each gene has some diversity, and its gene tree includes lineages that pre-date the species. The larger the population, the longer these ancestral lineages persist. When large ancestral populations are combined with closely timed speciation events, different pieces of DNA retain conflicting affiliations, which makes it hard to determine a common ancestor or the points of branching.[5]
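The dependence on population size and on the spacing of speciation events can be written in closed form for the simplest case. The sketch below assumes three species, one gene copy sampled from each, a constant diploid effective population size in the ancestor of the two sister species, and no gene flow. Under the standard coalescent, the probability that the gene tree differs from the species tree is then (2/3)·exp(-t/(2N)), with t the number of generations between the two speciation events and N the effective population size. The function and parameter names are hypothetical.

```python
import math

def discordance_probability(generations_between_splits, effective_population_size):
    """P(gene tree differs from species tree) for three species, diploid model."""
    # Internal branch length in coalescent units (2N generations for diploids).
    t = generations_between_splits / (2.0 * effective_population_size)
    # With probability exp(-t) the two sister lineages fail to coalesce in
    # their shared ancestor; two of the three equally likely pairings that
    # can then form in the deeper ancestor are discordant.
    return (2.0 / 3.0) * math.exp(-t)

if __name__ == "__main__":
    # Same interval between speciation events, increasing ancestral population size.
    for n_e in (1_000, 10_000, 100_000):
        p = discordance_probability(20_000, n_e)
        print(f"Ne={n_e:>7}: P(discordant) = {p:.3f}")
```

The formula makes both effects visible: increasing N or decreasing t raises the discordance probability toward its maximum of 2/3.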

In human evolution

In human evolution, incomplete lineage sorting is used to describe hominin gene lineages that may not have sorted by the time speciation occurred in prehistory.[9] With the advent of genetic testing and genome sequencing, researchers found that the genetic relationships between hominin lineages may disagree with earlier understandings of their relatedness based on physical characteristics.[9] Moreover, divergence from the last common ancestor (LCA) does not necessarily occur at the same time as speciation.[10] The study of lineage sorting allows paleoanthropologists to explore genetic relationships and divergences that may not fit their previous speciation models based on phylogeny alone.[9]

Incomplete lineage sorting of the human family tree is an area of great interest. There are a number of unknowns when considering both the transition from archaic humans to modern humans and divergence of the other great apes from the hominin lineage.[11]

Ape and hominin/human divergence

Genome comparisons show that, at some loci, the divergence between human and chimpanzee gene copies dates further back than the speciation event that separated their common ancestor from gorillas. This means that genetic variation carried by the human-chimpanzee common ancestor was already present in the common ancestor of humans, chimpanzees, and gorillas. At such loci, the most recent common ancestor can be shared between gorilla and human rather than between chimpanzee and human.[10] As a result, the gene tree differs slightly from the species tree, or phylogeny.[12] In the species tree relating humans, bonobos, chimpanzees, and gorillas, the split between humans and the bonobo-chimpanzee ancestor occurred relatively close in time to the separation from gorillas:[10] humans and chimpanzees shared a common ancestor for several million years after separating from gorillas. Such closely spaced splits give rise to incomplete lineage sorting. Researchers now rely on DNA fragments to study the evolutionary relationships among humans and their relatives, in the hope of learning about speciation and ancestral processes from the genomes of different types of humans.[13]


  1. ^ Simpson, Michael G (2010-07-19). Plant Systematics. ISBN 9780080922089.
  2. ^ Kuritzin, A; Kischka, T; Schmitz, J; Churakov, G (2016). "Incomplete Lineage Sorting and Hybridization Statistics for Large-Scale Retroposon Insertion Data". PLOS Computational Biology. 12 (3): e1004812. Bibcode:2016PLSCB..12E4812K. doi:10.1371/journal.pcbi.1004812. PMC 4788455. PMID 26967525.
  3. ^ Suh, A; Smeds, L; Ellegren, H (2015). "The Dynamics of Incomplete Lineage Sorting across the Ancient Adaptive Radiation of Neoavian Birds". PLOS Biology. 13 (8): e1002224. doi:10.1371/journal.pbio.1002224. PMC 4540587. PMID 26284513.
  4. ^ Maddison, Wayne P. (1997-09-01). Wiens, John J. (ed.). "Gene Trees in Species Trees". Systematic Biology. Oxford University Press (OUP). 46 (3): 523–536. doi:10.1093/sysbio/46.3.523. ISSN 1076-836X.
  5. ^ a b c Rogers, Jeffrey; Gibbs, Richard A. (2014-05-01). "Comparative primate genomics: emerging patterns of genome content and dynamics". Nature Reviews Genetics. 15 (5): 347–359. doi:10.1038/nrg3707. PMC 4113315. PMID 24709753.
  6. ^ Shen, Xing-Xing; Hittinger, Chris Todd; Rokas, Antonis (2017). "Contentious relationships in phylogenomic studies can be driven by a handful of genes". Nature Ecology & Evolution. 1 (5): 126. doi:10.1038/s41559-017-0126. ISSN 2397-334X. PMC 5560076. PMID 28812701.
  7. ^ a b Futuyma, Douglas J. (2013-07-15). Evolution (3rd ed.). Sunderland, Massachusetts U.S.A. ISBN 9781605351155. OCLC 824532153.
  8. ^ Warnow, Tandy; Bayzid, Md Shamsuzzoha; Mirarab, Siavash (2016-05-01). "Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting". Systematic Biology. 65 (3): 366–380. doi:10.1093/sysbio/syu063. ISSN 1063-5157. PMID 25164915.
  9. ^ a b c Maddison, Wayne P. (1997-09-01). "Gene Trees in Species Trees". Systematic Biology. 46 (3): 523–536. doi:10.1093/sysbio/46.3.523. ISSN 1076-836X.
  10. ^ a b c Mailund, Thomas; Munch, Kasper; Schierup, Mikkel Heide (2014-11-23). "Lineage Sorting in Apes". Annual Review of Genetics. 48 (1): 519–535. doi:10.1146/annurev-genet-120213-092532. ISSN 0066-4197.
  11. ^ Nichols, Richard (July 2001). "Gene trees and species trees are not the same". Trends in Ecology & Evolution. 16 (7): 358–364. doi:10.1016/s0169-5347(01)02203-0. ISSN 0169-5347.
  12. ^ "Primate Speciation: A Case Study of African Apes | Learn Science at Scitable". Retrieved 2020-05-30.
  13. ^ Peyrégne, Stéphane; Boyle, Michael James; Dannemann, Michael; Prüfer, Kay (September 2017). "Detecting ancient positive selection in humans using extended lineage sorting". Genome Research. 27 (9): 1563–1572. doi:10.1101/gr.219493.116. ISSN 1088-9051. PMC 5580715. PMID 28720580.

In viruses

Incomplete lineage sorting is a common feature of viral phylodynamics. The phylogeny defined by transmission of a disease from one person to the next (that is, the population level tree) often does not correspond to the tree inferred from genetic analysis, because of the population bottlenecks that are an inherent feature of viral transmission. Figure 1 illustrates how this can occur. This is relevant to criminal transmission of HIV: in some criminal cases, a phylogenetic analysis of one or two genes from the strains of the accused and the victim has been used to infer transmission. However, because incomplete lineage sorting is common, transmission cannot be inferred solely on the basis of such a basic analysis.[1]


  1. ^ Leitner, Thomas (May 2019). "Phylogenetics in HIV transmission: taking within-host diversity into account". Current Opinion in HIV and AIDS. 14 (3): 181–187. doi:10.1097/COH.0000000000000536. ISSN 1746-630X. PMC 6449181. PMID 30920395.

External links

  • Venema, D. (2013-08-01). "Evolution Basics: Incomplete Lineage Sorting and Ancestral Population Sizes". BioLogos. Retrieved 29 June 2018.
  • Maddison, Wayne P.; Knowles, L. Lacey; Collins, Tim (2006). "Inferring Phylogeny Despite Incomplete Lineage Sorting". Systematic Biology. 55 (1): 21–30. doi:10.1080/10635150500354928. ISSN 1076-836X. PMID 16507521.
  • Joly, Simon; McLenachan, Patricia A.; Lockhart, Peter J. (2009). "A Statistical Approach for Distinguishing Hybridization and Incomplete Lineage Sorting". The American Naturalist. 174 (2): E54–E70. doi:10.1086/600082. PMID 19519219.
  • Carstens, Bryan C.; Knowles, L. Lacey; Collins, Tim (2007). "Estimating Species Phylogeny from Gene-Tree Probabilities Despite Incomplete Lineage Sorting: An Example from Melanoplus Grasshoppers". Systematic Biology. 56 (3): 400–411. doi:10.1080/10635150701405560. ISSN 1076-836X. PMID 17520504.
  • Scornavacca, C.; Galtier, N. (2017). "Incomplete lineage sorting in mammalian phylogenomics". Systematic Biology. 66 (1): 112–120. doi:10.1093/sysbio/syw082. PMID 28173480.