Incomplete lineage sorting, also termed deep coalescence, retention of ancestral polymorphism, or trans-species polymorphism, describes a phenomenon in population genetics in which ancestral gene copies fail to coalesce (looking backwards in time) into a common ancestral copy until a point earlier than previous speciation events. In other words, the tree produced by a single gene differs from the population or species level tree, producing a discordant tree. As a result, an inferred species level tree may differ depending on the genes selected for assessment. This is in contrast to complete lineage sorting, where the tree produced by the gene is the same as the population or species level tree. Both outcomes are common in phylogenetic analysis, although how often each occurs depends on the gene, organism, and sampling technique.
The concept of incomplete lineage sorting has some important implications for phylogenetic techniques. The concept itself is somewhat challenging and relies on the persistence of polymorphisms across successive speciation events. Suppose two successive speciation events occur: the ancestral species first gives rise to species A, and the remaining lineage later gives rise to species B and C. A single gene can exist as multiple haplotypes (a polymorphism), and a haplotype can be lost or fixed in a species by genetic drift. If the ancestral species has two haplotypes, species A will initially contain haplotypes 1 and 2, and through genetic drift and divergence by further mutation it can fix haplotype 1a. The lineage leading to species B and C, after its separation from species A, still contains haplotypes 1 and 2. This lineage thus shows incomplete sorting of the gene lineages. In species B haplotype 2 can become fixed, whereas haplotype 1b can become fixed in species C. A phylogeny based on this gene would group species A with species C, because haplotypes 1a and 1b are the most similar, and so would not represent the actual relationships between the species. In other words, the most closely related species will not necessarily inherit the most closely related haplotypes of a gene. This is of course a simplified example; real studies are usually more complex and involve more genes, more species, or both.
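The example above can be illustrated with a minimal simulation sketch. It assumes a simple coalescent model that is not part of the description above: one gene copy sampled per species, a constant population size, no gene flow, and time measured in coalescent units. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_gene_tree(internal_branch, rng):
    """Return the pair of species whose gene copies coalesce first.

    internal_branch is the time between the two speciation events in
    coalescent units (an assumption of this sketch, not a measured value).
    """
    # Looking backwards in time, the copies from B and C first share the
    # population ancestral to B and C only. In coalescent units the waiting
    # time for two lineages to coalesce is exponential with rate 1.
    waiting_time = rng.expovariate(1.0)
    if waiting_time < internal_branch:
        return "BC"  # lineages sorted: gene tree matches the species tree
    # Otherwise all three copies reach the population ancestral to A, B and C,
    # where each of the three pairs is equally likely to coalesce first.
    return rng.choice(["BC", "AB", "AC"])


def discordant_fraction(internal_branch, n_genes=10000, seed=1):
    """Fraction of simulated genes whose tree disagrees with the species tree."""
    rng = random.Random(seed)
    trees = [simulate_gene_tree(internal_branch, rng) for _ in range(n_genes)]
    return sum(tree != "BC" for tree in trees) / n_genes


if __name__ == "__main__":
    for branch in (0.1, 1.0, 5.0):
        print(branch, discordant_fraction(branch))
```

Under these assumptions, a short interval between the two speciation events leaves many genes with a tree that groups A with B or A with C, while a long interval leaves very few.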
Among primates, chimpanzees and bonobos are more closely related to each other than to any other taxon and are thus sister taxa. Still, for 1.6% of the bonobo genome, sequences are more closely related to their human homologues than to those of chimpanzees, which is probably a result of incomplete lineage sorting.
Incomplete lineage sorting has important implications for phylogenetic research: a phylogenetic tree built from a given gene may not reflect the actual relationships between species. However, gene flow between lineages by hybridization or horizontal gene transfer may produce the same conflicting phylogenetic tree. Distinguishing these processes can be difficult, but statistical approaches have been developed, and continue to be developed, to gain greater insight into these evolutionary dynamics. One way to reduce the impact of incomplete lineage sorting is to use multiple genes when constructing species or population phylogenies. In general, the more genes are used, the more reliable the phylogeny becomes.
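The benefit of using multiple genes can be sketched with a small simulation. This is an illustration under stated assumptions rather than an established inference method: it considers only three species, assumes independent genes and no gene flow, draws gene trees from the standard three-species coalescent probabilities, and takes the most frequent gene tree as the estimate of the species tree. The names `sample_gene_trees`, `plurality_topology` and `recovery_rate`, and all parameter values, are hypothetical.

```python
import math
import random
from collections import Counter

# Each topology is named by the pair of species grouped together.
# "BC" is assumed to be the true species grouping in this sketch.
TOPOLOGIES = ["BC", "AB", "AC"]


def sample_gene_trees(n_genes, internal_branch, rng):
    """Draw gene tree topologies for n_genes independent genes."""
    # Assumed model: each discordant topology has probability exp(-T) / 3,
    # where T is the internal branch length in coalescent units.
    p_each_discordant = math.exp(-internal_branch) / 3.0
    weights = [1.0 - 2.0 * p_each_discordant, p_each_discordant, p_each_discordant]
    return rng.choices(TOPOLOGIES, weights=weights, k=n_genes)


def plurality_topology(gene_trees):
    """Return the most frequent topology among the sampled gene trees."""
    return Counter(gene_trees).most_common(1)[0][0]


def recovery_rate(n_genes, internal_branch=0.5, replicates=500, seed=7):
    """Fraction of replicates in which the plurality topology is the true one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        trees = sample_gene_trees(n_genes, internal_branch, rng)
        if plurality_topology(trees) == "BC":
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    for n in (1, 5, 51):
        print(n, recovery_rate(n))
```

In this simplified setting, a single gene recovers the true grouping only somewhat more often than not, whereas a vote across several dozen genes recovers it in nearly every replicate. The sketch does not address hybridization or horizontal gene transfer, and a simple vote of this kind is not guaranteed to work for trees with more than three species.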
Incomplete lineage sorting commonly occurs in sexually reproducing species because a species usually cannot be traced back to a single individual or breeding pair. When populations are large (for example, thousands of individuals), each gene has some diversity, and the gene tree includes lineages that already existed before the species formed. The larger the population, the longer these ancestral lineages persist. When large ancestral populations are combined with closely timed speciation events, different pieces of DNA retain conflicting affiliations. This makes it hard to determine a common ancestor or the points of branching.
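The combined effect of population size and the timing of speciation events can be made concrete with a short calculation. The sketch below uses the standard coalescent result for three species, in which the probability that a gene tree disagrees with the species tree is two thirds of exp(-t / 2N), where t is the number of generations between the two speciation events and N is the effective population size of the ancestral population. It assumes a diploid population of constant size with no gene flow; the function name and the example values are hypothetical.

```python
import math


def discordance_probability(generations_between_splits, effective_population_size):
    """Probability that a gene tree disagrees with a three-species tree.

    Assumes a diploid, constant-size ancestral population and no gene flow.
    """
    # Convert the interval between speciation events to coalescent units.
    t = generations_between_splits / (2.0 * effective_population_size)
    # With probability exp(-t) the two lineages fail to coalesce in that
    # interval, and two of the three possible outcomes are then discordant.
    return (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    # Same interval between speciation events, increasingly large populations.
    for n_e in (1_000, 10_000, 100_000):
        print(n_e, round(discordance_probability(20_000, n_e), 4))
```

For a fixed interval between speciation events, the calculated probability of discordance rises with the effective population size and approaches two thirds as the interval becomes short relative to the population size.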
In the study of human evolution, incomplete lineage sorting is used to describe hominin lineages that may not have sorted out at the time that speciation occurred in prehistory. With the advent of genetic testing and genome sequencing, researchers found that the genetic relationships between hominin lineages might disagree with previous understandings of their relatedness based on physical characteristics. Moreover, divergence from the last common ancestor (LCA) does not necessarily occur at the same time as speciation. Analysis of lineage sorting allows paleoanthropologists to explore genetic relationships and divergences that may not fit earlier speciation models derived from phylogenies based on physical characteristics alone.
Incomplete lineage sorting of the human family tree is an area of great interest. There are a number of unknowns when considering both the transition from archaic humans to modern humans and divergence of the other great apes from the hominin lineage.
Genetic analyses indicate that, for some parts of the genome, the divergence between human and chimpanzee sequences dates further back than the split between the human and gorilla lineages. In other words, some of the genetic variation carried by the common ancestor of humans and chimpanzees was already present in the common ancestor of humans, chimpanzees, and gorillas. For such regions, the most recent common ancestor is shared between gorilla and human, so the gene tree differs slightly from the species tree. In the species tree relating humans, bonobos, chimpanzees, and gorillas, the separation of bonobos and chimpanzees occurred relatively close in time to the split between the bonobo-chimpanzee ancestor and humans, and humans and chimpanzees shared a common ancestor for several million years after separating from gorillas. Together, these circumstances give rise to incomplete lineage sorting. Researchers now rely on DNA fragments to study the evolutionary relationships among humans and their relatives, in the hope that genomes from different human lineages will provide information about speciation and ancestral processes.
Incomplete lineage sorting is a common feature in viral phylodynamics. The phylogeny represented by the transmission of a disease from one person to the next, which is to say the population level tree, often does not correspond to the tree created from a genetic analysis, because of the population bottlenecks that are an inherent feature of viral transmission. Figure 1 illustrates how this can occur. This is relevant to the criminal transmission of HIV: in some criminal cases, a phylogenetic analysis of one or two genes from the strains of the accused and the victim has been used to infer transmission. However, because incomplete lineage sorting is so common, transmission cannot be inferred solely on the basis of such a basic analysis.