Perfect phylogeny

Summary

Perfect phylogeny is a term used in computational phylogenetics to denote a phylogenetic tree in which all internal nodes may be labeled such that all characters evolve down the tree without homoplasy. That is, characteristics do not hold to evolutionary convergence, and do not have analogous structures. Statistically, this can be represented as an ancestor having state "0" in all characteristics where 0 represents a lack of that characteristic. Each of these characteristics changes from 0 to 1 exactly once and never reverts to state 0. It is rare that actual data adheres to the concept of perfect phylogeny.[1][2]

Building edit

In general there are two different data types that are used in the construction of a phylogenetic tree. In distance-based computations a phylogenetic tree is created by analyzing relationships among the distance between species and the edge lengths of a corresponding tree. Using a character-based approach employs character states across species as an input in an attempt to find the most "perfect" phylogenetic tree.[3][4]

The statistical components of a perfect phylogenetic tree can best be described as follows:[3]

A perfect phylogeny for an n x m character state matrix M is a rooted tree T with n leaves satisfying:


i. Each row of M labels exactly one leaf of T
ii. Each column of M labels exactly one edge of T
iii. Every interior edge of T is labeled by at least one column of M

iv. The characters associated with the edges along the unique path from root to a leaf v exactly specify the character vector of v, i.e. the character vector has a 1 entry in all columns corresponding to characters associated to path edges and a 0 entry otherwise.

It is worth noting that it is very rare to find actual phylogenetic data that adheres to the concepts and limitations detailed here. Therefore, it is often the case that researchers are forced to compromise by developing trees that simply try to minimize homoplasy, finding a maximum-cardinality set of compatible characters, or constructing phylogenies that match as closely as possible to the partitions implied by the characters.

Example edit

Both of these data sets illustrate examples of character state matrices. Using matrix M'1 one is able to observe that the resulting phylogenetic tree can be created such that each of the characters label exactly one edge of the tree. In contrast, when observing matrix M'2, one can see that there is no way to set up the phylogenetic tree such that each character labels only one edge length.[3] If the samples come from variant allelic frequency (VAF) data of a population of cells under study, the entries in the character matrix are frequencies of mutations, and take a value between 0 and 1. Namely, if   represents a position in the genome, then the entry corresponding to   and sample   will hold the frequencies of genomes in sample   with a mutation in position  .[5][6][7][8][9]

Usage edit

Perfect phylogeny is a theoretical framework that can also be used in more practical methods. One such example is that of Incomplete Directed Perfect Phylogeny. This concept involves utilizing perfect phylogenies with real, and therefore incomplete and imperfect, datasets. Such a method utilizes SINEs to determine evolutionary similarity. These Short Interspersed Elements are present across many genomes and can be identified by their flanking sequences. SINEs provide information on the inheritance of certain traits across different species. Unfortunately, if a SINE is missing it is difficult to know whether those SINEs were present prior to the deletion. By utilizing algorithms derived from perfect phylogeny data we are able to attempt to reconstruct a phylogenetic tree in spite of these limitations.[10]

Perfect phylogeny is also used in the construction of haplotype maps. By utilizing the concepts and algorithms described in perfect phylogeny one can determine information regarding missing and unavailable haplotype data.[11] By assuming that the set of haplotypes that result from genotype mapping corresponds and adheres to the concept of perfect phylogeny (as well as other assumptions such as perfect Mendelian inheritance and the fact that there is only one mutation per SNP), one is able to infer missing haplotype data.[12][13][14] [15]

Inferring a phylogeny from noisy VAF data under the PPM is a hard problem.[5] Most inference tools include some heuristic step to make inference computationally tractable. Examples of tools that infer phylogenies from noisy VAF data include AncesTree, Canopy, CITUP, EXACT, and PhyloWGS.[5][6][7][8][9] In particular, EXACT performs exact inference by using GPUs to compute a posterior probability on all possible trees for small size problems. Extensions to the PPM have been made with accompanying tools.[16][17] For example, tools such as MEDICC, TuMult, and FISHtrees allow the number of copies of a given genetic element, or ploidy, to both increase, or decrease, thus effectively allowing the removal of mutations.[18][19][20]

External links edit

  • List of phylogenetics software
  • One of several programs available for analysis and creation of phylogenetic trees
  • Another such program for phylogenetic tree analysis
  • Additional program for tree analysis Archived 2018-03-14 at the Wayback Machine
  • A paper detailing an example of how perfect phylogeny can be utilized outside of the field of genetics, as in language association
  • Github for "Algorithm for clonal tree reconstruction from multi-sample cancer sequencing data" (AncesTree)
  • Github for "Accessing Intra-Tumor Heterogeneity and Tracking Longitudinal and Spatial Clonal Evolutionary History by Next-Generation Sequencing" (Canopy)
  • Github for "Clonality Inference in Tumors Using Phylogeny" (CITUP)
  • Github for "Exact inference under the perfect phylogeny model" (EXACT)
  • Github for "Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors" (PhyloWGS)

References edit

  1. ^ Fernandez-Baca D. "The Perfect Phylogeny Problem" (PDF). Kluwer Academic Publishers. Retrieved 30 September 2012.
  2. ^ Nakhleh L, Ringe D, Warnow T. "Perfect Phylogenetic Networks: A New Methodology for Reconstructing the Evolutionary History of Natural Languages" (PDF). Retrieved 1 October 2012.
  3. ^ a b c Uhler C. "Finding a Perfect Phylogeny" (PDF). Archived from the original (PDF) on 4 March 2016. Retrieved 29 September 2012.
  4. ^ Nikaido M, Rooney AP, Okada N (August 1999). "Phylogenetic relationships among cetartiodactyls based on insertions of short and long interpersed elements: hippopotamuses are the closest extant relatives of whales". Proceedings of the National Academy of Sciences of the United States of America. 96 (18): 10261–6. Bibcode:1999PNAS...9610261N. doi:10.1073/pnas.96.18.10261. PMC 17876. PMID 10468596.
  5. ^ a b c El-Kebir M, Oesper L, Acheson-Field H, Raphael BJ (June 2015). "Reconstruction of clonal trees and tumor composition from multi-sample sequencing data". Bioinformatics. 31 (12): i62-70. doi:10.1093/bioinformatics/btv261. PMC 4542783. PMID 26072510.
  6. ^ a b Satas G, Raphael BJ (July 2017). "Tumor phylogeny inference using tree-constrained importance sampling". Bioinformatics. 33 (14): i152–i160. doi:10.1093/bioinformatics/btx270. PMC 5870673. PMID 28882002.
  7. ^ a b Malikic S, McPherson AW, Donmez N, Sahinalp CS (May 2015). "Clonality inference in multiple tumor samples using phylogeny". Bioinformatics. 31 (9): 1349–56. doi:10.1093/bioinformatics/btv003. PMID 25568283.
  8. ^ a b Ray S, Jia B, Safavi S, van Opijnen T, Isberg R, Rosch J, Bento J (2019-08-22). "Exact inference under the perfect phylogeny model". arXiv:1908.08623 [q-bio.QM].
  9. ^ a b Deshwar AG, Vembu S, Yung CK, Jang GH, Stein L, Morris Q (February 2015). "PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors". Genome Biology. 16 (1): 35. doi:10.1186/s13059-015-0602-8. PMC 4359439. PMID 25786235.
  10. ^ Pe'er I, Pupko T, Shamir R, Sharan R. "Incomplete Directed Perfect Phylogeny". Tel-Aviv University. Archived from the original on 20 October 2013. Retrieved 30 October 2012.{{cite web}}: CS1 maint: bot: original URL status unknown (link)
  11. ^ Eskin E, Halperin E, Karp RM (April 2003). "Efficient reconstruction of haplotype structure via perfect phylogeny" (PDF). Journal of Bioinformatics and Computational Biology. 1 (1). University of California, Berkeley: 1–20. doi:10.1142/S0219720003000174. PMID 15290779. Retrieved 30 October 2012.
  12. ^ Gusfield D. "An Overview of Computational Methods for Haplotype Inference" (PDF). University of California, Davis. Retrieved 18 November 2012.
  13. ^ Ding Z, Filkov V, Gusfield D. "A Linear Time Algorithm for the Perfect Phylogeny Haplotyping Problem". University of California, Davis. Retrieved 18 November 2012.
  14. ^ Bafna V, Gusfield D, Lancia G, Yooseph S (2003). "Haplotyping as perfect phylogeny: a direct approach". Journal of Computational Biology. 10 (3–4): 323–40. doi:10.1089/10665270360688048. PMID 12935331.
  15. ^ Seyalioglu H. "Haplotyping as Perfect Phylogeny" (PDF). Archived from the original (PDF) on 30 September 2011. Retrieved 30 October 2012.
  16. ^ Bonizzoni P, Carrieri AP, Della Vedova G, Trucco G (October 2014). "Explaining evolution via constrained persistent perfect phylogeny". BMC Genomics. 15 Suppl 6 (S6): S10. doi:10.1186/1471-2164-15-S6-S10. PMC 4240218. PMID 25572381.
  17. ^ Hajirasouliha I, Raphael BJ (2014), Brown D, Morgenstern B (eds.), "Reconstructing Mutational History in Multiply Sampled Tumors Using Perfect Phylogeny Mixtures", Algorithms in Bioinformatics, Lecture Notes in Computer Science, vol. 8701, Springer Berlin Heidelberg, pp. 354–367, doi:10.1007/978-3-662-44753-6_27, ISBN 9783662447529
  18. ^ Schwarz RF, Trinh A, Sipos B, Brenton JD, Goldman N, Markowetz F (April 2014). Beerenwinkel N (ed.). "Phylogenetic quantification of intra-tumour heterogeneity". PLOS Computational Biology. 10 (4): e1003535. arXiv:1306.1685. Bibcode:2014PLSCB..10E3535S. doi:10.1371/journal.pcbi.1003535. PMC 3990475. PMID 24743184.
  19. ^ Letouzé E, Allory Y, Bollet MA, Radvanyi F, Guyon F (2010). "Analysis of the copy number profiles of several tumor samples from the same patient reveals the successive steps in tumorigenesis". Genome Biology. 11 (7): R76. doi:10.1186/gb-2010-11-7-r76. PMC 2926787. PMID 20649963.
  20. ^ Gertz EM, Chowdhury SA, Lee WJ, Wangsa D, Heselmeyer-Haddad K, Ried T, et al. (2016-06-30). "FISHtrees 3.0: Tumor Phylogenetics Using a Ploidy Probe". PLOS ONE. 11 (6): e0158569. Bibcode:2016PLoSO..1158569G. doi:10.1371/journal.pone.0158569. PMC 4928784. PMID 27362268.