TRANSFAC

Summary

TRANSFAC (TRANScription FACtor database) is a manually curated database of eukaryotic transcription factors, their genomic binding sites and DNA binding profiles. The contents of the database can be used to predict potential transcription factor binding sites.

TRANSFAC
Content
Description: Transcription Factor Database
Data types captured: Eukaryotic transcription factors, their binding sites and binding profiles
Organisms: eukaryotes
Contact
Research center: Helmholtz Centre for Infection Research; BIOBASE GmbH; geneXplain GmbH
Primary citation: Wingender (2008)[1]
Release date: 1988
Access
Website: TRANSFAC 7.0 Public 2005

Introduction

The database originated in an early data collection published in 1988.[2] The first version released under the name TRANSFAC was developed at the former German National Research Centre for Biotechnology (now the Helmholtz Centre for Infection Research) and was designed for local installation.[3] Through one of the first publicly funded bioinformatics projects, launched in 1993, TRANSFAC developed into a resource available on the Internet.[4]

In 1997, TRANSFAC was transferred to a newly established company, BIOBASE, to secure long-term financing of the database. Since then, the most up-to-date version has required a license, whereas older versions are free for non-commercial users.[5][6] Since July 2016, TRANSFAC has been maintained and distributed by geneXplain GmbH, Wolfenbüttel, Germany.[7]

Content and features

The database is organized around the interaction between transcription factors (TFs) and their DNA binding sites (TFBSs). TFs are described in terms of their structural and functional features, as extracted from the original scientific literature. They are classified into families, classes and superclasses according to the features of their DNA-binding domains.[8][9][10][11]

Binding of a TF to a genomic site is documented by specifying the location of the site, its sequence and the experimental method applied. All sites that refer to one TF, or to a group of closely related TFs, are aligned and used to construct a count matrix (position frequency matrix), from which a position-specific scoring matrix (PSSM) can be derived. Many matrices in the TRANSFAC matrix library were constructed by a team of curators, while others were taken from scientific publications.
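As a rough illustration of the step from aligned sites to a matrix, the sketch below counts nucleotides column by column for a handful of hypothetical aligned sites and converts the counts into log2-odds scores. The example sites, the pseudocount of 0.25 and the uniform background are assumptions chosen for the illustration; they are not TRANSFAC data or TRANSFAC's own conversion procedure.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Count each base at each position of equal-length, gap-free aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = [{base: 0 for base in BASES} for _ in range(length)]
    for site in sites:
        for position, base in enumerate(site.upper()):
            counts[position][base] += 1  # only A, C, G and T are accepted
    return counts

def log_odds_pssm(counts, pseudocount=0.25, background=None):
    """Turn counts into log2-odds scores relative to a background distribution."""
    if background is None:
        background = {base: 0.25 for base in BASES}  # assumed uniform background
    pssm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pssm.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in BASES
        })
    return pssm

# Hypothetical aligned sites for a single factor (not taken from TRANSFAC).
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
counts = count_matrix(sites)
pssm = log_odds_pssm(counts)

for position, (column, scores) in enumerate(zip(counts, pssm), start=1):
    print(position, [column[b] for b in BASES], [round(scores[b], 2) for b in BASES])
```

Here `counts` has the shape of a count matrix, while `pssm` holds one log-odds score per base and position; positive scores mark bases seen more often than the assumed background would predict.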

Applications

The TRANSFAC database can be used as an encyclopedia of eukaryotic transcription factors. The target sequences and regulated genes can be listed for each TF; these can serve as benchmarks for TFBS recognition tools or as training sets for new TFBS recognition algorithms.[12] The TF classification makes it possible to analyze such data sets with regard to the properties of the DNA-binding domains.[13] Another application is to retrieve all TFs that regulate a given gene or set of genes. In systems-biology studies, the TF-target gene relations documented in TRANSFAC have been used to construct and analyze transcriptional regulatory networks.[14][15] By far the most frequent use of TRANSFAC is the computational prediction of potential TFBSs. A number of algorithms exist for this purpose, using either the individual binding sites or the matrix library:

  • Patch – analyzes sequence similarities with the binding sites documented in TRANSFAC; it is provided along with the database.[16][17]
  • SiteSeer – analyzes sequence similarities with the binding sites documented in TRANSFAC.[18][19]
  • Match – identifies potential TFBS using the matrix library; it is provided along with the database.[20][21]
  • TESS (Transcription Element Search System) – analyzes sequence similarity to TRANSFAC binding sites and also predicts potential binding sites using the matrix libraries of TRANSFAC and three other sources.[22][23] TESS also provides a program for identifying cis-regulatory modules (CRMs, characteristic combinations of TFBSs), which uses TRANSFAC matrices.[24]
  • PROMO – matrix-based prediction of TFBSs with the aid of the commercial database version[25][26]
  • TFM Explorer – identification of common potential TFBSs in a set of genes[27][28]
  • MotifMogul – matrix-based sequence analysis with a number of different algorithms[29]
  • ConTra – matrix-based sequence analysis in conserved promoter regions[30][31]
  • PMS (Poly Matrix Search) – matrix-based sequence analysis in conserved promoter regions[32][33]
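Most of the matrix-based tools above share one basic step: sliding a matrix along a sequence and scoring every window. A minimal sketch of that step is shown below, using a plain sum of log-odds scores on both strands, a hypothetical 4-bp toy matrix and a hypothetical threshold. It does not reproduce the scoring or cutoffs of Match, PROMO or any other listed tool; Match, for example, reports normalized similarity scores instead.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def window_score(pssm, window):
    """Sum the per-position scores of the bases in one window."""
    return sum(column[base] for column, base in zip(pssm, window))

def scan(pssm, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold.

    Starts are 0-based positions on the forward strand.
    """
    width = len(pssm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = window_score(pssm, site)
            if score >= threshold:
                hits.append((start, strand, round(score, 3)))
    return hits

# Hypothetical toy matrix for the motif GATA: +2 for the consensus base,
# -1 for any other base. Real scores would be derived from a count matrix.
TOY_PSSM = [{base: (2.0 if base == consensus else -1.0) for base in BASES}
            for consensus in "GATA"]

hits = scan(TOY_PSSM, "ccGATAagTATCtt", threshold=7.0)
print(hits)  # [(2, '+', 8.0), (8, '-', 8.0)]
```

Both hits are the same site type: a forward-strand GATA at position 2 and a reverse-strand GATA (read as TATC on the forward strand) at position 8. Scanning both strands matters because a factor can bind its site in either orientation.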

Comparison of matrices with the matrix library of TRANSFAC and other sources:

  • T-Reg Comparator[34] – comparison of individual matrices or groups of matrices with those of TRANSFAC or other libraries.
  • MACO[35][36] – matrix comparison with matrix libraries.
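Matrix comparison tools differ in how they align and score matrices (the MACO reference, for instance, describes a gapped-alignment score). As a generic sketch only, the code below normalizes two count matrices to frequencies, slides one against the other without gaps, and reports the offset with the highest mean column-wise Pearson correlation. The function names, the minimum overlap of four columns and the example matrices are hypothetical, and reverse-complement matching is omitted for brevity.

```python
import math

def pearson(x, y):
    """Pearson correlation; 0.0 if either vector has zero variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def to_frequencies(counts):
    """Normalize each column (a list of A, C, G, T counts) to sum to 1."""
    return [[c / sum(column) for c in column] for column in counts]

def best_ungapped_match(m1, m2, min_overlap=4):
    """Return (mean column correlation, offset) for the best ungapped alignment.

    An offset d pairs column i of m1 with column i - d of m2.
    """
    f1, f2 = to_frequencies(m1), to_frequencies(m2)
    best = (float("-inf"), None)
    for offset in range(-(len(f2) - min_overlap), len(f1) - min_overlap + 1):
        pairs = [(f1[i], f2[i - offset]) for i in range(len(f1))
                 if 0 <= i - offset < len(f2)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        if score > best[0]:
            best = (score, offset)
    return best

# Hypothetical count matrices (each column lists A, C, G, T counts).
m1 = [[0, 0, 0, 10], [0, 0, 10, 0], [10, 0, 0, 0], [0, 8, 2, 0],
      [0, 0, 0, 10], [1, 9, 0, 0], [10, 0, 0, 0]]
m2 = [[2, 3, 3, 2]] + m1[:5]  # one extra leading column, then m1's first five

score, offset = best_ungapped_match(m1, m2)
print(round(score, 3), offset)  # prints: 1.0 -1
```

Correlating frequency columns rather than raw counts makes matrices built from different numbers of sites comparable; averaging over the overlap keeps short and long overlaps on the same scale.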

A number of servers provide genomic annotations computed with the aid of TRANSFAC.[37][38] Others have used such analyses to infer target gene sets.[39][40]
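Deriving target gene sets from such genome-wide predictions amounts to grouping predicted sites by factor; inverting the same relation answers the question of which factors may regulate a given gene or set of genes. The sketch below assumes the predictions have already been reduced to (factor, gene) pairs; the identifiers TF1, TF2 and GENE_A to GENE_C are placeholders, not TRANSFAC entries.

```python
from collections import defaultdict

def target_gene_sets(pairs):
    """Group predicted (factor, gene) pairs into one target gene set per factor."""
    sets = defaultdict(set)
    for factor, gene in pairs:
        sets[factor].add(gene)
    return dict(sets)

def regulators_of(pairs, genes):
    """Return every factor with at least one predicted site in any of `genes`."""
    wanted = set(genes)
    return {factor for factor, gene in pairs if gene in wanted}

# Hypothetical predictions: one (factor, gene) pair per predicted binding site.
predictions = [
    ("TF1", "GENE_A"), ("TF1", "GENE_B"), ("TF2", "GENE_A"),
    ("TF2", "GENE_C"), ("TF1", "GENE_A"),  # a second TF1 site near GENE_A
]

print(target_gene_sets(predictions))
print(regulators_of(predictions, ["GENE_C"]))
```

The factor-to-gene mapping returned by `target_gene_sets` is also the adjacency list of a simple directed regulatory network, the kind of structure used in the network studies mentioned earlier.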

References

  1. ^ Wingender E (July 2008). "The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation". Brief. Bioinformatics. 9 (4): 326–32. doi:10.1093/bib/bbn016. PMID 18436575.
  2. ^ Wingender E (March 1988). "Compilation of transcription regulating proteins". Nucleic Acids Res. 16 (5): 1879–902. doi:10.1093/nar/16.5.1879. PMC 338188. PMID 3282223.
  3. ^ Wingender E, Heinemeyer T, Lincoln D (1991). "Regulatory DNA sequences: predictability of their function". Genome Analysis - from Sequence to Function; BioTechForum - Advances in Molecular Genetics (J. Collins, A.J. Driesel, Eds.). 4: 95–108.
  4. ^ Wingender E, Dietze P, Karas H, Knüppel R (January 1996). "TRANSFAC: a database on transcription factors and their DNA binding sites". Nucleic Acids Res. 24 (1): 238–41. doi:10.1093/nar/24.1.238. PMC 145586. PMID 8594589.
  5. ^ TRANSFAC Public on the gene regulation portal of BIOBASE
  6. ^ Access to TRANSFAC Public via TESS Archived 2012-07-24 at the Wayback Machine at the Computational Biology and Informatics Laboratory (CBIL) of University of Pennsylvania (Penn)
  7. ^ TRANSFAC taken over by geneXplain
  8. ^ Wingender E (1997). "[Classification of eukaryotic transcription factors]". Mol. Biol. (Mosk.) (in Russian). 31 (4): 584–600. PMID 9340487.
  9. ^ Heinemeyer T, Chen X, Karas H, Kel AE, Kel OV, Liebich I, Meinhardt T, Reuter I, Schacherer F, Wingender E (January 1999). "Expanding the TRANSFAC database towards an expert system of regulatory molecular mechanisms". Nucleic Acids Res. 27 (1): 318–22. doi:10.1093/nar/27.1.318. PMC 148171. PMID 9847216.
  10. ^ Stegmaier P, Kel AE, Wingender E (2004). "Systematic DNA-binding domain classification of transcription factors". Genome Inform. 15 (2): 276–86. PMID 15706513.
  11. ^ Wingender, E: The classification of transcription factors
  12. ^ Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Régnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z (January 2005). "Assessing computational tools for the discovery of transcription factor binding sites". Nat. Biotechnol. 23 (1): 137–44. doi:10.1038/nbt1053. PMID 15637633. S2CID 3234451.
  13. ^ Narlikar L, Gordân R, Ohler U, Hartemink AJ (July 2006). "Informative priors based on transcription factor structural class improve de novo motif discovery". Bioinformatics. 22 (14): e384–92. doi:10.1093/bioinformatics/btl251. PMID 16873497.
  14. ^ Goemann B, Wingender E, Potapov AP (2009). "An approach to evaluate the topological significance of motifs and other patterns in regulatory networks". BMC Syst Biol. 3: 53. doi:10.1186/1752-0509-3-53. PMC 2694767. PMID 19454001.
  15. ^ Kozhenkov S, Dubinina Y, Sedova M, Gupta A, Ponomarenko J, Baitaluk M (2010). "BiologicalNetworks 2.0--an integrative view of genome biology data". BMC Bioinformatics. 11: 610. doi:10.1186/1471-2105-11-610. PMC 3019228. PMID 21190573.
  16. ^ Patch on the free portal of BIOBASE
  17. ^ Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E (January 2006). "TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes". Nucleic Acids Res. 34 (Database issue): D108–10. doi:10.1093/nar/gkj143. PMC 1347505. PMID 16381825.
  18. ^ SiteSeer Archived 2011-06-25 at the Wayback Machine of the University of Manchester
  19. ^ Boardman PE, Oliver SG, Hubbard SJ (July 2003). "SiteSeer: Visualisation and analysis of transcription factor binding sites in nucleotide sequences". Nucleic Acids Res. 31 (13): 3572–5. doi:10.1093/nar/gkg511. PMC 168918. PMID 12824368.
  20. ^ Match on the free portal of BIOBASE
  21. ^ Kel AE, Gössling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E (July 2003). "MATCH: A tool for searching transcription factor binding sites in DNA sequences". Nucleic Acids Res. 31 (13): 3576–9. doi:10.1093/nar/gkg585. PMC 169193. PMID 12824369.
  22. ^ TESS (Transcription Element Search System) at CBIL of the University of Pennsylvania
  23. ^ Site Search at TESS Archived 2012-07-24 at the Wayback Machine
  24. ^ AnGEL CRM Searches Archived 2012-07-24 at the Wayback Machine in the TESS system
  25. ^ PROMO on the ALGGEN server of the Polytechnic University of Catalonia (UPC)
  26. ^ Messeguer X, Escudero R, Farré D, Núñez O, Martínez J, Albà MM (February 2002). "PROMO: detection of known transcription regulatory elements using species-tailored searches". Bioinformatics. 18 (2): 333–4. doi:10.1093/bioinformatics/18.2.333. PMID 11847087.
  27. ^ TFM Explorer on the bioinformatics software server of the SEQUOIA group
  28. ^ Tonon L, Touzet H, Varré JS (July 2010). "TFM-Explorer: mining cis-regulatory regions in genomes". Nucleic Acids Res. 38 (Web Server issue): W286–92. doi:10.1093/nar/gkq473. PMC 2896114. PMID 20522509.
  29. ^ MotifMogul of the Institute for Systems Biology in Seattle
  30. ^ ConTra at Ghent University
  31. ^ Hooghe B, Hulpiau P, van Roy F, De Bleser P (July 2008). "ConTra: a promoter alignment analysis tool for identification of transcription factor binding sites across species". Nucleic Acids Res. 36 (Web Server issue): W128–32. doi:10.1093/nar/gkn195. PMC 2447729. PMID 18453628.
  32. ^ PMS Archived 2012-07-10 at archive.today, developed at Nanjing University
  33. ^ Su G, Mao B, Wang J (2006). "A web server for transcription factor binding site prediction". Bioinformation. 1 (5): 156–7. doi:10.6026/97320630001156. PMC 1891680. PMID 17597879.
  34. ^ T-Reg Comparator Archived 2012-07-18 at archive.today on the server of the Max Planck Institute for Molecular Genetics
  35. ^ MACO Archived 2012-07-10 at archive.today, developed at Nanjing University
  36. ^ Su G, Mao B, Wang J (2006). "MACO: a gapped-alignment scoring tool for comparing transcription factor binding sites". In Silico Biol. 6 (4): 307–10. PMID 16922693.
  37. ^ PReMOD: human and mouse genome versions of 2004 and 2005; IRCM / McGill University, Montreal
  38. ^ PRIMA: human genome version of 2004; Tel-Aviv University
  39. ^ MSigDB: Mammalian transcription factor target gene sets; GSEA wiki server of Broad Institute of MIT and Harvard, Cambridge, MA
  40. ^ Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M (March 2005). "Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals". Nature. 434 (7031): 338–45. Bibcode:2005Natur.434..338X. doi:10.1038/nature03441. PMC 2923337. PMID 15735639.

External links

  • History of the TRANSFAC database on the homepage of Edgar Wingender