Biological databases are stores of biological information. The journal Nucleic Acids Research regularly publishes special issues on biological databases and maintains a list of such databases. The 2018 issue lists about 180 such databases, along with updates to previously described databases. The Omics Discovery Index can be used to browse and search several biological databases.
Meta-databases are databases of databases: they collect data about other data sources and combine it to generate new data. They can merge information from different sources and make it available in a new, more convenient form, or with an emphasis on a particular disease or organism. A metadatabase is a database model for metadata management, global querying of independent databases, and distributed data processing. The word "metadatabase" has since been added to the dictionary. Originally, "metadata" was simply a common term for data about data, such as tags, keywords, and markup headers.
ConsensusPathDB: a molecular functional interaction database, integrating information from 12 other databases
DDBJ (Japan), GenBank (USA) and the European Nucleotide Archive (Europe) are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions and then exchange new and updated data on a daily basis to keep their contents synchronised. These three databases are primary databases, as they house original sequence data. They collaborate with the Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.
1000 Genomes Project: launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
EggNOG Database: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. It provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation.
These databases collect genome sequences, annotate and analyze them, and provide public access. Some also curate the experimental literature to improve computed annotations. These databases may hold the genomes of many species or the genome of a single model organism.
ArrayExpress: an archive of functional genomics data from EMBL that stores data from high-throughput functional genomics experiments
Ensembl Genomes: provides genome-scale data for bacteria, protists, fungi, plants and invertebrate metazoa, through a unified set of interactive and programmatic interfaces (using the Ensembl software platform)
Gene Expression Omnibus (GEO): a public functional genomics data repository from the U.S. National Center for Biotechnology Information (NCBI), which supports array- and sequence-based data. Tools for querying and downloading gene expression profiles are provided.
Human Protein Atlas (HPA): a public database of expression profiles of human protein-coding genes, at both the mRNA and protein levels, in tissues, cells, subcellular compartments, and cancer tumors.
PHI-base: pathogen-host interaction database. It links gene information to phenotypic information from microbial pathogens on their hosts. Information is manually curated from the peer-reviewed literature.
Several publicly available data repositories and resources have been developed to support and manage protein related information, biological knowledge discovery and data-driven hypothesis generation. The databases in the table below are selected from the databases listed in the Nucleic Acids Research (NAR) databases issues and database collection and the databases cross-referenced in the UniProtKB. Most of these databases are cross-referenced with UniProt / UniProtKB so that identifiers can be mapped to each other.
Database Short Name
The Consensus CDS protein set database
DNA Data Bank of Japan
European Nucleotide Archive
GenBank nucleotide sequence database
NCBI Reference Sequence Database
Database of computationally identified transcripts from the same locus
Universal Protein Resource (UniProt)
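Because most of the databases in the table are cross-referenced with UniProt/UniProtKB, an identifier from one database can be translated into another by going through the shared UniProt accession. The Python sketch below illustrates that idea on a small, hand-made set of records. The record layout, the accessions (`ACC1`, `ACC2`), and all identifiers are hypothetical placeholders, and the code does not call any real UniProt service.

```python
from collections import defaultdict

# Hypothetical, hand-made records: each UniProtKB-style accession lists
# identifiers for the same protein in other databases (cross-references).
# These are placeholder values, not real database entries.
RECORDS = {
    "ACC1": {"RefSeq": ["REFSEQ_A"], "PDB": ["PDB_A1", "PDB_A2"]},
    "ACC2": {"RefSeq": ["REFSEQ_B"], "PDB": []},
}


def build_reverse_index(records):
    """Map each (database, identifier) pair back to the hub accession(s)."""
    index = defaultdict(set)
    for accession, xrefs in records.items():
        for database, identifiers in xrefs.items():
            for identifier in identifiers:
                index[(database, identifier)].add(accession)
    return index


def map_identifier(records, index, source_db, identifier, target_db):
    """Translate an identifier from source_db to target_db via the hub accession."""
    results = set()
    # .get() avoids inserting empty entries into the defaultdict for unknown IDs
    for accession in index.get((source_db, identifier), set()):
        results.update(records[accession].get(target_db, []))
    return sorted(results)


index = build_reverse_index(RECORDS)
print(map_identifier(RECORDS, index, "RefSeq", "REFSEQ_A", "PDB"))
# prints ['PDB_A1', 'PDB_A2']
```

Routing every lookup through one hub accession means each database only needs cross-references to UniProt, rather than a separate mapping to every other database.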
3D structure protein databases
Database Short Name
Database of Protein Disorder
Database of intrinsically disordered and mobile proteins
Database of Comparative Protein Structure Models
Pictorial database of 3D structures in the Protein Data Bank
Protein Model Portal of the PSI-Nature Structural Biology Knowledgebase
Numerous databases collect information about species and other taxonomic categories. The Catalogue of Life is a special case as it is a meta-database of about 150 specialized "global species databases" (GSDs) that have collected the names and other information on (almost) all described and thus "known" species.
BacDive: bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information
EzTaxon-e: database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences
NCBI Taxonomy: a taxonomic database operated by NCBI and concentrating on all taxa for which DNA sequences are available (those sequences are stored by GenBank, another database operated by NCBI).
Images play a critical role in biomedicine and in disciplines ranging from anthropology to zoology. However, relatively few databases are dedicated to image collection, although some projects, such as iNaturalist, collect photos as a main part of their data. Three-dimensional images, such as protein structures or 3D reconstructions of anatomical structures, are a special case. Image databases include, among others:
The Cancer Genome Atlas (TCGA): provides data from hundreds of cancer samples obtained using high-throughput techniques such as gene expression profiling, copy number variation profiling, SNP genotyping, genome-wide DNA methylation profiling, microRNA profiling, and exon sequencing of at least 1,200 genes
DiProDB: a database to collect and analyse thermodynamic, structural and other dinucleotide properties
Housekeeping and Reference Transcript Atlas (HRT Atlas): a web-based tool for searching cell-specific candidate reference genes and transcripts suitable for normalizing qPCR experiments. HRT Atlas also provides a complete list of human and mouse housekeeping genes and transcripts.
Dryad: repository of data underlying scientific publications in the basic and applied biosciences
Nucleic Acids Research Molecular Biology Database Collection: over 1,600 databases