Combinatorial chemistry comprises chemical synthetic methods that make it possible to prepare a large number (tens to thousands or even millions) of compounds in a single process. These compound libraries can be made as mixtures, sets of individual compounds or chemical structures generated by computer software.[1] Combinatorial chemistry can be used for the synthesis of small molecules and for peptides.
Strategies that allow identification of useful components of the libraries are also part of combinatorial chemistry. The methods used in combinatorial chemistry are applied outside chemistry, too.
The basic principle of combinatorial chemistry is to prepare libraries of a very large number of compounds and identify those which are useful as potential drugs or agrochemicals. This relies on high-throughput screening which is capable of assessing the output at sufficient scale.[2]
Although combinatorial chemistry has only really been taken up by industry since the 1990s,[3] its roots can be seen as far back as the 1960s when a researcher at Rockefeller University, Bruce Merrifield, started investigating the solid-phase synthesis of peptides. Synthesis of peptides in a combinatorial fashion quickly leads to large numbers of molecules. Using the twenty natural amino acids, for example, in a tripeptide creates 8,000 (203) possibilities. Solid-phase methods for small molecules were later introduced and Furka devised a "split and mix" approach[2][4]
In its modern form, combinatorial chemistry has probably had its biggest impact in the pharmaceutical industry.[5] Researchers attempting to optimize the activity profile of a compound create a 'library' of many different but related compounds.[6][7] Advances in robotics have led to an industrial approach to combinatorial synthesis, enabling companies to routinely produce over 100,000 new and unique compounds per year.[8]
In order to handle the vast number of structural possibilities, researchers often create a 'virtual library', a computational enumeration of all possible structures of a given pharmacophore with all available reactants.[9] Such a library can consist of thousands to millions of 'virtual' compounds. The researcher will select a subset of the 'virtual library' for actual synthesis, based upon various calculations and criteria (see ADME, computational chemistry, and QSAR).
Combinatorial split-mix (split and pool) synthesis [10][11] is based on the solid-phase synthesis developed by Merrifield.[12] If a combinatorial peptide library is synthesized using 20 amino acids (or other kinds of building blocks) the bead form solid support is divided into 20 equal portions. This is followed by coupling a different amino acid to each portion. The third step is the mixing of all portions. These three steps comprise a cycle. Elongation of the peptide chains can be realized by simply repeating the steps of the cycle.
The procedure is illustrated by the synthesis of a dipeptide library using the same three amino acids as building blocks in both cycles. Each component of this library contains two amino acids arranged in different orders. The amino acids used in couplings are represented by yellow, blue and red circles in the figure. Divergent arrows show dividing solid support resin (green circles) into equal portions, vertical arrows mean coupling and convergent arrows represent mixing and homogenizing the portions of the support.
The figure shows that in the two synthetic cycles 9 dipeptides are formed. In the third and fourth cycles, 27 tripeptides and 81 tetrapeptides would form, respectively.
The "split-mix synthesis" has several outstanding features:
In 1990 three groups described methods for preparing peptide libraries by biological methods[13][14][15] and one year later Fodor et al. published a remarkable method for synthesis of peptide arrays on small glass slides.[16]
A "parallel synthesis" method was developed by Mario Geysen and his colleagues for preparation of peptide arrays.[17] They synthesized 96 peptides on plastic rods (pins) coated at their ends with the solid support. The pins were immersed into the solution of reagents placed in the wells of a microtiter plate. The method is widely applied particularly by using automatic parallel synthesizers. Although the parallel method is much slower than the real combinatorial one, its advantage is that it is exactly known which peptide or other compound forms on each pin.
Further procedures were developed to combine the advantages of both split-mix and parallel synthesis. In the method described by two groups[18][19] the solid support was enclosed into permeable plastic capsules together with a radiofrequency tag that carried the code of the compound to be formed in the capsule. The procedure was carried out similar to the split-mix method. In the split step, however, the capsules were distributed among the reaction vessels according to the codes read from the radiofrequency tags of the capsules.
A different method for the same purpose was developed by Furka et al.[20] is named "string synthesis". In this method, the capsules carried no code. They are strung like the pearls in a necklace and placed into the reaction vessels in stringed form. The identity of the capsules, as well as their contents, are stored by their position occupied on the strings. After each coupling step, the capsules are redistributed among new strings according to definite rules.
In the drug discovery process, the synthesis and biological evaluation of small molecules of interest have typically been a long and laborious process. Combinatorial chemistry has emerged in recent decades as an approach to quickly and efficiently synthesize large numbers of potential small molecule drug candidates. In a typical synthesis, only a single target molecule is produced at the end of a synthetic scheme, with each step in a synthesis producing only a single product. In a combinatorial synthesis, when using only single starting material, it is possible to synthesize a large library of molecules using identical reaction conditions that can then be screened for their biological activity. This pool of products is then split into three equal portions containing each of the three products, and then each of the three individual pools is then reacted with another unit of reagent B, C, or D, producing 9 unique compounds from the previous 3. This process is then repeated until the desired number of building blocks is added, generating many compounds. When synthesizing a library of compounds by a multi-step synthesis, efficient reaction methods must be employed, and if traditional purification methods are used after each reaction step, yields and efficiency will suffer.
Solid-phase synthesis offers potential solutions to obviate the need for typical quenching and purification steps often used in synthetic chemistry. In general, a starting molecule is adhered to a solid support (typically an insoluble polymer), then additional reactions are performed, and the final product is purified and then cleaved from the solid support. Since the molecules of interest are attached to a solid support, it is possible to reduce the purification after each reaction to a single filtration/wash step, eliminating the need for tedious liquid-liquid extraction and solvent evaporation steps that most synthetic chemistry involves. Furthermore, by using heterogeneous reactants, excess reagents can be used to drive sluggish reactions to completion, which can further improve yields. Excess reagents can simply be washed away without the need for additional purification steps such as chromatography.
Over the years, a variety of methods have been developed to refine the use of solid-phase organic synthesis in combinatorial chemistry, including efforts to increase the ease of synthesis and purification, as well as non-traditional methods to characterize intermediate products. Although the majority of the examples described here will employ heterogeneous reaction media in every reaction step, Booth and Hodges provide an early example of using solid-supported reagents only during the purification step of traditional solution-phase syntheses.[21] In their view, solution-phase chemistry offers the advantages of avoiding attachment and cleavage reactions necessary to anchor and remove molecules to resins as well as eliminating the need to recreate solid-phase analogues of established solution-phase reactions.
The single purification step at the end of a synthesis allows one or more impurities to be removed, assuming the chemical structure of the offending impurity is known. While the use of solid-supported reagents greatly simplifies the synthesis of compounds, many combinatorial syntheses require multiple steps, each of which still requires some form of purification. Armstrong, et al. describe a one-pot method for generating combinatorial libraries, called multiple-component condensations (MCCs).[22] In this scheme, three or more reagents react such that each reagent is incorporated into the final product in a single step, eliminating the need for a multi-step synthesis that involves many purification steps. In MCCs, there is no deconvolution required to determine which compounds are biologically active because each synthesis in an array has only a single product, thus the identity of the compound should be unequivocally known.
In another array synthesis, Still generated a large library of oligopeptides by split synthesis.[23] The drawback to making many thousands of compounds is that it is difficult to determine the structure of the formed compounds. Their solution is to use molecular tags, where a tiny amount (1 pmol/bead) of a dye is attached to the beads, and the identity of a certain bead can be determined by analyzing which tags are present on the bead. Despite how easy attaching tags makes identification of receptors, it would be quite impossible to individually screen each compound for its receptor binding ability, so a dye was attached to each receptor, such that only those receptors that bind to their substrate produce a color change.
When many reactions need to be run in an array (such as the 96 reactions described in one of Armstrong's MCC arrays), some of the more tedious aspects of synthesis can be automated to improve efficiency. This work, the "DIVERSOMER method" was pioneered at Parke-Davis in the early 1990s to run up to 40 chemical reactions in parallel. These efforts led to the first commercially available equipment for combinatorial chemistry (Diversomer synthesizer which was sold by Chemglass) and the first use of liquid handling robotics within a chemistry labortory.[24][25] This method uses a device that automates the resin loading and wash cycles, as well as the reaction cycle monitoring and purification, and demonstrates the feasibility of their method and apparatus by using it to synthesize a variety of molecule classes, such as hydantoins and benzodiazepines, running 8 or 40 individual reactions in parallel. This and several other pioneering efforts in combinatorial chemistry were featured as "classical" papers in the field in 1999.[26]
Oftentimes, it is not possible to use expensive equipment, and Schwabacher, et al. describe a simple method of combining parallel synthesis of library members and evaluation of entire libraries of compounds.[27] In their method, a thread that is partitioned into different regions is wrapped around a cylinder, where a different reagent is then coupled to each region which bears only a single species. The thread is then re-divided and wrapped around a cylinder of a different size, and this process is then repeated. The beauty of this method is that the identity of each product can be known simply by its location along the thread, and the corresponding biological activity is identified by Fourier transformation of fluorescence signals.
In most of the syntheses described here, it is necessary to attach and remove the starting reagent to/from a solid support. This can lead to the generation of a hydroxyl group, which can potentially affect the biological activity of a target compound. Ellman uses solid phase supports in a multi-step synthesis scheme to obtain 192 individual 1,4-benzodiazepine derivatives, which are well-known therapeutic agents.[28] To eliminate the possibility of potential hydroxyl group interference, a novel method using silyl-aryl chemistry is used to link the molecules to the solid support which cleaves from the support and leaves no trace of the linker.
When anchoring a molecule to a solid support, intermediates cannot be isolated from one another without cleaving the molecule from the resin. Since many of the traditional characterization techniques used to track reaction progress and confirm product structure are solution-based, different techniques must be used. Gel-phase 13C NMR spectroscopy, MALDI mass spectrometry, and IR spectroscopy have been used to confirm structure and monitor the progress of solid-phase reactions.[29] Gordon et al., describe several case studies that utilize imines and peptidyl phosphonates to generate combinatorial libraries of small molecules.[29] To generate the imine library, an amino acid tethered to a resin is reacted in the presence of an aldehyde. The authors demonstrate the use of fast 13C gel phase NMR spectroscopy and magic angle spinning 1 H NMR spectroscopy to monitor the progress of reactions and showed that most imines could be formed in as little as 10 minutes at room temperature when trimethyl orthoformate was used as the solvent. The formed imines were then derivatized to generate 4-thiazolidinones, B-lactams, and pyrrolidines.
The use of solid-phase supports greatly simplifies the synthesis of large combinatorial libraries of compounds. This is done by anchoring a starting material to a solid support and then running subsequent reactions until a sufficiently large library is built, after which the products are cleaved from the support. The use of solid-phase purification has also been demonstrated for use in solution-phase synthesis schemes in conjunction with standard liquid-liquid extraction purification techniques.
Combinatorial libraries are special multi-component mixtures of small-molecule chemical compounds that are synthesized in a single stepwise process. They differ from collection of individual compounds as well as from series of compounds prepared by parallel synthesis. It is an important feature that mixtures are used in their synthesis. The use of mixtures ensures the very high efficiency of the process. Both reactants can be mixtures and in this case the procedure would be even more efficient. For practical reasons however, it is advisable to use the split-mix method in which one of two mixtures is replaced by single building blocks (BBs). The mixtures are so important that there are no combinatorial libraries without using mixture in the synthesis, and if a mixture is used in a process inevitably combinatorial library forms. The split-mix synthesis is usually realized using solid support but it is possible to apply it in solution, too. Since he structures the components are unknown deconvolution methods need to be used in screening. One of the most important features of combinatorial libraries is that the whole mixture can be screened in a single process. This makes these libraries very useful in pharmaceutical research. Partial libraries of full combinatorial libraries can also be synthesized. Some of them can be used in deconvolution[30]
If the synthesized molecules of a combinatorial library are cleaved from the solid support a soluble mixture forms. In such solution, millions of different compounds may be found. When this synthetic method was developed, it first seemed impossible to identify the molecules, and to find molecules with useful properties. Strategies for identification of the useful components had been developed, however, to solve the problem. All these strategies are based on synthesis and testing of partial libraries. An early iterative strategy was devised by Furka in 1982.[4] The method was later independently published by Erb et al. under the name "Recursive deconvolution"[31]
The method is made understandable by the figure. A 27-member peptide library is synthesized from three amino acids. After the first (A) and second (B) cycles samples were set aside before mixing them. The products of the third cycle (C) are cleaved down before mixing then are tested for activity. Suppose the group labeled by + sign is active. All members have the red amino acid at the last coupling position (CP). Consequently, the active member also has the red amino acid at the last CP. Then the red amino acid is coupled to the three samples set aside after the second cycle (B) to get samples D. After cleaving, the three E samples are formed. If after testing the sample marked by + is the active one it shows that the blue amino acid occupies the second CP in the active component. Then to the three A samples first the blue then the red amino acid is coupled (F) then tested again after cleaving (G). If the + component proves to be active, the sequence of the active component is determined and shown in H.
Positional scanning was introduced independently by Furka et al.[32] and Pinilla et al.[33] The method is based on the synthesis and testing of series of sublibraries. in which a certain sequence position is occupied by the same amino acid. The figure shows the nine sublibraries (B1-D3) of a full peptide trimer library (A) made from three amino acids. In sublibraries there is a position which is occupied by the same amino acid in all components. In the synthesis of a sublibrary the support is not divided and only one amino acid is coupled to the whole sample. As a result, one position is really occupied by the same amino acid in all components. For example, in the B2 sublibrary position 2 is occupied by the "yellow" amino acid in all the nine components. If in a screening test this sublibrary gives positive answer it means that position 2 in the active peptide is also occupied by the "yellow" amino acid. The amino acid sequence can be determined by testing all the nine (or sometime less) sublibraries.
In omission libraries[34][35] a certain amino acid is missing from all peptides of the mixture. The figure shows the full library and the three omission libraries. At the top the omitted amino acids are shown. If the omission library gives a negative test the omitted amino acid is present in the active component.
If the peptides are not cleaved from the solid support we deal with a mixture of beads, each bead containing a single peptide. Smith and his colleagues[36] showed earlier that peptides could be tested in tethered form, too. This approach was also used in screening peptide libraries. The tethered peptide library was tested with a dissolved target protein. The beads to which the protein was attached were picked out, removed the protein from the bead then the tethered peptide was identified by sequencing. A somewhat different approach was followed by Taylor and Morken.[37] They used infrared thermography to identify catalysts in non-peptide tethered libraries. The method is based on the heat that is evolved in the beads that contain a catalyst when the tethered library immersed into a solution of a substrate. When the beads are examined through an infrared microscope the catalyst containing beads appear as bright spots and can be picked out.
If we deal with a non-peptide organic libraries library it is not as simple to determine the identity of the content of a bead as in the case of a peptide one. In order to circumvent this difficulty methods had been developed to attach to the beads, in parallel with the synthesis of the library, molecules that encode the structure of the compound formed in the bead. Ohlmeyer and his colleagues published a binary encoding method[38] They used mixtures of 18 tagging molecules that after cleaving them from the beads could be identified by Electron Capture Gas Chromatography. Sarkar et al. described chiral oligomers of pentenoic amides (COPAs) that can be used to construct mass encoded OBOC libraries.[39] Kerr et al. introduced an innovative encoding method[40] An orthogonally protected removable bifunctional linker was attached to the beads. One end of the linker was used to attach the non-natural building blocks of the library while to the other end encoding amino acid triplets were linked. The building blocks were non-natural amino acids and the series of their encoding amino acid triplets could be determined by Edman degradation. The important aspect of this kind of encoding was the possibility to cleave down from the beads the library members together with their attached encoding tags forming a soluble library. The same approach was used by Nikolajev et al. for encoding with peptides.[41] In 1992 by Brenner and Lerner introduced DNA sequences to encode the beads of the solid support that proved to be the most successful encoding method.[42] Nielsen, Brenner and Janda also used the Kerr approach for implementing the DNA encoding[43] In the latest period of time there were important advancements in DNA sequencing. The next generation techniques make it possible to sequence large number of samples in parallel that is very important in screening of DNA encoded libraries. There was another innovation that contributed to the success of DNA encoding. In 2000 Halpin and Harbury omitted the solid support in the split-mix synthesis of the DNA encoded combinatorial libraries and replaced it by the encoding DNA oligomers. In solid phase split and pool synthesis the number of components of libraries can't exceed the number of the beads of the support. By the novel approach of the authors, this restraint was eliminated and made it possible to prepare new compounds in practically unlimited number.[44] The Danish company Nuevolution for example synthesized a DNA encoded library containing 40 trillion! components[45] The DNA encoded libraries are soluble that makes possible to apply the efficient affinity binding in screening. Some authors apply the DEL for acromim of DNA encoded combinatorial libraries others are using DECL. The latter seems better since in this name the combinatorial nature of these libraries is clearly expressed. Several types of DNA encoded combinatorial libraries had been introduced and described in the first decade of the present millennium. These libraries are very successfully applied in drug research.
Details are found about their synthesis and application in the page DNA-encoded chemical library. The DNA encoded soluble combinatorial libraries have drawbacks, too. First of all the advantage coming from the use of solid support is completely lost. In addition, the polyionic character of DNA encoding chains limits the utility of non-aqueous solvents in the synthesis. For this reason many laboratories choose to develop DNA compatible reactions for use in the synthesis of DECLs. Quite a few of available ones are already described[51][52][53]
Materials science has applied the techniques of combinatorial chemistry to the discovery of new materials. This work was pioneered by P.G. Schultz et al. in the mid-nineties [54] in the context of luminescent materials obtained by co-deposition of elements on a silicon substrate. His work was preceded by J. J. Hanak in 1970[55] but the computer and robotics tools were not available for the method to spread at the time. Work has been continued by several academic groups[56][57][58][59] as well as companies with large research and development programs (Symyx Technologies, GE, Dow Chemical etc.). The technique has been used extensively for catalysis,[60] coatings,[61] electronics,[62] and many other fields.[63] The application of appropriate informatics tools is critical to handle, administer, and store the vast volumes of data produced.[64] New types of design of experiments methods have also been developed to efficiently address the large experimental spaces that can be tackled using combinatorial methods.[65]
Even though combinatorial chemistry has been an essential part of early drug discovery for more than two decades, so far only one de novo combinatorial chemistry-synthesized chemical has been approved for clinical use by FDA (sorafenib, a multikinase inhibitor indicated for advanced renal cancer).[66] The analysis of the poor success rate of the approach has been suggested to connect with the rather limited chemical space covered by products of combinatorial chemistry.[67] When comparing the properties of compounds in combinatorial chemistry libraries to those of approved drugs and natural products, Feher and Schmidt[67] noted that combinatorial chemistry libraries suffer particularly from the lack of chirality, as well as structure rigidity, both of which are widely regarded as drug-like properties. Even though natural product drug discovery has not probably been the most fashionable trend in the pharmaceutical industry in recent times,[2] a large proportion of new chemical entities still are nature-derived compounds,[68][69][70][71] and thus, it has been suggested that effectiveness of combinatorial chemistry could be improved by enhancing the chemical diversity of screening libraries.[72] As chirality and rigidity are the two most important features distinguishing approved drugs and natural products from compounds in combinatorial chemistry libraries, these are the two issues emphasized in so-called diversity oriented libraries, i.e. compound collections that aim at coverage of the chemical space, instead of just huge numbers of compounds.[73][74][75][76][77][78]
In the 8th edition of the International Patent Classification (IPC), which entered into force on January 1, 2006, a special subclass has been created for patent applications and patents related to inventions in the domain of combinatorial chemistry: "C40B".