About BEST-CSP

Description of the Action

The physical and chemical stability of organic molecules is important for the rational design and development of fine chemicals (e.g., pharmaceuticals and agrochemicals). Crystal structure prediction, the computational generation of crystal structures and energy rankings, has become an important tool in finding crystalline forms and determining their relative stability; however, despite the large quantity of thermodynamic data in the literature, no well-defined benchmark of equilibrium data of crystalline polymorphs of pharmaceutical and other technically important molecules exists against which computational results can be validated. Through this Action, a set of benchmark compounds will be established through tight collaboration between experimental and computational scientists.

The Action will result in a standard against which computational methods can be tested and validated in the future. Moreover, the Action will organise a blind test similar to the computational crystal structure prediction blind tests organised by the Cambridge Crystallographic Data Centre, but with a focus on thermodynamics and the prediction of physical properties. This close collaboration, fostered by training PhD students in both computational and experimental disciplines to secure optimal synergy between them, will advance the general understanding of crystalline polymorphism and facilitate industrial formulation processes that depend on polymorph stability.

Background

How accurate are experimental data and computational results? Can computational outcomes increase experimental accuracy? This COST proposal addresses the foundations of scientific observation and will improve the accuracy of design and quality control. For example, a drug formulation must remain within its specifications up to its expiration date; therefore, the physical and chemical stability profiles of active pharmaceutical ingredients (APIs) are drawn up. One key concern is structural polymorphism (Figure 1): at a given temperature and pressure, only one crystal structure is thermodynamically the most stable. Examples such as rotigotine¹ and ritonavir² demonstrate that the sudden appearance of an unknown, more stable polymorph in drug formulations can have serious societal and financial consequences. A survey of polymorph screening results from two pharmaceutical companies reveals that the majority of drug molecules exhibit polymorphism.³ The fact that many organic molecules can crystallise as salts, co-crystals, solvates and mixtures thereof only increases the complexity of the problem. Besides being critical for pharmaceutical formulations, the stability of crystalline molecular materials is also essential for the manufacture and performance of dyes, explosives, agrochemicals, and other functional materials.

Although Haleblian and McCrone introduced polymorphism to pharmacists sixty years ago,^4,5 both industry and academia were surprised by the impact of “disappearing polymorphs” and by patent litigation cases such as that over ranitidine,⁶ which revealed how little was known. Twenty years ago, a European Polymorphism Network envisioned the interdisciplinary and industrial-academic partnerships needed to study polymorphism.⁷ This network, summer schools, and Bernstein’s monograph^8-10 have increased awareness of polymorphism. It has recently been demonstrated that computational crystal structure prediction (CSP) would have revealed the existence of the late-appearing polymorph of rotigotine, provided its structure, and enabled a focused experimental approach.¹¹ Another recent industry-academic partnership demonstrates the importance for industry of combining CSP with experimental studies.¹² These high-impact studies emphasise the pressing need in industry to bring the accuracy of computational predictions of polymorph stability to the next level, so that the solid-state properties of an organic molecule can be fully quantified in silico.

CSP has long been seen as an essential scientific capability.^13-15 After two decades of development, it is starting to show its potential as a computational method to help design molecular materials with functional properties or to predict polymorphs as an aid to pharmaceutical development. The Cambridge Crystallographic Data Centre (CCDC) launched the 7th Blind Test in 2020,¹⁶ which evaluates the ability to predict and rank crystal structures that are kept secret until the participating groups have submitted their predictions. The 2020-2022 results have not been published yet. In 2016, 93 scientists from around the world participated, though the successful submissions were dominated by European groups.¹⁷ Five polymorph structures of a single API were provided, and the stability order of these polymorphs differed greatly between the many state-of-the-art computational methods. Unfortunately, experimental stability data were not available. At a Faraday Discussion in 2018, it was argued that CSP is becoming an essential component of industrial polymorph screening.¹⁸ This has been borne out by the recent publication on galunisertib,¹² where CSP studies were essential for structurally characterising several forms. However, the computational work showed that potentially more structures exist than have been observed experimentally, and that predicting relative polymorph stabilities still presents a serious challenge for computational methods. Since November 2019, the CCDC has coordinated a CSP consortium for developing methods, including the standardization of reports on computational results and a web-based platform to facilitate the management of CSP-generated knowledge.

A clear drive exists to develop CSP, and this Action aims to bring the prediction of temperature-dependent polymorph stabilities to the next level. Crystal structures of organic molecules are primarily predicted by anisotropic atom-atom force fields, semi-empirical methods (e.g. DFTB), quantum-chemical methods such as density functional theory, or hybrid QM embedded methods.^19,20 By far the most common method for high-accuracy energy ranking of organic crystal structures is the PBE (Perdew-Burke-Ernzerhof) density functional²¹ supplemented with a dispersion correction.²² Current CSP methods, including dispersion-corrected PBE, are not sufficiently accurate to determine the relative stabilities of polymorphs. In fact, errors in calculated lattice energies are usually 2 to 4 times larger than typical energy differences between pairs of polymorphs, making it nearly impossible to obtain predictions with any degree of confidence (Figure 2). Hence, standard methods rely heavily on error cancellation. Recent efforts have focused on reducing these errors by accounting for long-range dispersion interactions with increasing sophistication.^23-25 Lattice vibrations and thermal expansion have also been included in the modelling,^26,27 although it is still debated whether such corrections are important.²⁸ There are also anecdotal examples in which commonly used density functionals perform surprisingly poorly, even leading to qualitatively incorrect predictions.^29,30 Machine learning (ML) is being used for CSP,^31-33 and the results of the 7th Blind Test may demonstrate its current progress. This Action may contribute the high-quality data needed for ML training sets.
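
To make concrete why lattice-energy errors 2 to 4 times larger than the energy difference are so damaging, and why error cancellation matters, the following Python sketch estimates the probability that a method ranks two polymorphs in the correct order. It assumes, purely for illustration, that each computed lattice energy carries a Gaussian error with standard deviation `sigma` and that the two errors have a correlation `rho`, a crude stand-in for error cancellation. The function name and the numbers used are hypothetical.

```python
import math


def p_correct_order(delta_e: float, sigma: float, rho: float = 0.0) -> float:
    """Probability that a computed energy difference has the same sign as
    the true difference delta_e (> 0).

    Illustrative assumption: each polymorph's computed lattice energy has a
    Gaussian error with standard deviation sigma, and the two errors have
    correlation coefficient rho (rho > 0 mimics error cancellation).
    """
    # Standard deviation of the error in the difference E_A - E_B
    sigma_diff = sigma * math.sqrt(2.0 * (1.0 - rho))
    if sigma_diff == 0.0:
        return 1.0  # perfectly correlated errors cancel exactly
    z = delta_e / sigma_diff
    # Standard normal cumulative distribution function via erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    delta_e = 1.0  # hypothetical true energy difference (kJ/mol)
    for ratio in (2.0, 4.0):  # error 2 to 4 times the energy difference
        sigma = ratio * delta_e
        print(f"sigma = {ratio:.0f} x dE: "
              f"uncorrelated P = {p_correct_order(delta_e, sigma):.2f}, "
              f"rho = 0.9 P = {p_correct_order(delta_e, sigma, 0.9):.2f}")
```

With uncorrelated errors the probability of a correct ranking is only about 0.64 (error twice the difference) or 0.57 (four times), barely better than a coin flip at 0.5; a strong positive correlation between the two errors raises it considerably, which is the sense in which standard methods depend on error cancellation.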

Many common CSP methods have been benchmarked, most notably against the X23 benchmark set.^34-36 However, many publications note that the small number of structures, the limited size, flexibility and chemical variety of the molecules, and the fact that the reference energies are derived from sublimation data make it difficult to pinpoint the causes of systematic errors in the computational methods. It is also likely that further errors occur for larger, more conformationally flexible molecules, which are not represented in this limited benchmark set. A strong need exists for new, highly accurate experimental data covering a wide range of compounds, temperatures, and methods. However, it is much easier to verify the precision of a measured physical quantity than its accuracy. The latter depends on “universal” calibration of the equipment, the absence of systematic errors in the method and equipment, the purity of the sample, the reference used, etc. A quick look into databases such as SciFinder (Chemical Abstracts) and the CSD (Cambridge Structural Database) shows that essential properties such as the melting point are not well defined, even for compounds as well studied as paracetamol. This is partly because many measurements were made to provide simple descriptive values of chemical substances rather than accurate reference values. In a list of literature data, the background and quality of each value can be hard to determine; in fact, a universal verification of experimental properties in the literature is lacking. Another problem with literature data is that some properties, such as the crystal structure, may be accurately defined while the accompanying melting temperatures and enthalpies are not. For CSP evaluations, an accurate dataset (structure, melting point, enthalpy, …) is needed for a given chemical compound.
IUPAC, CCDC, and NIST curate literature data; however, the critical difference in this Action is the feedback loop (see 1.2.1), in which experiment and evaluation are repeated to obtain the most accurate result.
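
The distinction between precision and accuracy drawn above can be illustrated with a short Python sketch. Repeated readings alone yield only their scatter (precision); a systematic offset (accuracy) becomes visible only against an independent reference value. The readings are invented for illustration; the reference is the melting point of indium on the ITS-90 temperature scale (156.5985 °C), a common calibration standard for DSC.

```python
import statistics


def precision_and_bias(readings: list[float], reference: float) -> tuple[float, float]:
    """Return (precision, bias) for repeated measurements of one quantity.

    precision: sample standard deviation of the readings (scatter)
    bias: mean of the readings minus an independent reference value
          (systematic error, i.e. the accuracy aspect)
    """
    precision = statistics.stdev(readings)
    bias = statistics.fmean(readings) - reference
    return precision, bias


if __name__ == "__main__":
    # Hypothetical DSC onset temperatures (degC) measured on an indium standard
    readings = [157.1, 157.0, 157.2, 157.1, 157.0]
    reference = 156.5985  # melting point of indium on ITS-90 (degC)
    precision, bias = precision_and_bias(readings, reference)
    print(f"precision (std): {precision:.2f} degC, bias: {bias:+.2f} degC")
```

In this invented case the readings scatter by less than 0.1 °C yet lie about 0.5 °C too high, an error that no amount of repetition would reveal without the reference material.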

For the experimental analysis of polymorphism, mainly differential scanning calorimetry (DSC) and single-crystal or powder X-ray diffraction (SC-XRD, PXRD) are used. The structures and transition enthalpies (∆H_tr) can be compared with CSP results. With proper calibration and sample preparation, DSC provides transition temperatures (T_tr) with a precision of ±0.3 °C. Improper sample preparation or calibration, which is hard to detect in a report, increases the error considerably. Transition enthalpies obtained by DSC are generally not precise; the error is about 5 J g⁻¹ (for paracetamol, 20 atoms, 151.2 g mol⁻¹, this equals 0.8 kJ mol⁻¹; thus, the error in kJ mol⁻¹ increases with molar mass). Errors are due to calibration, peak integration, sample quantity, weighing accuracy, and DSC sensitivity. Crystal quality and purity can also lead to a wrongly estimated ∆H_tr.^37,38 More accurate values of ∆H_tr can be obtained by adiabatic calorimetry (AC): sample quantities are larger, which reduces the weighing error, and the energy exchange is measured directly. T_tr can be determined with a precision down to ±0.1 °C and ∆H_tr down to ±0.1 J g⁻¹. Transition kinetics are less critical in AC, but measurements can take several days or more. Pressure can be used to induce different polymorphs;³⁹ however, this proposal will focus on temperature, because it causes by far the most difficulties in CSP and is important in industrial applications. Pressure will be used as a thermodynamic variable where applicable.
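
The unit conversion in the paragraph above, and the way measured values of T_tr and ∆H_tr constrain the temperature-dependent stability that CSP aims to predict, can be sketched in Python. The conversion multiplies a specific quantity (J g⁻¹) by the molar mass. The Gibbs-energy estimate uses the common approximation that ∆H_tr and ∆S_tr do not depend on temperature, so that ∆S_tr = ∆H_tr/T_tr and ∆G(T) ≈ ∆H_tr(1 − T/T_tr); neglecting the heat-capacity difference between the forms is an assumption of this sketch, and the example transition values are hypothetical.

```python
def specific_to_molar(value_j_per_g: float, molar_mass_g_per_mol: float) -> float:
    """Convert a specific enthalpy (or its uncertainty) from J/g to kJ/mol."""
    return value_j_per_g * molar_mass_g_per_mol / 1000.0


def delta_g_transition(t_kelvin: float, t_tr_kelvin: float, dh_tr_kj_per_mol: float) -> float:
    """Approximate Gibbs energy change (kJ/mol) of a transition A -> B at T.

    Assumes dH_tr and dS_tr are independent of temperature (heat-capacity
    difference neglected). At T_tr, dG = 0, so dS_tr = dH_tr / T_tr.
    For an endothermic transition (dH_tr > 0), dG > 0 below T_tr (A stable)
    and dG < 0 above T_tr (B stable).
    """
    return dh_tr_kj_per_mol * (1.0 - t_kelvin / t_tr_kelvin)


if __name__ == "__main__":
    m_paracetamol = 151.16  # g/mol, C8H9NO2
    # DSC enthalpy error of about 5 J/g versus about 0.1 J/g for adiabatic calorimetry
    print(f"DSC error: {specific_to_molar(5.0, m_paracetamol):.2f} kJ/mol")
    print(f"AC error:  {specific_to_molar(0.1, m_paracetamol):.3f} kJ/mol")
    # Hypothetical enantiotropic pair: T_tr = 350 K, dH_tr = +2.0 kJ/mol
    for t in (298.15, 350.0, 400.0):
        print(f"T = {t:.2f} K: dG(A->B) = {delta_g_transition(t, 350.0, 2.0):+.3f} kJ/mol")
```

Because ∆G shrinks linearly towards zero at T_tr in this approximation, an enthalpy error of about 0.8 kJ mol⁻¹ can be comparable to the Gibbs-energy difference over a wide temperature range, which is one reason the more precise AC values are valuable for benchmarking.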

SC-XRD is the most accurate laboratory XRD technique, with a unit-cell volume error of about 1%. Older literature data are less precise and lack accurate measurement temperatures. PXRD data have similar errors of about 1% in the unit-cell volume. Differences between diffractometers are mostly due to the wavelength, calibration, and alignment. Unlike in calorimetric methods, the temperature in temperature-resolved XRD is controlled less precisely, through a warm or cold gas flow, so its experimental error is often large. Thus, although high-quality, high-resolution crystal structures exist,⁴⁰ their thermal expansion and related calorimetric data are often not accurate, which limits their usefulness for the calibration of CSP results.
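
Why a 1% unit-cell volume error limits the use of diffraction data for thermal expansion can be checked with a short propagation-of-error calculation. The Python sketch below estimates the mean volumetric expansion coefficient α_V = (V₂ − V₁)/(V̄ ∆T) from two unit-cell volumes and propagates their uncertainties on the assumption that they are independent. That is a simplification: errors from the same instrument are often correlated and partly cancel. The volumes are hypothetical.

```python
import math


def volumetric_expansion(v1: float, v2: float, t1: float, t2: float,
                         rel_err: float = 0.01) -> tuple[float, float]:
    """Mean volumetric thermal expansion coefficient (1/K) between t1 and t2.

    Returns (alpha_v, sigma_alpha). Each volume is assumed to carry an
    independent relative standard uncertainty rel_err (default 1%); the
    uncertainty of the mean volume in the denominator is neglected.
    """
    v_mean = 0.5 * (v1 + v2)
    dt = t2 - t1
    alpha_v = (v2 - v1) / (v_mean * dt)
    # Propagate the two independent volume errors through the difference
    sigma_dv = math.hypot(rel_err * v1, rel_err * v2)
    sigma_alpha = sigma_dv / (v_mean * dt)
    return alpha_v, sigma_alpha


if __name__ == "__main__":
    # Hypothetical unit-cell volumes (cubic angstrom) at 200 K and 300 K
    alpha, sigma = volumetric_expansion(1000.0, 1015.0, 200.0, 300.0)
    print(f"alpha_V = {alpha:.2e} 1/K +/- {sigma:.2e} 1/K")
```

In this example a 1.5% volume change over 100 K is measured with an uncertainty almost as large as the coefficient itself, which illustrates why volumes with errors well below 1% and well-controlled temperatures are needed before expansion data can calibrate CSP results.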

Many other experimental methods are used to study polymorphism, and they cannot all be mentioned here. Some of the more important techniques are solid-state NMR, hot-stage microscopy (HSM), thermogravimetric analysis (TGA), synchrotron XRD, neutron scattering, IR and Raman spectroscopy, densitometry, solubility, heat of dissolution, and vapour pressure measurements, while chromatography is used for purity determination or purification. Solubility and heat of dissolution provide information on polymorph stability and transitions. Their errors are large and may amount to several degrees and several J g⁻¹ or more. In addition, the solvent may affect the thermodynamic equilibrium between polymorphs through interactions with the solid. Vapour pressure has errors similar to those of solubility; however, vapour pressure is a direct measure of the Gibbs energy, which makes it valuable for comparison with CSP results. HSM can be an extremely precise tool for visualising a phase transition and determining its T_tr. TGA is an excellent method to identify solvates and hydrates. NMR data alone are not sufficient to solve structures but provide information about the asymmetric unit and about local disorder. Synchrotron and neutron sources can be used to determine unit-cell volumes with high precision (errors <1%, although accuracy is lower) and to determine crystal structures. Neutron measurements can take a long time and need large crystals but provide precise information about the positions of hydrogen atoms in hydrogen bonds. Infrared and Raman spectroscopy, although useful to identify polymorphs, do not provide information on their stability. Densitometry is used to determine liquid densities, which cannot be obtained by XRD. Its error, depending on calibration, can be much smaller than 1%.
Disorder may cause problems in structure determination and in calorimetry; however, it may also be an inherent, thermodynamically controlled property of a polymorph.⁴¹ Sample purity, which is often not extensively verified, especially for newly synthesised compounds, may be an issue,⁴² but not necessarily, as impurities may also provide seeds for nucleation or trigger phase transitions to more stable forms.⁴³ Impurities and disorder may lead to pretransitional effects. Although such effects are clearly part of the experimental landscape, compounds with well-defined transitions will be given priority in this benchmark study to facilitate comparison with CSP results.
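
The statement earlier in this section that vapour pressure is a direct measure of the Gibbs energy can be made concrete. If two polymorphs A and B are each in equilibrium with an ideal vapour at the same temperature, G_A − G_B = RT ln(p_A/p_B), so the less stable form has the higher vapour pressure. The Python sketch below applies this relation; ideal-gas behaviour of the vapour is an assumption, and the pressure ratio used is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def gibbs_difference_from_vapour_pressure(p_a: float, p_b: float, t_kelvin: float) -> float:
    """G_A - G_B in kJ/mol for two polymorphs at temperature t_kelvin.

    Assumes each solid is in equilibrium with an ideal vapour, so that
    mu_solid = mu_gas_standard + R T ln(p / p_standard). The pressures only
    need to share the same units. A positive result means A is less stable.
    """
    return R * t_kelvin * math.log(p_a / p_b) / 1000.0


if __name__ == "__main__":
    # Hypothetical case: form A has twice the vapour pressure of form B at 25 degC
    dg = gibbs_difference_from_vapour_pressure(2.0, 1.0, 298.15)
    print(f"G_A - G_B = {dg:.2f} kJ/mol")
```

Because the relation is logarithmic, a 10% error in the measured pressure ratio shifts the result by only about RT ln 1.1 ≈ 0.24 kJ mol⁻¹ at 25 °C.
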

Specific Objectives

To achieve the main objective described in this MoU, the following specific objectives shall be accomplished:

01. Research Coordination

The primary objective is to improve the thermodynamic modelling used in computational crystal structure prediction (CSP), to validate experimental data, and to establish a freely accessible benchmark of thermodynamic data on organic crystalline polymorphs.

The accuracy of computational and experimental approaches will be compared against at least 20 equilibrium temperatures between two condensed phases.

The accuracy of computational and experimental approaches will be compared against at least 20 enthalpy differences between two condensed phases.

The accuracy of computational and experimental approaches will be compared against a benchmark of at least 40 highly accurate crystal structures.
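
A comparison of computed and benchmark values of the kind described in these objectives could be scored along the lines of the following Python sketch. It reports the mean signed error, mean absolute error and root-mean-square error, and the fraction of pairs for which the sign of ∆H_tr is reproduced. The record layout, field names and numbers are hypothetical and do not represent an agreed BEST-CSP format.

```python
import math
from dataclasses import dataclass


@dataclass
class PairResult:
    """Hypothetical benchmark record for one polymorph pair."""
    compound: str
    t_tr_exp: float   # experimental transition temperature (K)
    t_tr_calc: float  # computed transition temperature (K)
    dh_exp: float     # experimental transition enthalpy (kJ/mol)
    dh_calc: float    # computed transition enthalpy (kJ/mol)


def error_stats(calc: list[float], exp: list[float]) -> dict[str, float]:
    """Mean signed error, mean absolute error and RMSE of calc versus exp."""
    if len(calc) != len(exp) or not calc:
        raise ValueError("need two non-empty lists of equal length")
    errors = [c - e for c, e in zip(calc, exp)]
    n = len(errors)
    return {
        "mse": sum(errors) / n,
        "mae": sum(abs(err) for err in errors) / n,
        "rmse": math.sqrt(sum(err * err for err in errors) / n),
    }


def sign_agreement(calc: list[float], exp: list[float]) -> float:
    """Fraction of pairs whose calculated and experimental values share a sign
    (e.g. endothermic vs exothermic dH_tr); zero is counted as non-positive."""
    hits = sum(1 for c, e in zip(calc, exp) if (c > 0) == (e > 0))
    return hits / len(calc)


if __name__ == "__main__":
    # Invented numbers for three hypothetical compounds
    results = [
        PairResult("compound-1", 350.0, 370.0, 2.0, 3.1),
        PairResult("compound-2", 410.0, 395.0, 1.2, -0.4),
        PairResult("compound-3", 300.0, 330.0, 0.8, 1.5),
    ]
    t_stats = error_stats([r.t_tr_calc for r in results], [r.t_tr_exp for r in results])
    dh_calc = [r.dh_calc for r in results]
    dh_exp = [r.dh_exp for r in results]
    print("T_tr errors (K):", {k: round(v, 1) for k, v in t_stats.items()})
    print("dH_tr errors (kJ/mol):", {k: round(v, 2) for k, v in error_stats(dh_calc, dh_exp).items()})
    print("dH_tr sign agreement:", round(sign_agreement(dh_calc, dh_exp), 2))
```

Reporting the signed error alongside MAE and RMSE separates a systematic shift, which might be corrected, from random scatter; the sign check flags cases such as the second hypothetical compound, where a method predicts the wrong direction of the transition.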

CSP input will be streamlined through the establishment of standardized crystal structure modelling input files and the development of best practices for conducting precise quantum-chemical calculations in the domain of organic crystalline thermodynamics.

Streamlined acquisition of accurate experimental data on thermodynamic properties of organic crystalline polymorphs will be facilitated by the development of standardized measurement and analysis protocols. When feasible, the creation and generalization of polymorph preparation protocols will improve the ability to obtain polymorphs.

The comparison between experimental data and CSP results will be facilitated and streamlined by protocols, which will also serve as the foundation for educating young researchers and innovators. This initiative aims to cultivate a proficient community of researchers well-versed in how experimental and computational techniques complement each other.

The final benchmark data will be published in a fully public format (e.g. in an open access journal and/or on a website). An assessment of the benchmark data will be organised in a form similar to a Faraday Discussion. The publicity WG will organise the final assessment.

Optional objectives of this Action involve physical property data of organic crystalline compounds such as solubility, vapour pressure, heat capacity by adiabatic calorimetry, Raman and NMR spectroscopy. Feasibility will depend on the molecule involved and the reliability and availability of the equipment.

02. Capacity Building

To establish a durable network of experts in organic crystal structure prediction and experimental polymorph analysis, fostering a fully synergistic approach to analyse organic crystalline polymorphs. This collaborative effort will facilitate the design of polymorphs and their associated properties.

Creating understanding and awareness among young researchers of the possibilities and limitations of computational and experimental methods to help them design realistic approaches to obtain properties and polymorphs. Young researchers and innovators possessing a combination of experimental and computational experience will prove highly valuable for industry.

Assessing methods that leverage machine learning techniques to enhance both experimental and computational approaches associated with polymorph design and preparation.

The network will investigate how it can be sustained beyond the lifetime of the Action. Early-career scientists will be encouraged to get involved, for example through education and additional scientific funding.