Artificial intelligence for natural product drug discovery

  • Dobson, P. D., Patel, Y. & Kell, D. B. ‘Metabolite-likeness’ as a criterion in the design and selection of pharmaceutical drug libraries. Drug Discov. Today 14, 31–40 (2009).

  • Newman, D. J. & Cragg, G. M. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 83, 770–803 (2020).

  • Koehn, F. E. & Carter, G. T. The evolving role of natural products in drug discovery. Nat. Rev. Drug Discov. 4, 206–220 (2005).

  • Terlouw, B. R. et al. MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters. Nucleic Acids Res. 51, D603–D610 (2023).

  • Gavriilidou, A. et al. Compendium of specialized metabolite biosynthetic diversity encoded in bacterial genomes. Nat. Microbiol. 7, 726–735 (2022).

  • van der Hooft, J. J. J. et al. Linking genomics and metabolomics to chart specialized metabolic diversity. Chem. Soc. Rev. 49, 3297–3314 (2020).

  • Doerr, S. et al. TorchMD: a deep learning framework for molecular simulations. J. Chem. Theory Comput. 17, 2355–2363 (2021).

  • Rodríguez-Espigares, I. et al. GPCRmd uncovers the dynamics of the 3D-GPCRome. Nat. Methods 17, 777–787 (2020).

  • Liu, X., IJzerman, A. P. & van Westen, G. J. P. Computational approaches for de novo drug design: past, present, and future. Methods Mol. Biol. 2190, 139–165 (2021).

  • Choudhury, C., Arul Murugan, N. & Priyakumar, U. D. Structure-based drug repurposing: traditional and advanced AI/ML-aided methods. Drug Discov. Today 27, 1847–1861 (2022).

  • Blin, K. et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 49, W29–W35 (2021).

  • Skinnider, M. A. et al. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nat. Commun. 11, 6058 (2020).

  • Medema, M. H. & Fischbach, M. A. Computational approaches to natural product discovery. Nat. Chem. Biol. 11, 639–648 (2015).

  • Medema, M. H., de Rond, T. & Moore, B. S. Mining genomes to illuminate the specialized chemistry of life. Nat. Rev. Genet. 22, 553–571 (2021).

  • Cimermancic, P. et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158, 412–421 (2014).

  • Hannigan, G. D. et al. A deep learning genome-mining strategy for biosynthetic gene cluster prediction. Nucleic Acids Res. 47, e110 (2019).

  • Carroll, L. M. et al. Accurate de novo identification of biosynthetic gene clusters with GECCO. Preprint at bioRxiv https://doi.org/10.1101/2021.05.03.442509 (2021).

  • Sanchez, S. et al. Expansion of novel biosynthetic gene clusters from diverse environments using SanntiS. Preprint at bioRxiv https://doi.org/10.1101/2023.05.23.540769 (2023).

  • Kloosterman, A. M. et al. Expansion of RiPP biosynthetic space through integration of pan-genomics and machine learning uncovers a novel class of lanthipeptides. PLoS Biol. 18, e3001026 (2020).

  • de Los Santos, E. L. C. NeuRiPP: neural network identification of RiPP precursor peptides. Sci. Rep. 9, 13406 (2019).

  • Merwin, N. J. et al. DeepRiPP integrates multiomics data to automate discovery of novel ribosomally synthesized natural products. Proc. Natl Acad. Sci. USA 117, 371–380 (2020).

  • Tietz, J. I. et al. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat. Chem. Biol. 13, 470–478 (2017).

  • Louwen, J. J. R. & van der Hooft, J. J. J. Comprehensive large-scale integrative analysis of omics data to accelerate specialized metabolite discovery. mSystems 6, e0072621 (2021).

  • Huber, F. et al. Spec2Vec: improved mass spectral similarity scoring through learning of structural relationships. PLoS Comput. Biol. 17, e1008724 (2021).

  • Huber, F., van der Burg, S., van der Hooft, J. J. J. & Ridder, L. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra. J. Cheminform. 13, 84 (2021).

  • Ludwig, M. et al. Database-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nat. Mach. Intell. 2, 629–641 (2020).

  • Hoffmann, M. A. et al. High-confidence structural annotation of metabolites absent from spectral libraries. Nat. Biotechnol. 40, 411–421 (2022).

  • Dührkop, K. et al. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat. Biotechnol. 39, 462–471 (2021).

  • Kim, H. W. et al. NPClassifier: a deep neural network-based structural classification tool for natural products. J. Nat. Prod. 84, 2795–2807 (2021).

  • Aalizadeh, R., Nika, M.-C. & Thomaidis, N. S. Development and application of retention time prediction models in the suspect and non-target screening of emerging contaminants. J. Hazard. Mater. 363, 277–285 (2019).

  • Chen, D., Wang, Z., Guo, D., Orekhov, V. & Qu, X. Review and prospect: deep learning in nuclear magnetic resonance spectroscopy. Chemistry 26, 10391–10401 (2020).

  • Wu, K. et al. Improvement in signal-to-noise ratio of liquid-state NMR spectroscopy via a deep neural network DN-unet. Anal. Chem. 93, 1377–1382 (2021).

  • Ito, K., Xu, X. & Kikuchi, J. Improved prediction of carbonless NMR spectra by the machine learning of theoretical and fragment descriptors for environmental mixture analysis. Anal. Chem. 93, 6901–6906 (2021).

  • Li, D.-W., Hansen, A. L., Yuan, C., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12, 5229 (2021).

  • Zheng, S. et al. Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP. Nat. Commun. 13, 3342 (2022).

  • Milanowski, D. J. et al. Unequivocal determination of caulamidines A and B: application and validation of new tools in the structure elucidation tool box. Chem. Sci. 9, 307–314 (2018).

  • Audoin, C. et al. Metabolome consistency: additional parazoanthines from the Mediterranean zoanthid Parazoanthus axinellae. Metabolites 4, 421–432 (2014).

  • Fox Ramos, A. E. et al. CANPA: computer-assisted natural products anticipation. Anal. Chem. 91, 11247–11252 (2019).

  • Jones, C. G. et al. The CryoEM method MicroED as a powerful tool for small molecule structure determination. ACS Cent. Sci. 4, 1587–1592 (2018).

  • Kim, L. J. et al. Prospecting for natural products by genome mining and microcrystal electron diffraction. Nat. Chem. Biol. 17, 872–877 (2021).

  • Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Searching molecular structure databases with tandem mass spectra using CSI:fingerID. Proc. Natl Acad. Sci. USA 112, 12580–12585 (2015).

  • Lindsay, R. K. Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project (McGraw-Hill, 1980).

  • Dührkop, K. et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods 16, 299–302 (2019).

  • Stravs, M. A., Dührkop, K., Böcker, S. & Zamboni, N. MSNovelist: de novo structure generation from mass spectra. Nat. Methods 19, 865–870 (2022).

  • Colby, S. M., Nuñez, J. R., Hodas, N. O., Corley, C. D. & Renslow, R. R. Deep learning to generate chemical property libraries and candidate molecules for small molecule identification in complex samples. Anal. Chem. 92, 1720–1729 (2020).

  • Burns, D. C., Mazzola, E. P. & Reynolds, W. F. The role of computer-assisted structure elucidation (CASE) programs in the structure elucidation of complex natural products. Nat. Prod. Rep. 36, 919–933 (2019).

  • Reher, R. et al. A convolutional neural network-based approach for the rapid annotation of molecularly diverse natural products. J. Am. Chem. Soc. 142, 4114–4120 (2020).

  • Kim, H. W., Zhang, C., Cottrell, G. W. & Gerwick, W. H. SMART‐Miner: a convolutional neural network‐based metabolite identification from 1H‐13C HSQC spectra. Magn. Reson. Chem. 60, 1070–1075 (2022).

  • Wang, C. et al. COLMAR lipids web server and ultrahigh-resolution methods for two-dimensional nuclear magnetic resonance- and mass spectrometry-based lipidomics. J. Proteome Res. 19, 1674–1683 (2020).

  • Smith, S. G. & Goodman, J. M. Assigning stereochemistry to single diastereoisomers by GIAO NMR calculation: the DP4 probability. J. Am. Chem. Soc. 132, 12946–12959 (2010).

  • Howarth, A., Ermanis, K. & Goodman, J. DP4-AI automated NMR data analysis: straight from spectrometer to structure. Chem. Sci. 11, 4351–4359 (2020).

  • Das, S., Edison, A. S. & Merz, K. M. Jr. Metabolite structure assignment using in silico NMR techniques. Anal. Chem. 92, 10412–10419 (2020).

  • Rodrigues, T., Reker, D., Schneider, P. & Schneider, G. Counting on natural products for drug design. Nat. Chem. 8, 531–541 (2016).

  • Lanz, J. & Riedl, R. Merging allosteric and active site binding motifs: de novo generation of target selectivity and potency via natural-product-derived fragments. ChemMedChem 10, 451–454 (2015).

  • Reker, D. et al. Revealing the macromolecular targets of complex natural products. Nat. Chem. 6, 1072–1078 (2014).

  • Wassermann, A. M. et al. A screening pattern recognition method finds new and divergent targets for drugs and natural products. ACS Chem. Biol. 9, 1622–1631 (2014).

  • Rollinger, J. M., Hornick, A., Langer, T., Stuppner, H. & Prast, H. Acetylcholinesterase inhibitory activity of scopolin and scopoletin discovered by virtual screening of natural products. J. Med. Chem. 47, 6248–6254 (2004).

  • Reker, D. et al. Machine learning uncovers food- and excipient-drug interactions. Cell Rep. 30, 3710–3716.e4 (2020).

  • Conde, J. et al. Allosteric antagonist modulation of TRPV2 by piperlongumine impairs glioblastoma progression. ACS Cent. Sci. 7, 868–881 (2021).

  • Lagunin, A., Filimonov, D. & Poroikov, V. Multi-targeted natural products evaluation based on biological activity prediction with PASS. Curr. Pharm. Des. 16, 1703–1717 (2010).

  • Sá, M. S. et al. Antimalarial activity of physalins B, D, F, and G. J. Nat. Prod. 74, 2269–2272 (2011).

  • Schneider, G. et al. Deorphaning the macromolecular targets of the natural anticancer compound doliculide. Angew. Chem. Int. Ed. Engl. 55, 12408–12411 (2016).

  • Bertoni, M. et al. Bioactivity descriptors for uncharacterized chemical compounds. Nat. Commun. 12, 3932 (2021).

  • Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 181, 475–483 (2020).

  • Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).

  • Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. https://doi.org/10.1038/s41589-023-01349-8 (2023).

  • Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  • Pandey, M. et al. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 4, 211–221 (2022).

  • Schindler, C. E. M. et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem. Inf. Model. 60, 5457–5474 (2020).

  • Walker, A. S. & Clardy, J. A machine learning bioinformatics method to predict biological activity from biosynthetic gene clusters. J. Chem. Inf. Model. 61, 2560–2571 (2021).

  • Yang, Z. et al. Deep-BGCpred: a unified deep learning genome-mining framework for biosynthetic gene cluster prediction. Preprint at bioRxiv https://doi.org/10.1101/2021.11.15.468547 (2021).

  • Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at arXiv https://doi.org/10.48550/ARXIV.1301.3781 (2013).

  • Thaker, M. N. et al. Identifying producers of antibacterial compounds by screening for antibiotic resistance. Nat. Biotechnol. 31, 922–927 (2013).

  • Alcock, B. P. et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48, D517–D525 (2020).

  • Bortolaia, V. et al. ResFinder 4.0 for predictions of phenotypes from genotypes. J. Antimicrob. Chemother. 75, 3491–3500 (2020).

  • Mungan, M. D. et al. ARTS 2.0: feature updates and expansion of the antibiotic resistant target seeker for comparative genome mining. Nucleic Acids Res. 48, W546–W552 (2020).

  • Jia, B. et al. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res. 45, D566–D573 (2017).

  • Sélem-Mojica, N., Aguilar, C., Gutiérrez-García, K., Martínez-Guerrero, C. E. & Barona-Gómez, F. EvoMining reveals the origin and fate of natural product biosynthetic enzymes. Microb. Genom. 5, e000260 (2019).

  • Chevrette, M. G. et al. Evolutionary dynamics of natural product biosynthesis in bacteria. Nat. Prod. Rep. 37, 566–599 (2020).

  • Cereto-Massagué, A. et al. Molecular fingerprint similarity search in virtual screening. Methods 71, 58–63 (2015).

  • Willighagen, E. L. et al. The chemistry development kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. 9, 33 (2017).

  • Todeschini, R. & Consonni, V. Handbook of Molecular Descriptors (John Wiley & Sons, 2008).

  • Skinnider, M. A., Dejong, C. A., Franczak, B. C., McNicholas, P. D. & Magarvey, N. A. Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm. J. Cheminform. 9, 46 (2017).

  • Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

  • Riniker, S. & Landrum, G. A. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J. Cheminform. 5, 26 (2013).

  • O’Boyle, N. M. & Sayle, R. A. Comparing structural fingerprints using a literature-based similarity benchmark. J. Cheminform. 8, 36 (2016).

  • Grisoni, F. et al. Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity. Commun. Chem. 1, 44 (2018).

  • Capecchi, A., Probst, D. & Reymond, J.-L. One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome. J. Cheminform. 12, 43 (2020).

  • Capecchi, A. & Reymond, J.-L. Assigning the origin of microbial natural products by chemical space map and machine learning. Biomolecules 10, 1385 (2020).

  • Riniker, S. Molecular dynamics fingerprints (MDFP): machine learning from MD data to predict free-energy differences. J. Chem. Inf. Model. 57, 726–741 (2017).

  • Esposito, C., Wang, S., Lange, U. E. W., Oellien, F. & Riniker, S. Combining machine learning and molecular dynamics to predict p-glycoprotein substrates. J. Chem. Inf. Model. 60, 4730–4749 (2020).

  • Bannan, C. C. et al. Blind prediction of cyclohexane–water distribution coefficients from the SAMPL5 challenge. J. Comput. Aided Mol. Des. 30, 927–944 (2016).

  • Wang, S. & Riniker, S. Use of molecular dynamics fingerprints (MDFPs) in SAMPL6 octanol-water log P blind challenge. J. Comput. Aided Mol. Des. 34, 393–403 (2020).

  • Gorostiola González, M. et al. 3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv-2023-90082 (2023).

  • Durairaj, J., Akdel, M., de Ridder, D. & van Dijk, A. D. J. Geometricus represents protein structures as shape-mers derived from moment invariants. Bioinformatics 36, i718–i725 (2020).

  • Paull, K. D. et al. Display and analysis of patterns of differential activity of drugs against human tumor cell lines: development of mean graph and COMPARE algorithm. J. Natl Cancer Inst. 81, 1088–1092 (1989).

  • Kauvar, L. M. et al. Predicting ligand binding to proteins by affinity fingerprinting. Chem. Biol. 2, 107–118 (1995).

  • Petrone, P. M. et al. Rethinking molecular similarity: comparing compounds on the basis of biological activity. ACS Chem. Biol. 7, 1399–1409 (2012).

  • Norinder, U., Spjuth, O. & Svensson, F. Using predicted bioactivity profiles to improve predictive modeling. J. Chem. Inf. Model. 60, 2830–2837 (2020).

  • Mater, A. C. & Coote, M. L. Deep learning in chemistry. J. Chem. Inf. Model. 59, 2545–2559 (2019).

  • Bronstein, M. M., Bruna, J., Cohen, T. & Veličković, P. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. Preprint at arXiv https://doi.org/10.48550/arXiv.2104.13478 (2021).

  • Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

  • van Tilborg, D., Alenicheva, A. & Grisoni, F. Exposing the limitations of molecular machine learning with activity cliffs. J. Chem. Inf. Model. 62, 5938–5951 (2022).

  • Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer Nature, 2019).

  • Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).

  • Jiménez-Luna, J., Skalic, M., Weskamp, N. & Schneider, G. Coloring molecules with explainable artificial intelligence for preclinical relevance assessment. J. Chem. Inf. Model. 61, 1083–1094 (2021).

  • Preuer, K., Klambauer, G., Rippmann, F., Hochreiter, S. & Unterthiner, T. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (eds Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R.) 331–345 (Springer International Publishing, 2019).

  • Webel, H. E. et al. Revealing cytotoxic substructures in molecules using deep learning. J. Comput. Aided Mol. Des. 34, 731–746 (2020).

  • Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595–608 (2016).

  • Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional embedding of attributed molecular graphs for physical property prediction. J. Chem. Inf. Model. 57, 1757–1772 (2017).

  • Duvenaud, D. et al. Convolutional networks on graphs for learning molecular fingerprints. in Advances in Neural Information Processing Systems 28 (NIPS 2015).

  • Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. in Proceedings of the 34th International Conference on Machine Learning 1263–1272 (2017).

  • Nguyen, T. et al. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics 37, 1140–1147 (2021).

  • Yuan, W. et al. Chemical space mimicry for drug discovery. J. Chem. Inf. Model. 57, 875–882 (2017).

  • Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

  • Liu, X., Ye, K., van Vlijmen, H. W. T., IJzerman, A. P. & van Westen, G. J. P. DrugEx v3: scaffold-constrained drug design with graph transformer-based reinforcement learning. J. Cheminform. 15, 24 (2023).

  • Li, X. & Fourches, D. Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT. J. Cheminform. 12, 27 (2020).

  • Karpov, P., Godin, G. & Tetko, I. V. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J. Cheminform. 12, 17 (2020).

  • Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).

  • Winter, R., Montanari, F., Noé, F. & Clevert, D.-A. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem. Sci. 10, 1692–1701 (2019).

  • Bjerrum, E. J. & Sattarov, B. Improving chemical autoencoder latent space and molecular generation diversity with heteroencoders. Biomolecules 8, 131 (2018).

  • Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  • Atz, K., Grisoni, F. & Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell. 3, 1023–1032 (2021).

  • Callaway, E. After AlphaFold: protein-folding contest seeks next big breakthrough. Nature 613, 13–14 (2023).

  • Wallner, B. AFsample: improving multimer prediction with AlphaFold using aggressive sampling. Preprint at bioRxiv https://doi.org/10.1101/2022.12.20.521205 (2022).

  • Bender, A. & Cortés-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: ways to make an impact, and why we are not there yet. Drug Discov. Today 26, 511–524 (2021).

  • Bender, A. & Cortés-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data. Drug Discov. Today 26, 1040–1052 (2021).

  • Sydow, D., Rodríguez-Guerra, J. & Volkamer, A. in Teaching Programming across the Chemistry Curriculum 135–158 ACS Symposium Series vol. 1387 (American Chemical Society, 2021).

  • Korshunova, M., Ginsburg, B., Tropsha, A. & Isayev, O. OpenChem: a deep learning toolkit for computational chemistry and drug design. J. Chem. Inf. Model. 61, 7–13 (2021).

  • Sieg, J., Flachsenberg, F. & Rarey, M. In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. J. Chem. Inf. Model. 59, 947–961 (2019).

  • Lenselink, E. B. et al. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J. Cheminform. 9, 45 (2017).

  • Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

  • Topçuoğlu, B. D., Lesniak, N. A., Ruffin, M. T. 4th, Wiens, J. & Schloss, P. D. A framework for effective application of machine learning to microbiome-based classification problems. mBio 11, e00434-20 (2020).

  • Quinn, T. P. & Erb, I. Examining microbe–metabolite correlations by linear methods. Nat. Methods 18, 37–39 (2021).

  • Morger, A. et al. KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development. J. Cheminform. 12, 24 (2020).

  • Soleimany, A. P. et al. Evidential deep learning for guided molecular property prediction and discovery. ACS Cent. Sci. 7, 1356–1367 (2021).

  • Manica, M. et al. Toward explainable anticancer compound sensitivity prediction via multimodal attention-based convolutional encoders. Mol. Pharm. 16, 4797–4806 (2019).

  • Grinsztajn, L., Oyallon, E. & Varoquaux, G. in Advances in Neural Information Processing Systems 35 (NeurIPS 2022) 507–520 (2022).

  • Chithrananda, S., Grand, G. & Ramsundar, B. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. Preprint at arXiv https://doi.org/10.48550/arXiv.2010.09885 (2020).

  • Irwin, R., Dimitriadis, S., He, J. & Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn. Sci. Technol. 3, 015022 (2022).

  • Chapelle, O., Zien, A. & Schölkopf, B. (Eds) Semi-Supervised Learning (MIT, 2006).

  • Zhang, Y. & Lee, A. A. Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning. Chem. Sci. 10, 8154–8163 (2019).

  • Röttig, M. et al. NRPSpredictor2—a web server for predicting NRPS adenylation domain specificity. Nucleic Acids Res. 39, W362–W367 (2011).

  • Torrey, L. & Shavlik, J. in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques 242–264 (IGI Global, 2010).

  • Cai, C. et al. Transfer learning for drug discovery. J. Med. Chem. 63, 8683–8694 (2020).

  • Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed. Engl. 60, 19477–19482 (2021).

  • Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).

  • Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).

  • Reker, D. Practical considerations for active machine learning in drug discovery. Drug Discov. Today Technol. 32–33, 73–79 (2019).

  • Reker, D., Schneider, P. & Schneider, G. Multi-objective active machine learning rapidly improves structure-activity models and reveals new protein-protein interaction inhibitors. Chem. Sci. 7, 3919–3927 (2016).

  • Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform. 8, 61 (2016).

  • Reher, R. et al. Native metabolomics identifies the rivulariapeptolide family of protease inhibitors. Nat. Commun. 13, 4619 (2022).

  • Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).

  • Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).

  • Liu, X., Ye, K., van Vlijmen, H. W. T., IJzerman, A. P. & van Westen, G. J. P. An exploration strategy improves the diversity of de novo ligands using deep reinforcement learning: a case for the adenosine A2A receptor. J. Cheminform. 11, 35 (2019).

  • Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  • Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).

  • Thakkar, A., Kogej, T., Reymond, J.-L., Engkvist, O. & Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chem. Sci. 11, 154–168 (2020).

  • Koch, M., Duigou, T. & Faulon, J.-L. Reinforcement learning for bioretrosynthesis. ACS Synth. Biol. 9, 157–168 (2020).

  • Kramer, C., Kalliokoski, T., Gedeck, P. & Vulpetti, A. The experimental uncertainty of heterogeneous public Ki data. J. Med. Chem. 55, 5165–5173 (2012).

  • Tiikkainen, P., Bellis, L., Light, Y. & Franke, L. Estimating error rates in bioactivity databases. J. Chem. Inf. Model. 53, 2499–2505 (2013).

  • Sorokina, M. & Steinbeck, C. Review on natural products databases: where to find data in 2020. J. Cheminform. 12, 1–51 (2020).

  • Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).

  • Liu, T., Lin, Y., Wen, X., Jorissen, R. N. & Gilson, M. K. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 35, D198–D201 (2007).

  • Wimalaratne, S. M. et al. Uniform resolution of compact identifiers for biomedical data. Sci. Data 5, 180029 (2018).

  • Rajan, K., Zielesny, A. & Steinbeck, C. DECIMER 1.0: deep learning for chemical image recognition using transformers. J. Cheminform. 13, 61 (2021).

  • Rajan, K., Brinkhaus, H. O., Sorokina, M., Zielesny, A. & Steinbeck, C. DECIMER-segmentation: automated extraction of chemical structure depictions from scientific literature. J. Cheminform. 13, 20 (2021).

  • Schymanski, E. L. & Bolton, E. E. FAIR chemical structures in the Journal of Cheminformatics. J. Cheminform. 13, 50 (2021).

  • Kautsar, S. A. et al. MIBiG 2.0: a repository for biosynthetic gene clusters of known function. Nucleic Acids Res. 48, D454–D458 (2020).

  • van Santen, J. A. et al. The natural products atlas: an open access knowledge base for microbial natural products discovery. ACS Cent. Sci. 5, 1824–1833 (2019).

  • van Santen, J. A. et al. The natural products atlas 2.0: a database of microbially-derived natural products. Nucleic Acids Res. 50, D1317–D1323 (2021).

  • Wang, M. et al. Sharing and community curation of mass spectrometry data with global natural products social molecular networking. Nat. Biotechnol. 34, 828–837 (2016).

  • Wishart, D. S. et al. NP-MRD: the natural products magnetic resonance database. Nucleic Acids Res. 50, D665–D677 (2022).

  • Flissi, A. et al. Norine: update of the nonribosomal peptide resource. Nucleic Acids Res. 48, D465–D469 (2020).

  • Jarmusch, S. A., van der Hooft, J. J. J., Dorrestein, P. C. & Jarmusch, A. K. Advancements in capturing and mining mass spectrometry data are transforming natural products research. Nat. Prod. Rep. 38, 2066–2082 (2021).

  • Jarmusch, A. K. et al. ReDU: a framework to find and reanalyze public mass spectrometry data. Nat. Methods 17, 901–904 (2020).

  • Proteau, P. J. Journal of Natural Products 2022: perspectives, monthly cover art, and more. J. Nat. Prod. 85, 1–2 (2022).

  • Clark, T. N. et al. Interlaboratory comparison of untargeted mass spectrometry data uncovers underlying causes for variability. J. Nat. Prod. 84, 824–835 (2021).

  • Fiehn, O. et al. The metabolomics standards initiative (MSI). Metabolomics 3, 175–178 (2007).

  • Frank, A. M. et al. Clustering millions of tandem mass spectra. J. Proteome Res. 7, 113–122 (2008).

  • Miller, I. J. et al. Autometa: automated extraction of microbial genomes from individual shotgun metagenomes. Nucleic Acids Res. 47, e57 (2019).

  • Schymanski, E. L. et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48, 2097–2098 (2014).

  • Deutsch, E. W. et al. Universal spectrum identifier for mass spectra. Nat. Methods 18, 768–770 (2021).

  • Bittremieux, W. et al. Universal MS/MS visualization and retrieval with the metabolomics spectrum resolver web service. Preprint at bioRxiv https://doi.org/10.1101/2020.05.09.086066 (2020).

  • Gordon, J. E. Chemical inference. 2. formalization of the language of organic chemistry: generic systematic nomenclature. J. Chem. Inf. Comput. Sci. 24, 81–92 (1984).

  • Wang, Y. et al. PubChem’s bioassay database. Nucleic Acids Res. 40, D400–D412 (2012).

  • Banerjee, P. et al. Super Natural II—a database of natural products. Nucleic Acids Res. 43, D935–D939 (2015).

  • Zeng, X. et al. NPASS: natural product activity and species source database for natural product research, discovery and tool development. Nucleic Acids Res. 46, D1217–D1222 (2018).

  • van der Hooft, J. J. J. A community-driven paired data platform to accelerate natural product mining by combining structural information from genomes and metabolomes. Preprint at https://doi.org/10.18174/fairdata2018.16286 (2018).

  • Eldjárn, G. H. et al. Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions. PLoS Comput. Biol. 17, e1008920 (2021).

  • Schorn, M. A. et al. A community resource for paired genomic and metabolomic data mining. Nat. Chem. Biol. 17, 363–368 (2021).

  • Doroghazi, J. R. et al. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nat. Chem. Biol. 10, 963–968 (2014).

  • McClure, R. A. et al. Elucidating the rimosamide-detoxin natural product families and their biosynthesis using metabolite/gene cluster correlations. ACS Chem. Biol. 11, 3452–3460 (2016).

  • Goering, A. W. et al. Metabologenomics: correlation of microbial gene clusters with metabolites drives discovery of a nonribosomal peptide with an unusual amino acid monomer. ACS Cent. Sci. 2, 99–108 (2016).

  • Parkinson, E. I. et al. Discovery of the tyrobetaine natural products and their biosynthetic gene cluster via metabologenomics. ACS Chem. Biol. 13, 1029–1037 (2018).

  • Caesar, L. K. et al. Correlative metabologenomics of 110 fungi reveals metabolite-gene cluster pairs. Nat. Chem. Biol. 19, 846–854 (2023).

  • Soldatou, S. et al. Comparative metabologenomics analysis of polar actinomycetes. Mar. Drugs 19, 103 (2021).

  • Sulheim, S. et al. Enzyme-constrained models and omics analysis of Streptomyces coelicolor reveal metabolic changes that enhance heterologous production. iScience 23, 101525 (2020).

  • Amos, G. C. A. et al. Comparative transcriptomics as a guide to natural product discovery and biosynthetic gene cluster functionality. Proc. Natl Acad. Sci. USA 114, E11121–E11130 (2017).

  • Wandy, J. & Daly, R. GraphOmics: an interactive platform to explore and integrate multi-omics data. BMC Bioinform. 22, 603 (2021).

  • Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat. Microbiol. 6, 3–6 (2020).

  • Sorokina, M., Merseburger, P., Rajan, K., Yirik, M. A. & Steinbeck, C. COCONUT online: collection of open natural products database. J. Cheminform. 13, 2 (2021).

  • Rutz, A. et al. The LOTUS initiative for open knowledge management in natural products research. eLife 11, e70780 (2022).

  • Chen, Y., Stork, C., Hirte, S. & Kirchmair, J. NP-scout: machine learning approach for the quantification and visualization of the natural product-likeness of small molecules. Biomolecules 9, 43 (2019).

  • Cao, L. et al. MolDiscovery: learning mass spectrometry fragmentation of small molecules. Nat. Commun. 12, 3718 (2021).

  • Visser, U. et al. BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results. BMC Bioinform. 12, 257 (2011).

  • Sarntivijai, S. et al. CLO: the cell line ontology. J. Biomed. Semant. 5, 37 (2014).

  • Shoemaker, R. H. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer 6, 813–823 (2006).

  • Cooper, M. A. A community-based approach to new antibiotic discovery. Nat. Rev. Drug. Discov. 14, 587–588 (2015).

  • Cech, N. B., Medema, M. H. & Clardy, J. Benefiting from big data in natural products: importance of preserving foundational skills and prioritizing data quality. Nat. Prod. Rep. 38, 1947–1953 (2021).

  • Blin, K., Shaw, S., Kautsar, S. A., Medema, M. H. & Weber, T. The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes. Nucleic Acids Res. 49, D639–D643 (2021).

  • Horai, H. et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass. Spectrom. 45, 703–714 (2010).

  • Haug, K. et al. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 48, D440–D444 (2020).

  • Kuhn, S. & Schlörer, N. E. Facilitating quality control for spectra assignments of small organic molecules: nmrshiftdb2–a free in-house NMR database with integrated LIMS for academic service laboratories. Magn. Reson. Chem. 53, 582–589 (2015).

  • Irwin, J. J. et al. ZINC20—a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).

  • Hastings, J. et al. ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res. 44, D1214–D1219 (2016).

  • Martens, M. et al. WikiPathways: connecting communities. Nucleic Acids Res. 49, D613–D621 (2021).

  • Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).

  • Blaskovich, M. A. T., Zuegg, J., Elliott, A. G. & Cooper, M. A. Helping chemists discover new antibiotics. ACS Infect. Dis. 1, 285–287 (2015).

  • Waagmeester, A. et al. Wikidata as a knowledge graph for the life sciences. eLife 9, e52614 (2020).

  • Reker, D., Rodrigues, T., Schneider, P. & Schneider, G. Target prediction by cascaded self-organizing maps for ligand de-orphaning and side-effect investigation. J. Cheminform. 6, P47 (2014).

  • Navarro-Muñoz, J. C. et al. A computational framework to explore large-scale biosynthetic diversity. Nat. Chem. Biol. 16, 60–68 (2020).

  • van der Hooft, J. J. J., Wandy, J., Barrett, M. P., Burgess, K. E. V. & Rogers, S. Topic modeling for untargeted substructure exploration in metabolomics. Proc. Natl Acad. Sci. USA 113, 13738–13743 (2016).

  • Reymond, J.-L. The chemical space project. Acc. Chem. Res. 48, 722–730 (2015).

  • Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug. Deliv. Rev. 46, 3–26 (2001).

  • Janssen, A. P. A. et al. Drug discovery maps, a machine learning model that visualizes and predicts kinome–inhibitor interaction landscapes. J. Chem. Inf. Model. 59, 1221–1229 (2019).

  • McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open. Source Softw. 3, 861 (2018).

  • Probst, D. & Reymond, J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 12, 12 (2020).

  • Feher, M. & Schmidt, J. M. Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry. J. Chem. Inf. Comput. Sci. 43, 218–227 (2003).

  • Béquignon, O. J. M. et al. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J. Cheminform. 15, 3 (2023).
