NCI/CADD iRL-Based Database of Commercially Offered Screening Compounds

Release 1

Release 1 File Series - July 2019

This database was compiled by the CADD Group from structures in the iResearch™ Library (iRL) of commercial screening compounds, in the framework of our agreement and long-standing collaboration with ChemNavigator (http://www.chemnavigator.com/nih.asp). It is a cumulative collection drawn from all quarterly releases available to us (2004Q3 – 2017Q4) and is therefore larger than any individual iRL release: it contains just over 140 million structures. Each structure is annotated with the first release in which it appeared in the iRL and the last release in which it was present. The database contains only unique structure entries. The unique ID is the ChemNavigator Structure ID (SID, not to be confused with PubChem’s SID), which is stored in the SD files as the value of the field PUBCHEM_EXT_DATASOURCE_REGID. For up-to-date information about actual samples (current availability, purity, pricing, etc.), please consult the appropriate ChemNavigator/MilliporeSigma web pages.
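The cumulative first/last annotation described above can be illustrated with a minimal sketch. This is not the CADD Group's actual pipeline; the function name `merge_releases`, the input layout (a dict from release label to an iterable of SIDs), and the small integer IDs in the example are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def merge_releases(releases: Dict[str, Iterable[int]]) -> Dict[int, Tuple[str, str]]:
    """Map each structure ID to (first release, last release) it appeared in."""
    span: Dict[int, Tuple[str, str]] = {}
    # Labels such as "2004Q3" sort chronologically as plain strings
    # (four-digit year, then quarter), so sorted() gives release order.
    for label in sorted(releases):
        for sid in releases[label]:
            first, _ = span.get(sid, (label, label))
            span[sid] = (first, label)
    return span


# Hypothetical toy input: three releases with made-up structure IDs.
example = merge_releases({
    "2005Q1": [1, 2],
    "2004Q3": [1],
    "2005Q2": [2, 3],
})
print(example[1])  # ('2004Q3', '2005Q1')
```

The merged result holds three structures although no single toy release holds more than two, which is why a cumulative collection is larger than any individual release.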

Notes:

(1) We have corrected a number of chemically incorrect entries. Most were structures that appear to have been mis-parsed from the original supplier’s data because it contained one of the commonly used functional group abbreviations that clash with element symbols. The most prominent case is Ac, which clearly meant acetyl but was interpreted as the radioactive element actinium.

(2) Subsets of the iRL releases categorized as virtual have not been included in this collection. The assumption is therefore that most of the molecules in this collection did exist as a real physical sample at least at some point in time.

(3) These files have been uploaded to PubChem for inclusion in the PubChem database.

Example with explanation of fields in the SD files:

> <PUBCHEM_EXT_DATASOURCE_REGID>

369383873   -- This is the ChemNavigator SID;

> <CHEMNAV_FIRST_AVAILABLE_IN_RELEASE>

2012Q3   -- First time we encountered this structure in an iRL release;

> <CHEMNAV_LAST_AVAILABLE_IN_RELEASE>

2017Q4   -- Most recent iRL release in which this structure occurred;

> <NCICADD_SMILES>

C(NCCN[S](=O)(=O)C1=CC=CC2=C1C=CC=C2N(C)C)(C)=O   -- CACTVS-calculated SMILES;

> <NCICADD_FICUS_ID>   -- See reference (1) below;

242DD390E29000DF-FICuS-01-4D

> <NCICADD_UUUUU_ID>   -- See reference (1) below;

242DD390E29000DF-uuuuu-01-FC

> <PUBCHEM_EXT_DATASOURCE_URL>

https://cactus.nci.nih.gov/download/ncicadd_irl   -- The URL for this database on our server;

$$$$
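As a rough sketch of how the annotated fields above might be read programmatically, the following uses only the Python standard library. It assumes the downloadable files follow the usual SD layout, in which each value starts on the line directly after its `> <FIELD>` header and is closed by a blank line, and it assumes the `--` explanations shown above are annotations on this page only, not part of the files. The function name `iter_sd_fields` and the empty placeholder molblock in the sample are hypothetical.

```python
import io
import re
from typing import Dict, Iterator, Optional, TextIO

# Data header lines look like "> <FIELD_NAME>"; capture the name in angle brackets.
_HEADER = re.compile(r"^>.*?<([^<>]+)>")


def iter_sd_fields(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one {field name: value} dict per SD record, skipping the molblock."""
    fields: Dict[str, str] = {}
    in_data = False                  # True once the molblock's "M  END" was seen
    current: Optional[str] = None    # field whose value lines are being read
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":   # record terminator
            yield fields
            fields, in_data, current = {}, False, None
            continue
        if not in_data:
            in_data = line.startswith("M  END")
            continue
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            fields[current] = ""
        elif current is not None:
            if line.strip():
                old = fields[current]
                fields[current] = f"{old}\n{line}" if old else line
            else:
                current = None       # a blank line closes the value


# Sample record: field values taken from the example above; the molblock is
# an empty placeholder (assumption), not the real structure.
SAMPLE = """\

  placeholder

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <PUBCHEM_EXT_DATASOURCE_REGID>
369383873

> <CHEMNAV_FIRST_AVAILABLE_IN_RELEASE>
2012Q3

> <CHEMNAV_LAST_AVAILABLE_IN_RELEASE>
2017Q4

$$$$
"""

record = next(iter_sd_fields(io.StringIO(SAMPLE)))
print(record["PUBCHEM_EXT_DATASOURCE_REGID"])  # 369383873
```

Because release labels such as 2012Q3 and 2017Q4 compare chronologically as strings, a parsed record can be filtered by release range with ordinary string comparison.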

Acknowledgements:

We thank Bret Daniel of ChemNavigator/MilliporeSigma for his help in backfilling some of the oldest iRL releases, from the early 2000s.

(1) Sitzmann, M.; Filippov, I. V.; Nicklaus, M. C. Internet Resources Integrating Many Small-Molecule Databases. SAR QSAR Environ. Res. 2008, 19 (1–2), 1–9. https://doi.org/10.1080/10629360701843540.

M. C. Nicklaus

Last Update: 2019-08-14