FDA Substance Index SPL Files Converted to SD Format


New: Release 3 File Series - February 2015

Over 46,000 Substance Index SPL Files Released by FDA through February 2015

This SD file was created from the 46,611 SPL files released by FDA through February 2015 at http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm377913.htm. It contains 61,178 molfile blocks (molecules), i.e., some SPL files yielded more than one structure, typically when they represented mixtures. The conversion from SPL to SDF used the same methodology and field names as for Release 2.

61,178 structures in SDF format. This is a 40 MB gzipped file that uncompresses to about 323 MB.
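As a quick sanity check after downloading, the record count can be compared with the number stated above. The following is a minimal sketch, assuming the file is a standard gzipped SD file in which each record ends with a `$$$$` line; the file name in the usage comment is hypothetical and should be replaced with the name of the downloaded file.

```python
import gzip


def count_sdf_records(path):
    """Count records in a gzipped SD file by counting '$$$$' delimiter lines."""
    count = 0
    # latin-1 maps every byte to a character, so decoding cannot fail
    # even if the file contains non-ASCII names.
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        for line in handle:
            if line.rstrip("\r\n") == "$$$$":
                count += 1
    return count


# Usage (hypothetical file name):
#   print(count_sdf_records("fda_spl_release3.sdf.gz"))
```

The file is streamed line by line, so the roughly 323 MB of uncompressed data never has to be held in memory at once.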

Download

Release 2 File Series - September 2014

Over 43,000 Substance Index SPL Files Released by FDA September 2014

This SD file was created from the 43,440 SPL files released by FDA in September 2014 (Part One) at http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm377913.htm. It contains 57,137 molfile blocks (molecules), i.e., some SPL files yielded more than one structure, typically when they represented mixtures. The conversion from SPL to SDF was done with a script written in the chemoinformatics toolkit CACTVS.

The SD file contains the tags E_UNII, E_AUTHOR, SPL_FILENAME, COMPONENTS_TOTAL, and COMPONENTS_NO. It also has E_NAME and E_NAMESET if this type of information was present in the SPL file and was parsed out by CACTVS. Each SPL file contains the unique ingredient identifier (UNII) and the defining characteristics (e.g., chemical structures) of a substance.

Note that FDA no longer endorses specific or preferred names for substances. However, names and additional identifiers have been provided at http://fdasis.nlm.nih.gov/srs/jsp/srs/uniiListDownload.jsp. We have therefore folded data from the UNII Data files into the SD file, using the following field names:
UNII_ANNOTATION_PREFERRED_TERM
UNII_ANNOTATION_RN
UNII_ANNOTATION_MF
UNII_ANNOTATION_INCHIKEY
UNII_ANNOTATION_EINECS_REG_ID
UNII_ANNOTATION_NCI_THESAURUS_CONCEPT_CODE
UNII_ANNOTATION_ITIS_TAXONOMIC_SERIAL_NUMBER
UNII_ANNOTATION_NCBI_TAXONOMY_ORGANISM_ID
UNII_ANNOTATION_USDA_PLANTS_ORGANISM_ID
UNII_ANNOTATION_SMILES
UNII_ANNOTATION_INN_ID
UNII_ANNOTATION_UNII_TYPE
Names present in approximately 4,400 previously released SPL files were preserved in the SD file. For additional name-to-UNII mapping, other services such as the SRS NLM site or PubChem can be used.
Note: These data have not been verified or curated; only the format was converted.
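The tags described above can be read without a chemistry toolkit. The following is a minimal sketch, assuming the standard SD file layout: data items follow the `M  END` line of each molfile block, each item starts with a header line of the form `>  <TAG>`, its value runs until the next blank line, and each record ends with `$$$$`. The function name `iter_sdf_fields` is illustrative and not part of any released tool.

```python
import re

# Header line of an SD data item, e.g. ">  <E_UNII>".
_TAG_HEADER = re.compile(r"^>.*?<([^<>]+)>")


def iter_sdf_fields(lines):
    """Yield one {tag: value} dict per SD record from an iterable of lines."""
    fields, tag, value = {}, None, []
    in_data = False  # True once the molfile block has ended ("M  END")
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "$$$$":
            # End of record: flush an unterminated value, then emit.
            if tag is not None:
                fields[tag] = "\n".join(value)
            yield fields
            fields, tag, value = {}, None, []
            in_data = False
        elif not in_data:
            # Skip the structure block; only the data items are parsed.
            if line.startswith("M  END"):
                in_data = True
        elif tag is not None:
            if line == "":
                # A blank line terminates the current value.
                fields[tag] = "\n".join(value)
                tag, value = None, []
            else:
                value.append(line)
        else:
            match = _TAG_HEADER.match(line)
            if match:
                tag = match.group(1)
```

An open file handle can be passed directly as `lines`; for example, `record.get("UNII_ANNOTATION_INCHIKEY")` then returns the folded-in InChIKey for records that have one, and `None` otherwise.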

57,137 structures in SDF format. This is a 38 MB gzipped file that uncompresses to about 302 MB.

Download

Release 1 File Series - May 2014

Over 40,000 Substance Index SPL Files Released by FDA May 2014

This SD file was created from the 41,032 SPL files released by FDA in May 2014 at http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm377913.htm. It contains 53,804 molfile blocks (molecules), i.e., some SPL files yielded more than one structure, typically when they represented mixtures. The conversion from SPL to SDF was done with a script written in the chemoinformatics toolkit CACTVS.

The SD file contains the tags E_UNII, E_AUTHOR, SPL_FILENAME, COMPONENTS_TOTAL, and COMPONENTS_NO. It also has E_NAME and E_NAMESET if this type of information was present in the SPL file and was parsed out by CACTVS. Each SPL file contains the unique ingredient identifier (UNII) and the defining characteristics (e.g., chemical structures) of a substance.

Note that FDA no longer endorses specific or preferred names for substances. Names present in approximately 4,400 previously released SPL files were preserved in the SD file. For name-to-UNII mapping, other services such as the SRS NLM site or PubChem can be used.
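To see which SPL files yielded more than one structure, records can be grouped by their SPL_FILENAME tag. The following is a minimal sketch, assuming the records have already been parsed into dictionaries that map tag names to values, and assuming that all structures derived from one SPL file carry the same SPL_FILENAME value (an assumption, not something stated above). The function names are illustrative.

```python
from collections import Counter


def structures_per_spl_file(records):
    """Count how many SD records share each SPL_FILENAME value."""
    return Counter(
        rec["SPL_FILENAME"] for rec in records if "SPL_FILENAME" in rec
    )


def multi_structure_files(records):
    """Return {SPL file name: count} for files with more than one structure."""
    counts = structures_per_spl_file(records)
    return {name: n for name, n in counts.items() if n > 1}
```

If the assumption holds, the number of distinct SPL_FILENAME values should be close to the SPL file count stated above, and the sum of the per-file counts should equal the number of molfile blocks.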

53,804 structures in SDF format. This is a 32 MB gzipped file that uncompresses to about 255 MB.

Download

M. C. Nicklaus

Last Update: 2017-10-16
