Substance Indexing Mapping File
Substance Indexing Mapping File to more than 124,000 SPL Files Released by FDA through January 2022
Since FDA has started posting Substance Index SPL files not just for small-molecule based substances but also for biological substances (therapeutic proteins and biological organisms), it is no longer practical to convert these files to SD format. Instead we provide a mapping file that has direct links to the SPL files containing the substance definitions. The structure of these files is described in sections 1, 2 and 14 of the SPL implementation guide https://www.fda.gov/downloads/forindustry/datastandards/structuredproductlabeling/ucm321876.pdf. Users are strongly encouraged to use this method of accessing structure definitions for UNIIs and not the SD files of the previous releases since those files may contain structures that correspond to an older version of the UNII definition. A change in the UNII definition triggers creation of a new version of the SPL file by FDA.
SPL Mapping file with 129,085 records. This is a 12 MB zip file that unzips to about 35 MB.
Download
If, for whatever reason, you are interested in older versions of the mapping file, please see here.
Note that the URLs provided in the downloadable file SubstanceIndexingAllVersionsPublic.dat (after decompression) should not be opened with a web browser. They are exclusively meant for data download. The SPL .xml files referenced in the URLs should be downloaded and the data (including the chemical structures) then extracted/visualized with your favorite tool or programming language. (For example, the Structure Browser (csbr tool) of the CACTVS chemoinformatics toolkit (Xemistry GmbH) can read these .xml files and display the structures contained in them.)
Format of the mapping file: SubstanceIndexingAllVersionsPublic.dat is a pipe (|) delimited file that maps FDA UNIIs to the substance definitions they represent.
Column | Description |
---|---|
UNII | FDA Unique Ingredient Identifier |
Hash Code | Computed from the definition |
Citation | Reference to a substance definition not maintained by FDA |
Accessdata Link | Direct link to the substance definition |
Document Version Id | Uniquely identifies the version of an SPL document with a substance definition |
Document Version Set Id | Uniquely identifies a group of versions of an SPL document |
Document Version Number | Version number of the SPL document |
Document Submission Time | Date when the version was uploaded |
Replaced by Document Id | Identifies the next version of the SPL document. Empty for the most current version. |
New Set Id | Indicates if the SPL document was deprecated (UNII merge or UNII split) |
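The table above can be turned into a small reader. The following is a minimal Python sketch, not an official tool. It assumes that the columns appear in the file in the same order as in the table, that a header line (if present) starts with "UNII", and that fields contain no quoted pipes. The function names read_mapping and current_versions and the sample values are made up for illustration.

```python
import csv
import io

# Column names copied from the format table; their order in the file is an assumption.
COLUMNS = [
    "UNII",
    "Hash Code",
    "Citation",
    "Accessdata Link",
    "Document Version Id",
    "Document Version Set Id",
    "Document Version Number",
    "Document Submission Time",
    "Replaced by Document Id",
    "New Set Id",
]


def read_mapping(stream):
    """Yield one dict per record of a pipe-delimited mapping file."""
    reader = csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0].strip().upper() == "UNII":
            continue  # skip a header line, if the file has one (assumption)
        fields = [field.strip() for field in row]
        # Pad short rows so that every column gets a value.
        fields += [""] * (len(COLUMNS) - len(fields))
        yield dict(zip(COLUMNS, fields))


def current_versions(records):
    """Keep the records whose 'Replaced by Document Id' column is empty."""
    return [r for r in records if not r["Replaced by Document Id"]]


# Made-up sample data standing in for SubstanceIndexingAllVersionsPublic.dat.
sample = io.StringIO(
    "EXAMPLE001|hash1||https://example.invalid/v1.xml|doc1|set1|1|20200101|doc2|\n"
    "EXAMPLE001|hash2||https://example.invalid/v2.xml|doc2|set1|2|20210101||\n"
)
latest = current_versions(read_mapping(sample))
print(latest[0]["Accessdata Link"])
```

To use the sketch on the real file, pass an open text file handle for SubstanceIndexingAllVersionsPublic.dat to read_mapping instead of the in-memory sample. Filtering on the empty "Replaced by Document Id" column follows the table's statement that this field is empty for the most current version.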
Structured Product Labeling (SPL) Implementation Guide with Validation Procedures
Version 1, Revision 2021-04-16, of the SPL Implementation Guide
A PDF file of Version 1, Revision 2021-04-16, of the Structured Product Labeling (SPL) Implementation Guide with Validation Procedures for substances.
Release 3 File Series - February 2015
Over 46,000 Substance Index SPL Files Released by FDA through February 2015
An SD file created from 46,611 SPL files released by FDA through February 2015 at http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm377913.htm. The number of molfile blocks (molecules) in this SD file is 61,178, i.e. some SPL files yielded more than one structure, typically if they represented mixtures. The conversion of SPL to SDF was done with the same methodology and field names as for Release 2.
61,178 structures in SDF format. This is a 40 MB gzipped file that uncompresses to about 323 MB.
Download
Release 2 File Series - September 2014
Over 43,000 Substance Index SPL Files Released by FDA September 2014
An SD file created from 43,440 SPL files released by FDA in September 2014 (Part One) at http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm377913.htm. The number of molfile blocks (molecules) in this SD file is 57,137, i.e. some SPL files yielded more than one structure, typically if they represented mixtures. The conversion of SPL to SDF was done with a script written in the chemoinformatics toolkit CACTVS. The SD file contains the tags: E_UNII, E_AUTHOR, SPL_FILENAME, COMPONENTS_TOTAL, COMPONENTS_NO. It also has E_NAME and E_NAMESET if this type of information was present in the SPL file and parsed out by CACTVS. Each SPL file contains the unique ingredient identifier (UNII) and the defining characteristics (e.g., chemical structures) of a substance. Note that FDA no longer endorses specific or preferred names for substances. However, names and additional identifiers have been provided at http://fdasis.nlm.nih.gov/srs/jsp/srs/uniiListDownload.jsp.
We have therefore folded data from the UNII Data files into the SD file, using the field names:
UNII_ANNOTATION_PREFERRED_TERM
UNII_ANNOTATION_RN
UNII_ANNOTATION_MF
UNII_ANNOTATION_INCHIKEY
UNII_ANNOTATION_EINECS_REG_ID
UNII_ANNOTATION_NCI_THESAURUS_CONCEPT_CODE
UNII_ANNOTATION_ITIS_TAXONOMIC_SERIAL_NUMBER
UNII_ANNOTATION_NCBI_TAXONOMY_ORGANISM_ID
UNII_ANNOTATION_USDA_PLANTS_ORGANISM_ID
UNII_ANNOTATION_SMILES
UNII_ANNOTATION_INN_ID
UNII_ANNOTATION_UNII_TYPE
Names present in approx. 4,400 previously released SPL files were preserved in the SD file.
For additional name-to-UNII mapping, other services can be used, such as the SRS NLM site or PubChem.
Note: No verification or curation of these data has been performed, only format conversion.
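The tags named above are stored as SD file data items, so they can be read without a chemistry toolkit. The following is a minimal Python sketch, assuming standard SD file layout: each record is a molfile block ending in "M  END", followed by "> <TAG>" data items, and closed by a "$$$$" line. The function names iter_sdf_records and structures_per_unii and the sample record are hypothetical; the script used to create the SD files was written in CACTVS and is not reproduced here.

```python
from collections import Counter


def iter_sdf_records(lines):
    """Yield (molblock, tags) for each record of an SD file given as text lines."""
    mol_lines, tags, tag, in_data = [], {}, None, False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("$$$$"):
            # End of record: emit it and reset the state.
            yield "\n".join(mol_lines), {k: "\n".join(v) for k, v in tags.items()}
            mol_lines, tags, tag, in_data = [], {}, None, False
        elif not in_data:
            mol_lines.append(line)
            if line.startswith("M  END"):
                in_data = True  # data items follow the molfile block
        elif line.startswith(">"):
            # Data header such as "> <E_UNII>": take the text between < and >.
            tag = line[line.find("<") + 1 : line.rfind(">")]
            tags[tag] = []
        elif line == "":
            tag = None  # a blank line ends the current data item
        elif tag is not None:
            tags[tag].append(line)


def structures_per_unii(records):
    """Count how many molfile blocks carry each E_UNII value."""
    return Counter(tags.get("E_UNII", "") for _, tags in records)


# Made-up two-record sample; real records carry full molfile blocks.
sample = (
    "\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <E_UNII>\nEXAMPLE001\n\n> <COMPONENTS_TOTAL>\n2\n\n> <COMPONENTS_NO>\n1\n\n$$$$\n"
    "\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <E_UNII>\nEXAMPLE001\n\n> <COMPONENTS_TOTAL>\n2\n\n> <COMPONENTS_NO>\n2\n\n$$$$\n"
)
counts = structures_per_unii(iter_sdf_records(sample.splitlines(True)))
print(counts)
```

For the gzipped download, the lines can be supplied by gzip.open(path, "rt") from the Python standard library, where path is wherever the file was saved. Counting records per E_UNII is one way to see which SPL files yielded more than one structure.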
57,137 structures in SDF format. This is a 38 MB gzipped file that uncompresses to about 302 MB.
Download
Release 1 File Series - May 2014
Over 40,000 Substance Index SPL Files Released by FDA May 2014
An SD file created from 41,032 SPL files released by FDA in May 2014 at http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/ucm377913.htm. The number of molfile blocks (molecules) in this SD file is 53,804, i.e. some SPL files yielded more than one structure, typically if they represented mixtures. The conversion of SPL to SDF was done with a script written in the chemoinformatics toolkit CACTVS. The SD file contains the tags: E_UNII, E_AUTHOR, SPL_FILENAME, COMPONENTS_TOTAL, COMPONENTS_NO. It also has E_NAME and E_NAMESET if this type of information was present in the SPL file and parsed out by CACTVS. Each SPL file contains the unique ingredient identifier (UNII) and the defining characteristics (e.g., chemical structures) of a substance. Note that FDA no longer endorses specific or preferred names for substances. Names present in approx. 4,400 previously released SPL files were preserved in the SD file. For name-to-UNII mapping, other services can be used, such as the SRS NLM site or PubChem.
53,804 structures in SDF format. This is a 32 MB gzipped file that uncompresses to about 255 MB.
Download
Last Update: 2019-12-04