  • 2002-04-20: Added a choice between aromatic and Kekule representation of aromatic rings.
  • 2002-03-21: Major upgrade: additional input and output formats are now supported, for both single-structure and multistructure files.
  • 2001-07-24: Added the SD file upload function for larger datasets.
  • 1999-10-22: Launch of the online service.

Input (SMILES strings and single- and multistructure SD, PDB, MOL files, etc.)

The upper left input field accepts SMILES strings as structure specifications. If you are familiar with the syntax, you can type simple queries in manually. Most of the time, however, you will want to use a graphical structure editor. If your desktop molecule editor can copy structures to the clipboard as SMILES strings, draw the structure there, copy it as a SMILES string, and paste it into the entry field; editors that support this include ChemWindow and ChemDraw. Alternatively, click the Start Structure Editor button to use the Java Molecular Editor (JME) provided on this page.

For larger datasets, use the multistructure SD file upload function on the upper right-hand side. Files in PDB and MOL format (and, in fact, in any format CACTVS recognizes) are also accepted. The resulting SMILES strings are returned as a text file or in the output format you select.
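In a multistructure SD file, each record ends with a line containing only "$$$$". As a minimal sketch of how such an upload can be split into individual structures (plain Python, not the CACTVS reader the service actually uses; the function name split_sd_records is hypothetical):

```python
def split_sd_records(sd_text: str) -> list[str]:
    """Split the text of a multistructure SD file into individual records.

    Each SD record is terminated by a line containing only "$$$$".
    The terminator line itself is not included in the returned records.
    """
    records = []
    current = []
    for line in sd_text.splitlines():  # also handles \r\n line endings
        if line.rstrip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Tolerate a final record whose "$$$$" terminator is missing.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


sd = "first\nM  END\n$$$$\nsecond\nM  END\n$$$$\n"
records = split_sd_records(sd)
print(len(records))  # 2
```

A converter producing a single-structure output format would then use only records[0].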

The system automatically adds hydrogens to your input structure(s) according to standard valences before generating the SMILES string(s). This prevents strange-looking and probably unintended SMILES strings from being generated (such as [C][C][C][C]O[C]O[C][C][C][C] instead of CCCCOCOCCCC). If you really need SMILES strings for structures with explicitly missing hydrogens, please contact us and we may add this as an option.
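Why the unintended string appears can be shown with a small sketch. In SMILES, an unbracketed organic-subset atom implicitly carries enough hydrogens to reach its lowest standard valence; an atom with a different hydrogen count must be written in brackets. The code below assumes neutral atoms in a linear chain joined by single bonds, and the valence table restates the SMILES organic-subset defaults. The helper names are hypothetical, and this is not the service's own code.

```python
# Default valences of the SMILES organic subset (neutral atoms assumed).
STANDARD_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, bond_order_sum):
    """Hydrogens needed to reach the lowest standard valence >= bond sum."""
    for valence in STANDARD_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0


def atom_token(element, bond_order_sum, h_count):
    """Write a bare symbol if the H count matches the implicit one, else a bracket atom."""
    if h_count == implicit_hydrogens(element, bond_order_sum):
        return element
    h_part = "" if h_count == 0 else "H" if h_count == 1 else f"H{h_count}"
    return f"[{element}{h_part}]"


def chain_smiles(elements, add_hydrogens):
    """SMILES for a linear, singly bonded chain, with or without added hydrogens."""
    n = len(elements)
    tokens = []
    for i, element in enumerate(elements):
        bond_sum = (1 if i > 0 else 0) + (1 if i < n - 1 else 0)
        h = implicit_hydrogens(element, bond_sum) if add_hydrogens else 0
        tokens.append(atom_token(element, bond_sum, h))
    return "".join(tokens)


chain = list("CCCCOCOCCCC")
print(chain_smiles(chain, add_hydrogens=True))   # CCCCOCOCCCC
print(chain_smiles(chain, add_hydrogens=False))  # [C][C][C][C]O[C]O[C][C][C][C]
```

The oxygens stay unbracketed in both strings because two single bonds already satisfy their standard valence of 2.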

Output Format

The service offers several output options: Unique SMILES (USMILES), displayed on screen or saved to a text file; MDL SD and MOL file format; and PDB file format. For the last three formats, you can choose between 2D and 3D coordinates. 3D coordinates are computed with the program CORINA of Prof. Gasteiger (Erlangen).
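To illustrate what a MOL file with 2D or 3D coordinates contains, here is a minimal sketch of a writer for the MDL V2000 molfile layout. It is not the service's own writer. The function name write_molfile and the program tag "SKETCH" are hypothetical, the date field in the header is left blank, and in 2D mode the sketch simply sets z to 0 rather than computing proper 2D depiction coordinates.

```python
def write_molfile(name, atoms, bonds, three_d=True):
    """Return a minimal MDL V2000 molfile as a string.

    atoms: list of (symbol, x, y, z) tuples.
    bonds: list of (i, j, order) tuples with 1-based atom indices.
    """
    dim = "3D" if three_d else "2D"
    lines = [
        name,
        # Header line 2: 2-char user initials, 8-char program name,
        # 10-char date/time (left blank here), then the dimensional code.
        f"{'':2}{'SKETCH':<8}{'':10}{dim}",
        "",  # comment line
        # Counts line: atom count, bond count, unused fields, version tag.
        f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000",
    ]
    for symbol, x, y, z in atoms:
        if not three_d:
            z = 0.0  # 2D output: all atoms lie in the z = 0 plane
        # Atom block: x, y, z, symbol, then mass difference and 11 zeroed fields.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}" + " 0" + "  0" * 11)
    for i, j, order in bonds:
        # Bond block: first atom, second atom, bond order, stereo and other fields.
        lines.append(f"{i:3d}{j:3d}{order:3d}" + "  0" * 4)
    lines.append("M  END")
    return "\n".join(lines) + "\n"


mol = write_molfile("methanol", [("C", 0.0, 0.0, 0.0), ("O", 1.43, 0.0, 0.0)], [(1, 2, 1)])
print(mol)
```

An SD file is essentially a sequence of such blocks, each followed by optional data items and a "$$$$" line.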

If the input contains a single structure, the output will also contain a single structure. Multistructure input generates multistructure output for those formats that support it; otherwise, only the first structure is used. SD files contain a UNIQUE_SMILES field holding the unique SMILES and, if available, a USER_SUPPLIED_SMILES field holding the user-supplied SMILES.
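The UNIQUE_SMILES and USER_SUPPLIED_SMILES fields are ordinary SD data items: a header line of the form "> <FIELD_NAME>", the value, and a blank line, with "$$$$" closing the record. A minimal sketch of assembling such a record, assuming the molfile block is already available as a string (the function name sd_record is hypothetical, and the service's exact layout may differ in detail):

```python
def sd_record(molblock, unique_smiles, user_smiles=None):
    """Append SD data items to a molfile block and terminate the record."""
    parts = [molblock.rstrip("\n")]
    fields = [("UNIQUE_SMILES", unique_smiles)]
    if user_smiles is not None:  # written only when the user supplied SMILES
        fields.append(("USER_SUPPLIED_SMILES", user_smiles))
    for field_name, value in fields:
        parts.append(f"> <{field_name}>")  # data header line
        parts.append(value)                # data value
        parts.append("")                   # blank line ends the data item
    parts.append("$$$$")                   # record terminator
    return "\n".join(parts) + "\n"


# Placeholder molfile block and SMILES values, for illustration only.
print(sd_record("ethanol\nM  END\n", "CCO", user_smiles="OCC"))
```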

Within Unique SMILES, you can also choose between aromatic and Kekule representations of aromatic rings. The two choices produce different USMILES strings for the same structure.
Example: NSC# 5 is
NC1=CC2=C(C=C1)C(=O)C3=C(C=CC=C3)C2=O (Kekule)
Nc1ccc2C(=O)c3ccccc3C(=O)c2c1 (aromatic)
Choose whichever suits you better. The Enhanced NCI Database Browser uses the Kekule representation for its SMILES output.
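The two example strings differ character by character, so a plain string comparison treats them as different even though they describe the same heavy-atom skeleton. This small sketch checks that for bracket-free organic-subset SMILES: it counts heavy atoms, folding lowercase aromatic symbols onto their elements, and counts explicit "=" bonds. The function name is hypothetical, and it does not handle bracket atoms such as [nH] or charges. The practical consequence is that USMILES strings should only be compared when both were generated with the same ring representation.

```python
import re
from collections import Counter

# Bracket-free organic-subset atoms: two-letter halogens first,
# then uppercase aliphatic and lowercase aromatic symbols.
ATOM_PATTERN = re.compile(r"Br|Cl|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles):
    """Count heavy atoms per element, treating aromatic 'c' as 'C', etc."""
    return Counter(symbol.capitalize() for symbol in ATOM_PATTERN.findall(smiles))


kekule = "NC1=CC2=C(C=C1)C(=O)C3=C(C=CC=C3)C2=O"
aromatic = "Nc1ccc2C(=O)c3ccccc3C(=O)c2c1"

print(kekule == aromatic)                                        # False
print(heavy_atom_counts(kekule) == heavy_atom_counts(aromatic))  # True
print(kekule.count("="), aromatic.count("="))                    # 8 2
```

The six extra "=" characters in the Kekule form are the alternating double bonds of the two benzene rings, which the aromatic form expresses with lowercase atom symbols instead.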

SMILES and Unique SMILES Definition

An (incomplete) definition of SMILES strings can be found here. A SMILES tutorial is available on Daylight's Web site, which also briefly mentions USMILES. The best reference is probably still the 1989 publication (cited here). Note that Daylight has changed the definition of USMILES since 1989 but has not published the changes. USMILES generated here will therefore differ from Daylight-generated ones for some compounds (an informal test found this to be the case for approximately 30% of a typical set of organic compounds). Should the current USMILES definition become available, we will consider updating our algorithms.
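Unique SMILES generation typically starts by ranking atoms in a way that does not depend on the input atom order, then breaks remaining ties and writes the string in rank order. The sketch below shows only the ranking idea, as a simplified Morgan-style refinement: atoms start with an invariant (here just element and heavy-atom degree) and are repeatedly re-ranked by their own rank plus their neighbours' ranks until no class splits further. It is neither Daylight's algorithm nor the CACTVS implementation used here, and it ignores bond orders, charges, stereo and tie-breaking. It does illustrate that the final order depends on which invariants and tie-breaking rules are chosen, which is one reason two USMILES implementations can disagree.

```python
def refine_ranks(elements, adjacency):
    """Partition atoms into symmetry classes by iterative rank refinement.

    elements: element symbol per heavy atom.
    adjacency: list of neighbour indices per atom.
    Atoms that end with equal ranks were not distinguished by these invariants.
    """
    def dense_rank(keys):
        order = {key: rank for rank, key in enumerate(sorted(set(keys)))}
        return [order[key] for key in keys]

    # Initial invariant: element symbol and number of heavy-atom neighbours.
    ranks = dense_rank([(el, len(nbrs)) for el, nbrs in zip(elements, adjacency)])
    while True:
        # Extend each atom's key with the sorted ranks of its neighbours.
        keys = [(ranks[i], tuple(sorted(ranks[j] for j in adjacency[i])))
                for i in range(len(elements))]
        new_ranks = dense_rank(keys)
        if len(set(new_ranks)) == len(set(ranks)):
            return new_ranks  # no class was split: the partition is stable
        ranks = new_ranks


# Heavy atoms of CCCCOCOCCCC as a linear chain.
elements = list("CCCCOCOCCCC")
adjacency = [[j for j in (i - 1, i + 1) if 0 <= j < len(elements)]
             for i in range(len(elements))]
print(refine_ranks(elements, adjacency))  # [0, 1, 2, 3, 5, 4, 5, 3, 2, 1, 0]
```

Symmetry-equivalent atoms (for example, the two terminal carbons) end up with the same rank, so a real canonicalizer still needs a tie-breaking rule to fix one atom order.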

The CGI script connected to the form runs on a non-GUI version of one of the general-purpose chemical structure handling programs of the CACTVS system, which are programmable in Tcl/Tk and extensions. Using the powerful scripting language interface of these programs, nearly any graphical or structure handling application can be implemented very rapidly. The CACTVS program was developed by Wolf-Dietrich Ihlenfeldt.

3D atomic coordinates are computed by the algorithm of the CORINA program, which is used here as a dynamically loaded module.

We thank Peter Ertl from Novartis Crop Protection AG for kindly allowing us to use the JME molecular editor.

F. Oellien

Last Update: 2010-09-14