SMILES is a simple, concise and rather readable molecular structure specification format. It is (incompletely) published in: D. Weininger, SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules, J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.

Daylight has a SMILES tutorial online.

The basic syntax rules are:

  • Atoms are specified by their symbol, normally with an upper-case first letter. Elements not in the organic subset (B, C, N, O, P, S, F, Cl, Br, I) and their potential attributes must be enclosed in square brackets. Elements of the organic subset need to be bracketed only if they have attributes or unusual hydrogen counts. Example: [Au] is a gold atom.
  • Elements of the organic subset automatically receive the number of hydrogen atoms necessary to reach their lowest normal valence consistent with the explicitly written bonds. Other elements do not automatically receive hydrogen atoms. Example: C is methane.
  • Attached hydrogens and formal charges are always specified inside brackets. The number of hydrogens is written as H, followed by the count. Charges are one or more plus or minus symbols, optionally followed by a digit. Example: [NH4+] is the ammonium cation.
  • Bonds are represented by the symbols -, =, # and : for single, double, triple and aromatic bonds. The single bond symbol is optional. Examples: CC=C is propene, [H][H] is molecular hydrogen.
  • If an atom is sp2 hybridized, this can be indicated by writing its symbol with lowercase letters. No bond order specification to neighboring sp2 atoms is necessary, and the automatic hydrogen addition is adjusted accordingly. Example: CC=C and Ccc are alternative representations of propene.
  • Branches are specified by enclosing them in parentheses. They can be nested. Example: C(C)(C)(C)O is t-butanol.
  • Rings are closed by ring link tags, which must follow immediately after the (possibly bracketed) atom symbol. Multiple ring link tags can be present at a single atom. They are arbitrary single- or two-digit numbers. Two-digit numbers must be prefixed with a % sign. These tags must appear pairwise. Ring link tags can be reused if the closing tag has been encountered parsing from left to right. Examples: C1CCC1 is cyclobutane, c1ccccc1c1ccccc1 is biphenyl, and C12C3C4C1C5C4C3C25 is cubane.
  • Disconnected structures forming a molecular ensemble are indicated with a '.' connection. Example: c1cc([O-].[Na+])ccc1 is (not the most obvious, but a legal representation of) sodium phenoxide.
  • You are free to enter aromatic rings in Kekulé fashion or with aromatic atoms, i.e. C1=CC=CC=C1 and c1ccccc1 are identical (benzene). This works even with charged systems.
  • Bond cis/trans type stereochemistry is specified with the / and \ characters. Cl/C=C/Cl is trans-dichloroethene. From the left chlorine atom the bond goes UP to the C=C core and on the other side UP again to the second chlorine atom. Consequently, Cl/C=C\Cl or Cl\C=C/Cl is the cis-compound.
  • Isotope labelling is expressed with a prefix before the atom symbol. Labelled atoms must be enclosed by square brackets. Example: [13CH4] is 13C-methane and C([2H])([2H])([2H])[2H] is fully deuterated methane.
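
The lexical rules above can be illustrated with a small tokenizer. The following Python code is only a minimal sketch, not the conversion routine used by this service: the names TOKEN_RE, tokenize and open_ring_tags are hypothetical, only the organic subset and bracketed atoms are recognized, and the contents of brackets are not validated. It splits a SMILES string into atoms, bonds, branches, ring link tags and '.' separators, and reports ring link tags that were opened but never closed.

```python
import re

# Illustrative token classes (assumed names), one per syntax rule listed above.
TOKEN_RE = re.compile(r"""
    (?P<bracket>\[[^\]]+\])               # bracketed atom, e.g. [Au], [NH4+], [13CH4]
  | (?P<atom>Cl|Br|[BCNOPSFI]|[bcnops])   # organic subset; lowercase marks sp2 atoms
  | (?P<bond>[-=#:/\\])                   # bond symbols and cis/trans markers
  | (?P<branch>[()])                      # branch open / close
  | (?P<ring>%\d{2}|\d)                   # ring link tag: one digit, or % plus two digits
  | (?P<dot>\.)                           # separator between disconnected structures
""", re.VERBOSE)


def tokenize(smiles):
    """Split a SMILES string into (kind, text) pairs, reading left to right."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {smiles[pos]!r} at position {pos}"
            )
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def open_ring_tags(tokens):
    """Return the set of ring link tags that never found their partner.

    The first occurrence of a tag opens a ring and the second closes it,
    after which the same tag may be reused for another ring.
    """
    open_tags = set()
    for kind, text in tokens:
        if kind == "ring":
            tag = int(text.lstrip("%"))
            open_tags ^= {tag}  # toggle: open on first sight, close on second
    return open_tags


if __name__ == "__main__":
    for example in ("C1CCC1", "C12C3C4C1C5C4C3C25", "c1cc([O-].[Na+])ccc1", "C1CC"):
        print(example, tokenize(example), open_ring_tags(tokenize(example)))
```

An empty set from open_ring_tags means that all ring link tags appear pairwise. The sketch does not check that a tag directly follows an atom, and it does not balance parentheses; a real decoder would need both checks.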

The SMILES conversion routine behind the 3D coordinate service will accept some SMILES strings which are, strictly speaking, syntactically incorrect but still resolvable (e.g. it allows ring closure numbers after bond order indicators, and ignores the case of atoms that cannot be resolved as pi-centers). The decoder also understands a number of local but compatible SMILES syntax extensions, such as atom lists.

F. Oellien

Last Update: 2011-09-01