A.: The Chemical Structure Lookup Service (CSLS) is meant to work as an address book for chemical
structures. It has two major modes of operation. In the first, you submit a chemical structure
in the form of an SD file, SMILES string(s), or some other molecular structure
format (CSLS can read over 20 different chemistry formats), and the service
determines whether the submitted structures
are present in any of the databases we currently have indexed.
In the second, you submit a document, and the service tries to
extract all chemical information the document might contain (InChI identifiers, FICuS hashcodes, molecular formulas, SMILES strings etc.)
and then conducts a search with the extracted data. You can also enter
such search values (InChI, FICuS etc.) directly.
For examples of possible searches, see below.
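As a rough illustration of the second mode, the sketch below pulls InChI strings out of free text. It is not the CSLS implementation: the regular expression, the function name extract_inchis and the trailing-punctuation rule are assumptions made only for this example.

```python
import re

# Assumed pattern: "InChI=1/" or "InChI=1S/" followed by anything up to
# whitespace or a quote character. Real documents may need a stricter rule.
INCHI_PATTERN = re.compile(r"""InChI=1S?/[^\s"']+""")

def extract_inchis(text: str) -> list[str]:
    """Return candidate InChI strings found in a block of text."""
    hits = []
    for match in INCHI_PATTERN.finditer(text):
        # Drop sentence punctuation that got glued to the end of the match.
        hits.append(match.group(0).rstrip(".,;:"))
    return hits

sample = 'stored as "InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H", or InChI=1/CH4/h1H4.'
print(extract_inchis(sample))
```

Each extracted string could then be submitted as a search value, just like an InChI entered directly.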
3. What are these "FICuS" and "uuuuu" identifiers?
A.: FICuS and uuuuu are structure-based, calculable unique identifiers for
any small molecule. They are based on
hashcode calculations built into the chemoinformatics toolkit
CACTVS. A FICuS identifier very closely
represents a chemical ("stuff in the bottle"), taking into account all its
chemical features (stereochemistry, additional fragments such as counter ions,
isotopes; it is invariant only to tautomerism), while a uuuuu identifier
is much less sensitive (i.e., beyond tautomerism, it also ignores counter ions,
isotopes, charges, stereochemistry etc.). A uuuuu identifier can therefore be interpreted
as a representation of a molecule's "parent structure".
For more technical discussion see here.
A.: Only for InChI. For InChIs, it may actually make a lot of sense to use
wildcards because of the layered structure of InChI strings. Layers of
increasing specificity are concatenated in an InChI string, separated by
slashes. If, for example, you have an InChI that contains the "fixed hydrogen" layer
(i.e. represents a specific tautomer), denoted by the substring "/f/h7H,6H2" in
"InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)/f/h7H,6H2", then you
may not find a structure by InChI lookup if that structure is stored as the
tautomer-invariant InChI, i.e.
"InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)". However, if you
search with
"InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)/*", you will find it.
5. Can I download all structures in CSLS? Or at least specific databases?
A.: No. The purpose of CSLS is not to provide a source of structures, but to
allow the user quick lookup of where a structure occurs. For those databases
that are truly public, we are preparing a web page with bulk download
capabilities in several formats. For aggregated public structure sets, you can
also go to the PubChem FTP site.
A.: The "Auto detect" mode tries to make the best guess about the type of query you have
entered and searches for as much information as is available to answer it. You can submit a single word,
number or structure (as SMILES string, SD file etc.) or a complete journal article, and it will try to
extract all possible chemical information from your query and look it up in our database. While not
100% error-free, it is the recommended search mode for most common queries.
7. Can I enter several queries at once? How do I do this? Can I mix InChIs, Formulae, etc.?
A.: Yes. Enter them tab- or space-separated. Because we are preparing to
introduce chemical name search, it is better not to separate queries with characters that might appear
in a valid chemical name or SMILES string: do not use commas, semicolons, dashes, slashes, parentheses, periods, quotation marks or plus signs as separators. So:
740 741
is OK, whereas
740,741
740, 741
will not work as intended. Of course, placing individual query values on separate lines is even safer:
740
741
Please also note that if the query is a molecular structure (such as a SMILES string or SD file), it must be
in the correct format to be understood by the parser. Auto-detect will try to pick SMILES out
of a mixed context (such as a journal article), but for obvious reasons its capabilities are limited.
For example, it will not be able to automatically recognize metallo-organic SMILES.
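The separator rules above can be sketched as follows. This only illustrates why whitespace is the safe separator; the function name split_queries is hypothetical and the code makes no claim about how CSLS itself tokenizes input.

```python
def split_queries(text: str) -> list[str]:
    """Split a query box into individual queries on spaces, tabs and newlines."""
    # str.split() with no argument splits on any run of whitespace.
    return text.split()

print(split_queries("740 741"))    # two separate queries
print(split_queries("740\n741"))   # two separate queries
print(split_queries("740,741"))    # stays a single query containing a comma
```

Because the comma is never treated as a separator, it remains available for use inside chemical names.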
8. Why wasn't my formula found? I know it should be there.
A.: Did you enter the formula in the correct case? We cannot distinguish between carbon monoxide
and cobalt if you put everything in lower case!
Did you use the Hill system for the ordering of the elements? It is not as important as having
elements in the correct upper or lower case, as our software will try to convert your input into Hill order,
but it could be helpful. If all else fails, try searching for an InChI string with a wildcard.
So, to summarize:
C6H6, not c6h6
If still not found: InChI=1/C6H6*
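To make the two checks concrete, here is a minimal sketch of case-sensitive formula parsing followed by Hill ordering (carbon first, hydrogen second, remaining elements alphabetically; purely alphabetical when no carbon is present). It is not the CSLS converter: the function name hill_order is hypothetical, and the parser handles only flat formulas without parentheses or charges and does not check that symbols are real elements.

```python
import re
from collections import Counter

# One element symbol (upper-case letter, optional lower-case letter)
# followed by an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def hill_order(formula: str) -> str:
    """Rewrite a flat molecular formula in Hill order."""
    counts = Counter()
    position = 0
    for match in TOKEN.finditer(formula):
        if match.start() != position:
            break  # characters were skipped, so the input is malformed
        counts[match.group(1)] += int(match.group(2) or 1)
        position = match.end()
    if position != len(formula) or not counts:
        raise ValueError(f"cannot parse formula: {formula!r}")
    symbols = sorted(counts)
    if "C" in counts:
        others = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if "H" in counts else []) + others
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in symbols)

print(hill_order("H6C6"))                      # reordered with carbon first
print(hill_order("CO"), hill_order("Co"))      # two different formulas
```

An all-lower-case input such as c6h6 raises ValueError here, because without capitalization the element symbols cannot be identified.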
A.: Yes! Please send an e-mail to Marc C. Nicklaus (mn1**at**helix.nih.gov) to
discuss the technical details. Generally, we will be happy to add any
small-molecule database to which you have the rights or which is
public.