Chemical Structure Lookup Service
Search in 39 million indexed structures from over 80 databases (27 million unique structures)
 

Help

1. What is this place?
2. Can I see some examples?
3. What are these "FICuS" and "uuuuu" identifiers?
4. Can I use wildcards in queries?
5. Can I download all structures in CSLS? Or at least specific databases?
6. How does "Auto detect" work?
7. Can I enter several queries at once? How do I do this? Can I mix InChIs, Formulae, etc.?
8. Why wasn't my formula found, I know it should be there?
9. Can I search for a compound by chemical name?
10. Can I have my database added to this service?
11. How do you pronounce "CSLS"?

1. What is this place?

A.: The Chemical Structure Lookup Service (CSLS) is meant to work as an address book for chemical structures. It has two major modes of operation. In the first, you submit a chemical structure in the form of an SD file, SMILES string(s), or some other molecular structure format (CSLS can read over 20 different chemistry formats), and the service determines whether the submitted structures are present in any of the databases we currently have indexed. In the second, you submit a document, and the service tries to extract all the chemical information the document might contain (InChI identifiers, FICuS hashcodes, molecular formulas, SMILES strings, etc.) and then runs a search with the extracted data. You can also enter such search values (InChI, FICuS, etc.) directly. For examples of possible searches, see below.

2. Can I see some examples?

A.: Upload SD file - Results
Submit a PDF document - Results
Search for ethanol as a SMILES string "CCO" - Results

3. What are these "FICuS" and "uuuuu" identifiers?

A.: FICuS and uuuuu are structure-based, calculable unique identifiers for any small molecule. They are based on hashcode calculations built into the chemoinformatics toolkit CACTVS. A FICuS identifier very closely represents a chemical ("stuff in the bottle"): it takes into account all of its chemical features (stereochemistry, isotopes, and additional fragments such as counter ions) and is invariant only to tautomerism. A uuuuu identifier is much less sensitive: beyond tautomerism, it also ignores counter ions, isotopes, charges, stereochemistry, etc. A uuuuu identifier can therefore be interpreted as a representation of a molecule's "parent structure". For more technical discussion see here.

4. Can I use wildcards in queries?

A.: Only for InChI. For InChIs, wildcards can make a lot of sense because of the layered structure of InChI strings: layers of increasing specificity are concatenated in an InChI string, separated by slashes. Suppose, for example, that you have an InChI that contains the "fixed hydrogen" layer (i.e. it represents a specific tautomer), denoted by the substring "/f/h7H,6H2" in "InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)/f/h7H,6H2". An InChI lookup with this string may not find the structure if it is stored under the tautomer-invariant InChI, i.e. "InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)". However, if you search with "InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)/*", you will find it.
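The rewriting of a tautomer-specific InChI into a wildcard query can be sketched in a few lines of Python. This is only an illustration of the string manipulation described above, not part of CSLS; the function names are made up for this example, and it assumes that the fixed-hydrogen layer is the first layer starting with a lower-case "f".

```python
def strip_fixed_h(inchi: str) -> str:
    """Drop the fixed-hydrogen layer ("/f...") and every layer after it.

    Assumption for this sketch: the fixed-hydrogen layer is the first
    slash-separated layer that starts with a lower-case "f".
    """
    layers = inchi.split("/")
    for i, layer in enumerate(layers):
        if layer.startswith("f"):
            return "/".join(layers[:i])
    return inchi  # no fixed-hydrogen layer present


def wildcard_query(inchi: str) -> str:
    """Build a wildcard query from the tautomer-invariant part of an InChI."""
    return strip_fixed_h(inchi) + "/*"


specific = "InChI=1/C5H12N2O/c1-2-3-4-7-5(6)8/h2-4H2,1H3,(H3,6,7,8)/f/h7H,6H2"
print(wildcard_query(specific))
```

For the example string this prints the wildcard query quoted in the answer above.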

5. Can I download all structures in CSLS? Or at least specific databases?

A.: No. The purpose of CSLS is not to provide a source of structures, but to allow the user quick lookup of where a structure occurs. For those databases that are truly public, we are preparing a web page with bulk download capabilities in several formats. For aggregated public structure sets, you can also go to the PubChem FTP site.

6. How does "Auto detect" work?

A.: The "Auto detect" mode tries to make the best guess about the type of query you have entered and then searches with as much information as it can derive from that query. You can submit a single word, a number, a structure (as a SMILES string, SD file, etc.) or a complete journal article; it will try to extract all possible chemical information from your input and look it up in our database. While not 100% error-free, it is the recommended search mode for most common queries.

7. Can I enter several queries at once? How do I do this? Can I mix InChIs, Formulae, etc.?

A.: Yes. Enter them tab- or space-separated. Because we are preparing to introduce chemical name search, it is better not to separate queries with characters that might appear in a valid chemical name or SMILES string: do not use commas, semicolons, dashes, slashes, parentheses, periods, quotation marks or plus signs as separators. For example:
740 741
is OK, whereas
740,741
740, 741
will not work as intended. Of course, placing individual query values on separate lines is even safer:
740
741
Please also note that if the query is a molecular structure file (such as SMILES or SD file) it should be in the correct format to be understood by the parser. Auto-detect will try to pick SMILES out of a mixed context (such as a journal article), but for obvious reasons its capabilities are limited. It will not be able to automatically recognize metallo-organic SMILES for example.
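The effect of the separator rules can be illustrated with a small Python sketch. It assumes that queries are split on whitespace only (spaces, tabs and line breaks); that is a simplification for illustration, not a description of the actual CSLS parser, and split_queries is a hypothetical helper.

```python
def split_queries(text: str) -> list[str]:
    """Split raw input into individual query values on whitespace only.

    Commas, semicolons, dashes, etc. are deliberately NOT treated as
    separators, because they can be part of a chemical name or SMILES string.
    """
    return text.split()


print(split_queries("740 741"))    # two queries: '740' and '741'
print(split_queries("740\n741"))   # two queries: '740' and '741'
print(split_queries("740,741"))    # one query: '740,741'
print(split_queries("740, 741"))   # two queries, the first with a stray comma
```

Under this assumption, a comma never separates queries; it simply stays attached to the value next to it, which is why the comma-separated forms do not behave as intended.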

8. Why wasn't my formula found, I know it should be there?

A.: Did you enter the formula in the correct case? We cannot distinguish between carbon monoxide (CO) and cobalt (Co) if you put everything in lower case! Did you use the Hill system for the ordering of the elements? This matters less than getting upper and lower case right, because our software will try to convert your input into Hill order, but it can help. If all else fails, try searching for an InChI string with a wildcard. So, to summarize:
C6H6, not c6h6
If still not found: InChI=1/C6H6*
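As an illustration of why case matters and what Hill order means, here is a minimal Python sketch. It is not the CSLS implementation; parse_formula and hill_order are hypothetical helpers that handle only flat formulas (no parentheses or charges) and do not check that each symbol is a real element. In the Hill system, carbon comes first and hydrogen second when carbon is present, followed by all other elements alphabetically; without carbon, all elements are listed alphabetically.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms per element symbol in a flat, correctly cased formula."""
    # An element symbol is one upper-case letter plus an optional lower-case one.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def hill_order(formula: str) -> str:
    """Rewrite a formula in Hill order."""
    counts = parse_formula(formula)
    symbols = sorted(counts)
    if "C" in counts:
        rest = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if "H" in counts else []) + rest
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in symbols)


print(hill_order("H6C6"))          # C6H6
print(dict(parse_formula("CO")))   # {'C': 1, 'O': 1}: carbon monoxide
print(dict(parse_formula("Co")))   # {'Co': 1}: cobalt
```

An all-lower-case input such as "c6h6" raises a ValueError in this sketch, because the element symbols can no longer be recognized unambiguously.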

9. Can I search for a compound by chemical name?

A.: Not yet. But we are working on this.

10. Can I have my database added to this service?

A.: Yes! Please send an e-mail to Marc C. Nicklaus (mn1**at**helix.nih.gov) to discuss the technical details. Generally, we will be happy to add any small-molecule database to which you have the rights or which is public.

11. How do you pronounce "CSLS"?

A.: We pronounce it like "sizzles".

Last changed: 2007-03-23