More on Chemical Name Resolving

First, we have updated OPSIN to version 1.1.0. Second, a new resolver module is available in CIR: ChemSpider provides a name index of excellent quality, which you can now use from CIR:

http://cactus.nci.nih.gov/chemical/structure/L-alanin/smiles?resolver=name_by_chemspider

Internally, this request is passed directly to ChemSpider. Because we don’t want to route all of our traffic through ChemSpider’s service, the URL parameter “?resolver=name_by_chemspider” has to be added explicitly to the URL sent to CIR. If this parameter is not given, the name is resolved as before: first by the OPSIN module in CIR and, if that fails, by a lookup in the local name index of CIR.
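If you build these URLs from a script, a small helper along the following lines may be convenient. It is only a sketch: the function name `cir_url` and its arguments are made up for illustration, and it simply reproduces the URL pattern shown in this post.

```python
from urllib.parse import quote

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"

def cir_url(name, representation, resolvers=None, xml=False):
    """Build a CIR request URL (hypothetical helper, not part of CIR).

    `resolvers` is an optional ordered list of resolver module names,
    e.g. ["name_by_chemspider"]. Without it, CIR uses its default order.
    """
    url = f"{CIR_BASE}/{quote(name, safe='')}/{representation}"
    if xml:
        url += "/xml"
    if resolvers:
        # Commas are left unescaped, matching the URLs shown in the post.
        url += "?resolver=" + ",".join(resolvers)
    return url

print(cir_url("L-alanin", "smiles", ["name_by_chemspider"]))
```

Leaving `resolvers` empty gives the plain URL, so the request does not go through ChemSpider unless you ask for it.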

If you want to change the order of this procedure and/or add the lookup at ChemSpider, you can do the following:

http://cactus.nci.nih.gov/chemical/structure/L-alanin/smiles?resolver=name_by_chemspider,name_by_opsin,name_by_cir

This attempts to resolve the name “L-alanin” first with the ChemSpider resolver module, then with the OPSIN resolver module, and finally with the database name index of CIR. As the lookup already succeeds with the ChemSpider module, CIR stops there and doesn’t apply the other two modules.
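The "first success wins" behaviour described above can be pictured with a few lines of Python. This is an illustration of the idea only, not CIR's actual code: the stub lookups and the placeholder structure strings are invented for the example.

```python
def resolve_in_order(name, resolvers):
    """Try each (label, lookup) pair in order and return the first hit."""
    for label, lookup in resolvers:
        result = lookup(name)
        if result is not None:
            return label, result  # stop here, later modules are not called
    return None, None

calls = []  # records which stub modules were actually consulted

def make_stub(label, table):
    """Create a stand-in resolver backed by a plain dict (hypothetical)."""
    def lookup(name):
        calls.append(label)
        return table.get(name)
    return lookup

chain = [
    ("name_by_chemspider", make_stub("name_by_chemspider", {"L-alanin": "<placeholder structure>"})),
    ("name_by_opsin", make_stub("name_by_opsin", {})),
    ("name_by_cir", make_stub("name_by_cir", {"L-alanin": "<placeholder structure>"})),
]

print(resolve_in_order("L-alanin", chain))
print(calls)
```

Reordering the entries in `chain` corresponds to reordering the comma-separated module names in the URL.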

If you would like to see what all three name resolving modules return, you have to use the XML representation of CIR:

http://cactus.nci.nih.gov/chemical/structure/L-alanin/smiles/xml?resolver=name_by_chemspider,name_by_opsin,name_by_cir

If you would like to check whether all three modules return the same structure for a name, you can “hash” the resolved structures using the HASHISY function available in CACTVS:

http://cactus.nci.nih.gov/chemical/structure/L-alanin/hashisy/xml?resolver=name_by_chemspider,name_by_opsin,name_by_cir

Fortunately, we get the same hashcode value from each module here, but that is not generally true. For instance, the ChemSpider name resolver module returns both forms for “fructose”, while the other two modules return only the open-chain form of fructose (and of course, another reason for a mismatch could be some nasty bug):

http://cactus.nci.nih.gov/chemical/structure/fructose/smiles/xml?resolver=name_by_chemspider,name_by_opsin,name_by_cir
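A script can do this comparison by grouping the returned values by resolver module. The sketch below assumes an XML layout with one `data` element per resolver (carrying a `resolver` attribute) and one `item` element per returned value; check this against the real response before relying on it. The sample document and its hash values are made up for illustration and are not real HASHISY output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample in the ASSUMED layout; the hash values are placeholders.
SAMPLE = """<request string="fructose" representation="hashisy">
  <data id="1" resolver="name_by_chemspider">
    <item id="1">0000000000000001</item>
    <item id="2">0000000000000002</item>
  </data>
  <data id="2" resolver="name_by_opsin">
    <item id="1">0000000000000001</item>
  </data>
  <data id="3" resolver="name_by_cir">
    <item id="1">0000000000000001</item>
  </data>
</request>"""

def values_by_resolver(xml_text):
    """Map each resolver name to the set of values it returned."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(set)
    for data in root.iter("data"):
        for item in data.iter("item"):
            grouped[data.get("resolver")].add((item.text or "").strip())
    return dict(grouped)

def modules_agree(grouped):
    """True if every resolver returned exactly the same set of values."""
    return len({frozenset(values) for values in grouped.values()}) <= 1

grouped = values_by_resolver(SAMPLE)
print(grouped)
print(modules_agree(grouped))
```

Comparing sets rather than single values matters for cases like fructose, where one module returns more than one structure.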

Markus

2 thoughts on “More on Chemical Name Resolving”

  1. Thanks Stuart for the positive remarks.

    Regarding the XML format: it was purposely designed to return everything possible and to classify the information. How big is the “burden” for you to just parse the XML and use only the first representation in the first data item? (BTW, the results are always returned in the order in which the resolver modules have been specified in the URL.) I will put your request on my to-do list for the next version; however, this may take a while, as we are working on something else (web-base).

    Markus
