Using pychembl (3) – Active & Parent Molecules


An interesting table in ChEMBLdb is molecule_hierarchy, which is linked to table molecule_dictionary through the shared primary key molregno. As the name suggests, it stores hierarchical relationships between rows of molecule_dictionary and links each molecule to its parent and active form, if these are available in ChEMBLdb.
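To make the table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than a real ChEMBLdb installation. The two tables are reduced to the columns relevant here, and the molregno values 1 to 3 are made up for illustration; only the three names come from the tamoxifen example used in this post.

```python
import sqlite3

# Toy in-memory database; NOT a real ChEMBLdb, and the molregno values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE molecule_dictionary (
    molregno  INTEGER PRIMARY KEY,
    pref_name TEXT
);
CREATE TABLE molecule_hierarchy (
    molregno        INTEGER PRIMARY KEY REFERENCES molecule_dictionary(molregno),
    parent_molregno INTEGER REFERENCES molecule_dictionary(molregno),
    active_molregno INTEGER REFERENCES molecule_dictionary(molregno)
);
INSERT INTO molecule_dictionary VALUES
    (1, 'TAMOXIFEN CITRATE'),
    (2, 'TAMOXIFEN'),
    (3, '4-HYDROXYTAMOXIFEN');
INSERT INTO molecule_hierarchy VALUES (1, 2, 3);
""")

# Walk from a molecule to its hierarchy row, then back to molecule_dictionary
# twice: once for the parent and once for the active form.
row = conn.execute("""
SELECT m.pref_name, p.pref_name, a.pref_name
FROM molecule_dictionary AS m
JOIN molecule_hierarchy AS h ON h.molregno = m.molregno
LEFT JOIN molecule_dictionary AS p ON p.molregno = h.parent_molregno
LEFT JOIN molecule_dictionary AS a ON a.molregno = h.active_molregno
WHERE m.molregno = 1
""").fetchone()

print(row)
```

The LEFT JOINs keep the row even when no parent or active form is recorded, in which case the corresponding name comes back as None.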

But first of all, let us load an example molecule from the database again:

> molecule = chembldb.query(MoleculeDictionary).filter(MoleculeDictionary.molregno==47340).one()

As shown in previous posts, this returns a MoleculeDictionary object:

> print molecule
% <pychembl.db.auto_schema.MoleculeDictionary object at 0x374d750>

The following two commands first walk to table molecule_hierarchy using the pre-defined relationship hierarchy. From there, they jump straight back to table molecule_dictionary using either the named relationship parent or active. Both calls again return a MoleculeDictionary object, but this time the object represents either the parent structure or the active form of the original molecule.

> print molecule.hierarchy.parent
% <pychembl.db.auto_schema.MoleculeDictionary object at 0x374dbd0>
> print molecule.hierarchy.active
% <pychembl.db.auto_schema.MoleculeDictionary object at 0x374de90>

A really cool feature of SQLAlchemy is that it lets you pre-define relationships that span more than one actual table-to-table relationship (more examples of this will come in future posts). In the example shown here, this lets us drop the explicit call to the hierarchy relationship of the MoleculeDictionary object (“hierarchy” is a hard word to type anyway :-) ). Internally, these new relationships follow the same relationship paths as just shown, but expose “parent” and “active” as direct attributes of the object stored in variable molecule:

> print molecule.parent
% <pychembl.db.auto_schema.MoleculeDictionary object at 0x374dbd0>
> print molecule.active
% <pychembl.db.auto_schema.MoleculeDictionary object at 0x374de90>
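For readers curious how such a shortcut could be wired up, here is a minimal, self-contained sketch. It is not the actual pychembl definition: it assumes SQLAlchemy 1.4 or later, uses Python 3 syntax, runs against a toy in-memory SQLite database with made-up molregno values, and uses association_proxy as one possible way to expose parent and active directly on the molecule. pychembl itself may define these relationships differently.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class MoleculeDictionary(Base):
    __tablename__ = "molecule_dictionary"
    molregno = Column(Integer, primary_key=True)
    pref_name = Column(String)

    # One hop: molecule_dictionary -> molecule_hierarchy via the shared molregno.
    hierarchy = relationship(
        "MoleculeHierarchy",
        foreign_keys="MoleculeHierarchy.molregno",
        uselist=False,
    )
    # Shortcuts (an assumption of this sketch): forward to hierarchy.parent / hierarchy.active.
    parent = association_proxy("hierarchy", "parent")
    active = association_proxy("hierarchy", "active")


class MoleculeHierarchy(Base):
    __tablename__ = "molecule_hierarchy"
    molregno = Column(Integer, ForeignKey("molecule_dictionary.molregno"), primary_key=True)
    parent_molregno = Column(Integer, ForeignKey("molecule_dictionary.molregno"))
    active_molregno = Column(Integer, ForeignKey("molecule_dictionary.molregno"))

    # Second hop: back to molecule_dictionary, once per foreign key.
    parent = relationship(MoleculeDictionary, foreign_keys=[parent_molregno])
    active = relationship(MoleculeDictionary, foreign_keys=[active_molregno])


engine = create_engine("sqlite://")  # toy in-memory database, not ChEMBLdb
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        MoleculeDictionary(molregno=1, pref_name="TAMOXIFEN CITRATE"),
        MoleculeDictionary(molregno=2, pref_name="TAMOXIFEN"),
        MoleculeDictionary(molregno=3, pref_name="4-HYDROXYTAMOXIFEN"),
        MoleculeHierarchy(molregno=1, parent_molregno=2, active_molregno=3),
    ])
    session.commit()

    molecule = session.get(MoleculeDictionary, 1)
    # Explicit two-step walk versus the shortcut attributes.
    long_way = (molecule.hierarchy.parent.pref_name, molecule.hierarchy.active.pref_name)
    short_way = (molecule.parent.pref_name, molecule.active.pref_name)

print(long_way)
print(short_way)
```

Because three foreign keys point from molecule_hierarchy to molecule_dictionary, each relationship names the foreign key it should follow; without that, SQLAlchemy cannot decide which join path is meant.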

In the same fashion as described in the pychembl (2) post, we can now ask for attributes of the original molecule, the parent molecule, or the active form of the original molecule:

> print molecule.pref_name
% TAMOXIFEN CITRATE
> print molecule.parent.pref_name
% TAMOXIFEN
> print molecule.active.pref_name
% 4-HYDROXYTAMOXIFEN

…, or follow the structure relationship to structural information of each of the three molecules:

> print molecule.structure.canonical_smiles
% CC\C(=C(/c1ccccc1)\c2ccc(OCCN(C)C)cc2)\c3ccccc3.OC(=O)CC(O)(CC(=O)O)C(=O)O
> print molecule.parent.structure.canonical_smiles
% CC\C(=C(/c1ccccc1)\c2ccc(OCCN(C)C)cc2)\c3ccccc3
> print molecule.active.structure.canonical_smiles
% CC\C(=C(/c1ccc(O)cc1)\c2ccc(OCCN(C)C)cc2)\c3ccccc3

…, or ask for properties of the corresponding CompoundProperty object:

> print molecule.property.hba
% 2
> print molecule.parent.property.hba
% 2
> print molecule.active.property.hba
% 3
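Since the same question is often asked of all three forms, a small helper can save some typing. The sketch below is hypothetical and not part of pychembl: collect_forms is a name invented here, and the SimpleNamespace objects merely stand in for real MoleculeDictionary objects so that the snippet runs without a database (Python 3 syntax). It also covers the case where no parent or active form is available.

```python
from types import SimpleNamespace


def collect_forms(molecule, getter):
    """Apply getter to a molecule, its parent and its active form.

    Forms that are missing (None) yield None instead of raising an error.
    """
    forms = {
        "original": molecule,
        "parent": getattr(molecule, "parent", None),
        "active": getattr(molecule, "active", None),
    }
    return {
        label: (getter(obj) if obj is not None else None)
        for label, obj in forms.items()
    }


# Stand-ins for MoleculeDictionary objects, filled with the values shown above.
tamoxifen = SimpleNamespace(
    pref_name="TAMOXIFEN", property=SimpleNamespace(hba=2), parent=None, active=None
)
hydroxytamoxifen = SimpleNamespace(
    pref_name="4-HYDROXYTAMOXIFEN", property=SimpleNamespace(hba=3), parent=None, active=None
)
citrate = SimpleNamespace(
    pref_name="TAMOXIFEN CITRATE",
    property=SimpleNamespace(hba=2),
    parent=tamoxifen,
    active=hydroxytamoxifen,
)

names = collect_forms(citrate, lambda m: m.pref_name)
hba_counts = collect_forms(citrate, lambda m: m.property.hba)

print(names)
print(hba_counts)
```

With a real pychembl session, the same helper should work on the molecule object loaded at the top of this post, as long as it exposes the parent and active attributes described above.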

Makes walking through ChEMBLdb pretty easy, doesn’t it?
Markus

Note: If you installed pychembl earlier, please pull/download it again from GitHub, since I added the new relationships to the MoleculeDictionary object.