Today, I want to start with some simple examples of how to use pychembl. For this, let us walk through the molecule_dictionary table available in ChEMBL.
As in the previous post, we first have to import pychembl:
> from pychembl.settings import *
> from pychembl.db.auto_schema import *
To access the rows of this table, pass pychembl's table mapper class MoleculeDictionary to the SQLAlchemy query interface that pychembl exposes as chembldb:
> molecules = chembldb.query(MoleculeDictionary)
> print type(molecules)
% <class 'sqlalchemy.orm.query.Query'>
Let us count how many molecules the table contains:
> print molecules.count()
% 658075
If you would like to access a specific molecule identified by its ChEMBL molregno (the primary key of this table), you can use any one of the following statements (they are alternative ways to do the same thing):
> molecule = molecules.filter(MoleculeDictionary.molregno==675049).all()
> molecule = molecules.filter(MoleculeDictionary.molregno==675049).one()
> molecule = molecules.filter(MoleculeDictionary.molregno==675049).first()
> molecule = molecules.get((675049,))
The .all() method returns a Python list containing all matching row objects. Since it is already clear that this statement will match only a single row, the .one() method can be used to retrieve that object directly, without building a list. However, .one() raises an error if the filter criterion matches more than one row (or no row at all). This can be avoided with the .first() method, which returns only the first object regardless of how many rows in the table match the filter criterion, and None if nothing matches. Finally, the .get() method can be used if a row is identified by its primary key. It accepts the key as a Python tuple (if a table uses a multi-column primary key, the tuple has to contain the corresponding number of elements).
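The differences between these four methods can be tried out without a ChEMBL installation. The following is only a minimal sketch under stated assumptions: it uses Python 3 and SQLAlchemy 1.4 or newer, an in-memory SQLite database, and a hypothetical Molecule class with two made-up rows standing in for pychembl's MoleculeDictionary. The helper describe_one is also hypothetical. Session.get() is used here as the current SQLAlchemy counterpart of the query's .get() method.

```python
# Sketch only: "Molecule" and its rows are invented stand-ins, not pychembl objects.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import MultipleResultsFound, NoResultFound

Base = declarative_base()


class Molecule(Base):
    __tablename__ = "molecule_dictionary"
    molregno = Column(Integer, primary_key=True)
    pref_name = Column(String)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Molecule(molregno=1, pref_name="ASPIRIN"),
    Molecule(molregno=2, pref_name="CAFFEINE"),
])
session.commit()

query = session.query(Molecule)

# .all() always returns a list, even when only one row matches.
hits = query.filter(Molecule.molregno == 1).all()

# .one() returns the object itself, but only if exactly one row matches.
single = query.filter(Molecule.molregno == 1).one()


def describe_one(q):
    """Call .one() and report which of the three possible outcomes occurred."""
    try:
        return q.one().pref_name
    except NoResultFound:
        return "no row"
    except MultipleResultsFound:
        return "more than one row"


# .first() never raises: it returns the first row, or None if nothing matches.
first = query.order_by(Molecule.molregno).first()
missing = query.filter(Molecule.molregno == 99).first()

# Primary-key lookup; the key is passed as a one-element tuple.
by_key = session.get(Molecule, (1,))

print(len(hits), single.pref_name, first.pref_name, missing, by_key.pref_name)
print(describe_one(query.filter(Molecule.molregno == 99)))
print(describe_one(query))  # unfiltered query matches both rows
```

In practice, .one() is a good fit when anything other than exactly one match indicates a problem, while .first() suits lookups where a missing row is an expected case that the caller checks for.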
Of course, you can also combine the creation of the query object and the definition of the filter criterion in a single statement, e.g. like this:
> molecule = chembldb.query(MoleculeDictionary).filter(MoleculeDictionary.molregno==675049).one()
Accessing the attributes of a row object, i.e. the attributes of the molecule we just fetched from the database, is simple:
> print molecule.molregno
% 675049
> print molecule.pref_name
% CEFOTETAN DISODIUM
> print molecule.chembl_id
% CHEMBL1201098
> print molecule.first_approval
% 1984
> print molecule.natural_product
% 1
All attribute names available for an object (e.g. molregno, chembl_id, first_approval, etc., see table molecule_dictionary in the schema) are auto-loaded from the database and are therefore not changed by pychembl in any way. The Python datatype of a returned attribute corresponds to the column datatype specified in the database (the datatypes are auto-loaded as well).
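How pychembl's auto_schema module loads the schema internally is not shown in this post. As an assumed analogue, the following sketch uses SQLAlchemy's own table reflection (Python 3, SQLAlchemy 1.4 or newer) against an in-memory SQLite database. The three-column table created here is a made-up, heavily reduced stand-in for the real molecule_dictionary table.

```python
# Sketch only: a reduced, invented version of molecule_dictionary in SQLite.
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine("sqlite://")  # in-memory database
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE molecule_dictionary ("
        " molregno INTEGER PRIMARY KEY,"
        " pref_name VARCHAR(255),"
        " chembl_id VARCHAR(20))"
    ))

# Reflection: column names and datatypes are read from the database itself,
# nothing is declared on the Python side.
table = Table("molecule_dictionary", MetaData(), autoload_with=engine)

names = [column.name for column in table.columns]
python_types = {column.name: column.type.python_type for column in table.columns}

print(names)
print(python_types)
```

The point of the sketch is that both the attribute names and the Python types follow from the database definition, which matches the behaviour described above.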
With the statement shown earlier
> molecules = chembldb.query(MoleculeDictionary)
every query returns a complete MoleculeDictionary object for each matching database row. To retrieve only certain attributes of a molecule, name those attributes in the query:
> molecules = chembldb.query(MoleculeDictionary.chembl_id, MoleculeDictionary.chebi_id)
From this you can, for instance, very easily generate a Python dictionary that maps each ChEMBL ID to its corresponding ChEBI ID (restricted here to the first five rows):
> molecules = chembldb.query(MoleculeDictionary.chembl_id, MoleculeDictionary.chebi_id)
> chembl_to_chebi_id = dict(molecules.limit(5).all())
> print chembl_to_chebi_id
% {'CHEMBL6328': 100002L, 'CHEMBL6329': 100001L, 'CHEMBL267864': 100005L, 'CHEMBL6362': 100004L, 'CHEMBL265667': 100003L}
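A column-only query returns tuple-like rows instead of mapped objects, which is why the result can be turned into a dictionary so directly. The sketch below illustrates this under the same assumptions as before: Python 3, SQLAlchemy 1.4 or newer, an in-memory SQLite database, and a hypothetical Molecule class with invented IDs. Two things are added that are not in the statements above: an order_by, so that the limit picks the same rows on every run, and a filter that skips molecules without a ChEBI ID.

```python
# Sketch only: "Molecule" and all IDs below are invented for illustration.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Molecule(Base):
    __tablename__ = "molecule_dictionary"
    molregno = Column(Integer, primary_key=True)
    chembl_id = Column(String)
    chebi_id = Column(Integer)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Molecule(molregno=1, chembl_id="CHEMBL_A", chebi_id=101),
    Molecule(molregno=2, chembl_id="CHEMBL_B", chebi_id=None),
    Molecule(molregno=3, chembl_id="CHEMBL_C", chebi_id=103),
    Molecule(molregno=4, chembl_id="CHEMBL_D", chebi_id=104),
])
session.commit()

# Only two columns are selected, so each result row is a 2-element, tuple-like row.
pairs = (
    session.query(Molecule.chembl_id, Molecule.chebi_id)
    .filter(Molecule.chebi_id.isnot(None))  # skip molecules without a ChEBI ID
    .order_by(Molecule.molregno)            # make the limit reproducible
    .limit(2)
)
rows = pairs.all()

# Rows can be unpacked like tuples and also accessed by column name.
first_chembl_id, first_chebi_id = rows[0]
same_chebi_id = rows[0].chebi_id

chembl_to_chebi_id = {chembl_id: chebi_id for chembl_id, chebi_id in rows}
print(chembl_to_chebi_id)
```

Selecting only the needed columns also avoids building a full mapped object per row, which can matter for a table with several hundred thousand entries.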
Markus