Description:
OSRA is a utility that converts graphical representations of
chemical structures, as they appear in journal articles, patent documents,
textbooks, trade magazines, etc., into SMILES (Simplified Molecular
Input Line Entry Specification, see
http://en.wikipedia.org/wiki/SMILES) or SD files, both
computer-readable molecular structure formats. OSRA can read a document
in any of the more than 90 graphical formats that ImageMagick can parse, including
GIF, JPEG, PNG, TIFF, PDF, and PS, and generate the SMILES or SDF representation of
the molecular structure images found in that document.
Note that no optical recognition software is likely to
be perfect; the output will probably contain
errors, so curation by a human knowledgeable in chemical structures
is highly recommended.
Command-line options:
./osra --help
prints a list of available options with short descriptions.
Most common use: ./osra [-r <resolution>] <filename>
The resolution is given in dpi and defaults to 300 (unless the input is a PS
or PDF file); <filename> is the name of your image file (or
PS/PDF document).
Other options:
-t, --threshold: Gray level threshold; default is 0.2
for black-and-white images.
-n, --negate: Inverts colors (for white-on-black images).
-o, --output: Sets a prefix for writing recognized images to files, e.g.
"-o tmp" will create files tmp0.png, tmp1.png, ... for
each of the structures.
-s, --size: Resizes images on output; can be useful when running OSRA
as a backend for a web service. Example: "-s 300x400".
-g, --guess: Prints the resolution guess when automatic
resolution estimation is used.
-p, --print: Prints the value of the confidence function estimate.
-f, --format: Output format (either smi for SMILES or sdf for SD file format).
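For scripted use, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of OSRA itself: the helper names build_osra_command and run_osra are hypothetical, and the sketch assumes that -n, -g and -p are plain switches while -r, -t, -o, -s and -f each take one value, as the descriptions above suggest. It also assumes the executable is reachable as ./osra and that results are written to standard output.

```python
import subprocess
from typing import List, Optional


def build_osra_command(
    filename: str,
    resolution: Optional[int] = None,
    threshold: Optional[float] = None,
    negate: bool = False,
    output_prefix: Optional[str] = None,
    size: Optional[str] = None,
    guess: bool = False,
    print_confidence: bool = False,
    fmt: Optional[str] = None,
    executable: str = "./osra",  # assumed location of the binary
) -> List[str]:
    """Assemble an argument list using only the options listed above."""
    if fmt is not None and fmt not in ("smi", "sdf"):
        raise ValueError("format must be 'smi' or 'sdf'")

    cmd = [executable]
    if resolution is not None:
        cmd += ["-r", str(resolution)]   # resolution in dpi
    if threshold is not None:
        cmd += ["-t", str(threshold)]    # gray level threshold
    if negate:
        cmd.append("-n")                 # white-on-black images
    if output_prefix is not None:
        cmd += ["-o", output_prefix]     # e.g. "tmp" -> tmp0.png, tmp1.png, ...
    if size is not None:
        cmd += ["-s", size]              # e.g. "300x400"
    if guess:
        cmd.append("-g")                 # print the resolution guess
    if print_confidence:
        cmd.append("-p")                 # print the confidence estimate
    if fmt is not None:
        cmd += ["-f", fmt]               # "smi" or "sdf"
    cmd.append(filename)                 # input file goes last
    return cmd


def run_osra(filename: str, **options) -> str:
    """Run OSRA and return whatever it writes to standard output."""
    result = subprocess.run(
        build_osra_command(filename, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Under these assumptions, run_osra("scheme.png", resolution=300, fmt="smi") would execute the equivalent of ./osra -r 300 -f smi scheme.png. Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.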
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. See also http://www.gnu.org/.
See the file COPYING for details.
Download:
OSRA is Free and Open Source Software. You are welcome to download
and use it, provided that you understand the terms described above.
Participation in the development is highly encouraged!
osra-1.2.1.tgz -
Improved speed (up to 30% increase) and double and triple bond detection.
A Windows installer bundled with the Symyx Draw AddIn is now available.
Download the Windows executable and plugins as a zip
archive or Windows installer.
osra-1.2.0.tgz -
Page layout analysis algorithm completely re-written, added plugins
for integration with several popular molecular editors.
osra-1.1.0.tgz -
Added SD file format output, improved wedge bond detection.
osra-1.0.1.tgz - Minor bug
fixes. OpenBabel-2.2.0 or an svn snapshot of RDKit is recommended with this
version.
osra-1.0.0.tgz - Significant
update of the recognition engine. Simplified build instructions.
Please note that the dependencies have changed since the previous
version.
osra-0.9.9.tgz - Build
system upgraded to allow linking to the newer versions of gocr (0.45).
osra-0.9.8.tgz - Added
recognition of old-style aromatic rings with heteroatoms.
osra-0.9.7.tgz - Improved
recognition of color and low-res images.
osra-0.9.6.tgz - Introduced
automatic resolution detection.
osra-0.9.5.tgz - Source code
modified to facilitate compiling with MinGW for Windows platform.
osra-0.9.4.tgz - Added
old-style benzene ring recognition.
osra-0.9.3.tgz - Added
rudimentary formal charge recognition.
osra-0.9.2.tgz - Improved
handling of hash and wedge bonds.
osra-0.9.1.tgz - Slightly improved handling of
72 dpi color images.
osra-0.9.tgz - Original
public release.
We also welcome your feedback - send us your comments, suggestions,
criticism, or praise to the contact email address below.