Hashcodes

CACTVS hashcodes are 64-bit unsigned integers, which are represented as 16-digit hexadecimal values, such as EDC9A714B0571CE2. There are several variants of hashcode-based identifiers, only two of which are used here.
       The FICuS identifier is sensitive to compounds' fragment status (i.e. which, if any, counter ions, solvent molecules etc. are present); to isotope labels; to formal charges; and to stereochemistry. It is not sensitive to the tautomeric state, i.e. all different tautomers of the same compound will receive the same FICuS. It is also not sensitive to which resonance structure is used. More information is available here.
       The uuuuu identifier is not sensitive to any of the above structural features, i.e. it ignores additional fragments, isotopes, charges, and stereochemistry. It projects each of these features onto a "canonical" value, and can therefore be interpreted as the identifier of a molecule's "parent structure."
       Depending on the complexity of the molecule, many different compounds/isomers can have the same uuuuu. However, a structure can also be its own parent structure, in which case (the hexadecimal parts of) FICuS and uuuuu are the same.
       To help distinguish between FICuS and uuuuu, we append the identifier type to the 16-digit hexadecimal value, followed by the version number of the CACTVS release that was used to calculate the identifier. It should not matter which version of CACTVS is used, but bugs are always possible, so the version tag is an additional safety feature. A full FICuS thus looks like EDC9A714B0571CE2-FICuS-3.324. These tags also help in parsing out FICuS and uuuuu identifiers that are embedded in other text. We strongly recommend always using the full identifiers, including the tags.
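
The tagged format described above lends itself to simple pattern matching. The following is a minimal Python sketch, not part of CACTVS, of how full identifiers could be extracted from free text and how the "own parent structure" case could be checked. The names HashcodeId, find_ids, and is_own_parent are hypothetical, and the pattern assumes that the version tag always consists of dot-separated digits, as in the example 3.324.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HashcodeId:
    """One full identifier (hypothetical helper type, not a CACTVS class)."""
    value: int    # the 64-bit unsigned hashcode
    kind: str     # "FICuS" or "uuuuu"
    version: str  # CACTVS version tag, e.g. "3.324"

    @property
    def hex(self) -> str:
        # 64 bits correspond to exactly 16 hexadecimal digits (zero-padded).
        return format(self.value, "016X")

    def __str__(self) -> str:
        return f"{self.hex}-{self.kind}-{self.version}"


# Assumption: the version tag is made of dot-separated digit groups.
_ID_RE = re.compile(r"\b([0-9A-Fa-f]{16})-(FICuS|uuuuu)-(\d+(?:\.\d+)*)")


def find_ids(text: str) -> List[HashcodeId]:
    """Return all full FICuS/uuuuu identifiers embedded in text."""
    return [
        HashcodeId(int(hex_part, 16), kind, version)
        for hex_part, kind, version in _ID_RE.findall(text)
    ]


def is_own_parent(ficus: HashcodeId, uuuuu: HashcodeId) -> bool:
    """True if the hexadecimal parts of a FICuS and a uuuuu coincide."""
    return (
        ficus.kind == "FICuS"
        and uuuuu.kind == "uuuuu"
        and ficus.value == uuuuu.value
    )


if __name__ == "__main__":
    # The uuuuu below is a made-up pairing for illustration only.
    sample = (
        "A full FICuS looks like EDC9A714B0571CE2-FICuS-3.324. "
        "Hypothetical parent: EDC9A714B0571CE2-uuuuu-3.324"
    )
    for identifier in find_ids(sample):
        print(identifier.kind, identifier.hex, identifier.version)
```

Because the pattern requires the type and version tags, bare 16-digit hexadecimal strings are deliberately not matched, which is one practical reason to always store the full identifiers.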

What are the limitations of these hashcodes?

While all contributors to the FICuS and uuuuu development (foremost Wolf-Dietrich Ihlenfeldt, the author of CACTVS) have gone to great lengths to make them as reliable as possible, the hashcodes (a) have conceptual limitations and (b) may always be affected by bugs in the algorithm. They are generally very reliable for small, drug-like molecules. In particular:
       They are not really designed for larger biomolecules such as peptides, proteins, DNA, or RNA. The algorithm will attempt to calculate hashcodes for such molecules, and will usually succeed, but much better identifier systems exist for these classes of molecules. In fact, we limit the calculation of hashcodes to structures with fewer than 400 heavy atoms.
       Ring-opening/-closing tautomerism is not taken into account by the tautomer-invariant hashcodes.
       Higher-order stereochemistry (anything other than R/S centers, E/Z double bonds, and square planar stereochemistry) is currently not supported.
       The hashcodes are not designed for polymers.
       The hashcode-based identifiers that ignore additional fragments do so by heuristically searching for the largest of all the fragments present. If, e.g., a solvent molecule is larger than the solute, this may lead to misleading results.
       Very large and complex ring systems may present problems.
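
The size limit and the largest-fragment heuristic can be made concrete with a small Python sketch. This is not the CACTVS implementation. It assumes that "largest" means "most heavy atoms" and that the 400-atom limit applies to the whole structure record, neither of which is stated in the text. The fragment representation (a mapping from element symbol to atom count) and all function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical representation: element symbol -> number of atoms.
Fragment = Dict[str, int]

# Structures must have fewer than this many heavy atoms.
MAX_HEAVY_ATOMS = 400


def heavy_atom_count(fragment: Fragment) -> int:
    """Count all atoms except hydrogen."""
    return sum(n for element, n in fragment.items() if element != "H")


def eligible_for_hashcode(fragments: List[Fragment]) -> bool:
    """Assumption: the limit applies to the sum over all fragments."""
    return sum(heavy_atom_count(f) for f in fragments) < MAX_HEAVY_ATOMS


def largest_fragment(fragments: List[Fragment]) -> Optional[Fragment]:
    """Pick the fragment with the most heavy atoms (assumed criterion)."""
    if not fragments:
        return None
    return max(fragments, key=heavy_atom_count)


if __name__ == "__main__":
    # Salt: the counter ion is smaller, so the expected fragment is kept.
    acetate = {"C": 2, "H": 3, "O": 2}
    sodium = {"Na": 1}
    print(largest_fragment([acetate, sodium]) is acetate)

    # Pitfall: a small solute recorded together with a larger solvent.
    methanol = {"C": 1, "H": 4, "O": 1}   # intended compound
    toluene = {"C": 7, "H": 8}            # solvent
    print(largest_fragment([methanol, toluene]) is toluene)
```

In the second case a purely size-based choice selects the solvent rather than the intended compound, which is the kind of misleading result the limitation above warns about.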