Class CIPChirality


  • public class CIPChirality
    extends Object
    A fully validated, relatively efficient implementation of the Cahn-Ingold-Prelog rules for assigning R/S, M/P, and E/Z stereochemical descriptors. Based on the IUPAC Blue Book rules of 2013 and assorted corrections.

    IUPAC Project: Corrections, Revisions and Extension for the Nomenclature of Organic Chemistry - IUPAC Recommendations and Preferred Names 2013 (the IUPAC Blue Book)
    https://iupac.org/projects/project-details/?project_nr=2001-043-1-800
    http://www.sbcs.qmul.ac.uk/iupac/bibliog/BBerrors.html

    Settable options:
    set testflag1: use advanced in/out-sensitive Rule 6 (r,r-bicyclo[2.2.2]octane)
    set testflag2: turn off tracking (saving of _M.CIPInfo) for speed

    Features include:
    - deeply validated
    - includes revised Rules 1b and 2
    - includes a proposed Rule 6
    - implemented in Java (Jmol) and JavaScript (JSmol)
    - only a few Java classes; < 1000 lines
    - efficient, one-pass process for each center using a single finite digraph for all auxiliary descriptors
    - exhaustive processing of all 9 sequence rules (1a, 1b, 2, 3, 4a, 4b, 4c, 5, 6)
    - includes R/S, r/s, M/P (axial, not planar), E/Z
    - covers any-length odd and even cumulenes
    - uses Jmol conformational SMARTS to detect atropisomers and helicenes
    - covers chiral phosphorus and sulfur, including trigonal pyramidal and tetrahedral
    - properly treats complex combinations of R/S, M/P, and seqCis/seqTrans centers (Rule 4b)
    - properly treats neutral-species resonance structures using fractional atomic mass and a modified Rule 1b
    - implements the CIP spiro rule (BB P-93.5.3.1) as part of Rule 6
    - detects small rings (fewer than 8 members) and removes E/Z specifications for such rings
    - detects chiral bridgehead nitrogens and E/Z imines and diazines
    - reports the atom descriptor along with the rule that ultimately decided it
    - fills _M.CIPInfo with detailed information about how each ligand was decided (feature turned off by set testflag2)
    - generates advanced Rule 6 descriptors for cubane and the like (generally 'r') using set testflag1

    The primary 236-compound Chapter-9 validation set (AY-236) was provided by Andrey Yerin, ACD/Labs (Moscow). Mikko Vainio also supplied a 64-compound testing suite (MV-64), which is available on SourceForge in the Jmol-datafiles directory (https://sourceforge.net/p/jmol/code/HEAD/tree/trunk/Jmol-datafiles/cip). Additional test structures were provided by John Mayfield.

    Additional thanks to the IUPAC Blue Book Revision project, specifically Karl-Heinz Hellwich for alerting me to the errata page for the 2013 IUPAC specs (http://www.chem.qmul.ac.uk/iupac/bibliog/BBerrors.html), Gerry Moss for discussions, and Andrey Yerin for discussion and digraph checking. Many thanks to the members of the BlueObelisk-Discuss group, particularly Mikko Vainio, John Mayfield (aka John May), Wolf Ihlenfeldt, and Egon Willighagen, for encouragement, examples, serious skepticism, and extremely helpful advice.

    References:
    CIP(1966) R.S. Cahn, C. Ingold, V. Prelog, Specification of Molecular Chirality, Angew. Chem. Internat. Edit. 5, 385ff
    Custer(1986) Roland H. Custer, Mathematical Statements About the Revised CIP-System, MATCH, 21, 1986, 3-31. http://match.pmf.kg.ac.rs/electronic_versions/Match21/match21_3-31.pdf
    Mata(1993) Paulina Mata, Ana M. Lobo, Chris Marshall, A. Peter Johnson, The CIP sequence rules: Analysis and proposal for a revision, Tetrahedron: Asymmetry, Volume 4, Issue 4, April 1993, Pages 657-668
    Mata(1994) Paulina Mata, Ana M. Lobo, Chris Marshall, and A. Peter Johnson, Implementation of the Cahn-Ingold-Prelog System for Stereochemical Perception in the LHASA Program, J. Chem. Inf. Comput. Sci. 1994, 34, 491-504. http://pubs.acs.org/doi/abs/10.1021/ci00019a004
    Mata(2005) Paulina Mata, Ana M. Lobo, The Cahn, Ingold and Prelog System: eliminating ambiguity in the comparison of diastereomorphic and enantiomorphic ligands, Tetrahedron: Asymmetry, Volume 16, Issue 13, 4 July 2005, Pages 2215-2223
    Favre(2013) Henri A. Favre, Warren H. Powell, Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013. DOI:10.1039/9781849733069 http://pubs.rsc.org/en/content/ebook/9780854041824#!divbookcontent

    Code history:
    5/12/18 Jmol 14.29.14 fixes minor Rule 5 bug and adds advanced Rule 6 in/out testflag1 option (857 lines)
    5/1/18 Jmol 14.29.14 fixes enantiomorphic Rule 5 R/S check for BH64_85 and BH64_86
    4/25/18 Jmol 14.29.14 fixes spiroallene Rule 6 issue for BH64_84
    4/23/18 Jmol 14.29.14 fixes Rule 2 for JM_008, involving mass and duplicates (824 lines)
    4/11/18 Jmol 14.29.13 adds optional CIPDataTracker class (822 lines)
    4/2/18 Jmol 14.29.13 adds optional CIPDataSmiles class
    4/2/18 Jmol 14.29.13 adds John's "mancude-like" cyclic conjugated ene Kekule averaging
    12/10/17 Jmol 14.29.9 adds CIPData, mancude Kekule averaging
    11/11/17 Jmol 14.25.1 adds "duplicate over terminal" in Rule 1b; streamlined (777 lines)
    11/05/17 Jmol 14.24.1 fixes a problem with seqCis/seqTrans and also with Rule 2 (799 lines)
    10/17/17 Jmol 14.20.10 adds S4 check in Rule 6 and also fixes bug in aux descriptors being skipped when two ligands are equivalent for the root (798 lines)
    9/19/17 CIPChirality code simplification (778 lines)
    9/14/17 Jmol 14.20.6 switching to Mikko's idea for Rules 4b and 5. Abandons "thread" idea. Uses breadth-first algorithm for generating bitsets for R and S. Processing time reduced by 50%. Could still be optimized somewhat. (820 lines)
    7/25/17 Jmol 14.20.4 consolidates all ene determinations; moves auxiliary descriptor generation to prior to Rule 3 (850 lines)
    7/23/17 Jmol 14.20.4 adds Rule 6; rewrite/consolidate spiro, C3, double spiran code (853 lines)
    7/19/17 Jmol 14.20.3 fixing Rule 2 (880 lines)
    7/13/17 Jmol 14.20.3 more thorough spiro testing (858 lines)
    7/10/17 Jmol 14.20.2 adding check for C3 and double spiran (CIP 1966 #32 and #33)
    7/8/17 Jmol 14.20.2 adding presort for Rules 4a and 4c (test12.mol; 828 lines)
    7/7/17 Jmol 14.20.1 minor coding efficiencies (833 lines)
    7/6/17 Jmol 14.20.1 major rewrite to correct and simplify logic; full validation for 433 structures (many duplicates) in AY236, BH64, MV64, MV116, JM, and L (836 lines)
    6/30/17 Jmol 14.20.1 major rewrite of Rule 4b (999 lines)
    6/25/17 Jmol 14.19.1 minor fixes for Rules 4b and 5 for BH64_012-015; better atropisomer check
    6/12/2017 Jmol 14.18.2 tested for Rule 1b sphere (AY236.53, 163, 173, 192); 957 lines
    6/8/2017 Jmol 14.18.2 removed unnecessary presort for Rule 1b
    5/27/17 Jmol 14.17.2 fully interfaced using SimpleNode and SimpleEdge
    5/27/17 Jmol 14.17.1 fully validated; simplified code; 978 lines
    5/17/17 Jmol 14.16.1 adds helicene M/P chirality; 959 lines; validated using CCDC structures HEXHEL02 HEXHEL03 HEXHEL04 ODAGOS ODAHAF http://pubs.rsc.org/en/content/articlehtml/2017/CP/C6CP07552E
    5/14/17 Jmol 14.15.5 trimmed up and documented; no need for lone pairs; 948 lines
    5/13/17 Jmol 14.15.4 algorithm simplified; validated for mixed Rule 4b systems involving auxiliary R/S, M/P, and seqCis/seqTrans; 959 lines
    5/06/17 validated for 236-compound set AY-236
    5/02/17 validated for 161 compounds, including M/P, m/p (axial chirality for biaryls and odd-number cumulenes)
    4/29/17 validated for 160 compounds, including M/P, m/p (axial chirality for biaryls and odd-number cumulenes)
    4/28/17 validated for 146 compounds, including imines and diazines, sulfur, phosphorus
    4/27/17 Rules 3-5 preliminary version 14.15.1
    4/6/17 introduced in Jmol 14.12.0; validated for Rules 1 and 2 in Jmol 14.13.2; 100 lines

    NOTE! NOTE! NOTE! NOTE! NOTE! NOTE! NOTE! NOTE! NOTE! NOTE! NOTE! NOTE! NOTE!

    Added logic to Rule 1b:

    Rule 1b: In comparing duplicate atoms, the one with lower root distance has precedence, where root distance is defined as: (a) in the case of ring-closure duplicates, the sphere of the duplicated atom; and (b) in the case of multiple-bond duplicates, the sphere of the atom to which the duplicate atom is attached.

    Rationale: Using only the distance of the duplicated atom (current definition) introduces a Kekule bias, which can be illustrated with various simple models. Moving that distance to the sphere of the duplicate's parent atom resolves the problem.

    Added clarification to Rule 2:

    Rule 2: Higher mass precedes lower mass, where mass is defined, in the case of nonduplicate atoms with identified isotopes, as their exact isotopic mass and, in all other cases, as their element's atomic weight.

    Rationale: BB is not self-consistent, including both "mass number" (in the rule) and "atomic mass" in the description, where "79Br < Br < 81Br". And again we have the same Kekule-ambiguity issue as in Rule 1b. The added clarification fixes the Kekule issue (by not using isotope mass number for duplicate atoms), solves the problem that F < 19F (even though 19F has 100% natural abundance), and is easily programmable.
    In Jmol the logic is very simple: it actually uses the isotope mass number, but with two checks: (a) if the isotope is one of four specific isotopes (16O, 52Cr, 96Mo, 175Lu), reverse the test; and (b) if the isotope is on the list of 100% natural isotopes, or the element is one of the non-natural elements, use the element's accepted atomic weight. See CIPAtom.getMass().

    PROPOSED Rule 6: An undifferentiated reference node has priority over any other undifferentiated node.

    Rationale: This rule is stated in CIP(1966) p. 357.
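The amended Rule 1b and Rule 2 comparisons, together with the Jmol mass shortcut just described, can be sketched in Java as follows. This is a hypothetical illustration, not the code of CIPAtom.getMass(). The class name CipRuleSketch, its small element tables (only 19F is listed as monoisotopic), and the integer sphere arguments are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the amended Rule 1b and Rule 2 comparisons.
// Table contents and method names are illustrative; this is not Jmol's CIPAtom code.
class CipRuleSketch {

  // Standard atomic weights (rounded) for the few elements this sketch handles.
  static final Map<String, Double> ATOMIC_WEIGHT = Map.of(
      "H", 1.008, "C", 12.011, "O", 15.999, "F", 18.998,
      "Cr", 51.996, "Br", 79.904, "Mo", 95.95, "Lu", 174.967);

  // Isotopes whose mass number exceeds the atomic weight although their exact mass does not.
  static final Set<String> REDUCE = Set.of("16O", "52Cr", "96Mo", "175Lu");

  // Monoisotopic elements (partial list, assumed): the isotope must rank equal to the element.
  static final Set<String> MONOISOTOPIC = Set.of("19F");

  /** Rule 2 mass; massNumber == 0 means no isotope was specified. */
  static double getMass(String element, int massNumber, boolean isDuplicate) {
    double weight = ATOMIC_WEIGHT.get(element); // NullPointerException for unlisted elements
    if (isDuplicate || massNumber == 0)
      return weight;                // duplicates never carry an isotope mass
    String isotope = massNumber + element;
    if (MONOISOTOPIC.contains(isotope))
      return weight;                // 19F must not outrank F
    if (REDUCE.contains(isotope))
      return massNumber - 0.1;      // 16 -> 15.9, so 16O ranks below O
    return massNumber;              // elsewhere the integer mass number orders correctly
  }

  /** Rule 1b root distance of a duplicate atom, per the amended definition. */
  static int rootDistance(boolean isRingClosure, int duplicatedAtomSphere, int parentSphere) {
    return isRingClosure ? duplicatedAtomSphere : parentSphere;
  }

  /** Negative if duplicate A takes precedence (lower root distance wins), positive if B does. */
  static int compareRule1b(int rootDistanceA, int rootDistanceB) {
    return Integer.compare(rootDistanceA, rootDistanceB);
  }

  public static void main(String[] args) {
    System.out.println(getMass("Br", 79, false) < getMass("Br", 0, false)); // true: 79Br < Br
    System.out.println(getMass("Br", 0, false) < getMass("Br", 81, false)); // true: Br < 81Br
    System.out.println(getMass("O", 16, false) < getMass("O", 0, false));   // true: 16O < O
    System.out.println(getMass("F", 19, false) == getMass("F", 0, false));  // true: 19F == F
  }
}
```

Subtracting 0.1 only for the four listed isotopes keeps every other comparison on integer mass numbers. This is the simplification the note describes.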
    Author:
    Bob Hanson hansonr@stolaf.edu
    • Field Detail

      • RULE_2_nXX_EQ_XX

        static final String RULE_2_nXX_EQ_XX
        These elements have 100% natural abundance; we will use their isotope mass number instead of their actual average mass, since there is no difference.
        See Also:
        Constant Field Values
      • RULE_2_REDUCE_ISOTOPE_MASS_NUMBER

        static final String RULE_2_REDUCE_ISOTOPE_MASS_NUMBER
        These elements have an isotope mass number that is a bit higher than the average mass, even though their actual isotope mass is a bit lower. We will change 16 to 15.9, 52 to 51.9, 96 to 95.9, and 175 to 174.9 so as to force the unspecified-mass atom to have higher priority than the specified one. All other isotopes can use their integer isotope mass number instead of looking up their exact isotope mass.
        See Also:
        Constant Field Values
      • ruleNames

        static final String[] ruleNames
      • MAX_PATH

        static final int MAX_PATH
        maximum path to display; for debugging only, using SET DEBUG in Jmol
        See Also:
        Constant Field Values
      • SMALL_RING_MAX

        static final int SMALL_RING_MAX
        maximum ring size that can have a double bond with no E/Z designation; also used for identifying aromatic rings and bridgehead nitrogens
        See Also:
        Constant Field Values
      • currentRule

        int currentRule
        the current rule being applied exhaustively
      • data

        CIPData data
        collected bitsets and more specialized SMILES/SMARTS searches and vwr references
      • doTrack

        boolean doTrack
        are we tracking pathways for _M.CIPInfo?
      • isAux

        boolean isAux
        are we in the midst of auxiliary center creation?
      • bsNeedRule

        javajs.util.BS bsNeedRule
        set bits RULE_1a - RULE_6 to indicate a need for that rule based on what is in the model
      • havePseudoAuxiliary

        boolean havePseudoAuxiliary
        do we have r or s and so will need to recalculate Mata like/unlike lists in Rule 5?
      • ptIDLogger

        int ptIDLogger
        incremental pointer providing a unique ID to every CIPAtom for debugging
    • Constructor Detail

      • CIPChirality

        public CIPChirality()
    • Method Detail

      • getRuleName

        public String getRuleName(int rule)
      • getChiralityForAtoms

        public void getChiralityForAtoms(CIPData data)
        A general determination of chirality that ultimately involves all of Rules 1-6.
        Parameters:
        data -
      • setStereoFromSmiles

        private void setStereoFromSmiles(javajs.util.BS bsHelix,
                                         int stereo,
                                         SimpleNode[] atoms)
      • preFilterAtomList

        private boolean preFilterAtomList(SimpleNode[] atoms,
                                          javajs.util.BS bsToDo,
                                          javajs.util.BS bsEnes)
        Remove unnecessary atoms from the list and let us know if we have alkenes to consider.
        Parameters:
        atoms -
        bsToDo -
        bsEnes -
        Returns:
        whether we have any alkenes that could be E/Z
      • isFirstRow

        static boolean isFirstRow(SimpleNode a)
        Check whether an atom is in the first row (Li through Ne).
        Parameters:
        a -
        Returns:
        elemno > 2 && elemno ≤ 10
      • clearSmallRingEZ

        private void clearSmallRingEZ(SimpleNode[] atoms,
                                      javajs.util.Lst<int[]> lstEZ)
        Remove E/Z designations for small-ring double bonds (IUPAC 2013, P-93.5.1.4.1).
        Parameters:
        atoms -
        lstEZ -
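As an illustration of what clearSmallRingEZ has to detect, the following hypothetical Java sketch finds the smallest ring containing a given double bond using a breadth-first search that ignores the bond itself. The adjacency-list representation, the class name SmallRingSketch, and the maxRingSize parameter are assumptions. Jmol works on SimpleNode/SimpleEdge objects, and the actual value of SMALL_RING_MAX is not shown here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; adjacency lists stand in for Jmol's SimpleNode/SimpleEdge graph.
class SmallRingSketch {

  /** Atom count of the smallest ring containing bond a-b, or -1 if the bond is acyclic. */
  static int smallestRingThroughBond(List<int[]> adj, int a, int b) {
    int[] dist = new int[adj.size()];
    Arrays.fill(dist, -1);
    ArrayDeque<Integer> queue = new ArrayDeque<>();
    dist[a] = 0;
    queue.add(a);
    while (!queue.isEmpty()) {
      int u = queue.poll();
      for (int v : adj.get(u)) {
        if (u == a && v == b)
          continue;                 // do not use the a-b bond itself
        if (dist[v] >= 0)
          continue;                 // already reached by a path at least as short
        dist[v] = dist[u] + 1;
        if (v == b)
          return dist[v] + 1;       // a k-bond path plus the a-b bond closes a (k+1)-ring
        queue.add(v);
      }
    }
    return -1;
  }

  /** True if the double bond a=b lies in a ring of at most maxRingSize atoms. */
  static boolean suppressEZ(List<int[]> adj, int a, int b, int maxRingSize) {
    int size = smallestRingThroughBond(adj, a, b);
    return size > 0 && size <= maxRingSize;
  }
}
```

With maxRingSize = 7, taking "fewer than 8 members" from the class description at face value, the double bond of cyclohexene would lose its E/Z designation, while that of cyclooctene (ring size 8) would keep it.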
      • getAtomBondChirality

        private void getAtomBondChirality(SimpleNode atom,
                                          javajs.util.Lst<int[]> lstEZ,
                                          javajs.util.BS bsToDo)
        Get E/Z characteristics for specific atoms. Also check here for atropisomeric M/P designations.
        Parameters:
        atom -
        lstEZ -
        bsToDo -
      • getLastCumuleneAtom

        private SimpleNode getLastCumuleneAtom(SimpleEdge bond,
                                               SimpleNode atom,
                                               int[] nSP2,
                                               SimpleNode[] parents)
        Parameters:
        bond -
        atom -
        nSP2 - returns the number of sp2 carbons in this alkene or cumulene
        parents -
        Returns:
        the terminal atom of this alkene or cumulene
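The walk performed by getLastCumuleneAtom can be pictured with the hypothetical sketch below. It follows cumulated double bonds from one end of the chain to the other and counts the chain atoms. The bond-order matrix, the class name CumuleneSketch, and the atom count (in place of the nSP2 count the real method returns) are assumptions. Here "odd cumulene", as used for isAxial in getEneChirality, is read as a chain with an odd number of atoms, such as an allene, which is axially chiral. Chains with an even number of atoms, such as alkenes and butatrienes, are E/Z.

```java
// Hypothetical sketch: order[i][j] holds the bond order between atoms i and j (0 = no bond).
class CumuleneSketch {

  /**
   * Walks from atom start across its double bond to atom next, then onward through
   * cumulated double bonds. Returns {terminalAtom, atomCountInChain}, or null if the
   * walk does not terminate (a closed loop of double bonds).
   */
  static int[] walkCumulene(int[][] order, int start, int next) {
    int prev = start, cur = next, nAtoms = 2;
    while (nAtoms <= order.length) {
      int further = -1;
      for (int j = 0; j < order.length; j++) {
        if (j != prev && order[cur][j] == 2) {
          further = j;
          break;
        }
      }
      if (further < 0)
        return new int[] { cur, nAtoms }; // no further double bond: cur ends the chain
      prev = cur;
      cur = further;
      nAtoms++;
    }
    return null;
  }

  /** An odd atom count (allene C=C=C) means axial chirality; an even count means E/Z. */
  static boolean isAxial(int nAtoms) {
    return nAtoms % 2 == 1;
  }
}
```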
      • getAtomChiralityLimited

        int getAtomChiralityLimited(SimpleNode atom,
                                    CIPChirality.CIPAtom cipAtom,
                                    SimpleNode parentAtom)
        Determine R/S, or one half of an E/Z determination.
        Parameters:
        atom - ignored if a is not null (just checking ene end top priority)
        cipAtom - ignored if atom is not null
        parentAtom - null for tetrahedral, other alkene carbon for E/Z
        Returns:
        for an E/Z test, [0: none, 1: atoms[0] is higher, 2: atoms[1] is higher]; otherwise, [0: none, 1: R, 2: S]
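For the R/S case, the final geometric step can be sketched as below. The sketch assumes the ligands have already been ranked by the sequence rules, which is where nearly all of the work in this class lies. The class and method names are hypothetical. The sign of the signed volume of the top three ligands stands in for whatever geometric test Jmol actually applies. The sketch returns the 0/1/2 codes listed above.

```java
// Hypothetical sketch of only the final geometric step of an R/S assignment.
// Assumes ligands p1 > p2 > p3 are already ranked; the lowest-ranked ligand is not needed.
class RSSketch {

  /** Returns 0 for none, 1 for R, 2 for S. */
  static int rsFromRanked(double[] center, double[] p1, double[] p2, double[] p3) {
    double[] v1 = sub(p1, center), v2 = sub(p2, center), v3 = sub(p3, center);
    // Signed volume of the ranked triple: negative when 1, 2, 3 run clockwise
    // as seen with the lowest-ranked ligand pointing away from the viewer.
    double vol = dot(v1, cross(v2, v3));
    if (Math.abs(vol) < 1e-6)
      return 0;                      // degenerate (planar) geometry
    return vol < 0 ? 1 : 2;          // clockwise = R, counterclockwise = S
  }

  static double[] sub(double[] a, double[] b) {
    return new double[] { a[0] - b[0], a[1] - b[1], a[2] - b[2] };
  }

  static double[] cross(double[] a, double[] b) {
    return new double[] {
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0] };
  }

  static double dot(double[] a, double[] b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  }
}
```

Swapping any two of the ranked ligands flips the sign of the volume, and therefore flips R to S, as expected.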
      • getBondChiralityLimited

        private int getBondChiralityLimited(SimpleEdge bond,
                                            SimpleNode a)
        Determine the axial or E/Z chirality for this bond, with the given starting atom a.
        Parameters:
        bond -
        a - first atom to consider, or null
        Returns:
        one of: {NO_CHIRALITY | STEREO_Z | STEREO_E | STEREO_Ra | STEREO_Sa | STEREO_ra | STEREO_sa}
      • setBondChirality

        private int setBondChirality(SimpleNode a,
                                     SimpleNode pa,
                                     SimpleNode pb,
                                     SimpleNode b,
                                     boolean isAxial)
        Determine the axial or E/Z chirality for the a-b bond.
        Parameters:
        a -
        pa -
        pb -
        b -
        isAxial -
        Returns:
        one of: {NO_CHIRALITY | STEREO_Z | STEREO_E | STEREO_M | STEREO_P | STEREO_m | STEREO_p}
      • getEneChirality

        int getEneChirality(CIPChirality.CIPAtom winner1,
                            CIPChirality.CIPAtom end1,
                            CIPChirality.CIPAtom end2,
                            CIPChirality.CIPAtom winner2,
                            boolean isAxial,
                            boolean allowPseudo)
        Determine the stereochemistry of a bond.
        Parameters:
        winner1 -
        end1 -
        end2 -
        winner2 -
        isAxial - true if an odd cumulene
        allowPseudo - true if we are working from a high-level bond stereochemistry method
        Returns:
        STEREO_M, STEREO_P, STEREO_Z, STEREO_E, or NO_CHIRALITY
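Once winner1 and winner2 are known, the E/Z part of getEneChirality comes down to asking whether the two winners lie on the same side of the double-bond axis. The hypothetical sketch below answers that with plain coordinate arrays, which is an assumption, since the real method works on CIPAtom objects. It compares the normals of the two half-planes defined by the axis and each winner. It does not attempt the M/P (isAxial) case.

```java
// Hypothetical sketch: E/Z from coordinates once each end's top-ranked substituent is known.
class EneSketch {

  /** "Z" if winner1 and winner2 are on the same side of the end1-end2 axis, "E" if opposite, null if undecidable. */
  static String ez(double[] winner1, double[] end1, double[] end2, double[] winner2) {
    double[] axis = sub(end2, end1);
    // Normals to the half-planes (axis, winner1) and (axis, winner2).
    double[] n1 = cross(axis, sub(winner1, end1));
    double[] n2 = cross(axis, sub(winner2, end2));
    double d = dot(n1, n2);          // positive multiple of cos(torsion winner1-end1-end2-winner2)
    if (Math.abs(d) < 1e-6)
      return null;                   // perpendicular or collinear: no E/Z decision
    return d > 0 ? "Z" : "E";
  }

  static double[] sub(double[] a, double[] b) {
    return new double[] { a[0] - b[0], a[1] - b[1], a[2] - b[2] };
  }

  static double[] cross(double[] a, double[] b) {
    return new double[] {
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0] };
  }

  static double dot(double[] a, double[] b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  }
}
```

For a cumulene with an even number of atoms, the sketch would take end1 and end2 as the two terminal atoms. Those atoms still define the axis.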