Class CompactHashSet<E>

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Iterable<E>, java.util.Collection<E>, java.util.Set<E>
    Direct Known Subclasses:
    CompactLinkedHashSet

    @GwtIncompatible
    class CompactHashSet<E>
    extends java.util.AbstractSet<E>
    implements java.io.Serializable
    CompactHashSet is an implementation of a Set. All optional operations (adding and removing) are supported. The elements can be any objects.

    contains(x), add(x), and remove(x) are all (expected and amortized) constant-time operations: expected in the hash table sense (it depends on the hash function distributing the elements across the buckets in a way that is not far from uniform), and amortized because some operations can trigger a hash table resize.

    Unlike java.util.HashSet, iteration takes time proportional to the actual size(), which is optimal, rather than to the size of the internal hash table, which could be much larger than size(). Furthermore, this structure depends only on a fixed number of arrays; add(x) operations do not create objects for the garbage collector to deal with, and for every element added, the garbage collector has to traverse 1.5 references on average in the marking phase, not 5.0 as in java.util.HashSet.

    If there are no removals, then iteration order is the same as insertion order. Any removal invalidates any ordering guarantees.
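
    The ordering behavior can be illustrated with a small stand-alone sketch. DenseOrderSketch is a hypothetical class, not part of this library: it keeps elements in a dense array, appends on add, and fills the gap left by a removal with the last element, which is the strategy that moveLastEntry describes. Hashing and duplicate checks are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a dense element array with "move last into the gap" removal. */
final class DenseOrderSketch {
  private Object[] elements = new Object[4];
  private int size;

  void add(Object e) {
    if (size == elements.length) {
      elements = Arrays.copyOf(elements, size * 2);
    }
    elements[size++] = e; // new elements always go at the end
  }

  boolean remove(Object e) {
    for (int i = 0; i < size; i++) { // a linear scan stands in for the hash lookup
      if (elements[i].equals(e)) {
        elements[i] = elements[size - 1]; // the last element fills the gap
        elements[--size] = null; // the old last slot is cleared
        return true;
      }
    }
    return false;
  }

  /** Returns the live range [0, size) in iteration order. */
  List<Object> snapshot() {
    return new ArrayList<>(Arrays.asList(elements).subList(0, size));
  }
}
```

    Adding a, b, c, d and then removing a leaves the order d, b, c: the removal is cheap, but insertion order is no longer preserved.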

    This class should not be assumed to be universally superior to java.util.HashSet. Generally speaking, this class reduces object allocation and memory consumption at the price of moderately increased constant factors of CPU. Only use this class when there is a specific reason to prioritize memory over CPU.

    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) java.lang.Object[] elements
      The elements contained in the set, in the range of [0, size()).
      private int[] entries
      Contains the logical entries, in the range of [0, size()).
      (package private) static double HASH_FLOODING_FPP
      Maximum allowed false positive probability of detecting a hash flooding attack given random input.
      private static int MAX_HASH_BUCKET_LENGTH
      Maximum allowed length of a hash table bucket before falling back to a java.util.LinkedHashSet based implementation.
      private int metadata
      Keeps track of metadata like the number of hash table bits and modifications of this data structure (to make it possible to throw ConcurrentModificationException in the iterator).
      private int size
      The number of elements contained in the set.
      private java.lang.Object table
      The hashtable object.
    • Constructor Summary

      Constructors 
      Constructor Description
      CompactHashSet()
      Constructs a new empty instance of CompactHashSet.
      CompactHashSet​(int expectedSize)
      Constructs a new instance of CompactHashSet with the specified capacity.
    • Method Summary

      Modifier and Type Method Description
      boolean add​(E object)  
      (package private) int adjustAfterRemove​(int indexBeforeRemove, int indexRemoved)
      Updates the index an iterator is pointing to after a call to remove: returns the index of the entry that should be looked at after a removal on indexRemoved, with indexBeforeRemove as the index that *was* the next entry that would be looked at.
      (package private) int allocArrays()
      Handles lazy allocation of arrays.
      void clear()  
      boolean contains​(java.lang.Object object)  
      (package private) java.util.Set<E> convertToHashFloodingResistantImplementation()  
      static <E> CompactHashSet<E> create()
      Creates an empty CompactHashSet instance.
      static <E> CompactHashSet<E> create​(E... elements)
      Creates a mutable CompactHashSet instance containing the given elements in unspecified order.
      static <E> CompactHashSet<E> create​(java.util.Collection<? extends E> collection)
      Creates a mutable CompactHashSet instance containing the elements of the given collection in unspecified order.
      private java.util.Set<E> createHashFloodingResistantDelegate​(int tableSize)  
      static <E> CompactHashSet<E> createWithExpectedSize​(int expectedSize)
      Creates a CompactHashSet instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth.
      (package private) java.util.Set<E> delegateOrNull()  
      (package private) int firstEntryIndex()  
      void forEach​(java.util.function.Consumer<? super E> action)  
      (package private) int getSuccessor​(int entryIndex)  
      private int hashTableMask()
      Gets the hash table mask using the stored number of hash table bits.
      (package private) void incrementModCount()  
      (package private) void init​(int expectedSize)
      Pseudoconstructor for serialization support.
      (package private) void insertEntry​(int entryIndex, E object, int hash, int mask)
      Creates a fresh entry with the specified object at the specified position in the entry arrays.
      boolean isEmpty()  
      (package private) boolean isUsingHashFloodingResistance()  
      java.util.Iterator<E> iterator()  
      (package private) void moveLastEntry​(int dstIndex, int mask)
      Moves the last entry in the entry array into dstIndex, and nulls out its old position.
      (package private) boolean needsAllocArrays()
      Returns whether arrays need to be allocated.
      private void readObject​(java.io.ObjectInputStream stream)  
      boolean remove​(java.lang.Object object)  
      (package private) void resizeEntries​(int newCapacity)
      Resizes the internal entries array to the specified capacity, which may be greater or less than the current capacity.
      private void resizeMeMaybe​(int newSize)
      Resizes the entries storage if necessary.
      private int resizeTable​(int mask, int newCapacity, int targetHash, int targetEntryIndex)  
      private void setHashTableMask​(int mask)
      Stores the hash table mask as the number of bits needed to represent an index.
      int size()  
      java.util.Spliterator<E> spliterator()  
      java.lang.Object[] toArray()  
      <T> T[] toArray​(T[] a)  
      void trimToSize()
      Ensures that this CompactHashSet has the smallest representation in memory, given its current size.
      private void writeObject​(java.io.ObjectOutputStream stream)  
      • Methods inherited from class java.util.AbstractSet

        equals, hashCode, removeAll
      • Methods inherited from class java.util.AbstractCollection

        addAll, containsAll, retainAll, toString
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.util.Collection

        parallelStream, removeIf, stream, toArray
      • Methods inherited from interface java.util.Set

        addAll, containsAll, retainAll
    • Field Detail

      • HASH_FLOODING_FPP

        static final double HASH_FLOODING_FPP
        Maximum allowed false positive probability of detecting a hash flooding attack given random input.
        See Also:
        Constant Field Values
      • MAX_HASH_BUCKET_LENGTH

        private static final int MAX_HASH_BUCKET_LENGTH
        Maximum allowed length of a hash table bucket before falling back to a java.util.LinkedHashSet based implementation. Experimentally determined.
        See Also:
        Constant Field Values
      • table

        private transient java.lang.Object table
        The hashtable object. This can be either:
        • a byte[], short[], or int[], with size a power of two, created by CompactHashing.createTable, whose values are either
          • UNSET, meaning "null pointer"
          • one plus an index into the entries and elements array
        • another java.util.Set delegate implementation. In most modern JDKs, normal java.util hash collections intelligently fall back to a binary search tree if hash table collisions are detected. Rather than going to all the trouble of reimplementing this ourselves, we simply switch over to use the JDK implementation wholesale if probable hash flooding is detected, sacrificing the compactness guarantee in very rare cases in exchange for much more reliable worst-case behavior.
        • null, if no entries have yet been added to the set
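
        As a rough sketch of the first case, the table can use the narrowest primitive array that still holds every bucket value. The class name TableWidthSketch, the helper names, and the size thresholds below are assumptions made for illustration; the actual CompactHashing.createTable may differ in its details.

```java
/** Illustrative only: pick the narrowest array that can hold the bucket values. */
final class TableWidthSketch {
  /** Bucket value meaning "null pointer"; any other value is (entry index + 1). */
  static final int UNSET = 0;

  // Assumed thresholds: up to 2^8 buckets use a byte[], up to 2^16 a short[].
  static Object createTable(int buckets) {
    if (buckets < 2 || Integer.bitCount(buckets) != 1) {
      throw new IllegalArgumentException("must be a power of two >= 2: " + buckets);
    }
    if (buckets <= 1 << Byte.SIZE) {
      return new byte[buckets];
    } else if (buckets <= 1 << Short.SIZE) {
      return new short[buckets];
    } else {
      return new int[buckets];
    }
  }

  static int tableGet(Object table, int index) {
    if (table instanceof byte[]) {
      return ((byte[]) table)[index] & 0xff; // read the byte as unsigned
    } else if (table instanceof short[]) {
      return ((short[]) table)[index] & 0xffff; // read the short as unsigned
    } else {
      return ((int[]) table)[index];
    }
  }

  static void tableSet(Object table, int index, int value) {
    if (table instanceof byte[]) {
      ((byte[]) table)[index] = (byte) value;
    } else if (table instanceof short[]) {
      ((short[]) table)[index] = (short) value;
    } else {
      ((int[]) table)[index] = value;
    }
  }
}
```

        Because a freshly allocated array is zero-filled, every bucket starts out as UNSET without any extra initialization.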
      • entries

        private transient int[] entries
        Contains the logical entries, in the range of [0, size()). The high bits of each int are the part of the smeared hash of the element not covered by the hashtable mask, whereas the low bits are the "next" pointer (pointing to the next entry in the bucket chain), which will always be less than or equal to the hashtable mask.
         hash  = aaaaaaaa
         mask  = 0000ffff
         next  = 0000bbbb
         entry = aaaabbbb
         

        The pointers in [size(), entries.length) are all "null" (UNSET).
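
        The bit layout in the diagram above can be reproduced with two masks. EntryPackingSketch and its method names are hypothetical and only mirror the diagram; they are not the class's internal helpers.

```java
/** Illustrative only: pack the hash prefix and the "next" pointer into one int. */
final class EntryPackingSketch {
  /** Keeps the hash bits above the mask and the next pointer within the mask. */
  static int pack(int smearedHash, int next, int mask) {
    return (smearedHash & ~mask) | (next & mask);
  }

  /** Extracts the part of the smeared hash stored in the high bits. */
  static int hashPrefix(int entry, int mask) {
    return entry & ~mask;
  }

  /** Extracts the "next" pointer stored in the low bits. */
  static int next(int entry, int mask) {
    return entry & mask;
  }
}
```

        With hash = 0xaaaaaaaa, next = 0x0000bbbb, and mask = 0x0000ffff, pack returns 0xaaaabbbb, matching the diagram.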

      • elements

        transient java.lang.Object[] elements
        The elements contained in the set, in the range of [0, size()). The elements in [size(), elements.length) are all null.
      • metadata

        private transient int metadata
        Keeps track of metadata like the number of hash table bits and modifications of this data structure (to make it possible to throw ConcurrentModificationException in the iterator). Note that we choose not to make this volatile, so we do less of a "best effort" to track such errors, for better performance.
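
        One way to fit both pieces of information into a single int is sketched below. The 5-bit field width, the constant names, and the modCount() accessor are assumptions for illustration; the description above only says that the field holds the number of hash table bits and a modification count.

```java
/** Illustrative only: table-size bits and a modification counter in one int. */
final class MetadataSketch {
  // Assumption: the 5 low bits hold the number of hash table bits (0..31);
  // the remaining high bits count structural modifications.
  static final int BITS_FIELD_WIDTH = 5;
  static final int BITS_FIELD_MASK = (1 << BITS_FIELD_WIDTH) - 1;
  static final int MOD_COUNT_INCREMENT = 1 << BITS_FIELD_WIDTH;

  private int metadata;

  /** Stores the mask as the number of bits needed to represent an index. */
  void setHashTableMask(int mask) {
    int bits = Integer.SIZE - Integer.numberOfLeadingZeros(mask);
    metadata = (metadata & ~BITS_FIELD_MASK) | (bits & BITS_FIELD_MASK);
  }

  /** Rebuilds the mask from the stored number of bits. */
  int hashTableMask() {
    return (1 << (metadata & BITS_FIELD_MASK)) - 1;
  }

  /** Bumps the counter without touching the low bits. */
  void incrementModCount() {
    metadata += MOD_COUNT_INCREMENT;
  }

  int modCount() {
    return metadata >>> BITS_FIELD_WIDTH;
  }
}
```

        An iterator can remember the value it saw at creation and compare it on each step; any difference signals a concurrent modification.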
      • size

        private transient int size
        The number of elements contained in the set.
    • Constructor Detail

      • CompactHashSet

        CompactHashSet()
        Constructs a new empty instance of CompactHashSet.
      • CompactHashSet

        CompactHashSet​(int expectedSize)
        Constructs a new instance of CompactHashSet with the specified capacity.
        Parameters:
        expectedSize - the initial capacity of this CompactHashSet.
    • Method Detail

      • create

        public static <E> CompactHashSet<E> create()
        Creates an empty CompactHashSet instance.
      • create

        public static <E> CompactHashSet<E> create​(java.util.Collection<? extends E> collection)
        Creates a mutable CompactHashSet instance containing the elements of the given collection in unspecified order.
        Parameters:
        collection - the elements that the set should contain
        Returns:
        a new CompactHashSet containing those elements (minus duplicates)
      • create

        @SafeVarargs
        public static <E> CompactHashSet<E> create​(E... elements)
        Creates a mutable CompactHashSet instance containing the given elements in unspecified order.
        Parameters:
        elements - the elements that the set should contain
        Returns:
        a new CompactHashSet containing those elements (minus duplicates)
      • createWithExpectedSize

        public static <E> CompactHashSet<E> createWithExpectedSize​(int expectedSize)
        Creates a CompactHashSet instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth.
        Parameters:
        expectedSize - the number of elements you expect to add to the returned set
        Returns:
        a new, empty CompactHashSet with enough capacity to hold expectedSize elements without resizing
        Throws:
        java.lang.IllegalArgumentException - if expectedSize is negative
      • init

        void init​(int expectedSize)
        Pseudoconstructor for serialization support.
      • needsAllocArrays

        boolean needsAllocArrays()
        Returns whether arrays need to be allocated.
      • allocArrays

        int allocArrays()
        Handles lazy allocation of arrays.
      • delegateOrNull

        java.util.Set<E> delegateOrNull()
      • createHashFloodingResistantDelegate

        private java.util.Set<E> createHashFloodingResistantDelegate​(int tableSize)
      • convertToHashFloodingResistantImplementation

        java.util.Set<E> convertToHashFloodingResistantImplementation()
      • isUsingHashFloodingResistance

        boolean isUsingHashFloodingResistance()
      • setHashTableMask

        private void setHashTableMask​(int mask)
        Stores the hash table mask as the number of bits needed to represent an index.
      • hashTableMask

        private int hashTableMask()
        Gets the hash table mask using the stored number of hash table bits.
      • incrementModCount

        void incrementModCount()
      • add

        public boolean add​(E object)
        Specified by:
        add in interface java.util.Collection<E>
        Specified by:
        add in interface java.util.Set<E>
        Overrides:
        add in class java.util.AbstractCollection<E>
      • insertEntry

        void insertEntry​(int entryIndex,
                         E object,
                         int hash,
                         int mask)
        Creates a fresh entry with the specified object at the specified position in the entry arrays.
      • resizeMeMaybe

        private void resizeMeMaybe​(int newSize)
        Resizes the entries storage if necessary.
      • resizeEntries

        void resizeEntries​(int newCapacity)
        Resizes the internal entries array to the specified capacity, which may be greater or less than the current capacity.
      • resizeTable

        private int resizeTable​(int mask,
                                int newCapacity,
                                int targetHash,
                                int targetEntryIndex)
      • contains

        public boolean contains​(java.lang.Object object)
        Specified by:
        contains in interface java.util.Collection<E>
        Specified by:
        contains in interface java.util.Set<E>
        Overrides:
        contains in class java.util.AbstractCollection<E>
      • remove

        public boolean remove​(java.lang.Object object)
        Specified by:
        remove in interface java.util.Collection<E>
        Specified by:
        remove in interface java.util.Set<E>
        Overrides:
        remove in class java.util.AbstractCollection<E>
      • moveLastEntry

        void moveLastEntry​(int dstIndex,
                           int mask)
        Moves the last entry in the entry array into dstIndex, and nulls out its old position.
      • firstEntryIndex

        int firstEntryIndex()
      • getSuccessor

        int getSuccessor​(int entryIndex)
      • adjustAfterRemove

        int adjustAfterRemove​(int indexBeforeRemove,
                              int indexRemoved)
        Updates the index an iterator is pointing to after a call to remove: returns the index of the entry that should be looked at after a removal on indexRemoved, with indexBeforeRemove as the index that *was* the next entry that would be looked at.
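
        The sketch below shows why such an adjustment is needed when removal fills the gap with the last entry: the moved entry now sits at the removed index and has not been visited yet, so the cursor has to step back. IteratorAdjustSketch and removeEvens are hypothetical, and returning indexBeforeRemove - 1 is an assumption about the base behavior; a subclass with its own ordering could return something else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: cursor adjustment when removal moves the last entry into the gap. */
final class IteratorAdjustSketch {
  // Assumption: the entry moved into indexRemoved still has to be visited,
  // so the "next" cursor steps back by one.
  static int adjustAfterRemove(int indexBeforeRemove, int indexRemoved) {
    return indexBeforeRemove - 1;
  }

  /** Removes every even number while iterating and returns the survivors. */
  static List<Integer> removeEvens(int... values) {
    Integer[] elements = new Integer[values.length];
    for (int i = 0; i < values.length; i++) {
      elements[i] = values[i];
    }
    int size = elements.length;
    int next = 0;
    while (next < size) {
      int current = next;
      next = current + 1; // plain successor: the following index
      if (elements[current] % 2 == 0) {
        elements[current] = elements[size - 1]; // last entry fills the gap
        elements[--size] = null; // old last slot is cleared
        next = adjustAfterRemove(next, current);
      }
    }
    return new ArrayList<>(Arrays.asList(elements).subList(0, size));
  }
}
```

        Without the adjustment, the entry moved into the gap would be skipped and some even numbers could survive the pass.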
      • iterator

        public java.util.Iterator<E> iterator()
        Specified by:
        iterator in interface java.util.Collection<E>
        Specified by:
        iterator in interface java.lang.Iterable<E>
        Specified by:
        iterator in interface java.util.Set<E>
        Specified by:
        iterator in class java.util.AbstractCollection<E>
      • spliterator

        public java.util.Spliterator<E> spliterator()
        Specified by:
        spliterator in interface java.util.Collection<E>
        Specified by:
        spliterator in interface java.lang.Iterable<E>
        Specified by:
        spliterator in interface java.util.Set<E>
      • forEach

        public void forEach​(java.util.function.Consumer<? super E> action)
        Specified by:
        forEach in interface java.lang.Iterable<E>
      • size

        public int size()
        Specified by:
        size in interface java.util.Collection<E>
        Specified by:
        size in interface java.util.Set<E>
        Specified by:
        size in class java.util.AbstractCollection<E>
      • isEmpty

        public boolean isEmpty()
        Specified by:
        isEmpty in interface java.util.Collection<E>
        Specified by:
        isEmpty in interface java.util.Set<E>
        Overrides:
        isEmpty in class java.util.AbstractCollection<E>
      • toArray

        public java.lang.Object[] toArray()
        Specified by:
        toArray in interface java.util.Collection<E>
        Specified by:
        toArray in interface java.util.Set<E>
        Overrides:
        toArray in class java.util.AbstractCollection<E>
      • toArray

        public <T> T[] toArray​(T[] a)
        Specified by:
        toArray in interface java.util.Collection<E>
        Specified by:
        toArray in interface java.util.Set<E>
        Overrides:
        toArray in class java.util.AbstractCollection<E>
      • trimToSize

        public void trimToSize()
        Ensures that this CompactHashSet has the smallest representation in memory, given its current size.
      • clear

        public void clear()
        Specified by:
        clear in interface java.util.Collection<E>
        Specified by:
        clear in interface java.util.Set<E>
        Overrides:
        clear in class java.util.AbstractCollection<E>
      • writeObject

        private void writeObject​(java.io.ObjectOutputStream stream)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • readObject

        private void readObject​(java.io.ObjectInputStream stream)
                         throws java.io.IOException,
                                java.lang.ClassNotFoundException
        Throws:
        java.io.IOException
        java.lang.ClassNotFoundException