Class PositiveIntSet_impl

  • All Implemented Interfaces:
    PositiveIntSet

    public class PositiveIntSet_impl
    extends java.lang.Object
    implements PositiveIntSet
    An set of non-zero integers, ability to iterate over them (possibly in a sorted way), with O(1) operations for adding, removing, and testing for contains. Optimized for adding mostly increasing integers (with some space for less-than-first-one-added). - major uses: indexes of Feature Structures. Optimize for small sets Graceful degradation for completely random integer operations on large sets; keep O(1) operations, at the expense of extra space (< 3 x size). Uses several different representations depending on the range of ints stored and the total number of ints stored. Sizes: Tiny, medium, large Ranges: semi-knowable, unknowable; if semi-knowable, dense, small-range (< 65K), large-range For all sizes, if dense, use IntBitSet (with offset) else For large, (implies large-range, too) use IntHashSet For medium, if small-range < 65K, use IntHashSet with offset else use IntHashSet Arrange switching between representations to occur infrequently, especially as cost of switching (size) grows Arrange checking for switching to occur infrequently, taking into account how underlying data structures grow (which is often by doubling) Switch when adding new member(s) if alternative representation is sufficiently smaller
    • Field Detail

      • HASH_SET_SHORT_MAX_SIZE

        private static final int HASH_SET_SHORT_MAX_SIZE
        See Also:
        Constant Field Values
      • HASH_SET_OVERALLOCATE_DIVIDER_SHIFT

        private static final int HASH_SET_OVERALLOCATE_DIVIDER_SHIFT
        See Also:
        Constant Field Values
      • isBitSet

        boolean isBitSet
      • isIntSet

        boolean isIntSet
      • isHashSet

        boolean isHashSet
      • secondTimeShrinkable

        boolean secondTimeShrinkable
      • useOffset

        boolean useOffset
        Set false once we find we have to reduce the bit offset
    • Constructor Detail

      • PositiveIntSet_impl

        public PositiveIntSet_impl()
      • PositiveIntSet_impl

        public PositiveIntSet_impl​(int initialSize,
                                   int estMin,
                                   int estMax)
        Set up a Positive Bit Set
        Parameters:
        initialSize - - if 0, don't allocate yet, wait for first add. If isBitSetDense, then this number is interpreted as the first int to be added, typically with an offset. The next two params are used only if initialSize is not 0.
        estMin - - the estimated minimum int value to be added
        estMax - - the estimated max int value to be added
    • Method Detail

      • contains

        public boolean contains​(int key)
        Specified by:
        contains in interface PositiveIntSet
        Parameters:
        key - -
        Returns:
        true if key is in the set
      • find

        public int find​(int element)
        Specified by:
        find in interface PositiveIntSet
        Parameters:
        element - an item which may be in the set
        Returns:
        -1 if the item is not in the set, or a position value that can be used with iterators to start at that item.
      • add

        public boolean add​(int key)
        Specified by:
        add in interface PositiveIntSet
        Parameters:
        key - -
        Returns:
        true if this set did not already contain the specified element
      • remove

        public boolean remove​(int key)
        Specified by:
        remove in interface PositiveIntSet
        Parameters:
        key - -
        Returns:
        true if the set had this element before the remove
      • size

        public int size()
        Specified by:
        size in interface PositiveIntSet
        Returns:
        number of elements in the set
      • adjustBitSetForLowerOffset

        private void adjustBitSetForLowerOffset​(int key)
        When a key is added which is lower than the offset, we need to adjust the whole int set table. Although an array copy could do this, we don't have access to the bitset impl. So we just set the offset to 0, and either convert the whole thing to a inthashset or copy all the bits to a new bit set.
        Parameters:
        key -
      • getHashSetOffset

        private static int getHashSetOffset​(int estMax,
                                            int estMin)
      • toUnorderedIntArray

        public int[] toUnorderedIntArray()
      • toOrderedIntArray

        public int[] toOrderedIntArray()
      • getBitSetSpaceFromRange

        private static int getBitSetSpaceFromRange​(int adjKey,
                                                   int spaceUsed)
        Only called if key won't fit in existing IntHashSet, IntSet or IntBitSet allocation If IntBitSet (spaceUsed != 0) compute what its size() (capacity) would be if the key were added (causing doubling); include the overhead of the intbitset class. long words required = larger of double the current size, or the number of long words required
        Parameters:
        adjKey - - the range of bits needed
        spaceUsed - - the number of bits currently allocated (if currently a bit set), else 0
        Returns:
        number of words needed to store the range,
      • getIntSetSpace

        private static int getIntSetSpace​(int size)
        Parameters:
        size - including new element to be added
        Returns:
        number of words including overhead to store that many elements unless is bigger than INT_SET_MAX_SIZE, then return MAX_VALUE
      • getHashSetSpace

        private int getHashSetSpace()
        Only called if key won't fit in existing allocation returns new hash table size ( usually double) + overhead
        Parameters:
        numberOfEntries -
        Returns:
        new hash table size ( usually double) + overhead
      • getHashSetSpace

        private static int getHashSetSpace​(int numberOfElements,
                                           int estMax,
                                           int estMin)
        For the case where the intHashSet doesn't yet exist
        Parameters:
        numberOfElements - -
        estMax - - the largest int to be stored
        estMin - - the smallest int to be stored
        Returns:
        the size in words
      • getHashSetOverAllocateSize

        private static int getHashSetOverAllocateSize​(int existingSize)
        When converting from bitset to hash set, the initial hash set should be big enough so that it takes a while before it is doubled-expanded. Reduce overallocation for small sizes because the cost to switch is low, and the % overallocate could be a large.
        Parameters:
        existingSize -
        Returns:
        overallocate size
      • maybeSwitchRepresentation

        private void maybeSwitchRepresentation​(int newKey)
        logic for switching representation BitSet preferred - is faster, can be more compact, and permits ordered iteration HashSet used when BitSet is too space inefficient. Avoid switching if the new key being added won't increase the representation size - for hash: if number of elements +1 won't cause doubling of the hash table - for bitset: if key will fit in existing "capacity" Compute space needed after adding - bitset: array doubles - hashset: array doubles Preventing jitters back and forth: - removes don't cause switching - compare against other size including its non-linear space jumps.
      • handleBitSet

        private void handleBitSet​(int newKey)
        Bit Set
      • handleIntSet

        private void handleIntSet​(int newKey)
        IntSet *
      • handleHashSet

        private void handleHashSet​(int newKey)
        Hash Set
      • allocateIntBitSet

        private boolean allocateIntBitSet​(int estMax,
                                          int estMin,
                                          int offsetSpec)
        Allocate a new IntBitSet
        Parameters:
        estMax - the largest int to store in the bit set - determines the initial size
        estMin - this is a lower bound on the value that can be in the bit set
        offsetSpec - this is the offset to use, or -1, to calculate the offset from the initial_size (assuming a small amount of adds will be for ints somewhat less than the initial_size entry (being treated as the first int to be added)
        Returns:
        true if allocated with offset, implying guarantee the estMax key fits.
      • estimatedBitSetOffset

        private static int estimatedBitSetOffset​(int estMin,
                                                 int offsetSpec)
      • switchFromBitSet

        private void switchFromBitSet​(int size,
                                      int offset)
        switching from bit set to either IntSet or IntHashSet
        Parameters:
        size - - space needed including new item
        offset - - for IntHashSet values kept as short, the offset subtracted from values before storing them offset == Integer.MIN_VALUE means use 4 byte ints
      • switchToIntSet

        private void switchToIntSet​(int size)
        switch to IntSet
        Parameters:
        size - number of elements to be able to store without expanding
      • switchToHashSet

        private void switchToHashSet​(int size,
                                     int offset)
        Switch from any intSet impl to Hash Set
        Parameters:
        size - - the capacity - this many elements could be stored before swtiching
        offset - - used only when the size < 65K, and is a value subtracted from elements before storing them, in an attempt to fit the results into just "short"s instead of "int"s if == MIN_VALUE, then force 4 byte ints
      • switchToBitSet

        private void switchToBitSet​(int estMax,
                                    int estMin,
                                    int offset)
      • get

        public int get​(int position)
        Description copied from interface: PositiveIntSet
        For FSBagIndex low level iterator use DOESN"T WORK WITH INCREMENTING position VALUES
        Specified by:
        get in interface PositiveIntSet
        Parameters:
        position - - get the element at this position. This is for iterator use only, and is not related to any key
        Returns:
        the element
      • moveToFirst

        public int moveToFirst()
        Description copied from interface: PositiveIntSet
        For FSBagIndex low level iterator use
        Specified by:
        moveToFirst in interface PositiveIntSet
        Returns:
        the position of the first element, or -1;
      • moveToLast

        public int moveToLast()
        Description copied from interface: PositiveIntSet
        For FSBagIndex low level iterator use
        Specified by:
        moveToLast in interface PositiveIntSet
        Returns:
        the position of the last element, or -1;
      • moveToNext

        public int moveToNext​(int position)
        Description copied from interface: PositiveIntSet
        For FSBagIndex low level iterator use
        Specified by:
        moveToNext in interface PositiveIntSet
        Parameters:
        position - -
        Returns:
        the position of the next element, or -1;
      • moveToPrevious

        public int moveToPrevious​(int position)
        Description copied from interface: PositiveIntSet
        For FSBagIndex low level iterator use
        Specified by:
        moveToPrevious in interface PositiveIntSet
        Parameters:
        position - -
        Returns:
        the position of the next element, or -1;
      • isValid

        public boolean isValid​(int position)
        Description copied from interface: PositiveIntSet
        For FSBagIndex low level iterator use
        Specified by:
        isValid in interface PositiveIntSet
        Parameters:
        position - -
        Returns:
        true if the position is between the first and last element inclusive.
      • bulkAddTo

        public void bulkAddTo​(IntVector v)
        Description copied from interface: PositiveIntSet
        add all elements in this set to the IntVector v as a bulk operation
        Specified by:
        bulkAddTo in interface PositiveIntSet
        Parameters:
        v - - to be added to
      • toIntArray

        public int[] toIntArray()
        Specified by:
        toIntArray in interface PositiveIntSet
        Returns:
        the set as an arbitrarily ordered int array
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • isOffsetBitSet

        boolean isOffsetBitSet()
      • isShortHashSet

        boolean isShortHashSet()