Package org.apache.uima.internal.util
Class PositiveIntSet_impl
- java.lang.Object
-
- org.apache.uima.internal.util.PositiveIntSet_impl
-
- All Implemented Interfaces:
PositiveIntSet
public class PositiveIntSet_impl extends java.lang.Object implements PositiveIntSet
An set of non-zero integers, ability to iterate over them (possibly in a sorted way), with O(1) operations for adding, removing, and testing for contains. Optimized for adding mostly increasing integers (with some space for less-than-first-one-added). - major uses: indexes of Feature Structures. Optimize for small sets Graceful degradation for completely random integer operations on large sets; keep O(1) operations, at the expense of extra space (< 3 x size). Uses several different representations depending on the range of ints stored and the total number of ints stored. Sizes: Tiny, medium, large Ranges: semi-knowable, unknowable; if semi-knowable, dense, small-range (< 65K), large-range For all sizes, if dense, use IntBitSet (with offset) else For large, (implies large-range, too) use IntHashSet For medium, if small-range < 65K, use IntHashSet with offset else use IntHashSet Arrange switching between representations to occur infrequently, especially as cost of switching (size) grows Arrange checking for switching to occur infrequently, taking into account how underlying data structures grow (which is often by doubling) Switch when adding new member(s) if alternative representation is sufficiently smaller
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private class
PositiveIntSet_impl.IntListIteratorOverArray
-
Field Summary
Fields Modifier and Type Field Description private static int
BIT_SET_OVERALLOCATE
private static int
HASH_SET_OVERALLOCATE_DIVIDER_SHIFT
private static int
HASH_SET_SHORT_MAX_SIZE
private static int
HYSTERESIS
private static int
INT_SET_MAX_SIZE
private PositiveIntSet
intSet
(package private) boolean
isBitSet
(package private) boolean
isHashSet
(package private) boolean
isIntSet
(package private) boolean
secondTimeShrinkable
(package private) boolean
useOffset
Set false once we find we have to reduce the bit offset-
Fields inherited from interface org.apache.uima.internal.util.PositiveIntSet
IS_TRACE_MODE_SWITCH
-
-
Constructor Summary
Constructors Constructor Description PositiveIntSet_impl()
PositiveIntSet_impl(int initialSize, int estMin, int estMax)
Set up a Positive Bit Set
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description boolean
add(int key)
private void
adjustBitSetForLowerOffset(int key)
When a key is added which is lower than the offset, we need to adjust the whole int set table.private boolean
allocateIntBitSet(int estMax, int estMin, int offsetSpec)
Allocate a new IntBitSetvoid
bulkAddTo(IntVector v)
add all elements in this set to the IntVector v as a bulk operationvoid
clear()
remove all members of the setboolean
contains(int key)
private static int
estimatedBitSetOffset(int estMin, int offsetSpec)
int
find(int element)
int
get(int position)
For FSBagIndex low level iterator use DOESN"T WORK WITH INCREMENTING position VALUESprivate static int
getBitSetSpaceFromRange(int adjKey, int spaceUsed)
Only called if key won't fit in existing IntHashSet, IntSet or IntBitSet allocation If IntBitSet (spaceUsed != 0) compute what its size() (capacity) would be if the key were added (causing doubling); include the overhead of the intbitset class.private static int
getHashSetOffset(int estMax, int estMin)
private static int
getHashSetOverAllocateSize(int existingSize)
When converting from bitset to hash set, the initial hash set should be big enough so that it takes a while before it is doubled-expanded.private int
getHashSetSpace()
Only called if key won't fit in existing allocation returns new hash table size ( usually double) + overheadprivate static int
getHashSetSpace(int numberOfElements, int estMax, int estMin)
For the case where the intHashSet doesn't yet existprivate static int
getIntSetSpace(int size)
IntListIterator
getOrderedIterator()
IntListIterator
getUnorderedIterator()
private void
handleBitSet(int newKey)
Bit Setprivate void
handleHashSet(int newKey)
Hash Setprivate void
handleIntSet(int newKey)
IntSet *(package private) boolean
isOffsetBitSet()
(package private) boolean
isShortHashSet()
boolean
isValid(int position)
For FSBagIndex low level iterator useIntListIterator
iterator()
private void
maybeSwitchRepresentation(int newKey)
logic for switching representation BitSet preferred - is faster, can be more compact, and permits ordered iteration HashSet used when BitSet is too space inefficient.int
moveToFirst()
For FSBagIndex low level iterator useint
moveToLast()
For FSBagIndex low level iterator useint
moveToNext(int position)
For FSBagIndex low level iterator useint
moveToPrevious(int position)
For FSBagIndex low level iterator useboolean
remove(int key)
int
size()
private void
switchFromBitSet(int size, int offset)
switching from bit set to either IntSet or IntHashSetprivate void
switchToBitSet(int estMax, int estMin, int offset)
private void
switchToHashSet(int size, int offset)
Switch from any intSet impl to Hash Setprivate void
switchToIntSet(int size)
switch to IntSetint[]
toIntArray()
int[]
toOrderedIntArray()
java.lang.String
toString()
int[]
toUnorderedIntArray()
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.uima.internal.util.PositiveIntSet
forAllInts
-
-
-
-
Field Detail
-
HYSTERESIS
private static final int HYSTERESIS
- See Also:
- Constant Field Values
-
INT_SET_MAX_SIZE
private static final int INT_SET_MAX_SIZE
- See Also:
- Constant Field Values
-
HASH_SET_SHORT_MAX_SIZE
private static final int HASH_SET_SHORT_MAX_SIZE
- See Also:
- Constant Field Values
-
BIT_SET_OVERALLOCATE
private static final int BIT_SET_OVERALLOCATE
- See Also:
- Constant Field Values
-
HASH_SET_OVERALLOCATE_DIVIDER_SHIFT
private static final int HASH_SET_OVERALLOCATE_DIVIDER_SHIFT
- See Also:
- Constant Field Values
-
intSet
private PositiveIntSet intSet
-
isBitSet
boolean isBitSet
-
isIntSet
boolean isIntSet
-
isHashSet
boolean isHashSet
-
secondTimeShrinkable
boolean secondTimeShrinkable
-
useOffset
boolean useOffset
Set false once we find we have to reduce the bit offset
-
-
Constructor Detail
-
PositiveIntSet_impl
public PositiveIntSet_impl()
-
PositiveIntSet_impl
public PositiveIntSet_impl(int initialSize, int estMin, int estMax)
Set up a Positive Bit Set- Parameters:
initialSize
- - if 0, don't allocate yet, wait for first add. If isBitSetDense, then this number is interpreted as the first int to be added, typically with an offset. The next two params are used only if initialSize is not 0.estMin
- - the estimated minimum int value to be addedestMax
- - the estimated max int value to be added
-
-
Method Detail
-
iterator
public IntListIterator iterator()
- Specified by:
iterator
in interfacePositiveIntSet
- Returns:
- an iterator (may be ordered or unordered) over the members of the set
-
getUnorderedIterator
public IntListIterator getUnorderedIterator()
-
getOrderedIterator
public IntListIterator getOrderedIterator()
-
clear
public void clear()
Description copied from interface:PositiveIntSet
remove all members of the set- Specified by:
clear
in interfacePositiveIntSet
-
contains
public boolean contains(int key)
- Specified by:
contains
in interfacePositiveIntSet
- Parameters:
key
- -- Returns:
- true if key is in the set
-
find
public int find(int element)
- Specified by:
find
in interfacePositiveIntSet
- Parameters:
element
- an item which may be in the set- Returns:
- -1 if the item is not in the set, or a position value that can be used with iterators to start at that item.
-
add
public boolean add(int key)
- Specified by:
add
in interfacePositiveIntSet
- Parameters:
key
- -- Returns:
- true if this set did not already contain the specified element
-
remove
public boolean remove(int key)
- Specified by:
remove
in interfacePositiveIntSet
- Parameters:
key
- -- Returns:
- true if the set had this element before the remove
-
size
public int size()
- Specified by:
size
in interfacePositiveIntSet
- Returns:
- number of elements in the set
-
adjustBitSetForLowerOffset
private void adjustBitSetForLowerOffset(int key)
When a key is added which is lower than the offset, we need to adjust the whole int set table. Although an array copy could do this, we don't have access to the bitset impl. So we just set the offset to 0, and either convert the whole thing to a inthashset or copy all the bits to a new bit set.- Parameters:
key
-
-
getHashSetOffset
private static int getHashSetOffset(int estMax, int estMin)
-
toUnorderedIntArray
public int[] toUnorderedIntArray()
-
toOrderedIntArray
public int[] toOrderedIntArray()
-
getBitSetSpaceFromRange
private static int getBitSetSpaceFromRange(int adjKey, int spaceUsed)
Only called if key won't fit in existing IntHashSet, IntSet or IntBitSet allocation If IntBitSet (spaceUsed != 0) compute what its size() (capacity) would be if the key were added (causing doubling); include the overhead of the intbitset class. long words required = larger of double the current size, or the number of long words required- Parameters:
adjKey
- - the range of bits neededspaceUsed
- - the number of bits currently allocated (if currently a bit set), else 0- Returns:
- number of words needed to store the range,
-
getIntSetSpace
private static int getIntSetSpace(int size)
- Parameters:
size
- including new element to be added- Returns:
- number of words including overhead to store that many elements unless is bigger than INT_SET_MAX_SIZE, then return MAX_VALUE
-
getHashSetSpace
private int getHashSetSpace()
Only called if key won't fit in existing allocation returns new hash table size ( usually double) + overhead- Parameters:
numberOfEntries
-- Returns:
- new hash table size ( usually double) + overhead
-
getHashSetSpace
private static int getHashSetSpace(int numberOfElements, int estMax, int estMin)
For the case where the intHashSet doesn't yet exist- Parameters:
numberOfElements
- -estMax
- - the largest int to be storedestMin
- - the smallest int to be stored- Returns:
- the size in words
-
getHashSetOverAllocateSize
private static int getHashSetOverAllocateSize(int existingSize)
When converting from bitset to hash set, the initial hash set should be big enough so that it takes a while before it is doubled-expanded. Reduce overallocation for small sizes because the cost to switch is low, and the % overallocate could be a large.- Parameters:
existingSize
-- Returns:
- overallocate size
-
maybeSwitchRepresentation
private void maybeSwitchRepresentation(int newKey)
logic for switching representation BitSet preferred - is faster, can be more compact, and permits ordered iteration HashSet used when BitSet is too space inefficient. Avoid switching if the new key being added won't increase the representation size - for hash: if number of elements +1 won't cause doubling of the hash table - for bitset: if key will fit in existing "capacity" Compute space needed after adding - bitset: array doubles - hashset: array doubles Preventing jitters back and forth: - removes don't cause switching - compare against other size including its non-linear space jumps.
-
handleBitSet
private void handleBitSet(int newKey)
Bit Set
-
handleIntSet
private void handleIntSet(int newKey)
IntSet *
-
handleHashSet
private void handleHashSet(int newKey)
Hash Set
-
allocateIntBitSet
private boolean allocateIntBitSet(int estMax, int estMin, int offsetSpec)
Allocate a new IntBitSet- Parameters:
estMax
- the largest int to store in the bit set - determines the initial sizeestMin
- this is a lower bound on the value that can be in the bit setoffsetSpec
- this is the offset to use, or -1, to calculate the offset from the initial_size (assuming a small amount of adds will be for ints somewhat less than the initial_size entry (being treated as the first int to be added)- Returns:
- true if allocated with offset, implying guarantee the estMax key fits.
-
estimatedBitSetOffset
private static int estimatedBitSetOffset(int estMin, int offsetSpec)
-
switchFromBitSet
private void switchFromBitSet(int size, int offset)
switching from bit set to either IntSet or IntHashSet- Parameters:
size
- - space needed including new itemoffset
- - for IntHashSet values kept as short, the offset subtracted from values before storing them offset == Integer.MIN_VALUE means use 4 byte ints
-
switchToIntSet
private void switchToIntSet(int size)
switch to IntSet- Parameters:
size
- number of elements to be able to store without expanding
-
switchToHashSet
private void switchToHashSet(int size, int offset)
Switch from any intSet impl to Hash Set- Parameters:
size
- - the capacity - this many elements could be stored before swtichingoffset
- - used only when the size < 65K, and is a value subtracted from elements before storing them, in an attempt to fit the results into just "short"s instead of "int"s if == MIN_VALUE, then force 4 byte ints
-
switchToBitSet
private void switchToBitSet(int estMax, int estMin, int offset)
-
get
public int get(int position)
Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use DOESN"T WORK WITH INCREMENTING position VALUES- Specified by:
get
in interfacePositiveIntSet
- Parameters:
position
- - get the element at this position. This is for iterator use only, and is not related to any key- Returns:
- the element
-
moveToFirst
public int moveToFirst()
Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
moveToFirst
in interfacePositiveIntSet
- Returns:
- the position of the first element, or -1;
-
moveToLast
public int moveToLast()
Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
moveToLast
in interfacePositiveIntSet
- Returns:
- the position of the last element, or -1;
-
moveToNext
public int moveToNext(int position)
Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
moveToNext
in interfacePositiveIntSet
- Parameters:
position
- -- Returns:
- the position of the next element, or -1;
-
moveToPrevious
public int moveToPrevious(int position)
Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
moveToPrevious
in interfacePositiveIntSet
- Parameters:
position
- -- Returns:
- the position of the next element, or -1;
-
isValid
public boolean isValid(int position)
Description copied from interface:PositiveIntSet
For FSBagIndex low level iterator use- Specified by:
isValid
in interfacePositiveIntSet
- Parameters:
position
- -- Returns:
- true if the position is between the first and last element inclusive.
-
bulkAddTo
public void bulkAddTo(IntVector v)
Description copied from interface:PositiveIntSet
add all elements in this set to the IntVector v as a bulk operation- Specified by:
bulkAddTo
in interfacePositiveIntSet
- Parameters:
v
- - to be added to
-
toIntArray
public int[] toIntArray()
- Specified by:
toIntArray
in interfacePositiveIntSet
- Returns:
- the set as an arbitrarily ordered int array
-
toString
public java.lang.String toString()
- Overrides:
toString
in classjava.lang.Object
-
isOffsetBitSet
boolean isOffsetBitSet()
-
isShortHashSet
boolean isShortHashSet()
-
-