Class TransformationStrategies


  • public class TransformationStrategies
    extends java.lang.Object
    A class providing static methods and objects that do useful things with transformation strategies.

    This class provides several transformation strategies that turn strings or other objects into bit vectors. The transformations might optionally be:

    • Lexicographical: for objects based on bytes or characters, such as strings and byte arrays, this means that the first bit of the bit vector is the most significant bit of the first byte or character, and so on. In other words, the lexicographical order between bit vectors reflects the lexicographical byte-by-byte, char-by-char, etc. order. This property is necessary for some kinds of static structures that depend on it, but it has some computational cost, as after compacting bytes or chars into a long we need to reverse the bit order of each piece.
    • Prefix-free: no two bit vectors returned by the transformation on two different objects will be comparable in prefix order. Again, this might require linear (e.g., prefixFree()) or constant (e.g., prefixFreeIso()) additional space.

    As a general rule, transformations without additional naming are lexicographical. Transformations that generate prefix-free bit vectors are marked as such. Plain transformations that do not provide any guarantee are called raw. They should be used only when performance is the main issue and the two properties above are not relevant.
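The two properties above can be illustrated without the library. The following is a minimal sketch that represents bit vectors as strings of '0' and '1'; the class BitOrderSketch and its methods are hypothetical names, and the least-significant-bit-first layout is only an assumed example of a non-lexicographical layout, not a description of what the raw strategies actually do.

```java
final class BitOrderSketch {
    /** Writes each byte most-significant-bit first, as a lexicographical layout requires. */
    static String msbFirst(byte[] bytes) {
        StringBuilder sb = new StringBuilder(8 * bytes.length);
        for (byte b : bytes)
            for (int i = 7; i >= 0; i--) sb.append((b >>> i) & 1);
        return sb.toString();
    }

    /** Writes each byte least-significant-bit first (an assumed non-lexicographical layout). */
    static String lsbFirst(byte[] bytes) {
        StringBuilder sb = new StringBuilder(8 * bytes.length);
        for (byte b : bytes)
            for (int i = 0; i < 8; i++) sb.append((b >>> i) & 1);
        return sb.toString();
    }

    /** Appends a zero byte; prefix-free as long as the input contains no zero byte. */
    static String msbFirstTerminated(byte[] bytes) {
        return msbFirst(bytes) + "00000000";
    }
}
```

With the most-significant-bit-first layout, the byte 1 sorts before the byte 2 as a bit string, whereas the reversed layout inverts that order. Without the terminator, the encoding of a byte array is a prefix of the encoding of any of its extensions; with the terminator it is not.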

    See Also:
    TransformationStrategy
    • Method Summary

      Modifier and Type Method Description
      static TransformationStrategy<byte[]> byteArray()
      A lexicographical transformation from byte arrays to bit vectors.
      static TransformationStrategy<java.lang.Long> fixedLong()
      A transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.
      static <T extends BitVector>
      TransformationStrategy<T>
      identity()
      A trivial transformation for data already in BitVector form.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      iso()
      A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
      static <T extends BitVector>
      TransformationStrategy<T>
      prefixFree()
      A transformation from bit vectors to bit vectors that guarantees that its results are prefix free.
      static TransformationStrategy<byte[]> prefixFreeByteArray()
      A lexicographical transformation from byte arrays to bit vectors that completes the representation with a zero to guarantee lexicographical ordering and prefix-freeness provided the byte arrays do not contain zeros.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      prefixFreeIso()
      A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation and completes the representation with an ASCII NUL to guarantee lexicographical ordering and prefix-freeness.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      prefixFreeUtf16()
      A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      prefixFreeUtf32()
      A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs, concatenates the bits of the UTF-32 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
      static TransformationStrategy<byte[]> rawByteArray()
      A trivial, high-performance, raw transformation from byte arrays to bit vectors that simply concatenates the bytes of the array.
      static TransformationStrategy<java.lang.Long> rawFixedLong()
      A trivial, high-performance, raw transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      rawIso()
      A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      rawUtf16()
      A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      rawUtf32()
      A trivial raw transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      utf16()
      A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
      static <T extends java.lang.CharSequence>
      TransformationStrategy<T>
      utf32()
      A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
      static <T> java.lang.Iterable<BitVector> wrap(java.lang.Iterable<T> iterable, TransformationStrategy<? super T> transformationStrategy)
      Wraps a given iterable, returning an iterable that contains bit vectors.
      static <T> java.util.Iterator<BitVector> wrap(java.util.Iterator<T> iterator, TransformationStrategy<? super T> transformationStrategy)
      Wraps a given iterator, returning an iterator that emits bit vectors.
      static <T> java.util.List<BitVector> wrap(java.util.List<T> list, TransformationStrategy<? super T> transformationStrategy)
      Wraps a given list, returning a list that contains bit vectors.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • TransformationStrategies

        public TransformationStrategies()
    • Method Detail

      • rawUtf32

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> rawUtf32()
        A trivial raw transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.

        Warning: this transformation is not lexicographic.

      • utf32

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> utf32()
        A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
      • prefixFreeUtf32

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> prefixFreeUtf32()
        A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs, concatenates the bits of the UTF-32 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.

        Note that strings provided to this strategy must not contain NULs.
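The decoding step can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: Utf32Sketch and prefixFreeCodePoints are hypothetical names, and rejecting NULs with an exception is an assumption made here to make the precondition visible.

```java
import java.util.Arrays;

final class Utf32Sketch {
    /**
     * Decodes surrogate pairs into code points and appends a final NUL.
     * Each element of the result stands for 32 bits of the conceptual bit vector.
     */
    static int[] prefixFreeCodePoints(CharSequence s) {
        int[] codePoints = s.codePoints().toArray();
        for (int cp : codePoints)
            if (cp == 0) throw new IllegalArgumentException("Input must not contain NUL");
        // copyOf pads with zeros, so the extra last element is the NUL terminator.
        return Arrays.copyOf(codePoints, codePoints.length + 1);
    }
}
```

For a string made of the letter a followed by a character outside the Basic Multilingual Plane (two UTF-16 code units), the result has three elements: two code points and the terminator.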

      • rawUtf16

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> rawUtf16()
        A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.

        Warning: this transformation is not lexicographic.

        Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.

      • utf16

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> utf16()
        A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.

        Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
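The warning about adaptors can be made concrete with a small stand-in. The sketch below is not the library's class: AdaptorSketch.Utf16View is a hypothetical lazy view that reads bits from the underlying CharSequence on demand, with bit 0 being the most significant bit of the first character.

```java
final class AdaptorSketch {
    /** A hypothetical lazy view: no bits are copied, every access reads the source. */
    static final class Utf16View {
        private final CharSequence source;

        Utf16View(CharSequence source) {
            this.source = source;
        }

        /** Number of bits: 16 per UTF-16 code unit. */
        long length() {
            return 16L * source.length();
        }

        /** Returns the bit at the given index; bit 0 is the most significant bit of the first char. */
        boolean getBoolean(long index) {
            char c = source.charAt((int) (index / 16));
            return ((c >>> (15 - (int) (index % 16))) & 1) != 0;
        }
    }
}
```

If such a view wraps a StringBuilder and the builder is modified afterwards, the same index may return a different bit, which is why the source must not change while the bit vector is in use.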

      • prefixFreeUtf16

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> prefixFreeUtf16()
        A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.

        Note that strings provided to this strategy must not contain NULs.

        Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.

      • rawIso

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> rawIso()
        A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.

        Warning: this transformation is not lexicographic.

        Note that this transformation is sensible only for strings that are known to contain just characters in the ISO-8859-1 charset.

        Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.

      • iso

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> iso()
        A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.

        Note that this transformation is sensible only for strings that are known to contain just characters in the ISO-8859-1 charset.

        Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
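The restriction to ISO-8859-1 follows from discarding the upper eight bits of each character. A minimal sketch, using the hypothetical names IsoSketch, lowerEightBits and isIso, shows how two different strings can collide once a character above U+00FF is involved.

```java
final class IsoSketch {
    /** Keeps only the lower eight bits of each UTF-16 code unit. */
    static byte[] lowerEightBits(CharSequence s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) out[i] = (byte) s.charAt(i);
        return out;
    }

    /** Returns true if every char is at most U+00FF, that is, fits in ISO-8859-1. */
    static boolean isIso(CharSequence s) {
        for (int i = 0; i < s.length(); i++)
            if (s.charAt(i) > 0xFF) return false;
        return true;
    }
}
```

For instance, U+0141 and the letter A share the lower byte 0x41, so they become indistinguishable after truncation; a check such as isIso before building a structure is one possible safeguard.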

      • prefixFreeIso

        public static <T extends java.lang.CharSequence> TransformationStrategy<T> prefixFreeIso()
        A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation and completes the representation with an ASCII NUL to guarantee lexicographical ordering and prefix-freeness.

        Note that this transformation is sensible only for strings that are known to contain just characters in the ISO-8859-1 charset, and that strings provided to this strategy must not contain ASCII NULs.

        Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.

      • rawByteArray

        public static TransformationStrategy<byte[]> rawByteArray()
        A trivial, high-performance, raw transformation from byte arrays to bit vectors that simply concatenates the bytes of the array.

        Warning: this transformation is not lexicographic.

        Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.

        See Also:
        TransformationStrategies
      • byteArray

        public static TransformationStrategy<byte[]> byteArray()
        A lexicographical transformation from byte arrays to bit vectors.

        Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.

        See Also:
        TransformationStrategies
      • prefixFreeByteArray

        public static TransformationStrategy<byte[]> prefixFreeByteArray()
        A lexicographical transformation from byte arrays to bit vectors that completes the representation with a zero to guarantee lexicographical ordering and prefix-freeness provided the byte arrays do not contain zeros.

        This transformation is mainly intended for byte arrays representing ASCII strings in compact form.

        Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.

        See Also:
        TransformationStrategies
      • wrap

        public static <T> java.util.Iterator<BitVector> wrap(java.util.Iterator<T> iterator,
                                                             TransformationStrategy<? super T> transformationStrategy)
        Wraps a given iterator, returning an iterator that emits bit vectors.
        Parameters:
        iterator - an iterator.
        transformationStrategy - a strategy to transform the object returned by iterator.
        Returns:
        an iterator that emits the content of iterator passed through transformationStrategy.
      • wrap

        public static <T> java.lang.Iterable<BitVector> wrap(java.lang.Iterable<T> iterable,
                                                             TransformationStrategy<? super T> transformationStrategy)
        Wraps a given iterable, returning an iterable that contains bit vectors.
        Parameters:
        iterable - an iterable.
        transformationStrategy - a strategy to transform the object contained in iterable.
        Returns:
        an iterable that has the content of iterable passed through transformationStrategy.
      • wrap

        public static <T> java.util.List<BitVector> wrap(java.util.List<T> list,
                                                         TransformationStrategy<? super T> transformationStrategy)
        Wraps a given list, returning a list that contains bit vectors.
        Parameters:
        list - a list.
        transformationStrategy - a strategy to transform the object contained in list.
        Returns:
        a list that has the content of list passed through transformationStrategy.
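The wrapping idea can be sketched with a plain java.util.function.Function standing in for TransformationStrategy. This is an assumption-laden illustration: WrapSketch is a hypothetical class, and whether the library's wrappers are lazy views, as they are here, is not stated in this page.

```java
import java.util.AbstractList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class WrapSketch {
    /** Returns an iterator that applies f to each element as it is consumed. */
    static <T, R> Iterator<R> wrap(Iterator<T> iterator, Function<? super T, ? extends R> f) {
        return new Iterator<R>() {
            @Override
            public boolean hasNext() {
                return iterator.hasNext();
            }

            @Override
            public R next() {
                return f.apply(iterator.next());
            }
        };
    }

    /** Returns a read-only list view that applies f on each access. */
    static <T, R> List<R> wrap(List<T> list, Function<? super T, ? extends R> f) {
        return new AbstractList<R>() {
            @Override
            public R get(int index) {
                return f.apply(list.get(index));
            }

            @Override
            public int size() {
                return list.size();
            }
        };
    }
}
```

In this sketch no transformed element is stored: each access recomputes it from the source, so the view costs no additional space but repeats the transformation on every read.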
      • prefixFree

        public static <T extends BitVector> TransformationStrategy<T> prefixFree()
        A transformation from bit vectors to bit vectors that guarantees that its results are prefix free.

        In more detail, we map 0 to 10, 1 to 11, and we add a 0 at the end of all strings.

        Warning: bit vectors returned by this strategy are adaptors around the original bit vector. If the original bit vector changes while the returned bit vector is being accessed, the results will be unpredictable.
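The mapping of 0 to 10 and 1 to 11 with a final 0 can be written down directly. The sketch below works on strings of '0' and '1' rather than on BitVector instances; PrefixFreeSketch and encode are hypothetical names.

```java
final class PrefixFreeSketch {
    /** Maps 0 to 10 and 1 to 11, then appends a final 0. */
    static String encode(String bits) {
        StringBuilder sb = new StringBuilder(2 * bits.length() + 1);
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') throw new IllegalArgumentException("Not a bit: " + c);
            sb.append('1').append(c);
        }
        return sb.append('0').toString();
    }
}
```

An input of n bits yields 2n + 1 bits, which is the linear additional space mentioned in the class description. Every encoded pair starts with 1 and the terminator is 0, so an encoding can only end at the terminator and no encoding is a proper prefix of another.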

      • fixedLong

        public static TransformationStrategy<java.lang.Long> fixedLong()
        A transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector. Note that the bit vectors have as first bit the most significant bit of the underlying long integer, and that the first bit of the representation is flipped, so lexicographical and numerical order coincide.
        Implementation Notes:
        The flipping of the most significant bit was implemented in 2.6.18 to match lexicographical and numerical order for negative numbers, too, and made it necessary to bump the serial version of the strategy.
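The effect of flipping the first bit can be checked with a few lines of plain Java. This is a sketch of the idea only: FixedLongSketch and bits are hypothetical names, and XOR with Long.MIN_VALUE is simply one way to flip the most significant bit.

```java
final class FixedLongSketch {
    /** Returns the 64 bits of x, most significant first, with the sign bit flipped. */
    static String bits(long x) {
        long flipped = x ^ Long.MIN_VALUE; // flips only the most significant bit
        StringBuilder sb = new StringBuilder(Long.SIZE);
        for (int i = Long.SIZE - 1; i >= 0; i--) sb.append((flipped >>> i) & 1);
        return sb.toString();
    }
}
```

Without the flip, negative numbers would start with a 1 in two's complement and sort after all nonnegative ones. With it, negative numbers start with 0 and nonnegative ones with 1, so comparing the resulting bit strings lexicographically agrees with comparing the longs numerically.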
      • rawFixedLong

        public static TransformationStrategy<java.lang.Long> rawFixedLong()
        A trivial, high-performance, raw transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.
        Implementation Notes:
        Making fixedLong() respect lexicographical order for all numbers in 2.6.18 made it necessary to bump the serial version of this strategy, too.