Package cern.colt

Class Partitioning

java.lang.Object
cern.colt.Partitioning

public class Partitioning extends Object
Given some interval boundaries, partitions arrays such that all elements falling into an interval are placed next to each other.

The algorithms partition arrays into two or more intervals. They distinguish between synchronously partitioning either one, two or three arrays. They further come in templated versions, either partitioning int[] arrays or double[] arrays.

You may want to start out reading about the simplest case: Partitioning one int[] array into two intervals. To do so, read partition(int[],int,int,int). Next, building upon that foundation comes a method partitioning int[] arrays into multiple intervals. See partition(int[],int,int,int[],int,int,int[]) for related documentation.

All other methods are no different than the one's you now already understand, except that they operate on slightly different data types.

Performance

Partitioning into two intervals is O( N ). Partitioning into k intervals is O( N * log(k)). Constants factors are minimized. No temporary memory is allocated; Partitioning is in-place.

Version:
1.0, 03-Jul-99
See Also:
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final int
     
    private static final int
     
    protected static int
     
    static int
     
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    protected
    Makes this class non instantiable, but still let's others inherit from it.
  • Method Summary

    Modifier and Type
    Method
    Description
    private static int
    binarySearchFromTo(int a, int from, int to, IntComparator comp)
    Finds the given key "a" within some generic data using the binary search algorithm.
    static int
    dualPartition(double[] list, double[] secondary, int from, int to, double splitter)
    Same as dualPartition(int[],int[],int,int,int) except that it synchronously partitions double[] rather than int[] arrays.
    static void
    dualPartition(double[] list, double[] secondary, int from, int to, double[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
    Same as dualPartition(int[],int[],int,int,int[],int,int,int[]) except that it synchronously partitions double[] rather than int[] arrays.
    static int
    dualPartition(int[] list, int[] secondary, int from, int to, int splitter)
    Same as partition(int[],int,int,int) except that this method synchronously partitions two arrays at the same time; both arrays are partially sorted according to the elements of the primary array.
    static void
    dualPartition(int[] list, int[] secondary, int from, int to, int[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
    Same as partition(int[],int,int,int[],int,int,int[]) except that this method synchronously partitions two arrays at the same time; both arrays are partially sorted according to the elements of the primary array.
    static void
    genericPartition(int from, int to, int splitFrom, int splitTo, int[] splitIndexes, IntComparator comp, IntComparator comp2, IntComparator comp3, Swapper swapper)
    Same as partition(int[],int,int,int[],int,int,int[]) except that it generically partitions arbitrary shaped data (for example matrices or multiple arrays) rather than int[] arrays.
    private static int
    genericPartition(int from, int to, int splitter, IntComparator comp, Swapper swapper)
    Same as partition(int[],int,int,int) except that it generically partitions arbitrary shaped data (for example matrices or multiple arrays) rather than int[] arrays.
    private static int
    med3(double[] x, int a, int b, int c)
    Returns the index of the median of the three indexed elements.
    private static int
    med3(int[] x, int a, int b, int c)
    Returns the index of the median of the three indexed elements.
    private static int
    med3(int a, int b, int c, IntComparator comp)
    Returns the index of the median of the three indexed chars.
    private static int
    med3(Object[] x, int a, int b, int c, Comparator comp)
    Returns the index of the median of the three indexed chars.
    static int
    partition(double[] list, int from, int to, double splitter)
    Same as partition(int[],int,int,int) except that it partitions double[] rather than int[] arrays.
    static void
    partition(double[] list, int from, int to, double[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
    Same as partition(int[],int,int,int[],int,int,int[]) except that it partitions double[] rather than int[] arrays.
    static int
    partition(int[] list, int from, int to, int splitter)
    Partitions (partially sorts) the given list such that all elements falling into the given interval are placed next to each other.
    static void
    partition(int[] list, int from, int to, int[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
    Partitions (partially sorts) the given list such that all elements falling into some intervals are placed next to each other.
    static void
    partition(DoubleArrayList list, int from, int to, DoubleArrayList splitters, IntArrayList splitIndexes)
    Equivalent to partition(list.elements(), from, to, splitters.elements(), 0, splitters.size()-1, splitIndexes.elements()).
    static void
    partition(IntArrayList list, int from, int to, IntArrayList splitters, IntArrayList splitIndexes)
    Equivalent to partition(list.elements(), from, to, splitters.elements(), 0, splitters.size()-1, splitIndexes.elements()).
    static void
    partition(Object[] list, int from, int to, Object[] splitters, int splitFrom, int splitTo, int[] splitIndexes, Comparator comp)
    Same as partition(int[],int,int,int[],int,int,int[]) except that it partitions Object[] rather than int[] arrays.
    static int
    partition(Object[] list, int from, int to, Object splitter, Comparator comp)
    Same as partition(int[],int,int,int) except that it synchronously partitions the objects of the given list by the order of the given comparator.
    static int
    triplePartition(double[] list, double[] secondary, double[] tertiary, int from, int to, double splitter)
    Same as triplePartition(int[],int[],int[],int,int,int) except that it synchronously partitions double[] rather than int[] arrays.
    static void
    triplePartition(double[] list, double[] secondary, double[] tertiary, int from, int to, double[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
    Same as triplePartition(int[],int[],int[],int,int,int[],int,int,int[]) except that it synchronously partitions double[] rather than int[] arrays.
    static int
    triplePartition(int[] list, int[] secondary, int[] tertiary, int from, int to, int splitter)
    Same as partition(int[],int,int,int) except that this method synchronously partitions three arrays at the same time; all three arrays are partially sorted according to the elements of the primary array.
    static void
    triplePartition(int[] list, int[] secondary, int[] tertiary, int from, int to, int[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
    Same as partition(int[],int,int,int[],int,int,int[]) except that this method synchronously partitions three arrays at the same time; all three arrays are partially sorted according to the elements of the primary array.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • SMALL

      private static final int SMALL
      See Also:
    • MEDIUM

      private static final int MEDIUM
      See Also:
    • steps

      protected static int steps
    • swappedElements

      public static int swappedElements
  • Constructor Details

    • Partitioning

      protected Partitioning()
      Makes this class non instantiable, but still let's others inherit from it.
  • Method Details

    • binarySearchFromTo

      private static int binarySearchFromTo(int a, int from, int to, IntComparator comp)
      Finds the given key "a" within some generic data using the binary search algorithm.
      Parameters:
      a - the index of the key to search for.
      from - the leftmost search position, inclusive.
      to - the rightmost search position, inclusive.
      comp - the comparator determining the order of the generic data. Takes as first argument the index a within the generic splitters s. Takes as second argument the index b within the generic data g.
      Returns:
      index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the the point at which the value would be inserted into the list: the index of the first element greater than the key, or list.length, if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
    • dualPartition

      public static void dualPartition(double[] list, double[] secondary, int from, int to, double[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
      Same as dualPartition(int[],int[],int,int,int[],int,int,int[]) except that it synchronously partitions double[] rather than int[] arrays.
    • dualPartition

      public static int dualPartition(double[] list, double[] secondary, int from, int to, double splitter)
      Same as dualPartition(int[],int[],int,int,int) except that it synchronously partitions double[] rather than int[] arrays.
    • dualPartition

      public static void dualPartition(int[] list, int[] secondary, int from, int to, int[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
      Same as partition(int[],int,int,int[],int,int,int[]) except that this method synchronously partitions two arrays at the same time; both arrays are partially sorted according to the elements of the primary array. In other words, each time an element in the primary array is moved from index A to B, the correspoding element within the secondary array is also moved from index A to B.

      Use cases:

      Image having a large list of 2-dimensional points. If memory consumption and performance matter, it is a good idea to physically lay them out as two 1-dimensional arrays (using something like Point2D objects would be prohibitively expensive, both in terms of time and space). Now imagine wanting to histogram the points. We may want to partially sort the points by x-coordinate into intervals. This method efficiently does the job.

      Performance:

      Same as for single-partition methods.

    • dualPartition

      public static int dualPartition(int[] list, int[] secondary, int from, int to, int splitter)
      Same as partition(int[],int,int,int) except that this method synchronously partitions two arrays at the same time; both arrays are partially sorted according to the elements of the primary array. In other words, each time an element in the primary array is moved from index A to B, the correspoding element within the secondary array is also moved from index A to B.

      Performance:

      Same as for single-partition methods.

    • genericPartition

      public static void genericPartition(int from, int to, int splitFrom, int splitTo, int[] splitIndexes, IntComparator comp, IntComparator comp2, IntComparator comp3, Swapper swapper)
      Same as partition(int[],int,int,int[],int,int,int[]) except that it generically partitions arbitrary shaped data (for example matrices or multiple arrays) rather than int[] arrays.

      This method operates on arbitrary shaped data and arbitrary shaped splitters. In fact, it has no idea what kind of data by what kind of splitters it is partitioning. Comparisons and swapping are delegated to user provided objects which know their data and can do the job.

      Lets call the generic data g (it may be a matrix, one array, three linked lists or whatever). Lets call the generic splitters s. This class takes a user comparison function operating on two indexes (a,b), namely an IntComparator. The comparison function determines whether s[a] is equal, less or greater than g[b]. This method can then decide to swap the data g[b] with the data g[c] (yes, c, not a). It calls a user provided Swapper object that knows how to swap the data of these two indexes.

      Again, note the details: Comparisons compare s[a] with g[b]. Swaps swap g[b] with g[c]. Prior to calling this method, the generic splitters s must be sorted ascending and must not contain multiple equal values. These preconditions are not checked; be sure that they are met.

      Parameters:
      from - the index of the first element within g to be considered.
      to - the index of the last element within g to be considered. The method considers the elements g[from] .. g[to].
      splitFrom - the index of the first splitter element to be considered.
      splitTo - the index of the last splitter element to be considered. The method considers the splitter elements s[splitFrom] .. s[splitTo].
      splitIndexes - a list into which this method fills the indexes of elements delimiting intervals. Upon return splitIndexes[splitFrom..splitTo] will be set accordingly. Therefore, must satisfy splitIndexes.length > splitTo.
      comp - the comparator comparing a splitter with an element of the generic data. Takes as first argument the index a within the generic splitters s. Takes as second argument the index b within the generic data g.
      comp2 - the comparator to determine the order of the generic data. Takes as first argument the index a within the generic data g. Takes as second argument the index b within the generic data g.
      comp3 - the comparator comparing a splitter with another splitter. Takes as first argument the index a within the generic splitters s. Takes as second argument the index b within the generic splitters g.
      swapper - an object that knows how to swap the elements at any two indexes (a,b). Takes as first argument the index b within the generic data g. Takes as second argument the index c within the generic data g.

      Tip: Normally you will have splitIndexes.length == s.length as well as from==0, to==g.length-1 and splitFrom==0, splitTo==s.length-1.

      See Also:
    • genericPartition

      private static int genericPartition(int from, int to, int splitter, IntComparator comp, Swapper swapper)
      Same as partition(int[],int,int,int) except that it generically partitions arbitrary shaped data (for example matrices or multiple arrays) rather than int[] arrays.
    • med3

      private static int med3(double[] x, int a, int b, int c)
      Returns the index of the median of the three indexed elements.
    • med3

      private static int med3(int[] x, int a, int b, int c)
      Returns the index of the median of the three indexed elements.
    • med3

      private static int med3(Object[] x, int a, int b, int c, Comparator comp)
      Returns the index of the median of the three indexed chars.
    • med3

      private static int med3(int a, int b, int c, IntComparator comp)
      Returns the index of the median of the three indexed chars.
    • partition

      public static void partition(double[] list, int from, int to, double[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
      Same as partition(int[],int,int,int[],int,int,int[]) except that it partitions double[] rather than int[] arrays.
    • partition

      public static int partition(double[] list, int from, int to, double splitter)
      Same as partition(int[],int,int,int) except that it partitions double[] rather than int[] arrays.
    • partition

      public static void partition(int[] list, int from, int to, int[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
      Partitions (partially sorts) the given list such that all elements falling into some intervals are placed next to each other. Returns the indexes of elements delimiting intervals.

      Example:

      list = (7, 4, 5, 50, 6, 4, 3, 6), splitters = (5, 10, 30) defines the three intervals [-infinity,5), [5,10), [10,30). Lets define to sort the entire list (from=0, to=7) using all splitters (splitFrom==0, splitTo=2).

      The method modifies the list to be list = (4, 4, 3, 6, 7, 5, 6, 50) and returns the splitIndexes = (2, 6, 6). In other words,

      • All values list[0..2] fall into [-infinity,5).
      • All values list[3..6] fall into [5,10).
      • All values list[7..6] fall into [10,30), i.e. no elements, since 7>6.
      • All values list[7 .. 7=list.length-1] fall into [30,infinity].
      • In general, all values list[splitIndexes[j-1]+1 .. splitIndexes[j]] fall into interval j.
      As can be seen, the list is partially sorted such that values falling into a certain interval are placed next to each other. Note that within an interval, elements are entirelly unsorted. They are only sorted across interval boundaries. In particular, this partitioning algorithm is not stable: the relative order of elements is not preserved (Producing a stable algorithm would require no more than minor modifications to method partition(int[],int,int,int)).

      More formally, this method guarantees that upon return for all j = splitFrom .. splitTo there holds:
      for all i = splitIndexes[j-1]+1 .. splitIndexes[j]: splitters[j-1] invalid input: '<'= list[i] invalid input: '<' splitters[j].

      Performance:

      Let N=to-from+1 be the number of elements to be partitioned. Let k=splitTo-splitFrom+1 be the number of splitter elements. Then we have the following time complexities

      • Worst case: O( N * log(k) ).
      • Average case: O( N * log(k) ).
      • Best case: O( N ). In general, the more uniform (skewed) the data is spread across intervals, the more performance approaches the worst (best) case. If no elements fall into the given intervals, running time is linear.
      No temporary memory is allocated; the sort is in-place.

      Implementation:

      The algorithm can be seen as a Bentley/McIlroy quicksort where swapping and insertion sort are omitted. It is designed to detect and take advantage of skew while maintaining good performance in the uniform case.

      Parameters:
      list - the list to be partially sorted.
      from - the index of the first element within list to be considered.
      to - the index of the last element within list to be considered. The method considers the elements list[from] .. list[to].
      splitters - the values at which the list shall be split into intervals. Must be sorted ascending and must not contain multiple identical values. These preconditions are not checked; be sure that they are met.
      splitFrom - the index of the first splitter element to be considered.
      splitTo - the index of the last splitter element to be considered. The method considers the splitter elements splitters[splitFrom] .. splitters[splitTo].
      splitIndexes - a list into which this method fills the indexes of elements delimiting intervals. Upon return splitIndexes[splitFrom..splitTo] will be set accordingly. Therefore, must satisfy splitIndexes.length > splitTo.

      Tip: Normally you will have splitIndexes.length == splitters.length as well as from==0, to==list.length-1 and splitFrom==0, splitTo==splitters.length-1.

      See Also:
    • partition

      public static int partition(int[] list, int from, int to, int splitter)
      Partitions (partially sorts) the given list such that all elements falling into the given interval are placed next to each other. Returns the index of the element delimiting the interval.

      Example:

      list = (7, 4, 5, 50, 6, 4, 3, 6), splitter = 5 defines the two intervals [-infinity,5), [5,+infinity].

      The method modifies the list to be list = (4, 4, 3, 50, 6, 7, 5, 6) and returns the split index 2. In other words,

      • All values list[0..2] fall into [-infinity,5).
      • All values list[3=2+1 .. 7=list.length-1] fall into [5,+infinity].
      As can be seen, the list is partially sorted such that values falling into a certain interval are placed next to each other. Note that within an interval, elements are entirelly unsorted. They are only sorted across interval boundaries. In particular, this partitioning algorithm is not stable.

      More formally, this method guarantees that upon return there holds:

      • for all i = from .. returnValue: list[i] invalid input: '<' splitter and
      • for all i = returnValue+1 .. list.length-1: !(list[i] invalid input: '<' splitter).

      Performance:

      Let N=to-from+1 be the number of elements to be partially sorted. Then the time complexity is O( N ). No temporary memory is allocated; the sort is in-place.

      Parameters:
      list - the list to be partially sorted.
      from - the index of the first element within list to be considered.
      to - the index of the last element within list to be considered. The method considers the elements list[from] .. list[to].
      splitter - the value at which the list shall be split.
      Returns:
      the index of the largest element falling into the interval [-infinity,splitter), as seen after partitioning.
    • partition

      public static void partition(Object[] list, int from, int to, Object[] splitters, int splitFrom, int splitTo, int[] splitIndexes, Comparator comp)
      Same as partition(int[],int,int,int[],int,int,int[]) except that it partitions Object[] rather than int[] arrays.
    • partition

      public static int partition(Object[] list, int from, int to, Object splitter, Comparator comp)
      Same as partition(int[],int,int,int) except that it synchronously partitions the objects of the given list by the order of the given comparator.
    • partition

      public static void partition(DoubleArrayList list, int from, int to, DoubleArrayList splitters, IntArrayList splitIndexes)
      Equivalent to partition(list.elements(), from, to, splitters.elements(), 0, splitters.size()-1, splitIndexes.elements()).
    • partition

      public static void partition(IntArrayList list, int from, int to, IntArrayList splitters, IntArrayList splitIndexes)
      Equivalent to partition(list.elements(), from, to, splitters.elements(), 0, splitters.size()-1, splitIndexes.elements()).
    • triplePartition

      public static void triplePartition(double[] list, double[] secondary, double[] tertiary, int from, int to, double[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
      Same as triplePartition(int[],int[],int[],int,int,int[],int,int,int[]) except that it synchronously partitions double[] rather than int[] arrays.
    • triplePartition

      public static int triplePartition(double[] list, double[] secondary, double[] tertiary, int from, int to, double splitter)
      Same as triplePartition(int[],int[],int[],int,int,int) except that it synchronously partitions double[] rather than int[] arrays.
    • triplePartition

      public static void triplePartition(int[] list, int[] secondary, int[] tertiary, int from, int to, int[] splitters, int splitFrom, int splitTo, int[] splitIndexes)
      Same as partition(int[],int,int,int[],int,int,int[]) except that this method synchronously partitions three arrays at the same time; all three arrays are partially sorted according to the elements of the primary array. In other words, each time an element in the primary array is moved from index A to B, the correspoding element within the secondary array as well as the corresponding element within the tertiary array are also moved from index A to B.

      Use cases:

      Image having a large list of 3-dimensional points. If memory consumption and performance matter, it is a good idea to physically lay them out as three 1-dimensional arrays (using something like Point3D objects would be prohibitively expensive, both in terms of time and space). Now imagine wanting to histogram the points. We may want to partially sort the points by x-coordinate into intervals. This method efficiently does the job.

      Performance:

      Same as for single-partition methods.

    • triplePartition

      public static int triplePartition(int[] list, int[] secondary, int[] tertiary, int from, int to, int splitter)
      Same as partition(int[],int,int,int) except that this method synchronously partitions three arrays at the same time; all three arrays are partially sorted according to the elements of the primary array. In other words, each time an element in the primary array is moved from index A to B, the correspoding element within the secondary array as well as the corresponding element within the tertiary array are also moved from index A to B.

      Performance:

      Same as for single-partition methods.