Class ExternalSortFactory

java.lang.Object
org.apache.derby.impl.store.access.sort.ExternalSortFactory
All Implemented Interfaces:
ModuleControl, ModuleSupportable, MethodFactory, SortFactory, SortCostController
Direct Known Subclasses:
UniqueWithDuplicateNullsExternalSortFactory

public class ExternalSortFactory extends Object implements SortFactory, ModuleControl, ModuleSupportable, SortCostController
  • Field Details

    • userSpecified

      private boolean userSpecified
    • defaultSortBufferMax

      private int defaultSortBufferMax
    • sortBufferMax

      private int sortBufferMax
    • IMPLEMENTATIONID

      private static final String IMPLEMENTATIONID
    • FORMATUUIDSTRING

      private static final String FORMATUUIDSTRING
    • formatUUID

      private UUID formatUUID
    • DEFAULT_SORTBUFFERMAX

      private static final int DEFAULT_SORTBUFFERMAX
    • MINIMUM_SORTBUFFERMAX

      private static final int MINIMUM_SORTBUFFERMAX
    • DEFAULT_MEM_USE

      protected static final int DEFAULT_MEM_USE
    • DEFAULT_MAX_MERGE_RUN

      protected static final int DEFAULT_MAX_MERGE_RUN
    • SORT_ROW_OVERHEAD

      private static final int SORT_ROW_OVERHEAD
  • Constructor Details

    • ExternalSortFactory

      public ExternalSortFactory()
  • Method Details

    • defaultProperties

      public Properties defaultProperties()
      There are no default properties for the external sort.
      Specified by:
      defaultProperties in interface MethodFactory
    • supportsImplementation

      public boolean supportsImplementation(String implementationId)
      Description copied from interface: MethodFactory
      Return whether this access method implements the implementation type given in the argument string.
      Specified by:
      supportsImplementation in interface MethodFactory
    • primaryImplementationType

      public String primaryImplementationType()
      Description copied from interface: MethodFactory
      Return the primary implementation type for this access method. Although an access method may implement more than one implementation type, this is the expected one. The access manager will put the primary implementation type in a hash table for fast access.
      Specified by:
      primaryImplementationType in interface MethodFactory
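The note that the access manager keeps each method's primary implementation type in a hash table suggests a lookup with a fast path and a fallback. The following is a hypothetical sketch: AccessMethod stands in for MethodFactory, MethodRegistry for the access manager's table, and the "sort-a"/"sort-b" identifiers are placeholders, not Derby's real IMPLEMENTATIONID value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the two MethodFactory methods described above.
interface AccessMethod {
    boolean supportsImplementation(String implementationId);
    String primaryImplementationType();
}

// Hypothetical access manager registry: each method is indexed by its
// primary implementation type for O(1) lookup; secondary types fall back
// to asking every registered method.
class MethodRegistry {
    private final Map<String, AccessMethod> byPrimaryType = new HashMap<>();
    private final List<AccessMethod> registered = new ArrayList<>();

    void register(AccessMethod method) {
        byPrimaryType.put(method.primaryImplementationType(), method);
        registered.add(method);
    }

    // Returns the matching method, or null if none supports the type.
    AccessMethod find(String implementationId) {
        AccessMethod hit = byPrimaryType.get(implementationId);
        if (hit != null) {
            return hit; // fast path: primary type
        }
        for (AccessMethod candidate : registered) {
            if (candidate.supportsImplementation(implementationId)) {
                return candidate; // slow path: secondary type
            }
        }
        return null;
    }
}

// Example method whose primary type is "sort-a" and which also accepts "sort-b".
class ExampleSortMethod implements AccessMethod {
    public boolean supportsImplementation(String implementationId) {
        return "sort-a".equals(implementationId) || "sort-b".equals(implementationId);
    }

    public String primaryImplementationType() {
        return "sort-a";
    }
}
```

The hash table only helps for the primary type; any other type a method accepts still needs the slower scan, which is why a method declares one type as its expected one.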
    • supportsFormat

      public boolean supportsFormat(UUID formatid)
      Description copied from interface: MethodFactory
      Return whether this access method supports the format supplied in the argument.
      Specified by:
      supportsFormat in interface MethodFactory
    • primaryFormat

      public UUID primaryFormat()
      Description copied from interface: MethodFactory
      Return the primary format that this access method supports. Although an access method may support more than one format, this is the usual one. The access manager will put the primary format in a hash table for fast access to the appropriate method.
      Specified by:
      primaryFormat in interface MethodFactory
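The FORMATUUIDSTRING and formatUUID fields suggest that the format id is parsed once from a constant string and then compared in supportsFormat(). A sketch under that assumption follows, using java.util.UUID in place of Derby's own UUID type and a placeholder UUID string rather than the real format id; the class name is hypothetical.

```java
import java.util.UUID;

// Hypothetical factory mirroring the FORMATUUIDSTRING / formatUUID fields.
class FormatAwareFactory {
    // Placeholder value; NOT the real Derby format id.
    private static final String FORMAT_UUID_STRING =
        "00000000-0000-0000-0000-000000000001";

    // Parsed once when the factory is created.
    private final UUID formatUUID = UUID.fromString(FORMAT_UUID_STRING);

    // True only for the single format this factory understands.
    boolean supportsFormat(UUID formatId) {
        return formatUUID.equals(formatId);
    }

    UUID primaryFormat() {
        return formatUUID;
    }
}
```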
    • getMergeSort

      protected MergeSort getMergeSort()
      Returns the merge sort implementation. Extending classes can override this method to customize sorting.
      Returns:
      MergeSort implementation
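Given the note that extending classes can override getMergeSort() to customize sorting, the pattern is presumably a factory method that createSort() relies on. The sketch below shows that pattern with hypothetical stand-ins (Sorter for MergeSort, BaseSortFactory for ExternalSortFactory). The duplicate-dropping subclass is only an example customization, not what UniqueWithDuplicateNullsExternalSortFactory actually does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for MergeSort: returns a sorted copy of the rows.
class Sorter {
    List<Integer> sort(List<Integer> rows) {
        List<Integer> out = new ArrayList<>(rows);
        out.sort(Comparator.naturalOrder());
        return out;
    }
}

// Example customization: sort, then drop adjacent duplicates.
class DedupSorter extends Sorter {
    @Override
    List<Integer> sort(List<Integer> rows) {
        List<Integer> out = new ArrayList<>();
        for (Integer row : super.sort(rows)) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(row)) {
                out.add(row);
            }
        }
        return out;
    }
}

class BaseSortFactory {
    // Factory method: subclasses override this to supply a different sorter.
    protected Sorter getSorter() {
        return new Sorter();
    }

    // Analogous to createSort(): shared setup stays here, the sorter varies.
    Sorter createSort() {
        return getSorter();
    }
}

class DedupSortFactory extends BaseSortFactory {
    @Override
    protected Sorter getSorter() {
        return new DedupSorter();
    }
}
```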
    • createSort

      public Sort createSort(TransactionController tran, int segment, Properties implParameters, DataValueDescriptor[] template, ColumnOrdering[] columnOrdering, SortObserver sortObserver, boolean alreadyInOrder, long estimatedRows, int estimatedRowSize) throws StandardException
      Create a sort. This method could choose among different sort options, depending on the properties etc., but currently it always returns a merge sort.
      Specified by:
      createSort in interface SortFactory
      Throws:
      StandardException - if the sort could not be opened for some reason, or if an error occurred in one of the lower level modules.
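createSort() currently always returns a merge sort. As a rough illustration of the external merge sort technique the class is named for, the sketch below sorts in two phases. First it cuts the input into sorted runs of at most bufferMax rows, loosely playing the role of sortBufferMax. Then it merges all runs with a min-heap. This is a simplified, hypothetical sketch, not Derby's MergeSort: runs stay in memory instead of being spilled to disk, rows are plain integers rather than DataValueDescriptor arrays, and all runs are merged in a single pass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class ExternalMergeSortSketch {
    // Phase 1: cut the input into sorted runs of at most bufferMax rows.
    // A real external sort would write each run to a temporary file.
    static List<List<Integer>> makeRuns(List<Integer> input, int bufferMax) {
        List<List<Integer>> runs = new ArrayList<>();
        for (int i = 0; i < input.size(); i += bufferMax) {
            int end = Math.min(i + bufferMax, input.size());
            List<Integer> run = new ArrayList<>(input.subList(i, end));
            Collections.sort(run);
            runs.add(run);
        }
        return runs;
    }

    // Phase 2: k-way merge; each heap entry is {value, runIndex, offsetInRun}.
    static List<Integer> merge(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int run = top[1];
            int next = top[2] + 1;
            if (next < runs.get(run).size()) {
                heap.add(new int[] {runs.get(run).get(next), run, next});
            }
        }
        return out;
    }

    static List<Integer> sort(List<Integer> input, int bufferMax) {
        if (bufferMax < 1) {
            throw new IllegalArgumentException("bufferMax must be >= 1");
        }
        return merge(makeRuns(input, bufferMax));
    }
}
```

A real external sort would presumably also cap how many runs are merged at once (the DEFAULT_MAX_MERGE_RUN field hints at such a limit) and merge in several passes when there are more runs than that.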
    • openSortCostController

      public SortCostController openSortCostController() throws StandardException
      Return an open SortCostController.

      Return an open SortCostController which can be used to ask about the estimated costs of SortController() operations.

      Specified by:
      openSortCostController in interface SortFactory
      Returns:
      The open SortCostController.
      Throws:
      StandardException - Standard exception policy.
    • close

      public void close()
      Description copied from interface: SortCostController
      Close the controller.

      Close the open controller. This method always succeeds, and never throws any exceptions. Callers must not use the SortCostController after closing it; they are strongly advised to clear out the SortCostController reference after closing.

      Specified by:
      close in interface SortCostController
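The close() contract above (always succeeds, never reuse after closing, clear the reference) suggests a try/finally usage pattern. CostController and CostingClient are hypothetical stand-ins for SortCostController and its caller, with a single-argument getSortCost() for brevity; only the shape of the open, use, close sequence is the point.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for SortCostController.
interface CostController {
    double getSortCost(long estimatedInputRows);

    // Per the contract above: always succeeds and never throws.
    void close();
}

// Hypothetical caller that holds a controller only for the duration of one estimate.
class CostingClient {
    private CostController controller;

    double estimate(Supplier<CostController> opener, long estimatedInputRows) {
        controller = opener.get();
        try {
            return controller.getSortCost(estimatedInputRows);
        } finally {
            controller.close();
            controller = null; // clear the reference so the closed controller cannot be reused
        }
    }

    boolean holdsController() {
        return controller != null;
    }
}
```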
    • getSortCost

      public double getSortCost(DataValueDescriptor[] template, ColumnOrdering[] columnOrdering, boolean alreadyInOrder, long estimatedInputRows, long estimatedExportRows, int estimatedRowSize) throws StandardException
      Estimate the cost of a sort.

      The sort algorithm is an N * log(N) algorithm. The following numbers were measured on a 400 MHz Pentium II machine, JDK 1.1.7 with JIT, insane.zip. The test is a simple "select * from table order by first_int_column"; I then subtracted the time it takes to do "select * from table" from the result.

      number of rows   elapsed time in seconds
      --------------   -----------------------
      1000             0.20
      10000            10.5
      100000           80.0

      We assume that the formula for sort performance is of the form: performance = K * N * log(N). Solving the equation for the 1000 and 100000 cases we come up with: performance = 1 + 0.08 N ln(N).

      NOTE: Apparently, these measurements were done on a faster machine than was used for other performance measurements used by the optimizer. Experiments show that the 0.08 multiplier is off by a factor of 4 with respect to other measurements (such as the time it takes to scan a conglomerate). I am correcting the formula to use 0.32 rather than 0.08. - Jeff

      RESOLVE (mikem) - this formula is very crude at the moment and will be refined later. Known problems:
      1) Internal vs. external sort: we know that the performance of sort is discontinuous when we go from an internal to an external sort. A better model is probably a different set of constants for internal vs. external sort and some way to guess when this is going to happen.
      2) The current row size is never considered, but it is critical to performance.
      3) estimatedExportRows is not used. This is a critical number for knowing whether an internal or an external sort will happen.
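The following is a minimal sketch of the corrected cost formula, cost = 1 + 0.32 * N * ln(N), where N is estimatedInputRows. As the known problems above note, it ignores row size and estimatedExportRows. The class name is hypothetical. The early return of 0 for N = 0, which avoids taking ln(0), is an assumption of this sketch rather than something stated in the comment.

```java
class SortCostSketch {
    // 1 + 0.32 * N * ln(N), with the 0.32 multiplier from the corrected comment.
    static double sortCost(long estimatedInputRows) {
        if (estimatedInputRows < 0) {
            throw new IllegalArgumentException("estimatedInputRows must be >= 0");
        }
        // Assumption of this sketch: an empty sort costs nothing (avoids ln(0) = -Infinity).
        if (estimatedInputRows == 0) {
            return 0.0;
        }
        double n = estimatedInputRows;
        return 1 + 0.32 * n * Math.log(n);
    }
}
```

With N = 1 the logarithm is 0, so the cost is just the constant 1; from there the cost grows slightly faster than linearly in N.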

      Specified by:
      getSortCost in interface SortCostController
      Parameters:
      template - A row which is prototypical for the sort. All rows inserted into the sort controller must have exactly the same number of columns as the template row. Every column in an inserted row must have the same type as the corresponding column in the template.
      columnOrdering - An array which specifies which columns participate in ordering - see interface ColumnOrdering for details. The column referenced in the 0th columnOrdering object is compared first, then the 1st, etc.
      alreadyInOrder - Indicates that the rows inserted into the sort controller will already be in order. This is used to perform aggregation only.
      estimatedInputRows - The number of rows that the caller estimates will be inserted into the sort. This number must be >= 0.
      estimatedExportRows - The number of rows that the caller estimates will be exported by the sorter. For instance, if the sort is doing duplicate elimination and all rows are expected to be duplicates, then estimatedExportRows would be 1. If no duplicate elimination is to be done, then estimatedExportRows would be the same as estimatedInputRows. This number must be >= 0.
      estimatedRowSize - The estimated average row size of the rows being sorted. This is the client portion of the row size; it should not attempt to calculate Store's overhead. -1 indicates that the caller has no idea (and the sorter will use 100 bytes in that case). Used by the sort to make good choices about in-memory vs. external sorting, and to size merge runs. The client is not expected to estimate the per-column/per-row overhead of raw store, just to make a guess about the storage associated with each row (i.e. reasonable estimates for some implementations would be 4 for int, 8 for long, 102 for char(100), 202 for varchar(200), a number out of a hat for user types, ...).
      Returns:
      The estimated cost of the sort.
      Throws:
      StandardException - Standard exception policy.
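The estimatedRowSize description can be illustrated with a small sketch of how a caller might build its guess. The per-column figures follow the examples given above (4 for int, 8 for long, 102 for char(100), 202 for varchar(200)). Generalizing the character types to "declared length + 2" is an assumption read off those two examples, not a documented rule. The class and method names are hypothetical, and the -1 fallback to 100 bytes mirrors the parameter text.

```java
import java.util.List;

class RowSizeEstimate {
    static final int UNKNOWN = -1;
    static final int DEFAULT_ROW_SIZE = 100; // the sorter's fallback when the caller passes -1

    // Hypothetical per-column guesses based on the examples in the parameter text.
    static int intCol() { return 4; }
    static int longCol() { return 8; }
    static int charCol(int length) { return length + 2; }       // assumption: length + 2
    static int varcharCol(int maxLength) { return maxLength + 2; } // assumption: length + 2

    // Client portion of the row size only: no raw store overhead is added.
    static int rowSize(List<Integer> columnSizes) {
        int total = 0;
        for (int size : columnSizes) {
            total += size;
        }
        return total;
    }

    static int effectiveRowSize(int estimatedRowSize) {
        return estimatedRowSize == UNKNOWN ? DEFAULT_ROW_SIZE : estimatedRowSize;
    }
}
```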
    • canSupport

      public boolean canSupport(Properties startParams)
      Description copied from interface: ModuleSupportable
      See if this implementation can support any attributes that are listed in properties. This call may be made on a newly created instance before the boot() method has been called, or after the boot method has been called for a running module.

      The module can check for attributes in the properties to see if it can fulfill the required behaviour. E.g. the raw store may define an attribute called RawStore.Recoverable. If a temporary raw store is required, the property RawStore.recoverable=false would be added to the properties before calling bootServiceModule. If a raw store cannot support this attribute, its canSupport method would return false. Also see the Monitor class's prologue to see how the identifier is used in looking up properties.
      A better approach might be to use properties of the form RawStore.Attributes.mandatory=recoverable,smallfootprint and RawStore.Attributes.requested=oltp,fast.

      Specified by:
      canSupport in interface ModuleSupportable
      Returns:
      true if this instance can be used, false otherwise.
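The canSupport() description, together with the suggested RawStore.Attributes.mandatory form, can be sketched as a module that returns false when any mandatory attribute is one it does not provide. The property name, attribute names, and class are hypothetical illustrations of that suggestion; this is not how ExternalSortFactory itself decides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

class AttributeCheckingModule {
    // Hypothetical property name, modelled on the suggestion in the text.
    static final String MANDATORY = "RawStore.Attributes.mandatory";

    private final Set<String> supported;

    AttributeCheckingModule(String... supportedAttributes) {
        supported = new HashSet<>(Arrays.asList(supportedAttributes));
    }

    // False if any mandatory attribute is one this module cannot provide.
    boolean canSupport(Properties startParams) {
        if (startParams == null) {
            return true; // no requirements at all
        }
        String list = startParams.getProperty(MANDATORY);
        if (list == null || list.trim().isEmpty()) {
            return true;
        }
        for (String attribute : list.split(",")) {
            if (!supported.contains(attribute.trim())) {
                return false;
            }
        }
        return true;
    }
}
```

A "requested" list would be handled more leniently: the module could still return true and simply prefer the requested behaviour when it can provide it.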
    • boot

      public void boot(boolean create, Properties startParams) throws StandardException
      Description copied from interface: ModuleControl
      Boot this module with the given properties. Creates a module instance that can be found using the findModule() methods of Monitor. The module can only be found using one of these findModule() methods once this method has returned.

      An implementation's boot method can throw StandardException. If it is thrown, the module is not registered by the monitor and therefore cannot be found through a findModule(). In this case the module's stop() method is not called, so an implementation that throws this exception must first free up any resources it has acquired.

      When create is true the contents of the properties object will be written to the service.properties of the persistent service. Thus any code that requires an entry in service.properties must explicitly place the value in this properties set using the put method.
      Typically the properties object contains one or more default properties sets, which are not written out to service.properties. These default sets are how callers modify the create process. When a database is created through a JDBC connection, the first set of defaults is a properties object that contains the attributes that were set on the jdbc:derby: URL. This attributes set in turn has the second default properties set as its default. This second set (which could be null) contains the properties that the user set on their DriverManager.getConnection() call; they are not owned by Derby code, and thus must not be modified by Derby code.

      When create is false the properties object contains all the properties set in the service.properties file plus a limited number of attributes from the JDBC URL attributes or connection properties set. This avoids properties set by the user compromising the boot process. An example of a property passed in from the JDBC world is the bootPassword for encrypted databases.

      Code should not hold onto the passed-in properties reference after boot time, as its contents may change underneath it. At the latest, once the complete boot has finished, the links to all the default sets will be removed.

      Specified by:
      boot in interface ModuleControl
      Throws:
      StandardException - Module cannot be started.
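The default-set chaining described for boot() maps onto java.util.Properties: getProperty() searches the defaults passed to the constructor, while store() writes only the top-level entries and skips the defaults. A sketch under that reading follows. The keys and values are illustrative, the class is hypothetical, and persist() merely stands in for writing service.properties.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

class BootPropertiesSketch {
    // Builds the chain described above and returns the top-level set handed to boot().
    static Properties chain() {
        // Second default set: what the user passed to DriverManager.getConnection().
        // Derby code must treat it as read-only.
        Properties connectionProps = new Properties();
        connectionProps.setProperty("user", "app");

        // First default set: attributes from the jdbc:derby: URL, falling back to connectionProps.
        Properties urlAttributes = new Properties(connectionProps);
        urlAttributes.setProperty("create", "true");

        // Top-level set: anything put here is what gets persisted.
        return new Properties(urlAttributes);
    }

    // Stand-in for writing service.properties: store() does not write the default sets.
    static String persist(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

This is why code that needs an entry in service.properties must put it into the top-level set explicitly: a value that is visible only through a default set is readable but never persisted.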
    • stop

      public void stop()
      Description copied from interface: ModuleControl
      Stop the module. The module may be found via a findModule() method until some time after this method returns. Therefore the factory must be prepared to reject requests to it once it has been stopped. In addition, other modules may cache a reference to the module and make requests of it after it has been stopped; these requests should be rejected as well.
      Specified by:
      stop in interface ModuleControl
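Because a stopped module may still be reachable through findModule() or a cached reference, each entry point needs to check whether stop() has run. What follows is a minimal hypothetical sketch of that guard. The class, the request method, and the choice of an AtomicBoolean flag with IllegalStateException are all assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class StoppableFactory {
    // Visible to all threads, so callers holding a cached reference see the change.
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    void stop() {
        stopped.set(true);
    }

    // Every entry point checks the flag before doing any work.
    String handleRequest(String request) {
        if (stopped.get()) {
            throw new IllegalStateException("module has been stopped");
        }
        return "handled " + request;
    }
}
```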
    • getMonitor

      private static ModuleFactory getMonitor()
      Privileged Monitor lookup. Must be private so that user code can't call this entry point.