Class CsvExternalSort


  • public class CsvExternalSort
    extends java.lang.Object
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int DEFAULTMAXTEMPFILES
      Default maximum number of temporary files allowed.
      private static java.util.logging.Logger LOG  
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private CsvExternalSort()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      private static boolean checkDuplicateLine(org.apache.commons.csv.CSVRecord currentLine, org.apache.commons.csv.CSVRecord lastLine)  
      static long estimateAvailableMemory()
      Calls the garbage collector and then returns the free memory.
      static long estimateBestSizeOfBlocks(long sizeoffile, int maxtmpfiles, long maxMemory)
      Estimates the best block size for dividing the file into small blocks.
      static int mergeSortedFiles(java.io.BufferedWriter fbw, CsvSortOptions sortOptions, java.util.List<CSVRecordBuffer> bfbs, java.util.List<org.apache.commons.csv.CSVRecord> header)  
      static int mergeSortedFiles(java.util.List<java.io.File> files, java.io.File outputfile, CsvSortOptions sortOptions, boolean append, java.util.List<org.apache.commons.csv.CSVRecord> header)  
      static java.io.File sortAndSave(java.util.List<org.apache.commons.csv.CSVRecord> tmplist, java.io.File tmpdirectory, CsvSortOptions sortOptions)  
      static java.util.List<java.io.File> sortInBatch(long size_in_byte, java.io.BufferedReader fbr, java.io.File tmpdirectory, CsvSortOptions sortOptions, java.util.List<org.apache.commons.csv.CSVRecord> header)  
      static java.util.List<java.io.File> sortInBatch(java.io.File file, java.io.File tmpdirectory, CsvSortOptions sortOptions, java.util.List<org.apache.commons.csv.CSVRecord> header)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final java.util.logging.Logger LOG
      • DEFAULTMAXTEMPFILES

        public static final int DEFAULTMAXTEMPFILES
        Default maximum number of temporary files allowed.
        See Also:
        Constant Field Values
    • Constructor Detail

      • CsvExternalSort

        private CsvExternalSort()
    • Method Detail

      • estimateAvailableMemory

        public static long estimateAvailableMemory()
        Calls the garbage collector and then returns an estimate of the free memory. Calling the collector first avoids reporting little or no available memory in applications where the GC has not yet reclaimed unused objects.
        Returns:
        the estimated available memory, in bytes
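
        The idea described above can be sketched with the standard java.lang.Runtime API. This is a minimal illustration, not necessarily how this class implements the method: the class name AvailableMemorySketch is hypothetical, and the formula (maximum heap minus currently used heap) is an assumption about what "free memory" means here. Note that System.gc() is only a request; the JVM may ignore it.

```java
public class AvailableMemorySketch {

    /**
     * Hypothetical stand-in for estimateAvailableMemory(); the real
     * implementation may differ.
     */
    public static long estimateAvailableMemory() {
        // Ask the JVM to collect garbage first. This is only a hint.
        System.gc();
        Runtime runtime = Runtime.getRuntime();
        // Heap currently in use = heap reserved so far minus its free part.
        long usedMemory = runtime.totalMemory() - runtime.freeMemory();
        // Assumption: "available" means what is left before the heap limit.
        return runtime.maxMemory() - usedMemory;
    }

    public static void main(String[] args) {
        System.out.println("Estimated available bytes: " + estimateAvailableMemory());
    }
}
```

        The value depends on the JVM heap settings (for example -Xmx), so it should be treated as a rough budget rather than an exact figure.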
      • estimateBestSizeOfBlocks

        public static long estimateBestSizeOfBlocks(long sizeoffile,
                                                    int maxtmpfiles,
                                                    long maxMemory)
        Estimates the best block size for dividing the file into small blocks. If the blocks are too small, too many temporary files are created. If they are too big, too much memory is used.
        Parameters:
        sizeoffile - the expected amount of data, in bytes
        maxtmpfiles - the maximum number of temporary files that may be created (e.g., 1024)
        maxMemory - the maximum memory to use, in bytes
        Returns:
        the estimated block size
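
        The trade-off between the number of temporary files and memory use can be illustrated with a small calculation. The sketch below is an assumption, not a statement of how this method is implemented: it takes the smallest block size that keeps the file within maxtmpfiles blocks (ceiling division) and, as an assumed heuristic, raises it to half of the memory budget so that blocks are not needlessly small. The class and method names are hypothetical.

```java
public class BlockSizeSketch {

    /**
     * Hypothetical block-size heuristic; the real
     * estimateBestSizeOfBlocks may use a different rule.
     */
    public static long estimateBlockSize(long sizeOfFile, int maxTmpFiles, long maxMemory) {
        // Smallest block size that needs at most maxTmpFiles blocks
        // (integer ceiling of sizeOfFile / maxTmpFiles).
        long blockSize = sizeOfFile / maxTmpFiles
                + (sizeOfFile % maxTmpFiles == 0 ? 0 : 1);
        // Assumed heuristic: if memory allows, use larger blocks
        // (half of the budget) to create fewer temporary files.
        if (blockSize < maxMemory / 2) {
            blockSize = maxMemory / 2;
        }
        return blockSize;
    }

    public static void main(String[] args) {
        // Small file, generous memory: the memory-based floor wins.
        System.out.println(estimateBlockSize(10_000_000L, 1024, 1_000_000L));     // prints 500000
        // Large file: the file-count limit determines the block size.
        System.out.println(estimateBlockSize(10_000_000_000L, 1024, 1_000_000L)); // prints 9765625
    }
}
```

        In the second call the block size exceeds half of the memory budget, which shows why the result is only an estimate: a caller may still need to lower maxtmpfiles expectations or provide more memory.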
      • mergeSortedFiles

        public static int mergeSortedFiles(java.io.BufferedWriter fbw,
                                           CsvSortOptions sortOptions,
                                           java.util.List<CSVRecordBuffer> bfbs,
                                           java.util.List<org.apache.commons.csv.CSVRecord> header)
                                    throws java.io.IOException,
                                           java.lang.ClassNotFoundException
        Throws:
        java.io.IOException
        java.lang.ClassNotFoundException
      • mergeSortedFiles

        public static int mergeSortedFiles(java.util.List<java.io.File> files,
                                           java.io.File outputfile,
                                           CsvSortOptions sortOptions,
                                           boolean append,
                                           java.util.List<org.apache.commons.csv.CSVRecord> header)
                                    throws java.io.IOException,
                                           java.lang.ClassNotFoundException
        Throws:
        java.io.IOException
        java.lang.ClassNotFoundException
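
        This page does not describe how the sorted temporary files are merged. A common technique for this step of an external sort is a k-way merge driven by a priority queue, sketched below on in-memory lists of strings instead of files and CSVRecord objects. Everything in the sketch (the class KWayMergeSketch, the Run helper, the merge method) is hypothetical and only illustrates the general technique; it is not taken from this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMergeSketch {

    /** One sorted run: its current smallest element plus the remaining ones. */
    private static final class Run {
        String head;
        final Iterator<String> rest;

        Run(Iterator<String> it) {
            this.rest = it;
            this.head = it.next(); // caller guarantees the run is non-empty
        }

        /** Moves to the next element; returns false when the run is exhausted. */
        boolean advance() {
            if (rest.hasNext()) {
                head = rest.next();
                return true;
            }
            return false;
        }
    }

    /** Merges already-sorted runs into one sorted list. */
    public static List<String> merge(List<List<String>> runs, Comparator<String> cmp) {
        // The queue always exposes the run whose head is smallest.
        PriorityQueue<Run> queue =
                new PriorityQueue<>((a, b) -> cmp.compare(a.head, b.head));
        for (List<String> run : runs) {
            if (!run.isEmpty()) {
                queue.add(new Run(run.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Run smallest = queue.poll();
            merged.add(smallest.head);
            // Re-insert the run if it still has elements.
            if (smallest.advance()) {
                queue.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
                Arrays.asList("a", "d", "g"),
                Arrays.asList("b", "e"),
                Arrays.asList("c", "f"));
        System.out.println(merge(runs, Comparator.naturalOrder())); // prints [a, b, c, d, e, f, g]
    }
}
```

        Only one element per run is held in memory at a time, which is what makes this approach suitable when the runs are files too large to load together.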
      • sortInBatch

        public static java.util.List<java.io.File> sortInBatch(long size_in_byte,
                                                               java.io.BufferedReader fbr,
                                                               java.io.File tmpdirectory,
                                                               CsvSortOptions sortOptions,
                                                               java.util.List<org.apache.commons.csv.CSVRecord> header)
                                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • sortAndSave

        public static java.io.File sortAndSave(java.util.List<org.apache.commons.csv.CSVRecord> tmplist,
                                               java.io.File tmpdirectory,
                                               CsvSortOptions sortOptions)
                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • checkDuplicateLine

        private static boolean checkDuplicateLine(org.apache.commons.csv.CSVRecord currentLine,
                                                  org.apache.commons.csv.CSVRecord lastLine)
      • sortInBatch

        public static java.util.List<java.io.File> sortInBatch(java.io.File file,
                                                               java.io.File tmpdirectory,
                                                               CsvSortOptions sortOptions,
                                                               java.util.List<org.apache.commons.csv.CSVRecord> header)
                                                        throws java.io.IOException
        Throws:
        java.io.IOException