Class CsvExternalSort

java.lang.Object
com.google.code.externalsorting.csv.CsvExternalSort

public class CsvExternalSort extends Object
  • Field Details

    • LOG

      private static final Logger LOG
    • DEFAULTMAXTEMPFILES

      public static final int DEFAULTMAXTEMPFILES
      Default maximum number of temporary files allowed.
  • Constructor Details

    • CsvExternalSort

      private CsvExternalSort()
  • Method Details

    • estimateAvailableMemory

      public static long estimateAvailableMemory()
      Requests a garbage collection and then returns the free memory. This avoids under-reporting in applications where the GC has not yet reclaimed memory and the JVM would otherwise report none available.
      Returns:
      the estimated available memory, in bytes
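      The behaviour described above can be approximated with the standard java.lang.Runtime API. This is a minimal sketch, not the library's confirmed implementation: the class name AvailableMemorySketch is hypothetical, and reading "free memory" as the heap ceiling minus the heap currently in use is an assumption.

```java
public class AvailableMemorySketch {

    /**
     * Hints the JVM to collect garbage, then estimates how many more bytes
     * the heap could still provide. Assumption: "free" means
     * maxMemory - (totalMemory - freeMemory).
     */
    static long estimateAvailableMemory() {
        System.gc(); // only a request; the JVM is free to ignore it
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // heap bytes currently occupied
        return rt.maxMemory() - used;                   // headroom up to the heap ceiling
    }

    public static void main(String[] args) {
        System.out.println("Estimated available bytes: " + estimateAvailableMemory());
    }
}
```

      Because System.gc() is only a hint, the returned value is an estimate and can differ between calls.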
    • estimateBestSizeOfBlocks

      public static long estimateBestSizeOfBlocks(long sizeoffile, int maxtmpfiles, long maxMemory)
      Divides the file into small blocks. If the blocks are too small, too many temporary files are created. If they are too big, too much memory is used.
      Parameters:
      sizeoffile - the amount of data to expect, in bytes
      maxtmpfiles - the maximum number of temporary files that may be created (e.g., 1024)
      maxMemory - the maximum memory to use, in bytes
      Returns:
      the estimated block size, in bytes
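      The trade-off between too many temporary files and too much memory can be illustrated with a small heuristic. This is a sketch under stated assumptions: the class BlockSizeSketch, the method name estimateBlockSize, and the "at least half of the memory budget" floor are all hypothetical choices, not behaviour confirmed by this page.

```java
public class BlockSizeSketch {

    /**
     * Picks a block size (in bytes) so that at most maxTmpFiles blocks are
     * needed, but never goes below half of the memory budget, since smaller
     * blocks would only create extra temporary files (assumed floor).
     */
    static long estimateBlockSize(long fileSize, int maxTmpFiles, long maxMemory) {
        // Ceiling division: the smallest block size that fits the file
        // into maxTmpFiles blocks.
        long blockSize = fileSize / maxTmpFiles + (fileSize % maxTmpFiles == 0 ? 0 : 1);
        // Raise the block size when memory allows larger blocks.
        if (blockSize < maxMemory / 2) {
            blockSize = maxMemory / 2;
        }
        return blockSize;
    }

    public static void main(String[] args) {
        // 1 GiB of data, up to 1024 temporary files, 64 MiB memory budget.
        System.out.println(estimateBlockSize(1L << 30, 1024, 64L << 20));
    }
}
```

      With these inputs the file-count constraint alone would allow 1 MiB blocks, so the memory floor decides and the sketch returns half of the 64 MiB budget.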
    • mergeSortedFiles

      public static int mergeSortedFiles(BufferedWriter fbw, CsvSortOptions sortOptions, List<CSVRecordBuffer> bfbs, List<org.apache.commons.csv.CSVRecord> header) throws IOException, ClassNotFoundException
      Throws:
      IOException
      ClassNotFoundException
    • mergeSortedFiles

      public static int mergeSortedFiles(List<File> files, File outputfile, CsvSortOptions sortOptions, boolean append, List<org.apache.commons.csv.CSVRecord> header) throws IOException, ClassNotFoundException
      Throws:
      IOException
      ClassNotFoundException
    • sortInBatch

      public static List<File> sortInBatch(long size_in_byte, BufferedReader fbr, File tmpdirectory, CsvSortOptions sortOptions, List<org.apache.commons.csv.CSVRecord> header) throws IOException
      Throws:
      IOException
    • sortAndSave

      public static File sortAndSave(List<org.apache.commons.csv.CSVRecord> tmplist, File tmpdirectory, CsvSortOptions sortOptions) throws IOException
      Throws:
      IOException
    • checkDuplicateLine

      private static boolean checkDuplicateLine(org.apache.commons.csv.CSVRecord currentLine, org.apache.commons.csv.CSVRecord lastLine)
    • sortInBatch

      public static List<File> sortInBatch(File file, File tmpdirectory, CsvSortOptions sortOptions, List<org.apache.commons.csv.CSVRecord> header) throws IOException
      Throws:
      IOException
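
The method names sortInBatch, sortAndSave and mergeSortedFiles suggest the usual two-phase external sort: sort bounded batches into temporary files, then merge those files. The sketch below illustrates that general technique on plain text lines using only the JDK. It does not use this class, CsvSortOptions or CSVRecord, and every name in it (ExternalSortSketch, sortInChunks, mergeChunks, Cursor) is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {

    /** One open sorted chunk plus the line currently at its head. */
    private static final class Cursor implements Comparable<Cursor> {
        final BufferedReader reader;
        String head;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        void advance() throws IOException { head = reader.readLine(); }

        @Override
        public int compareTo(Cursor other) { return head.compareTo(other.head); }
    }

    /** Phase 1: buffer up to maxLinesPerChunk lines, sort them, spill to a temp file. */
    static List<File> sortInChunks(BufferedReader in, int maxLinesPerChunk, File tmpDir)
            throws IOException {
        List<File> chunks = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            buffer.add(line);
            if (buffer.size() >= maxLinesPerChunk) {
                chunks.add(spill(buffer, tmpDir));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            chunks.add(spill(buffer, tmpDir));
        }
        return chunks;
    }

    private static File spill(List<String> buffer, File tmpDir) throws IOException {
        Collections.sort(buffer);
        // A null tmpDir means the platform's default temporary directory.
        File chunk = File.createTempFile("chunk", ".txt", tmpDir);
        chunk.deleteOnExit();
        Files.write(chunk.toPath(), buffer, StandardCharsets.UTF_8);
        return chunk;
    }

    /** Phase 2: k-way merge of the sorted chunks; returns the number of lines written. */
    static int mergeChunks(List<File> chunks, BufferedWriter out) throws IOException {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        List<BufferedReader> readers = new ArrayList<>();
        int written = 0;
        try {
            for (File chunk : chunks) {
                BufferedReader r = Files.newBufferedReader(chunk.toPath(), StandardCharsets.UTF_8);
                readers.add(r);
                Cursor c = new Cursor(r);
                if (c.head != null) queue.add(c); // skip empty chunks
            }
            while (!queue.isEmpty()) {
                Cursor smallest = queue.poll(); // chunk whose head line sorts first
                out.write(smallest.head);
                out.newLine();
                written++;
                smallest.advance();
                if (smallest.head != null) queue.add(smallest); // re-queue with its next line
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
        return written;
    }
}
```

The priority queue keeps only one line per chunk in memory during the merge, so memory use in that phase depends on the number of chunks rather than on the size of the input.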