Class CsvExternalSort
java.lang.Object
com.google.code.externalsorting.csv.CsvExternalSort
-
Field Summary
Fields
Modifier and Type, Field, Description
static final int DEFAULTMAXTEMPFILES
    Default maximal number of temporary files allowed.
private static final Logger LOG
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
private static boolean checkDuplicateLine(org.apache.commons.csv.CSVRecord currentLine, org.apache.commons.csv.CSVRecord lastLine)
static long estimateAvailableMemory()
    This method calls the garbage collector and then returns the free memory.
static long estimateBestSizeOfBlocks(long sizeoffile, int maxtmpfiles, long maxMemory)
    We divide the file into small blocks.
static int mergeSortedFiles(BufferedWriter fbw, CsvSortOptions sortOptions, List<CSVRecordBuffer> bfbs, List<org.apache.commons.csv.CSVRecord> header)
static int mergeSortedFiles(List<File> files, File outputfile, CsvSortOptions sortOptions, boolean append, List<org.apache.commons.csv.CSVRecord> header)
static File sortAndSave(List<org.apache.commons.csv.CSVRecord> tmplist, File tmpdirectory, CsvSortOptions sortOptions)
static List<File> sortInBatch(long size_in_byte, BufferedReader fbr, File tmpdirectory, CsvSortOptions sortOptions, List<org.apache.commons.csv.CSVRecord> header)
static List<File> sortInBatch(File file, File tmpdirectory, CsvSortOptions sortOptions, List<org.apache.commons.csv.CSVRecord> header)
-
Field Details
-
LOG
-
DEFAULTMAXTEMPFILES
public static final int DEFAULTMAXTEMPFILES
Default maximal number of temporary files allowed.
-
-
Constructor Details
-
CsvExternalSort
private CsvExternalSort()
-
-
Method Details
-
estimateAvailableMemory
public static long estimateAvailableMemory()
Calls the garbage collector and then returns the free memory. This avoids problems in applications where the GC has not yet reclaimed memory and the JVM therefore reports no available memory.
Returns:
available memory
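A minimal sketch of how such an estimate can be computed with the standard `Runtime` API. The class name `MemoryEstimateSketch` and the formula (maximum heap minus heap currently in use) are assumptions for illustration; the library's own implementation may differ.

```java
final class MemoryEstimateSketch {
    /** Requests a GC run, then estimates how many bytes the JVM could still allocate. */
    static long estimateAvailableMemory() {
        System.gc(); // only a request; the JVM is free to ignore it
        Runtime r = Runtime.getRuntime();
        long used = r.totalMemory() - r.freeMemory(); // heap currently in use
        return r.maxMemory() - used;                  // headroom up to the heap limit
    }

    public static void main(String[] args) {
        System.out.println("available bytes: " + estimateAvailableMemory());
    }
}
```

Because `System.gc()` is advisory, the returned value should be treated as an estimate, not a guarantee.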
-
estimateBestSizeOfBlocks
public static long estimateBestSizeOfBlocks(long sizeoffile, int maxtmpfiles, long maxMemory)
Divides the file into small blocks. If the blocks are too small, too many temporary files are created. If they are too big, too much memory is used.
Parameters:
sizeoffile - how much data (in bytes) to expect
maxtmpfiles - how many temporary files can be created (e.g., 1024)
maxMemory - maximum memory to use (in bytes)
Returns:
the estimate
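The trade-off described for estimateBestSizeOfBlocks can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class name `BlockSizeSketch` is hypothetical, and the lower bound of half of `maxMemory` is an arbitrary choice made for this example.

```java
final class BlockSizeSketch {
    static long estimateBestSizeOfBlocks(long sizeOfFile, int maxTmpFiles, long maxMemory) {
        // Smallest block size that keeps the number of blocks at or below
        // maxTmpFiles (ceiling division).
        long blockSize = sizeOfFile / maxTmpFiles + (sizeOfFile % maxTmpFiles == 0 ? 0 : 1);
        // ASSUMPTION: blocks smaller than half of the available memory only
        // create extra temporary files, so raise the size to that floor.
        long floor = maxMemory / 2;
        return Math.max(blockSize, floor);
    }

    public static void main(String[] args) {
        // A 1 GiB file, at most 1024 temporary files, 64 MiB of usable memory.
        System.out.println(estimateBestSizeOfBlocks(1L << 30, 1024, 64L << 20));
    }
}
```

This sketch does not cap the block size at `maxMemory`; when the file is very large relative to `maxtmpfiles`, a caller would need to raise the file limit or accept larger blocks.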
-
mergeSortedFiles
public static int mergeSortedFiles(BufferedWriter fbw, CsvSortOptions sortOptions, List<CSVRecordBuffer> bfbs, List<org.apache.commons.csv.CSVRecord> header) throws IOException, ClassNotFoundException
Throws:
IOException
ClassNotFoundException
-
mergeSortedFiles
public static int mergeSortedFiles(List<File> files, File outputfile, CsvSortOptions sortOptions, boolean append, List<org.apache.commons.csv.CSVRecord> header) throws IOException, ClassNotFoundException
Throws:
IOException
ClassNotFoundException
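This page does not describe how the merge works. A common technique for merging several sorted inputs is a k-way merge driven by a priority queue, sketched below on in-memory lists of strings standing in for sorted temporary files. The class `KWayMergeSketch`, its `merge` signature, the `distinct` flag, and the assumption that the returned `int` is the number of rows written are all hypothetical and not taken from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class KWayMergeSketch {
    /** One sorted input together with its current head element. */
    private static final class Cursor {
        final Iterator<String> it;
        String head;
        Cursor(Iterator<String> it) { this.it = it; this.head = it.next(); }
        boolean advance() {
            if (!it.hasNext()) return false;
            head = it.next();
            return true;
        }
    }

    /** Merges sorted runs into out; returns the number of rows written. */
    static int merge(List<List<String>> sortedRuns, Comparator<String> cmp,
                     boolean distinct, List<String> out) {
        // The queue always exposes the cursor with the smallest head row.
        PriorityQueue<Cursor> pq =
                new PriorityQueue<>((a, b) -> cmp.compare(a.head, b.head));
        for (List<String> run : sortedRuns) {
            if (!run.isEmpty()) pq.add(new Cursor(run.iterator()));
        }
        int written = 0;
        String last = null;
        while (!pq.isEmpty()) {
            Cursor c = pq.poll();
            String row = c.head;
            // In distinct mode, skip a row that compares equal to the last one written.
            if (!distinct || last == null || cmp.compare(row, last) != 0) {
                out.add(row);
                written++;
                last = row;
            }
            if (c.advance()) pq.add(c); // re-queue the cursor with its next row
        }
        return written;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
                Arrays.asList("a", "c", "e"),
                Arrays.asList("b", "c", "d"));
        List<String> out = new ArrayList<>();
        int n = merge(runs, Comparator.naturalOrder(), true, out);
        System.out.println(n + " rows: " + out);
    }
}
```

Only one row per input is held in memory at a time, which is what makes this approach suitable for inputs that do not fit in memory as a whole.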
-
sortInBatch
public static List<File> sortInBatch(long size_in_byte, BufferedReader fbr, File tmpdirectory, CsvSortOptions sortOptions, List<org.apache.commons.csv.CSVRecord> header) throws IOException
Throws:
IOException
-
sortAndSave
public static File sortAndSave(List<org.apache.commons.csv.CSVRecord> tmplist, File tmpdirectory, CsvSortOptions sortOptions) throws IOException
Throws:
IOException
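The signatures of sortInBatch and sortAndSave suggest the usual first phase of an external sort: read the input in bounded batches, sort each batch in memory, and save it to a temporary file. The sketch below illustrates that idea on plain text lines using only the standard library. The class `SortInBatchSketch`, the parameter names, and the size estimate of 2 bytes per character are assumptions made for this example; the library itself works on CSVRecord values and CsvSortOptions.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortInBatchSketch {
    /** Sorts one batch in memory and writes it to a new temporary file. */
    static File sortAndSave(List<String> batch, File tmpDir, Comparator<String> cmp)
            throws IOException {
        batch.sort(cmp);
        // A null tmpDir means the system default temporary directory.
        File tmp = File.createTempFile("sortInBatch", ".tmp", tmpDir);
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), batch, StandardCharsets.UTF_8);
        return tmp;
    }

    /** Splits the input into batches of roughly blockSizeBytes and sorts each one. */
    static List<File> sortInBatch(BufferedReader in, long blockSizeBytes, File tmpDir,
                                  Comparator<String> cmp) throws IOException {
        List<File> files = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        long batchBytes = 0;
        String line;
        while ((line = in.readLine()) != null) {
            batch.add(line);
            batchBytes += line.length() * 2L; // ASSUMPTION: rough in-memory size of a line
            if (batchBytes >= blockSizeBytes) {
                files.add(sortAndSave(batch, tmpDir, cmp));
                batch = new ArrayList<>();
                batchBytes = 0;
            }
        }
        if (!batch.isEmpty()) {
            files.add(sortAndSave(batch, tmpDir, cmp)); // last, partially filled batch
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("d\nb\na\nc\ne\n"));
        List<File> files = sortInBatch(in, 4, null, Comparator.naturalOrder());
        for (File f : files) {
            System.out.println(f.getName() + " -> " + Files.readAllLines(f.toPath()));
        }
    }
}
```

Each returned file is sorted on its own; a merge step over all of them is still needed to produce one fully sorted output.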
-
checkDuplicateLine
private static boolean checkDuplicateLine(org.apache.commons.csv.CSVRecord currentLine, org.apache.commons.csv.CSVRecord lastLine)
-
sortInBatch
public static List<File> sortInBatch(File file, File tmpdirectory, CsvSortOptions sortOptions, List<org.apache.commons.csv.CSVRecord> header) throws IOException
Throws:
IOException
-