Class BatchNode<T>
- java.lang.Object
-
- org.ojalgo.data.batch.BatchNode<T>
-
public final class BatchNode<T> extends java.lang.Object
A batch processing data node for when there's no way to fit the data in memory.Data is stored in sharded files, and data is written/consumed and processed concurrently.
The data is processed in batches. Each batch is processed in a single thread. The number of threads is controlled by
#parallelism(IntSupplier)
.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
BatchNode.Builder<T>
private static class
BatchNode.TwoStepWrapper<T>
-
Field Summary
Fields Modifier and Type Field Description private static java.util.function.Consumer<java.lang.Boolean>
DUMMY
private java.util.function.ToIntFunction<T>
myDistributor
private DataInterpreter<T>
myInterpreter
private java.util.function.IntSupplier
myParallelism
private ProcessingService
myProcessor
private int
myQueueCapacity
private java.util.function.Function<java.io.File,AutoSupplier<T>>
myReaderFactory
private Throughput
myReaderManager
private ShardedFile
myShards
private Throughput
myWriterManger
-
Constructor Summary
Constructors Constructor Description BatchNode(BatchNode.Builder<T> builder)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods Modifier and Type Method Description void
dispose()
Dispose of this node and explicitly delete all files.private java.util.function.Function<java.io.File,AutoSupplier<T>>
getReaderFactory()
static <T> BatchNode.Builder<T>
newBuilder(java.io.File directory, DataInterpreter<T> interpreter)
static <T> BatchNode<T>
newInstance(java.io.File directory, DataInterpreter<T> interpreter)
(package private) AutoSupplier<T>
newReader(java.io.File file)
AutoConsumer<T>
newWriter()
private void
process(java.io.File shard, java.util.function.Consumer<T> consumer)
private <R,A extends TwoStepMapper<T,R>>
voidprocessAggregators(java.io.File shard, java.util.function.Supplier<A> aggregatorFactory, java.util.function.Consumer<A> processor)
void
processAll(java.util.function.Consumer<T> processor)
Process each and every item individuallyvoid
processAll(java.util.function.Supplier<java.util.function.Consumer<T>> processorFactory)
Similar toprocessAll(Consumer)
but you provide a consumer constructor/factory rather than a specific consumer.<R,A extends TwoStepMapper<T,R>>
voidprocessCombineable(java.util.function.Supplier<A> aggregatorFactory, java.util.function.Consumer<A> processor)
Similar toprocessMergeable(Supplier, Consumer)
but theprocessor
is called with the aggregator instance itself rather than its extracted results.<R> void
processMapped(java.util.function.Supplier<? extends TwoStepMapper<T,R>> aggregatorFactory, java.util.function.Consumer<R> processor)
Deprecated.<R,A extends TwoStepMapper<T,R>>
voidprocessMergeable(java.util.function.Supplier<A> aggregatorFactory, java.util.function.Consumer<R> processor)
Each shard is processed/aggregated separately by aTwoStepMapper
instance.private <R,A extends TwoStepMapper<T,R>>
voidprocessResults(java.io.File shard, java.util.function.Supplier<A> aggregatorFactory, java.util.function.Consumer<R> processor)
<R,A extends TwoStepMapper.Combineable<T,R,A>>
RreduceByCombining(java.util.function.Supplier<A> aggregatorFactory)
CallsprocessCombineable(Supplier, Consumer)
with theTwoStepMapper.Combineable#merge(Object)
method of a globalTwoStepMapper.Combineable
instance as theconsumer
.<R,A extends TwoStepMapper.Mergeable<T,R>>
RreduceByMerging(java.util.function.Supplier<A> aggregatorFactory)
CallsprocessMergeable(Supplier, Consumer)
with theTwoStepMapper.Mergeable.merge(Object)
method of a globalTwoStepMapper.Mergeable
instance as theconsumer
.<R,A extends TwoStepMapper.Mergeable<T,R>>
RreduceMapped(java.util.function.Supplier<A> aggregatorFactory)
Deprecated.v54 Use#reduceByMerging(Supplier)
instead
-
-
-
Field Detail
-
DUMMY
private static final java.util.function.Consumer<java.lang.Boolean> DUMMY
-
myDistributor
private final java.util.function.ToIntFunction<T> myDistributor
-
myInterpreter
private final DataInterpreter<T> myInterpreter
-
myParallelism
private final java.util.function.IntSupplier myParallelism
-
myProcessor
private final ProcessingService myProcessor
-
myQueueCapacity
private final int myQueueCapacity
-
myReaderFactory
private transient java.util.function.Function<java.io.File,AutoSupplier<T>> myReaderFactory
-
myReaderManager
private final Throughput myReaderManager
-
myShards
private final ShardedFile myShards
-
myWriterManger
private final Throughput myWriterManger
-
-
Constructor Detail
-
BatchNode
BatchNode(BatchNode.Builder<T> builder)
-
-
Method Detail
-
newBuilder
public static <T> BatchNode.Builder<T> newBuilder(java.io.File directory, DataInterpreter<T> interpreter)
-
newInstance
public static <T> BatchNode<T> newInstance(java.io.File directory, DataInterpreter<T> interpreter)
-
dispose
public void dispose()
Dispose of this node and explicitly delete all files.
-
newWriter
public AutoConsumer<T> newWriter()
-
processAll
public void processAll(java.util.function.Consumer<T> processor)
Process each and every item individually- Parameters:
processor
- Must be able to consume concurrently
-
processAll
public void processAll(java.util.function.Supplier<java.util.function.Consumer<T>> processorFactory)
Similar toprocessAll(Consumer)
but you provide a consumer constructor/factory rather than a specific consumer. Internally there will be 1 consumer per worker thread instantiated. This variant is for when the consumer(s) are stateful.
-
processCombineable
public <R,A extends TwoStepMapper<T,R>> void processCombineable(java.util.function.Supplier<A> aggregatorFactory, java.util.function.Consumer<A> processor)
Similar toprocessMergeable(Supplier, Consumer)
but theprocessor
is called with the aggregator instance itself rather than its extracted results. This corresponds toTwoStepMapper#Combineable
rather thanTwoStepMapper#Mergeable
.- See Also:
processMergeable(Supplier, Consumer)
-
processMapped
@Deprecated public <R> void processMapped(java.util.function.Supplier<? extends TwoStepMapper<T,R>> aggregatorFactory, java.util.function.Consumer<R> processor)
Deprecated.Process mapped/derived data in batches.There will be one
TwoStepMapper
instance per underlying file/shard – that's a batch. Those instances are likely to contain some sort ofCollection
orMap
that hold mapped/derived data.You must make sure that all data items that need to be in the same
TwoStepMapper
instance (in the same batch) are in the same file/shard. You control the number of shards viaBatchNode.Builder.fragmentation(int)
and which item goes in which shard viaBatchNode.Builder.distributor(ToIntFunction)
.- Type Parameters:
R
- The mapped/derived data holding type- Parameters:
aggregatorFactory
- Produces theTwoStepMapper
mapping instancesprocessor
- Consumes the mapped/derived data - the results of one wholeTwoStepMapper
instance at the time
-
processMergeable
public <R,A extends TwoStepMapper<T,R>> void processMergeable(java.util.function.Supplier<A> aggregatorFactory, java.util.function.Consumer<R> processor)
Each shard is processed/aggregated separately by aTwoStepMapper
instance. The results are then processed/merged by the providedprocessor
.There is one
TwoStepMapper
instance per underlying worker thread. Those instances are reset and reused for each shard.The
processor
is called concurrently from multiple threads.You must make sure that all data items that need to be in the same aggregator instance are in the same file/shard. You control the number of shards via
BatchNode.Builder.fragmentation(int)
and which item goes in which shard viaBatchNode.Builder.distributor(ToIntFunction)
.- Parameters:
aggregatorFactory
- Produces theTwoStepMapper
aggregator instancesprocessor
- Consumes the aggregated/derived data - the results of one wholeTwoStepMapper
instance at the time
-
reduceByCombining
public <R,A extends TwoStepMapper.Combineable<T,R,A>> R reduceByCombining(java.util.function.Supplier<A> aggregatorFactory)
CallsprocessCombineable(Supplier, Consumer)
with theTwoStepMapper.Combineable#merge(Object)
method of a globalTwoStepMapper.Combineable
instance as theconsumer
.
-
reduceByMerging
public <R,A extends TwoStepMapper.Mergeable<T,R>> R reduceByMerging(java.util.function.Supplier<A> aggregatorFactory)
CallsprocessMergeable(Supplier, Consumer)
with theTwoStepMapper.Mergeable.merge(Object)
method of a globalTwoStepMapper.Mergeable
instance as theconsumer
.
-
reduceMapped
@Deprecated public <R,A extends TwoStepMapper.Mergeable<T,R>> R reduceMapped(java.util.function.Supplier<A> aggregatorFactory)
Deprecated.v54 Use#reduceByMerging(Supplier)
insteadSame asprocessMergeable(Supplier, Consumer)
, but then also reduce/merge the total results usingTwoStepMapper#merge(Object)
.Create a class that implements
TwoStepMapper
and make sure to also implementTwoStepMapper#merge(Object)
- you can only use this if merging partial (sub)results is possible. Use a constructor or factory method that produce instances of that type as the argument to this method.
-
getReaderFactory
private java.util.function.Function<java.io.File,AutoSupplier<T>> getReaderFactory()
-
process
private void process(java.io.File shard, java.util.function.Consumer<T> consumer)
-
processAggregators
private <R,A extends TwoStepMapper<T,R>> void processAggregators(java.io.File shard, java.util.function.Supplier<A> aggregatorFactory, java.util.function.Consumer<A> processor)
-
processResults
private <R,A extends TwoStepMapper<T,R>> void processResults(java.io.File shard, java.util.function.Supplier<A> aggregatorFactory, java.util.function.Consumer<R> processor)
-
newReader
AutoSupplier<T> newReader(java.io.File file)
-
-