Class BatchNode<T>
Data is stored in sharded files, and data is written/consumed and processed concurrently.
The data is processed in batches. Each batch is processed in a single thread. The number of threads is
controlled by
invalid reference
#parallelism(IntSupplier)
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic final class
private static final class
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate final ToIntFunction
<T> private final DataInterpreter
<T> private final IntSupplier
private final ProcessingService
private final int
private Function
<File, FromFileReader<T>> private final Throughput
private final ShardedFile
private final Throughput
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionvoid
dispose()
Dispose of this node and explicitly delete all files.private Function
<File, FromFileReader<T>> static <T> BatchNode.Builder
<T> newBuilder
(File directory, DataInterpreter<T> interpreter) static <T> BatchNode
<T> newInstance
(File directory, DataInterpreter<T> interpreter) (package private) FromFileReader
<T> private void
private <R,
A extends TwoStepMapper<T, R>>
voidprocessAggregators
(File shard, Supplier<A> aggregatorFactory, Consumer<A> processor) void
processAll
(Consumer<T> processor) Process each and every item individuallyvoid
processAll
(Supplier<Consumer<T>> processorFactory) Similar toprocessAll(Consumer)
but you provide a consumer constructor/factory rather than a specific consumer.<R,
A extends TwoStepMapper<T, R>>
voidprocessCombineable
(Supplier<A> aggregatorFactory, Consumer<A> processor) Similar toprocessMergeable(Supplier, Consumer)
but theprocessor
is called with the aggregator instance itself rather than its extracted results.<R> void
processMapped
(Supplier<? extends TwoStepMapper<T, R>> aggregatorFactory, Consumer<R> processor) Deprecated.<R,
A extends TwoStepMapper<T, R>>
voidprocessMergeable
(Supplier<A> aggregatorFactory, Consumer<R> processor) Each shard is processed/aggregated separately by aTwoStepMapper
instance.private <R,
A extends TwoStepMapper<T, R>>
voidprocessResults
(File shard, Supplier<A> aggregatorFactory, Consumer<R> processor) <R,
A extends TwoStepMapper.Combineable<T, R, A>>
RreduceByCombining
(Supplier<A> aggregatorFactory) CallsprocessCombineable(Supplier, Consumer)
with theinvalid reference
TwoStepMapper.Combineable#merge(Object)
TwoStepMapper.Combineable
instance as theconsumer
.<R,
A extends TwoStepMapper.Mergeable<T, R>>
RreduceByMerging
(Supplier<A> aggregatorFactory) CallsprocessMergeable(Supplier, Consumer)
with theTwoStepMapper.Mergeable.merge(Object)
method of a globalTwoStepMapper.Mergeable
instance as theconsumer
.<R,
A extends TwoStepMapper.Mergeable<T, R>>
RreduceMapped
(Supplier<A> aggregatorFactory) Deprecated.v54 Useinvalid reference
#reduceByMerging(Supplier<A>)
-
Field Details
-
DUMMY
-
myDistributor
-
myInterpreter
-
myParallelism
-
myProcessor
-
myQueueCapacity
private final int myQueueCapacity -
myReaderFactory
-
myReaderManager
-
myShards
-
myWriterManger
-
-
Constructor Details
-
BatchNode
BatchNode(BatchNode.Builder<T> builder)
-
-
Method Details
-
newBuilder
-
newInstance
-
dispose
public void dispose()Dispose of this node and explicitly delete all files. -
newWriter
-
processAll
Process each and every item individually- Parameters:
processor
- Must be able to consume concurrently
-
processAll
Similar toprocessAll(Consumer)
but you provide a consumer constructor/factory rather than a specific consumer. Internally there will be 1 consumer per worker thread instantiated. This variant is for when the consumer(s) are stateful. -
processCombineable
public <R,A extends TwoStepMapper<T, void processCombineableR>> (Supplier<A> aggregatorFactory, Consumer<A> processor) Similar toprocessMergeable(Supplier, Consumer)
but theprocessor
is called with the aggregator instance itself rather than its extracted results. This corresponds toinvalid reference
TwoStepMapper#Combineable
invalid reference
TwoStepMapper#Mergeable
- See Also:
-
processMapped
@Deprecated public <R> void processMapped(Supplier<? extends TwoStepMapper<T, R>> aggregatorFactory, Consumer<R> processor) Deprecated.v54 Useinvalid reference
#processMergeable(Supplier<? extends TwoStepMapper<T, H>>,Consumer<H>)
Process mapped/derived data in batches.There will be one
TwoStepMapper
instance per underlying file/shard – that's a batch. Those instances are likely to contain some sort ofCollection
orMap
that hold mapped/derived data.You must make sure that all data items that need to be in the same
TwoStepMapper
instance (in the same batch) are in the same file/shard. You control the number of shards viaBatchNode.Builder.fragmentation(int)
and which item goes in which shard viaBatchNode.Builder.distributor(ToIntFunction)
.- Type Parameters:
R
- The mapped/derived data holding type- Parameters:
aggregatorFactory
- Produces theTwoStepMapper
mapping instancesprocessor
- Consumes the mapped/derived data - the results of one wholeTwoStepMapper
instance at the time
-
processMergeable
public <R,A extends TwoStepMapper<T, void processMergeableR>> (Supplier<A> aggregatorFactory, Consumer<R> processor) Each shard is processed/aggregated separately by aTwoStepMapper
instance. The results are then processed/merged by the providedprocessor
.There is one
TwoStepMapper
instance per underlying worker thread. Those instances are reset and reused for each shard.The
processor
is called concurrently from multiple threads.You must make sure that all data items that need to be in the same aggregator instance are in the same file/shard. You control the number of shards via
BatchNode.Builder.fragmentation(int)
and which item goes in which shard viaBatchNode.Builder.distributor(ToIntFunction)
.- Parameters:
aggregatorFactory
- Produces theTwoStepMapper
aggregator instancesprocessor
- Consumes the aggregated/derived data - the results of one wholeTwoStepMapper
instance at the time
-
reduceByCombining
public <R,A extends TwoStepMapper.Combineable<T, R reduceByCombiningR, A>> (Supplier<A> aggregatorFactory) CallsprocessCombineable(Supplier, Consumer)
with theinvalid reference
TwoStepMapper.Combineable#merge(Object)
TwoStepMapper.Combineable
instance as theconsumer
. -
reduceByMerging
public <R,A extends TwoStepMapper.Mergeable<T, R reduceByMergingR>> (Supplier<A> aggregatorFactory) CallsprocessMergeable(Supplier, Consumer)
with theTwoStepMapper.Mergeable.merge(Object)
method of a globalTwoStepMapper.Mergeable
instance as theconsumer
. -
reduceMapped
@Deprecated public <R,A extends TwoStepMapper.Mergeable<T, R reduceMappedR>> (Supplier<A> aggregatorFactory) Deprecated.v54 Useinvalid reference
#reduceByMerging(Supplier<A>)
Same asprocessMergeable(Supplier, Consumer)
, but then also reduce/merge the total results usinginvalid reference
TwoStepMapper#merge(Object)
Create a class that implements
TwoStepMapper
and make sure to also implementinvalid reference
TwoStepMapper#merge(Object)
-
getReaderFactory
-
process
-
processAggregators
private <R,A extends TwoStepMapper<T, void processAggregatorsR>> (File shard, Supplier<A> aggregatorFactory, Consumer<A> processor) -
processResults
private <R,A extends TwoStepMapper<T, void processResultsR>> (File shard, Supplier<A> aggregatorFactory, Consumer<R> processor) -
newReader
-
invalid reference