
Pro Tips — Multi-threading and Parallel Processing for Higher Batch Throughput

With Spring Batch's default single-threaded configuration, processing 500 million records can take hours. These three parallelization strategies can dramatically improve batch throughput.

1. Multi-threaded Step (Basic Multithreading)

The simplest approach: register a TaskExecutor on the Step so that each chunk is processed on its own thread.

@Bean
public Step parallelChunkStep() {
    return new StepBuilder("parallelChunkStep", jobRepository)
            .<Coupon, Coupon>chunk(1000, transactionManager)
            .reader(couponItemReader())       // ★ must be thread-safe (JdbcPagingItemReader recommended)
            .processor(couponItemProcessor())
            .writer(couponItemWriter())
            .taskExecutor(new SimpleAsyncTaskExecutor()) // separate thread per chunk
            .throttleLimit(8)                 // max concurrent threads (deprecated in Spring Batch 5; prefer a bounded ThreadPoolTaskExecutor)
            .build();
}
⚠️ Warning: In a multi-threaded Step, the ItemReader MUST be thread-safe. JdbcCursorItemReader is NOT thread-safe; either wrap it in a SynchronizedItemStreamReader or use JdbcPagingItemReader, which is thread-safe.
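For example, a cursor-based reader can be wrapped like this. This is a minimal sketch: couponCursorReader() stands in for your existing JdbcCursorItemReader bean and is not part of the code above.

```java
// Sketch: wrapping a non-thread-safe JdbcCursorItemReader so it can be
// used from a multi-threaded Step. The delegate's read() calls are
// serialized behind a lock by SynchronizedItemStreamReader.
@Bean
public SynchronizedItemStreamReader<Coupon> synchronizedCouponReader() {
    SynchronizedItemStreamReader<Coupon> reader = new SynchronizedItemStreamReader<>();
    reader.setDelegate(couponCursorReader()); // assumed existing cursor reader bean
    return reader;
}
```

Note that synchronizing the reader serializes reads, so the parallelism gain then comes from the processor and writer stages.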

2. Parallel Steps (Concurrent Step Execution)

Use Flow and split() to run independent Steps simultaneously.

@Bean
public Job parallelStepsJob() {
    Flow flow1 = new FlowBuilder<SimpleFlow>("flow1")
            .start(processEmailsStep())
            .build();
    Flow flow2 = new FlowBuilder<SimpleFlow>("flow2")
            .start(processSmsStep())
            .build();

    Flow parallelFlow = new FlowBuilder<SimpleFlow>("parallelFlow")
            .split(new SimpleAsyncTaskExecutor())
            .add(flow1, flow2) // run both flows in parallel
            .build();

    return new JobBuilder("parallelStepsJob", jobRepository)
            .start(parallelFlow)
            .end()
            .build();
}
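The two steps combined above are ordinary independent Steps. As a minimal sketch, processEmailsStep() could be a simple Tasklet like the following; emailService.sendPendingEmails() is a hypothetical service call, not part of the original code.

```java
// Minimal sketch of one of the independent steps joined by split().
// emailService and sendPendingEmails() are hypothetical placeholders.
@Bean
public Step processEmailsStep() {
    return new StepBuilder("processEmailsStep", jobRepository)
            .tasklet((contribution, chunkContext) -> {
                emailService.sendPendingEmails(); // hypothetical bulk-send call
                return RepeatStatus.FINISHED;
            }, transactionManager)
            .build();
}
```

Because the two flows share nothing, each step can fail and restart independently of the other.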

3. Partitioning (Split Data Range then Process in Parallel)

The most powerful technique: divide the full dataset into partitions (ranges) and have separate Worker Steps process each partition independently.

@Bean
public Step partitionedStep(Step workerStep) {
    return new StepBuilder("partitionedStep", jobRepository)
            .partitioner("workerStep", new ColumnRangePartitioner()) // split by ID range
            .step(workerStep)
            .gridSize(10) // divide into 10 partitions
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}
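ColumnRangePartitioner is not shown above, but its core job is plain range arithmetic: split the inclusive ID range [min, max] into gridSize contiguous sub-ranges. A pure-Java sketch of that arithmetic (the class and method names here are illustrative, not from the original):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the arithmetic a column-range partitioner performs.
public class RangeSplitSketch {

    // Splits the inclusive ID range [min, max] into up to gridSize
    // contiguous sub-ranges. Each long[] holds {startId, endId}, inclusive.
    public static List<long[]> split(long min, long max, int gridSize) {
        List<long[]> ranges = new ArrayList<>();
        long targetSize = (max - min) / gridSize + 1; // ids per partition
        long start = min;
        long end = start + targetSize - 1;
        while (start <= max) {
            if (end > max) {
                end = max; // clamp the final partition to the real upper bound
            }
            ranges.add(new long[] {start, end});
            start += targetSize;
            end += targetSize;
        }
        return ranges;
    }
}
```

In the real Partitioner, each range would be stored in its partition's ExecutionContext (e.g. under minValue/maxValue keys) so that each worker's @StepScope reader queries only its own slice of the table.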
Strategy            | Characteristics                    | Best For
--------------------|------------------------------------|---------------------------------------
Multi-threaded Step | Parallel chunk processing          | Single data source, fast to implement
Parallel Steps      | Simultaneous independent steps     | Independent tasks like email + SMS
Partitioning        | Split data range, then parallelize | Hundreds of millions of records