
Spring Batch chunk processing, how does the reader work?

Posted 2019-07-19 23:20

Question:

I'm new to Spring Batch chunking and want to understand how the reader works.

Here is the scenario: implementing a purge of user accounts as a chunk-oriented step.

Reader: reads all the user accounts that match the purge criteria, in a defined order.

Processor: for each user account, based on some calculation, it may create a new user account and also change the current record (say, mark it as purged).
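Roughly, the processor I have in mind looks like this sketch (UserAccount, needsReplacement, and createReplacement are placeholder names standing in for my real logic):

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;

// Sketch of the purge processor described above; one input account can yield
// one or two output records for the writer to persist.
public class PurgeProcessor implements ItemProcessor<UserAccount, List<UserAccount>> {

    @Override
    public List<UserAccount> process(UserAccount account) {
        List<UserAccount> changed = new ArrayList<>();

        // Mark the current record as purged.
        account.setPurged(true);
        changed.add(account);

        // Based on some calculation, a replacement account may also be created.
        if (needsReplacement(account)) {
            changed.add(createReplacement(account));
        }

        return changed; // the writer persists every record in this list
    }

    private boolean needsReplacement(UserAccount account) {
        return false; // placeholder for the real calculation
    }

    private UserAccount createReplacement(UserAccount account) {
        return new UserAccount(); // placeholder
    }
}

// Minimal stand-in for the real entity.
class UserAccount {
    private boolean purged;
    void setPurged(boolean purged) { this.purged = purged; }
}
```

The writer would then be an ItemWriter<List<UserAccount>> that flattens each list, since one input record can produce more than one output record.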

Question: how does the reader work? Say I have 5000 user accounts and my chunk size is 1000.

Will the reader read 1000 records and then start the processor (say the processor creates another 100 new records), and then the writer writes whatever records were updated?

To read the next 1000 records, will the reader execute the query again? How does it know where to start?

I'm using Hibernate.

Answer 1:

To answer your specific question, it depends on the ItemReader implementation you use. If you're using the JdbcCursorItemReader, we hold the cursor open during the entire step, so every chunk continues reading from the result set of that single query. If you're using the JdbcPagingItemReader, each page is a separate query, and where the next chunk begins is determined by the pagination logic.
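For illustration, a paging reader for a purge scenario might be configured like this (a sketch only; the table name, columns, and criteria are made up). The sort key is what lets the reader compute where the next page starts:

```java
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;

public class ReaderSketch {

    // Each page is fetched with a fresh query; the WHERE clause for page N+1
    // is derived from the sort key value of the last row of page N.
    public JdbcPagingItemReader<Long> reader(DataSource dataSource) {
        return new JdbcPagingItemReaderBuilder<Long>()
                .name("userAccountReader")
                .dataSource(dataSource)
                .selectClause("SELECT id")
                .fromClause("FROM user_account")
                .whereClause("WHERE purge_eligible = true")   // hypothetical criteria
                .sortKeys(Map.of("id", Order.ASCENDING))      // required: defines page boundaries
                .pageSize(1000)                               // matches the chunk size in the question
                .rowMapper((rs, rowNum) -> rs.getLong("id"))  // mapping just the key for brevity
                .build();
    }
}
```

Because paging is driven by the sort key rather than a row offset, the reader can pick up exactly where the previous page left off without re-reading earlier rows.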

A couple of notes:

  1. Using Hibernate can be tricky with batch processing. There are added complexities when using Hibernate that you can avoid by going straight to the database (not to mention the potential performance benefits in a batch environment).
  2. Keep in mind that Spring Batch provides no checks for whether the underlying dataset has changed. If you're using the JdbcPagingItemReader, each query is a unique query, so if you add records that meet the criteria, they will be returned as well (I'm not 100% sure what would happen if the underlying data changed while a cursor was open…it may be a function of the db itself). Typically, you'll tag the records you want to process in that batch run with some form of flag (timestamp, processing flag, etc.), as sketched after this list.
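As a sketch of that tagging idea (table and column names invented), you could stamp the candidate rows before the step runs and have the reader select only rows carrying that run's stamp:

```java
import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;

public class PurgeRunTagger {

    private final JdbcTemplate jdbcTemplate;

    public PurgeRunTagger(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Stamp all rows that currently meet the purge criteria with this run's id.
    // The reader then selects WHERE batch_run_id = :runId, so rows added after
    // this point cannot drift into the current run.
    public int tagForRun(long runId) {
        return jdbcTemplate.update(
                "UPDATE user_account SET batch_run_id = ? "
                        + "WHERE purge_eligible = true AND batch_run_id IS NULL",
                runId);
    }
}
```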


Answer 2:

Chunking works in a different way than you describe.

Chunk-oriented processing reads the data one item at a time and builds up "chunks" that will be written. Once the number of items in the chunk equals the specified commit interval, the entire chunk is written out by the item writer.
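In simplified form, one chunk iteration looks like this (adapted from the pseudocode in the Spring Batch reference documentation; this uses the Spring Batch 4 interfaces, where the writer takes a List rather than Batch 5's Chunk):

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

public class ChunkLoopSketch {

    // Simplified sketch of one chunk iteration. Transactions, skip/retry, and
    // restart bookkeeping are omitted.
    static <I, O> void oneChunk(ItemReader<I> reader,
                                ItemProcessor<I, O> processor,
                                ItemWriter<O> writer,
                                int commitInterval) throws Exception {
        List<I> items = new ArrayList<>();
        for (int i = 0; i < commitInterval; i++) {
            I item = reader.read();            // items are read one at a time
            if (item == null) {
                break;                         // null signals the end of the data
            }
            items.add(item);
        }

        List<O> processedItems = new ArrayList<>();
        for (I item : items) {
            O processed = processor.process(item);
            if (processed != null) {           // a null result filters the item out
                processedItems.add(processed);
            }
        }

        writer.write(processedItems);          // the whole chunk is written, then
                                               // the transaction is committed
    }
}
```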

The commit interval has to be tuned carefully to get good performance from the batch job.

For example, say you have 1000 records in the database and, according to the query, all 1000 records will be read. The commit interval specified is 10.

So once the batch starts executing, it keeps reading the available records from the database one at a time and hands each record to the item processor (if one is configured; the processor is optional). The processed items are then accumulated. Once 10 records have accumulated, the entire chunk of 10 is fed to the item writer, and the transaction is committed.
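Putting that 1000-records, commit-interval-10 example into configuration could look roughly like this (Spring Batch 4 style with StepBuilderFactory; the reader, processor, and writer beans and the UserAccount type are assumed to exist elsewhere):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PurgeStepConfig {

    // chunk(10) is the commit interval: items are read and processed one at a
    // time, and after every 10 items the whole chunk is written and committed.
    @Bean
    public Step purgeStep(StepBuilderFactory steps,
                          ItemReader<UserAccount> reader,
                          ItemProcessor<UserAccount, UserAccount> processor,
                          ItemWriter<UserAccount> writer) {
        return steps.get("purgeStep")
                .<UserAccount, UserAccount>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```

With 1000 matching records, that yields 100 chunks and therefore 100 commits.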