I'm trying to build a Spring/Spring Boot- RESTful web service,
Which accepts a CSV file with 1 million rows/40 columns in each row as input (From a Angular based front end) and will be a synchronous call. User have to be notified on the upload status before proceeding to other screens. So, wait time can't be more than few mins(say 5 mins).
Each of these rows has to be validated against what is in DB and if found to be valid, same will be inserted into DB. In short, each row can be a separate independent transaction.
Can you please suggest what would be the best approach to implement this?
Current legacy system implements the same functionality in Stored procedures, which made the solution closely coupled with the DB, which will be an issue if the RDBMS needs to be changed down the line.
Any approach on processing these 1 million data in chunks (say 20k) in asynchronous web service calls?
Spring batch?
Can stored procedure be more suitable and better performing than above two options by any chance(guessing no!) ?
Can you please help with some approach which is at least as good as stored procedure and how to horizontally scale based on the suggested solution.
You are on the right track with your three suggested options. The answer to your question is unfortunately, it depends.
Any of the above approaches could work for you. I personally prefer Spring Batch as I find the programming model simple and intuitive.
Spring Batch Processing Guide
Another approach would be to use Messaging to parallelize the processing of rows:
In short, your knowledge of your own domain will ultimately guide you towards the best solution for your business.