I have a file with millions of lines in it that I need to process. Each line of the file will result in an HTTP call. I'm trying to figure out the best way to attack the problem.
Obviously I could just read the file and make the calls sequentially, but that would be incredibly slow. I'd like to parallelize the calls, but I'm not sure whether I should read the entire file into memory (something I'm not a huge fan of) or try to parallelize the reading of the file as well (which I'm not sure would even make sense).
Just looking for some thoughts on how best to approach this. If there is an existing framework or library that does something similar, I'm happy to use that as well.
Thanks.
Gray's approach seems good. Another approach I would suggest is to split the file into chunks (you will have to write the logic) and process those chunks with multiple threads.
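A rough sketch of the chunking idea, assuming Java and a `RandomAccessFile` to align chunk boundaries to line starts (the path `input.txt`, the chunk count, and the `makeHttpCall` helper are placeholders, not anything from the question):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class ChunkedReader {

    public static void main(String[] args) throws IOException, InterruptedException {
        String path = "input.txt"; // placeholder path
        int nChunks = 8;           // e.g. one chunk per worker thread

        // Split the file into roughly equal byte ranges, aligned to line
        // boundaries, so each worker can read its own chunk independently.
        long fileLength;
        List<Long> starts = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            fileLength = raf.length();
            starts.add(0L);
            for (int i = 1; i < nChunks; i++) {
                raf.seek(i * (fileLength / nChunks));
                raf.readLine(); // advance past the partial line to the next full line
                starts.add(raf.getFilePointer());
            }
        }

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            long start = starts.get(i);
            long end = (i + 1 < starts.size()) ? starts.get(i + 1) : fileLength;
            Thread t = new Thread(() -> processChunk(path, start, end));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
    }

    // Each worker opens its own file handle and reads only the lines
    // that start inside [start, end).
    private static void processChunk(String path, long start, long end) {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(start);
            String line;
            while (raf.getFilePointer() < end && (line = raf.readLine()) != null) {
                makeHttpCall(line); // placeholder for the per-line HTTP request
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void makeHttpCall(String line) {
        // real HTTP logic goes here
    }
}
```

Note that `RandomAccessFile.readLine()` reads bytes, not characters, so this only works cleanly for single-byte encodings; for UTF-8 content with multi-byte characters you would need a more careful boundary scan.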
You should use an `ExecutorService` with a bounded `BlockingQueue`. As you read in your million lines, you submit jobs to the thread pool until the `BlockingQueue` is full. That way you can run 100 (or whatever number is optimal) HTTP requests simultaneously without having to read all of the lines of the file beforehand.

You'll need to set up a `RejectedExecutionHandler` that blocks when the queue is full. This is better than a caller-runs handler, which would make the reading thread execute requests itself instead of just feeding the pool.
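A minimal sketch of that setup, using plain `java.util.concurrent` (the file name `input.txt`, the pool and queue sizes, and the `makeHttpCall` helper are illustrative assumptions):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueProcessor {

    public static void main(String[] args) throws IOException, InterruptedException {
        int nThreads = 100; // tune to whatever number of concurrent requests is optimal

        // Bounded queue: once 1000 jobs are pending, submissions are rejected
        // and the handler below blocks the reader until space frees up, so the
        // reader never races ahead of the HTTP workers.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                nThreads, nThreads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1000),
                // Rejection handler that blocks the submitting thread until the
                // queue has room, instead of running the task on the caller's
                // thread as CallerRunsPolicy would.
                (task, executor) -> {
                    try {
                        executor.getQueue().put(task);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new RejectedExecutionException("Interrupted while queueing", e);
                    }
                });

        try (BufferedReader reader = Files.newBufferedReader(Paths.get("input.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String current = line;
                pool.execute(() -> makeHttpCall(current)); // blocks when the queue is full
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

    // Placeholder for the real HTTP call made for each line.
    private static void makeHttpCall(String line) {
        // e.g. open a connection based on the line's contents
    }
}
```

With this arrangement only one thread ever touches the file, memory stays bounded by the queue size, and the pool size alone controls how many HTTP requests are in flight at once.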