I have a file with millions of lines in it that I need to process. Each line of the file will result in an HTTP call. I'm trying to figure out the best way to attack the problem.
Obviously I could just read the file and make the calls sequentially, but that would be incredibly slow. I'd like to parallelize the calls, but I'm not sure whether I should read the entire file into memory (something I'm not a huge fan of) or try to parallelize the reading of the file as well (which I'm not sure would even make sense).
Just looking for some thoughts on how best to approach this. If there is an existing framework or library that does something similar, I'm happy to use that as well.
Thanks.
Gray's approach seems good. Another approach I would suggest is to split the file into chunks (you will have to write the logic) and process those chunks with multiple threads.
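A rough sketch of the chunking idea, assuming Java and a `RandomAccessFile` to align chunk boundaries to line starts (the path `input.txt`, the chunk count, and the `makeHttpCall` helper are placeholders, not anything from the question):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class ChunkedReader {

    public static void main(String[] args) throws IOException, InterruptedException {
        String path = "input.txt"; // placeholder path
        int nChunks = 8;           // e.g. one chunk per worker thread

        // Split the file into roughly equal byte ranges, aligned to line
        // boundaries, so each worker can read its own chunk independently.
        long fileLength;
        List<Long> starts = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            fileLength = raf.length();
            starts.add(0L);
            for (int i = 1; i < nChunks; i++) {
                raf.seek(i * (fileLength / nChunks));
                raf.readLine(); // advance past the partial line to the next full line
                starts.add(raf.getFilePointer());
            }
        }

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            long start = starts.get(i);
            long end = (i + 1 < starts.size()) ? starts.get(i + 1) : fileLength;
            Thread t = new Thread(() -> processChunk(path, start, end));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
    }

    // Each worker opens its own file handle and reads only the lines
    // that start inside [start, end).
    private static void processChunk(String path, long start, long end) {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(start);
            String line;
            while (raf.getFilePointer() < end && (line = raf.readLine()) != null) {
                makeHttpCall(line); // placeholder for the per-line HTTP request
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void makeHttpCall(String line) {
        // real HTTP logic goes here
    }
}
```

Note that `RandomAccessFile.readLine()` reads bytes, not characters, so this only works cleanly for single-byte encodings; for UTF-8 content with multi-byte characters you would need a more careful boundary scan.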
You should use an `ExecutorService` with a bounded `BlockingQueue`. As you read in your million lines, you submit jobs to the thread pool until the `BlockingQueue` is full. That way you can run 100 (or whatever number is optimal) HTTP requests simultaneously without having to read all of the lines of the file beforehand.

You'll need to set up a `RejectedExecutionHandler` that blocks when the queue is full. This is better than a caller-runs handler, which would make the reading thread execute requests itself instead of just feeding the pool.
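A minimal sketch of that setup, using plain `java.util.concurrent` (the file name `input.txt`, the pool and queue sizes, and the `makeHttpCall` helper are illustrative assumptions):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueProcessor {

    public static void main(String[] args) throws IOException, InterruptedException {
        int nThreads = 100; // tune to whatever number of concurrent requests is optimal

        // Bounded queue: once 1000 jobs are pending, submissions are rejected
        // and the handler below blocks the reader until space frees up, so the
        // reader never races ahead of the HTTP workers.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                nThreads, nThreads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1000),
                // Rejection handler that blocks the submitting thread until the
                // queue has room, instead of running the task on the caller's
                // thread as CallerRunsPolicy would.
                (task, executor) -> {
                    try {
                        executor.getQueue().put(task);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new RejectedExecutionException("Interrupted while queueing", e);
                    }
                });

        try (BufferedReader reader = Files.newBufferedReader(Paths.get("input.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String current = line;
                pool.execute(() -> makeHttpCall(current)); // blocks when the queue is full
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

    // Placeholder for the real HTTP call made for each line.
    private static void makeHttpCall(String line) {
        // e.g. open a connection based on the line's contents
    }
}
```

With this arrangement only one thread ever touches the file, memory stays bounded by the queue size, and the pool size alone controls how many HTTP requests are in flight at once.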