Is there a way to determine the ideal number of th

I am writing a web crawler and using threads to download pages.

The first limiting factor on my program's performance is bandwidth: I can never download more pages than the connection can carry.

The second factor is the one I am interested in. I use threads to download many pages at the same time, but as I create more threads, the processor is shared among more of them. Is there some metric, method, or class of tests to determine the ideal number of threads, or to find out whether performance stops improving or starts to decrease beyond a certain number?

Tags: java multithreading performance metric

4 answers

傲

#2 · 2020-03-20 04:53

I'd suggest letting something like Akka manage the threads for you. Use the Jersey HTTP client library with non-blocking I/O, which works with callbacks if I remember correctly. That is possibly the ideal setup for this type of task.


▲ chillily

#3 · 2020-03-20 04:57

We've developed a multithreaded parallel web crawler. Benchmarking throughput is the best way to get an idea of how the beast will handle its job. For a dedicated Java server, one thread per core is a good starting point; then I/O comes into play and changes the picture.

Performance does decrease beyond a certain number of threads, but that number also depends on the site you crawl, the OS you use, and so on. Try to find a site with a nearly constant response time for your first benchmarks (like Google, but spread the requests over different services).

With slow websites, a higher number of threads tends to compensate for I/O blocking.
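The benchmarking approach described above can be sketched roughly as follows. This is a minimal illustration, not the crawler from this answer: `ThreadCountBenchmark`, `simulatedDownload`, and `measureThroughput` are hypothetical names, and a `Thread.sleep` stands in for the blocking page download so the sketch runs without a network. In a real benchmark you would replace the simulated download with your actual fetch code and a stable target site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadCountBenchmark {

    // Hypothetical stand-in for one page download: blocks for a fixed time,
    // much as a thread would block on a socket read.
    static void simulatedDownload(long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs `pages` simulated downloads on a fixed-size pool and
    // returns the observed throughput in pages per second.
    static double measureThroughput(int threads, int pages, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < pages; i++) {
                tasks.add(() -> {
                    simulatedDownload(latencyMillis);
                    return null;
                });
            }
            long start = System.nanoTime();
            // invokeAll returns once every task has completed.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown by a task
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return pages / seconds;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("download task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Sweep the pool size and look for the point where throughput flattens.
        for (int threads : new int[] {1, 2, 4, 8, 16, 32}) {
            double pagesPerSecond = measureThroughput(threads, 64, 50);
            System.out.printf("%2d threads: %.1f pages/s%n", threads, pagesPerSecond);
        }
    }
}
```

With purely simulated latency the throughput keeps rising with the thread count; against a real site you would expect it to level off or drop once bandwidth, CPU, or the remote server becomes the bottleneck, and that knee is the number to look for.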


姐就是有狂的资本

#4 · 2020-03-20 04:58

Have a look at my answer in this thread: "How to find out the optimal amount of threads?"

Your example will likely be CPU bound, so you need a way to measure contention in order to work out the right number of threads for your box and keep them all busy. Profiling will help there, but remember that the result depends on the number of cores (as well as the network latency already mentioned, etc.), so use the runtime to get the number of cores when sizing your thread pool.

There's no quick answer, I'm afraid; there will be an element of test, measure, adjust, repeat.
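As a rough sketch of sizing a pool from the core count: `Runtime.getRuntime().availableProcessors()` is the standard call for reading the number of cores at runtime. The `PoolSizing` class and `suggestedThreads` helper are hypothetical, and the formula in it (cores × (1 + wait time / compute time)) is a commonly cited rule of thumb rather than something from this answer; the wait and compute times are assumed values that you would have to measure by profiling. Treat the result as a starting point for the test, measure, adjust loop, not as a final answer.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {

    // Rule-of-thumb estimate (an assumption, not a guarantee):
    //   threads = cores * (1 + waitTime / computeTime)
    // A task that never waits gets one thread per core; a task that mostly
    // waits on I/O gets proportionally more threads.
    static int suggestedThreads(int cores, double waitMillis, double computeMillis) {
        if (cores < 1 || waitMillis < 0 || computeMillis <= 0) {
            throw new IllegalArgumentException("cores >= 1, wait >= 0, compute > 0 required");
        }
        return Math.max(1, (int) Math.round(cores * (1.0 + waitMillis / computeMillis)));
    }

    public static void main(String[] args) {
        // Ask the runtime how many cores are available to the JVM.
        int cores = Runtime.getRuntime().availableProcessors();

        // Assumed figures for illustration only: 90 ms waiting on the
        // network and 10 ms of parsing per page. Measure your own.
        int threads = suggestedThreads(cores, 90, 10);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println(cores + " cores -> starting pool size " + threads);
        pool.shutdown();
    }
}
```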


Anthone

#5 · 2020-03-20 05:04

The ideal number of threads should be close to the number of cores (virtual cores) your hardware provides. This avoids the cost of thread context switching and scheduling. If you're doing heavy I/O with many blocking reads (your thread blocks on a socket read), I suggest you redesign your code to use non-blocking I/O APIs. Typically this involves one "selector" thread that monitors the activity of thousands of sockets and a small number of worker threads that do the processing. If your code is in Java, the APIs are NIO. The only blocking call is selector.select(), and it blocks only when there is nothing to process on any of the thousands of sockets. Event-driven frameworks such as netty.io use this model and have proven to be very scalable and to make the best use of the system's hardware resources.
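The selector model described above might look roughly like this in plain NIO. It is a minimal sketch under stated assumptions, not a crawler: `SelectorSketch` and `serveOnce` are hypothetical names, and to stay self-contained the sockets are loopback connections created by the sketch itself, each sending the bytes "ping". A real crawler would instead register outbound connections to web servers and hand the received bytes to worker threads for parsing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorSketch {

    // Handles `clients` loopback connections on the calling thread alone
    // and returns the total number of bytes read from them.
    // Keep `clients` below the default accept backlog (assumed to be 50).
    static int serveOnce(int clients) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            // Hypothetical peers: each connects, sends "ping", and closes.
            byte[] message = "ping".getBytes(StandardCharsets.US_ASCII);
            for (int i = 0; i < clients; i++) {
                try (SocketChannel peer =
                         SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                    peer.write(ByteBuffer.wrap(message));
                }
            }

            ByteBuffer buffer = ByteBuffer.allocate(64);
            int totalBytes = 0;
            int finished = 0;
            while (finished < clients) {
                selector.select(); // the only blocking call in the loop
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel channel = server.accept();
                        if (channel != null) {
                            channel.configureBlocking(false);
                            channel.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel channel = (SocketChannel) key.channel();
                        buffer.clear();
                        int n = channel.read(buffer);
                        if (n < 0) {          // peer closed the connection
                            channel.close();  // also cancels the key
                            finished++;
                        } else {
                            totalBytes += n;
                        }
                    }
                }
            }
            return totalBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes read on one thread: " + serveOnce(10));
    }
}
```

The point of the design is that the number of threads stays fixed no matter how many sockets are open; frameworks such as Netty wrap this loop so that you rarely need to write it by hand.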

