What does it mean to say a web crawler is I/O bound?

Posted 2019-05-02 06:58

Question:

I've seen this in some answers on Stack Overflow, where the point is made that the programming language doesn't matter as much for a crawler, so C++ is overkill compared with, say, Python. Can someone please explain this in layman's terms so that there's no ambiguity about what is implied? Clarification of the underlying assumption here is also appreciated.

Thanks

Answer 1:

It means that I/O is the bottleneck here. The act of going out to the net to retrieve a page (I/O) is slower than analysing the page (CPU).

So, making the CPU bit ten times faster will have little effect on the overall time taken. On the other hand, doubling the I/O speed will have a very beneficial effect, right up to the point where CPU starts being the bottleneck.
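
To put rough numbers on that, here is a small sketch of the arithmetic. The per-page costs (0.5 s waiting on the network, 0.01 s of parsing) and the helper name `total_time` are made up for illustration, and it assumes each page is fetched and then analysed one after the other.

```python
def total_time(io_seconds, cpu_seconds, io_speedup=1.0, cpu_speedup=1.0):
    """Per-page time when the fetch and the analysis run back to back."""
    return io_seconds / io_speedup + cpu_seconds / cpu_speedup

# Assumed per-page costs, chosen only for illustration.
IO_SECONDS = 0.5    # waiting for the page to arrive over the network
CPU_SECONDS = 0.01  # analysing the page once it has arrived

baseline = total_time(IO_SECONDS, CPU_SECONDS)
faster_cpu = total_time(IO_SECONDS, CPU_SECONDS, cpu_speedup=10)
faster_io = total_time(IO_SECONDS, CPU_SECONDS, io_speedup=2)

print(f"baseline:       {baseline:.3f} s per page")
print(f"10x faster CPU: {faster_cpu:.3f} s per page")
print(f"2x faster I/O:  {faster_io:.3f} s per page")
```

With these assumed figures, the tenfold CPU speedup shaves off less than 2% of the time per page, while doubling the I/O speed nearly halves it.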



Answer 2:

It means that the program spends more time reading and writing (via disk or network) than it does actually running the algorithms in the code. I/O is vastly slower than the CPU, so waiting on it usually accounts for most of the program's running time.



Answer 3:

One thing to add is that during input/output operations your program (unless poorly written) isn't actively using the CPU; it is in an inactive (sleeping) state.
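
You can see this for yourself by comparing wall-clock time with CPU time. The sketch below is only an approximation: `simulated_fetch` is a hypothetical stand-in that uses `time.sleep` in place of a real blocking network read, on the assumption that the process is put to sleep in much the same way while it waits.

```python
import time

def simulated_fetch(delay=0.2):
    """Hypothetical stand-in for a blocking network read (no real request is made)."""
    time.sleep(delay)

def measure(func):
    """Return (wall-clock seconds, CPU seconds) spent while running func."""
    wall_start = time.perf_counter()  # elapsed real time, includes sleeping
    cpu_start = time.process_time()   # CPU time of this process, excludes sleeping
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

wall, cpu = measure(simulated_fetch)
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

The wall-clock figure should come out at roughly the delay, while the CPU figure stays close to zero, which is why a faster language buys so little while the program is waiting.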