What do the terms “CPU bound” and “I/O bound” mean

2019-01-01 12:01发布

What do the terms "CPU bound" and "I/O bound" mean?

9条回答
浪荡孟婆
2楼-- · 2019-01-01 12:04

Another way to phrase the same idea:

  • If speeding up the CPU doesn't speed up your program, it may be I/O bound.

  • If speeding up the I/O (e.g. using a faster disk) doesn't help, your program may be CPU bound.

(I used "may be" because you need to take other resources into account. Memory is one example.)

查看更多
美炸的是我
3楼-- · 2019-01-01 12:09

CPU Bound means the rate at which process progresses is limited by the speed of the CPU. A task that performs calculations on a small set of numbers, for example multiplying small matrices, is likely to be CPU bound.

I/O Bound means the rate at which a process progresses is limited by the speed of the I/O subsystem. A task that processes data from disk, for example, counting the number of lines in a file is likely to be I/O bound.

Memory bound means the rate at which a process progresses is limited by the amount memory available and the speed of that memory access. A task that processes large amounts of in memory data, for example multiplying large matrices, is likely to be Memory Bound.

Cache bound means the rate at which a process progress is limited by the amount and speed of the cache available. A task that simply processes more data than fits in the cache will be cache bound.

I/O Bound would be slower than Memory Bound would be slower than Cache Bound would be slower than CPU Bound.

The solution to being I/O bound isn't necessarily to get more Memory. In some situations, the access algorithm could be designed around the I/O, Memory or Cache limitations. See Cache Oblivious Algorithms.

查看更多
琉璃瓶的回忆
4楼-- · 2019-01-01 12:10

When your program is waiting for I/O (ie. a disk read/write or network read/write etc), the CPU is free to do other tasks even if your program is stopped. The speed of your program will mostly depend on how fast that IO can happen, and if you want to speed it up you will need to speed up the I/O.

If your program is running lots of program instructions and not waiting for I/O, then it is said to be CPU bound. Speeding up the CPU will make the program run faster.

In either case, the key to speeding up the program might not be to speed up the hardware, but to optimize the program to reduce the amount of IO or CPU it needs, or to have it do I/O while it also does CPU intensive stuff.

查看更多
初与友歌
5楼-- · 2019-01-01 12:14

It's pretty intuitive:

A program is CPU bound if it would go faster if the CPU were faster, i.e. it spends the majority of its time simply using the CPU (doing calculations). A program that computes new digits of π will typically be CPU-bound, it's just crunching numbers.

A program is I/O bound if it would go faster if the I/O subsystem was faster. Which exact I/O system is meant can vary; I typically associate it with disk, but of course networking or communication in general is common too. A program that looks through a huge file for some data might become I/O bound, since the bottleneck is then the reading of the data from disk (actually, this example is perhaps kind of old-fashioned these days with hundreds of MB/s coming in from SSDs).

查看更多
初与友歌
6楼-- · 2019-01-01 12:16

multi-threading is a case where the distinction matters as explained on the examples below.

RAM I/O bound example: Vector Sum

Consider a program that sums all the values of a single vector:

#define SIZE 1000000
unsigned int is[SIZE];
unsigned int sum = 0;
size_t i = 0;
for (i = 0; i < SIZE; i++)
    /* Each one of those requires a RAM access! */
    sum += is[i]

Parallelizing that by splitting the array equally for each of your cores is of limited usefulness on common modern desktops. C++ benchmark at: https://github.com/cirosantilli/algorithm-cheat/blob/ea16f6bba12e7dcc32c0cbbbcdc74bcc2fd2d05b/src/cpp/interactive/sum_array_parallel.cpp

Tested on GCC 5.2.1, Ubuntu 15.10 with a 4 core Intel i5-3210M, Lenovo T430. Sample typical results (variable since multi-threaded):

Time        N Threads   Comment
---------   ----------  --------
0.045962    none
0.0487619   1           Worse than 0 threads because of startup overhead.
0.0329526   2
0.0302511   3
0.0232993   4           Best time. Only about 2x as fast.
0.0281021   5           Worse than 4 threads because we don't have
                        that many cores, which generate overhead. 

The calculation was not 4x faster as expected with 4 threads!

The reason it is all processors share a single memory bus linking to RAM:

CPU 1 --\     Bus   +-----+
CPU 2 ---\__________| RAM |
CPU 3 ---/          +-----+
CPU 4 --/

so the memory bus quickly becomes the bottleneck, not the CPU.

This happens because adding two numbers takes a single CPU cycle, memory reads take about 100 CPU cycles in 2016 hardware.

So the CPU work done per byte of input data is too small, and we call this an IO-bound process.

The only way to speed up that computation further, would be to speed up individual memory accesses with new memory hardware, e.g. Multi-channel memory.

Upgrading to a faster CPU clock for example would not be very useful.

Other examples

  • matrix multiplication is CPU-bound on RAM and GPUs. The input contains:

    2 * N**2
    

    numbers, but:

    N ** 3
    

    multiplications are done, and that is enough for parallelization to be worth it for practical large N.

    This is why libraries like:

    exist.

    Cache usage makes a big difference to the speed of implementations. See for example this didactic GPU comparison example.

  • GPUs have an IO bottleneck in transferring data to the CPU.

    They are designed so that render output (a rectangle of pixels) can be output directly to the video memory, to avoid the CPU round-trip.

  • Networking is the prototypical IO-bound example.

    Even when we send a single byte of data, it still takes a large time to reach it's destination.

    Parallelizing small network requests like HTTP requests can offer a huge performance gains.

    If the network is already at full capacity (e.g. downloading a torrent), parallelization can still increase improve the latency (e.g. you can load a web page "at the same time").

  • A dummy C++ CPU bound operation that takes one number and crunches it a lot:

How to find out if you are CPU or IO bound

Non-RAM IO bound like disk, network: ps aux, then theck if CPU% / 100 < n threads. If yes, you are IO bound, e.g. blocking reads are just waiting for data and the scheduler is skipping that process. Then use further tools like sudo iotop to decide which IO is the problem exactly.

RAM-IO bound: hard to tell, as RAM wait time it is included in CPU% measurements. Maybe the best you can do is to estimate cache misses.

See also:

查看更多
回忆,回不去的记忆
7楼-- · 2019-01-01 12:22

CPU bound means the program is bottlenecked by the CPU, or central processing unit, while I/O bound means the program is bottlenecked by I/O, or input/output, such as reading or writing to disk, network, etc.

In general, when optimizing computer programs, one tries to seek out the bottleneck and eliminate it. Knowing that your program is CPU bound helps, so that one doesn't unnecessarily optimize something else.

[And by "bottleneck", I mean the thing that makes your program go slower than it otherwise would have.]

查看更多
登录 后发表回答