Approximate cost to access various caches and main

2019-01-01 08:01发布

Can anyone give me the approximate time (in nanoseconds) to access L1, L2 and L3 caches, as well as main memory on Intel i7 processors?

While this isn't specifically a programming question, knowing these kinds of speed details is neccessary for some low-latency programming challenges.

109条回答
其实,你不懂
2楼-- · 2019-01-01 08:35
|| | / \ 1.hW ^|^|^|^|^|^|^|^|^|^|^|^|^| <wait>-s ^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^| | / \ 2.hW |^|^|^|^|^|^|^|^|^|^|^|^|^ |^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^|^ |_______________/ \______I|I|I|I|I|I|I|I|I|I|I|I|I|~~~~~~~~~~I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I|I| |~~~~~~~~~~~~~~/ SM:0.warpScheduler /~~~~~~~I~I~I~I~I~I~I~I~I~I~I~I~I~~~~~~~~~~~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I~I | \ | // | \ RR-mode // | \ GREEDY-mode // | \________________// | \______________/SM:0__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:1__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:2__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:3__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:4__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:5__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:6__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:7__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:8__________________________________________________________________________________ | | |t[ F_______^___________WARP__:________]_______ | ..|SM:9__________________________________________________________________________________ | ..|SM:A |t[ F_______^___________WARP__:________]_______ | ..|SM:B |t[ F_______^___________WARP__:________]_______ | ..|SM:C |t[ F_______^___________WARP__:________]_______ | ..|SM:D |t[ F_______^___________WARP__:________]_______ | |_______________________________________________________________________________________ */

The bottom line?

Any low-latency motivated design has to rather reverse-engineer the "I/O-hydraulics" ( as 0 1-XFERs are incompressible by the nature ) and the resulting latencies rule the performance envelope for any GPGPU solution be it computationally intensive ( read: where processing costs are forgiving a bit more a poor latency XFERs ... ) or not ( read: where ( might be to someone's surprise ) CPU-s are faster in end-to-end processing, than GPU fabrics [citations available] ).

查看更多
登录 后发表回答