I recently read about CPU instruction reordering for efficiency, but I can't understand how the CPU reorders its instructions. Compile-time reordering is conceivable, since the compiler can see the upcoming code. But a CPU reads instructions one after the other, so how does it see upcoming instructions in order to reorder them?
Instructions are fetched in program order into an instruction queue; from the queue they are decoded and moved into reservation stations. These reservation stations effectively do the reordering: each instruction is dispatched to an execution unit as soon as its arguments become available, and the order in which arguments become available generally does not match the order of the instructions in the queue or in memory.
For an example, using the Tomasulo Algorithm, see these two videos:
Issue (and register renaming): https://youtu.be/I2qMY0XvYHA?list=PLAwxTw4SYaPkNw98-MFodLzKgi6bYGjZs
Dispatch/reordering: https://youtu.be/bEB7sZTP8zc?list=PLAwxTw4SYaPkNw98-MFodLzKgi6bYGjZs
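The fetch, issue, and dispatch flow described above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not a model of any real CPU: the `Instr` type, the `run` function, the `window_size` parameter, and the latencies are all made up for the example, every register is written at most once (so no register renaming is needed), and there are unlimited execution units.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instr:
    name: str      # label used to report the dispatch order
    dest: str      # register written by this instruction
    srcs: tuple    # registers that must be ready before dispatch
    latency: int   # execution time in cycles (illustrative value)


def run(program, window_size=4):
    """Fetch in program order, dispatch when operands are ready.

    Returns the instruction names in the order they were dispatched.
    Assumes each register is written by at most one instruction.
    """
    queue = deque(program)                 # instruction queue, in program order
    stations = []                          # reservation stations (waiting instructions)
    in_flight = []                         # (finish_cycle, instr) pairs being executed
    pending = {i.dest for i in program}    # registers whose value is not produced yet
    dispatch_order = []
    cycle = 0
    while queue or stations or in_flight:
        # 1. Write-back: results of finished instructions become available.
        for finish, ins in list(in_flight):
            if finish <= cycle:
                in_flight.remove((finish, ins))
                pending.discard(ins.dest)
        # 2. Issue: move instructions, in program order, into free stations.
        while queue and len(stations) < window_size:
            stations.append(queue.popleft())
        # 3. Dispatch: every waiting instruction whose sources are ready starts now.
        for ins in list(stations):
            if all(src not in pending for src in ins.srcs):
                stations.remove(ins)
                in_flight.append((cycle + ins.latency, ins))
                dispatch_order.append(ins.name)
        cycle += 1
    return dispatch_order


program = [
    Instr("i1", "r1", (), 5),        # slow load into r1
    Instr("i2", "r2", ("r1",), 1),   # r2 = r1 + 1, must wait for i1
    Instr("i3", "r3", (), 1),        # fast load into r3
    Instr("i4", "r4", ("r3",), 1),   # r4 = r3 + 1, must wait for i3
]

print(run(program))                  # dispatch order with a 4-entry window
print(run(program, window_size=1))   # dispatch order with a 1-entry window
```

With a four-entry window, `i3` and `i4` are dispatched while `i2` is still waiting for the slow load, so the dispatch order differs from program order. With a one-entry window the CPU can only ever "see" one waiting instruction, and the order collapses back to program order. That window of already-fetched instructions is what lets the hardware look ahead.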
The instructions are decoded in order, but they then go into a collection of "in progress" instructions. Instructions can make forward progress if their dependencies are met.
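For example, say the instructions are:
1. Read memory into register A.
2. Read memory into register B.
3. Increment register A.
4. Increment register B.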
It may be that the last two instructions are in progress at the same time. If the memory read for register B completes first (maybe the data was already in the L1 cache), then the increment of register B will take place before the increment of register A (though, of course, only after the increment instruction itself has been decoded).
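To put rough numbers on that scenario, here is a minimal sketch. It assumes the sequence is "load A, load B, increment A, increment B", that one instruction is decoded per cycle, and that the latencies are illustrative (4 cycles for an L1 hit, 100 for a miss). The function name `completion_times` and all the values are hypothetical and not taken from any particular CPU.

```python
def completion_times(latency_load_a, latency_load_b, latency_inc=1):
    """Return {instruction: cycle at which it completes}.

    Assumption: instructions are decoded in order, one per cycle,
    so the k-th instruction is decoded at cycle k.
    """
    decoded = {"load A": 0, "load B": 1, "inc A": 2, "inc B": 3}
    done = {}
    # Loads start as soon as they are decoded.
    done["load A"] = decoded["load A"] + latency_load_a
    done["load B"] = decoded["load B"] + latency_load_b
    # An increment starts once it is decoded AND its load has delivered the value.
    done["inc A"] = max(decoded["inc A"], done["load A"]) + latency_inc
    done["inc B"] = max(decoded["inc B"], done["load B"]) + latency_inc
    return done


# A misses the cache, B hits in L1: "inc B" finishes long before "inc A".
miss_then_hit = completion_times(latency_load_a=100, latency_load_b=4)
print(miss_then_hit)

# Both loads hit in L1: the increments finish in program order.
both_hit = completion_times(latency_load_a=4, latency_load_b=4)
print(both_hit)
```

The `max(...)` is the whole point: an increment can never run before it has been decoded, but after that it only waits for its own load, not for earlier unrelated instructions.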