Why isn't RDTSC a serializing instruction?

The Intel manuals for the RDTSC instruction warn that out of order execution can change when RDTSC is actually executed, so they recommend inserting a CPUID instruction in front of it because CPUID will serialize the instruction stream (CPUID is never executed out of order). My question is simple: if they had the ability to make instructions serializing, why didn't they make RDTSC serializing? The entire point of it appears to be to get cycle accurate timings. Is there a situation under which you would not want to precede it with a serializing instruction?
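The CPUID-before-RDTSC recommendation described above might look like this in C. This is a sketch, not Intel's reference code. It assumes GCC or Clang on x86-64, using GCC-style inline asm for CPUID and the `__rdtsc` intrinsic from `<x86intrin.h>`. The names `serialize_with_cpuid`, `serialized_rdtsc`, `sum_to` and `time_sum_to` are hypothetical. On modern CPUs the TSC ticks at a constant reference rate, so the result is in TSC ticks rather than core clock cycles.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc (GCC/Clang, x86-64) */

/* Run CPUID leaf 0 only for its serializing side effect. The asm is
 * volatile with a "memory" clobber so the compiler neither deletes it
 * nor moves memory accesses across it. */
static inline void serialize_with_cpuid(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

/* CPUID first, so RDTSC cannot execute before earlier instructions finish. */
static inline uint64_t serialized_rdtsc(void)
{
    serialize_with_cpuid();
    return __rdtsc();
}

/* Hypothetical workload; volatile keeps the loop from being optimized away. */
static uint64_t sum_to(uint64_t n)
{
    volatile uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++)
        acc += i;
    return acc;
}

/* TSC ticks elapsed around sum_to(n), bracketed by CPUID+RDTSC. */
uint64_t time_sum_to(uint64_t n)
{
    uint64_t start = serialized_rdtsc();
    (void)sum_to(n);
    uint64_t end = serialized_rdtsc();
    return end - start;
}
```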

Newer Intel CPUs have a separate RDTSCP instruction that waits for all earlier instructions to execute before it reads the counter. Intel opted to introduce a separate instruction rather than change the behavior of RDTSC, which suggests to me that there has to be some situation where a potentially out-of-order timing is what you want. What is it?

Tags: performance x86 x86-64 cpu-architecture cpu-cycles

4 answers

Emotional °昔

Post #2 · 2019-01-17 15:45

Because the time stamp counter was, from memory, introduced on the Pentium.

Out-of-order execution didn't show up until the Pentium Pro, at which point it was too late to change what the instruction did.

That's actually confirmed (obliquely) in the document you linked, with the following comment about the Pentium and Pentium/MMX (in section 4.2, slightly paraphrased):

All of the rules and code samples described in section 4.1 (Pentium Pro and Pentium II) also apply to the Pentium and Pentium/MMX. The only difference is, the CPUID instruction is not necessary for serialization.

And, from Wikipedia:

The Time Stamp Counter is a 64-bit register present on all x86 processors since the Pentium.

[...]

Starting with the Pentium Pro, Intel processors have supported out-of-order execution, where instructions are not necessarily performed in the order they appear in the executable. This can cause RDTSC to be executed later than expected, producing a misleading cycle count.

And, from what I understand, the primary use of RDTSCP (on Intel, from the Core i7 onwards) is to also return the contents of the IA32_TSC_AUX register, which operating systems typically set to the processor ID, since each processor maintains an independent TSC. It also waits for earlier instructions to execute, but I see that more as a simple "bug fix" over the older instruction.
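A minimal sketch of getting both values from a single RDTSCP, assuming GCC or Clang on x86-64 and the `__rdtscp` intrinsic from `<x86intrin.h>`. The decoding helpers assume the Linux convention of storing `(node << 12) | cpu` in IA32_TSC_AUX; other operating systems may store something different. The names `tsc_sample`, `read_tsc_and_aux`, `aux_cpu` and `aux_node` are made up for this illustration.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp (GCC/Clang, x86-64) */

/* One RDTSCP result: the counter plus the IA32_TSC_AUX value that the
 * same instruction loaded into ECX. */
typedef struct {
    uint64_t tsc;
    unsigned int aux;
} tsc_sample;

static tsc_sample read_tsc_and_aux(void)
{
    tsc_sample s;
    s.tsc = __rdtscp(&s.aux);
    return s;
}

/* Assumption: Linux stores (numa_node << 12) | cpu_number in IA32_TSC_AUX.
 * Other operating systems may use a different encoding, or none. */
static unsigned int aux_cpu(unsigned int aux)  { return aux & 0xfffu; }
static unsigned int aux_node(unsigned int aux) { return aux >> 12; }
```

Comparing `aux_cpu()` for the start and end samples of a measurement is one way to notice that the thread migrated to another core in between.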


Fickle 薄情

Post #3 · 2019-01-17 15:50

why didn't they make RDTSC serializing? The entire point of it appears to be to get cycle accurate timings

Well, most of the time it's to get high-resolution timestamps. At least some of the time, these timestamps are used for performance metrics. Making the instruction serializing would likely require a pipeline flush, which can be very expensive for CPU-bound applications.
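To get a rough feel for that cost, the sketch below compares back-to-back timestamp reads with no barrier, with LFENCE, and with CPUID in front of RDTSC. It assumes GCC or Clang on x86-64; the enum and function names are hypothetical. Results vary a lot between CPUs, and CPUID is far more expensive inside a virtual machine, where it traps to the hypervisor, so treat any numbers as indicative only.

```c
#include <stdint.h>
#include <emmintrin.h>  /* _mm_lfence (SSE2) */
#include <x86intrin.h>  /* __rdtsc */

/* CPUID leaf 0 used purely as a serializing barrier. */
static inline void cpuid_barrier(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

typedef enum { PLAIN_RDTSC, LFENCE_RDTSC, CPUID_RDTSC } read_kind;

/* One timestamp read, optionally preceded by a barrier. */
static uint64_t read_tsc(read_kind kind)
{
    switch (kind) {
    case LFENCE_RDTSC: _mm_lfence(); break;
    case CPUID_RDTSC:  cpuid_barrier(); break;
    case PLAIN_RDTSC:  break;
    }
    return __rdtsc();
}

/* Average TSC ticks per read over `iterations` back-to-back reads. */
uint64_t avg_read_cost(read_kind kind, unsigned int iterations)
{
    uint64_t start = read_tsc(kind);
    for (unsigned int i = 0; i < iterations; i++)
        (void)read_tsc(kind);
    uint64_t end = read_tsc(kind);
    return (end - start) / ((uint64_t)iterations + 1u);
}
```

The loop measures the throughput of repeated reads rather than the latency of a single read, which is enough to show the relative cost of each barrier on a given machine.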

Intel opted to introduce a separate instruction rather than change the behavior of RDTSC, which suggests to me that there has to be some situation where a potentially out of order timing is what you want.

Changing the behavior is almost always undesirable. Intel's customers would be disappointed to find out that RDTSC does something different on newer parts.


Fickle 薄情

Post #4 · 2019-01-17 15:58

If you are trying to use rdtsc to see if a branch mispredicts, the non-serializing version is what you want.

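; math here (sets ZF)
rdtsc               ; first timestamp in EDX:EAX; RDTSC leaves the flags alone
mov  esi, eax       ; save the low 32 bits of the first timestamp
jz   done           ; branch if zero
; do some work that always takes 1 cycle
done: rdtsc         ; second timestamp in EDX:EAX
sub  eax, esi       ; low 32 bits of the delta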

If the branch is predicted correctly, the delta will be small (maybe even negative?). If the branch is mispredicted, the delta will be large.

With the serializing version, the branch condition will be resolved because the first rdtsc waits for the math to finish.


叛逆

Post #5 · 2019-01-17 16:00

As paxdiablo explains, RDTSC was defined without any serializing behavior because it was first implemented on an in-order CPU, where none was needed. Adding that behavior later would change the memory access behavior of code using it, and thus be incompatible for some purposes.

Instead, more recent CPUs have a related RDTSCP instruction, introduced for exactly this reason. It is not formally a serializing instruction, but it waits until all instructions issued before it have executed and all earlier loads are globally visible before reading the counter (later instructions may still begin executing before it). Use that if you are running on modern CPUs.
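A commonly used pattern pairs LFENCE; RDTSC; LFENCE at the start of a timed region with RDTSCP; LFENCE at the end. The sketch below assumes GCC or Clang on x86-64 and an Intel CPU, where LFENCE is documented to hold back later instructions until earlier ones have completed locally; on AMD this behavior can depend on processor model and OS configuration. `tsc_begin`, `tsc_end`, `busy_work` and `time_busy_work` are hypothetical names.

```c
#include <stdint.h>
#include <emmintrin.h>  /* _mm_lfence (SSE2) */
#include <x86intrin.h>  /* __rdtsc, __rdtscp */

/* Start of a timed region: the first LFENCE keeps RDTSC from running
 * before earlier instructions; the second keeps the timed code from
 * starting before the counter has been read. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* End of a timed region: RDTSCP already waits for earlier instructions
 * to execute; the trailing LFENCE keeps later code from starting before
 * the counter has been read. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}

/* Hypothetical workload to time. */
static uint64_t busy_work(uint64_t n)
{
    volatile uint64_t x = 1;
    for (uint64_t i = 0; i < n; i++)
        x = x * 3 + i;
    return x;
}

/* TSC ticks elapsed around busy_work(n). */
uint64_t time_busy_work(uint64_t n)
{
    uint64_t start = tsc_begin();
    (void)busy_work(n);
    uint64_t end = tsc_end();
    return end - start;
}
```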

