c switch and jump tables

2019-06-17 03:15发布

问题:

It is my understanding that a switch statement in c/c++ will sometimes compile to a jump table. My question is, are there any thumb rules to assure that?

In my case I'm doing something like this:

enum myenum{
MY_CASE0= 0,
MY_CASE0= 1, 
.
.
.
};

switch(foo)
{
  case MY_CASE0:
  //do stuff
  break;
  case MY_CASE1:
  //do stuff
  break;
 .
 .
 .
}

I cover all the cases from 1 to n by order. Is safe to assume it will compile to a jump table? The original code was a long and messy if else statement, so at the very least I gain some readability.

回答1:

A good compiler can and will choose between a jump table, a chained if/else or a combination. A poorly designed compiler may not make such a choice - and may even produce very bad code for switch-blocks. But any decent compiler should produce efficient code for switch-blocks. T

he major decision factor here is that the compiler may choose if/else when the numbers are far apart [and not trivially (e.g. dividing by 2, 4, 8, 16, 256 etc) changed to a closer value], e.g.

 switch(x)
 {
    case 1:
     ...
    case 4912:
     ...
    case 11211:
     ...
    case 19102:
     ...
 }

would require a jump table of at least 19102 * 2 bytes.

On the other hand, if the numbers are close together, the compiler will typically use a jumptable.

Even if it's a if/else type of design, it will typically do a "binary search" - if we take the above example:

 if (x <= 4912)
 {
     if (x == 1)
     {
        ....
     }
     else if (x == 4912)
     {
         .... 
     }
 } else {
     if (x == 11211)
     {
         ....
     }
     else if (x == 19102)
     {
         ...
     }
 }

If we have LOTS of cases, this approach will nest quite deep, and humans will probably get lost after three or four levels of depth (bearing in mind that each if starts at some point in the MIDDLE of the range), but it reduces the number of tests by a log2(n) where n is the number of choices. It is certainly a lot more efficient than the naive approach of

if (x == first value) ... 
else if (x == second value) ... 
else if (x == third value) ... 
..
else if (x == nth value) ... 
else ... 

This can be slightly better if certain values are put at the beginning of the if-else chain, but that assumes you can determine what is the most common before running the code.

If performance is CRITICAL to your case, then you need to benchmark the two alternatives. But my guess is that just writing the code as a switch will make the code much clearer, and at the same time run at least as fast, if not faster.



回答2:

Compilers can certainly convert any C/C++ switch into a jump table, but a compiler would do this for efficiency. Ask yourself, what would I do if I were writing a compiler and I had just build a parse tree for a switch/case statement? I have studied compiler design and construction, and here are some of the decisions,

How to help a compiler decide to implement a jump table:

  • case values are small integers (0,1,2,3,...)
  • case values are in a compact range (few holes, remember default is an option)
  • there are enough cases to make the optimization worthwhile (> N, examine your compiler source to find the constant)
  • clever compilers may subtract/add a constant to a jumptable index if the range is compact (example: 1000, 1001, 1002, 1003, 1004, 1005, etc)
  • avoid fallthrough and transfer of control (goto, continue)
  • only one break at end of each case

Though the mechanics may differ between compilers, the compiler is essentially creating unnamed functions (well, maybe not a function, because the compiler may use jump into the code block and jump outof the code block, or may be clever and use jsr and return)

The certain way to get a jump table is to write it. It is an array of pointers to functions, indexed by the value you want.

How?

Define a typedef for your function pointer, Understanding typedefs for function pointers in C,

typedef void (*FunkPtr)(double a1, double a2);

FunkPtr JumpTable[] = {
    function_name_0,
    function_name_1,
    function_name_2,
    ...
    function_name_n
};

Of course, you have already defined function_name_{0..n}, so the compiler can find the address of the function to evoke.

I will leave evocation of the function pointer and boundary checking as an exercise for the reader.