Why can't your switch statement data type be l

2019-01-10 19:42发布

问题:

Here's an excerpt from Sun's Java tutorials:

A switch works with the byte, short, char, and int primitive data types. It also works with enumerated types (discussed in Classes and Inheritance) and a few special classes that "wrap" certain primitive types: Character, Byte, Short, and Integer (discussed in Simple Data Objects).

There must be a good reason why the long primitive data type is not allowed. Anyone know what it is?

回答1:

I think to some extent it was probably an arbitrary decision based on typical use of switch.

A switch can essentially be implemented in two ways (or in principle, a combination): for a small number of cases, or ones whose values are widely dispersed, a switch essentially becomes the equivalent of a series of ifs on a temporary variable (the value being switched on must only be evaluated once). For a moderate number of cases that are more or less consecutive in value, a switch table is used (the TABLESWITCH instruction in Java), whereby the location to jump to is effectively looked up in a table.

Either of these methods could in principle use a long value rather than an integer. But I think it was probably just a practical decision to balance up the complexity of the instruction set and compiler with actual need: the cases where you really need to switch over a long are rare enough that it's acceptable to have to re-write as a series of IF statements, or work round in some other way (if the long values in question are close together, you can in your Java code switch over the int result of subtracting the lowest value).



回答2:

Because they didn't implement the necessary instructions in the bytecode and you really don't want to write that many cases, no matter how "production ready" your code is...

[EDIT: Extracted from comments on this answer, with some additions on background]

To be exact, 2³² is a lot of cases and any program with a method long enough to hold more than that is going to be utterly horrendous! In any language. (The longest function I know of in any code in any language is a little over 6k SLOC – yes, it's a big switch – and it's really unmanageable.) If you're really stuck with having a long where you should have only an int or less, then you've got two real alternatives.

  1. Use some variant on the theme of hash functions to compress the long into an int. The simplest one, only for use when you've got the type wrong, is to just cast! More useful would be to do this:

    (int) ((x&0xFFFFFFFF) ^ ((x >>> 32) & 0xFFFFFFFF))
    

    before switching on the result. You'll have to work out how to transform the cases that you're testing against too. But really, that's still horrible since it doesn't address the real problem of lots of cases.

  2. A much better solution if you're working with very large numbers of cases is to change your design to using a Map<Long,Runnable> or something similar so that you're looking up how to dispatch a particular value. This allows you to separate the cases into multiple files, which is much easier to manage when the case-count gets large, though it does get more complex to organize the registration of the host of implementation classes involved (annotations might help by allowing you to build the registration code automatically).

    FWIW, I did this many years ago (we switched to the newly-released J2SE 1.2 part way through the project) when building a custom bytecode engine for simulating massively parallel hardware (no, reusing the JVM would not have been suitable due to the radically different value and execution models involved) and it enormously simplified the code relative to the big switch that the C version of the code was using.

To reiterate the take-home message, wanting to switch on a long is an indication that either you've got the types wrong in your program or that you're building a system with that much variation involved that you should be using classes. Time for a rethink in either case.



回答3:

Because the lookup table index must be 32 bits.



回答4:

A long, in 32bit architectures, is represented by two words. Now, imagine what could happen if due to insufficient synchronization, the execution of the switch statement observes a long with its high 32 bits from one write, and the 32 low ones from another! It could try to go to ....who knows where! Basically somewhere at random. Even if both writes represented valid cases for the switch statement, their funny combination would probably lead neither to the first nor to the second -- or extremely worse, it could lead to another valid, but unrelated case!

At least with an int (or lesser types), no matter how badly you mess up, the switch statement will at least read a value that someone actually wrote, instead of a value "out of thin air".

Of course, I don't know the actual reason (it's been more than 15 years, I haven't been paying attention that long!), but if you realize how unsafe and unpredictable such a construct could be, you'll agree that this is a definitely very good reason not to ever have a switch on longs (and as long -pun intended- there will be 32bit machines, this reason will remain valid).