Is -1 a magic number? An anti-pattern? A code smel

2019-03-18 12:25发布

Possible Duplicate:
Constant abuse?

I've seen -1 used in various APIs, most commonly when searching into a "collection" with zero-based indices, usually to indicate the "not found" index. This "works" because -1 is never a legal index to begin with. It seems that any negative number should work, but I think -1 is almost always used, as some sort of (unwritten?) convention.

I would like to limit the scope to Java at least for now. My questions are:

  • What are the official words from Sun regarding using -1 as a "special" return value like this?
  • What quotes are there regarding this issue, from e.g. James Gosling, Josh Bloch, or even other authoritative figures outside of Java?
  • What were some of the notable discussions regarding this issue in the past?

10条回答
神经病院院长
2楼-- · 2019-03-18 12:39

The same practice as with null applies to -1. Its been discussed many times.

e.g. Java api design - NULL or Exception

查看更多
狗以群分
3楼-- · 2019-03-18 12:40

Both Java and JavaScript use -1 when an index isn't found. Since the index is always 0-n it seems a pretty obvious choice.

//JavaScript
var url = 'example.com/foo?bar&admin=true';
if(url.indexOf('&admin') != -1){
  alert('we likely have an insecure app!');
}

I find this approach (which I've used when extending Array-type elements to have a .indexOf() method) to be quite normal.

On the other hand, you can try the PHP approach e.g. strpos() but IMHO it gets confusing as there are multiple return types (it returns FALSE when not found)

查看更多
叛逆
4楼-- · 2019-03-18 12:44

-1 as a return value is slightly ugly but necessary. The alternatives to signal a "not found" condition are IMHO all much worse:

  • You could throw an Exception, but this isn't ideal because Exceptions are best used to signal unexpected conditions that require some form of recovery or propagated failure. Not finding an occurrence of a substring is actually pretty expected. Also Exception throwing has a significant performance penalty.

  • You could use a compound result object with (found,index) but this requires an object allocation and more complex code on the part of the caller to inspect the result.

  • You could separate out two separate function calls for contains and indexOf - however this is again quite cumbersome for the caller and also results in a performance hit as both calls would be O(n) and require a full traversal of the String.

Personally, I never like to refer to the -1 constant: my test for not-found is always something like:

int i = someString.indexOf("substring");
if (i>=0) {
  // do stuff with found index
} else {
  // handle not found case
}
查看更多
何必那么认真
5楼-- · 2019-03-18 12:45

Its used because its the first invalid value you encounter in 0-based arrays. As you know, not all types can hold null or nothing so need "something" to signify nothing.

I would say its not official, it has just become convention (unwritten) because its very sensible for the situation. Personally, I wouldn't also call it an issue. API design is also down to the author, but guidelines can be found online.

查看更多
Melony?
6楼-- · 2019-03-18 12:45

As far as I know, such values are called sentinel values, although most common definitions differ slightly from this scenario.

Languages such as Java chose to not support passing by reference (which I think is a good idea), so while the values of individual arguments are mutable, the variables passed to a function remain unaffected. As a consequence of this, you can only have one return value of only one type. So what you do is to chose an otherwise invalid value of a valid type, and return it to transport additional semantics, because the return value is not actually the return value of the operation but a special signal.

Now I guess, the cleanest approach would be to have a contains and an indexOf method, the second of which would throw an exception, if the element you're asking for is not in the collection. Why? Because one would expect the following to be true:

someCollection.objectAtIndex(someCollection.indexOf(someObject)) == someObject

What you're likely to get is an exception because -1 is out of bounds, while the actual reason why this plausible relation is not true is, that someObject is not an element of someCollection, and that is why the inner call should raise the exception.

Now as clean and robust, as this may be, it has two key flaws:

  • Usually both operations would usually cost you O(n) (unless you have an inverse map within the collection), so you're better off if you do just one.
  • It is really quite verbose.

In the end, it's up to you to decide. This is a matter of philosophy. I'd call it a "semantic hack" to achieve both shortness & speed at the cost of robustness. Your call ;)

greetz
back2dos

查看更多
手持菜刀,她持情操
7楼-- · 2019-03-18 12:49

This is a common idiom in languages where the types do not include range checks. An "out of bounds" value is used to indicate one of several conditions. Here, the return value indicates two things: 1) was the character found, and 2) where was it found. The use of -1 for not found and a non-negative index for found succinctly encodes both of these into one value, and the fact that not-found does not need to return an index.

In a language with strict range checking, such as Ada or Pascal, the method might be implemented as (pseudo code)

   bool indexOf(c:char, position:out Positive);

Positive is a subtype of int, but restricted to non-negative values.

This separates the found/not-found flag from the position. The position is provided as an out parameter - essentialy another return value. It could also be an in-out parameter, to start the search from a given position. Use of -1 to indicate not-found would not be allowed here since it violates range checks on the Positive type.

The alternatives in java are:

  • throw an exception: this is not a good choice here, since not finding a character is not an exceptional condition.
  • split the result into several methods, e.g. boolean indexOf(char c); int lastFoundIndex();. This implies the object must hold on to state, which will not work in a concurrent program, unless the state is stored in thread-local storage, or synchronization is used - all considerable overheads.
  • return the position and found flag separately: such as boolean indexOf(char c, Position pos). Here, creating the position object may be seen as unnecessary overhead.
  • create a multi-value return type

such as

class FindIndex {
   boolean found;
   int position;
}

FindIndex indexOf(char c);

although it clearly separates the return values, it suffers object creation overhead. Some of that could be mitigated by passing the FindIndex as a parameter, e.g.

FindIndex indexOf(char c, FindIndex start);

Incidentally, multiple return values were going to be part of java (oak), but were axed prior to 1.0 to cut time to release. James Gosling says he wishes they had been included. It's still a wished-for feature.

My take is that use of magic values are a practical way of encoding a multi-valued results (a flag and a value) in a single return value, without requiring excessive object creation overhead.

However, if using magic values, it's much nicer to work with if they are consistent across related api calls. For example,

   // get everything after the first c
   int index = str.indexOf('c');
   String afterC = str.substring(index);

Java falls short here, since the use of -1 in the call to substring will cause an IndeOutOfBoundsException. Instead, it might have been more consistent for substring to return "" when invoked with -1, if negative values are considered to start at the end of the string. Critics of magic values for error conditions say that the return value can be ignored (or assumed to be positive). A consistent api that handles these magic values in a useful way would reduce the need to check for -1 and allow for cleaner code.

查看更多
登录 后发表回答