In Ruby, finding out if a string is in an array (.include? x
) is very slow. If you change that array into a set, then BAM, lightning fast lookups.
In JavaScript, where there are no sets, array lookups (.indexOf(x) >= 0
) are also very slow, however I need to be doing 10,000s of these lookups in a script.
My Ruby version (with sets) runs in 0.125
seconds, my JavaScript version (in NodeJS) takes 29
!
Is there a any set library or better way to perform array lookups that could get the Javascript speed near the Ruby?
Edit: Changed "objects" to "strings" to clear up any confusion
First of all, there is some fundamental confusion here about what data structures are available in JavaScript.
JavaScript has no arrays
Fundamentally, JavaScript has only hash-tables. The standard Array
function constructs hash-tables (I will call these integer hash-tables, or int-hash-tables) where the keys are integers in addition to string keys. These perform similarly to arrays, but they differ in certain ways. There are cons and pros. For example, deleting an element from int-hash-table is an O(1) operation, while deleting an element from array is an O(n) operation (because you need to copy the rest of the elements into the new array). This is why Array.prototype.splice
function in JavaScript is very fast. The downside is the complexity of implementation.
So, when you say Array
in JavaScript context it is understood as int-hash-table, and all the asymptotic complexity associated with it. This means that if you wanted to find a string value inside an int-hash-table, then it would be an O(n) operation. There is a standard function for doing it: Array.prototype.indexOf
. However, if you wanted to look for the key, then there are two functions: in
and Object.prototype.hasOwnProperty
.
Somewhat counterintuitively:
[1, 2, 3].hasOwnProperty(0); // true
0 in [1, 2, 3]; // true
The difference between the two needs further explaining. It is related to the fact that all things in JavaScript are objects, and thus they have some object-y features. One such features the prototype
, the link between the object and it's prototype. It is a hierarchical structure of hash-tables, each containing properties of objects.
in
looks up the immediate hash-table of the object and then recursively searches the hash-tables of the prototypes of this objects.
Whereas Object.prototype.hasOwnProperty
only looks into the immediate hash-table. You might think it should be faster, but wait jumping to conclusions.
Due to the dynamic nature of JavaScript all function calls are dynamic and the environment must take a lot of care to ensure fail-safe code execution. This means that in JavaScript function calls are very expensive. So, going through Object.prototype.hasOwnProperty
may be a lot more expensive then going through in
, even though theoretically it should be the opposite. However, given tall-enough inheritance tree and enough of inherited properties, eventually, Object.prototype.hasOwnProperty
will take over.
Some examples to get a better intuition:
>>> var array = [1, 2, 3];
undefined
>>> 3 in array;
false
>>> array.hasOwnProperty(3);
false
>>> 3 in array;
false
>>> array.__proto__ = [1, 2, 3, 4];
[1, 2, 3, 4]
>>> 3 in array;
true
>>> array.hasOwnProperty(3);
false
TL;DR
If you want fastest key lookup for objects with short prototype inheritance chain use in
.
If you want the same, but for the objects with extensive inheritance chain, use Object.prototype.hasOnwProperty
If you want the fastest value lookup, use Array.prototype.indexOf
for Array
.
There isn't a built in function for value lookup in hash-tables. You can, of course, roll your own, but there are many libraries that provide one already. For example, Underscore provides one (it calls it indexOf
).
From @nnnnnn in the comments:
Convert the array into an object, something like this:
object = {}
array.forEach(function(string) { // Not cross-browser compatible, it's just an example
object[string] = 1;
}
Then perform lookups like this:
if (string in object) {