I often want to loop over a long array or column of a dataframe, and for each item, see if it is a member of another array. Rather than doing
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]
isin = falses(size(giant_list,1))
for i=1:size(giant_list,1)
    isin[i] = giant_list[i] in good_letters
end
Is there any vectorized (doubly-vectorized?) way to do this in Julia? In analogy with the basic operators, I want to do something like
isin = giant_list .in good_letters
I realize this may not be possible, but I just wanted to make sure I wasn't missing something. I know I could probably use DefaultDict from DataStructures to do something similar, but I don't know of anything in Base.
findin() doesn't give you a boolean mask, but you can easily use it to subset an array/DataFrame for values that are contained in another array.
The indexin function does something similar to what you want. Since you want a boolean for each element in your giant_list (instead of the index in good_letters), you can simply check whether each element was found.
The implementation of indexin is very straightforward, and points the way to how you might optimize this if you don't care about the indices in b.
Only a limited set of names may be used as infix operators, so it's not possible to use a custom-named function as an infix operator.
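The code from these answers did not survive, so here is a minimal sketch of the ideas they describe, written for Julia 1.x rather than the version the answers targeted. It assumes that findall with a predicate stands in for the older findin, and that indexin marks missing elements with nothing (older releases returned 0). The helper name vectorin is hypothetical and not part of Base.

```julia
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# Indices of the elements of giant_list that occur in good_letters
# (Julia 1.x counterpart of the older findin), used here to subset.
idx = findall(in(good_letters), giant_list)
subset = giant_list[idx]

# indexin gives, for each element of giant_list, its position in good_letters,
# or `nothing` when the element is absent.
positions = indexin(giant_list, good_letters)
mask = [p !== nothing for p in positions]

# Hypothetical helper: when the positions are not needed, build a Set once
# and test membership against it.
function vectorin(a, b)
    bset = Set(b)
    return [x in bset for x in a]
end

mask_from_set = vectorin(giant_list, good_letters)
```

The Set-based version avoids scanning good_letters for every element, which is likely to matter when both collections are large.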
You can vectorize the in function quite easily in Julia v0.6, using the unified broadcasting syntax. good_letters has to be treated as a scalar, which can be done by wrapping it in a one-element tuple. Alternatively, you can use a Scalar type such as the one introduced in StaticArrays.jl.
Julia v0.5 supports the same syntax, but requires a specialized function for scalarification (or the Scalar type mentioned earlier), after which the same broadcast call works.
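As a rough illustration of the one-element tuple trick, the sketch below broadcasts in over giant_list while the whole of good_letters is passed to every call. It is written against current Julia, where the same dot syntax still applies. The v0.5 scalarification helper is not reproduced, since its definition is missing from the text.

```julia
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# The tuple (good_letters,) has length 1, so broadcasting reuses its single
# element (the whole good_letters array) for every element of giant_list.
isin = in.(giant_list, (good_letters,))
```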
There are a handful of modern (i.e. Julia v1.0) solutions to this problem:
First, an update to the scalar strategy. Rather than using a 1-element tuple or array, scalar broadcasting can be achieved using a Ref object. This same result can be achieved by broadcasting the infix ∈ (\in TAB) operator.
Additionally, calling in with one argument creates a Base.Fix2, which may later be applied via a broadcasted call. This seems to have limited benefits compared to simply defining a function, though.
All in all, using .∈ with a Ref will probably lead to the shortest, cleanest code.
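The three Julia v1.0 variants described above might look roughly like this. It is a sketch that reuses the arrays from the question, and the variable names are chosen only for illustration.

```julia
giant_list = ["a", "c", "j"]
good_letters = ["a", "b"]

# Ref makes good_letters act as a scalar under broadcasting.
mask_ref = in.(giant_list, Ref(good_letters))

# The same call written with the broadcasted infix operator.
mask_infix = giant_list .∈ Ref(good_letters)

# in with one argument returns a Base.Fix2, a callable that tests
# membership in good_letters. It can then be broadcast like any function.
is_good = in(good_letters)
mask_fix = is_good.(giant_list)
```

Without the Ref, broadcasting would try to pair the elements of the two arrays one by one instead of testing each element against the whole of good_letters.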