@inbounds propagation rules in Julia

2019-03-27 16:33发布

问题:

I'm looking for some clarification of the bounds checking rules in Julia. Is this meaning that if I put @inbounds at the beginning of the for loop,

@inbounds for ... end

then only for "one layer" inbounds propagates, so if there is a for loop inside of that, @inbounds will not turn off the bounds checking in there? And if I use @propagate_inbounds, it will go inside the nested for loop?

And is it correct to say @inbounds always wins over @boundscheck? The only exception if the function is not inlined, but is that just a case of the previous "one layer" rule, so @propagate_inbounds would turn off the bounds checking even in the non-inlined function call?

回答1:

When the manual speaks about @inbounds propagating through "one layer," it's specifically referring to function call boundaries. The fact that it's only able to affect functions that get inlined is a secondary requirement that makes this especially confusing and tough to test, so let's not worry about inlining until later.

The @inbounds macro annotates function calls such that they're able to elide bounds checks. In fact, the macro will do this for all function calls in the expression that is passed to it, including any number of nested for loops, begin blocks, if statements, etc. And, of course, indexing and indexed assignment are simply "sugars" that lower to function calls, so it affects those the same way. All this makes sense; as the author of the code that's wrapped by @inbounds, you're able to see the macro and ensure that it's safe to do so.

But the @inbounds macro tells Julia to do something funny. It changes the behavior of code that's written in a totally different place! For example when you annotate the call:

julia> f() = @inbounds return getindex(4:5, 10);
       f()
13

The macro effectively reaches into the standard library and disables that @boundscheck block, allowing it to compute values outside of the range's valid region.

This is a spooky action at a distance… and if it's not carefully constrained, it could end up removing bounds-checks from library code where it's not intended or fully safe to do so. That's why there's the "one-layer" restriction; we only want to remove bounds checks when authors are explicitly aware that it might occur and opt-in to the removal.

Now, as a library author, there may be cases where you want to opt-in to allow @inbounds to propagate through to all functions you call within the method. That's where Base.@propagate_inbounds is used. Unlike @inbounds, which annotates function calls, @propagate_inbounds annotates method definitions to allow for the inbounds state that the method gets called with to propagate through to all function calls you make in the method's implementation. This is a bit tough to describe in the abstract, so let's look at a concrete example.

An Example

Let's create a toy custom vector that simply creates a shuffled view into the vector it wraps:

julia> module M
       immutable ShuffledVector{A,T} <: AbstractVector{T}
           data::A
           perm::Vector{Int}
       end
       ShuffledVector{T}(A::AbstractVector{T}) = ShuffledVector{typeof(A), T}(A, randperm(length(A)))
       Base.size(A::ShuffledVector) = size(A.data)
       @inline function Base.getindex(A::ShuffledVector, i::Int)
           A.data[A.perm[i]]
       end
       end

This is pretty straight-forward — we wrap any vector type, create a random permutation, and then upon indexing we just index into the original array using the permutation. And we know that all accesses into the subparts of the array should be okay based upon the outer constructor… so even though we aren't checking bounds ourselves, we can rely upon the inner indexing expressions throwing errors if we index out of bounds.

julia> s = M.ShuffledVector(1:4)
4-element M.ShuffledVector{UnitRange{Int64},Int64}:
 1
 2
 4
 3

julia> s[5]
ERROR: BoundsError: attempt to access 4-element Array{Int64,1} at index [5]
 in getindex(::M.ShuffledVector{UnitRange{Int64},Int64}, ::Int64) at ./REPL[5]:9

Note how the bounds error is coming not from the indexing into the ShuffledVector, but rather from indexing into the permutation vector A.perm[5]. Now perhaps a user of our ShuffledVector wants its accesses to be faster, so they try turning off bounds-checking with @inbounds:

julia> f(A, i) = @inbounds return A[i]
       f(s, 5)
ERROR: BoundsError: attempt to access 4-element Array{Int64,1} at index [5]
 in getindex at ./REPL[5]:9 [inlined]
 in f(::M.ShuffledVector{UnitRange{Int64},Int64}, ::Int64) at ./REPL[15]:1

But they're still getting bounds errors! This is because @inbounds annotation only tried to remove the @boundscheck blocks from the method we wrote above. It doesn't propagate through to the standard library to remove the bounds-checking from either the A.perm array nor the A.data range. That's quite a bit of overhead, even though they tried to remove bounds! So, we can instead write the above getindex method with a Base.@propagate_inbounds annotation which will allow for this method to "inherit" its caller's in-bounds state:

julia> module M
       immutable ShuffledVector{A,T} <: AbstractVector{T}
           data::A
           shuffle::Vector{Int}
       end
       ShuffledVector{T}(A::AbstractVector{T}) = ShuffledVector{typeof(A), T}(A, randperm(length(A)))
       Base.size(A::ShuffledVector) = size(A.data)
       Base.@propagate_inbounds function Base.getindex(A::ShuffledVector, i::Int)
           A.data[A.shuffle[i]]
       end
       end
M

julia> s = M.ShuffledVector(1:4)
       s[5] # It still throws errors for out-of-bounds accesses
ERROR: BoundsError: attempt to access 4-element Array{Int64,1} at index [5]
 in getindex(::M.ShuffledVector{UnitRange{Int64},Int64}, ::Int64) at ./REPL[1]:9

julia> f(s, 5) # That @inbounds now affects the inner indexing calls, too!
0

You can verify that there are no branches with @code_llvm f(s, 5).

But, really, in this case I think it'd be much better to write this getindex method implementation with a @boundscheck block of its own:

@inline function Base.getindex(A::ShuffledVector, i::Int)
    @boundscheck checkbounds(A, i)
    @inbounds r = A.data[A.shuffle[i]]
    return r
end

It's a little more verbose, but now it'll actually throw the bounds error on the ShuffledVector type instead of leaking the implementation details in the error message.

The effect of inlining

You'll notice that I don't test @inbounds in the global scope above, and instead use these little helper functions. That's because bounds check removal only works when the method gets inlined and compiled. So simply trying to remove bounds at the global scope isn't going to work since it can't inline the function call into the interactive REPL:

julia> @inbounds getindex(4:5, 10)
ERROR: BoundsError: attempt to access 2-element UnitRange{Int64} at index [10]
 in throw_boundserror(::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:272
 in getindex(::UnitRange{Int64}, ::Int64) at ./range.jl:450

There's no compilation or inlining occurring here at global scope, so Julia is unable to remove these bounds. Similarly, Julia isn't able to inline methods when there's a type instability (like when accessing a non-constant global), so it can't remove these bounds checks, either:

julia> r = 1:2
       g() = @inbounds return r[3]
       g()
ERROR: BoundsError: attempt to access 2-element UnitRange{Int64} at index [3]
 in throw_boundserror(::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:272
 in getindex(::UnitRange{Int64}, ::Int64) at ./range.jl:450
 in g() at ./REPL[19]:2