How do I set many elements in parallel in Theano?

Posted 2019-09-12 08:31

Let's say I create a Theano function. How do I run elementwise operations in parallel on Theano tensors, such as matrices?

# This is inside a Theano function. Instead of the for loop,
# I'd like to run this in parallel.
import numpy as np

c = np.zeros(shape=(2, 20))
for n in range(0, 20):
    # the loop body is an arbitrary example and doesn't matter
    c[0][n] = n % 20
    c[1][n] = n / 20
# in CUDA, we normally use an if statement:
# if (threadIdx.x == some_index) { c[0][n] = some_value; }

Perhaps the question should be rephrased: how do I do parallel operations in a Theano function? I've looked at http://deeplearning.net/software/theano/tutorial/multi_cores.html#parallel-element-wise-ops-with-openmp, which only talks about adding a setting but does not explain how an operation is parallelized for elementwise operations. For reference, my understanding of the setting that page describes is sketched below (the exact threshold value is my assumption):
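# Roughly how the OpenMP setting from the linked tutorial is enabled;
# the flag names come from that page, the threshold value is my guess.
# Option 1: from the shell, before starting Python:
#   THEANO_FLAGS=openmp=True OMP_NUM_THREADS=4 python my_script.py
# Option 2: in code, before compiling any theano.function:
import theano
theano.config.openmp = True
# elementwise ops on fewer elements than this stay single-threaded
theano.config.openmp_elemwise_minsize = 200000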

1 Answer
孤傲高冷的网名
Answered 2019-09-12 09:18

To an extent, Theano expects you to focus on what you want computed rather than on how you want it computed. The idea is that Theano's optimizing compiler will automatically parallelize as much as possible (either on the GPU, or on the CPU using OpenMP).

The following example is based on the code in the question. The difference is that the computation is declared symbolically and, crucially, without any explicit loops. Here we tell Theano that the result should be a stack of two tensors: the first holds the values of a range modulo the range size, the second holds the elements of the same range divided by the range size. We never say that a loop should occur, even though at least one will clearly be required; Theano compiles the expression down to executable code and parallelizes it where that makes sense.

import theano
import theano.tensor as tt


def symbolic_range_div_mod(size):
    # Declare the whole result symbolically; there is no explicit loop,
    # so Theano is free to parallelize the elementwise operations.
    r = tt.arange(size)
    return tt.stack([r % size, r / size])


def main():
    size = tt.lscalar()  # a size is an integer scalar, not a dscalar
    range_div_mod = theano.function(inputs=[size],
                                    outputs=symbolic_range_div_mod(size))
    print(range_div_mod(20))


main()

The key is to specify your computation in terms of Theano operations. If those operations can be parallelized (on the GPU, or via OpenMP on the CPU), they will be parallelized automatically. If you really do need to assign into specific positions of a tensor, closer to the CUDA-style `c[0][n] = some_value` in the question, Theano's `set_subtensor` expresses that without a Python loop. A minimal sketch (the variable names are mine):
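import theano
import theano.tensor as tt

# A sketch of "setting many elements at once": set_subtensor returns a
# new symbolic tensor with a whole slice replaced, so no Python loop is
# needed and Theano can parallelize the underlying elementwise work.
n = tt.lscalar()
r = tt.arange(n).astype('float64')
c = tt.zeros((2, n), dtype='float64')
c = tt.set_subtensor(c[0, :], r % n)  # like c[0][i] = i % n for every i
c = tt.set_subtensor(c[1, :], r / n)  # like c[1][i] = i / n for every i

fill = theano.function(inputs=[n], outputs=c)
print(fill(20))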
