How do I set many elements in parallel in Theano?

Posted 2019-09-12 08:31

Let's say I create a Theano function. How do I run elementwise operations in parallel on Theano tensors, such as matrices?

# This is inside a Theano function. Instead of the for loop,
# I'd like to run this in parallel.
import numpy as np

c = np.zeros(shape=(2, 20))
for n in range(0, 20):
    # the loop body is an arbitrary example and doesn't matter
    c[0][n] = n % 20
    c[1][n] = n / 20
# in CUDA, we normally use an if statement:
# if (threadIdx.x == some_index) { c[0][n] = some_value; }

Perhaps the question should be rephrased: how do I do parallel operations in a Theano function? I've looked at http://deeplearning.net/software/theano/tutorial/multi_cores.html#parallel-element-wise-ops-with-openmp, which only talks about adding a setting but does not explain how an operation is parallelized for elementwise operations. For reference, my understanding of the setting that page describes is sketched below (the exact threshold value is my assumption):
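# Roughly how the OpenMP setting from the linked tutorial is enabled;
# the flag names come from that page, the threshold value is my guess.
# Option 1: from the shell, before starting Python:
#   THEANO_FLAGS=openmp=True OMP_NUM_THREADS=4 python my_script.py
# Option 2: in code, before compiling any theano.function:
import theano
theano.config.openmp = True
# elementwise ops on fewer elements than this stay single-threaded
theano.config.openmp_elemwise_minsize = 200000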

1 Answer
孤傲高冷的网名
Answered 2019-09-12 09:18

To an extent, Theano expects you to focus on what you want computed rather than on how you want it computed. The idea is that Theano's optimizing compiler will automatically parallelize as much as possible (either on the GPU, or on the CPU using OpenMP).

The following example is based on the code in the question. The difference is that the computation is declared symbolically and, crucially, without any explicit loops. Here we tell Theano that the result should be a stack of two tensors: the first holds the values of a range modulo the range size, the second holds the elements of the same range divided by the range size. We never say that a loop should occur, even though at least one will clearly be required; Theano compiles the expression down to executable code and parallelizes it where that makes sense.

import theano
import theano.tensor as tt


def symbolic_range_div_mod(size):
    # Declare the whole result symbolically; there is no explicit loop,
    # so Theano is free to parallelize the elementwise operations.
    r = tt.arange(size)
    return tt.stack([r % size, r / size])


def main():
    size = tt.lscalar()  # a size is an integer scalar, not a dscalar
    range_div_mod = theano.function(inputs=[size],
                                    outputs=symbolic_range_div_mod(size))
    print(range_div_mod(20))


main()

The key is to specify your computation in terms of Theano operations. If those operations can be parallelized (on the GPU, or via OpenMP on the CPU), they will be parallelized automatically. If you really do need to assign into specific positions of a tensor, closer to the CUDA-style `c[0][n] = some_value` in the question, Theano's `set_subtensor` expresses that without a Python loop. A minimal sketch (the variable names are mine):
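import theano
import theano.tensor as tt

# A sketch of "setting many elements at once": set_subtensor returns a
# new symbolic tensor with a whole slice replaced, so no Python loop is
# needed and Theano can parallelize the underlying elementwise work.
n = tt.lscalar()
r = tt.arange(n).astype('float64')
c = tt.zeros((2, n), dtype='float64')
c = tt.set_subtensor(c[0, :], r % n)  # like c[0][i] = i % n for every i
c = tt.set_subtensor(c[1, :], r / n)  # like c[1][i] = i / n for every i

fill = theano.function(inputs=[n], outputs=c)
print(fill(20))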
