Purpose of 'givens' variables in Theano.function

Posted 2019-03-23 01:14

I was reading the logistic regression code given at http://deeplearning.net/tutorial/logreg.html. I am confused about the difference between the inputs and givens parameters of theano.function. The functions that compute the mistakes made by a model on a minibatch are:

test_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: test_set_x[index * batch_size:(index + 1) * batch_size],
        y: test_set_y[index * batch_size:(index + 1) * batch_size]})

validate_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: valid_set_x[index * batch_size:(index + 1) * batch_size],
        y: valid_set_y[index * batch_size:(index + 1) * batch_size]})

Why couldn't/wouldn't one just make x and y shared input variables and let them be defined when an actual model instance is created?

Tags: theano
2 Answers
倾城 Initia
Reply #2 · 2019-03-23 01:52

I don't think anything stops you from doing it that way (I have not tried using an input variable directly in the updates= dictionary, but I see no reason why not). Note, however, that to push data to a GPU in a useful manner, the data needs to be in a shared variable (which is where x and y are taken from in this example).

乱世女痞
Reply #3 · 2019-03-23 02:04

The givens parameter lets you separate the description of the model from the exact definition of the input variables. This follows from what the givens parameter does: it modifies the graph before compiling it. In other words, each key in givens is replaced in the graph by its associated value.
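
To make the substitution idea concrete, here is a minimal pure-Python sketch of a toy expression graph. It is not Theano's implementation: the names Var, Const, Add, substitute and compile_graph are hypothetical and exist only for this illustration. It shows a model described once in terms of placeholders, with one placeholder replaced before the function is built.

```python
# Toy expression graph. NOT Theano's implementation; all names here are
# hypothetical and only illustrate "substitute, then compile".

class Var:
    """A named placeholder with no value of its own."""
    def __init__(self, name):
        self.name = name

class Const:
    """A node that already holds its value."""
    def __init__(self, value):
        self.value = value

class Add:
    """A node that adds the results of two child nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

def substitute(node, givens):
    """Return the graph with every key of `givens` replaced by its value."""
    if node in givens:
        return givens[node]
    if isinstance(node, Add):
        return Add(substitute(node.left, givens),
                   substitute(node.right, givens))
    return node

def compile_graph(inputs, output, givens=None):
    """Rewrite the graph with `givens`, then return a callable over `inputs`."""
    graph = substitute(output, givens or {})

    def evaluate(node, env):
        if isinstance(node, Const):
            return node.value
        if isinstance(node, Add):
            return evaluate(node.left, env) + evaluate(node.right, env)
        return env[node]  # a Var must be supplied by the caller

    def fn(*args):
        return evaluate(graph, dict(zip(inputs, args)))
    return fn

x, y = Var("x"), Var("y")
model = Add(x, y)  # the model is described once, in terms of x and y

# Without givens, both x and y are inputs of the compiled function.
f = compile_graph([x, y], model)

# With givens, x is replaced by a constant before compilation,
# so the compiled function only needs y.
g = compile_graph([y], model, givens={x: Const(10)})
```

The model definition never changes; only the compile step decides where x comes from. This is the same separation the tutorial relies on when it swaps x for a slice of the dataset.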

In the deep learning tutorial, we use a normal Theano variable to build the model. We use givens to speed up execution on the GPU. If we kept the dataset on the CPU, we would transfer a mini-batch to the GPU at each function call. Because we iterate over the dataset many times, we would end up transferring the whole dataset to the GPU multiple times. As the dataset is small enough to fit on the GPU, we put it in a shared variable so that it is transferred to the GPU if one is available (or stays on the CPU if the GPU is disabled). Then, when compiling the function, we swap the input for a slice of the dataset corresponding to the mini-batch to use. The only input of the Theano function is then the index of the mini-batch we want to use.
