I have a fully convolutional neural network, U-Net; the paper describing it is linked below.
https://arxiv.org/pdf/1505.04597.pdf
I want to use it to do pixelwise classification of images. My training images come in two sizes: 512x512 and 768x768. In the initial step I apply reflection padding of size (256,256,256,256) to the former and (384,384,384,384) to the latter, and I keep padding before the successive convolutions so that the output has the same size as the input.
But since my padding is dependent on the input image's size, I can't build a generalised model (I am using Torch).
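To make the problem concrete, here is a rough sketch of the 512x512 case (nn.SpatialReflectionPadding and nn.SpatialConvolution are from the Torch nn package; the channel/filter counts are just placeholders):

require 'nn'

-- Rough sketch of the 512x512 case: the initial reflection pad is tied
-- to the input size, so the 768x768 model would need (384,384,384,384)
-- here instead. Channel/filter counts are placeholders.
local net = nn.Sequential()
net:add(nn.SpatialReflectionPadding(256, 256, 256, 256))
net:add(nn.SpatialConvolution(3, 64, 3, 3))
-- ... further reflection-pad + convolution blocks as in U-Net ...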
How is the padding done in such cases?
I am new to deep learning, so any help would be great. Thanks.
Your model will only accept images matching the size of its first layer, so you have to pre-process all of them before forwarding them to the network. To do so, you can use:
image.scale(img, width, height, 'bilinear')
img is the image to scale, width and height are the size of the first layer of your model (if I'm not mistaken it is 572x572), and 'bilinear' is the interpolation algorithm used to scale the image.
Keep in mind that it might also be necessary to subtract the image mean or to convert the image to BGR channel order (depending on how the model was trained).
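For example, a small pre-processing helper along these lines could do the scaling, mean subtraction, and channel swap before each forward pass (the 572x572 size, the mean values, and the helper name are only placeholders for whatever your model was trained with):

require 'image'

-- Hypothetical pre-processing helper: scale to the network's input size,
-- optionally subtract a per-channel mean and swap RGB -> BGR.
-- The size and mean values used below are placeholders.
local function preprocess(img, width, height, mean, toBGR)
  local out = image.scale(img, width, height, 'bilinear')
  if mean then
    for c = 1, out:size(1) do
      out[c]:add(-mean[c])          -- per-channel mean subtraction
    end
  end
  if toBGR then
    out = out:index(1, torch.LongTensor{3, 2, 1})  -- reorder channels
  end
  return out
end

local input = preprocess(img, 572, 572, {0.5, 0.5, 0.5}, false)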
The first thing to do is to process all of your images so that they are the same size, since the conv layer input requires all images to have the specified dimensions.
Caffe lets you reshape within the prototxt file; in Torch, I think there's a comparable command you can drop at the front of createModel, but I don't recall its name. If not, then you'll need to do it outside the model flow, as sketched below.
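If it has to happen outside the model flow, a minimal sketch could look like this (image.load and image.scale are from the Torch image package; model, imagePaths, and the 572x572 target size are placeholders):

require 'nn'
require 'image'

-- Minimal sketch: resize each image before forwarding it, so the model
-- itself never sees a size it was not built for. 572x572 stands in for
-- whatever the first layer actually expects.
for i = 1, #imagePaths do
  local img = image.load(imagePaths[i], 3, 'float')
  local input = image.scale(img, 572, 572, 'bilinear')
  local output = model:forward(input:view(1, 3, 572, 572))
end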