Is it possible to use arbitrary image sizes in Caffe?

Posted 2019-07-20 14:16

Question:

I know that Caffe has the so-called spatial pyramid pooling (SPP) layer, which should enable a network to accept arbitrary image sizes. The problem I have is that the network seems to refuse to use arbitrary image sizes within a single batch. Am I missing something, or is this a real limitation?

My train_val.prototxt:

name: "digits"
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/test_lmdb"
    batch_size: 10
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "bn1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "spatial_pyramid_pooling"
  type: "SPP"
  bottom: "conv2"
  top: "pool2"
  spp_param {
    pyramid_height: 2
  }
} 
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "bn2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

Link to another question regarding a subsequent problem.

Answer 1:

You are mixing several concepts here.

Can a net accept arbitrary input shapes?
Well, not all nets can work with any input shape. In many cases a net is restricted to the input shape for which it was trained.
In most cases, when fully-connected layers ("InnerProduct") are used, these layers expect an exact input dimension, so changing the input shape "breaks" them and restricts the net to a specific, pre-defined input shape.
"Fully convolutional nets", on the other hand, are more flexible in this respect and can usually process inputs of any spatial size.
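That is exactly what the "SPP" layer buys you in the net above: if I recall Caffe's SPP implementation correctly, pyramid_height: 2 pools each of conv2's 50 channels over a 1x1 and a 2x2 grid, so "ip1" always receives 50 * (1 + 4) = 250 inputs regardless of the spatial size of conv2. The fully-connected layers therefore no longer constrain the input shape; only the batching does, as discussed next.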

Can one change input shape during batch training?
Even if your net architecture allows for arbitrary input shapes, you cannot use whatever shape you want during batch training, because all samples in a single batch must have the same shape: the "Data" layer stacks the batch into a single NxCxHxW blob, and there is no way to stack a 27x27 image together with a 17x17 one.

It seems the error you are getting comes from the "Data" layer, which is struggling to concatenate samples of different shapes into a single batch.

You can work around this by setting batch_size: 1, processing one sample at a time, and setting iter_size: 32 in your solver.prototxt to accumulate gradients over 32 samples, giving you the SGD effect of batch_size: 32.
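For reference, a minimal solver.prototxt sketch of that setup. Only iter_size (and batch_size: 1 in both "Data" layers of the net) is essential here; the net path and the remaining hyper-parameters are placeholders, not values from the question:

net: "train_val.prototxt"
iter_size: 32          # accumulate gradients over 32 single-sample iterations
test_iter: 100         # placeholder test settings
test_interval: 500
base_lr: 0.01          # placeholder optimization settings
momentum: 0.9
lr_policy: "fixed"
max_iter: 10000
snapshot: 5000
snapshot_prefix: "digits"
solver_mode: GPU

With batch_size: 1 each iteration processes a single image of whatever size it happens to have, but the weight update is applied only after iter_size forward/backward passes, so the effective batch size is still 32.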