Caffe CNN: diversity of filters within a conv layer

Posted 2020-05-09 23:41

Question:

I have the following theoretical questions regarding the conv layer in a CNN. Imagine a conv layer with 6 filters (conv1 layer and its 6 filters in the figure).

1) What guarantees the diversity of learned filters within a conv layer? (I mean, how does the learning/optimization process make sure it does not learn the same, or very similar, filters?)

2) Is diversity of filters within a conv layer a good thing or not? Is there any research on this?

3) During learning (the optimization process), is there any interaction between the filters of the same layer? If yes, how?

Answer 1:

1.

Assuming you are training your net with SGD (or a similar backprop variant), the fact that the weights are initialized at random encourages them to be diverse: since the gradient of the loss w.r.t. each randomly initialized filter is usually different, the gradients "pull" the weights in different directions, resulting in diverse filters.
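To see this symmetry-breaking argument concretely, here is a minimal sketch (written in PyTorch rather than Caffe, purely for illustration; the layer sizes and loss are made up): two filters initialized to identical values receive identical gradients, so plain SGD can never separate them, which is exactly what random initialization prevents.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, bias=False)

# Deliberately tie filter 1 to filter 0 (the failure mode random init avoids).
with torch.no_grad():
    conv.weight[1].copy_(conv.weight[0])

x = torch.randn(4, 1, 8, 8)      # dummy input batch
loss = conv(x).pow(2).mean()     # dummy loss, stands in for the real objective
loss.backward()

# Identical filters -> identical gradients -> SGD keeps them identical forever.
print(torch.allclose(conv.weight.grad[0], conv.weight.grad[1]))  # True
```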

However, there is nothing that guarantees diversity. In fact, sometimes filters become tied to each other (see GrOWL and references therein) or drop to zero.

2.

Of course you want your filters to be as diverse as possible so they capture all sorts of different aspects of your data. Suppose your first layer only has filters responding to vertical edges: how is your net going to cope with classes containing horizontal edges (or other types of textures)?
Moreover, if several filters are identical, why compute the same responses twice? This is highly inefficient.
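One simple way to check a trained layer for such redundancy, sketched below with made-up layer sizes (PyTorch again, for illustration only), is to flatten each filter and compute pairwise cosine similarities; off-diagonal entries near +/-1 flag near-duplicate filters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, bias=False)

def filter_similarity(weight: torch.Tensor) -> torch.Tensor:
    # weight: (num_filters, in_channels, kH, kW)
    flat = weight.flatten(start_dim=1)   # one row per filter
    flat = F.normalize(flat, dim=1)      # unit-length rows
    return flat @ flat.t()               # (num_filters, num_filters) cosine matrix

sim = filter_similarity(conv.weight.detach())
print(sim)  # off-diagonal values close to +/-1 indicate redundant filters
```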

3.

Using "out-of-the-box" optimizers, the learned filters of each layer are independent of each other (linearity of gradient). However, one can use more sophisticated loss functions/regularization methods to make them dependent.
For instance, using group Lasso regularization, can force some of the filters to zero while keeping the others informative.
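As a sketch of what that might look like (again PyTorch with made-up sizes and a stand-in task loss; in Caffe one would add it as a custom regularization term), group Lasso treats each filter as a group and penalizes the sum of the per-filter L2 norms, which tends to zero out whole filters at once:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, bias=False)

def group_lasso(weight: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    # weight: (num_filters, in_channels, kH, kW); each filter is one group,
    # so the penalty is lam * sum_i ||W_i||_2 over the filters.
    return lam * weight.flatten(start_dim=1).norm(dim=1).sum()

x = torch.randn(4, 3, 16, 16)
data_loss = conv(x).pow(2).mean()            # stand-in for the task loss
loss = data_loss + group_lasso(conv.weight)  # regularized objective
loss.backward()
```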