Caffe CNN: diversity of filters within a conv layer

Posted 2020-05-09 23:41

Question:

I have the following theoretical questions regarding the conv layer in a CNN. Imagine a conv layer with 6 filters (conv1 layer and its 6 filters in the figure).

1) What guarantees the diversity of learned filters within a conv layer? (I mean, how does the learning/optimization process make sure it does not learn the same, or very similar, filters?)

2) Is diversity of filters within a conv layer a good thing or not? Is there any research on this?

3) During learning (the optimization process), is there any interaction between the filters of the same layer? If yes, how?

Answer 1:

1.

Assuming you are training your net with SGD (or a similar backprop variant), the fact that the weights are initialized at random encourages them to be diverse: since the gradient of the loss w.r.t. each randomly initialized filter is usually different, the gradients "pull" the weights in different directions, resulting in diverse filters.
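To see this symmetry-breaking argument concretely, here is a minimal sketch (written in PyTorch rather than Caffe, purely for illustration; the layer sizes and loss are made up): two filters initialized to identical values receive identical gradients, so plain SGD can never separate them, which is exactly what random initialization prevents.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, bias=False)

# Deliberately tie filter 1 to filter 0 (the failure mode random init avoids).
with torch.no_grad():
    conv.weight[1].copy_(conv.weight[0])

x = torch.randn(4, 1, 8, 8)      # dummy input batch
loss = conv(x).pow(2).mean()     # dummy loss, stands in for the real objective
loss.backward()

# Identical filters -> identical gradients -> SGD keeps them identical forever.
print(torch.allclose(conv.weight.grad[0], conv.weight.grad[1]))  # True
```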

However, there is nothing that guarantees diversity. In fact, sometimes filters become tied to each other (see GrOWL and references therein) or drop to zero.

2.

Of course you want your filters to be as diverse as possible so they capture all sorts of different aspects of your data. Suppose your first layer only has filters responding to vertical edges: how is your net going to cope with classes containing horizontal edges (or other types of textures)?
Moreover, if several filters are identical, why compute the same responses twice? This is highly inefficient.
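One simple way to check a trained layer for such redundancy, sketched below with made-up layer sizes (PyTorch again, for illustration only), is to flatten each filter and compute pairwise cosine similarities; off-diagonal entries near +/-1 flag near-duplicate filters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, bias=False)

def filter_similarity(weight: torch.Tensor) -> torch.Tensor:
    # weight: (num_filters, in_channels, kH, kW)
    flat = weight.flatten(start_dim=1)   # one row per filter
    flat = F.normalize(flat, dim=1)      # unit-length rows
    return flat @ flat.t()               # (num_filters, num_filters) cosine matrix

sim = filter_similarity(conv.weight.detach())
print(sim)  # off-diagonal values close to +/-1 indicate redundant filters
```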

3.

Using "out-of-the-box" optimizers, the learned filters of each layer are independent of each other (linearity of gradient). However, one can use more sophisticated loss functions/regularization methods to make them dependent.
For instance, using group Lasso regularization, can force some of the filters to zero while keeping the others informative.
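As a sketch of what that might look like (again PyTorch with made-up sizes and a stand-in task loss; in Caffe one would add it as a custom regularization term), group Lasso treats each filter as a group and penalizes the sum of the per-filter L2 norms, which tends to zero out whole filters at once:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, bias=False)

def group_lasso(weight: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    # weight: (num_filters, in_channels, kH, kW); each filter is one group,
    # so the penalty is lam * sum_i ||W_i||_2 over the filters.
    return lam * weight.flatten(start_dim=1).norm(dim=1).sum()

x = torch.randn(4, 3, 16, 16)
data_loss = conv(x).pow(2).mean()            # stand-in for the task loss
loss = data_loss + group_lasso(conv.weight)  # regularized objective
loss.backward()
```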