In Keras, is it possible to share weights between two layers while letting their other parameters differ? Consider the following (admittedly a bit contrived) example:
conv1 = Conv2D(64, 3, input_shape=input_shape, padding='same')
conv2 = Conv2D(64, 3, input_shape=input_shape, padding='valid')
Notice that the layers are identical except for the padding. Can I get Keras to use the same weights for both (i.e. also train the network accordingly)?
I've looked at the Keras documentation, and the section on shared layers seems to imply that sharing works only if the layers are completely identical.
To my knowledge, this cannot be done through the usual high-level Keras API.
However, if you dig a bit deeper, there are some (ugly) ways to share the weights.
First of all, the weights of the Conv2D layers are created inside the build() function, by calling add_weight():
self.kernel = self.add_weight(shape=kernel_shape,
                              initializer=self.kernel_initializer,
                              name='kernel',
                              regularizer=self.kernel_regularizer,
                              constraint=self.kernel_constraint)
For your usage (i.e., the default trainable/constraint/regularizer/initializer), add_weight() does nothing special beyond appending the weight variables to _trainable_weights:
weight = K.variable(initializer(shape), dtype=dtype, name=name)
...
self._trainable_weights.append(weight)
Finally, since build() is only called inside __call__() if the layer hasn't been built, shared weights between layers can be created by:
- Call conv1.build() to initialize the conv1.kernel and conv1.bias variables to be shared.
- Call conv2.build() to initialize the layer.
- Replace conv2.kernel and conv2.bias by conv1.kernel and conv1.bias.
- Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
- Append conv1.kernel and conv1.bias to conv2._trainable_weights.
- Finish model definition. Here conv2.__call__() will be called; however, since conv2 has already been built, the weights are not going to be re-initialized.
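To make the mechanism concrete without depending on a Keras installation, here is a minimal sketch using a toy class. ToyLayer and share_weights are hypothetical names invented for this illustration; they only mimic the build-once behaviour described above and are not the real Keras implementation. The "forward pass" is a placeholder, not a convolution.

```python
import numpy as np


class ToyLayer:
    """Hypothetical stand-in for a Keras layer (not the real class)."""

    def __init__(self, name):
        self.name = name
        self.built = False
        self._trainable_weights = []

    def add_weight(self, shape):
        # Create a fresh variable and register it as trainable.
        weight = np.random.rand(*shape)
        self._trainable_weights.append(weight)
        return weight

    def build(self, input_shape):
        # Weights are created here, once per layer.
        self.kernel = self.add_weight((3, 3, input_shape[-1], 4))
        self.bias = self.add_weight((4,))
        self.built = True

    def __call__(self, inputs):
        # build() only runs if the layer has not been built yet.
        if not self.built:
            self.build(inputs.shape)
        # Placeholder computation; a real layer would convolve here.
        return inputs.mean() * self.kernel.sum() + self.bias.sum()


def share_weights(source, target, input_shape):
    # Build both layers up front so __call__() will not rebuild them.
    source.build(input_shape)
    target.build(input_shape)
    # Point the target at the source's variables.
    target.kernel = source.kernel
    target.bias = source.bias
    # Drop the target's own variables and register the shared ones.
    target._trainable_weights = [target.kernel, target.bias]


conv1 = ToyLayer("conv1")
conv2 = ToyLayer("conv2")
share_weights(conv1, conv2, (None, 8, 8, 3))

# Calling the layer afterwards does not re-initialize anything.
conv2(np.ones((1, 8, 8, 3)))
print(conv2.kernel is conv1.kernel and conv2.bias is conv1.bias)  # True
```

The key point is the ordering: the swap has to happen after build() but before the layer is first called, otherwise the layer would either overwrite the shared variables or already be wired to its own.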
The following code snippet may be helpful:
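import numpy as np
from keras import backend as K
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense, concatenate
from keras.models import Model

def create_shared_weights(conv1, conv2, input_shape):
    # build both layers up front so that __call__() will not rebuild them
    with K.name_scope(conv1.name):
        conv1.build(input_shape)
    with K.name_scope(conv2.name):
        conv2.build(input_shape)
    # point conv2 at conv1's variables
    conv2.kernel = conv1.kernel
    conv2.bias = conv1.bias
    # drop conv2's own variables and register the shared ones instead
    conv2._trainable_weights = []
    conv2._trainable_weights.append(conv2.kernel)
    conv2._trainable_weights.append(conv2.bias)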
# check if weights are successfully shared
input_img = Input(shape=(299, 299, 3))
conv1 = Conv2D(64, 3, padding='same')
conv2 = Conv2D(64, 3, padding='valid')
create_shared_weights(conv1, conv2, input_img._keras_shape)
print(conv2.weights == conv1.weights) # True
# check if weights are equal after model fitting
left = conv1(input_img)
right = conv2(input_img)
left = GlobalAveragePooling2D()(left)
right = GlobalAveragePooling2D()(right)
merged = concatenate([left, right])
output = Dense(1)(merged)
model = Model(input_img, output)
model.compile(loss='binary_crossentropy', optimizer='adam')
X = np.random.rand(5, 299, 299, 3)
Y = np.random.randint(2, size=5)
model.fit(X, Y)
print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())]) # [True, True]
One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.
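As a rough analogy for why the sharing is lost, consider what happens when weights are written out by value under separate names. The sketch below uses NumPy's npz format purely for illustration; it is not the format Keras uses, and the names conv1_kernel and conv2_kernel are made up for this example.

```python
import io

import numpy as np

# Before saving: both names refer to one and the same array.
kernel = np.zeros((3, 3))
weights = {"conv1_kernel": kernel, "conv2_kernel": kernel}
same_before = weights["conv1_kernel"] is weights["conv2_kernel"]

# Save by value under separate names, then load again.
buffer = io.BytesIO()
np.savez(buffer, **weights)
buffer.seek(0)
with np.load(buffer) as archive:
    restored = {name: archive[name] for name in archive.files}

# After loading: equal values, but two independent arrays.
same_after = restored["conv1_kernel"] is restored["conv2_kernel"]
values_equal = np.array_equal(restored["conv1_kernel"],
                              restored["conv2_kernel"])

# Stand-in for a fine-tuning update applied to one copy only.
restored["conv1_kernel"] += 1.0
diverged = not np.array_equal(restored["conv1_kernel"],
                              restored["conv2_kernel"])

print(same_before, same_after, values_equal, diverged)
```

Right after loading, the two copies hold identical values, which is why prediction is unaffected; once training updates them separately, they drift apart.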