How to resolve InvalidArgumentError: Incompatible shapes

Posted 2019-08-27 14:50

Question:

I used this repository and keras_contrib.crf to build a CustomELMo + BiLSTM + CRF sequence classifier for natural language.

It works well, but the loss goes negative, which should be theoretically impossible. This issue has been discussed here and here, and the solution seems to be masking. However, I had to comment out the compute_mask function in my custom ELMo embeddings layer, because training would begin and then throw:

InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0] [[{{node loss/crf_1_loss/mul_6}}]]

where 32 is batch size and 47 is one less than my specified max_length (presumably meaning it's recalculating max_len once the pad token is masked).
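For reference on why a negative loss is theoretically wrong: a CRF's negative log-likelihood is log Z minus the score of the gold tag path, and log Z (a log-sum-exp over every possible path) can never be smaller than the score of any single path. The brute-force sketch below enumerates every path of a toy linear-chain CRF with made-up emission and transition scores. It is the textbook formulation, not keras_contrib's implementation, and the function names are hypothetical.

```python
import itertools
import math


def path_score(tags, emissions, transitions):
    """Unnormalized score of one tag path: emissions plus pairwise transitions."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """log Z: a numerically stable log-sum-exp over the scores of ALL tag paths."""
    num_tags = len(emissions[0])
    scores = [path_score(p, emissions, transitions)
              for p in itertools.product(range(num_tags), repeat=len(emissions))]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def crf_nll(tags, emissions, transitions):
    """Negative log-likelihood of one tag path; >= 0 because log Z >= any path score."""
    return log_partition(emissions, transitions) - path_score(tags, emissions, transitions)


# Toy example: 3 timesteps, 2 tags, arbitrary scores
emissions = [[1.0, 0.5], [0.2, 1.5], [0.3, 0.1]]
transitions = [[0.4, -0.2], [0.1, 0.3]]
for tags in itertools.product(range(2), repeat=3):
    print(tags, round(crf_nll(tags, emissions, transitions), 3))
```

Every printed value is non-negative, so a negative training loss hints that the quantities being combined are inconsistent (for example, padded positions being scored), which fits the suggestion that masking is the fix.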

The output of compute_mask has shape (?, 1). That seems wrong; I think I need to reshape out_mask to be 3D to match the output shape of the embeddings (with the dict lookup set to 'elmo', the output shape is (batch_size, max_length, 1024), which should be correct since the BiLSTM requires 3D input).
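To make the shape arithmetic concrete, here is a plain-Python sketch with nested lists standing in for tensors. It assumes (this is not confirmed here) that the CRF loss scores transitions between consecutive timesteps, slicing the per-token tensor as x[:, :-1] and the mask as mask[:, 1:]. Under that assumption, a (32, 48) tensor combined with a (32, 1) mask yields exactly the [32,47] vs. [32,0] pair from the error, while a per-token mask of shape (32, 48) would line up.

```python
BATCH, MAX_LEN = 32, 48


def shape(t):
    """Shape of a rectangular list-of-lists standing in for a 2-D tensor."""
    return (len(t), len(t[0]) if t else 0)


# Per-token scores, one per timestep: (batch, max_len)
per_token = [[0.0] * MAX_LEN for _ in range(BATCH)]
# Mask from comparing each whole input string: (batch, 1)
sentence_mask = [[True] for _ in range(BATCH)]
# Hypothetical mask with one entry per token: (batch, max_len)
token_mask = [[True] * MAX_LEN for _ in range(BATCH)]

transition_part = [row[:-1] for row in per_token]        # like x[:, :-1]
sentence_pairs = [row[1:] for row in sentence_mask]      # like mask[:, 1:]
token_pairs = [row[1:] for row in token_mask]            # like mask[:, 1:]

print(shape(transition_part), shape(sentence_pairs), shape(token_pairs))
```

If this assumption about the loss holds, the 47 would come from the pairwise slicing rather than from max_len being recalculated.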

So I tried another compute_mask function (commented out below), which produces a mask of shape (?, 1, 1). That also seems wrong, and sure enough, before the model can even begin training I get:

AssertionError: Input mask to CRF must have dim 2 if not None
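The two errors can be read as two checks that a mask must pass. Assuming the usual Keras convention that a mask for a (batch, timesteps, features) tensor has shape (batch, timesteps), this hypothetical check compares each candidate mask shape against both the CRF's 2-D assertion and the timestep axis of the ELMo output. The "per-token mask" entry is an illustrative shape, not something the original code produces.

```python
def expected_mask_shape(output_shape):
    # A sequence mask drops the feature axis: (batch, timesteps, features) -> (batch, timesteps)
    return output_shape[:-1]


def passes_crf_dim_check(mask_shape):
    # Mirrors the quoted assertion: a mask given to the CRF must be 2-D
    return len(mask_shape) == 2


elmo_output = (32, 48, 1024)
candidates = {
    "original compute_mask": (32, 1),
    "expand_dims variant": (32, 1, 1),
    "per-token mask": (32, 48),
}
for name, mask_shape in candidates.items():
    print(name, passes_crf_dim_check(mask_shape),
          mask_shape == expected_mask_shape(elmo_output))
```

Under these assumptions, the original mask passes the dimension check but does not cover the timestep axis, and the expand_dims variant fails both checks.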

So I'm not sure which of the two errors to focus on, and how to resolve them. I've included the most important code below. Happy to make a git repo with the whole thing and/or full stack trace if need be.

Custom ELMo Layer:

import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K
from keras.engine import Layer


class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024  # size of each 'elmo' output vector
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Load ELMo from TF Hub and register its variables as this layer's trainable weights
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        # x has shape (batch, 1): one sentence string per row
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True, signature='default')['elmo']
        return result  # (batch, max_length, 1024)

    # Original compute_mask function. Raises:
    # InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0] [[{{node loss/crf_1_loss/mul_6}}]]
    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '__PAD__')

    # Alternative compute_mask function. Raises:
    # AssertionError: Input mask to CRF must have dim 2 if not None
    # def compute_mask(self, inputs, mask=None):
    #     out_mask = K.not_equal(inputs, '__PAD__')
    #     out_mask = K.expand_dims(out_mask)
    #     return out_mask

    def compute_output_shape(self, input_shape):
        return input_shape[0], 48, self.dimensions  # 48 = max_length
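A plain-Python sketch of the difference between the mask the original compute_mask computes and a per-token mask. It assumes each input string is a space-joined sentence padded with literal '__PAD__' tokens up to max_length, which is what comparing against '__PAD__' suggests. Both function names are hypothetical, and the equivalent TensorFlow string operations are not shown.

```python
PAD = "__PAD__"


def sentence_level_mask(batch):
    # What comparing each (batch, 1) input against PAD does: one whole string per row
    return [[sentence != PAD] for sentence in batch]


def token_level_mask(batch):
    # Split each space-joined sentence and compare every token: (batch, max_len)
    return [[token != PAD for token in sentence.split(" ")] for sentence in batch]


batch = ["the cat sat __PAD__ __PAD__", "a dog barked very loudly"]
print(sentence_level_mask(batch))
print(token_level_mask(batch))
```

The sentence-level version can never mark individual padded positions, because a padded sentence as a whole is never equal to '__PAD__'.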

The model is built as follows:

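import tensorflow as tf
from keras import layers
from keras.layers import LSTM, Bidirectional
from keras.metrics import categorical_accuracy, mean_squared_error
from keras.models import Model
from keras_contrib.layers import CRF
from keras_contrib.losses import crf_loss
from keras_contrib.metrics import crf_accuracy


def build_model():  # uses CRF from keras_contrib; num_tags is defined elsewhere
    input_text = layers.Input(shape=(1,), dtype=tf.string)
    x = ElmoEmbeddingLayer(name='ElmoEmbeddingLayer')(input_text)
    x = Bidirectional(LSTM(units=512, return_sequences=True))(x)
    crf = CRF(num_tags)
    out = crf(x)
    model = Model(input_text, out)
    model.compile(optimizer="rmsprop", loss=crf_loss,
                  metrics=[crf_accuracy, categorical_accuracy, mean_squared_error])
    model.summary()
    return model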
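To trace where a mask has to line up, here is a sketch of the tensor shapes through build_model. It assumes Bidirectional's default merge_mode='concat' (so 2 * 512 = 1024 features per timestep) and that the keras_contrib CRF emits one score vector per timestep; num_tags=10 is a made-up value.

```python
def layer_shapes(batch, max_len, num_tags, elmo_dim=1024, lstm_units=512):
    """Expected output shape of each layer in build_model (illustrative only)."""
    return [
        ("Input", (batch, 1)),
        ("ElmoEmbeddingLayer", (batch, max_len, elmo_dim)),
        ("Bidirectional(LSTM)", (batch, max_len, 2 * lstm_units)),  # forward + backward concatenated
        ("CRF", (batch, max_len, num_tags)),
    ]


for name, shape in layer_shapes(batch=32, max_len=48, num_tags=10):
    print(f"{name:22} {shape}")
```

Every layer after the embedding keeps the (batch, max_length) prefix, so a mask that survives to the CRF would need that same (batch, max_length) shape rather than (batch, 1).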