Converting Caffe model to CoreML

Posted 2020-07-17 15:36

Question:

I am working to understand CoreML. For a starter model, I've downloaded Yahoo's Open NSFW caffemodel. You give it an image, it gives you a probability score (between 0 and 1) that the image contains unsuitable content.

Using coremltools, I've converted the model to a .mlmodel and brought it into my app. It appears in Xcode like so:

[screenshot of the .mlmodel in Xcode]

In my app, I can successfully pass an image, and the output appears as a MLMultiArray. Where I am having trouble is understanding how to use this MLMultiArray to obtain my probability score. My code is like so:

func testModel(image: CVPixelBuffer) throws {

    let model = myModel()
    let prediction = try model.prediction(data: image)
    let output = prediction.prob // MLMultiArray
    print(output[0]) // 0.9992402791976929
    print(output[1]) // 0.0007597212097607553
}

For reference, the CVPixelBuffer is being resized to the 224x224 input the model requires (I'll get into playing with Vision once I can figure this out).

The two indexes I've printed to the console do change if I provide a different image, but their scores are wildly different from the result I get when I run the model in Python. The same image passed to the model in Python gives me an output of 0.16, whereas my CoreML output, per the example above, is far from what I'm expecting to see (and a multi-array, unlike Python's single double value).

Is more work necessary to get a result like I am expecting?

Answer 1:

It seems like you are not transforming the input image the way the model expects.
Most Caffe models expect "mean-subtracted" images as input, and this model is no exception. If you inspect the Python code provided with Yahoo's Open NSFW (classify_nsfw.py):

# Note that the parameters are hard-coded for best results
caffe_transformer = caffe.io.Transformer({'data': nsfw_net.blobs['data'].data.shape})
caffe_transformer.set_transpose('data', (2, 0, 1))  # move image channels to outermost
caffe_transformer.set_mean('data', np.array([104, 117, 123]))  # subtract the dataset-mean value in each channel
caffe_transformer.set_raw_scale('data', 255)  # rescale from [0, 1] to [0, 255]
caffe_transformer.set_channel_swap('data', (2, 1, 0))  # swap channels from RGB to BGR

There is also a specific way the image is resized to 256x256 and then center-cropped to 224x224, as sketched below.
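For illustration, here is a minimal sketch of that resize-and-center-crop geometry using PIL and NumPy. (The actual classify_nsfw.py handles the resize a bit differently, re-encoding the resized image as JPEG before handing it to Caffe, but the geometry is the same.)

# Rough sketch of the open_nsfw resize-then-center-crop step.
from PIL import Image
import numpy as np

def resize_and_center_crop(path, resize=(256, 256), crop=(224, 224)):
    img = Image.open(path).convert('RGB').resize(resize, Image.BILINEAR)
    data = np.asarray(img)                  # shape (256, 256, 3), RGB, 0-255
    h_off = (data.shape[0] - crop[1]) // 2  # vertical offset of the center crop
    w_off = (data.shape[1] - crop[0]) // 2  # horizontal offset of the center crop
    return data[h_off:h_off + crop[1], w_off:w_off + crop[0], :]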

To obtain exactly the same results, you'll need to transform your input image in exactly the same way on both platforms.
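An alternative to replicating these steps by hand in Swift is to bake the preprocessing into the model at conversion time. Here is a sketch using coremltools' Caffe converter; the file names assume the layout of the open_nsfw repo, and the biases mirror the means above (CoreML adds the bias to each channel, hence the negative signs):

import coremltools

# Sketch: declare the input as an image and fold the Caffe preprocessing
# (mean subtraction, BGR channel order) into the converted model.
coreml_model = coremltools.converters.caffe.convert(
    ('resnet_50_1by2_nsfw.caffemodel', 'deploy.prototxt'),
    image_input_names='data',  # expose 'data' as an image, not an MLMultiArray
    is_bgr=True,               # the network expects BGR channel order
    red_bias=-123.0,           # CoreML computes channel + bias, so the
    green_bias=-117.0,         # dataset means become negative biases
    blue_bias=-104.0)
coreml_model.save('open_nsfw.mlmodel')

With the input declared as an image, the Xcode-generated class accepts a CVPixelBuffer directly, and the mean subtraction and channel swap happen inside the model, so the Swift-side output should line up with the Python script (up to the resize/crop differences noted above).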

See this thread for additional information.