Test labels for regression in caffe: float not allowed

Published 2019-01-01 13:43

Question:

I am doing regression using caffe, and my test.txt and train.txt files are like this:

/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0  
/home/foo/caffe/data/finetune/flickr/4559004485.jpg 3.6  
/home/foo/caffe/data/finetune/flickr/3208038920.jpg 3.2  
/home/foo/caffe/data/finetune/flickr/6170430622.jpg 4.0  
/home/foo/caffe/data/finetune/flickr/7508671542.jpg 2.7272
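(For reference, each line pairs an image path with a float label. A minimal, hypothetical Python sketch of parsing such a list file — the helper name is made up for illustration:)

```python
def parse_list_file(lines):
    # Each line: "<image path> <float label>"
    paths, labels = [], []
    for line in lines:
        path, label = line.split()
        paths.append(path)
        labels.append(float(label))
    return paths, labels

paths, labels = parse_list_file([
    "/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0",
    "/home/foo/caffe/data/finetune/flickr/7508671542.jpg 2.7272",
])
print(labels)  # [2.0, 2.7272]
```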

My problem is that caffe does not seem to allow float labels like 2.0. When I use float labels while reading, for example, the 'test.txt' file, caffe only recognizes

a total of 1 images

which is wrong.

But when I change, for example, the 2.0 to 2 in the file (and likewise in the following lines), caffe now gives

a total of 2 images

implying that the float labels are responsible for the problem.

Can anyone help me solve this problem? I definitely need to use float labels for regression, so does anyone know of a workaround or solution? Thanks in advance.

EDIT: For anyone facing a similar issue, the question use caffe to train Lenet with CSV data might be of help. Thanks to @Shai.

Answer 1:

When using the image dataset input layer (with either lmdb or leveldb backend) caffe only supports one integer label per input image.

If you want to do regression with floating point labels, you should try using the HDF5 data layer. See, for example, this question.

In python you can use the h5py package to create HDF5 files.

import h5py, os
import caffe
import numpy as np

SIZE = 224  # fixed size for all images
with open('train.txt', 'r') as T:
    lines = T.readlines()
# If you do not have enough memory, split the data into
# multiple batches and generate multiple separate h5 files
X = np.zeros((len(lines), 3, SIZE, SIZE), dtype='f4')
y = np.zeros((len(lines), 1), dtype='f4')
for i, l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image(sp[0])
    img = caffe.io.resize(img, (SIZE, SIZE, 3))  # resize to fixed size
    # You may apply other input transformations here...
    # Note that the transformation should take img from SIZE-by-SIZE-by-3
    # to 3-by-SIZE-by-SIZE, for example:
    transposed_img = img.transpose((2, 0, 1))[::-1, :, :]  # HxWxC -> CxHxW, RGB -> BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5', 'w') as H:
    H.create_dataset('X', data=X)  # note the name X given to the dataset!
    H.create_dataset('y', data=y)  # note the name y given to the dataset!
with open('train_h5_list.txt', 'w') as L:
    L.write('train.h5')  # list all h5 files you are going to use
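As the comment in the snippet notes, if the whole dataset does not fit in memory you can split the lines into batches and write one h5 file per batch. A minimal sketch of just the splitting logic (the file names are illustrative, not from the original answer):

```python
def chunk(lines, batch_size):
    # Yield successive slices of at most batch_size lines
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]

# pretend list-file contents: 10 "path label" lines
lines = ['img%d.jpg %.1f' % (i, i * 0.5) for i in range(10)]
h5_names = ['train_%d.h5' % b for b, batch in enumerate(chunk(lines, 4))]
print(h5_names)  # ['train_0.h5', 'train_1.h5', 'train_2.h5']
# each name would get its own X/y datasets; then write all of them,
# one per line, into train_h5_list.txt
```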

Once you have all the h5 files, and the corresponding text files listing them, you can add an HDF5 input layer to your train_val.prototxt:

 layer {
   type: "HDF5Data"
   top: "X" # same name as given in create_dataset!
   top: "y"
   hdf5_data_param {
     source: "train_h5_list.txt" # do not give the h5 files directly, but the list
     batch_size: 32
   }
   include { phase: TRAIN }
 }
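For completeness, the TEST phase usually gets a matching layer pointing at a separate list file, and a regression loss such as EuclideanLoss consuming the "y" top. This is a sketch, not from the original answer; the list file name and layer names are assumptions:

```protobuf
 layer {
   type: "HDF5Data"
   top: "X"
   top: "y"
   hdf5_data_param {
     source: "test_h5_list.txt"  # separate list of test h5 files (assumed name)
     batch_size: 32
   }
   include { phase: TEST }
 }
 layer {
   name: "loss"
   type: "EuclideanLoss"
   bottom: "prediction"  # output of your regression net (assumed name)
   bottom: "y"
   top: "loss"
 }
```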

Clarification:
When I say "caffe only supports one integer label per input image" I do not mean that the leveldb/lmdb containers themselves are limited; I mean the tools of caffe, specifically the convert_imageset tool.
On closer inspection, it seems caffe stores data of type Datum in leveldb/lmdb, and the "label" property of this type is defined as an integer (see caffe.proto). Thus, when using the caffe interface to leveldb/lmdb, you are restricted to a single int32 label per image.



Answer 2:

Shai's answer already covers saving float labels to HDF5 format. In case LMDB is required or preferred, here is a snippet showing how to create an LMDB from float data (adapted from this github comment):

import lmdb
import caffe
import numpy as np

def scalars_to_lmdb(scalars, path_dst):
    db = lmdb.open(path_dst, map_size=int(1e12))
    with db.begin(write=True) as in_txn:
        for idx, x in enumerate(scalars):
            content_field = np.array([x])
            # expand to shape (1, 1, 1)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = content_field.astype(float)

            dat = caffe.io.array_to_datum(content_field)
            in_txn.put('{:0>10d}'.format(idx), dat.SerializeToString())
    db.close()
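The zero-padded key format ('{:0>10d}') matters because LMDB iterates keys in lexicographic order; the padding makes that order match numeric order. A quick stand-alone check:

```python
# unpadded keys would sort as '1', '10', '2' -- padding fixes this
keys = ['{:0>10d}'.format(i) for i in (2, 10, 1)]
print(sorted(keys))  # ['0000000001', '0000000002', '0000000010']
```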


Answer 3:

I ended up transposing, switching the channel order, and using unsigned ints rather than floats to get results. I suggest reading an image back from your HDF5 file to make sure it displays correctly.

First read the image as unsigned ints (using PIL):

from PIL import Image
img = np.array(Image.open('images/' + image_name))

Then change the channel order from RGB to BGR:

img = img[:, :, ::-1]

Finally, switch from Height x Width x Channels to Channels x Height x Width:

img = img.transpose((2, 0, 1))

Merely changing the shape will scramble your image and ruin your data!
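To see why, here is a tiny numpy sketch (not from the original answer): reshape only reinterprets the memory layout, while transpose actually moves the pixels so each channel plane stays intact:

```python
import numpy as np

img = np.arange(2 * 2 * 3).reshape(2, 2, 3)  # H x W x C
chw_t = img.transpose((2, 0, 1))             # correct: C x H x W
chw_r = img.reshape(3, 2, 2)                 # wrong: scrambles pixels

# channel 0 survives transpose intact...
print(np.array_equal(chw_t[0], img[:, :, 0]))  # True
# ...but not reshape
print(np.array_equal(chw_r[0], img[:, :, 0]))  # False
```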

To read back the image:

with h5py.File(h5_filename, 'r') as hf:
    images_test = hf.get('images')
    targets_test = hf.get('targets')
    for i, img in enumerate(images_test):
        print(targets_test[i])
        from skimage.viewer import ImageViewer
        # undo the storage transform: CxHxW -> HxWxC, BGR -> RGB
        viewer = ImageViewer(img.transpose((1, 2, 0))[:, :, ::-1])
        viewer.show()

Here's a script I wrote that handles two labels (steer and speed) for a self-driving-car task: https://gist.github.com/crizCraig/aa46105d34349543582b177ae79f32f0



Answer 4:

Besides @Shai's answer above, I wrote a MultiTaskData layer supporting float-typed labels.

Its main idea is to store the labels in the float_data field of Datum, and the MultiTaskDataLayer parses them as labels for any number of tasks, according to the values of task_num and label_dimension set in net.prototxt. The related files are: caffe.proto, multitask_data_layer.hpp/cpp, io.hpp/cpp.

You can easily add this layer to your own caffe and use it like this (this example is for a face expression label distribution learning task, in which "exp_label" can be a float-typed vector such as [0.1, 0.1, 0.5, 0.2, 0.1] representing the probability distribution over 5 expression classes):

    name: \"xxxNet\"
    layer {
        name: \"xxx\"
        type: \"MultiTaskData\"
        top: \"data\"
        top: \"exp_label\"
        data_param { 
            source: \"expression_ld_train_leveldb\"   
            batch_size: 60 
            task_num: 1
            label_dimension: 8
        }
        transform_param {
            scale: 0.00390625
            crop_size: 60
            mirror: true
        }
        include:{ phase: TRAIN }
    }
    layer { 
        name: \"exp_prob\" 
        type: \"InnerProduct\"
        bottom: \"data\"  
        top: \"exp_prob\" 
        param {
            lr_mult: 1
            decay_mult: 1
        }
        param {
            lr_mult: 2
            decay_mult: 0
        }
        inner_product_param {
            num_output: 8
            weight_filler {
            type: \"xavier\"
            }    
        bias_filler {      
            type: \"constant\"
            }  
        }
    }
    layer {  
        name: \"exp_loss\"  
        type: \"EuclideanLoss\"  
        bottom: \"exp_prob\" 
        bottom: \"exp_label\"
        top: \"exp_loss\"
        include:{ phase: TRAIN }
    }