Multi-class and multi-label image classification u

I'm trying to create a single multi-class and multi-label net configuration in caffe.

Take dog classification as an example: Is the dog small or large? (class) What color is it? (class) Does it have a collar? (label)

Is this thing possible using caffe? What is the proper way to do so?

I'm just trying to understand the practical way to do it. I created 2 .txt files (one for training and one for validation) containing all the tags of the images, for example:

/train/img/1.png 0 4 18
/train/img/2.png 1 7 17 33
/train/img/3.png 0 4 17

Then I ran this Python script:

import h5py, os
import caffe
import numpy as np

SIZE = 227 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' ) 
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )
    img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
    # for example
    transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
    H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
    H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
    L.write( 'train.h5' ) # list all h5 files you are going to use

That gave me train.h5 and val.h5 (does the X dataset contain the images and the y dataset the labels?).

Then I replaced my network's input layers, from the two LMDB DATA layers shown first to the two HDF5Data layers that follow them:

layers { 
 name: "data" 
 type: DATA 
 top:  "data" 
 top:  "label" 
 data_param { 
   source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/train_db" 
   backend: LMDB 
   batch_size: 64 
 } 
 transform_param { 
    crop_size: 227 
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto" 
    mirror: true 
  } 
  include: { phase: TRAIN } 
} 
layers { 
 name: "data" 
 type: DATA 
 top:  "data" 
 top:  "label" 
 data_param { 
   source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/val_db"  
   backend: LMDB 
   batch_size: 64
 } 
 transform_param { 
    crop_size: 227 
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto" 
    mirror: true 
  } 
  include: { phase: TEST } 
}

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TRAIN }
}

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "val_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TEST }
}

I guess HDF5 doesn't need a mean.binaryproto?

Next, how should the output layer change in order to output multiple label probabilities? I guess I need a cross-entropy layer instead of softmax? These are the current output layers:

layers {
  bottom: "prob"
  bottom: "label"
  top: "loss"
  name: "loss"
  type: SOFTMAX_LOSS
  loss_weight: 1
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}

Mean subtraction

While the LMDB input data layer can handle various input transformations for you, the "HDF5Data" layer does not support this functionality.
Therefore, you must take care of all input transformations (in particular mean subtraction) yourself when you create your HDF5 files.
See where your code says

# you may apply other input transformations here...
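
One way to do this, sketched with plain NumPy: compute a per-channel mean over the training images and subtract it from X before writing it to the HDF5 file. The helper name subtract_channel_mean is hypothetical, and using a per-channel mean is an assumption; your mean.binaryproto may hold a per-pixel mean image instead. Note also that caffe.io.load_image returns values in [0, 1], so the mean has to be on the same scale as the data it is subtracted from.

```python
import numpy as np

def subtract_channel_mean(X, mean=None):
    """Subtract a per-channel mean from X of shape (N, C, H, W).

    If `mean` is None it is computed from X itself (do this on the
    training set only, then reuse the same mean for validation).
    Returns the centered array and the mean that was used.
    """
    if mean is None:
        mean = X.mean(axis=(0, 2, 3))           # one value per channel
    mean = np.asarray(mean, dtype=X.dtype)
    centered = X - mean[None, :, None, None]    # broadcast over N, H, W
    return centered, mean

# Tiny stand-in for the X array built in the script (4 images, 3x2x2 each).
X_train = np.arange(4 * 3 * 2 * 2, dtype='f4').reshape(4, 3, 2, 2)
X_train_centered, train_mean = subtract_channel_mean(X_train)

# Reuse the training mean for the validation images.
X_val = np.ones((2, 3, 2, 2), dtype='f4')
X_val_centered, _ = subtract_channel_mean(X_val, train_mean)
```

The centered array is what you would then pass to H.create_dataset( 'X', ... ).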

Multiple labels

Although your .txt lists several labels for each image, you only save the first one to the HDF5 file. If you want to use these labels, you have to feed them to the net.
An issue that immediately arises from your example is that you do not have a fixed number of labels for each training image. Why? What does it mean?
Assuming you have three labels for each image (in the .txt files):

< filename > < dog size > < dog color > < has collar >
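
Before relying on that layout, it may be worth checking that every line really follows it. A minimal sketch, using the three example lines from the question: parse_label_line is a hypothetical helper, and it assumes integer labels and file names without spaces.

```python
def parse_label_line(line, n_labels=3):
    """Split '<filename> <size> <color> <collar>' into (filename, labels)."""
    parts = line.split()                 # also drops the trailing newline
    if len(parts) != n_labels + 1:
        raise ValueError(
            'expected %d labels, got %d: %r' % (n_labels, len(parts) - 1, line))
    return parts[0], [int(p) for p in parts[1:]]

lines = [
    '/train/img/1.png 0 4 18\n',
    '/train/img/2.png 1 7 17 33\n',      # four labels: rejected
    '/train/img/3.png 0 4 17\n',
]

good, bad = [], []
for l in lines:
    try:
        good.append(parse_label_line(l))
    except ValueError:
        bad.append(l)
```

Lines that end up in bad need a decision (drop them, or change the label scheme) before any HDF5 file is written.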

Then you can have y_size, y_color and y_collar (instead of a single y) in your hdf5.

y_size[i] = float(sp[1])
y_color[i] = float(sp[2])
y_collar[i] = float(sp[3])
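
Put together, the label part of the script might look like the sketch below. It only builds the three arrays; each one would then be stored with H.create_dataset under the same name, the way the original script does for y. The sample lines are made up for illustration and assume exactly three labels per line.

```python
import numpy as np

# Illustrative lines: <filename> <dog size> <dog color> <has collar>
lines = [
    '/train/img/1.png 0 4 1',
    '/train/img/2.png 1 7 0',
    '/train/img/3.png 0 4 1',
]

n = len(lines)
# One (N, 1) float array per label, mirroring the original `y`.
y_size = np.zeros((n, 1), dtype='f4')
y_color = np.zeros((n, 1), dtype='f4')
y_collar = np.zeros((n, 1), dtype='f4')

for i, l in enumerate(lines):
    sp = l.split()
    y_size[i] = float(sp[1])
    y_color[i] = float(sp[2])
    y_collar[i] = float(sp[3])

# Dataset name -> array; the names must match the "top"s of the HDF5Data layer.
label_datasets = {'y_size': y_size, 'y_color': y_color, 'y_collar': y_collar}
```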

Your input data layer will have more "top"s accordingly:

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y_size"
  top: "y_color"
  top: "y_collar"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TRAIN }
}

Prediction

Currently your net only predicts a single label (the layer with top: "prob"). You need your net to predict all three labels, so you need to add layers that compute top: "prob_size", top: "prob_color" and top: "prob_collar" (a different layer for each "prob_*").
Once you have a prediction for each label, you need a loss (again, one loss for each label).
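
Caffe adds up all the loss layers in a net, each scaled by its loss_weight, so with one loss per label the quantity being minimized is the weighted sum of the three. The NumPy sketch below only illustrates that arithmetic; it is not Caffe code. It assumes a softmax loss for every head (2 sizes, a hypothetical 8 colors, and the collar treated as a 2-way class), and the scores are random stand-ins for the outputs of the three prediction layers.

```python
import numpy as np

def softmax_loss(scores, labels):
    """Mean softmax cross-entropy; scores is (N, K), labels is (N,) ints."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

rng = np.random.RandomState(0)
n = 4
heads = {
    # name: (raw scores from that head, ground-truth labels)
    'size':   (rng.randn(n, 2), np.array([0, 1, 0, 1])),
    'color':  (rng.randn(n, 8), np.array([4, 7, 4, 2])),
    'collar': (rng.randn(n, 2), np.array([1, 0, 1, 1])),
}
loss_weight = {'size': 1.0, 'color': 1.0, 'collar': 1.0}

losses = {name: softmax_loss(s, y) for name, (s, y) in heads.items()}
total_loss = sum(loss_weight[name] * losses[name] for name in losses)
```

For a yes/no label such as the collar, Caffe's SigmoidCrossEntropyLoss layer with a single output would be another option; which one fits better depends on how you encode that label.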