Why does my CNN always return the same result?

I'm trying to build a CNN that classifies objects into 3 classes. The three objects are a Lamborghini, a cylinder head and a piece of a plane. My data set consists of 6580 images, about 2200 images per class. You can see my dataset on Google Drive. The architecture of my CNN is AlexNet, but I've changed the output of fully connected layer 8 from 1000 to 3. I used these settings for training:

test_iter:1000
test_interval:1000
base_lr:0.001
lr_policy:"step"
gamma:0.1
stepsize:2500
max_iter:40000
momentum:0.9
weight_decay:0.0005
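With lr_policy "step", Caffe multiplies the learning rate by gamma once every stepsize iterations, i.e. lr = base_lr * gamma ^ floor(iter / stepsize). The short Python sketch below simply re-implements that formula (it is not Caffe code; caffe_step_lr is a hypothetical name) to show what these solver values imply: the rate is already 1e-7 by iteration 10000 and about 1e-19 by iteration 40000.

```python
def caffe_step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under Caffe's "step" policy:
    base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


if __name__ == "__main__":
    # Solver values from the question: base_lr 0.001, gamma 0.1, stepsize 2500.
    for it in (0, 2500, 5000, 10000, 20000, 40000):
        print(it, caffe_step_lr(0.001, 0.1, 2500, it))
```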

But the problem is that when I deploy my model after training, the result is always the following: {'prob': array([[ 0.33333334, 0.33333334, 0.33333334]], dtype=float32)}.

The code below is my script to load the model and output the vector of probabilities:

import numpy as np
import matplotlib.pyplot as plt
import sys
import caffe
import cv2

MODEL_FILE ='deploy_ex0.prototxt'
PRETRAINED='snapshot_ex0_1_model_iter_40000.caffemodel'

caffe.set_mode_cpu()
net = caffe.Net(MODEL_FILE, PRETRAINED, caffe.TEST)

#preprocessing 

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

#mean subtraction 

mean_file = np.array([104,117,123]) 
transformer.set_mean('data', mean_file)

transformer.set_transpose('data', (2,0,1))
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)

#batch size 
net.blobs['data'].reshape(1,3,227,227)

#load image in data layer 

im=cv2.imread('test.jpg', cv2.IMREAD_COLOR)
img =cv2.resize(im, (227,227))

net.blobs['data'].data[...] = transformer.preprocess('data', img)

#compute 

out=net.forward()

print(out)
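As far as I know, pycaffe's Transformer.preprocess applies its steps in the order transpose, channel swap, raw scale, mean subtraction, and the combination raw_scale = 255 with channel_swap = (2, 1, 0) is meant for images loaded with caffe.io.load_image, which returns RGB floats in [0, 1]. cv2.imread returns BGR uint8 values in [0, 255] instead. The NumPy sketch below re-implements those steps (an illustration, not pycaffe's code; resizing is skipped and preprocess_like_pycaffe is a hypothetical name) to show the values this script would feed the network.

```python
import numpy as np


def preprocess_like_pycaffe(img_hwc, mean=None, channel_swap=None, raw_scale=None):
    """Sketch of the order pycaffe's Transformer.preprocess uses (no resize):
    HWC -> CHW transpose, channel swap, raw scale, per-channel mean subtraction."""
    x = img_hwc.astype(np.float32).transpose(2, 0, 1)   # set_transpose((2, 0, 1))
    if channel_swap is not None:
        x = x[list(channel_swap), :, :]                  # set_channel_swap
    if raw_scale is not None:
        x = x * raw_scale                                 # set_raw_scale
    if mean is not None:
        # set_mean with one value per channel
        x = x - np.asarray(mean, dtype=np.float32)[:, None, None]
    return x


if __name__ == "__main__":
    # Stand-in for cv2.imread output: uint8 BGR pixels in [0, 255].
    img = np.full((227, 227, 3), 200, dtype=np.uint8)
    x = preprocess_like_pycaffe(img, mean=[104, 117, 123],
                                channel_swap=(2, 1, 0), raw_scale=255.0)
    # Every value ends up near 200 * 255, i.e. roughly 51000.
    print(x.min(), x.max())
```

With cv2 input, raw_scale multiplies values that are already in [0, 255] by 255 again, and the channel swap turns BGR into RGB.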

Why do I get a result like this? Could you help me debug my CNN?

Also, at the end of training I got these results:

I0421 06:56:12.285953  2224 solver.cpp:317] Iteration 40000, loss = 5.06557e-05
I0421 06:56:12.286027  2224 solver.cpp:337] Iteration 40000, Testing net (#0)
I0421 06:58:32.159469  2224 solver.cpp:404]     Test net output #0: accuracy = 0.99898
I0421 06:58:32.159626  2224 solver.cpp:404]     Test net output #1: loss = 0.00183688 (* 1 = 0.00183688 loss)
I0421 06:58:32.159643  2224 solver.cpp:322] Optimization Done.
I0421 06:58:32.159654  2224 caffe.cpp:222] Optimization Done.

Thank you

EDIT AFTER ANSWER OF 11 MAY:

I switched to a simple model with 1 convolution, 1 ReLU, 1 pooling and 2 fully connected layers. The architecture specification is below:

name:"CNN"
layer {
  name: "convnet"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror:true
    crop_size:227
    mean_value:87.6231
    mean_value:87.6757
    mean_value:87.1677
    #mean_file:"/home/jaba/caffe/data/diota_model/mean.binaryproto"
  }
  data_param {
    source: "/home/jaba/caffe/data/diota_model/train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}

layer {
  name: "convnet"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror:true
    crop_size:227
    mean_value:87.6231
    mean_value:87.6757
    mean_value:87.1677
    #mean_file:"/home/jaba/caffe/data/diota_model/mean.binaryproto"
  }
  data_param {
    source: "/home/jaba/caffe/data/diota_model/val_lmdb"
    batch_size: 20
    backend: LMDB
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 300
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip1"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
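In the train_val.prototxt above, both the Accuracy and the SoftmaxWithLoss layers take ip1 (300 outputs) as their bottom, so ip2 (the 3-output layer) is not connected to the loss and never receives a loss gradient; the 86% reported during training was measured on ip1 as a 300-way classifier. If deploy.prototxt computes prob from ip2 (it is not shown, so this is an assumption), the deployed net would classify with an untrained layer, which would be one possible reason for a different accuracy. The hypothetical helper below lists blobs that some layer produces but no layer reads, using the layers of this net transcribed by hand as (name, bottoms, tops).

```python
def unconsumed_tops(layers):
    """Return blob names that some layer produces but no layer reads.

    `layers` is a list of (name, bottoms, tops) tuples transcribed by hand
    from a prototxt; in-place layers (top == bottom) are handled naturally.
    """
    produced, consumed = [], set()
    for _name, bottoms, tops in layers:
        consumed.update(bottoms)
        for top in tops:
            if top not in produced:
                produced.append(top)
    return [top for top in produced if top not in consumed]


# The train_val.prototxt from the question, reduced to bottoms and tops.
NET = [
    ("convnet",  [],               ["data", "label"]),
    ("conv1",    ["data"],         ["conv1"]),
    ("relu1",    ["conv1"],        ["conv1"]),
    ("pool1",    ["conv1"],        ["pool1"]),
    ("ip1",      ["pool1"],        ["ip1"]),
    ("ip2",      ["ip1"],          ["ip2"]),
    ("accuracy", ["ip1", "label"], ["accuracy"]),
    ("loss",     ["ip1", "label"], ["loss"]),
]

if __name__ == "__main__":
    print(unconsumed_tops(NET))  # → ['ip2', 'accuracy', 'loss']
```

accuracy and loss are expected network outputs; ip2 appearing next to them is the sign that it is dangling.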

I trained this CNN for 22 epochs and got 86% accuracy. These are the solver parameters I used:

net: "/home/jaba/caffe/data/diota_model/simple_model/train_val.prototxt"
test_iter: 50
test_interval: 100
base_lr: 0.00001
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 3500
snapshot: 100
snapshot_prefix: "/home/jaba/caffe/data/diota_model/simple_model/snap_shot_model"
solver_mode: GPU
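With lr_policy "inv", Caffe's learning rate is base_lr * (1 + gamma * iter) ^ (-power). The sketch below re-implements that formula (caffe_inv_lr is a hypothetical name, not a Caffe function) to show that with these values the rate decays only mildly, from 1e-5 at the start to roughly 8e-6 at iteration 3500.

```python
def caffe_inv_lr(base_lr, gamma, power, iteration):
    """Learning rate under Caffe's "inv" policy:
    base_lr * (1 + gamma * iteration) ** (-power)."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


if __name__ == "__main__":
    # Solver values from the edit: base_lr 1e-5, gamma 1e-4, power 0.75.
    for it in (0, 1000, 2000, 3500):
        print(it, caffe_inv_lr(1e-5, 1e-4, 0.75, it))
```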

Now, when I deploy the model, it no longer returns the same vector of probabilities for every image. But there is one issue: when I load the model and test it on the validation LMDB, I do not get the same accuracy; I get about 56%.

I used the script below to calculate the accuracy:

import os
import glob
#import cv2
import caffe
import lmdb
import numpy as np
from caffe.proto import caffe_pb2

MODEL_FILE ='deploy.prototxt'
PRETRAINED='snap_shot_model_iter_3500.caffemodel'

caffe.set_mode_cpu()
#load_model

net = caffe.Net(MODEL_FILE, PRETRAINED, caffe.TEST)

#load input and configure preprocessing



#mean_file = np.array([104,117,123])

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
#transformer.set_mean('data', mean_file)
transformer.set_transpose('data', (2,0,1))
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)


#fixing the batch size

net.blobs['data'].reshape(1,3,227,227)

lmdb_env=lmdb.open('/home/jaba/caffe/data/diota_model/val1_lmdb')

lmdb_txn=lmdb_env.begin()

lmdb_cursor=lmdb_txn.cursor()

datum=caffe_pb2.Datum()


correct_predictions=0

for key,value in lmdb_cursor:

    datum.ParseFromString(value)

    label=datum.label
    data=caffe.io.datum_to_array(datum)

    image=np.transpose(data,(1,2,0))


    net.blobs['data'].data[...]=transformer.preprocess('data',image)

    out=net.forward()
    out_put=out['prob'].argmax()
    if label==out_put:
        correct_predictions=correct_predictions+1



print('accuracy: %f' % (correct_predictions/1002.0))
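The TEST Data layer in the train_val.prototxt center-crops to 227 and subtracts mean_value from the raw 0-255 pixels in their stored channel order, while the accuracy script above swaps channels, multiplies by 255 and subtracts no mean. The NumPy sketch below is my reading of what Caffe's data transformer does in the TEST phase with this transform_param (random mirroring from mirror: true is left out); caffe_test_transform is a hypothetical helper, and it assumes three channels and stored images at least 227 x 227.

```python
import numpy as np

# mean_value entries from the TEST Data layer, applied per channel in the
# order the channels are stored in the LMDB (no channel swap, no rescaling).
MEAN_VALUES = np.array([87.6231, 87.6757, 87.1677], dtype=np.float32)
CROP = 227


def caffe_test_transform(datum_chw, mean_values=MEAN_VALUES, crop=CROP):
    """Center crop to crop x crop, then subtract one mean value per channel.

    `datum_chw` is a (C, H, W) uint8 array, e.g. from caffe.io.datum_to_array.
    """
    _c, h, w = datum_chw.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    x = datum_chw[:, top:top + crop, left:left + crop].astype(np.float32)
    return x - mean_values[:, None, None]
```

In the accuracy loop it could replace the np.transpose and transformer.preprocess lines, e.g. net.blobs['data'].data[...] = caffe_test_transform(data), assuming deploy.prototxt expects a 3 x 227 x 227 input.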

I changed the split of the data set to 1002 images for testing and 4998 for training. Could you give me some suggestions to solve this issue?

Thanks !

I think I see two distinct problems, both forms of over-fitting. Assuming 85% of your 6580 images are used for training, you have 5593 training images and 987 test images.

ONE

Assuming AlexNet's default training batch size of 256: 40000 iterations * (256 images/iteration) * (1 epoch/5593 images) ~= 1831 epochs. On the ILSVRC data set (1.28M images), AlexNet trains for only 40-50 epochs (depending on scale-out). Your model finished with a loss of effectively 0 and got only 1 photo wrong in the entire test set.
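The epoch arithmetic is just iterations x batch size / training images. A small sketch (epochs is a hypothetical helper; the batch size of 256 and the 85% split are the assumptions stated above) that also checks the simple model from the question's edit:

```python
def epochs(max_iter, batch_size, n_train):
    """Passes over the training set: iterations * batch size / images."""
    return max_iter * batch_size / n_train


if __name__ == "__main__":
    n_train = round(0.85 * 6580)        # 5593 training images (assumed 85% split)
    print(epochs(40000, 256, n_train))  # AlexNet run, assuming batch_size 256: about 1831
    print(epochs(3500, 32, 4998))       # simple model from the edit: about 22.4
```

The second figure agrees with the 22 epochs reported in the edit.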

TWO

AlexNet's widths (filters per layer) are tuned for the 1000 classes and myriad features of the ILSVRC data set. You haven't scaled them down for your data. The fully connected layers fc6 and fc7 broaden to 4096 outputs each: that's nearly one for each training image. Where ILSVRC trains AlexNet to recognize features such as a feline face or one side of a wheeled vehicle, your model will train to recognize a dark blue Lamborghini seen from 30 degrees off the front and 8 degrees above horizontal, with grass in the background and a poplar tree on the driver's side.

In other words, your trained AlexNet fits the training data set like a pour-on plastic shell. It's not going to fit anything except the initial data set.

I'm mildly surprised that it doesn't do a little better on other autos, other cylinder heads, and plane pieces. However, I've seen enough over-fitted models that had effectively random output.

First, reduce the length of training. Second, try reducing the num_output size of each layer.

EDIT AFTER OP's COMMENTS 11 MAY

Yes, you have to reduce the number of kernels/filters/outputs in each layer. The fully connected layers fc6 and fc7, in particular, have 4096 outputs each, which means that the network can allocate almost one filter per photo in your data set. This does not make for effective learning: instead of having a handful of filters that learn general features of cylinder heads, you have over 1000 filters, each learning one very specific feature of a particular cylinder-head photo.
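The same concern applies to the "simple" model in the edit. With only one pooling step, ip1 receives 20 x 111 x 111 = 246,420 inputs, so num_output: 300 gives it about 74 million weights, roughly twice AlexNet's fc6 (9216 x 4096, about 37.7 million), for fewer than 5000 training images. The sketch below uses Caffe's output-size rules as I understand them (convolution rounds down, pooling rounds up); conv_out and pool_out are hypothetical helpers.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    # Caffe Convolution: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    # Caffe Pooling rounds up: ceil((size + 2*pad - kernel) / stride) + 1
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1


if __name__ == "__main__":
    s = conv_out(227, 5)       # conv1: 5x5, stride 1 -> 223
    s = pool_out(s, 3, 2)      # pool1: 3x3, stride 2 -> 111
    ip1_in = 20 * s * s        # 20 feature maps flattened into ip1
    print("ip1 inputs:", ip1_in)
    print("ip1 weights:", ip1_in * 300)
    print("AlexNet fc6 weights:", 256 * 6 * 6 * 4096)
```

Adding more conv/pool stages or lowering num_output shrinks that fully connected layer quickly.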

AlexNet, GoogLeNet, ResNet, VGG, et alia were all built and tuned for general discrimination of still images over a wide variety of objects. You can certainly use the general concepts, but they are not good topologies for a problem that is so much smaller and better defined.