Apply HOG+SVM Training to Webcam for Object Detection

Posted 2019-05-26 15:38

Question:

I have trained my SVM classifier by extracting HOG features from a positive and a negative dataset:

from sklearn.svm import SVC
import cv2
import numpy as np

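hog = cv2.HOGDescriptor()  # HOG descriptor with OpenCV's default parameters


def hoggify(x,z):
    # Compute one HOG descriptor per image in folder x (file1.jpg, file2.jpg, ...)
    data=[]

    for i in range(1,int(z)):
        # Flag 0 loads the image as grayscale
        image = cv2.imread("/Users/munirmalik/cvprojek/cod/"+x+"/"+"file"+str(i)+".jpg", 0)
        dim = 128
        img = cv2.resize(image, (dim,dim), interpolation = cv2.INTER_AREA)
        img = hog.compute(img)
        img = np.squeeze(img)  # Remove any singleton dimension so each descriptor is 1D
        data.append(img)

    return data

def svmClassify(features,labels):
    # gamma is only used by non-linear kernels, so it has no effect with kernel="linear"
    clf=SVC(C=10000,kernel="linear",gamma=0.000001)
    clf.fit(features,labels)

    return clf

def list_to_matrix(lst):
    # Stack the 1D descriptors into a 2D array with one row per image
    return np.stack(lst) 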

I want to apply that training so that the program will be able to detect my custom object (chairs).

I have added labels to each set already; what needs to be done next?

Answer 1:

You already have three of the most important pieces at your disposal. hoggify creates a list of HOG descriptors, one for each image. It reads every image as grayscale (the 0 flag in cv2.imread), so all descriptors are computed the same way. Depending on your OpenCV version, hog.compute may return the descriptor as a 2D array with a single column, so that each element of the descriptor sits in its own row. You are using np.squeeze to remove that singleton column and get a 1D numpy array instead, so you're fine here. You would then use list_to_matrix to convert the list into a 2D numpy array with one row per image. Once you do this, you can use svmClassify to train the SVM. This assumes that you already have your labels in a 1D numpy array with one entry per row. After you train your SVM, you would use the SVC.predict method: given the HOG features of an image, it classifies whether the image belongs to the chair class or not.
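To make those shapes concrete, here is a small numpy-only sketch. Random arrays stand in for the output of hog.compute, and the descriptor length of 3780 is just an illustrative value (it is not the length you will get for 128 x 128 images), so only the shapes matter here:

```python
import numpy as np

# Illustrative only: random arrays stand in for hog.compute output.
# descriptor_len is an arbitrary value, not the real HOG length for 128x128 images.
descriptor_len = 3780
n_images = 5
rng = np.random.default_rng(0)

# Some OpenCV versions return each descriptor as a (descriptor_len, 1) column.
raw = [rng.random((descriptor_len, 1), dtype=np.float32) for _ in range(n_images)]

# np.squeeze drops the singleton column, giving one 1D vector per image (as in hoggify).
squeezed = [np.squeeze(d) for d in raw]
print(squeezed[0].shape)  # (3780,)

# np.stack turns the list into a matrix with one row per image (as in list_to_matrix).
data = np.stack(squeezed)
print(data.shape)  # (5, 3780)

# The labels must be a 1D array with one entry per row of data (1 = chair, 0 = not).
labels = np.array([1, 1, 1, 0, 0])
print(labels.shape)  # (5,)
```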

Therefore, the steps you need to take are:

  1. Use hoggify to create your list of HOG descriptors, one per image. Judging from the path it builds, x is the name of the folder that holds one set of images (named file1.jpg, file2.jpg, ...), while z controls how many images are loaded. Remember that range excludes its end value, and since your loop starts at 1, range(1, int(z)) loads only z - 1 images (file1.jpg up to the file numbered z - 1). If you meant to load z images, use int(z) + 1 as the end value. I'm not sure whether this is intentional, but I wanted to point it out. You will need to do this for both your chair and your non-chair folders, since the SVM has to see both classes.

    x = '...' # Name of the folder holding your chair images
    z = 100 # e.g. 100 images (99 unless you apply the + 1 fix)
    lst = hoggify(x, z)
    
  2. Convert the list of HOG descriptors into an actual matrix:

    data = list_to_matrix(lst)
    
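  3. Train your SVM classifier. This assumes your labels are stored in labels as a 1D numpy array with one entry per row of data, where 0 denotes not a chair and 1 denotes a chair. For the SVM to learn anything, data must contain descriptors from both the chair and the non-chair sets, in the same order as labels: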

    labels = ... # Define labels here as a numpy array
    clf = svmClassify(data, labels)
    
  4. Use your SVM classifier to perform predictions. To test it on new images, you need to apply the same processing steps as you did with your training data. Since hoggify lets you pass a different x, point it at a separate folder of test images: set xtest to that folder name and ztest to the number of images, then use hoggify combined with list_to_matrix to get your features:

    xtest = '...' # Name of the folder holding your test images
    ztest = 50 # 50 test images
    lst_test = hoggify(xtest, ztest)
    test_data = list_to_matrix(lst_test)
    pred = clf.predict(test_data)
    

    pred will contain an array of predicted labels, one for each test image. You can also check how well your SVM fits the training data by predicting on data from step 2:

    pred_training = clf.predict(data)
    

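    pred_training will contain an array of predicted labels, one for each training image. Comparing it with labels gives you the training accuracy, which is usually more optimistic than what you will see on the test images.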
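Since the SVM has to see both classes, the chair and non-chair descriptors need to end up in one matrix, with a label array in the same order. The following is a minimal sketch under that assumption; build_dataset and accuracy are hypothetical helper names (not part of your code or of scikit-learn), and random vectors stand in for what hoggify would return:

```python
import numpy as np


def build_dataset(pos_descriptors, neg_descriptors):
    """Stack chair (positive) and non-chair (negative) HOG descriptors into one
    feature matrix plus a matching 1D label array (1 = chair, 0 = not a chair)."""
    features = np.stack(list(pos_descriptors) + list(neg_descriptors))
    labels = np.concatenate([
        np.ones(len(pos_descriptors), dtype=int),
        np.zeros(len(neg_descriptors), dtype=int),
    ])
    return features, labels


def accuracy(pred, labels):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(pred) == np.asarray(labels)))


if __name__ == "__main__":
    # Random vectors stand in for hoggify output here (illustrative only).
    rng = np.random.default_rng(0)
    pos = [rng.random(16) for _ in range(3)]
    neg = [rng.random(16) for _ in range(2)]
    X, y = build_dataset(pos, neg)
    print(X.shape, y)  # (5, 16) [1 1 1 0 0]
    print(accuracy([1, 1, 0, 0, 0], y))  # 0.8
```

With your own functions, this would look something like X, y = build_dataset(hoggify(pos_folder, z), hoggify(neg_folder, z)) followed by clf = svmClassify(X, y), where pos_folder and neg_folder are placeholders for your folder names. If you build test_labels the same way for your test images, accuracy(clf.predict(test_data), test_labels) gives the fraction of test images classified correctly.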


If you ultimately want to use this with a webcam, create a VideoCapture object and pass it the ID of the camera device connected to your computer. Usually there's only one webcam, so use ID 0. Then, in a loop, grab a frame, convert it to grayscale and resize it to 128 x 128 so that it matches how your training images were processed, compute the HOG descriptor, and classify it. Note that this classifies the whole frame as chair or not chair; it does not tell you where in the frame the chair is.

Something like this would work, assuming that you have already trained clf and created the hog object as shown above:

import cv2

cap = cv2.VideoCapture(0)  # Device ID 0 is usually the only (or built-in) webcam
dim = 128  # Must match the size used for the training images

while True:
    # Capture a frame; stop if the camera returned nothing
    ret, frame = cap.read()
    if not ret:
        break

    # Show the image on the screen
    cv2.imshow('Webcam', frame)

    # Convert the image to grayscale to match the training images
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Resize to the training size and compute the HOG descriptor
    gray = cv2.resize(gray, (dim, dim), interpolation=cv2.INTER_AREA)
    features = hog.compute(gray)
    features = features.reshape(1, -1)  # One row of features, as SVC.predict expects

    # Predict the label (1 = chair, 0 = not a chair, if you used those labels)
    pred = clf.predict(features)

    # Print the label to the terminal
    print("The label of the image is: " + str(pred[0]))

    # Pause for 25 ms and keep going until you press q on the keyboard
    if (cv2.waitKey(25) & 0xFF) == ord('q'):
        break

cap.release()  # Release the camera resource
cv2.destroyAllWindows()  # Close the image window

The above process reads in a frame, displays it on the screen, converts it to grayscale and resizes it to match the training images, computes its HOG descriptor, puts the descriptor into a single row (the 2D shape SVC.predict expects), and then predicts its label. We print the label to the terminal and wait for 25 ms before reading the next frame so that we don't overload your CPU; cv2.waitKey also gives the window a chance to redraw, which cv2.imshow needs in order to display anything. You can quit the program at any time by pressing the q key; otherwise, it will loop forever. Once we finish, we release the camera so that other processes can use it, and close the window.
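Whether hog.compute hands back an (n, 1) column or a flat (n,) vector can depend on the OpenCV version, while SVC.predict expects a 2D array of shape (n_samples, n_features). The numpy-only sketch below uses a hypothetical helper, to_single_row, and small made-up arrays to show why a plain transpose is fragile: it only yields a single row in the column case, whereas reshape(1, -1) works for both:

```python
import numpy as np


def to_single_row(features):
    """Reshape one HOG descriptor into the (1, n_features) row that SVC.predict
    expects, whether it arrived as an (n, 1) column or a flat (n,) vector."""
    return np.asarray(features).reshape(1, -1)


column = np.arange(6, dtype=np.float32).reshape(6, 1)  # column-shaped descriptor
flat = np.arange(6, dtype=np.float32)                  # flat descriptor

# Transposing a 1D array is a no-op, so .T only helps in the column case.
print(column.T.shape, flat.T.shape)  # (1, 6) (6,)

# reshape(1, -1) gives a single row in both cases.
print(to_single_row(column).shape, to_single_row(flat).shape)  # (1, 6) (1, 6)
```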