I am new to machine learning. I have been given a task: count the total number of vehicles in an image using machine learning. I am using a neural network. My worst-case image is shown here.
Traffic Image
I need to find the total number of cars in this image. My idea is to cut this large image into small patches and train the network to count the vehicles in each patch. Each patch will contain fewer than 5 vehicles. Then, when processing a new image, I could use a sliding window to get the total vehicle count.
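A minimal sketch of the counting step of this idea, assuming the image is a NumPy array and that `count_patch` is a hypothetical stand-in for the trained patch-level network (any function that maps one patch to a number). The helper name `count_by_tiles` is also made up for illustration.

```python
import numpy as np

def count_by_tiles(image, patch_size, count_patch):
    """Sum per-patch counts over non-overlapping tiles of `image`.

    `count_patch` is a hypothetical placeholder for a trained network
    that returns an estimated vehicle count for one patch.
    """
    h, w = image.shape[:2]
    total = 0.0
    # Step by the full patch size so tiles do not overlap; with overlapping
    # windows, a vehicle inside the overlap would be counted more than once.
    # Tiles on the right/bottom border may be smaller than patch_size.
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = image[top:top + patch_size, left:left + patch_size]
            total += count_patch(patch)
    return total

# Toy usage: 1s mark vehicle centres, and summing a patch plays the role of the network.
toy = np.zeros((8, 8))
toy[1, 1] = toy[2, 5] = toy[6, 3] = 1
print(count_by_tiles(toy, 4, lambda p: float(p.sum())))  # 3.0
```

The tiles are non-overlapping on purpose. A vehicle cut by a tile border still appears partly in two tiles, so one likely requirement on the training data is that each vehicle is labelled in exactly one patch (for example, the patch containing its centre); otherwise the per-patch counts will not add up to the true total.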
I would like to know whether this idea is feasible, or whether I should instead extract features and train a neural network on those features. If it is feasible, are there any requirements for the dataset and the training?
What you are looking for is called object detection. Good starting points are the papers "Deep Neural Networks for Object Detection" and "Region-based Convolutional Networks for Accurate Object Detection and Segmentation".
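With a detector, the vehicle count is the number of boxes that remain after post-processing. A minimal sketch, assuming some detector has already produced an array of boxes `[x1, y1, x2, y2]` with confidence scores (the arrays below are made up); it keeps confident boxes and applies greedy non-maximum suppression (NMS) so that several boxes on the same car are counted once. Many detector implementations already run NMS internally, in which case only the score threshold is needed. The thresholds of 0.5 are illustrative assumptions, not tuned values.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union between one box and an array of boxes ([x1, y1, x2, y2])."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def count_detections(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Count objects as the boxes surviving a score threshold and greedy NMS."""
    keep = scores >= score_thresh
    boxes, scores = boxes[keep], scores[keep]
    order = np.argsort(-scores)  # highest score first
    count = 0
    while order.size > 0:
        best = order[0]
        count += 1
        rest = order[1:]
        # Drop remaining boxes that overlap the kept box too much (likely the same car).
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return count

# Made-up detections: boxes 0 and 1 cover the same car, box 3 has a low score.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 0, 30, 10], [40, 40, 50, 50]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.3])
print(count_detections(boxes, scores))  # 2
```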
A similar but much more difficult task is instance segmentation. One of the latest papers I've seen in this area is "Pixel-level Encoding and Depth Layering for Instance-level Semantic Labeling".
Instance segmentation is probably one of the hardest tasks in computer vision. If you are new to machine learning / computer vision, you might want to start with image classification. If you want to work toward instance segmentation, continue with semantic segmentation next, and then move on to instance segmentation.
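Once a model produces instance-level output, counting becomes simple. A minimal sketch, assuming the output is a label map in which each pixel holds an integer instance ID and 0 marks background (a common but not universal convention; the actual format depends on the model). The function name `count_instances` and the label map are hypothetical.

```python
import numpy as np

def count_instances(instance_map, background=0):
    """Count objects in a label map where each object has its own integer ID."""
    ids = np.unique(instance_map)
    # Every distinct ID other than the background value is one object.
    return int(np.count_nonzero(ids != background))

# Two touching cars (IDs 1 and 2) stay separate because their IDs differ.
label_map = np.array([[0, 1, 1, 2, 2],
                      [0, 1, 1, 2, 2],
                      [0, 0, 0, 0, 0]])
print(count_instances(label_map))  # 2
```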
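A simple sliding-window approach, where you only predict "car" / "no car" for each window, will not work, because the cars in this image are not separated by any "no car" region: neighbouring windows would all say "car", and you could not tell where one car ends and the next begins.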
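A toy illustration of this point. The counting method here (counting connected "car" regions of a binary car / no car mask) is an assumption chosen for illustration, and the grids are made up: two cars parked bumper to bumper form a single region and are counted as one.

```python
from collections import deque

def count_blobs(mask):
    """Count 4-connected regions of True cells in a 2D boolean grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                blobs += 1
                # Flood-fill this region so its cells are not counted again.
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blobs

# "#" = pixel classified as car. Each grid holds two cars, 4 and 3 pixels wide.
touching = [[ch == "#" for ch in row] for row in ["########", "########"]]
separated = [[ch == "#" for ch in row] for row in ["###..###", "###..###"]]
print(count_blobs(touching))   # 1, although there are two cars
print(count_blobs(separated))  # 2
```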