Advice to consider when training a robust cascade

I'm training a cascade classifier in order to detect animals in images. Unfortunately my false positive rate is quite high (super high using Haar and LBP, acceptable using HOG). I'm wondering how I could possibly improve my classifier.

Here are my questions:

what is the amount of training samples that is necessary for a robust detection? I've read somewhere that 4000 pos and 800 neg samples are needed. Is that a good estimate?
how different should the training samples be? Is there a way to quantify image difference in order to include / exclude possible 'duplicate' data?
how should I deal with occluded objects? should I train only the part of the animal that is visible, or should I rather pick my ROI so that the average ROI is quite constant?
re occluded objects: animals have legs, arms, tails, heads etc. Since some body parts tend to be occluded quite often, does it make sense to select the 'torso' as the ROI?
should I try to downscale my images and train on smaller images sizes? Could this possibly improve things?

I'm open for any pointers here!

4000 pos - 800 neg is a bad ratio. The thing with negative samples is that you need to train your system as many of them as possible, since Adaboost ML algorithm -the core algorithm for all haar like feature selection processes- depends highly on them. Using 4000 / 10000 would be a good enhancement.
Detecting "animals" is a hard problem. Since your problem is a decision process, which is already NP-hard, you are increasing complexity with your range of classification. Start with cats first. Have a system that detects cats. Then apply the same to the dogs. Have, say 40 systems, detecting different animals and use them for your purpose later on.
For training, do not use occluded objects as positives. i.e. if you want to detect frontal faces, then train frontal faces with only applying position and orientation changes, without including any other object in front of it.
Downscaling is not important as the haar classifier itself downscales everything to 24x24. Watch whole viola-jones presentation when you have enough time.
Good luck.