My data has around 60 features, and most of them are zero most of the time. In my training data only 2-3 columns have values (to be precise, it is performance log data). However, my test data will have values in some other columns.
I applied normalization/standardization (I tried each separately) and fed the result to PCA/SVD (again, I tried each separately). I then used the resulting features to fit my model, but it gives very inaccurate results.
Whereas if I skip the normalization/standardization step and feed my data directly to PCA/SVD and then to the model, it gives accurate results (around 90% accuracy or higher).
P.S.: I need to do anomaly detection, so I am using the Isolation Forest algorithm.
Why do these results vary?
Normalization and standardization (depending on the source, the two terms are sometimes used interchangeably, so I'm not sure exactly what you mean by each one here, but it's not important) are a general recommendation that usually works well in problems where the data is more or less homogeneously distributed. Anomaly detection, however, is by definition not that kind of problem. If you have a data set where most of the examples belong to class A and only a few belong to class B, it is possible (if not inevitable) that sparse features (features that are almost always zero) are actually very discriminative for your problem. Normalizing them will basically turn them to zero or almost zero, making it hard for a classifier (or PCA/SVD) to grasp their importance. So it is not unreasonable that you get better accuracy if you skip the normalization, and you shouldn't feel you are doing it "wrong" just because you are "supposed to do it".
I don't have experience with anomaly detection, but I have some with unbalanced data sets. You could consider some form of "weighted normalization", where the computation of the mean and variance of each feature is weighted with a value inversely proportional to the number of examples in the class (e.g. examples_A ^ alpha / (examples_A ^ alpha + examples_B ^ alpha), with alpha some small negative number). If your sparse features have very different scales (e.g. one is 0 in 90% of cases and 3 in 10% of cases, and another is 0 in 90% of cases and 80 in 10% of cases), you could just scale them to a common range (e.g. [0, 1]).
In any case, as I said, do not apply techniques just because they are supposed to work. If something doesn't work for your problem or particular dataset, you are right not to use it (and trying to understand why it doesn't work may yield some useful insights).
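The "weighted normalization" and common-range scaling ideas above could be sketched roughly as follows. This is only one possible reading of the suggestion, not a standard recipe: the function names (class_weights, weighted_standardize, minmax_scale) are made up for illustration, the class weight is applied to every example of that class, and it assumes you have (at least approximate) labels telling which rows belong to the rare class B, which is often not the case in anomaly detection.

```python
import numpy as np

def class_weights(n_a, n_b, alpha=-1.0):
    """Weights for classes A and B using the formula from the text:
    n_A^alpha / (n_A^alpha + n_B^alpha). Both counts must be > 0."""
    wa = float(n_a) ** alpha
    wb = float(n_b) ** alpha
    total = wa + wb
    return wa / total, wb / total

def weighted_standardize(X, is_b, alpha=-1.0):
    """Standardize each column with a class-weighted mean and variance.
    is_b is a boolean array, True for rows of the rare class B
    (assumption: such labels are available)."""
    X = np.asarray(X, dtype=float)
    is_b = np.asarray(is_b, dtype=bool)
    n_b = int(is_b.sum())
    n_a = is_b.size - n_b
    w_a, w_b = class_weights(n_a, n_b, alpha)
    w = np.where(is_b, w_b, w_a)          # one weight per example
    mean = np.average(X, axis=0, weights=w)
    var = np.average((X - mean) ** 2, axis=0, weights=w)
    std = np.sqrt(var)
    std[std == 0] = 1.0                   # avoid dividing by zero on constant columns
    return (X - mean) / std

def minmax_scale(X):
    """Scale every column to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                 # constant columns stay at 0
    return (X - lo) / span

# Toy data: a sparse feature that is 0 for nine "normal" rows and 10 for one rare row.
X_toy = np.array([[0.0]] * 9 + [[10.0]])
rare = np.array([False] * 9 + [True])
Z_toy = weighted_standardize(X_toy, rare, alpha=-1.0)

# Two sparse features on very different scales, brought to a common range.
X_scales = np.array([[0.0, 0.0], [0.0, 80.0], [3.0, 0.0]])
S_toy = minmax_scale(X_scales)
```

With alpha = -1 both classes contribute the same total weight to the mean and variance, so the rare non-zero value is not drowned out by the many zeros. Values of alpha closer to 0 move back toward ordinary, unweighted standardization.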
Any features that only have zeros (or any other constant value) in the training set are not, and cannot be, useful for any ML model. You should discard them. The model cannot learn any information from them, so it won't matter that the test data has some non-zero values in those columns.
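Discarding constant features could look something like the sketch below. The helper name varying_columns and the toy arrays are invented for illustration; scikit-learn's VarianceThreshold with its default threshold does the same job if you prefer a ready-made transformer. The important detail is that the mask is computed on the training data only and then applied unchanged to the test data.

```python
import numpy as np

def varying_columns(X_train):
    """Boolean mask: True for columns that take more than one value in training."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.max(axis=0) != X_train.min(axis=0)

# Toy data: columns 0 and 2 are always zero in training.
X_train = np.array([[0.0, 1.0, 0.0],
                    [0.0, 2.0, 0.0],
                    [0.0, 0.0, 0.0]])
# The test data has non-zero values in those columns, but the model never saw them vary.
X_test = np.array([[5.0, 1.0, 7.0],
                   [0.0, 3.0, 0.0]])

keep = varying_columns(X_train)       # fitted on training data only
X_train_reduced = X_train[:, keep]
X_test_reduced = X_test[:, keep]      # same columns dropped from the test data
```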
Generally, you should normalize or standardize the data before feeding it to PCA/SVD; otherwise these methods can pick up the wrong patterns in the data (e.g. if the features are on very different scales, the ones with the largest scale dominate the components).
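To illustrate the scale sensitivity, here is a small sketch on synthetic data (the data and the helper first_principal_axis are made up for this example, and PCA is computed by hand via SVD of the centered matrix). Two features carry the same underlying signal, but one is measured on a scale 1000 times larger.

```python
import numpy as np

def first_principal_axis(X):
    """Unit vector of the first principal component of X."""
    Xc = X - X.mean(axis=0)                     # PCA works on centered data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
t = rng.normal(size=500)                        # shared underlying signal
X = np.column_stack([
    t + 0.5 * rng.normal(size=500),             # feature on a unit scale
    1000.0 * (t + 0.5 * rng.normal(size=500)),  # similar feature, much larger scale
])

# Without scaling, the first component points almost entirely along feature 2.
raw_axis = first_principal_axis(X)

# After standardization, both features load equally (in absolute value).
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std_axis = first_principal_axis(Z)

print("raw:", np.round(np.abs(raw_axis), 4))
print("standardized:", np.round(np.abs(std_axis), 4))
```

Whether equal weighting is what you want depends on the problem: if the large-variance features are the informative ones, as may be the case with sparse log data, the unscaled result can legitimately work better.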
Regarding the reason behind such a difference in the accuracy, I'm not sure. I guess it has to do with some peculiarities of the dataset.