So one of the standard preprocessing steps is to normalize/standardize the data so that it's normally distributed with mean 0 and standard deviation 1, right? But what if the data is NOT normally distributed?
Also, does the desired output have to be normally distributed too? What if I want my feedforward net to classify between two classes (-1 and 1)? That would be impossible to standardize into a normal distribution with mean 0 and std of 1, right?
Feedforward nets are non-parametric, right? And if so, is it still important to standardize the data? Why do people standardize it at all?
Standardizing the features isn't about making the data fit a normal distribution; it's about putting the feature values in a known range that makes it easier for algorithms to learn from the data. This is because most algorithms are not scale/shift invariant. Decision trees, for example, are both scale and shift invariant, so normalization has no impact on the performance of the tree.
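For concreteness, here's a minimal sketch of z-score standardization in NumPy (the feature values are made up for illustration). Note that it says nothing about the data being normally distributed; it just rescales each column:

```python
import numpy as np

# Two made-up features on very different scales
# (e.g. square footage and bedroom count).
X = np.array([[1200.0, 3.0],
              [2400.0, 4.0],
              [ 800.0, 2.0]])

mu = X.mean(axis=0)       # per-feature mean
sigma = X.std(axis=0)     # per-feature standard deviation
X_std = (X - mu) / sigma  # z-score: each column now has mean 0, std 1

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]
```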
No, that's not a thing. The output is whatever the output is. You do have to make sure the activation function of the final layer of your network can produce the predictions you want (e.g., a sigmoid activation can't output negative values or values greater than 1).
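To illustrate with your (-1, 1) example, here's a minimal sketch (the layer sizes and random weights are arbitrary): a tanh output layer covers that range, where a sigmoid couldn't:

```python
import numpy as np

rng = np.random.default_rng(0)

def tanh_net(x, W1, b1, W2, b2):
    h = np.tanh(x @ W1 + b1)      # hidden layer
    return np.tanh(h @ W2 + b2)   # output squashed into (-1, 1)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)  # arbitrary sizes
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
x = rng.normal(size=(4, 2))                    # a batch of 4 inputs
print(tanh_net(x, W1, b1, W2, b2))             # every value strictly in (-1, 1)
```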
No, they would generally be considered parametric: a feedforward net has a fixed number of weights that doesn't grow with the amount of training data. That said, parametric / non-parametric doesn't really have a hard definition, and people may mean slightly different things when talking about this.
Those things have nothing to do with each other at all.
That's the very first thing I mentioned: it's to make learning easier/possible.
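If you want to see the "easier/possible" part concretely, here's a toy sketch (the data and learning rate are made up): plain gradient descent with one fixed learning rate converges on the standardized feature and blows up on the raw one:

```python
import numpy as np

rng = np.random.default_rng(1)
x_raw = rng.uniform(0.0, 1000.0, size=100)    # feature on a large scale
x_std = (x_raw - x_raw.mean()) / x_raw.std()  # standardized copy
y = 3.0 * x_std + rng.normal(size=100)        # true relationship, plus noise

def fit(x, lr, steps=100):
    """Fit y ~ w*x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * np.mean((w * x - y) * x)  # dMSE/dw
        w -= lr * grad
    return w

print(fit(x_std, lr=0.1))  # converges to roughly 3
print(fit(x_raw, lr=0.1))  # overflows: the same lr is far too big at this scale
```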