What is the replace for softmax layer in case more

For example, I have CNN which tries to predict numbers from MNIST dataset (code written using Keras). It has 10 outputs, which form softmax layer. Only one of outputs can be true (independently for each digit from 0 to 9):

Real: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Predicted: [0.02, 0.9, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]

Sum of predicted is equal to 1.0 due to definition of softmax.

Let's say I have a task where I need to classify some objects that can fall in several categories:

Real: [0, 1, 0, 1, 0, 1, 0, 0, 0, 1]

So I need to normalize in some other way. I need function which gives value on range [0, 1] and which sum can be larger than 1.

I need something like that:

Predicted: [0.1, 0.9, 0.05, 0.9, 0.01, 0.8, 0.1, 0.01, 0.2, 0.9]

Each number is probability that object falls in given category. After that I can use some threshold like 0.5 to distinguish categories in which given object falls.

The following questions appear:

So which activation function can be used for this?
May be this function already exists in Keras?
May be you can propose some other way to predict in this case?

Your problem is one of multi-label classification, and in the context of Keras it is discussed, for example, here: https://github.com/fchollet/keras/issues/741

In short the suggested solution for it in keras is to replace the softmax layer with a sigmoid layer and use binary_crossentropy as your cost function.

an example from that thread:

# Build a classifier optimized for maximizing f1_score (uses class_weights)

clf = Sequential()

clf.add(Dropout(0.3))
clf.add(Dense(xt.shape[1], 1600, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1600, 1200, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1200, 800, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(800, yt.shape[1], activation='sigmoid'))

clf.compile(optimizer=Adam(), loss='binary_crossentropy')

clf.fit(xt, yt, batch_size=64, nb_epoch=300, validation_data=(xs, ys), class_weight=W, verbose=0)

preds = clf.predict(xs)

preds[preds>=0.5] = 1
preds[preds<0.5] = 0

print f1_score(ys, preds, average='macro')