Assuming we have input data that consists of discrete values as well as a string of text, and the output should be a set of tags.
To turn this into data that can be fed into a neural net, I'm having trouble figuring out how to deal with the textual input.
Using only the textual input, I assume a RNN producing a thought vector, could work, I am however a bit uncertain how to feed the rest of the input data along.
If you are using RNN to handle the textual input, then the output of RNN can be concatenated with a one-hot-encoding of your discrete features. The concatenated vector can then be fed into an output layer (for example, logistic to calculate cross-entropy loss across multi-labels).
Similarly, if you are using an embedding layer to map the input texts, you can learn another embedding for your discrete features as well. The two embedded feature families can then be concatenated to fed into output layers.