I am trying to highlight the words in the IMDB dataset that contributed most to the final sentiment analysis prediction.
The dataset looks like this:
X_train - a review as a string.
Y_train - 0 or 1
Now, after embedding the X_train values with GloVe embeddings, I can feed them to a neural net (a rough sketch of the preprocessing is below).
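A sketch of that preprocessing, assuming the Keras Tokenizer and a local glove.6B.100d.txt file; all sizes here are arbitrary choices, not values from my actual code:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_words, max_len, emb_dim = 20000, 100, 100  # assumed sizes

tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(X_train)                   # X_train: list of review strings
X_train_pad = pad_sequences(tokenizer.texts_to_sequences(X_train),
                            maxlen=max_len)       # shape: (num_reviews, max_len)

# Map each word index to its pretrained GloVe vector
embedding_matrix = np.zeros((max_words, emb_dim))
with open('glove.6B.100d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        idx = tokenizer.word_index.get(values[0])
        if idx is not None and idx < max_words:
            embedding_matrix[idx] = np.asarray(values[1:], dtype='float32')
```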
My question is: how can I highlight the most important words, probability-wise, just like deepmoji.mit.edu does?
What I have tried:
I tried splitting the input sentences into bigrams and training a 1D CNN on them. Later, to find the important words of X_test, we split X_test into bigrams and find their probabilities (a rough sketch of the idea follows below). It works, but it is not accurate.
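The idea, as a sketch rather than my exact code: score every bigram of a test review on its own and treat the predicted probability as that bigram's importance. This assumes `model` is the trained classifier and `tokenizer`/`max_len` come from the preprocessing step above:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

def bigram_importance(review, model, tokenizer, max_len):
    words = review.split()
    bigrams = [' '.join(words[i:i + 2]) for i in range(len(words) - 1)]
    padded = pad_sequences(tokenizer.texts_to_sequences(bigrams),
                           maxlen=max_len)
    probs = model.predict(padded).ravel()   # P(positive) for each bigram alone
    return sorted(zip(bigrams, probs), key=lambda p: p[1], reverse=True)
```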
I tried using a prebuilt Hierarchical Attention Network and succeeded. I got what I wanted, but I couldn't figure out every line and concept in the code. It's like a black box to me.
I know how a neural net works, and I can code one from scratch in NumPy with manual backpropagation. I have detailed knowledge of how an LSTM works and what the forget, update, and output gates actually output. But I still couldn't figure out how to extract attention weights, or how to shape the data as a 3D array (what is the timestep in our 2D data?).
So, any kind of guidance is welcome.
Here is a version with attention (not hierarchical), but you should be able to figure out how to make it work with a hierarchy too; if not, I can help out. The trick is to define two models and use one for training (model) and the other to extract the attention values (model_with_attention_output). Because the two models share the same layers, training the first also trains the second.
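A minimal sketch of this two-model setup, assuming tf.keras and the preprocessing above; the layer sizes, vocabulary size, and sequence length are illustrative assumptions:

```python
import tensorflow.keras.backend as K
from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     Softmax, Multiply, Lambda)
from tensorflow.keras.models import Model

max_words, max_len, emb_dim = 20000, 100, 100

inp = Input(shape=(max_len,))
embs = Embedding(max_words, emb_dim)(inp)            # (batch, time, emb_dim)
lstm = LSTM(64, return_sequences=True)(embs)         # (batch, time, 64)

# One score per timestep, softmaxed over time -> attention weights
scores = Dense(1, activation='tanh')(lstm)           # (batch, time, 1)
attention = Softmax(axis=1, name='attention')(scores)

# Attention-weighted sum of the LSTM states (this is the multiplication
# referred to in the EDIT below)
weighted = Multiply()([lstm, attention])             # broadcasts over units
context = Lambda(lambda x: K.sum(x, axis=1))(weighted)  # (batch, 64)

out = Dense(1, activation='sigmoid')(context)

# Model 1: trained on the labels as usual
model = Model(inp, out)
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Model 2: shares the same layers/weights, but outputs the attention values
model_with_attention_output = Model(inp, attention)
```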
The output will be a NumPy array with the attention value of each word: the higher the value, the more important the word was.
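Continuing the sketch above, with `X_test_pad` standing in for your padded test set:

```python
# One attention value per (padded) token position of the first test review
attn = model_with_attention_output.predict(X_test_pad[:1])[0, :, 0]

# Map positions back to words (index 0 is Keras' padding value, so skip it)
index_word = {i: w for w, i in tokenizer.word_index.items()}
tokens = [index_word.get(i, '') for i in X_test_pad[0]]
for word, weight in zip(tokens, attn):
    if word:
        print(f'{word}: {weight:.3f}')
```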
EDIT: You might want to replace lstm with embs in the multiplication to get better interpretations, but it will lead to worse performance...