I've been working with Stanford's coreNLP to perform sentiment analysis on some data I have and I'm working on creating a training model. I know we can create a training model with the following command:
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz
I know what goes in the train.txt file. You score sentences and put them in train.txt, something like this:
(0 (2 Today) (0 (0 (2 is) (0 (2 a) (0 (0 bad) (2 day)))) (..)))
But I don't understand what goes in the dev.txt file. I read through this question several times to try to understand what goes in dev.txt, but it's still unclear to me. Also, scoring these sentences manually has become a pain, is there a tool available that makes it easier? I'm worried that I've been using the wrong number of parentheses or some other stupid mistake like that.
Also, any suggestions on how long my train.txt file should be? I'm thinking of scoring a 1000 sentences. Is that number too small, too large?
All your help is appreciated :)