I want to find patterns in sentence structure, so I'm generating parse trees as a preprocessing step.
Until now I have used the Stanford CoreNLPParser. Many of my sentences are imperative. After getting many more clusters than I expected, I reviewed the parse trees and found that verbs at the beginning of my imperative sentences were often parsed as noun phrases (NP).
I found the following answer: https://stackoverflow.com/a/35887762/6068675
Since this answer is from 2016, I was hoping there might be another option that gives better results. Just lowercasing the first word of every sentence doesn't look like an ideal solution.
Here are a few examples that were parsed incorrectly:
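For reference, the lowercasing workaround could look roughly like the sketch below. The function name `lowercase_first_word` and the rule for leaving all-caps tokens alone are my own assumptions, not something taken from the linked answer or from CoreNLP.

```python
def lowercase_first_word(sentence: str) -> str:
    """Lowercase the first word of a sentence before sending it to the parser.

    Heuristic (an assumption): tokens that look like acronyms, and the
    pronoun "I", are left unchanged. Leading whitespace is dropped.
    """
    stripped = sentence.lstrip()
    if not stripped:
        return sentence
    # Split off the first whitespace-delimited word.
    first, sep, rest = stripped.partition(" ")
    if first == "I" or (len(first) > 1 and first.isupper()):
        return stripped
    return first.lower() + sep + rest


print(lowercase_first_word("View a list of ongoing sales quotes for the customer."))
```

This only changes the input string. Whether the tagger then picks a verb tag still depends on the model, and proper nouns at the start of a sentence would lose their capitalization.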
(ROOT (S (S (NP (NNP View)) (NP (NP (DT a) (NN list)) (PP (IN of) (NP (JJ ongoing) (NNS sales) (NNS quotes))) (PP (IN for) (NP (DT the) (NN customer))))) (. .)))
(ROOT (NP (NP (NN Request) (NN approval) (S (VP (TO to) (VP (VB change) (NP (DT the) (NN record)))))) (. .)))
Further examples:
(ROOT (NP (NP (NNP View)) (CC or) (VP (VB change) (NP (NP (JJ detailed) (NN information)) (PP (IN about) (NP (DT the) (NN customer))))) (. .)))
(ROOT (FRAG (PP (IN Post) (NP (DT the) (VBN specified) (NN prepayment) (NN information))) (. .)))
(ROOT (S (S (NP (NNP View)) (NP (NP (DT a) (NN summary)) (PP (IN of) (NP (DT the) (NN debit) (CC and) (NN credit) (NNS balances))) (PP (IN for) (NP (JJ different) (NN time) (NNS periods))))) (. .)))
(ROOT (NP (NP (NP (NN Offer) (NNS items)) (CC or) (NP (NP (NNS services)) (PP (TO to) (NP (DT a) (NN customer))))) (. .)))
(ROOT (NP (NP (NP (NNP View)) (CC or) (VP (VB add) (NP (NP (NNS comments)) (PP (IN for) (NP (DT the) (NN record)))))) (. .)))
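To find the affected sentences without reading every tree by hand, one option is to check the tag of the first token in each bracketed parse. This is a minimal sketch using only the standard library. The helper names and the choice of `VB`/`VBP` as "acceptable" first tags are assumptions, and the last example string is a hypothetical correct parse, not parser output.

```python
import re

# Assumption: an imperative sentence should start with one of these Penn Treebank tags.
VERB_TAGS = {"VB", "VBP"}

# Matches a preterminal such as "(NNP View)": a tag and a word with no nested brackets.
_PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def first_token(parse: str):
    """Return (tag, word) for the first leaf of a bracketed parse, or None."""
    match = _PRETERMINAL.search(parse)
    if match is None:
        return None
    return match.group(1), match.group(2)


def starts_with_verb(parse: str) -> bool:
    """True if the first leaf of the parse carries a verb tag from VERB_TAGS."""
    token = first_token(parse)
    return token is not None and token[0] in VERB_TAGS


wrong = "(ROOT (NP (NP (NN Request) (NN approval) (S (VP (TO to) (VP (VB change) (NP (DT the) (NN record)))))) (. .)))"
# Hypothetical parse showing the structure one would hope for.
hoped_for = "(ROOT (S (VP (VB View) (NP (DT a) (NN list))) (. .)))"

print(first_token(wrong), starts_with_verb(wrong))
print(first_token(hoped_for), starts_with_verb(hoped_for))
```

Sentences for which `starts_with_verb` returns `False` can then be collected and reviewed, or re-parsed after preprocessing.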
Unfortunately, the part-of-speech tagger is trained on Wall Street Journal text from years ago, so there are cases where imperative sentences aren't covered by the training data and it is going to guess wrong at times. On some imperative sentences it does the right thing, though. I think if the first word is a clear verb like "Call" you will get better performance.
Another issue I saw is that the verb "text" (as in sending a text message) is not handled well.
I think we would be excited to add some contemporary data and some imperative training data to help out.