I'm working on a Natural Language Processing (NLP) project in which I use a syntactic parser to create a syntactic parse tree out of a given sentence.
Example Input: I ran into Joe and Jill and then we went shopping
Example Output: [TOP [S [S [NP [PRP I]] [VP [VBD ran] [PP [IN into] [NP [NNP Joe] [CC and] [NNP Jill]]]]] [CC and] [S [ADVP [RB then]] [NP [PRP we]] [VP [VBD went] [NP [NN shopping]]]]]]
I'm looking for a C# utility that will let me do complex queries like:
- Get the first VBD related to 'Joe'
- Get the NP closest to 'Shopping'
Here's a Java utility that does this; I'm looking for a C# equivalent.
Any help would be much appreciated.
We already use
One option would be to parse the output into C# objects and then encode it as XML, turning every node into
string.Format("<{0}>", this.Name);
and
string.Format("</{0}>", this.Name);
with all the child nodes placed recursively in between.
After you do this, I would use a tool for querying XML/HTML to query the tree. Thousands of people already use query selectors and jQuery to navigate tree-like structures based on the relationships between nodes. I think this is far superior to TRegex or other outdated and unmaintained Java utilities.
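As a rough illustration of the conversion step, here is a minimal sketch that turns the bracketed parser output directly into a LINQ to XML tree. The class and method names (ParseTreeXml, FromBracketed) are made up for this example, and it assumes the input is well formed and that words never contain square brackets. Labels are passed through XmlConvert.EncodeLocalName because some Penn Treebank tags (PRP$, for instance) are not valid XML element names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of any NLP library.
public static class ParseTreeXml
{
    // Converts a bracketed parse such as "[NP [PRP I]]" into an XElement tree.
    // Assumes balanced brackets and a label directly after every "[".
    public static XElement FromBracketed(string parse)
    {
        var stack = new Stack<XElement>();
        XElement root = null;

        // Pad brackets with spaces so a plain split yields clean tokens.
        string[] tokens = parse.Replace("[", " [ ").Replace("]", " ] ")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < tokens.Length; i++)
        {
            string token = tokens[i];
            if (token == "[")
            {
                // The token after "[" is the node label (e.g. NP, VBD).
                // EncodeLocalName escapes characters that XML names cannot hold.
                var node = new XElement(XmlConvert.EncodeLocalName(tokens[++i]));
                if (stack.Count > 0) stack.Peek().Add(node);
                else root = node;
                stack.Push(node);
            }
            else if (token == "]")
            {
                // Closing bracket: the current node is complete.
                stack.Pop();
            }
            else
            {
                // Anything else is a word belonging to the current leaf node.
                stack.Peek().Add(new XText(token));
            }
        }
        return root;
    }
}
```

Calling ParseTreeXml.FromBracketed("[NP [NNP Joe] [CC and] [NNP Jill]]") should give an NP element with three children (NNP, CC, NNP), which can then be queried with LINQ to XML or XPath instead of building the tag strings by hand.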
For example, this is to answer your first example:
Here is your second example
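A minimal sketch of how both queries might look with LINQ to XML, once the tree is available as XML. The readings of "related to" and "closest" are assumptions: the first query takes the first VBD child of the nearest VP enclosing the word, and the second takes the nearest NP enclosing the word. The class name ParseTreeQueries, its methods, and the ExampleXml constant (the example sentence encoded by hand) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any NLP library.
public static class ParseTreeQueries
{
    // The example sentence from the question, hand-encoded as XML.
    public const string ExampleXml =
        "<TOP><S><S><NP><PRP>I</PRP></NP><VP><VBD>ran</VBD><PP><IN>into</IN>" +
        "<NP><NNP>Joe</NNP><CC>and</CC><NNP>Jill</NNP></NP></PP></VP></S>" +
        "<CC>and</CC><S><ADVP><RB>then</RB></ADVP><NP><PRP>we</PRP></NP>" +
        "<VP><VBD>went</VBD><NP><NN>shopping</NN></NP></VP></S></S></TOP>";

    // Finds the first leaf element whose text equals the word (case-insensitive).
    // Throws InvalidOperationException if the word is not in the tree.
    static XElement FindWord(XContainer tree, string word) =>
        tree.Descendants().First(e => !e.HasElements &&
            string.Equals(e.Value, word, StringComparison.OrdinalIgnoreCase));

    // Assumed reading of "first VBD related to a word":
    // the first VBD child of the nearest VP that encloses the word.
    public static XElement FirstVbdFor(XContainer tree, string word) =>
        FindWord(tree, word).Ancestors("VP").First().Elements("VBD").First();

    // Assumed reading of "NP closest to a word":
    // the nearest NP that encloses the word (Ancestors walks from parent to root).
    public static XElement ClosestNp(XContainer tree, string word) =>
        FindWord(tree, word).Ancestors("NP").First();
}
```

With var tree = XElement.Parse(ParseTreeQueries.ExampleXml), FirstVbdFor(tree, "Joe") should return the VBD element for "ran", and ClosestNp(tree, "Shopping") should return the NP that wraps "shopping". A different notion of closeness (for example, the nearest sibling NP) would need a different traversal.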
There are at least two NLP frameworks, i.e.
And here you can find instructions for using a Java NLP library in .NET:
This page is about using Java OpenNLP, but it could also apply to the Java library you mentioned in your post.
Or use NLTK following these guidelines: