Convert Natural Language Questions to SQL Queries

Posted 2019-03-11 08:42

Question:

I want to convert natural language questions to SQL queries using the OpenNLP Java library. For example,

Who won women figure skating in Sochi?

should be converted to

select name from winners where event='skating_woman'

Anybody know which classes will be useful and how to achieve this?

I'm also pasting the code I have tried so far. It splits the input into sentences and then into tokens.

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

// Part 1: split the input into sentences
String paragraph = "Did Matt win the men slalom?";
InputStream is1 = new FileInputStream("bins/en-sent.bin");
SentenceModel model1 = new SentenceModel(is1);
SentenceDetectorME sdetector = new SentenceDetectorME(model1);
String[] sentences = sdetector.sentDetect(paragraph);
System.out.println(sentences[0]);
is1.close();

// Part 2: split the first sentence into tokens
InputStream is2 = new FileInputStream("bins/en-token.bin");
TokenizerModel model2 = new TokenizerModel(is2);
Tokenizer tokenizer = new TokenizerME(model2);
String[] tokens = tokenizer.tokenize(sentences[0]);
for (String token : tokens)
    System.out.println(token);
is2.close();

Answer 1:

I don't have time to post code right now, but please take a look at the OpenNLP sentence chunker and document categorizer. I think you could, very creatively, use the doccat model to establish your "key" and the chunker to extract noun and verb phrases (not single tokens, but actual multi-word phrases), then combine the results. At query time you would categorize the sentence to establish a key, chunk the sentence, and run a query that joins to the keys table and then fuzzily matches (a full-text index, perhaps) against the phrases table. Just a thought; if it's interesting I'll post code as an edit. You would have to build the doccat model from samples.

EDIT

Here is how to get a probability distribution over a set of categories using the OpenNLP document categorizer. You'll need to supply a properties file that contains the path to the doccat model:

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;


/**
 *
 * @author Owner
 */
public class SentimentFinder {

  private DoccatModel doccatModel;
  private DocumentCategorizerME documentCategorizerME;
  private Properties props = null; // must be populated with the model path before init() is called

  public void init() {
    try {
      if (doccatModel == null) {
        doccatModel = new DoccatModel(new File(props.getProperty("opennlp.sentiment.model.generic")));
        documentCategorizerME = new DocumentCategorizerME(doccatModel);

      }
    } catch (IOException ex) {
      ex.printStackTrace();
    }
  }

  /**
   * Classifies text via a maxent model. Try to keep chunks of text small,
   * otherwise the scores tend to be uniformly low with little difference
   * between categories.
   *
   * @param text the string to be classified
   * @return a map from category name to its probability
   */
  public Map<String, Double> probDist(String text) {
    Map<String, Double> probDist = new HashMap<String, Double>();
    if (doccatModel == null) {
      init();
    }
    double[] categorize = documentCategorizerME.categorize(text);
    int catSize = documentCategorizerME.getNumberOfCategories();
    for (int i = 0; i < catSize; i++) {
      String category = documentCategorizerME.getCategory(i);
      probDist.put(category, categorize[documentCategorizerME.getIndex(category)]);
    }
    return probDist;

  }
}
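Once you have that distribution, turning it into the "key" for the query is just an argmax over the map. A minimal stdlib-only helper (independent of OpenNLP; the category names below are made-up stand-ins, not from any trained model):

```java
import java.util.Map;

public class BestCategory {

    /** Returns the category with the highest probability, or null for an empty map. */
    public static String argmax(Map<String, Double> probDist) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : probDist.entrySet()) {
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by SentimentFinder.probDist(...)
        Map<String, Double> dist = Map.of("winner_query", 0.7, "schedule_query", 0.3);
        System.out.println(argmax(dist)); // prints winner_query
    }
}
```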

And here's how to chunk the sentence with a sentence chunker and collect the noun phrases:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

/**
 *
 * Extracts noun phrases from a sentence. To create sentences using OpenNLP use
 * the SentenceDetector classes.
 */
public class OpenNLPNounPhraseExtractor {

  static final int N = 2;

  public static void main(String[] args) {

    try {
      HashMap<String, Integer> termFrequencies = new HashMap<>();
      String modelPath = "c:\\temp\\opennlpmodels\\";
      TokenizerModel tm = new TokenizerModel(new FileInputStream(new File(modelPath + "en-token.zip")));
      TokenizerME wordBreaker = new TokenizerME(tm);
      POSModel pm = new POSModel(new FileInputStream(new File(modelPath + "en-pos-maxent.zip")));
      POSTaggerME posme = new POSTaggerME(pm);
      InputStream modelIn = new FileInputStream(modelPath + "en-chunker.zip");
      ChunkerModel chunkerModel = new ChunkerModel(modelIn);
      ChunkerME chunkerME = new ChunkerME(chunkerModel);
      //this is your sentence
      String sentence = "Barack Hussein Obama II  is the 44th awesome President of the United States, and the first African American to hold the office.";
      //words is the tokenized sentence
      String[] words = wordBreaker.tokenize(sentence);
      //posTags are the parts of speech of every word in the sentence (The chunker needs this info of course)
      String[] posTags = posme.tag(words);
      //chunks are the start end "spans" indices to the chunks in the words array
      Span[] chunks = chunkerME.chunkAsSpans(words, posTags);
      //chunkStrings are the actual chunks
      String[] chunkStrings = Span.spansToStrings(chunks, words);
      for (int i = 0; i < chunks.length; i++) {
        String np = chunkStrings[i];
        if (chunks[i].getType().equals("NP")) {
          if (termFrequencies.containsKey(np)) {
            termFrequencies.put(np, termFrequencies.get(np) + 1);
          } else {
            termFrequencies.put(np, 1);
          }
        }
      }
      System.out.println(termFrequencies);

    } catch (IOException e) {
      // don't swallow the exception silently
      e.printStackTrace();
    }
  }

}

So what I was thinking is: classify the input text, and extract and store noun phrases. At query time, you classify the input to get a category, then do something like this in SQL:

select * from categories a inner join nounphrases b on a.id = b.catid where a.catname = @thecatIjustgotfromtheclassifier and contains(b.text, 'search term')

or something like that
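Since both the category and the search phrase come from user input at query time, a parameterized query is safer than concatenating them into the SQL string. A small sketch of that idea (the table and column names just mirror the example query above, and `contains` assumes the full-text index mentioned earlier):

```java
public class CategoryQueryBuilder {

    /**
     * The category/noun-phrase lookup from the answer, written with
     * JDBC-style ? placeholders instead of inlined user input.
     */
    public static String build() {
        return "select * from categories a "
             + "inner join nounphrases b on a.id = b.catid "
             + "where a.catname = ? and contains(b.text, ?)";
    }

    public static void main(String[] args) {
        // At runtime you would do something like:
        //   PreparedStatement ps = conn.prepareStatement(build());
        //   ps.setString(1, category);   // from the classifier
        //   ps.setString(2, nounPhrase); // from the chunker
        System.out.println(build());
    }
}
```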



Answer 2:

The simple answer is that you "could" do it, but it's impractical and close to impossible. The reason is that many different questions can mean exactly the same thing. A SQL query has one specific syntax you must follow to get the data; in natural language there are potentially hundreds of different phrasings for the same request. Mapping all of those onto SQL is very impractical and close to impossible.

Yes, you could force the user to use specific words or a specific question format so your program can interpret them as SQL queries, but then again that breaks the paradigm of what you want to achieve, doesn't it?

Edit:

Since you are soooo desperate ;) you could "potentially" do something like this

Let's imagine a very simple query:

SELECT * FROM USERS WHERE AGE = '20';

What are the possible human-language questions?

  • Can you show me all people with age of 20?
  • Display humans with the age of 20
  • How many people in this group have age of 20?
  • How many people in this table have 20 years ?
  • Do we have any people aged 20 ?
  • Search for all people with age of twenty

What you could then do is create some sort of Map<key, value> where, e.g.,

Key = USERS;
Value = people, humans;

And another Map

Key = SELECT;
Value = Can you show me, Display, Do we have, Search, How many;

And so on and so forth. This builds a complex map of all the phrases that can mean the same thing and correspond to a given element of SQL syntax. It may not be the best solution, but it's what I would probably do; at least it's where I would start.
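A minimal sketch of that phrase-map idea in plain Java. The phrase lists and the single USERS/AGE pattern are illustrative only (a real system would need far more robust matching than `contains` and a bare digit scan):

```java
public class NaiveQuestionMapper {

    // Phrases that signal a SELECT, per the map above
    private static final String[] SELECT_PHRASES = {
        "can you show me", "how many", "do we have", "search for", "display"
    };

    // Synonyms that map to the USERS table
    private static final String[] USERS_SYNONYMS = { "people", "humans" };

    /** Very naive: returns a SQL string if the question matches known phrases, else null. */
    public static String toSql(String question) {
        String q = question.toLowerCase();

        boolean isSelect = false;
        for (String p : SELECT_PHRASES) {
            if (q.contains(p)) { isSelect = true; break; }
        }

        String table = null;
        for (String s : USERS_SYNONYMS) {
            if (q.contains(s)) { table = "USERS"; break; }
        }

        // Pull the first number out of the question as the age filter, e.g. "age of 20"
        String age = q.replaceAll("\\D+", " ").trim().split("\\s+")[0];

        if (!isSelect || table == null || age.isEmpty()) {
            return null; // question not understood
        }
        return "SELECT * FROM " + table + " WHERE AGE = '" + age + "'";
    }

    public static void main(String[] args) {
        // prints SELECT * FROM USERS WHERE AGE = '20'
        System.out.println(toSql("Can you show me all people with age of 20?"));
    }
}
```

Note that "Search for all people with age of twenty" already defeats the digit scan, which is exactly the fragility the answer is warning about.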



Answer 3:

Check out Kueri.me

This product provides technology that powers a search box converting English to SQL; it's built to construct an answer engine over relational databases.

It can recognize various filters, groupings, and other analytics elements, so it can answer questions like:

  • Show all orders from France in the last 4 months
  • Show top 5 orders by total sum in the last week
  • What's the average number of orders per customer?

and so on.

You do have to configure the system to recognize your tables, fields and jargon.

Note: I work at kueri.me.