How do I generate text matching a regular expressi

Posted 2019-01-31 12:52

Yup, you read that right. I need something that is capable of generating random text from a regular expression. So the text should be random, but still be matched by the regular expression. It seems no such thing exists, but I could be wrong.

Just an example: that library would be capable of taking '[ab]*c' as input, and generating samples such as:

abc
abbbc
bac

etc.
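
To make the requirement concrete: a generated sample only counts if the whole string is matched by the expression. A minimal sketch of that check using java.util.regex (the class name SampleCheck and the method allMatch are made up for illustration):

```java
import java.util.List;
import java.util.regex.Pattern;

class SampleCheck {
    /** Returns true if every sample is matched in full by the regex. */
    static boolean allMatch(String regex, List<String> samples) {
        Pattern pattern = Pattern.compile(regex);
        // matches() requires the entire string to match, not just a substring.
        return samples.stream().allMatch(s -> pattern.matcher(s).matches());
    }

    public static void main(String[] args) {
        System.out.println(allMatch("[ab]*c", List.of("abc", "abbbc", "bac")));
    }
}
```

Any generator answering this question should produce only strings for which such a check succeeds.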

Update: I created something myself: Xeger. Check out http://code.google.com/p/xeger/.

5 answers
冷血范
#2 · 2019-01-31 12:53

Here is a Python implementation of a module like that: http://www.mail-archive.com/python-list@python.org/msg125198.html It should be portable to Java.

成全新的幸福
#3 · 2019-01-31 12:54

Here are a few implementations of such a beast, but none of them is in Java (and all but the closed-source Microsoft one are very limited in their regexp feature support).

Fickle 薄情
#4 · 2019-01-31 12:57

Based on Wilfred Springer's solution together with http://www.brics.dk/~amoeller/automaton/, I built another generator. It does not use recursion. It takes as input the pattern (regular expression), a minimum string length, and a maximum string length. The result is an accepted string whose length lies between the minimum and the maximum. It also supports some of the XML "shorthand character classes". I use it in an XML sample generator that builds valid strings for facets.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;
import dk.brics.automaton.State;
import dk.brics.automaton.Transition;

public static String generate(final String pattern, final int minLength, final int maxLength) {
    // Expand the shorthand classes into explicit character classes before parsing.
    final String regex = pattern
            .replace("\\d", "[0-9]")        // \d = digit
            .replace("\\w", "[A-Za-z0-9_]") // \w = word character
            .replace("\\s", "[ \t\r\n]");   // \s = whitespace
    final Automaton automaton = new RegExp(regex).toAutomaton();
    final Random random = new Random(System.nanoTime());
    final List<String> validLength = new ArrayList<>();
    final StringBuilder builder = new StringBuilder();
    State state = automaton.getInitialState();
    while (true) {
        // Record every accepted prefix that is long enough, including the one
        // reached in a final state that has no outgoing transitions.
        if (state.isAccept() && builder.length() >= minLength) validLength.add(builder.toString());
        final Transition[] transitions = state.getSortedTransitionArray(true);
        if (builder.length() >= maxLength || transitions.length == 0) break;
        final Transition t = transitions[random.nextInt(transitions.length)]; // random transition
        // Random character from the range [min, max] covered by the transition.
        builder.append((char) (t.getMin() + random.nextInt(t.getMax() - t.getMin() + 1)));
        state = t.getDest();
    }
    if (validLength.isEmpty()) throw new IllegalArgumentException(automaton + " , " + minLength + " , " + maxLength);
    return validLength.get(random.nextInt(validLength.size()));
}
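
The same non-recursive idea can be tried without the automaton library. The following is only a sketch under stated assumptions: the DFA for [ab]*c is written out by hand, and the names BoundedWalk, TRANSITIONS, ACCEPT and tryGenerate are made up for illustration (they are not part of dk.brics.automaton). It walks the automaton at random, remembers every accepted prefix whose length lies within the bounds, and returns one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;

class BoundedWalk {
    // Hand-built DFA for [ab]*c (an assumption for this sketch):
    // state 0 loops on 'a' and 'b' and moves to state 1 on 'c'.
    // State 1 is the only accepting state and has no outgoing transitions.
    static final List<Map<Character, Integer>> TRANSITIONS = List.of(
            Map.of('a', 0, 'b', 0, 'c', 1),
            Map.<Character, Integer>of());
    static final boolean[] ACCEPT = {false, true};

    /** One random walk; empty if no accepted prefix with a length in [minLength, maxLength] was seen. */
    static Optional<String> tryGenerate(int minLength, int maxLength, Random random) {
        List<String> candidates = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        int state = 0;
        while (true) {
            // Check the current state first, so the last state reached is checked too.
            if (ACCEPT[state] && sb.length() >= minLength) candidates.add(sb.toString());
            List<Character> choices = new ArrayList<>(TRANSITIONS.get(state).keySet());
            if (sb.length() >= maxLength || choices.isEmpty()) break;
            char c = choices.get(random.nextInt(choices.size())); // random transition
            sb.append(c);
            state = TRANSITIONS.get(state).get(c);
        }
        if (candidates.isEmpty()) return Optional.empty();
        return Optional.of(candidates.get(random.nextInt(candidates.size())));
    }
}
```

Note that a single walk can come back empty even when matching strings of the requested length exist, for example when it runs into the maximum length before reaching an accepting state. In this sketch the caller would simply call tryGenerate again.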
Bombasti
#5 · 2019-01-31 13:07

I just created a library for doing this a minute ago. It's hosted here: http://code.google.com/p/xeger/. Carefully read the instructions before using it. (Especially the one referring to downloading another required library.) ;-)

This is the way you use it:

import nl.flotsam.xeger.Xeger;

String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex); // only checked when the JVM runs with -ea
The star"
#6 · 2019-01-31 13:10

I am not aware of such a library. If you're interested in writing one yourself, then these are probably the steps you'll need to take:

  1. Write a parser for regular expressions (you may want to start out with a restricted class of regexes).

  2. Use the result to construct an NFA.

  3. (Optional) Convert the NFA to a DFA.

  4. Randomly traverse the resulting automaton from the start state to any accepting state, while recording the characters emitted by each transition.

The result is a word which is accepted by the original regex. For more, see e.g. Converting a Regular Expression into a Deterministic Finite Automaton.
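
The steps above can be sketched in plain Java. This is a rough illustration under stated assumptions, not a finished library: the class TinyRegexGenerator and all of its members are made up here, the supported syntax is restricted to literals, character classes written as plain lists such as [ab], groups, | and the postfix operators *, + and ? (no ranges, escapes or counted repetition), and the optional step 3 is skipped, so the walk runs directly on the NFA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: restricted regex -> NFA -> random walk. */
class TinyRegexGenerator {
    /** An NFA move; a null label marks an epsilon move. */
    private static final class Edge {
        final String label; final int target;
        Edge(String label, int target) { this.label = label; this.target = target; }
    }

    private final List<List<Edge>> edges = new ArrayList<>(); // edges.get(s): moves out of state s
    private final String regex;
    private final int start, accept;
    private int pos = 0;

    TinyRegexGenerator(String regex) {
        this.regex = regex;
        int[] nfa = parseAlternation(); // steps 1 and 2: parse and build the NFA in one pass
        if (pos != regex.length()) throw new IllegalArgumentException("unexpected ')' at " + pos);
        start = nfa[0];
        accept = nfa[1];
    }

    private int newState() { edges.add(new ArrayList<>()); return edges.size() - 1; }
    private void link(int from, String label, int to) { edges.get(from).add(new Edge(label, to)); }
    private boolean peekIs(String chars) { return pos < regex.length() && chars.indexOf(regex.charAt(pos)) >= 0; }

    // alternation := concat ('|' concat)*   (every fragment is {startState, endState})
    private int[] parseAlternation() {
        int[] left = parseConcat();
        while (peekIs("|")) {
            pos++;
            int[] right = parseConcat();
            int s = newState(), e = newState();
            link(s, null, left[0]); link(s, null, right[0]);
            link(left[1], null, e); link(right[1], null, e);
            left = new int[] {s, e};
        }
        return left;
    }

    // concat := repeat*
    private int[] parseConcat() {
        int first = newState(), last = first; // empty fragment: start == end
        while (pos < regex.length() && !peekIs("|)")) {
            int[] next = parseRepeat();
            link(last, null, next[0]);
            last = next[1];
        }
        return new int[] {first, last};
    }

    // repeat := atom ('*' | '+' | '?')?
    private int[] parseRepeat() {
        int[] atom = parseAtom();
        if (!peekIs("*+?")) return atom;
        char op = regex.charAt(pos++);
        int s = newState(), e = newState();
        link(s, null, atom[0]);
        link(atom[1], null, e);
        if (op != '+') link(s, null, e);             // '*' and '?' may skip the atom
        if (op != '?') link(atom[1], null, atom[0]); // '*' and '+' may repeat it
        return new int[] {s, e};
    }

    // atom := '(' alternation ')' | '[' chars ']' | literal
    private int[] parseAtom() {
        char c = regex.charAt(pos++);
        if ("*+?".indexOf(c) >= 0) throw new IllegalArgumentException("nothing to repeat at " + (pos - 1));
        if (c == '(') {
            int[] inner = parseAlternation();
            if (!peekIs(")")) throw new IllegalArgumentException("missing ')'");
            pos++;
            return inner;
        }
        String label = String.valueOf(c);
        if (c == '[') {
            int close = regex.indexOf(']', pos);
            if (close <= pos) throw new IllegalArgumentException("bad character class");
            label = regex.substring(pos, close); // plain list of characters, no ranges
            pos = close + 1;
        }
        int s = newState(), e = newState();
        link(s, label, e);
        return new int[] {s, e};
    }

    /** Step 4: walk randomly from the start state until the accepting state is reached. */
    String generate(Random random) {
        StringBuilder out = new StringBuilder();
        int state = start;
        while (state != accept) {
            List<Edge> moves = edges.get(state);
            Edge move = moves.get(random.nextInt(moves.size()));
            if (move.label != null) out.append(move.label.charAt(random.nextInt(move.label.length())));
            state = move.target;
        }
        return out.toString();
    }
}
```

With this sketch, new TinyRegexGenerator("[ab]*c").generate(new Random()) returns a string such as the samples in the question. The walk is not uniform over the matching strings: every pass through a loop has an even chance of leaving it, so short results are heavily favoured.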
