How to get such pattern matching of regular expres

Posted 2019-09-11 09:16

Question:

Hi, I want to match a specific pattern with a regular expression, but I have not been able to. The input should look like this:

noun wordname:wordmeaning

I can successfully match noun and wordname, but I couldn't design a pattern for the word meaning. My code is:

int state;
char *meaning;
char *word;

^verb   { state=VERB; }
^adj    { state = ADJ; }
^adv    { state = ADV; }
^noun   { state = NOUN; }
^prep   { state = PREP; }
^pron   { state = PRON; }
^conj   { state = CONJ; }

//my try but failed
[:\a-z]   {
meaning=yytext;
printf(" Meaning is getting detected %s", meaning);

}

[a-zA-Z]+  {
word=yytext;

}

Example input:

noun john:This is a name

Now word should be equal to john and meaning should be equal to This is a name.

Answer 1:

I agree that lex states (also known as start conditions) are the way to go. Oddly, there seem to be no useful tutorials on them.

Briefly:

  • your application can be organized as states, using one for "noun", one for "john" and one for the definition (after the colon).
  • at the top of the lex file, declare the states, e.g.,

    %s TYPE NAME VALUE

  • the capitals are not necessary, but since you are defining constants, that is a good convention.
  • next to the patterns, put those state names in < > brackets to tell lex that the patterns are used only in those states. You can list more than one state, comma-separated, when it matters. But your lex file probably does not need that.
  • one state is predefined: INITIAL.
  • your program switches states using the BEGIN() macro, in actions, e.g.,

    { BEGIN(TYPE); }

  • if your input is well-formed, it's simple: the action for each recognized "type" (noun, verb, and so on) begins the NAME state.
  • in the NAME state, your lexer looks for whatever you think a name should be, e.g.,

    <NAME>[[:alpha:]][[:alnum:]]* { my_name = strdup(yytext); }

  • the name ends with a colon, so

    <NAME>":" { BEGIN(VALUE); }

  • the value is then everything until the end of the line, e.g.,

    <VALUE>.* { my_value = strdup(yytext); BEGIN(INITIAL); }

  • whether you switch to INITIAL or TYPE depends on what other things you might add to your lexer (such as ignoring comment lines and whitespace).
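
Putting the pieces above together, a complete flex file might look like the sketch below. It is one possible arrangement, not the only one. It assumes flex rather than classic lex (for `%option noyywrap`), and it assumes a single leading `^` rule can stand in for the separate TYPE state, so only NAME and VALUE are declared. The names `my_type`, `my_name` and `my_value`, the whitespace rule, and the output format are illustrative choices.

```lex
%{
/* Sketch only: my_type, my_name and my_value are illustrative names. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *my_type, *my_name, *my_value;
%}

%option noyywrap
%s NAME VALUE

%%

^(verb|adj|adv|noun|prep|pron|conj) {
        /* a part-of-speech keyword at the start of a line: expect a name next */
        free(my_type);
        my_type = strdup(yytext);
        BEGIN(NAME);
    }

<NAME>[ \t]+    { /* skip blanks between the type and the name */ }

<NAME>[[:alpha:]][[:alnum:]]* {
        free(my_name);
        my_name = strdup(yytext);
    }

<NAME>":"       { BEGIN(VALUE); }

<VALUE>.* {
        /* everything up to (not including) the newline is the meaning */
        my_value = strdup(yytext);
        printf("%s %s = %s\n", my_type, my_name ? my_name : "?", my_value);
        free(my_type); free(my_name); free(my_value);
        my_type = my_name = my_value = NULL;
        BEGIN(INITIAL);
    }

\n              { BEGIN(INITIAL); /* every line starts over */ }
.               { /* ignore anything unexpected */ }

%%

int main(void)
{
    yylex();
    return 0;
}
```

Assuming the file is saved as `dict.l` (a hypothetical name), it can be built with `flex dict.l && cc lex.yy.c -o dict`. Feeding it the example line, `echo "noun john:This is a name" | ./dict`, should print `noun john = This is a name`. Because `%s` declares inclusive start conditions, the unprefixed `\n` and `.` rules stay active in every state, which is why the `<NAME>` whitespace rule is listed before the catch-all `.` rule.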

Further reading:

  • Start conditions (flex documentation)
  • Introduction to Flex