Representing identifiers using Regular Expression

Posted 2020-07-01 03:36

Question:

The regular definition for recognizing identifiers in the C programming language is given by

letter -> a|b|...|z|A|B|...|Z|_
digit -> 0|1|...|9
identifier -> letter(letter|digit)*

This definition will generate identifiers of the form

identifier: [_a-zA-Z][_a-zA-Z0-9]*

My question is: how do I limit the length of the identifiers that can be generated to at most 31 characters? What changes need to be made to the regular definition, or how can I write a regular expression that enforces this maximum length? Could anyone please help? Thanks.

Answer 1:

The regular expression you are looking for is:

[_a-zA-Z][_a-zA-Z0-9]{0,30}

It matches an underscore or letter followed by X underscores, letters, or digits, where 0 <= X <= 30, for a total length of 1 to 31 characters.
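As a quick check, here is a minimal Python sketch of that pattern. The helper name is_identifier is made up for illustration, and fullmatch is used on the assumption that the whole string should be validated rather than searched for a match somewhere inside it.

```python
import re

# Bounded C-style identifier: one leading letter or underscore,
# then at most 30 more letters, digits, or underscores (31 total).
IDENTIFIER = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]{0,30}")

def is_identifier(text: str) -> bool:
    """Hypothetical helper: True only if the whole string fits the pattern."""
    return IDENTIFIER.fullmatch(text) is not None

print(is_identifier("_count1"))  # True
print(is_identifier("a" * 31))   # True  (exactly 31 characters)
print(is_identifier("a" * 32))   # False (one character too long)
print(is_identifier("1abc"))     # False (starts with a digit)
```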



Answer 2:

Update: Revised the regex so that an identifier cannot start with a digit.

To limit the length, a {} quantifier is usually used.
For example, suppose your regex was [a-zA-Z0-9]+. This allows any alphanumeric characters, and the length must be greater than or equal to 1. To keep the length from exceeding 31 characters, we can rewrite the regex as:

[a-zA-Z0-9]{1,31}

{1,31} means the regex accepts alphanumeric strings whose length is at least 1 and at most 31.
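The bounds of the {1,31} quantifier can be checked with a small Python sketch. The helper name accepts is hypothetical, and fullmatch is assumed so that the entire string has to fit the pattern.

```python
import re

# Between 1 and 31 alphanumeric characters, nothing else.
ALNUM_1_TO_31 = re.compile(r"[a-zA-Z0-9]{1,31}")

def accepts(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return ALNUM_1_TO_31.fullmatch(text) is not None

print(accepts(""))        # False (shorter than the lower bound of 1)
print(accepts("x"))       # True
print(accepts("x" * 31))  # True  (exactly at the upper bound)
print(accepts("x" * 32))  # False (longer than the upper bound of 31)
print(accepts("9lives"))  # True  (nothing constrains the first character)
```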

However, the above regex also allows the identifier to start with a digit. Note that it provides three ranges: a-z, A-Z, and 0-9. To require the identifier to start with a letter, followed by letters or digits, the following regex can be used:

[a-zA-Z][a-zA-Z0-9]{0,30}

The first portion, [a-zA-Z], forces the identifier to start with a letter. It also ensures that the identifier is not empty. The remaining portion of the regex, [a-zA-Z0-9]{0,30}, ensures that only letters and digits are accepted and that, in addition to the first character, up to 30 more can be added to the identifier.
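One caveat, sketched here in Python under the assumption that the regex is used to validate a whole token: an unanchored match will happily accept the first 31 characters of a longer name, so the length limit only holds if the entire string is required to match. The variable names below are made up for illustration.

```python
import re

# A leading letter, then up to 30 letters or digits (no underscore in this variant).
PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9]{0,30}")

long_name = "a" * 40

# match() anchors only at the start, so it accepts a 31-character prefix.
prefix = PATTERN.match(long_name)
print(len(prefix.group()))           # 31

# fullmatch() requires the whole string to fit, so the long name is rejected.
print(PATTERN.fullmatch(long_name))  # None

# A leading digit fails the first character class.
print(PATTERN.fullmatch("9lives"))   # None
```

In a regex dialect without a fullmatch-style call, wrapping the pattern in ^ and $ anchors is the usual way to get the same effect.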

You can make the corresponding changes to your own regex, for example by adding the underscore to both character classes.