Regex to parse C++ enum

Posted 2019-04-09 12:36

How can a regular expression be constructed to parse C++ enums? The enums I tried it on look like this:

enum Temperature
{
    C = 0,
    F=1,     // some elements are commented
    R,       // most elements are not given a value
    K        // sometimes the last element is followed by a comma
} temperature;

// different indent style is used
enum Depth {
    m = 0,
    ft = 1,
} depth;

I tried several simple patterns, but none is general enough to handle all of the cases above.

Can any regex wizard help me?

Edit: to clarify, I want the name and value, e.g. C and 0.

2 answers
Explosion°爆炸
Reply #2 · 2019-04-09 13:03

That was challenging :) Below is the best I could come up with. Assuming it is given just the text between { and }, it captures every name and its corresponding value when applied repeatedly:

/(\w+)\s*(?:=\s*(\d+)|)\s*,?\s*(?:(?:\n|$)|\/\/.*?(?:\n|$)|)/
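As a rough usage sketch, the pattern above can be applied repeatedly with `scan`. Ruby is an assumption here (the answer does not name a language), and the `body` string is a hypothetical input holding only the text between the braces. Note that `\d+` captures plain decimal values only, so hex literals, negative numbers, or expressions would need a different value pattern.

```ruby
# Pattern copied verbatim from the answer above.
pattern = /(\w+)\s*(?:=\s*(\d+)|)\s*,?\s*(?:(?:\n|$)|\/\/.*?(?:\n|$)|)/

# Hypothetical input: just the text between { and } of the Temperature enum.
body = <<~BODY
  C = 0,
  F=1,     // some elements are commented
  R,       // most elements are not given a value
  K        // sometimes the last element is followed by a comma
BODY

# scan yields one [name, value] pair per match; value is nil when
# the enumerator has no explicit "= N" part.
pairs = body.scan(pattern)
pairs.each { |name, value| puts "#{name} => #{value.inspect}" }
# prints:
#   C => "0"
#   F => "1"
#   R => nil
#   K => nil
```

Trailing `//` comments are consumed as part of each match, which is why the words inside them are not reported as names.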
在下西门庆
Reply #3 · 2019-04-09 13:06

If the goal is to match enums with a regex rather than to fully parse them, I think it is possible. Try these steps:

Step 1. Make sure the C/C++ source code compiles successfully.
Step 2. Strip all comments from the C/C++ source code.
Step 3. Match the enums.

A working Ruby sample:

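# Comment and string patterns adapted from Mastering Regular Expressions, 3rd edition.
# Backslashes are doubled because '\\' in a single-quoted Ruby string is one backslash.
COMMENT = '/\*[^\*]*\*+(?:[^/*][^*]*\*+)*/'
COMMENT2 = '//[^\n]*'
DOUBLE = '"(?:\\\\.|[^\\\\"])*"'
SINGLE = '\'(?:\\\\.|[^\\\\\'])*\''
# Pattern that matches an enum and captures its name and enumerator list
ENUM = '\benum\s*(\w+)\s*\{(\s*\w+(?:\s*=\s*\w+)?(?:\s*,\s*\w+(?:\s*=\s*\w+)?)*)\s*(?:,\s*)?\}\s*\w+\s*;'

foo = File.read("foo.cpp")
# Strip all comments from foo.cpp; string and character literals are kept via \1
foo.gsub!(/(#{DOUBLE}|#{SINGLE})|#{COMMENT}|#{COMMENT2}/, '\1')
# Match each enum and print its name and enumerator list with whitespace removed
foo.scan(/#{ENUM}/) do |m|
  printf("%s: %s\n", m[0], m[1].gsub(/\s/, ''))
end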

Output:

Temperature: C=0,F=1,R,K
Depth: m=0,ft=1
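
The question asks for each name paired with its value. One possible follow-up, sketched here with a hypothetical helper `enum_values` that is not part of the answer, expands a compact list like the output above into integers. It relies on the C++ rule that an enumerator without an initializer takes the previous value plus one (starting from 0), and it assumes every explicit value is an integer literal; a value that refers to another identifier would raise an `ArgumentError`.

```ruby
# Hypothetical helper: turns "C=0,F=1,R,K" into a name => integer hash.
def enum_values(list)
  next_value = 0
  list.split(',').each_with_object({}) do |item, values|
    name, explicit = item.split('=', 2)
    # Kernel#Integer honors 0x and leading-0 prefixes, like C++ literals.
    value = explicit ? Integer(explicit) : next_value
    values[name] = value
    # An enumerator without "=" continues from the previous value.
    next_value = value + 1
  end
end

enum_values('C=0,F=1,R,K').each { |name, value| puts "#{name} = #{value}" }
# prints:
#   C = 0
#   F = 1
#   R = 2
#   K = 3
```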