Regular expression for contents of parenthesis in

Posted 2019-07-30 17:08

Question:

How can I get the contents of parentheses in Racket? The contents may include more parentheses. I tried:

(regexp-match #rx"((.*))" "(check)")

But the output has "(check)" three times rather than one:

'("(check)" "(check)" "(check)")

And I want only "check" and not "(check)".

Edit: for nested parentheses, everything inside the outer pair should be returned. Hence (a (1 2) c) should return "a (1 2) c".

Answer 1:

Unescaped parentheses in a regular expression capture; they do not match literal parentheses. So #rx"((.*))" captures everything twice. Thus:

(regexp-match #rx"((.*))" "any text")
; ==> ("any text" "any text" "any text")

The first element of the resulting list is the whole match, the second is the outer set of capturing parentheses, and the third is the set nested inside it. If you want to match literal parentheses you need to escape them:

(regexp-match #rx"\\((.*)\\)" "any text")
; ==> #f
(regexp-match #rx"\\((.*)\\)" "(a (1 2) c)")
; ==> ("(a (1 2) c)" "a (1 2) c")

Now the first element is the whole match, which can start anywhere in the searched string and, because .* is greedy, extends as far as possible. The second element is the single capture.
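Since the question asks for only "check" and not "(check)", the capture can be pulled out of the result list. A minimal sketch, assuming the same escaped pattern as above; inner-contents is a hypothetical helper name, not something Racket provides:

```racket
#lang racket

;; inner-contents is a hypothetical helper, not a Racket built-in.
;; Returns the text between the first "(" and the last ")" in str,
;; or #f when the pattern does not match.
(define (inner-contents str)
  (define m (regexp-match #rx"\\((.*)\\)" str))
  ;; m is either #f or (list whole-match capture)
  (and m (second m)))

(inner-contents "(check)")      ; ==> "check"
(inner-contents "(a (1 2) c)")  ; ==> "a (1 2) c"
(inner-contents "any text")     ; ==> #f
```

The `and` guards against the no-match case, where regexp-match returns #f instead of a list.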

This pattern will fail if the string has additional sets of parentheses, e.g.:

(regexp-match #rx"\\((.*)\\)" "(1 2 3) (a (1 2) c)")
; ==> ("(1 2 3) (a (1 2) c)" "1 2 3) (a (1 2) c")

This is because the expression isn't nesting-aware. For that you need recursive regular expressions, like those in Perl with the (?R) syntax and friends, but Racket doesn't have this (yet?).
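One way around the missing recursion is to skip the regular expression and count nesting depth by hand. The following is only a sketch of that alternative, not a regexp technique; first-balanced-group is a hypothetical name, and it assumes that only ( and ) matter (no quoting or escaping inside the string):

```racket
#lang racket

;; first-balanced-group is a hypothetical helper, not a Racket built-in.
;; Returns the contents of the first balanced parenthesized group in str,
;; without its outer parentheses, or #f if no group is ever closed.
(define (first-balanced-group str)
  (define len (string-length str))
  ;; i: current index; start: index just after the opening "(" (or #f);
  ;; depth: number of parentheses currently open.
  (let loop ([i 0] [start #f] [depth 0])
    (cond
      [(= i len) #f] ; input ended before the group was closed
      [else
       (define ch (string-ref str i))
       (cond
         [(char=? ch #\()
          (loop (add1 i) (or start (add1 i)) (add1 depth))]
         [(and (char=? ch #\)) (= depth 1))
          (substring str start i)] ; the outer group just closed
         [(and (char=? ch #\)) (> depth 1))
          (loop (add1 i) start (sub1 depth))]
         [else (loop (add1 i) start depth)])])))

(first-balanced-group "(a (1 2) c)")          ; ==> "a (1 2) c"
(first-balanced-group "(1 2 3) (a (1 2) c)")  ; ==> "1 2 3"
(first-balanced-group "any text")             ; ==> #f
```

Unlike the greedy pattern, this stops at the parenthesis that closes the first group, so a second group later in the string is not swallowed.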