How to write a regular expression to match a strin

Posted 2020-07-07 10:41

I am writing a parser using ply that needs to identify FORTRAN string literals. These are delimited by single quotes, and a single quote inside the string is escaped by doubling it, e.g.

'I don''t understand what you mean'

is a valid escaped FORTRAN string.

Ply takes its token definitions as regular expressions. My attempt so far does not work, and I don't understand why.

t_STRING_LITERAL = r"'[^('')]*'"

Any ideas?

4 answers
chillily
#2 · 2020-07-07 11:15

You want something like this:

r"'([^']|'')*'"

This says that between the single quotes you can have any number of items, each of which is either a doubled single quote ('') or a single non-quote character.

The square brackets define a character class, in which you list the individual characters that may (or, with ^, may not) match. A class can't express anything more complicated than a set of single characters, so using parentheses to match a multi-character sequence ('') doesn't work. Instead, your [^('')] character class is equivalent to [^'()], i.e. it matches any single character that is not a single quote, a left parenthesis, or a right parenthesis.
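A quick check of the point above using Python's standard re module directly (not PLY; I'm assuming the pattern behaves the same once PLY hands it to re). The sample text is the one from the question, and the three patterns are the question's attempt, the equivalent [^'()] class, and the corrected pattern:

```python
import re

text = "'I don''t understand what you mean'"

# The question's attempt: [^('')] is a character class, so it is the same set
# as [^'()], i.e. "any single character except ', ( or )".
attempt = re.compile(r"'[^('')]*'")
same_class = re.compile(r"'[^'()]*'")

# The corrected pattern: each step inside the quotes consumes either one
# non-quote character or a doubled quote ('').
fixed = re.compile(r"'([^']|'')*'")

print(attempt.match(text).group())     # stops at the first quote of ''
print(same_class.match(text).group())  # same result as the attempt
print(fixed.match(text).group())       # the whole literal
```

The first two lines print 'I don' because the class can never consume a quote, so the match ends at the first half of the ''.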

Evening l夕情丶
#3 · 2020-07-07 11:18

It's usually easy to get something quick-and-dirty for parsing particular string literals that are giving you problems, but for a general solution you can get a very powerful and complete regex for string literals from the pyparsing module:

>>> import pyparsing
>>> pyparsing.quotedString.reString
'(?:"(?:[^"\\n\\r\\\\]|(?:"")|(?:\\\\x[0-9a-fA-F]+)|(?:\\\\.))*")|(?:\'(?:[^\'\\n\\r\\\\]|(?:\'\')|(?:\\\\x[0-9a-fA-F]+)|(?:\\\\.))*\')'

I'm not sure how much FORTRAN's string literals differ from Python's, but this is a handy reference if nothing else.

神经病院院长
#4 · 2020-07-07 11:30

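import re

ch = "'I don''t understand what you mean' and you' ?"

# Lazy match: stops at the first quote it reaches, even one half of ''.
print(re.search("'.*?'", ch).group())
# The closing quote must be neither preceded nor followed by another quote.
print(re.search("'.*?(?<!')'(?!')", ch).group())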

Output:

'I don'
'I don''t understand what you mean'
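The lookaround version works on this input, but because it only inspects the characters right next to the closing quote, it can miss literals that the alternation pattern from the other answers handles. A small comparison; the empty literal and the literal ending in an escaped quote are sample inputs I picked, not taken from the question:

```python
import re

# Pattern from the answer above: lazy match, then a closing quote that is
# neither preceded nor followed by another quote.
lookaround = re.compile(r"'.*?(?<!')'(?!')")
# Pattern from the other answers: doubled quotes or non-quote characters.
alternation = re.compile(r"'([^']|'')*'")

samples = [
    "'I don''t understand what you mean' and you' ?",
    "''",            # empty FORTRAN string
    "'say ''hi'''",  # literal whose value ends in an escaped quote
]

for s in samples:
    m1 = lookaround.search(s)
    m2 = alternation.search(s)
    print(repr(s))
    print("  lookaround: ", m1.group() if m1 else None)
    print("  alternation:", m2.group() if m2 else None)
```

For the last two samples the lookaround pattern finds nothing, since every candidate closing quote has another quote on one side of it.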
Bombasti
#5 · 2020-07-07 11:34

A string literal is:

  1. An opening single quote, followed by:
  2. Any number of doubled single quotes and non-quote characters, then
  3. A closing single quote.

Thus, our regex is:

r"'(''|[^'])*'"
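The three steps map directly onto the parts of the pattern. A minimal sketch using Python's re.VERBOSE flag so each step can carry its own comment; as far as I know PLY also compiles token rules in verbose mode by default, so this layout should carry over, but treat that as an assumption. decode_fortran_string is a hypothetical helper (not part of re or PLY) showing how a matched literal could be turned back into its text value:

```python
import re

# The three steps above, written out with re.VERBOSE so each part is labeled.
STRING_LITERAL = re.compile(
    r"""
    '            # 1. opening single quote
    (?:''|[^'])* # 2. any mix of doubled quotes and non-quote characters
    '            # 3. closing single quote
    """,
    re.VERBOSE,
)


def decode_fortran_string(literal: str) -> str:
    """Hypothetical helper: strip the delimiters and undo the '' escape."""
    return literal[1:-1].replace("''", "'")


line = "PRINT *, 'I don''t understand what you mean', 'ok'"
for m in STRING_LITERAL.finditer(line):
    print(m.group(), "->", decode_fortran_string(m.group()))
```

The (?:...) group is non-capturing but otherwise matches the same as the answer's (''|[^']). The order of the two alternatives doesn't matter here, because '' always starts with a quote and [^'] never does.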