RegEx: can't figure out the expression to matc

Posted 2019-05-16 16:25

Question:

I am trying to clean up and merge some older calendar files (x.ics), using Sublime Text as my editor. Each file opens as a long listing like the one below. I would like to delete (i.e. replace with nothing) every entry (VEVENT) that mentions Birthday in its SUMMARY and keep all other entries, so I am trying to do this with regular expressions.

I managed to match the lines from BEGIN:VEVENT to END:VEVENT, but I can't work out an expression that matches only the VEVENTs with Birthday in them.

What I have now is this expression: BEGIN:VEVENT(.|\n)*?(Birthday)(.|\n)*?END:VEVENT\n. Clearly this is not the right expression: the match starts at the first BEGIN:VEVENT in the file and runs to the END:VEVENT that follows the first Birthday, so it spans several VEVENTs instead of the single one that contains Birthday.

Can anybody please help me to find a solution? It would be much appreciated!

BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:2009
X-WR-TIMEZONE:Europe/Amsterdam
X-WR-CALDESC:
BEGIN:VEVENT
DTSTART:20110606T170500Z
DTEND:20110614T121000Z
DTSTAMP:20140108T203731Z
UID:CSVConvert0127bd7e37d8feb5e1daaa909729c2ba
CREATED:19000101T120000Z
DESCRIPTION:
LAST-MODIFIED:19700101T000000Z
LOCATION:Amsterdam
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:Study
TRANSP:OPAQUE
END:VEVENT
.
.
.
BEGIN:VEVENT
DTSTART;VALUE=DATE:20110704
DTEND;VALUE=DATE:20110705
DTSTAMP:20140108T203731Z
UID:CSVConvert02f7a0b537b60e5601035a356dfd6a06
CREATED:19000101T120000Z
DESCRIPTION:
LAST-MODIFIED:19700101T000000Z
LOCATION:
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:Mark's Birthday
TRANSP:TRANSPARENT
END:VEVENT
END:VCALENDAR

Answer 1:

I think you need to add a lookahead to keep the match from running past the start of the next event:

BEGIN:VEVENT([\s\S](?!BEGIN:VEVENT))+?Birthday[\s\S]+?END:VEVENT

NB: I'm not a Sublime Text user, so I have no idea whether it supports lookaheads.
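
To check how the lookahead pattern behaves outside the editor, here is a minimal sketch in Python. The pattern is the one above; the trailing \n? is my own addition (an assumption, not part of the answer) so that no empty line is left behind, and the sample is a shortened, made-up version of the calendar in the question.

```python
import re

# Pattern from the answer above. The trailing \n? is an addition (assumption)
# that also removes the line break after END:VEVENT.
PATTERN = re.compile(
    r"BEGIN:VEVENT([\s\S](?!BEGIN:VEVENT))+?Birthday[\s\S]+?END:VEVENT\n?"
)

# Shortened sample: one ordinary event followed by one birthday event.
SAMPLE = (
    "BEGIN:VCALENDAR\n"
    "BEGIN:VEVENT\n"
    "SUMMARY:Study\n"
    "END:VEVENT\n"
    "BEGIN:VEVENT\n"
    "SUMMARY:Mark's Birthday\n"
    "TRANSP:TRANSPARENT\n"
    "END:VEVENT\n"
    "END:VCALENDAR\n"
)

# Replace every matching VEVENT with nothing.
cleaned = PATTERN.sub("", SAMPLE)
print(cleaned)
```

On this sample the match cannot start at the first BEGIN:VEVENT, because the lookahead refuses to step over the second BEGIN:VEVENT before Birthday is reached, so only the birthday event is removed.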



Answer 2:

Firstly: if I were doing this, and especially if it started getting any more complicated, I'd whip up a quick Perl/Python/etc. script to filter the file. That would be much more powerful and flexible, and less finicky. RegEx isn't great at this kind of thing.
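
For illustration, the script route could look roughly like the Python sketch below. The function name drop_events and the sample data are hypothetical, and the sketch assumes each property sits on a single line (it does not handle the folded continuation lines that iCalendar allows).

```python
def drop_events(ics_text, keyword="Birthday"):
    """Return ics_text without the VEVENT blocks whose SUMMARY mentions keyword."""
    kept = []      # lines that go into the output
    event = None   # lines of the VEVENT being read, or None outside an event
    for line in ics_text.splitlines(keepends=True):
        if line.startswith("BEGIN:VEVENT"):
            event = [line]
        elif event is not None:
            event.append(line)
            if line.startswith("END:VEVENT"):
                # Only SUMMARY lines decide whether the event is dropped.
                is_match = any(
                    entry.startswith("SUMMARY") and keyword in entry
                    for entry in event
                )
                if not is_match:
                    kept.extend(event)
                event = None
        else:
            kept.append(line)
    if event is not None:
        kept.extend(event)  # unterminated event: keep it untouched
    return "".join(kept)


# Made-up sample: the first event mentions Birthday only in DESCRIPTION.
SAMPLE = (
    "BEGIN:VCALENDAR\n"
    "BEGIN:VEVENT\n"
    "DESCRIPTION:Buy a Birthday card\n"
    "SUMMARY:Study\n"
    "END:VEVENT\n"
    "BEGIN:VEVENT\n"
    "SUMMARY:Mark's Birthday\n"
    "END:VEVENT\n"
    "END:VCALENDAR\n"
)

cleaned = drop_events(SAMPLE)
print(cleaned)
```

Unlike the regular expressions in this thread, this checks the SUMMARY line only, so the first event in the sample is kept even though its DESCRIPTION mentions Birthday.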

That said, you can get this done with RegEx alone, although it's messy. What you need to do is prevent the END lines from being included in your "middle section". To accomplish that, you can do this if Sublime doesn't support lookaheads:

BEGIN:VEVENT\n(([^E]|E[^N]|EN[^D]).*\n)*(([^E]|E[^N]|EN[^D]).*Birthday.*\n)(([^E]|E[^N]|EN[^D]).*\n)*END:VEVENT\n

Expanded a bit:

BEGIN:VEVENT\n
(([^E]|E[^N]|EN[^D]).*\n)*           //Any number of non-END lines
(([^E]|E[^N]|EN[^D]).*Birthday.*\n)  //At least one Birthday line
(([^E]|E[^N]|EN[^D]).*\n)*           //More non-END lines
END:VEVENT\n

And technically, you could also drop the END exclusion from the Birthday line, i.e. write it as (.*Birthday.*\n), since an END:VEVENT line will never contain "Birthday" anyway.
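
The expanded form can be run almost verbatim in an engine with a free-spacing mode. A minimal sketch in Python, assuming re.VERBOSE and # comments in place of the // annotations; the NON_END helper name and the shortened sample are my own, not part of the answer.

```python
import re

# Hypothetical helper: matches the first characters of a line that does not
# begin with "END".
NON_END = r"([^E]|E[^N]|EN[^D])"

# Same structure as the expanded expression above. In re.VERBOSE mode,
# whitespace in the pattern is ignored and "#" starts a comment.
PATTERN = re.compile(
    rf"""
    BEGIN:VEVENT\n
    ({NON_END}.*\n)*            # any number of non-END lines
    ({NON_END}.*Birthday.*\n)   # a line containing Birthday
    ({NON_END}.*\n)*            # more non-END lines
    END:VEVENT\n
    """,
    re.VERBOSE,
)

# Shortened sample: one ordinary event followed by one birthday event.
SAMPLE = (
    "BEGIN:VCALENDAR\n"
    "BEGIN:VEVENT\n"
    "SUMMARY:Study\n"
    "END:VEVENT\n"
    "BEGIN:VEVENT\n"
    "SUMMARY:Mark's Birthday\n"
    "TRANSP:TRANSPARENT\n"
    "END:VEVENT\n"
    "END:VCALENDAR\n"
)

cleaned = PATTERN.sub("", SAMPLE)
print(cleaned)
```

One caveat with this sketch: [^E] also matches a line break, so an empty line inside an event could let the pattern run past an END:VEVENT. The sample here has no empty lines.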

Again, that's super messy. I would recommend the lookahead solution above, or a custom script if things get more complicated. But I worked this one out, so I figured I'd post it anyway. Maybe show it to your kids to give them a good scare or something.