What would be a regex to replace/remove END where it is not preceded by an unclosed START?

What would be a regex (PHP) to replace/remove (using preg_replace()) END where it is not preceded by an unclosed START?

Here are a few examples to illustrate what I mean:

Example 1:

Input:

sometext....END

Output:

sometext.... // because there's no START, so the extra END is not needed

Example 2:

Input:

STARTsometext....END

Output:

STARTsometext....END // because it's preceded by a START

Example 3:

Input:

STARTsometext....END.......END

Output:

STARTsometext....END....... // because the second END is not preceded by an unclosed START

I hope someone can help.

Thank you.

Tags: php regex preg-replace

3 answers

劳资没心，怎么记你

Post #2 · 2019-02-18 06:28

It is not possible to write a regular expression for every possible syntax. In your case you might need a context-free parser, either a bottom-up (ascending) or a top-down (descending) one. See: http://en.wikipedia.org/wiki/Formal_grammar


一夜七次

Post #3 · 2019-02-18 06:40

This is a textbook example of a non-regular language (START and END are the equivalent of opening and closing parentheses). That means you cannot match this language with a simple regular expression. You can handle nesting up to some fixed depth with a complicated regex, but not to arbitrary depth.
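To make the fixed-depth point concrete, here is a sketch assuming PHP's PCRE syntax (negative lookahead and the s modifier). The function name removeEndDepth2 is hypothetical. The pattern balances START/END pairs up to two levels deep; each further level would need yet another copy of the inner group, which is why no single fixed pattern of this kind covers arbitrary depth.

```php
<?php
// Hypothetical illustration: removes bare ENDs, but only understands
// START/END nesting up to two levels deep.
function removeEndDepth2(string $str): string
{
    // A "plain" character is any character that does not begin START or END.
    $plain = '(?:(?!START|END).)';
    // Innermost pair: START, plain characters only, END.
    $inner = "START{$plain}*END";
    // Outer pair: START, then plain characters or inner pairs, then END.
    // Group 1 keeps a balanced pair; a bare END matches the second
    // alternative, and the unset group 1 replaces it with "".
    $pattern = "/(START(?:{$plain}|{$inner})*END)|END/s";
    return preg_replace($pattern, '$1', $str);
}

echo var_export(removeEndDepth2('END START a START b END END END'), true), PHP_EOL;
// ' START a START b END END '
echo var_export(removeEndDepth2('START START START x END END END'), true), PHP_EOL;
// 'START START START x END END '  (the third level is beyond the pattern's depth)
```

In the second call the outermost START is never matched as a pair, so its closing END is treated as excess and removed.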

You need to write a language parser.
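A minimal sketch of such a parser, assuming START and END are literal, case-sensitive tokens (so the END inside a word like ENDING would also count as a token). The function name removeUnmatchedEnd is hypothetical. Instead of matching pairs with a pattern, it scans left to right and keeps a counter of currently open STARTs:

```php
<?php
// Hypothetical helper: drops every END that does not close an open START.
// Tracking a depth counter lets it handle arbitrarily nested pairs.
function removeUnmatchedEnd(string $str): string
{
    $out = '';
    $depth = 0;            // number of STARTs currently open
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        if (substr($str, $i, 5) === 'START') {
            $depth++;          // open a new pair
            $out .= 'START';
            $i += 5;
        } elseif (substr($str, $i, 3) === 'END') {
            if ($depth > 0) {  // this END closes an open START: keep it
                $depth--;
                $out .= 'END';
            }                  // otherwise it is unmatched: drop it
            $i += 3;
        } else {
            $out .= $str[$i];  // ordinary character, copied as-is
            $i++;
        }
    }
    return $out;
}

echo var_export(removeUnmatchedEnd('END START a START b END END END'), true), PHP_EOL;
// ' START a START b END END '
echo var_export(removeUnmatchedEnd('START START START x END END END'), true), PHP_EOL;
// 'START START START x END END END'
```

Because the counter has no fixed limit, any nesting depth works, and the input is processed in a single linear pass.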


傲

Post #4 · 2019-02-18 06:41

Assuming you aren't looking for nested pairs, there is a simple solution to remove excess ENDs. Consider:

// Keep START...END pairs (group 1), drop bare ENDs; the s modifier lets .*? cross newlines.
$str = preg_replace('/END|(START.*?END)/s', '$1', $str);

This replacement works a little backwards, but it makes sense once you understand the order in which the engine works. The regex is made of two main parts: END|(...). Alternatives are tried from left to right, so if the engine sees an END in the input string, it matches it and moves on to the next match (that is, it looks for END again).
The second part is a capturing group containing START.*?END, which matches an entire START/END pair if possible. Everything else is skipped until the engine finds another END or START.

Since we use $1 in the replacement, which refers to the captured group, only text matched by the second alternative is kept. Therefore, the only way for an END to survive is to get into the capturing group, by being the first END after a START.

For example, given the text END START 123 END abc END, the regex finds the following matches and keeps, skips, or removes them accordingly:

END - Removed
(START 123 END) - Captured
a - Skip
b - Skip
c - Skip
END - Removed
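The walkthrough above can be checked with a small script. This is a sketch: the wrapper name stripExcessEnd is hypothetical, and the s modifier is an addition so that .*? also crosses line breaks. The last input illustrates the no-nesting assumption stated at the top of this answer: the lazy .*? stops at the first END, so the outer START loses its END.

```php
<?php
// Hypothetical wrapper around this answer's single-pass replacement.
// A bare END matches the first alternative; group 1 is unset, so $1 inserts "".
// START...END is captured by group 1 and written back unchanged.
function stripExcessEnd(string $str): string
{
    return preg_replace('/END|(START.*?END)/s', '$1', $str);
}

$inputs = [
    'END START 123 END abc END',       // → ' START 123 END abc '
    'sometext....END',                 // → 'sometext....'
    'STARTsometext....END',            // → 'STARTsometext....END'
    'STARTsometext....END.......END',  // → 'STARTsometext....END.......'
    'START a START b END END',         // → 'START a START b END ' (nesting not supported)
];
foreach ($inputs as $in) {
    echo var_export(stripExcessEnd($in), true), PHP_EOL;
}
```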

Working example: http://ideone.com/suVYh
