I'm writing a very simple script parser as part of a school project, and while it's not required, I'm curious whether it can be done with only a regular expression.
The syntax is similar to ASP, where a script begins with <% and ends with %>.
It only supports one command, "pr", which is the same as echo or Response.Write.
Right now I'm using this regular expression to find script blocks:
(<%\s*([\s\S]*?)\s*%>)
But if I have a command like this:
<% pr "%>"; %>
...it obviously only matches:
<% pr "%>
Is there a way, using pure regex, to ignore closing tags that are within quotes? My main worry is that it might also skip tags that sit between two quote characters but are actually outside any quoted string, if that makes sense. For example...
<% pr "hello world"; %> "
Technically the closing tag is surrounded by quotes, but it's not inside an "open" then "close" quote, rather the other way around.
If this is possible with regex that would be pretty neat, otherwise I suspect that if I wanted to support this functionality I would have to manually iterate through the incoming text and parse the blocks out myself, which is no big deal really either.
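For comparison, the manual approach could look roughly like the Python sketch below. It is only an illustration under stated assumptions: strings use double quotes with no escape sequences, and the function name `find_blocks_manual` is made up for this example.

```python
def find_blocks_manual(text):
    """Return the bodies of <% ... %> blocks, ignoring %> inside double quotes.

    Assumption: strings are double-quoted and never contain escaped quotes.
    """
    blocks = []
    pos = 0
    while True:
        start = text.find("<%", pos)
        if start == -1:
            break  # no more opening tags
        i = start + 2
        in_quotes = False
        end = -1
        while i < len(text):
            ch = text[i]
            if ch == '"':
                # Toggle the quote state on every double quote.
                in_quotes = not in_quotes
            elif not in_quotes and text.startswith("%>", i):
                end = i
                break
            i += 1
        if end == -1:
            break  # unterminated block: stop scanning
        blocks.append(text[start + 2:end].strip())
        pos = end + 2
    return blocks
```

With this sketch, the closing tag inside the string is skipped, and a stray quote after the block does not affect the result, because the quote state is only tracked from the opening tag onward.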
Thanks!
I think this one should suit your needs:
<%(".*?"|.*?)*?%>
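One caveat with this pattern: the `.*?` branch can match quote characters too, so a backtracking engine can still stop at a `%>` that sits inside a string. A stricter variant keeps quotes out of the unquoted branch. The following Python sketch is not part of the pattern above; the names `BLOCK` and `find_blocks` are made up for illustration, and it assumes double-quoted strings without escape sequences.

```python
import re

# Assumption: strings are double-quoted and contain no escaped quotes.
BLOCK = re.compile(r'''
    <%\s*
    (                 # group 1: the script body
      (?: "[^"]*"     #   a complete quoted string, which may contain the closing tag
        | [^"%]       #   any character that is neither a quote nor a percent sign
        | %(?!>)      #   a percent sign that does not start the closing tag
      )*?
    )
    \s*%>
''', re.VERBOSE)


def find_blocks(text):
    """Return the body of every script block found in text."""
    return [m.group(1) for m in BLOCK.finditer(text)]
```

Because a quote character can only be consumed as part of a complete quoted string, the engine cannot backtrack into a match that ends inside one. For the two inputs from the question, `find_blocks('<% pr "%>"; %>')` gives `['pr "%>";']`, and `find_blocks('<% pr "hello world"; %> "')` gives `['pr "hello world";']`.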
(see the Demo).
Explanation:
While .* matches as long as possible, .*? matches as little as possible. For example (using pseudo-code),
while