Regex lookahead/lookbehind match for SQL script

Posted 2019-07-09 17:54

Question:

I'm trying to analyse some SQLCMD scripts for code quality tests. I have a regex not working as expected:

^(\s*)USE (\[?)(?<![master|\$])(.)+(\]?)

I'm trying to match:

  1. Strings that start with USE (ignore whitespace)
  2. Followed by optional square bracket
  3. Followed by 1 or more non-whitespace characters.
  4. EXCEPT where that text is "master" (case insensitive)
  5. OR EXCEPT where that text is a $ symbol

Expected results:

USE [master] - don't match

USE [$(CompiledDatabaseName)] - don't match

USE [anything_else.01234] - match

Also, the same patterns above without the [ and ] characters.

I'm using Sublime Text 2 as my RegEx search tool and referencing this cheatsheet

Answer 1:

Your pattern - ^(\s*)USE (\[?)(?<![master|\$])(.)+(\]?) - has several problems. First, [master|\$] is a character class that matches a single character, not an alternation; you mean an alternative list of $ or the character sequence master, so [...] should be (...). Once that is fixed, the lookbehind becomes variable-width (its length is not known beforehand) and is therefore invalid in a Boost regex. Second, the (.)+ capture is wrong: the group will only contain the last character matched (you could use (.+) instead), and . also matches spaces, while you need 1 or more non-whitespace characters. Third, ? is the zero-or-one quantifier, but you say you might have 2 opening and closing brackets, so you need a limiting quantifier, {0,2}.
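The pitfalls above can be reproduced in a few lines. This is only a sketch in Python's re module, not in Boost (the engine the answer refers to); Python happens to apply a similar fixed-width restriction to lookbehinds, and the variable names are made up for illustration.

```python
import re

# [master|\$] is a character class: it matches ONE character from the set
# {m, a, s, t, e, r, |, $}, not the word "master".
char_class = re.compile(r"[master|\$]")
single_char_hits = [c for c in "mx|$z" if char_class.fullmatch(c)]

# (.)+ repeats a one-character group, so the group keeps only the last character.
last_only = re.match(r"USE (.)+", "USE abc").group(1)

# (.+) captures the whole run instead.
whole_run = re.match(r"USE (.+)", "USE abc").group(1)

# Python's re rejects a lookbehind whose alternatives differ in length.
try:
    re.compile(r"(?<!master|\$)x")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
```

Here single_char_hits contains m, | and $, last_only is "c", whole_run is "abc", and the lookbehind fails to compile.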

You can use

^\h*USE(?!\h*\[{0,2}[^]\s]*(?:\$|(?i:master)))\h*\[{0,2}[^]\s]*]{0,2}

See regex demo

Explanation:

  • ^ - start of a line in Sublime Text
  • \h* - optional horizontal whitespace (if you need to match newlines, use \s*)
  • USE - a literal case-sensitive character sequence USE
  • (?!\h*\[{0,2}[^]\s]*(?:\$|(?i:master))) - a negative lookahead that makes sure the USE is NOT followed by:
    • \h* - zero or more horizontal whitespace
    • \[{0,2} - zero, one or two [ brackets
    • [^]\s]* - zero or more characters other than ] and whitespace
    • (?:\$|(?i:master)) - either a $ or a case-insensitive master (the (?i:...) construct turns off case sensitivity for that group only)
  • \h* - go on matching zero or more horizontal whitespace
  • \[{0,2} - zero, one or two [ brackets
  • [^]\s]* - zero or more characters other than ] and whitespace (when ] is the first character in a character class, it does not have to be escaped in Boost/PCRE regexps)
  • ]{0,2} - zero, one or two ] brackets (outside of character class, the closing square bracket does not need escaping)
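
The pattern can be checked against the expected results from the question with a short script. This is a sketch in Python rather than Sublime Text, under one loud assumption: Python's re module has no \h, so [ \t] is substituted for horizontal whitespace. USE_PATTERN and is_flagged are hypothetical names chosen for illustration.

```python
import re

# Assumption: [ \t] stands in for \h, which Python's re does not support.
# The rest of the pattern is the one from the answer, unchanged.
USE_PATTERN = re.compile(
    r"^[ \t]*USE(?![ \t]*\[{0,2}[^]\s]*(?:\$|(?i:master)))"
    r"[ \t]*\[{0,2}[^]\s]*]{0,2}",
    re.MULTILINE,
)


def is_flagged(line: str) -> bool:
    """Return True when the line is a USE statement the pattern matches."""
    return USE_PATTERN.search(line) is not None


samples = [
    "USE [master]",
    "USE [$(CompiledDatabaseName)]",
    "USE [anything_else.01234]",
    "USE master",
    "USE anything_else",
]
results = {s: is_flagged(s) for s in samples}
```

With these samples only "USE [anything_else.01234]" and "USE anything_else" are flagged. Note that because [^]\s]* inside the lookahead may consume a prefix, a name that merely contains master or $ (for example webmaster_db) is skipped as well; whether that is acceptable depends on the naming rules in your scripts.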