Regular expression to remove comments from SQL sta

Posted 2020-02-09 05:19

I'm trying to come up with a regular expression to remove comments from an SQL statement.

This regex almost works:

(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|'(?:[^']|'')*'|(--.*)

Except that the last part doesn't handle "--" comments very well. The problem is handling SQL strings, which are delimited with single quotes ('').

For example, if I have

SELECT ' -- Hello -- ' FROM DUAL

The "--" part shouldn't be treated as a comment, but it is being matched.

This is in ASP/VBScript.

I've thought about matching right-to-left, but I don't think VBScript's regex engine supports it. I also tried fiddling with negative lookbehind, but the results weren't good.

8 answers
冷血范
Reply #2 · 2020-02-09 05:28

This code works for me:

function strip_sqlcomment ($string = '') {
    $RXSQLComments = '@(--[^\r\n]*)|(\#[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)@ms';
    return (($string == '') ?  '' : preg_replace( $RXSQLComments, '', $string ));
}

With a little tweaking, the regex could be used to strip comments in any language.

成全新的幸福
Reply #3 · 2020-02-09 05:35

Please see my answer here. It works both for line comments and for block comments, even nested block comments. I guess you need to use regex with balancing groups, which AFAIK is not available in VBScript.

你好瞎i
Reply #4 · 2020-02-09 05:36

For all PHP folks: please use this library - https://github.com/jdorn/sql-formatter. I have been dealing with stripping comments from SQL for a couple of years now, and the only valid solution is a tokenizer/state machine, which I was too lazy to write myself. A couple of days ago I found this library, ran 120k queries through it, and found only one bug (https://github.com/jdorn/sql-formatter/issues/93), which was fixed immediately in our fork, https://github.com/keboola/sql-formatter.

Usage is simple:

$query = <<<EOF
/* 
  my comments 
*/
SELECT 1;
EOF;

$bareQuery = \SqlFormatter::removeComments($query);
// prints "SELECT 1;"
print $bareQuery;
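
The tokenizer/state-machine idea mentioned above can be sketched in a few lines. This is only a minimal illustration in Python, not the code of the sql-formatter library: the function name `strip_sql_comments` is hypothetical, and it assumes single-quoted strings with `''` as the escaped quote, `--` line comments, and non-nested `/* */` block comments.

```python
def strip_sql_comments(sql: str) -> str:
    """Drop -- and /* */ comments while copying quoted strings verbatim.

    Hypothetical helper for illustration; assumes '' escapes a quote
    and that block comments do not nest.
    """
    out = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch == "'":
            # Inside a string: scan to the closing quote, skipping '' pairs.
            j = i + 1
            while j < n:
                if sql[j] == "'":
                    if j + 1 < n and sql[j + 1] == "'":
                        j += 2
                        continue
                    break
                j += 1
            out.append(sql[i:j + 1])
            i = j + 1
        elif sql.startswith("--", i):
            # Line comment: skip up to (but not including) the line break.
            while i < n and sql[i] not in "\r\n":
                i += 1
        elif sql.startswith("/*", i):
            # Block comment: skip to the first */ (or to the end if unterminated).
            end = sql.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because the scanner always knows whether it is inside a string, the query from the question, `SELECT ' -- Hello -- ' FROM DUAL`, passes through unchanged.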
可以哭但决不认输i
Reply #5 · 2020-02-09 05:37

For Node.js, see the pg-minify library. It works with PostgreSQL, MS-SQL and MySQL scripts.

It can handle all types of comments, and it also compresses the resulting SQL to its bare minimum, reducing what needs to be sent to the server.

叛逆
Reply #6 · 2020-02-09 05:39

In PHP, I'm using this code to strip comments from SQL:

$sqlComments = '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms';
/* Commented version
$sqlComments = '@
    (([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
    |(                   # $3 : Match comments
        (?:\#|--).*?$    # - Single line comments
        |                # - Multi line (nested) comments
         /\*             #   . comment open marker
            (?: [^/*]    #   . non comment-marker characters
                |/(?!\*) #   . ! not a comment open
                |\*(?!/) #   . ! not a comment close
                |(?R)    #   . recursive case
            )*           #   . repeat eventually
        \*\/             #   . comment close marker
    )\s*                 # Trim after comments
    |(?<=;)\s+           # Trim after semi-colon
    @msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
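
The core trick in the pattern above (match quoted strings first and put them back, so that only real comments are deleted) carries over to other regex engines. Below is a rough Python sketch of that idea, not a port of the PHP code: the names `SQL_TOKEN` and `strip_comments` are made up for illustration, only single-quoted strings with `''` escapes are recognised, and nested block comments are not handled because Python's `re` module has no `(?R)` recursion.

```python
import re

# Alternation order matters: a string is tried before either comment form,
# so a "--" inside quotes is consumed as part of the string.
SQL_TOKEN = re.compile(
    r"""
    ('(?:[^']|'')*')      # group 1: single-quoted string, '' is an escaped quote
    | --[^\r\n]*          # line comment, up to the end of the line
    | /\*.*?\*/           # block comment (non-nested)
    """,
    re.VERBOSE | re.DOTALL,
)

def strip_comments(sql: str) -> str:
    # Strings are re-emitted unchanged; comments (group 1 is None) become "".
    return SQL_TOKEN.sub(lambda m: m.group(1) or "", sql)
```

For example, `strip_comments("SELECT ' -- Hello -- ' FROM DUAL")` returns the input unchanged, because the whole quoted literal is matched as group 1 and written back.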
\"骚年 ilove
Reply #7 · 2020-02-09 05:42

Since you said that the rest of your regex is fine, I focused on the last part. All you need to do is verify that the -- is at the beginning of the line, and then make sure all the dashes are removed if there are more than two. The resulting regex is below:

(^[--]+)

The above only removes the comment dashes, not the rest of the line. If you also want to remove everything after them up to the end of the line, use this instead:

(^--.*)
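
As a quick check of what the anchored pattern matches, here is a small Python sketch. It assumes multiline matching is switched on so that `^` anchors at the start of every line (the answer does not say which flags it uses), and the name `LINE_START_COMMENT` and the sample SQL are made up for illustration.

```python
import re

# Assumption: multiline mode, so ^ matches at the start of each line.
LINE_START_COMMENT = re.compile(r"^--.*", re.MULTILINE)

sql = "-- header comment\nSELECT ' -- Hello -- ' FROM DUAL -- trailing\n"
result = LINE_START_COMMENT.sub("", sql)
```

With this input, only the first line is emptied. The string literal is left alone, and so is the trailing `-- trailing` comment, since it does not start at the beginning of a line.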