sed to replace // with /* */ comments EXCEPT when inside /* */ comments

Posted 2019-06-04 08:16

Question:

The problem I am facing is with an ANSI compiler that requires C style comments.

So I am trying to convert my existing comments to comply with the C standard ISO C89.

I am looking for a sed expression to replace // comments with /* */ comments EXCEPT when the // comments appear within /* */ comments (converting those would break the enclosing comment).

I have tried this (a range expression) to no avail:

sed -e '/\/*/,/*\//! s_//\(.*\)_/*\1 */_' > filename

Is there something that will ignore the single-line comments inside a block comment like this, but change everything else?

/**********************************
* Some comment
* an example bit of code within the comment followed by a //comment
* some more comment
***********************************/
y = x+7; //this comment must be changed

Thanks!

Answer 1:

Here's a lightly tested filter written in C that should perform the conversion you want. Some notes on what this filter handles that would be difficult, if not impossible, to do with a regex:

- it ignores comment-like sequences that are enclosed in quotes (since they aren't comments)
- if a C99 comment that is being converted contains something that would start or end a C89 comment, it munges that sequence so there is no nested comment or premature end to the comment (a nested /* becomes /+ and a nested */ becomes *|). I wasn't sure whether you needed this (if you don't, it should be easy to remove)
- this munging only happens in a C99 comment that is being converted; the contents of comments that are already C89 style are left unchanged.
- it does not handle trigraphs or digraphs (I believe the only consequence is that it may miss an escape sequence or line continuation introduced with the trigraph ??/).
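The munging in the second point can be tried in isolation. The sketch below is a hypothetical helper, munge_c99_body, and is not part of the filter that follows; it assumes it is handed the text after the two slashes of a // comment, and it mirrors the state_c99_comment case by comparing each character with the previous input character.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the filter): returns a newly allocated
 * copy of a C99 comment's body that is safe to place inside a C89 comment.
 * Like the state_c99_comment case, each character is checked together with
 * the previous input character: the star of a slash-star pair becomes '+',
 * and the slash of a star-slash pair becomes '|'.
 * Returns NULL if allocation fails; the caller frees the result. */
char *munge_c99_body(const char *body)
{
    size_t len = strlen(body);
    char *out = malloc(len + 1);
    char prev = 0;               /* previous input character, 0 at start */
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        char ch = body[i];
        if (prev == '/' && ch == '*')
            out[i] = '+';        /* slash-star becomes slash-plus */
        else if (prev == '*' && ch == '/')
            out[i] = '|';        /* star-slash becomes star-bar */
        else
            out[i] = ch;
        prev = ch;
    }
    out[len] = '\0';
    return out;
}
```

For example, the body "a /* b */ c" comes back as "a /+ b *| c", which can no longer open or close a comment early.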

Of course, you'll need to perform your own testing to determine if it's suitable for your purposes.

#include <stdio.h>

char* a = " this is /* a test of \" junk // embedded in a '\' string";
char* b = "it should be left alone//";

// comment /* that should ***////  be converted.
/* leave this alone*/// but fix this one

// and "leave these \' \" quotes in a comment alone*
/****  and these '\' too //
*/


enum states {
    state_normal,
    state_double_quote,
    state_single_quote,
    state_c89_comment,
    state_c99_comment
};

enum states current_state = state_normal;

void handle_char( char ch)
{
    static char last_ch = 0;

    switch (current_state) {
        case state_normal:
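            if ((last_ch == '/') && (ch == '/')) {
                putchar( '*');  /* NOTE: changing to C89 style comment */
                ch = 0;         /* 'forget' this slash so a following star isn't seen as a nested opener */
                current_state = state_c99_comment;
            }
            else if ((last_ch == '/') && (ch == '*')) {
                putchar( ch);
                ch = 0;         /* 'forget' the star so slash-star-slash isn't taken as a closed comment */
                current_state = state_c89_comment;
            }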
            else if (ch == '\"') {
                putchar( ch);
                current_state = state_double_quote;
            }
            else if (ch == '\'') {
                putchar( ch);
                current_state = state_single_quote;
            }
            else {
                putchar( ch);
            }
            break;

        case state_double_quote:
            if ((last_ch == '\\') && (ch == '\\')) {
                /* we want to output this \\ escaped sequence, but we */
                /* don't want to 'remember' the current backslash -   */
                /* otherwise we'll mistakenly treat the next character*/
                /* as being escaped                                   */

                putchar( ch);
                ch = 0;
            }
            else if ((ch == '\"') && (last_ch != '\\')) {
                putchar( ch);
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_single_quote:
            if ((last_ch == '\\') && (ch == '\\')) {
                /* we want to output this \\ escaped sequence, but we */
                /* don't want to 'remember' the current backslash -   */
                /* otherwise we'll mistakenly treat the next character*/
                /* as being escaped                                   */

                putchar( ch);
                ch = 0;
            }
            else if ((ch == '\'') && (last_ch != '\\')) {
                putchar( ch);
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_c89_comment:
            if ((last_ch == '*') && (ch == '/')) {
                putchar( ch);
                ch = 0; /* 'forget' the slash so it doesn't affect a possible slash that immediately follows */
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_c99_comment:
            if ((last_ch == '/') && (ch == '*')) {
                /* we want to change any slash-star sequences inside */
                /* what was a C99 comment to something else to avoid */
                /* nested comments                                   */
                putchar( '+');
            }
            else if ((last_ch == '*') && (ch == '/')) {
                /* similarly for star-slash sequences inside */
                /* what was a C99 comment                    */
                putchar( '|');
            }
            else if (ch == '\n') {
                puts( "*/");
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;
    }

    last_ch = ch;
}

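int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        handle_char( c);
    }

    /* a C99 comment on the last line with no trailing newline is still open */
    if (current_state == state_c99_comment) {
        fputs( "*/", stdout);
    }

    return 0;
}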

Some indulgent commentary: many years ago, a shop I worked at wanted to impose a coding standard that forbade C99-style comments on the grounds that even though the compiler we used at the time had no problem with them, the code might have to be ported to a compiler that didn't support them. I (and others) successfully argued that that possibility was so remote as to be essentially non-existent, and that even if it did happen, a conversion routine to make the comments compatible could be easily written. We were permitted to use C99/C++ style comments.

I now consider my oath fulfilled, and whatever curse that may have been laid on me to be lifted.

Answer 2:

If you can't use @ephemient's suggestion, you'll need to apply your regex across multiple lines, which is not sed's default behaviour. sed has a hold buffer, which lets you append lines together and apply the regex to the concatenated text.

The sed expression would look like this:

sed '1h;1!H;${;g;s/your-matcher-regex/replacement-regex/g;}'

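1h - on the first line, copy the line into the hold buffer (replacing whatever was there)

1!H - on every other line, append the line to the hold buffer

${...} - on the last line, run the commands in braces: g copies the hold buffer (the whole file) into the pattern space, and the s command then runs on that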

Now your matcher expression will work even if the /* and */ are on different lines.
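For comparison, the same slurp-then-match idea in C is to read the whole input into one buffer before matching, so that a block comment's opener and closer on different lines are both visible at once. This is a sketch; slurp is a hypothetical name, and it simply reads with fread into a buffer that doubles as needed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from fp into one NUL-terminated
 * buffer, the C analogue of sed's 1h;1!H;$g.  Returns NULL if allocation
 * fails; the caller frees the result. */
char *slurp(FILE *fp)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    char *grown;

    if (buf == NULL)
        return NULL;
    /* always leave one byte free for the terminating NUL */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len == cap - 1) {           /* no room left: double the buffer */
            grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
    }
    buf[len] = '\0';                     /* one string holding every line */
    return buf;
}
```

Any matching done on the returned string then sees embedded newlines, just as the s command does after g.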

Answer 3:

awk '{if($0~/\/\//){sub(/\/\//,"/*");$0=$0"*/"};print}' temp
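A C sketch of what this one-liner does to each line (convert_line_naive is a hypothetical name). It also shows the limitation: nothing tracks whether the // sits inside a block comment or a string, so a line from inside the question's example block comment gets converted too.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the awk one-liner for one line, given
 * without its newline: the first "//" becomes slash-star, and star-slash
 * is appended at the end.  There is no notion of strings or existing block
 * comments.  Returns NULL if allocation fails; the caller frees the result. */
char *convert_line_naive(const char *line)
{
    const char *hit = strstr(line, "//");
    size_t len = strlen(line);
    char *out = malloc(len + 3);       /* room for the appended "*" "/" and NUL */

    if (out == NULL)
        return NULL;
    memcpy(out, line, len + 1);
    if (hit != NULL) {
        out[hit - line + 1] = '*';     /* second slash becomes a star */
        strcpy(out + len, "*/");       /* close the comment at end of line */
    }
    return out;
}
```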

Answer 4:

Convert the code to colored HTML with any converter that emits different markup for /* and // comments, process the output with perl/awk/sed/whatever, then strip the markup.

Answer 5:

You can do this (almost) entirely in sed; you just need one call to tr:

translate_comments_prepare.sed

s/\\/\\\\/g  # escape current escape characters
s/\$/\\S/g   # write all occurrences of $ as \S
s/(/\\o/g    # replace opening parentheses with \o
s/)/\\c/g    # replace closing parentheses with \c
s/$/$/       # add a $ sign to the end of each line
s_/\*_(_g    # replace the start of comments with (
s_\*/_)_g    # replace the end of comments with )

Then we pipe the result of the "preprocessing" step through tr -d '\n' to join all lines (I haven't figured out a good way to do this from within sed).
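To make the encoding concrete, here is a C sketch of the preparation script applied to a single line, with the substitutions run in the same order as in translate_comments_prepare.sed; the per-line results would then be concatenated, as tr -d '\n' does. replace_all and prepare_line are hypothetical helpers, and replace_all handles fixed strings only, which is all these substitutions need.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replaces every non-overlapping occurrence of the
 * fixed, non-empty string from (scanning left to right) with to, like
 * sed's s///g without regex metacharacters.  Caller frees the result. */
static char *replace_all(const char *s, const char *from, const char *to)
{
    size_t flen = strlen(from), tlen = strlen(to), count = 0;
    const char *p;
    char *out, *o;

    for (p = strstr(s, from); p != NULL; p = strstr(p + flen, from))
        count++;                         /* first pass: count matches */
    out = malloc(strlen(s) - count * flen + count * tlen + 1);
    if (out == NULL)
        return NULL;
    for (o = out; (p = strstr(s, from)) != NULL; s = p + flen) {
        memcpy(o, s, (size_t)(p - s));   /* text before the match */
        o += p - s;
        memcpy(o, to, tlen);             /* the replacement */
        o += tlen;
    }
    strcpy(o, s);                        /* text after the last match */
    return out;
}

/* Hypothetical helper: runs the substitutions of the prepare script, in
 * order, on one line given without its newline.  Caller frees the result. */
char *prepare_line(const char *line)
{
    static const char *const rules[][2] = {
        { "\\", "\\\\" },   /* escape existing backslashes */
        { "$",  "\\S"  },   /* dollar sign becomes \S */
        { "(",  "\\o"  },   /* opening parenthesis becomes \o */
        { ")",  "\\c"  },   /* closing parenthesis becomes \c */
        { "/*", "("    },   /* comment opener becomes ( */
        { "*/", ")"    }    /* comment closer becomes ) */
    };
    const char *src = line;
    char *cur = NULL, *next;
    size_t i, len;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        next = replace_all(src, rules[i][0], rules[i][1]);
        free(cur);                       /* previous intermediate result */
        if (next == NULL)
            return NULL;
        src = cur = next;
    }
    /* s/$/$/ marks the end of the line.  The sed script does this before
       the two comment-marker rules, but neither of them can match a dollar
       sign, so appending it last gives the same result. */
    len = strlen(cur);
    next = realloc(cur, len + 2);
    if (next == NULL) {
        free(cur);
        return NULL;
    }
    next[len] = '$';
    next[len + 1] = '\0';
    return next;
}
```

For instance, the line x = (a); /* c */ is encoded as x = \oa\c; ( c )$, so the only raw parentheses left are the comment delimiters.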

And then we do the real work:

translate_comments.sed

s_//\([^$]*\)\$_(\1)$_g  # replace all C++ style comments (even nested ones)
:b                       # while loop
                         # remove nested comment blocks:
                         #   (foo(bar)baz) --> (foobarbaz)
s/(\([^()]*\)(\([^()]*\))/(\1\2/
tb                       # loop back to :b while a substitution was made
s_(_/*_g                 # reverse the steps done by the preparation phase
s_)_*/_g                 # ...
s/\$/\n/g                # split lines that were previously joined
s/\\S/$/g                # replace escaped special characters
s/\\o/(/g                # ...
s/\\c/)/g                # ...
s/\\\(.\)/\1/g           # ...
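The :b ... tb loop removes one nested pair of parentheses per iteration. A depth counter (a swapped-in technique, not the answer's regex loop) reaches the same end state in a single pass: keep a parenthesis only when it opens or closes a top-level group. flatten_nested is a hypothetical name; the sketch assumes the parentheses come from the encoding step, and it keeps a stray ) unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: drops every parenthesis nested inside another pair,
 * so "(foo(bar)baz)" becomes "(foobarbaz)", however deep or wide the
 * nesting.  Top-level pairs (the real comments) are kept; a closing
 * parenthesis with no opener is kept as-is.  Caller frees the result. */
char *flatten_nested(const char *s)
{
    char *out = malloc(strlen(s) + 1);
    char *o = out;
    int depth = 0;                     /* current nesting level */

    if (out == NULL)
        return NULL;
    for (; *s != '\0'; s++) {
        if (*s == '(') {
            if (depth++ == 0)
                *o++ = '(';            /* opens a top-level comment */
        } else if (*s == ')') {
            if (depth == 0 || --depth == 0)
                *o++ = ')';            /* closes a top-level comment */
        } else {
            *o++ = *s;
        }
    }
    *o = '\0';
    return out;
}
```

With two converted // comments inside one block comment, "( a (x)$ b (y)$ )$" flattens to "( a x$ b y$ )$".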

Then put everything together:

sed -f translate_comments_prepare.sed | tr -d '\n' | sed -f translate_comments.sed

Answer 6:

This might work for you (GNU sed):

sed ':a;$!{N;ba};s/^/\x00/;tb;:b;s/\x00$//;t;s/\x00\(\/\*[^*]*\*\+\([^/*][^*]*\*\+\)*\/\)/\1\x00/;tb;s/\x00\/\/\([^\n]*\)/\/*\1\*\/\x00/;tb;s/\x00\(.\)/\1\x00/;tb' file

Explanation:

:a;$!{N;ba} slurps the whole file into the pattern space.
s/^/\x00/ sets a marker at the start. N.B. this can be any character not found in the file.
tb;:b resets the substitution flag by branching to the label b.
s/\x00$//;t if the marker has reached the end of the file, removes it and ends. All done.
s/\x00\(\/\*[^*]*\*\+\([^/*][^*]*\*\+\)*\/\)/\1\x00/;tb matches a C-style comment right after the marker and moves the marker past it.
s/\x00\/\/\([^\n]*\)/\/*\1\*\/\x00/;tb matches a single-line comment, rewrites it as a C-style comment, and moves the marker past it.
s/\x00\(.\)/\1\x00/;tb otherwise matches any single character and moves the marker past it.
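The same marker-walking strategy can be sketched in C: at each position, copy a complete C89 comment unchanged, rewrite a // comment up to the next newline, or copy one character. convert_comments is a hypothetical name; like the sed command, it does not treat string literals specially, and an unterminated /* is copied character by character.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper following the sed answer's logic: walks the input
 * once, skipping complete C89 comments, converting C99 comments, and
 * copying everything else.  Caller frees the result. */
char *convert_comments(const char *in)
{
    size_t len = strlen(in);
    char *out = malloc(2 * len + 1);   /* each "//" comment gains 2 chars */
    char *o = out;
    const char *p = in, *end;

    if (out == NULL)
        return NULL;
    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '*' && (end = strstr(p + 2, "*/")) != NULL) {
            size_t n = (size_t)(end + 2 - p);   /* whole comment incl. closer */
            memcpy(o, p, n);
            o += n;
            p += n;
        } else if (p[0] == '/' && p[1] == '/') {
            size_t n = strcspn(p + 2, "\n");    /* comment text up to newline */
            *o++ = '/';
            *o++ = '*';
            memcpy(o, p + 2, n);
            o += n;
            *o++ = '*';
            *o++ = '/';
            p += 2 + n;                         /* newline, if any, copied next */
        } else {
            *o++ = *p++;                        /* ordinary character */
        }
    }
    *o = '\0';
    return out;
}
```

Because a block comment is consumed whole before any // inside it is examined, the //comment in the question's example block is left alone while the trailing //this comment must be changed is converted.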