sed to replace // with /* */ comments EXCEPT when

Posted 2019-06-04 08:11

The problem I am facing is with an ANSI compiler that requires C style comments.

So I am trying to convert my existing comments to comply with the C standard ISO C89.

I am looking for a sed expression that replaces // comments with /* */ comments, EXCEPT when the // appears inside an existing /* */ comment (converting it there would break the enclosing comment).

I have tried this (a range expression) to no avail:

sed -e '/\/*/,/*\//! s_//\(.*\)_/*\1 */_' > filename

Is there something that ignores the single-line comments inside a block comment like this one, but changes everything else?

/**********************************
* Some comment
* an example bit of code within the comment followed by a //comment
* some more comment
***********************************/
y = x+7; //this comment must be changed

Thanks!

Tags: c sed comments
6 answers
迷人小祖宗
#2 · 2019-06-04 08:18
awk '{if($0~/\/\//){sub(/\/\//,"\/\*");$0=$0"*/"};print}' temp
Explosion°爆炸
#3 · 2019-06-04 08:23

Convert the code to colored HTML with any converter that emits different markup for /* and // comments, process the output with perl/awk/sed/whatever, then strip the markup.

Emotional °昔
#4 · 2019-06-04 08:24

If you can't use @ephemient's suggestion, then you'll need to apply your regex across multiple lines, which is not sed's default behaviour. sed has a hold buffer, which allows you to append multiple strings together and apply the regex to the concatenated string.

The sed expression would look like this:

sed '1h;1!H;${;g;s/your-matcher-regex/replacement-regex/g;}'

1h - if it is the first line, put the line into the hold buffer (emptying it first)

1!H - If not the first line, append to the hold buffer

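${...} - on the last line, run the enclosed commands: g copies the hold buffer back into the pattern space, then the substitution is applied to the whole file at once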

Now your matcher expression will work even if the /* and */ are on different lines.
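To make the accumulation step concrete, here is a minimal sketch in C of what the 1h;1!H pair does to the hold buffer. The function name hold_append and its signature are invented for this illustration; sed exposes nothing of the sort.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mimicking sed's h/H pair: the first call stores the
   line, every later call appends a newline followed by the line.
   Returns 0 on success, -1 if allocation fails (*hold is then unchanged). */
int hold_append(char **hold, size_t *len, const char *line)
{
    size_t line_len;
    size_t sep;
    char *grown;

    assert(hold != NULL && len != NULL && line != NULL);

    line_len = strlen(line);
    sep = (*hold != NULL) ? 1 : 0;   /* H puts a newline before the new line */

    grown = realloc(*hold, *len + sep + line_len + 1);
    if (grown == NULL) {
        return -1;
    }
    if (sep) {
        grown[*len] = '\n';
    }
    /* copy the line including its terminating NUL */
    memcpy(grown + *len + sep, line, line_len + 1);
    *len += sep + line_len;
    *hold = grown;
    return 0;
}
```

Start with a NULL buffer and a length of 0, call it once per input line, and search the resulting string in one go; that is roughly what the ${...} block does after g.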

祖国的老花朵
#5 · 2019-06-04 08:28

This might work for you (GNU sed):

sed ':a;$!{N;ba};s/^/\x00/;tb;:b;s/\x00$//;t;s/\x00\(\/\*[^*]*\*\+\([^/*][^*]*\*\+\)*\/\)/\1\x00/;tb;s/\x00\/\/\([^\n]*\)/\/*\1\*\/\x00/;tb;s/\x00\(.\)/\1\x00/;tb' file

Explanation:

  • :a;$!{N;ba} slurps the whole file into the pattern space
  • s/^/\x00/ sets a marker at the start (N.B. this can be any character not found in the file)
  • tb;:b resets the substitution flag by jumping to the placeholder label b
  • s/\x00$//;t the marker has reached the end of the file, so remove it. All done.
  • s/\x00\(\/\*[^*]*\*\+\([^/*][^*]*\*\+\)*\/\)/\1\x00/;tb this regexp matches C-style comments and bumps the marker past them if it matches.
  • s/\x00\/\/\([^\n]*\)/\/*\1\*\/\x00/;tb this regexp matches a single-line comment, replaces it with a C-style comment and bumps the marker past it if it matches.
  • s/\x00\(.\)/\1\x00/;tb this regexp matches any single character and bumps the marker past it if it matches.
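The marker walk described above can also be sketched in C, with a read index playing the part of the \x00 marker. This is only an illustration of the same three-way choice (skip a block comment, rewrite a line comment, or step one character); the name convert_comments and the buffer-based interface are assumptions made for the example, and like the sed script it does not treat string literals specially.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical function: copies `in` to `out`, rewriting // comments as
   block comments and leaving existing block comments untouched.
   Returns 0 on success, -1 if `out` is too small.
   Assumption of this sketch: an unterminated block comment is copied
   through to the end of the input unchanged. */
int convert_comments(const char *in, char *out, size_t out_size)
{
    size_t i = 0;   /* read index: plays the role of the \x00 marker */
    size_t o = 0;   /* write index */

    assert(in != NULL && out != NULL);

    while (in[i] != '\0') {
        if (in[i] == '/' && in[i + 1] == '*') {
            /* existing block comment: copy it through its closing pair */
            const char *end = strstr(in + i + 2, "*/");
            size_t stop = (end != NULL) ? (size_t)(end - in) + 2 : strlen(in);
            if (o + (stop - i) >= out_size) return -1;
            memcpy(out + o, in + i, stop - i);
            o += stop - i;
            i = stop;
        } else if (in[i] == '/' && in[i + 1] == '/') {
            /* line comment: emit opener, copy up to the newline, emit closer */
            size_t n = strcspn(in + i + 2, "\n");
            if (o + n + 4 >= out_size) return -1;
            out[o++] = '/';
            out[o++] = '*';
            memcpy(out + o, in + i + 2, n);
            o += n;
            out[o++] = '*';
            out[o++] = '/';
            i += 2 + n;
        } else {
            /* anything else: move the marker past one character */
            if (o + 1 >= out_size) return -1;
            out[o++] = in[i++];
        }
    }
    if (o >= out_size) return -1;
    out[o] = '\0';
    return 0;
}
```

Fed the example from the question, the block comment comes out unchanged and the trailing //this comment must be changed becomes /*this comment must be changed*/.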
Root(大扎)
#6 · 2019-06-04 08:43

Here's a lightly tested filter written in C that should perform the conversion you want. Some comments about what this filter does that are difficult if not impossible to handle with a regex:

  • it ignores comment-like sequences that are enclosed in quotes (since they aren't comments)
  • if a C99 comment that is being converted contains something that would start or end a C89 comment, it munges that sequence so there will be no nested comment or premature end to the comment (a nested /* becomes /+ and a nested */ becomes *|). I wasn't sure if you needed this or not (if you don't, it should be easy to remove)
  • the above munging of nested comments only occurs in a C99 comment that's being converted - the contents of comments that are already C89 style are not changed.
  • it does not handle trigraphs or digraphs (I think this only allows the possibility of missing an escape sequence or end of line continuation that is initiated with the trigraph ??/).

Of course, you'll need to perform your own testing to determine if it's suitable for your purposes.

#include <stdio.h>

char* a = " this is /* a test of \" junk // embedded in a '\' string";
char* b = "it should be left alone//";

// comment /* that should ***////  be converted.
/* leave this alone*/// but fix this one

// and "leave these \' \" quotes in a comment alone*
/****  and these '\' too //
*/


enum states {
    state_normal,
    state_double_quote,
    state_single_quote,
    state_c89_comment,
    state_c99_comment
};

enum states current_state = state_normal;

void handle_char( char ch)
{
    static char last_ch = 0;

    switch (current_state) {
        case state_normal:
            if ((last_ch == '/') && (ch == '/')) {
                putchar( '*');  /* NOTE: changing to C89 style comment */
                current_state = state_c99_comment;
            }
            else if ((last_ch == '/') && (ch == '*')) {
                putchar( ch);
                current_state = state_c89_comment;
            }
            else if (ch == '\"') {
                putchar( ch);
                current_state = state_double_quote;
            }
            else if (ch == '\'') {
                putchar( ch);
                current_state = state_single_quote;
            }
            else {
                putchar( ch);
            }
            break;

        case state_double_quote:
            if ((last_ch == '\\') && (ch == '\\')) {
                /* we want to output this \\ escaped sequence, but we */
                /* don't want to 'remember' the current backslash -   */
                /* otherwise we'll mistakenly treat the next character*/
                /* as being escaped                                   */

                putchar( ch);
                ch = 0;
            }
            else if ((ch == '\"') && (last_ch != '\\')) {
                putchar( ch);
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_single_quote:
            if ((last_ch == '\\') && (ch == '\\')) {
                /* we want to output this \\ escaped sequence, but we */
                /* don't want to 'remember' the current backslash -   */
                /* otherwise we'll mistakenly treat the next character*/
                /* as being escaped                                   */

                putchar( ch);
                ch = 0;
            }
            else if ((ch == '\'') && (last_ch != '\\')) {
                putchar( ch);
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_c89_comment:
            if ((last_ch == '*') && (ch == '/')) {
                putchar( ch);
                ch = 0; /* 'forget' the slash so it doesn't affect a possible slash that immediately follows */
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_c99_comment:
            if ((last_ch == '/') && (ch == '*')) {
                /* we want to change any slash-star sequences inside */
                /* what was a C99 comment to something else to avoid */
                /* nested comments                                   */
                putchar( '+');
            }
            else if ((last_ch == '*') && (ch == '/')) {
                /* similarly for star-slash sequences inside */
                /* what was a C99 comment                    */
                putchar( '|');
            }
            else if (ch == '\n') {
                puts( "*/");
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;
    }

    last_ch = ch;
}

int main(void)
{
    int c;

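    while ((c = getchar()) != EOF) {
        handle_char( c);
    }

    /* if the input ends inside a converted comment (no trailing newline), */
    /* the comment still needs to be closed                                */
    if (current_state == state_c99_comment) {
        puts( "*/");
    }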

    return 0;
}
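For readers who only want the munging rule, here is a rough sketch that isolates what the state_c99_comment case does to the body of a converted comment. The helper name munge_comment_body is invented for this example, and it works on a string in memory rather than on a character stream as the filter does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrites, in place, any slash-star or star-slash
   pair inside the body of a former line comment. As in the filter's
   state_c99_comment case, only the second character of the pair changes:
   it becomes '+' or '|' respectively. */
void munge_comment_body(char *body)
{
    /* the body directly follows the second slash of the opener, so the
       "previous character" starts out as a slash, like last_ch does */
    char last = '/';
    size_t i;
    size_t n;

    assert(body != NULL);
    n = strlen(body);

    for (i = 0; i < n; i++) {
        char ch = body[i];
        if (last == '/' && ch == '*') {
            body[i] = '+';
        } else if (last == '*' && ch == '/') {
            body[i] = '|';
        }
        last = ch;   /* remember the original character, as the filter does */
    }
}
```

Dropping the two replacement branches gives the behaviour without munging, which is the "easy to remove" case mentioned in the list above.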

Some indulgent commentary: many years ago, a shop I worked at wanted to impose a coding standard that forbade C99-style comments on the grounds that even though the compiler we used at the time had no problem with them, the code might have to be ported to a compiler that didn't support them. I (and others) successfully argued that that possibility was so remote as to be essentially non-existent, and that even if it did happen, a conversion routine to make the comments compatible could be easily written. We were permitted to use C99/C++ style comments.

I now consider my oath fulfilled, and whatever curse that may have been laid on me to be lifted.

We Are One
#7 · 2019-06-04 08:44

You can do this (almost) entirely in sed; you just need one call to tr:

translate_comments_prepare.sed

s/\\/\\\\/g  # escape current escape characters
s/\$/\\S/g   # write all occurrences of $ as \S
s/(/\\o/g    # replace opening parentheses with \o
s/)/\\c/g    # replace closing parentheses with \c
s/$/$/       # add a $ sign to the end of each line
s_/\*_(_g    # replace the start of comments with (
s_\*/_)_g    # replace the end of comments with )

Then we pipe the result of the "preprocessing" step through tr -d '\n' to join all lines (I haven't figured out a good way to do this from within sed).

And then we do the real work:

translate_comments.sed

s_//\([^$]*\)\$_(\1)$_g  # replace all C++ style comments (even nested ones)
:b                       # while loop
                         # remove nested comment blocks:
                         #   (foo(bar)baz) --> (foobarbaz)
s/(\([^()]*\)(\([^()]*\))\([^()]*\))/(\1\2\3)/
tb                       # end of loop
s_(_/*_g                 # reverse the steps done by the preparation phase
s_)_*/_g                 # ...
s/\$/\n/g                # split lines that were previously joined
s/\\S/$/g                # replace escaped special characters
s/\\o/(/g                # ...
s/\\c/)/g                # ...
s/\\\(.\)/\1/g           # ...

Then we basically put everything together

sed -f translate_comments_prepare.sed | tr -d '\n' | sed -f translate_comments.sed
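The loop that removes nested comment blocks is the least obvious part, so here is a small sketch of the same flattening done with a depth counter in C. The name flatten_nested is made up for the illustration, and the sketch assumes the parentheses are balanced, which the sed loop does not strictly require.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes, in place, every parenthesis that sits
   inside another pair, so "(foo(bar)baz)" becomes "(foobarbaz)".
   Assumes the parentheses in `s` are balanced. */
void flatten_nested(char *s)
{
    size_t r;          /* read index */
    size_t w = 0;      /* write index */
    size_t n;
    int depth = 0;     /* number of pairs currently open */

    assert(s != NULL);
    n = strlen(s);

    for (r = 0; r < n; r++) {
        char ch = s[r];
        if (ch == '(') {
            depth++;
            if (depth > 1) {
                continue;            /* inner opener: drop it */
            }
        } else if (ch == ')') {
            if (depth > 1) {
                depth--;
                continue;            /* inner closer: drop it */
            }
            if (depth == 1) {
                depth--;             /* outermost closer: keep it */
            }
        }
        s[w++] = ch;
    }
    s[w] = '\0';
}
```

The sed version reaches the same result by repeating a single substitution until it no longer matches; the counter just does it in one pass.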