Question:
The problem I am facing is with an ANSI compiler that requires C style comments.
So I am trying to convert my existing comments to comply with the C standard ISO C89.
I am looking for a SED expression to replace // comments with /* comments EXCEPT when // comments appear within /* */ comments (which would break the comment).
I have tried this (a range expression) to no avail:
sed -e '/\/*/,/*\//! s_//\(.*\)_/*\1 */_' > filename
Will something work to ignore the 1 line comments inside a comment like this but change everything else?
/**********************************
* Some comment
* an example bit of code within the comment followed by a //comment
* some more comment
***********************************/
y = x+7; //this comment must be changed
Thanks!
Answer 1:
Here's a lightly tested filter written in C that should perform the conversion you want. Some comments about what this filter does that are difficult if not impossible to handle with a regex:
- it ignores comment-like sequences that are enclosed in quotes (since they aren't comments)
- if a C99 comment that is being converted contains something that would start or end a C89 comment, it munges that sequence so there will be no nested comment or premature end to the comment (a nested /* or */ gets changed to /+ or *|). I wasn't sure if you needed this or not (if you don't, it should be easy to remove)
- the above munging of nested comments only occurs in a C99 comment that's being converted - the contents of comments that are already C89 style are not changed.
- it does not handle trigraphs or digraphs (I think this only allows the possibility of missing an escape sequence or end of line continuation that is initiated with the trigraph ??/).
Of course, you'll need to perform your own testing to determine if it's suitable for your purposes.
#include <stdio.h>

char* a = " this is /* a test of \" junk // embedded in a '\' string";
char* b = "it should be left alone//";

// comment /* that should ***//// be converted.
/* leave this alone*/// but fix this one
// and "leave these \' \" quotes in a comment alone*
/**** and these '\' too //
*/

enum states {
    state_normal,
    state_double_quote,
    state_single_quote,
    state_c89_comment,
    state_c99_comment
};

enum states current_state = state_normal;

void handle_char( char ch)
{
    static char last_ch = 0;

    switch (current_state) {
        case state_normal:
            if ((last_ch == '/') && (ch == '/')) {
                putchar( '*'); /* NOTE: changing to C89 style comment */
                current_state = state_c99_comment;
            }
            else if ((last_ch == '/') && (ch == '*')) {
                putchar( ch);
                current_state = state_c89_comment;
            }
            else if (ch == '\"') {
                putchar( ch);
                current_state = state_double_quote;
            }
            else if (ch == '\'') {
                putchar( ch);
                current_state = state_single_quote;
            }
            else {
                putchar( ch);
            }
            break;

        case state_double_quote:
            if ((last_ch == '\\') && (ch == '\\')) {
                /* we want to output this \\ escaped sequence, but we */
                /* don't want to 'remember' the current backslash - */
                /* otherwise we'll mistakenly treat the next character*/
                /* as being escaped */
                putchar( ch);
                ch = 0;
            }
            else if ((ch == '\"') && (last_ch != '\\')) {
                putchar( ch);
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_single_quote:
            if ((last_ch == '\\') && (ch == '\\')) {
                /* we want to output this \\ escaped sequence, but we */
                /* don't want to 'remember' the current backslash - */
                /* otherwise we'll mistakenly treat the next character*/
                /* as being escaped */
                putchar( ch);
                ch = 0;
            }
            else if ((ch == '\'') && (last_ch != '\\')) {
                putchar( ch);
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_c89_comment:
            if ((last_ch == '*') && (ch == '/')) {
                putchar( ch);
                ch = 0; /* 'forget' the slash so it doesn't affect a possible slash that immediately follows */
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;

        case state_c99_comment:
            if ((last_ch == '/') && (ch == '*')) {
                /* we want to change any slash-star sequences inside */
                /* what was a C99 comment to something else to avoid */
                /* nested comments */
                putchar( '+');
            }
            else if ((last_ch == '*') && (ch == '/')) {
                /* similarly for star-slash sequences inside */
                /* what was a C99 comment */
                putchar( '|');
            }
            else if (ch == '\n') {
                puts( "*/");
                current_state = state_normal;
            }
            else {
                putchar( ch);
            }
            break;
    }

    last_ch = ch;
}

int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        handle_char( c);
    }

    return 0;
}
Some indulgent commentary: many years ago, a shop I worked at wanted to impose a coding standard that forbade C99-style comments on the grounds that even though the compiler we used at the time had no problem with them, the code might have to be ported to a compiler that didn't support them. I (and others) successfully argued that that possibility was so remote as to be essentially non-existent, and that even if it did happen, a conversion routine to make the comments compatible could be easily written. We were permitted to use C99/C++ style comments.
I now consider my oath fulfilled, and whatever curse that may have been laid on me to be lifted.
Answer 2:
If you can't use @ephemient's suggestion, then you'll need to apply your regex across multiple lines, which is not sed's default behaviour. sed has a hold buffer, which allows you to append multiple strings together and apply the regex to the concatenated string.
The sed expression would look like this:
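sed -n '1h;1!H;${;g;s/your-matcher-regex/replacement-regex/g;p;}'
-n - suppress the automatic printing of each line, so that only the final result is output
1h - on the first line, copy the line into the hold buffer (replacing whatever it held)
1!H - on every other line, append the line to the hold buffer
${...} - on the last line, run the commands in braces: g copies the hold buffer back into the pattern space, s/// is applied to the whole text, and p prints it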
Now your matcher expression will work even if the /* and */ are on different lines.
Answer 3:
awk '{if($0~/\/\//){sub(/\/\//,"/*");$0=$0"*/"};print}' temp
Answer 4:
Convert the code to colored HTML with any converter that can output different markup for /* and // comments, process the output with perl/awk/sed/whatever, then strip the markup.
Answer 5:
You can do this (almost) entirely in sed; you just need one call to tr:
translate_comments_prepare.sed
s/\\/\\\\/g # escape current escape characters
s/\$/\\S/g # write all occurrences of $ as \S
s/(/\\o/g # replace opening parentheses with \o
s/)/\\c/g # replace closing parentheses with \c
s/$/$/ # add a $ sign to the end of each line
s_/\*_(_g # replace the start of comments with (
s_\*/_)_g # replace the end of comments with )
Then we pipe the result of the "preprocessing" step through tr -d '\n' to join all lines (I haven't figured out a good way to do this from within sed).
And then we do the real work:
translate_comments.sed
s_//\([^$]*\)\$_(\1)$_g # replace all C++ style comments (even nested ones)
:b # while loop
# remove nested comment blocks:
# (foo(bar)baz) --> (foobarbaz)
s/(\([^()]*\)(\([^()]*\))\([^()]*\))/(\1\2\3)/
tb # end of loop
s_(_/*_g # reverse the steps done by the preparation phase
s_)_*/_g # ...
s/\$/\n/g # split lines that were previously joined
s/\\S/$/g # replace escaped special characters
s/\\o/(/g # ...
s/\\c/)/g # ...
s/\\\(.\)/\1/g # ...
Then we basically put everything together:
sed -f translate_comments_prepare.sed | tr -d '\n' | sed -f translate_comments.sed
Answer 6:
This might work for you (GNU sed):
sed ':a;$!{N;ba};s/^/\x00/;tb;:b;s/\x00$//;t;s/\x00\(\/\*[^*]*\*\+\([^/*][^*]*\*\+\)*\/\)/\1\x00/;tb;s/\x00\/\/\([^\n]*\)/\/*\1\*\/\x00/;tb;s/\x00\(.\)/\1\x00/;tb' file
Explanation:
:a;$!{N;ba}
slurp the whole file into the pattern space
s/^/\x00/
set a marker at the start of the text (N.B. this can be any character not found in the file)
tb;:b
reset the substitution flag by jumping to the label b
s/\x00$//;t
the marker has reached the end of the file: remove it and finish (t with no label jumps to the end of the script)
s/\x00\(\/\*[^*]*\*\+\([^/*][^*]*\*\+\)*\/\)/\1\x00/;tb
this regexp matches a C-style comment at the marker and, if it matches, moves the marker past it
s/\x00\/\/\([^\n]*\)/\/*\1\*\/\x00/;tb
this regexp matches a single-line comment at the marker, replaces it with a C-style comment and, if it matches, moves the marker past it
s/\x00\(.\)/\1\x00/;tb
this regexp matches any single character and moves the marker past it