I wrote a method to remove single line comments from a C++ source file:
    def stripRegularComments(text) {
        def builder = new StringBuilder()
        text.eachLine {
            def singleCommentPos = it.indexOf("//")
            def process = true
            if(singleCommentPos > -1)
            {
                def counter = 0
                it.eachWithIndex
                { obj,i ->
                    if((obj == '\'') || (obj == '"'))
                        counter++
                    if(i == singleCommentPos)
                    {
                        process = ((counter % 2) == 1)
                        if(!process)
                            return
                    }
                }
                if(!process)
                {
                    def line = it.substring(0,singleCommentPos)
                    builder << line << "\n"
                }
                else
                {
                    builder << it << "\n"
                }
            }
            else
            {
                builder << it << "\n"
            }
        }
        return builder.toString()
    }
And I tested it with:
println a.stripRegularComments("""
this is a test inside double quotes "//inside double quotes"
this is a test inside single quotes '//inside single quotes'
two// a comment?//other
single //comment
""")
It produces this output:
this is a test inside double quotes "//inside double quotes"
this is a test inside single quotes '//inside single quotes'
two
single
Are there some cases I'm missing?
You don't seem to handle escaped quotes, like:
versus
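To illustrate why counting quote characters breaks on escaped quotes, here is a minimal C++ sketch of a per-line scanner that tracks whether it is inside a literal and skips whatever follows a backslash. The function name stripLineComment and the sample inputs are assumptions made for illustration, not code from the question, and the sketch deliberately ignores /* */ comments, raw string literals, digit separators, trigraphs and line continuations.

```cpp
#include <string>

// Sketch only: remove a trailing // comment from ONE line of C++ source.
// Not handled: /* */ comments, raw strings, digit separators (1'000),
// trigraphs, and backslash-newline line splicing.
std::string stripLineComment(const std::string& line) {
    char quote = 0;  // 0 outside a literal, otherwise the opening quote character
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (quote != 0) {
            if (c == '\\') {
                ++i;        // skip the escaped character, so an escaped quote is ignored
            } else if (c == quote) {
                quote = 0;  // the literal is closed by the same kind of quote that opened it
            }
        } else if (c == '"' || c == '\'') {
            quote = c;      // a string or character literal opens here
        } else if (c == '/' && i + 1 < line.size() && line[i + 1] == '/') {
            return line.substr(0, i);  // a real comment starts here
        }
    }
    return line;  // no comment found outside a literal
}
```

Because the scanner remembers which quote opened the literal, a line such as c = '"'; // x is also handled, whereas counting both kinds of quote together sees an odd number of quotes before the comment and keeps it.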
I think you can't handle
and this is also likely to make problems:
The fun ones are formed by trigraphs and line continuations. My personal favorite is:
The handling of the \ character at the end of a line is performed in an earlier translation phase (phase 2) than the replacement of comments (phase 3). For this reason, a // comment can actually occupy more than one line in the original source file.
P.S. Oh... I see this has already been posted. OK, I'll keep this answer alive just for mentioning the phases of translation :)
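The effect of phase 2 running before phase 3 can be seen in a small C++ sketch. The function name value is hypothetical and chosen only for this illustration. The comment line ends with a backslash, so the following line is spliced into the comment and the assignment never takes place. Compilers typically warn about such multi-line comments.

```cpp
// Sketch only: shows a // comment swallowing the next source line.
int value() {
    int x = 1;
    // this comment ends with a backslash, so the next line belongs to it \
    x = 2;
    return x;  // x is still 1, because "x = 2;" was part of the comment
}
```

A stripper that works line by line would keep the x = 2; line as code, which changes the meaning of the file.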
This is always a favourite: