I have a large codebase that was forked from the original project and I'm trying to track down all the differences from the original. A lot of the file edits consist of commented out debugging code and other miscellaneous comments. The GUI diff/merge tool called Meld under Ubuntu can ignore comments, but only single line comments.
Is there any other convenient way of finding only the non-comment diffs, either using a GUI tool or Linux command-line tools? In case it makes a difference, the code is a mixture of PHP and JavaScript, so I'm primarily interested in ignoring //, /* */ and #.
See our Smart Differencer line of tools, which compare computer language source files using the language structure rather than the layout as a guide. In particular, this means they ignore comments and whitespace when comparing code.
There is a SmartDifferencer for PHP.
To use visual diff, you can try DiffMerge. Its rulesets and options provide for customized behavior.
From the command-line perspective, you can use the --ignore-matching-lines=RE option of diff. Please note that the regex has to match the corresponding line in both files, and it has to match every changed line in the hunk in order to work; otherwise diff will still show the difference.
Use single quotes to protect the pattern from shell expansion, and escape any regex-reserved characters (e.g. brackets) that should match literally.
This behavior is documented in the diffutils manual and is also well explained by armel.
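A minimal sketch of the hunk rule, assuming GNU diff and three throwaway PHP files created only for illustration (the file names and contents are hypothetical). The first comparison changes only a // line, so the hunk is suppressed; the second also changes the adjacent code line, so diff prints the whole hunk, comment included.

```shell
#!/bin/sh
# Hypothetical scratch files; nothing here comes from a real project.
tmp=$(mktemp -d)

printf '%s\n' '<?php' '// old note' '$x = 1;' > "$tmp/a.php"
printf '%s\n' '<?php' '// new note' '$x = 1;' > "$tmp/b.php"
printf '%s\n' '<?php' '// new note' '$x = 2;' > "$tmp/c.php"

# Lines whose first non-blank characters are // count as ignorable.
re='^[[:space:]]*//'

# Only a comment line changed: every changed line matches, so nothing is printed.
only_comments=$(diff --ignore-matching-lines="$re" "$tmp/a.php" "$tmp/b.php" || true)

# A comment and a code line changed in the same hunk: the whole hunk is shown.
mixed=$(diff --ignore-matching-lines="$re" "$tmp/a.php" "$tmp/c.php" || true)

printf 'comment-only change:\n%s\n' "$only_comments"
printf 'mixed change:\n%s\n' "$mixed"

rm -rf "$tmp"
```

If comment edits usually sit right next to code edits in your fork, this option alone will hide less than you might hope.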
GNU diff supports ignoring lines which match a regular expression, for single files and for folders alike. A pattern such as ^# would ignore all lines which start with a # at the line beginning.
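As a sketch of the folder case, assuming GNU diff and two made-up directory trees (orig and fork are hypothetical names), the recursive form might look like this. Only the file with a real code change should be reported.

```shell
#!/bin/sh
# Hypothetical directory trees, created only to show the recursive form.
tmp=$(mktemp -d)
mkdir -p "$tmp/orig/lib" "$tmp/fork/lib"

# a.sh differs only in a line starting with #.
printf '%s\n' '# original header' 'echo one' > "$tmp/orig/lib/a.sh"
printf '%s\n' '# forked header'   'echo one' > "$tmp/fork/lib/a.sh"

# b.sh differs in actual code.
printf '%s\n' 'echo two'   > "$tmp/orig/lib/b.sh"
printf '%s\n' 'echo three' > "$tmp/fork/lib/b.sh"

# -r walks both trees; -I '^#' drops hunks made up only of lines starting with #.
report=$(cd "$tmp" && diff -r -I '^#' orig fork || true)

printf '%s\n' "$report"
rm -rf "$tmp"
```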
You can filter both files through stripcmt first, which will remove C and C++ comments. For removing # comments, sed 's/#.*//' will do.
Of course you will lose some context when removing comments first, but on the other hand differences in comments will not cause any problems. I think I would have done it like the following (described for a single file, automate as required):
1. Say the first version is A and the latest of the copied code base is B; let's call the versions with comments removed A' and B' (e.g. save those to temporary files while processing).
2. Pick a common base with comments removed, O' (alternatively just re-use B' for this).
3. Do a 3-way merge of O', A' and B' and save the result to C'. KDiff3 is an excellent tool for this.
4. C' is without comments, so to get back into "normal" mode, do a new 3-way merge with A' as base and A and C'. This will pick up the changes between A' and C' (which are the code changes you want) into the normal code base with comments, based on version A.
Drawing version trees on paper before you start is highly recommended to get a clear picture of which versions you want to work on. But don't be limited by what the tree is showing; you can merge any version in any direction if you just figure out which versions to use.
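The comment-stripping step at the start of this approach can be sketched roughly as follows. This uses plain sed rather than stripcmt, and the strip function and sample files are hypothetical. It is deliberately naive: it only handles /* */ comments that open and close on one line, and it will also eat # or // inside string literals such as URLs.

```shell
#!/bin/sh
# Hypothetical sample files standing in for versions A and B.
tmp=$(mktemp -d)

printf '%s\n' '<?php' '# setup' '$total = 0; // running sum' \
  '/* debug: var_dump($total); */' '$total += 5;' > "$tmp/A.php"
printf '%s\n' '<?php' '# setup, reviewed' '$total = 0;' \
  '// var_dump($total);' '$total += 7;' > "$tmp/B.php"

# Naive stripper (an assumption, not stripcmt): removes one-line /* */
# comments, // and # comments, trailing whitespace, then empty lines.
strip() {
  sed -e 's|/\*.*\*/||g' \
      -e 's|//.*||' \
      -e 's|#.*||' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' "$1"
}

strip "$tmp/A.php" > "$tmp/A.stripped"   # plays the role of A'
strip "$tmp/B.php" > "$tmp/B.stripped"   # plays the role of B'

# Only the code change should remain in the diff.
code_diff=$(diff "$tmp/A.stripped" "$tmp/B.stripped" || true)

printf '%s\n' "$code_diff"
rm -rf "$tmp"
```

Line numbers in the resulting diff refer to the stripped files, which is part of the context you lose with this approach.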
Try diff with a regular expression for the lines to ignore.
See: Regular expression at Wikipedia
Regular expressions can be written that would cause a diff to ignore a preprocessor directive and both standard comment block types.
This is far from perfect, but it will give an idea of the differences.
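One way this might look, assuming GNU diff: the -I option can be repeated, once per comment style. The three patterns below are my own guesses at suitable expressions, not the ones from the original answer, and the sample files are hypothetical. They match lines that start with #, with //, or with /*, so the inner lines of multi-line block comments are not covered.

```shell
#!/bin/sh
# Hypothetical sample files; unchanged code lines keep each change in its own hunk.
tmp=$(mktemp -d)

printf '%s\n' '# build 1' 'var a = 1;' '// first draft' 'var b = 2;' \
  '/* temporary */' 'var c = 3;' 'var d = 4;' > "$tmp/old.js"
printf '%s\n' '# build 2' 'var a = 1;' '// second draft' 'var b = 2;' \
  '/* permanent */' 'var c = 3;' 'var d = 5;' > "$tmp/new.js"

# One -I per comment style: # lines, // lines, and lines opening a /* block.
filtered=$(diff \
  -I '^[[:space:]]*#' \
  -I '^[[:space:]]*//' \
  -I '^[[:space:]]*/\*' \
  "$tmp/old.js" "$tmp/new.js" || true)

printf '%s\n' "$filtered"
rm -rf "$tmp"
```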