I do a lot of urgent analysis of large logfile analysis. Often this will require tailing a log and looking for changes.
I'm keen to have a solution that will highlight these changes to make it easier for the eye to track.
I have investigated tools and there doesn't appear to be anything out there that does what I am looking for. I've written some scripts in Perl that do it roughly, but I would like a more complete solution.
Can anyone recommend a tool for this?
I wrote a Python script for this purpose that utilizes difflib.SequenceMatcher
:
#!/usr/bin/python3
from difflib import SequenceMatcher
from itertools import tee
from sys import stdin
def pairwise(iterable):
"""s -> (s0,s1), (s1,s2), (s2, s3), ...
https://docs.python.org/3/library/itertools.html#itertools-recipes
"""
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def color(c, s):
"""Wrap string s in color c.
Based on http://stackoverflow.com/a/287944/1916449
"""
try:
lookup = {'r':'\033[91m', 'g':'\033[92m', 'b':'\033[1m'}
return lookup[c] + str(s) + '\033[0m'
except KeyError:
return s
def diff(a, b):
"""Returns a list of paired and colored differences between a and b."""
for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
if tag == 'equal': yield 2 * [color('w', a[i:j])]
if tag in ('delete', 'replace'): yield color('r', a[i:j]), ''
if tag in ('insert', 'replace'): yield '', color('g', b[k:l])
if __name__ == '__main__':
for a, b in pairwise(stdin):
print(*map(''.join, zip(*diff(a, b))), sep='')
Example input.txt
:
108 finished /tmp/ts-out.5KS8bq 0 435.63/429.00/6.29 ./eval.exe -z 30
107 finished /tmp/ts-out.z0tKmX 0 456.10/448.36/7.26 ./eval.exe -z 30
110 finished /tmp/ts-out.wrYCrk 0 0.00/0.00/0.00 tail -n 1
111 finished /tmp/ts-out.HALY18 0 460.65/456.02/4.47 ./eval.exe -z 30
112 finished /tmp/ts-out.6hdkH5 0 292.26/272.98/19.12 ./eval.exe -z 1000
113 finished /tmp/ts-out.eFBgoG 0 837.49/825.82/11.34 ./eval.exe -z 10
Output of cat input.txt | ./linediff.py
:
Levenshtein distance
Wikipedia:
Levenshtein distance between two strings is minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.
public static int LevenshteinDistance(char[] s1, char[] s2) {
int s1p = s1.length, s2p = s2.length;
int[][] num = new int[s1p + 1][s2p + 1];
// fill arrays
for (int i = 0; i <= s1p; i++)
num[i][0] = i;
for (int i = 0; i <= s2p; i++)
num[0][i] = i;
for (int i = 1; i <= s1p; i++)
for (int j = 1; j <= s2p; j++)
num[i][j] = Math.min(Math.min(num[i - 1][j] + 1,
num[i][j - 1] + 1), num[i - 1][j - 1]
+ (s1[i - 1] == s2[j - 1] ? 0 : 1));
return num[s1p][s2p];
}
Sample App in Java
String Diff
Application uses LCS algorithm to concatenate 2 text inputs into 1. Result will contain minimal set of instructions to make one string for the other. Below the instruction concatenated text is displayed.
Download application:
String Diff.jar
Download source:
Diff.java
http://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_diff.html
.. this look promising, will update this with more info when Ive played more..