What is the best way to do this in PHP? Is there a PHP function that can do this, considering the column content could be very large?
If no PHP function is available, what shell utility can I call?
Thanks
There aren't any built-in diff functions in core PHP. But hooray for PEAR: Text_Diff (never used it though, but in PEAR I trust).
And there's even a PECL package, xdiff.
Hint: PEAR classes are pure PHP, PECL packages are compiled extension modules. Normally modules are faster than classes, but it also depends on the functionality. You just have to test and evaluate.
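A minimal sketch of the PECL route, assuming the xdiff extension is installed. The wrapper name `unified_diff_or_null` is made up for illustration; `xdiff_string_diff` is the extension's own function. The guard makes the snippet run (and report the missing extension) even where xdiff is not loaded.

```php
<?php
// Hypothetical helper: returns a unified diff, or null if xdiff is unavailable.
function unified_diff_or_null(string $old, string $new): ?string
{
    if (!function_exists('xdiff_string_diff')) {
        return null; // PECL xdiff is not loaded; use another method instead
    }
    $diff = xdiff_string_diff($old, $new, 3); // 3 lines of context
    return $diff === false ? null : $diff;
}

$diff = unified_diff_or_null("a\nb\nc\n", "a\nB\nc\n");
echo $diff ?? "xdiff is not available\n";
```

The extension also ships `xdiff_string_patch` for applying such a diff again, which matters if you plan to store diffs rather than full texts.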
For storing: I'd store the plain text, not the diffs. Space is cheap, and many databases (e.g. MySQL) support data compression (or you could compress and decompress in PHP). If you store the plain text, you're independent of the diff algorithm and can change it later if needed.
If you need speed, you could store both the plain texts AND the diffs.
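For the "compress in PHP" option, something like the following might do. It assumes the zlib extension is available (it is in most builds); the function names `pack_revision` and `unpack_revision` are invented for this example.

```php
<?php
// Compress a revision before writing it to the database.
function pack_revision(string $text): string
{
    $blob = gzcompress($text, 9); // 9 = best compression, slowest
    if ($blob === false) {
        throw new RuntimeException('compression failed');
    }
    return $blob;
}

// Restore the original text after reading the blob back.
function unpack_revision(string $blob): string
{
    $text = gzuncompress($blob);
    if ($text === false) {
        throw new RuntimeException('decompression failed');
    }
    return $text;
}

$original = str_repeat("The quick brown fox jumps over the lazy dog.\n", 1000);
$blob = pack_revision($original);
printf("%d bytes -> %d bytes\n", strlen($original), strlen($blob));
var_dump(unpack_revision($blob) === $original);
```

The compressed value is binary, so it belongs in a BLOB-type column rather than a TEXT one.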
The usual process is to store each distinct revision and calculate the diff when the user wants to view it, perhaps caching that output if the process is expensive.
Alternatively, you could store a base revision plus a set of diffs which can be applied to obtain the other versions.
In either case there's a trade-off: the first option makes retrieving a specific version inexpensive, at the cost of a little more processing for diffing; the second makes diff viewing cheap, but at the cost of potentially expensive incremental patching to obtain a specific version.
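The first option (full revisions, diff on demand, cached output) could be sketched like this. It is a plain line-based LCS diff written for illustration, not the algorithm of any particular library, and it needs O(n*m) memory, so it is only suitable for modest texts. `line_diff` and `cached_diff` are made-up names, and the cache here is just an array.

```php
<?php
// Line-based diff built on a longest-common-subsequence table.
function line_diff(string $old, string $new): array
{
    $a = explode("\n", $old);
    $b = explode("\n", $new);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] = LCS length of $a[$i..] and $b[$j..]
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = $a[$i] === $b[$j]
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table and emit unchanged, removed and added lines.
    $out = [];
    $i = $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $out[] = '  ' . $a[$i];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $out[] = '- ' . $a[$i++];
        } else {
            $out[] = '+ ' . $b[$j++];
        }
    }
    while ($i < $n) {
        $out[] = '- ' . $a[$i++];
    }
    while ($j < $m) {
        $out[] = '+ ' . $b[$j++];
    }
    return $out;
}

// Compute the diff between two stored revisions once, then reuse it.
function cached_diff(array $revisions, int $from, int $to, array &$cache): array
{
    $key = $from . ':' . $to;
    if (!isset($cache[$key])) {
        $cache[$key] = line_diff($revisions[$from], $revisions[$to]);
    }
    return $cache[$key];
}

$revisions = [1 => "alpha\nbeta\ngamma", 2 => "alpha\nBETA\ngamma\ndelta"];
$cache = [];
echo implode("\n", cached_diff($revisions, 1, 2, $cache)), "\n";
```

In a real application the revisions would come from the database and the cache would live somewhere persistent, but the shape stays the same.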
If the content is very large and the changes are only minor ones, you might consider a "reverse delta" approach: only the latest version of the text is stored in full, and each previous version is stored as a diff that leads from the newer version back to it.
This saves a lot of storage space, but when you compare two versions that are many modifications apart, the cost of reconstructing them can be considerable. In the end it is always a trade-off between storage space and processing requirements.
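To make the reverse delta idea concrete, here is a rough sketch. The delta format is deliberately simplistic (count the shared leading and trailing lines, keep only the differing middle) and stands in for a real diff format; all function names and the `$store` array layout are assumptions made for the example.

```php
<?php
// Delta that turns $from into $to: shared leading/trailing lines are only
// counted, and just the differing middle of $to is kept.
function make_delta(string $from, string $to): array
{
    $a = explode("\n", $from);
    $b = explode("\n", $to);
    $max = min(count($a), count($b));

    $prefix = 0;
    while ($prefix < $max && $a[$prefix] === $b[$prefix]) {
        $prefix++;
    }
    $suffix = 0;
    while ($suffix < $max - $prefix
        && $a[count($a) - 1 - $suffix] === $b[count($b) - 1 - $suffix]) {
        $suffix++;
    }
    return [
        'prefix' => $prefix,
        'suffix' => $suffix,
        'middle' => array_slice($b, $prefix, count($b) - $prefix - $suffix),
    ];
}

// Rebuild the target text from $from and a delta produced by make_delta().
function apply_delta(string $from, array $delta): string
{
    $a = explode("\n", $from);
    $head = array_slice($a, 0, $delta['prefix']);
    $tail = $delta['suffix'] > 0 ? array_slice($a, -$delta['suffix']) : [];
    return implode("\n", array_merge($head, $delta['middle'], $tail));
}

// Keep the newest text in full; remember only the way back to the old one.
function save_revision(array $store, string $text): array
{
    if ($store['latest'] !== null) {
        $store['deltas'][] = make_delta($text, $store['latest']);
    }
    $store['latest'] = $text;
    return $store;
}

// $stepsBack = 0 returns the latest version, 1 the one before it, and so on.
function load_revision(array $store, int $stepsBack): string
{
    $text = $store['latest'];
    $deltas = array_reverse($store['deltas']); // newest delta first
    for ($i = 0; $i < $stepsBack; $i++) {
        $text = apply_delta($text, $deltas[$i]);
    }
    return $text;
}

$store = ['latest' => null, 'deltas' => []];
foreach (["a\nb\nc", "a\nB\nc", "a\nB\nc\nd"] as $text) {
    $store = save_revision($store, $text);
}
echo load_revision($store, 2), "\n"; // walks two deltas back to the first text
```

Note how `load_revision` has to apply one delta per step: the latest version is free, and older ones get progressively more expensive, which is exactly the trade-off described above.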
If you can't or don't want to use PEAR or PECL, you can still call the diff utility via exec. I'd definitely go for a standard diff format, and never develop your own.
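Calling diff through exec might look roughly like this. It assumes a POSIX `diff` binary on the PATH and that `exec()` is not disabled on your host; `shell_unified_diff` is just an illustrative name. The texts go through temporary files, so very large column contents never have to be passed on the command line.

```php
<?php
// Sketch: produce a unified diff of two strings with the external diff tool.
function shell_unified_diff(string $old, string $new): string
{
    $oldFile = tempnam(sys_get_temp_dir(), 'old');
    $newFile = tempnam(sys_get_temp_dir(), 'new');
    if ($oldFile === false || $newFile === false) {
        throw new RuntimeException('could not create temporary files');
    }
    try {
        file_put_contents($oldFile, $old);
        file_put_contents($newFile, $new);

        $cmd = sprintf(
            'diff -u %s %s',
            escapeshellarg($oldFile),
            escapeshellarg($newFile)
        );
        $lines = [];
        exec($cmd, $lines, $status);

        // diff exits with 0 (identical), 1 (different) or 2 (trouble).
        if ($status > 1) {
            throw new RuntimeException("diff failed with status $status");
        }
        return implode("\n", $lines);
    } finally {
        // Always clean up the temporary files.
        unlink($oldFile);
        unlink($newFile);
    }
}

echo shell_unified_diff("a\nb\nc\n", "a\nB\nc\n"), "\n";
```

The `-u` flag selects the unified format, which the `patch` utility understands, so you stay on a standard format as recommended above.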