I'm posting this as a Q&A to document a workaround for a problem that seems to come up frequently—how to put MySQL Workbench files under version control—but for which I have been unable to find any solutions. Feedback is welcome!
How can I tell Mercurial to diff
the contents of a zipped archive and ignore some of the changes to those contents? Specifically, how can I use hg
to diff the contents of a MySQL Workbench (.mwb
) file, ignoring the many unimportant changes that MySQL Workbench makes every time the file is opened? Can I use a custom script that ignores certain irrelevant changes?
Background
I am trying to diff
a file in an hg repository. The file, document.mwb.xml
, is an XML document extracted from a .mwb
file (a MySQL Workbench model file). Basically, I am looking to keep the model's contents—the table structure, visual model, etc.—under version control, but not commit the .mwb
file itself, which is a zip archive and thus a binary file.
Anytime I save the .mwb
file, I unzip it. I keep the unzipped contents in my repository, and just zip them up again when I need to work with the .mwb
in MySQL.
The XML in question looks like this:
<?xml version="1.0"?>
<data grt_format="2.0" document_type="MySQL Workbench Model" version="1.4.4">
<value type="object" struct-name="workbench.Document" id="8551CCFA-3AD0-4207-BC76-15ED589CF22C" struct-checksum="0x7131bf99">
<value type="object" struct-name="workbench.logical.Model" id="B48E1CD2-3386-40B7-8E59-AA191598F667" struct-checksum="0xf4220370" key="logicalModel">
<value _ptr_="0x7fbcd1cc3270" type="list" content-type="object" content-struct-name="workbench.logical.Diagram" key="diagrams"/>
<value _ptr_="0x7fbcd1cc3210" type="dict" key="customData"/>
<value _ptr_="0x7fbcd1cc32d0" type="list" content-type="object" content-struct-name="model.Marker" key="markers"/>
<value _ptr_="0x7fbcd1cc3330" type="dict" key="options"/>
<value type="string" key="name"></value>
<link type="object" struct-name="GrtObject" key="owner">8551CCFA-3AD0-4207-BC76-15ED589CF22C</link>
</value>
<value _ptr_="0x7fbcd1cc2b70" type="list" content-type="object" content-struct-name="workbench.OverviewPanel" key="overviewPanels"/>
<value _ptr_="0x7fbcd1cc2c00" type="list" content-type="object" content-struct-name="workbench.physical.Model" key="physicalModels">
<value type="object" struct-name="workbench.physical.Model" id="34B9E967-5C9B-4D1B-8759-C417F6C33AA3" struct-checksum="0x5f896d18">
...
The problem is all of those _ptr_
attributes: there are literally thousands of them in this file, and every one of them changes every single time the file is saved, even if nothing is modified. As a result, the repository can rapidly get cluttered with completely meaningless "changes" to this file.
Is there a way to use a custom diff
routine to ignore these irrelevant changes?
I have not found a true solution, but I have developed a satisfactory workaround, inspired by this mwb-diff gist. This allows me to unzip and diff the
.mwb
file's contents, commit those contents and their changes to the repository, and use the.mwb
normally when necessary.Project Structure
My project is set up like this:
I save the
.mwb
file - call itMyModel.mwb
- inproject_root/schema
. Obviously, you can use a different structure, but you will need to modify the instructions below accordingly.The Scripts
I created the following scripts and saved them in
project_root/schema
:unpack.sh
pack.sh
Getting Mercurial Ready to Rock
We need to tell
hg
to ignore the model (and all other.mwb
files). Also, when MySQL Workbench is open, it adds alock
file to the.mwb
archive, which we need to ignore. So, add these lines to your.hgignore
file:How to Use the Scripts
When you want to modify the
.mwb
file, rebuild it from its components by runningpack.sh
above. This could be added as a hook to happen automatically when youhg pull
, update, etc., but I haven't explored this, yet.When you are done editing the
.mwb
file and want to commit your changes, run theunpack.sh
script. If you want, you can set up a file-watching utility on your system to do this automatically when the file changes, but that's beyond the scope of this answer.The Results
Mercurial is now perfectly happy to track changes to the contents of your
.mwb
file without tracking thousands of apparently-useless_ptr_
attributes. Also, while I am using this with Mercurial, the basic logic (and the shell scripts) will work with git, SVN, etc.IMPORTANT CAVEAT: As far as I can tell, the
_ptr_
attributes are irrelevant. The scripts I have posted above actually replace the contents of those attributes._ptr_="0x98a7b3e4"
(or whatever) becomes_ptr_"xxx"
. Based on my testing, this doesn't matter, and MySQL Workbench will happily work with the reconstituted file, apparently disregarding the_ptr_
values. I may be wrong, and these values may matter! You are strongly encouraged to test this for yourself before relying on my solution.