Reading the contents of Thunderbird address book (.mab) files

Posted 2020-07-08 06:23

Question:

I have several address lists in my Thunderbird address book.

Every time I need to edit an address that is contained in several lists, it is a pain in the neck to find which lists contain the address to be modified.

As a helper tool, I want to read the various files and, in a single search, give the user a list of which xxx.MAB files include the searched address.

With that list in hand, the user can go straight to editing the right address lists.

I would like to know the minimum about the format of these MAB files, so that I can open them and search them for strings.

Thanks in advance,

Juan

P.S. I have asked on the Mozilla forum, but Mozilla has no plans to consolidate the addresses into one master file and have the different lists contain only links to the master. One individual is thinking of doing that, but he has no idea when, due to lack of resources.

On this forum there is a similar question that mentions Mork files, but my current Thunderbird appears to keep all addresses in MAB files.

Answer 1:

I am afraid there is no answer that will give you a proper solution for this question.

Mork is a textual database format. Thunderbird uses it for address book data (.mab files) and mail folder summaries (.msf files), so your MAB files are Mork files.

The format, written by David McCusker, mixes several numerical namespaces, is essentially undocumented, and seems to no longer be developed, maintained, or supported. The only way to get to grips with it is to reverse engineer it while reading the source code that uses it.

However, experienced people have tried to write parsers for this file format with little success. According to Wikipedia, former Netscape engineer Jamie Zawinski had this to say about the format:

...the single most brain-damaged file format that I have ever seen in my nineteen year career

This page states the following:

In brief, let's count its (Mork's) sins:

  • Two different numerical namespaces that overlap.
  • It can't decide what kind of character-quoting syntax to use: Backslash? Hex encoding with dollar-sign?
  • C++ line comments are allowed sometimes, but sometimes // is just a pair of characters in a URL.
  • It goes to all this serious compression effort (two different string-interning hash tables) and then writes out Unicode strings without using UTF-8: writes out the unpacked wchar_t characters!
  • Worse, it hex-encodes each wchar_t with a 3-byte encoding, meaning the file size will be 3x or 6x (depending on whether wchar_t is 2 bytes or 4 bytes.)
  • It masquerades as a "textual" file format when in fact it's just another binary-blob file, except that it represents all its magic numbers in ASCII. It's not human-readable, it's not hand-editable, so the only benefit there is to the fact that it uses short lines and doesn't use binary characters is that it makes the file bigger. Oh wait, my mistake, that isn't actually a benefit at all.

The frustration shines through here and it is obviously not a simple task.

Consequently, there appear to be few, if any, parsers outside Mozilla products that can actually parse this format.

I have reverse-engineered complex file formats in the past and know it can be done with patience and the right amount of energy.

Sadly, this seems to be your only option as well. A good place to start would be Thunderbird's source code.

I know this doesn't give you a straight-up solution, but I think it is the only answer to the question given the circumstances of this format.

And of course, you can always look into the extension API to see if that allows you to access the data you need in a more structured way than handling the file format directly.
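
To give a feel for what reverse engineering this involves, here is a minimal Python sketch that undoes the two quoting styles mentioned in the list above (backslash and dollar-sign hex). The escape rules in the comments are my assumptions about the format, not something confirmed by the quoted page, and the function name decode_mork_value is made up for illustration. The default UTF-8 decoding is also an assumption; as the list suggests, some files may store wide characters differently.

```python
import re

# Assumed escape rules (hypothetical, check against real files):
#   $XX         -> one byte written as two hex digits
#   \<newline>  -> line continuation, dropped from the value
#   \<char>     -> that character taken literally, e.g. \) or \$ or \\
_ESCAPE = re.compile(rb"\$([0-9A-Fa-f]{2})|\\(\r\n|\n|\r)|\\(.)", re.DOTALL)


def decode_mork_value(raw: bytes, encoding: str = "utf-8") -> str:
    """Undo the assumed Mork escapes in one raw cell value."""

    def _replace(match):
        if match.group(1) is not None:
            # $XX: convert the two hex digits to a single byte
            return bytes([int(match.group(1).decode("ascii"), 16)])
        if match.group(2) is not None:
            # backslash followed by a line break: join the two lines
            return b""
        # backslash followed by any other character: keep that character
        return match.group(3)

    unescaped = _ESCAPE.sub(_replace, raw)
    # Bytes that do not fit the chosen encoding become U+FFFD instead of raising
    return unescaped.decode(encoding, errors="replace")
```

For example, decode_mork_value(b"J$C3$BCrgen") gives "Jürgen" under these assumptions. This only handles a single value; finding where values start and end in the file is the hard part.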



Answer 2:

Sample code that reads Mork:

Node.js: https://www.npmjs.com/package/mork-parser

Perl: http://metacpan.org/pod/Mozilla::Mork

Python: https://github.com/KevinGoodsell/mork-converter

More links: https://wiki.mozilla.org/Mork
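
If all you need is to know which files mention an address, a raw text search may be enough without a real parser. The following is a rough Python sketch, not a Mork parser: it assumes a plain ASCII e-mail address is stored literally in the file, apart from possible backslash line continuations, which it joins before searching. The function name find_in_mab_files and the profile path in the usage note are hypothetical. Because it looks at raw text, a hit could also come from an old or deleted entry still present in the file, so treat the result as a hint.

```python
from pathlib import Path


def find_in_mab_files(profile_dir, needle):
    """Return the names of .mab files in profile_dir whose raw text contains needle.

    Raw, case-insensitive byte search only; nothing is parsed.
    """
    needle_bytes = needle.lower().encode("utf-8")
    hits = []
    for path in sorted(Path(profile_dir).glob("*.mab")):
        data = path.read_bytes().lower()
        # Assumption: a backslash at end of line continues the value on the
        # next line, so join such lines before searching.
        data = data.replace(b"\\\r\n", b"").replace(b"\\\n", b"")
        if needle_bytes in data:
            hits.append(path.name)
    return hits
```

You would call it with your own profile folder, for example find_in_mab_files("/path/to/your/thunderbird/profile", "someone@example.com"), and then open only the address books it lists. Names with accented characters may be stored in an encoded form and would not be found this way; for those, one of the parsers linked above is the safer route.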