Best compression algorithm for XML?

Posted 2019-01-13 10:32

I barely know a thing about compression, so bear with me (this is probably a stupid and painfully obvious question).

So let's say I have an XML file with a few tags.

<verylongtagnumberone>
  <verylongtagnumbertwo>
    text
  </verylongtagnumbertwo>
</verylongtagnumberone>

Now let's say I have a bunch of these very long tags with many attributes across multiple XML files. I need to compress them to the smallest size possible. The best way would be an XML-specific algorithm that assigns individual tags short pseudonyms like vlt1 or vlt2. However, that wouldn't be as 'open' as I'm trying to be, and I want to use a common algorithm like DEFLATE or LZ. It would also help if the archive were a .zip file.

Since I'm dealing with plain text (no binary files like images), I'd like an algorithm that suits plain text. Which one produces the smallest file size (lossless algorithms are preferred)?

By the way, the scenario is this: I am creating a standard for documents, like ODF or MS Office XML, that contain XML files, packaged in a .zip.
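For what it's worth, the scenario above (XML parts packaged into a .zip, ODF-style) can be sketched with Python's standard library; the file names and XML content here are made-up placeholders:

```python
# Sketch: packaging XML files into a .zip using DEFLATE at maximum
# compression. File names and XML content are illustrative only.
import io
import zipfile

xml_parts = {
    "content.xml": "<verylongtagnumberone><verylongtagnumbertwo>text"
                   "</verylongtagnumbertwo></verylongtagnumberone>",
    "meta.xml": "<verylongtagnumberone attr='value'/>",
}

buf = io.BytesIO()
# compresslevel=9 trades speed for the smallest DEFLATE output (Python 3.7+)
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
    for name, text in xml_parts.items():
        zf.writestr(name, text)

package = buf.getvalue()  # the finished .zip as bytes
```

Any standard unzip tool can open the result, which keeps the format 'open' in the sense above.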

EDIT: The 'encryption' thing was a typo; it should have been 'compression'.

8 answers
疯言疯语
#2 · 2019-01-13 11:24

None of the default algorithms is ideal for XML, but you will still get good ratios, since XML contains a lot of repetition.

Because XML repeats the same symbols so often (tag names, `.`, `>`), you want those encoded in less than a bit each, which calls for some form of arithmetic coding rather than Huffman coding. So RAR / 7-Zip should be significantly better in theory, though these algorithms buy their higher compression with slower speed. Ideally you'd want a simple compressor paired with an arithmetic encoder, which for XML would be both fast and give high compression.
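To see the effect of a stronger algorithm on repetitive XML, here is a rough comparison of DEFLATE (`zlib`) against LZMA (the algorithm behind 7-Zip/xz) from Python's standard library; the sample data is made up and the exact sizes will vary:

```python
# Sketch: comparing DEFLATE against LZMA on highly repetitive XML.
# The record below is an invented sample, not real document data.
import lzma
import zlib

record = ("<verylongtagnumberone attr='x'><verylongtagnumbertwo>text"
          "</verylongtagnumbertwo></verylongtagnumberone>\n")
data = (record * 500).encode()

deflated = zlib.compress(data, level=9)   # DEFLATE, max effort
lzma_out = lzma.compress(data, preset=9)  # LZMA, max preset (slower)

print(len(data), len(deflated), len(lzma_out))
```

Both shrink the input dramatically; LZMA generally wins on larger inputs at the cost of CPU time, which matches the speed/ratio trade-off described above.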

Lonely孤独者°
#3 · 2019-01-13 11:25

Another alternative to "compress" XML would be FI (Fast Infoset).

XML stored as FI contains every tag and attribute name only once; every subsequent occurrence references the first, which saves space.

See:

Very good article on java.sun.com, and of course
the Wikipedia entry

The difference from EXI, from a compression point of view, is that Fast Infoset (being structured plaintext) is less efficient.

Another important difference: FI is a mature standard with many implementations.
One of them: Fast Infoset Project @ dev.java.net
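The tag-indexing idea behind FI can be illustrated with a tiny sketch; note this shows only the concept (store each name once, reference it afterwards), not the actual Fast Infoset binary format, and all tag names are made up:

```python
# Illustrative sketch of vocabulary indexing (NOT the real Fast Infoset
# wire format): each tag name is stored literally once, and every later
# occurrence is replaced by its integer index into the vocabulary.
def index_tags(tags):
    """Return (vocabulary, encoded stream) for a sequence of tag names."""
    vocab, out = {}, []
    for tag in tags:
        if tag in vocab:
            out.append(vocab[tag])   # back-reference: a small int
        else:
            vocab[tag] = len(vocab)
            out.append(tag)          # first occurrence: the literal string
    return vocab, out

vocab, encoded = index_tags([
    "verylongtagnumberone", "verylongtagnumbertwo",
    "verylongtagnumberone", "verylongtagnumbertwo",
])
```

After the first occurrence, each very long tag name costs only an integer reference, which is where the space saving comes from.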
