I want to add a file to an existing tar archive. For example, test.tar.gz contains a.png, b.png and c.png. I have a new PNG file, also named a.png, and I want to add it to test.tar.gz so that it replaces the old a.png. My code:
import tarfile
a = tarfile.open('test.tar.gz', 'w:gz')
a.add('a.png')
a.close()
Then all the files in test.tar.gz disappeared except a.png. If I change my code to this:
import tarfile
a = tarfile.open('test.tar.gz', 'a:')# or a:gz
a.add('a.png')
a.close()
the program crashes. Error log:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/tarfile.py", line 1678, in open
return func(name, filemode, fileobj, **kwargs)
File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
return cls(name, mode, fileobj, **kwargs)
File "/usr/lib/python2.7/tarfile.py", line 1588, in __init__
raise ReadError(str(e))
tarfile.ReadError: invalid header
What are my mistakes and what should I do?
Update: according to the documentation, gz files cannot be opened in 'a' mode. If so, what is the best way to add or update files in an existing archive?
According to the tarfile documentation, gzip-compressed archives cannot be opened in append mode. So I guess you should decompress the archive using the gzip library, add the files using the a: mode in tarfile, and then compress it again using gzip.
David Dale asks:
Short answer:
I tried to do it in memory using the file/stream interfaces of gzip and tarfile, but did not manage to get it running. The tarball has to be rewritten anyway, since replacing a file in place is apparently not possible. So it's better to just unpack the whole archive. See Wikipedia on tar and gzip.
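To illustrate the point that the tarball has to be rewritten, here is a minimal sketch, separate from the script discussed in this answer, that copies every member into a new archive and swaps in the replacement without unpacking to disk. The function name replace_member and the ".tmp" suffix are illustrative assumptions, and it assumes the archive is a regular gzip-compressed tar file.

```python
import os
import tarfile


def replace_member(archive_path, new_file, arcname):
    """Rewrite a .tar.gz so that member `arcname` holds the contents of `new_file`."""
    tmp_path = archive_path + ".tmp"  # hypothetical temporary name next to the archive
    try:
        with tarfile.open(archive_path, "r:gz") as src, \
                tarfile.open(tmp_path, "w:gz") as dst:
            for member in src:
                if member.name == arcname:
                    continue  # skip the old copy of the file being replaced
                # Only regular files carry data; other entries are copied header-only.
                fileobj = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, fileobj)
            # Add the new file under the name of the member it replaces.
            dst.add(new_file, arcname=arcname)
        # Swap the rewritten archive into place.
        os.replace(tmp_path, archive_path)
    finally:
        # Clean up the temporary file if something failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Called as replace_member('test.tar.gz', 'new.png', 'a.png'), this should leave b.png and c.png untouched and store the new data as a.png. If arcname is not in the archive yet, the same call simply adds it.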
The script, if run directly, also tries to generate the test images "a.png, b.png, c.png, new.png" (requiring Pillow) and the initial archive "test.tar.gz" if they don't exist. It then decompresses the archive into a temporary directory, overwrites "a.png" with the contents of "new.png", and packs all files, overwriting the original archive. Here are the individual files:
Of course the script's functions can also be run sequentially in interactive mode, in order to have a chance to look at the files. Assuming the script's filename is "t.py":
Here we go (the essential part is in replace_file()):
If you want to add files instead of replacing them, just omit the line that replaces the temporary file, and copy the additional files into the temp directory. Make sure that pathlib.Path.iterdir then also "sees" the new files to be added to the new archive.
I've put this in a somewhat more useful function:
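As a stand-in written for illustration (not the original listing), a minimal sketch of the unpack, replace or add, and repack steps could look like this. The name repack_with and its files mapping are hypothetical, and the sketch assumes a trusted archive, since extractall() writes whatever paths the archive contains.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def repack_with(archive, files):
    """Unpack `archive`, copy `files` over the contents, and repack.

    `files` maps the name inside the archive to a source path on disk
    (a hypothetical interface chosen for this sketch).
    """
    archive = Path(archive)
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # 1. Unpack everything into the temporary directory.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(tmp)  # only do this with archives you trust
        # 2. Overwrite existing files or drop in additional ones.
        for arcname, source in files.items():
            shutil.copyfile(source, tmp / arcname)
        # 3. Repack; iterdir() now also "sees" any newly added files.
        with tarfile.open(archive, "w:gz") as tar:
            for path in sorted(tmp.iterdir()):
                tar.add(path, arcname=path.name)
```

For the question's case the call would be repack_with('test.tar.gz', {'a.png': 'new.png'}). Passing a name that is not in the archive yet adds the file instead of replacing one.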
And a few "tests" as example:
shutil also supports archives, but not adding files to one: https://docs.python.org/3/library/shutil.html#archiving-operations
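Since shutil cannot add to an archive directly, one workaround is to combine shutil.unpack_archive and shutil.make_archive. The sketch below is only an illustration: the name add_with_shutil is hypothetical, it assumes the archive name ends in ".tar.gz" and the archive is trusted, and the repacked members may be stored with a leading "./" prefix.

```python
import shutil
import tempfile
from pathlib import Path


def add_with_shutil(archive, new_file):
    """Unpack `archive`, copy `new_file` into the contents, and repack."""
    archive = Path(archive).resolve()
    with tempfile.TemporaryDirectory() as tmp:
        # The format is guessed from the ".tar.gz" extension.
        shutil.unpack_archive(str(archive), tmp)
        # A file with the same base name replaces the unpacked copy.
        shutil.copy(str(new_file), tmp)
        # make_archive appends ".tar.gz" itself, so strip it from the base name.
        base = str(archive)[: -len(".tar.gz")]
        shutil.make_archive(base, "gztar", root_dir=tmp)
```

Here add_with_shutil('test.tar.gz', 'a.png') would overwrite the archive with a version containing the new a.png.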
Here's adding a file by extracting to memory using io.BytesIO, adding, and compressing:
it prints
Optimizations are welcome!
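One possible shape for the in-memory approach, as a rough sketch under stated assumptions: decompress the archive into an io.BytesIO buffer, open that buffer with tarfile in plain 'a' mode, add the file, and gzip the buffer back to disk. The name append_to_targz is hypothetical. Note that appending does not remove an older member with the same name; the archive then holds both copies, and tarfile treats the last occurrence as the current one.

```python
import gzip
import io
import tarfile


def append_to_targz(archive_path, file_path, arcname=None):
    """Append `file_path` to a .tar.gz by editing the uncompressed tar in memory."""
    # Decompress the whole archive into memory (assumes it fits in RAM).
    with gzip.open(archive_path, "rb") as gz:
        buf = io.BytesIO(gz.read())
    # An uncompressed tar can be opened in append mode, even from a buffer.
    with tarfile.open(fileobj=buf, mode="a") as tar:
        tar.add(file_path, arcname=arcname)
    # Compress the modified tar and overwrite the original archive.
    with gzip.open(archive_path, "wb") as gz:
        gz.write(buf.getvalue())
```

After append_to_targz('test.tar.gz', 'new.png', arcname='a.png'), extracting the archive should yield the new a.png, because the later member overwrites the earlier one on extraction.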