Is it possible to parse a file line by line, and edit a line in-place while going through the lines?
If you only intend to perform localized changes that do not change the length of the part of the file that is modified (e.g. changing all characters to lower case), then you can actually overwrite the old contents of the file dynamically.
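As one illustration of such a length-preserving change, the sketch below lowercases a file in place through the standard mmap module. The function name lowercase_in_place is made up for this example, and the sketch assumes a non-empty file with ASCII-compatible content that fits comfortably in memory.

```python
import mmap


def lowercase_in_place(path):
    """Lowercase ASCII letters in the file at `path` without changing its size."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0) as mm:
            # bytes.lower() only touches ASCII letters, so the length is unchanged
            # and the slice assignment is valid.
            mm[:] = mm[:].lower()
```

For very large files it may be preferable to process the mapping in chunks rather than copying it all at once with mm[:].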
To do that, you can use random file access with the seek() method of a file object.

Alternatively, you may be able to use an mmap object to treat the whole file as a mutable string. Keep in mind that mmap objects may impose a maximum file-size limit in the 2-4 GB range on a 32-bit CPU, depending on your operating system and its configuration.

It can be simulated using a backup file, as the stdlib's fileinput module does.

Here's an example script that removes lines that do not satisfy some_condition from files given on the command line or stdin:

Example:
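A minimal sketch of what such a fileinput-based filter might look like in Python 3. The body of some_condition is a placeholder (here it keeps non-blank lines), and the helper name filter_files is hypothetical.

```python
import fileinput
import sys


def some_condition(line):
    # Placeholder predicate (an assumption): keep lines that are not blank.
    return bool(line.strip())


def filter_files(paths):
    """Rewrite each file in `paths`, keeping only lines that satisfy some_condition."""
    # inplace=True redirects sys.stdout into the file currently being read;
    # backup=".bak" keeps the original content next to it as <name>.bak.
    # An empty `paths` makes fileinput fall back to stdin (no in-place editing).
    with fileinput.input(files=paths, inplace=True, backup=".bak") as lines:
        for line in lines:
            if some_condition(line):
                sys.stdout.write(line)  # goes to the file being rewritten
```

Calling filter_files(["first_file.txt", "second_file.txt"]), or filter_files(sys.argv[1:]) from a script, would process both files in turn.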
On completion, first_file.txt and second_file.txt will contain only the lines that satisfy the some_condition() predicate.

The fileinput module has a very ugly API. I found a much nicer module for this task, in_place; example for Python 3:

Main difference from fileinput:
No. You cannot safely write to a file you are also reading, as any changes you make to the file could overwrite content you have not read yet. To do it safely you'd have to read the file into a buffer, updating any lines as required, and then re-write the file.
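The buffer-then-rewrite approach described above can be sketched as follows. The helper name rewrite_lines and its transform parameter are made up for illustration, and UTF-8 is an assumed encoding.

```python
def rewrite_lines(path, transform):
    """Read every line of `path`, apply `transform` to each, then write them back."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()  # the whole file is buffered in memory
    updated = [transform(line) for line in lines]
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(updated)  # the file is truncated and rewritten
```

For example, rewrite_lines("notes.txt", lambda line: line.replace("foo", "bar")) would replace every "foo" in a hypothetical notes.txt. Because the file is only reopened for writing after reading has finished, the new lines may be longer or shorter than the old ones.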
If you're replacing the content in the file byte-for-byte (i.e. the text you are replacing is exactly the same length as the new string you are replacing it with), then you can get away with it, but it's a hornet's nest, so I'd save yourself the hassle and just read the full file, replace the content in memory (or via a temporary file), and write it out again.
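A sketch of the temporary-file variant, which avoids holding the whole file in memory. The function name replace_via_temp_file is hypothetical, UTF-8 is an assumed encoding, and the sketch does not preserve the original file's permissions.

```python
import os
import tempfile


def replace_via_temp_file(path, old, new):
    """Stream `path` through a temporary file, replacing `old` with `new` per line."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst, \
             open(path, "r", encoding="utf-8") as src:
            for line in src:
                dst.write(line.replace(old, new))
        os.replace(tmp_path, path)  # rename over the original (atomic on POSIX)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file on any failure
        raise
```

The original is only replaced once the new content has been written completely, so a crash midway leaves the old file untouched.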
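You have to back up by the size of the line you just read. Assuming you used readline, you can get the length of the line and back up using seek(): set whence to SEEK_CUR and offset to -length. In Python 3 this only works on files opened in binary mode, because text-mode files do not support nonzero relative seeks, so measure the length in bytes rather than characters.

See the Python docs or look at the manpage for seek.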
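A minimal sketch of this read, seek back, and overwrite pattern, assuming a same-length change (lowercasing ASCII letters) and a file opened in binary mode. The function name lowercase_lines_with_seek is made up for illustration.

```python
import os


def lowercase_lines_with_seek(path):
    """Lowercase each line in place by seeking back over it after reading."""
    with open(path, "r+b") as f:  # binary mode: relative seeks work on bytes
        while True:
            line = f.readline()
            if not line:
                break
            f.seek(-len(line), os.SEEK_CUR)  # back up to the start of the line
            f.write(line.lower())            # same length, so nothing shifts
            f.seek(0, os.SEEK_CUR)           # defensive reposition between write and next read
```

If the replacement were longer or shorter than the original line, the following lines would be overwritten or left with stale bytes, which is why this only suits length-preserving edits.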