I'm trying to edit a text file in place in Python. It is very large, so loading it into memory is not an option. I intend to replace strings I find inside it byte for byte (each replacement has the same length as the string it replaces).
```python
with open("filename.txt", "r+b") as f:
    if f.read(8) == b"01234567":
        f.seek(-8, 1)
        f.write(b"87654321")
```
However, the write() operation adds onto the end of the file when I tried it:
```
>>> n.read()
'sdf'
>>> n.read(1)
''
>>> n.seek(0,0)
>>> n.read(1)
's'
>>> n.read(1)
'd'
>>> n.write("sdf")
>>> n.read(1)
''
>>> n.seek(0,0)
>>> n.read()
'sdfsdf'
```
I want the result of that to be `sdsdf`.
You can check the difference between the following snippets:
The position used by `.write()` starts out at the end of the file, and only `.seek()` changes it; `.read()` does not. So you have to call `.seek()` before writing the bytes. The following code works well:
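A minimal sketch of that advice, assuming Python 3 and binary mode (the temporary file and the `demo_seek_before_write` name are only for illustration). It replays the question's session, but seeks to the current position before writing, so the write overwrites from offset 2 instead of landing at the end of the file.

```python
import os
import tempfile


def demo_seek_before_write() -> bytes:
    # Throwaway file standing in for the question's file (hypothetical demo).
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"sdf")

        with open(path, "r+b") as f:
            f.read(1)            # b"s"
            f.read(1)            # b"d"
            f.seek(f.tell())     # reposition before switching from reading to writing
            f.write(b"sdf")      # overwrites starting at offset 2

        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(demo_seek_before_write())  # b'sdsdf'
```

On Python 3 the `io` layer should already handle the read-to-write switch by itself, so the extra seek is merely harmless there; it is what makes the difference for Python 2 file objects, which sit on top of C stdio.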
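The original ANSI / ISO C standards required a seek operation when switching a read-write (update mode) stream from reading to writing, and vice versa. This restriction persists; e.g., n1570 (§7.21.5.3, The `fopen` function) includes this text:

> However, output shall not be directly followed by input without an intervening call to the `fflush` function or to a file positioning function (`fseek`, `fsetpos`, or `rewind`), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.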
For whatever reason, this restriction has been imported into Python,¹ even though it would be possible for the Python wrappers to handle it automatically.
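Following that rule, the original task (overwriting every occurrence of an equal-length byte string in a file too large to load) could be sketched as below. It scans overlapping windows so that a match straddling a chunk boundary is not missed, and it calls `seek()` before every write and before every read that follows a write. The `replace_in_place` name and its `chunk_size` parameter are hypothetical, not from any library.

```python
def replace_in_place(path: str, old: bytes, new: bytes, chunk_size: int = 1 << 20) -> int:
    """Hypothetical helper: overwrite each non-overlapping `old` with `new` in place.

    The file is read in windows of chunk_size bytes plus len(old) - 1 bytes of
    overlap, so memory use stays bounded. Returns the number of replacements.
    """
    if len(old) != len(new):
        raise ValueError("in-place replacement needs strings of equal length")
    if not old:
        raise ValueError("old must not be empty")
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    overlap = len(old) - 1
    count = 0
    with open(path, "r+b") as f:
        pos = 0     # file offset where the current window starts
        resume = 0  # earliest offset at which the next match may start
        while True:
            f.seek(pos)                    # seek before switching back to reading
            window = f.read(chunk_size + overlap)
            i = window.find(old, max(resume - pos, 0))
            while i != -1:
                f.seek(pos + i)            # seek before switching to writing
                f.write(new)
                count += 1
                resume = pos + i + len(old)
                i = window.find(old, resume - pos)
            if len(window) < chunk_size + overlap:
                break                      # short read: end of file reached
            pos += chunk_size
    return count
```

For the question's case this would be called as `replace_in_place("filename.txt", b"01234567", b"87654321")`. Matches are non-overlapping and found left to right, the same semantics as `bytes.replace`.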
For what it's worth, the reason for the original ANSI C restriction was the low-budget implementation found on many Unix-based systems: they kept, for each stream, a "current byte count" and a "current pointer". The current byte count was 0 if the macro-ized `getc` and `putc` operations had to call into the underlying implementation, which could check whether the stream was opened in update mode and switch it as needed. But once you successfully obtained a character, the counter would hold the number of characters that could still be read from the buffer; and once you successfully wrote a character, the counter would hold the number of buffer locations still available for adding characters.

This meant that if you did a successful `getc` that filled an internal buffer, but followed it with a `putc`, the "written" character from `putc` would simply overwrite the buffered data. If you had a successful `putc` but followed it with a poorly-implemented `getc`, you would see an unset value out of the buffer. This problem was trivial to fix: just provide separate input and output counters, one of which is always zero, and have the functions that implement buffer refill check for a mode switch as well.
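To make those two failure modes concrete, here is a toy Python model of such a stream. Everything in it (the `ToyStream` class, a `bytearray` standing in for the file, the `split_counters` flag) is a hypothetical simplification for illustration, not actual libc code: with `split_counters=False` the `getc`/`putc` fast paths share one counter, and with `split_counters=True` they use separate counters, so a mode switch always reaches the slow path.

```python
class ToyStream:
    """Hypothetical toy model of an update-mode stdio stream (not real libc code)."""

    def __init__(self, data: bytes, split_counters: bool, bufsize: int = 8):
        self.file = bytearray(data)  # stands in for the underlying file
        self.split = split_counters  # False: one shared "current byte count"
        self.bufsize = bufsize
        self.buf = bytearray()
        self.base = 0                # file offset that buf[0] corresponds to
        self.ptr = 0                 # the "current pointer" into buf
        self.count = 0               # read count (also the write count if not split)
        self.wcount = 0              # separate write count, used only if split
        self.writing = False

    def getc(self) -> int:
        if self.count > 0:           # macro fast path: no mode check at all
            self.count -= 1
            self.ptr += 1
            return self.buf[self.ptr - 1]
        if self.writing:             # slow path notices the switch and flushes output
            self._flush()
            self.writing, self.wcount = False, 0
        self.base += self.ptr        # refill from the current logical position
        self.buf = self.file[self.base:self.base + self.bufsize]
        self.ptr, self.count = 0, len(self.buf)
        return self.getc() if self.count else -1  # -1 plays the role of EOF

    def putc(self, c: int) -> None:
        room = self.wcount if self.split else self.count
        if room > 0:                 # macro fast path: no mode check at all
            self.buf[self.ptr] = c
            self.ptr += 1
            if self.split:
                self.wcount -= 1
            else:
                self.count -= 1
            return
        if self.writing:             # output buffer is full: flush it
            self._flush()
        else:                        # slow path notices the switch from reading
            self.base += self.ptr
            self.writing = True
        self.buf, self.ptr = bytearray(self.bufsize), 0
        if self.split:
            self.count, self.wcount = 0, self.bufsize  # read counter stays 0
        else:
            self.count = self.bufsize
        self.putc(c)

    def _flush(self) -> None:
        self.file[self.base:self.base + self.ptr] = self.buf[:self.ptr]
        self.base += self.ptr
        self.ptr = 0

    def close(self) -> bytes:
        if self.writing:
            self._flush()
        return bytes(self.file)


def get_then_put(split: bool) -> bytes:
    s = ToyStream(b"sdf", split_counters=split)
    s.getc()                         # returns ord("s"); buffer now holds b"sdf"
    s.putc(ord("X"))                 # intended to overwrite offset 1
    return s.close()


def put_then_get(split: bool) -> int:
    s = ToyStream(b"sdf", split_counters=split)
    s.putc(ord("X"))                 # intended to overwrite offset 0
    return s.getc()                  # should return ord("d")


print(get_then_put(False), get_then_put(True))  # b'sdf' b'sXf'
print(put_then_get(False), put_then_get(True))  # 0 100
```

With the shared counter, the `X` written after a `getc` only lands in the read buffer and never reaches the file, and a `getc` after a `putc` returns an unset (zero) buffer slot. With split counters, both sequences behave as intended.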
¹ Citation needed :-)