Preserve end-of-line style when working with files

Posted 2020-08-14 10:48

Question:

I am looking for a way to ensure that a file's end-of-line style is maintained in a Python program while reading, editing and writing it.

Python has universal newlines support, which can convert all line endings to \n when the file is read, and then convert them all to the system default when the file is written. In my case I would still like to do the initial conversion, but then write the file with its original EOL style rather than the system default.

Is there a standard way to do this kind of thing? If not, is there a standard way to detect the EOL style of a file?

Assuming that there is no standard way to do this, a possible workflow would be:

  1. Read in a file in binary mode.
  2. Decode into utf-8 (or whatever encoding is required).
  3. Detect EOL style.
  4. Convert all line endings to \n.
  5. Do stuff with the file.
  6. Convert all line endings to original style.
  7. Encode file.
  8. Write file in binary mode.

In this workflow, what is the best way to do step 2?
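
The eight steps above could be sketched roughly as follows. This is only an illustration, not a standard recipe: detect_eol and edit_preserving_eol are hypothetical helper names, and picking the most frequent line ending is an assumed policy for files with mixed endings.

```python
def detect_eol(text, default="\n"):
    """Return the most frequent line ending in text (CRLF, CR or LF)."""
    crlf = text.count("\r\n")
    counts = {
        "\r\n": crlf,
        "\r": text.count("\r") - crlf,  # CR not followed by LF
        "\n": text.count("\n") - crlf,  # LF not preceded by CR
    }
    best = max(counts, key=counts.get)
    # Text without any line ending falls back to the default.
    return best if counts[best] > 0 else default


def edit_preserving_eol(path, edit, encoding="utf-8"):
    """Apply edit() to LF-normalized text, then write it back in the detected style."""
    with open(path, "rb") as f:                             # step 1
        text = f.read().decode(encoding)                    # step 2
    eol = detect_eol(text)                                  # step 3
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # step 4
    text = edit(text)                                       # step 5
    if eol != "\n":
        text = text.replace("\n", eol)                      # step 6
    with open(path, "wb") as f:
        f.write(text.encode(encoding))                      # steps 7 and 8
    return eol
```

Here edit is any function that takes and returns a str using \n line endings. Note that a file with mixed endings comes out normalized to the dominant style.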

Answer 1:

Use Python's universal newlines support and inspect the file object's newlines attribute:

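# In Python 3, text mode uses universal newlines by default; the old 'rU' mode is no longer needed.
with open('randomthing.py', 'r') as f:
    fdata = f.read()
    newlines = f.newlines  # only populated once data has been read
print(repr(newlines))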

newlines contains the file's line delimiter, or a tuple of delimiters if the file uses a mix of them. It is None if no line ending has been read yet.
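
Building on that attribute, a minimal round-trip sketch might look like the following. The function name rewrite_with_original_eol is hypothetical, and falling back to "\n" when the file has mixed or no line endings is an assumed policy, not something the answer specifies.

```python
def rewrite_with_original_eol(path, edit, encoding="utf-8"):
    # Read with universal newlines (the default): text contains only "\n".
    with open(path, "r", encoding=encoding) as f:
        text = f.read()
        seen = f.newlines  # None, a str, or a tuple of str

    if isinstance(seen, str):
        eol = seen
    else:
        # Assumption: no line endings (None) or mixed endings (tuple)
        # are written back as plain "\n".
        eol = "\n"

    # On write, newline=eol translates every "\n" to eol
    # (and leaves text untouched when eol is "\n").
    with open(path, "w", encoding=encoding, newline=eol) as f:
        f.write(edit(text))
    return seen
```

The return value lets the caller see what was detected, for example to warn about a file with mixed endings before it is normalized.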



Answer 2:

To preserve original line endings, use newline='' to read or write line endings untranslated.

with open('test.txt', 'r', newline='') as rf:
    content = rf.read()
content = content.replace('old text', 'new text')
with open('testnew.txt', 'w', newline='') as wf:
    wf.write(content)

Note that if the text manipulation itself deals with line endings, additional or alternative logic may be needed to detect and match original line endings.
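
As one possible form of that additional logic, the sketch below edits each line's body while leaving its own terminator alone, so even a file with mixed endings survives unchanged. The name map_lines_keep_eol is hypothetical; the input is assumed to be text read with newline=''.

```python
import re


def map_lines_keep_eol(content, transform):
    """Apply transform() to every line body, keeping each line's original ending."""
    # The capturing group keeps the terminators in the result:
    # even indices are line bodies, odd indices are line endings.
    parts = re.split(r"(\r\n|\r|\n)", content)
    bodies = parts[0::2]
    eols = parts[1::2]
    out = []
    for i, body in enumerate(bodies):
        # An empty final piece means the text ended with a terminator
        # (or was empty); it is not a real line, so it is not transformed.
        is_trailing_empty = i == len(bodies) - 1 and body == ""
        out.append(body if is_trailing_empty else transform(body))
        if i < len(eols):
            out.append(eols[i])
    return "".join(out)
```

For instance, map_lines_keep_eol(content, str.rstrip) strips trailing whitespace from every line; the result can then be written with newline='' as in the example above.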

The 'U' mode did the same job in older code, but it was deprecated and has been removed as of Python 3.11.

From the Python documentation for open():

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

• When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

• When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
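
The rules quoted above can be checked with a small experiment. This is just an illustrative sketch; the file names and the sample bytes are made up for the demonstration.

```python
import os
import tempfile

raw = b"one\r\ntwo\rthree\n"  # CRLF, lone CR and LF in one file

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")
    with open(path, "wb") as f:
        f.write(raw)

    def read_lines(newline):
        with open(path, "r", encoding="ascii", newline=newline) as f:
            return f.readlines()

    translated = read_lines(None)   # ['one\n', 'two\n', 'three\n']
    untranslated = read_lines("")   # ['one\r\n', 'two\r', 'three\n']
    lf_only = read_lines("\n")      # ['one\r\n', 'two\rthree\n']

    # Writing with an explicit newline translates every "\n" written.
    out = os.path.join(tmp, "out.txt")
    with open(out, "w", encoding="ascii", newline="\r\n") as f:
        f.write("a\nb\n")
    with open(out, "rb") as f:
        written = f.read()          # b'a\r\nb\r\n'
```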