I noticed that the docs always open a CSV file with 'wb'. Why the 'b'? I know 'b' stands for binary mode, but when should you use binary mode? (I'd guess a CSV file is not binary.) If relevant, I'm writing to the CSV from the results of a query by arcpy.da.SearchCursor().
EDIT: I just noticed that, according to this answer, wb+ is used for writing a binary file. What does including the + do?
File open default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading.
On Windows, this changes the line breaks from '\n' to '\r\n', which can cause problems when opening the CSV file in other applications or on other platforms.
Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. On systems that don’t have this distinction, adding the 'b' has no effect.
Note: 'w+' truncates the file.
Modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing).
As detailed here: https://docs.python.org/2/library/functions.html#open
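As a rough illustration of the '+' modes described above, here is a small sketch. It is written for Python 3 (the thread itself is about Python 2, where the mode strings behave the same way), and the function name demo_update_modes is made up for this example.

```python
import os
import tempfile


def demo_update_modes():
    """Return what each '+' mode sees in a small scratch file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Start with some existing content.
        with open(path, "wb") as f:
            f.write(b"old")

        # 'r+b': read and write; existing content is kept.
        with open(path, "r+b") as f:
            before = f.read()

        # 'w+b': read and write; the file is truncated on open.
        with open(path, "w+b") as f:
            after_truncate = f.read()
            f.write(b"new")
            f.seek(0)
            reread = f.read()

        # 'a+b': read and write; writes go to the end of the file.
        with open(path, "a+b") as f:
            f.write(b"!")
            f.seek(0)
            appended = f.read()

        return before, after_truncate, reread, appended
    finally:
        os.remove(path)


print(demo_update_modes())
```

The second value is empty because 'w+' had already wiped the file by the time it was read, which is the truncation the note above warns about.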
For the Python csv module in particular, the answer is simple: it's required by the documentation.
Source: https://docs.python.org/2.7/library/csv.html#csv.reader
Since opening a file in text mode delegates newline handling, which differs by operating system, to core code, the CSV routine authors must have determined they wanted more control: they would prefer to handle newlines themselves. This may have allowed them to resolve bugs from inconsistencies encountered when processing files under one OS that were created on another, where "text read" altered things problematically in some unique cases. It may also be that no bugs were found but they wanted to avoid the possibility in future. Or it may be that, since they had to deal with newline considerations anyway, bypassing text processing might be faster.
Logically, since one can't control which OS a file being read came from, using binary mode might be the better way to go in general. However, when writing a text file, one might do well to leave it to the core routines to handle newlines for the current OS by using text mode.
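To make the newline issue concrete, here is a sketch in Python 3 that can be run on any OS. It relies on two documented behaviours: csv.writer ends rows with '\r\n' by default, and a text stream opened with newline='\r\n' translates every '\n' it writes into '\r\n', which is what text mode does on Windows. The helper name csv_bytes is hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def csv_bytes(newline):
    """Write one CSV row through a text layer and return the raw bytes.

    `newline` is passed to io.TextIOWrapper exactly as it would be
    passed to open(); '\r\n' imitates Windows text mode.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    csv.writer(text).writerow(["a", "b"])
    text.flush()
    return raw.getvalue()


# Text layer translating '\n' to '\r\n': the row terminator is doubled up.
print(repr(csv_bytes("\r\n")))

# newline='' turns translation off: the csv module's '\r\n' passes through.
print(repr(csv_bytes("")))
```

The first call yields a row ending in '\r\r\n', the second a clean '\r\n'. In Python 2 the same protection comes from opening the file with 'wb'.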
The "+" is discussed at Confused by python file mode "w+"
Use 'b' mode to read/write binary data as is, without any transformations such as converting newlines to/from platform-specific values or decoding/encoding text using a character encoding.
The csv module is special. csv data is text, and therefore the text mode would be expected, but the csv module uses '\r\n' by default to terminate rows on all platforms, and it always recognizes both '\r' and '\n' as newlines. If you open the corresponding file in the text mode (with universal newlines), then you will get '\r\r\n' (corrupted newlines) on Windows (os.linesep == '\r\n' there). That is why the Python 2 docs say that you must use the binary mode. In Python 3, the text mode is used, but you should pass newline='' to disable universal newlines mode. You would also want to disable universal newlines if you want to preserve possible newline characters (such as '\r') embedded in fields.
I've never received a good explanation of why I shouldn't just open ASCII files in binary mode.
I have never seen opening a file in binary mode corrupt the data.
I have seen opening a file in ASCII mode alter or harm the data being retrieved. Ergo I, and I assume most seasoned Python programmers in general, will open files in binary mode unless we have some sort of guarantee that there are not, and never will be, binary characters in the file.
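A quick way to see this for yourself is to read the same bytes back in both modes. This is only a sketch, written for Python 3, where text-mode reads use universal newlines by default on every platform; the function name read_both_ways is invented for the example.

```python
import os
import tempfile


def read_both_ways(payload):
    """Write `payload` to a scratch file, then read it back in binary
    mode and in default text mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            as_binary = f.read()
        # Default text mode: '\r\n' and lone '\r' are both turned into '\n'.
        with open(path, "r", encoding="ascii") as f:
            as_text = f.read()
        return as_binary, as_text
    finally:
        os.remove(path)


binary, text = read_both_ways(b"x\r\ny\rz\n")
print(repr(binary))
print(repr(text))
```

The binary read returns the payload untouched, while the text read has silently replaced both the '\r\n' and the lone '\r' with '\n'.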
By using t on non-POSIX environments (like MS-DOS and MS Windows), the \r\n sequence is transformed into \n on input (and the opposite on output). b (binary mode) performs no such translation.
Presumably the CSV library deals with carriage returns (probably by ignoring them whenever it encounters them).
Edit: just noticed a changed question.
Since .CSV files aren't really intended for human readers, the library can output them with \n (linefeed (LF), aka newline) separators only. The only real downside would be an MS Windows user opening the file with Notepad: it will display poorly. The CSV library could also output files with \r\n (CR LF), since most programs defend against MS-DOS text file conventions.
Either way, the library can write through b (binary) mode just fine. It is possible that, if opened in t (text) mode, the line separators would come out slightly odd, like \r\r\n (on Windows, the \n in each \r\n is itself expanded to \r\n). Probably most CSV file parsers ignore the extra CR, or read it as ending a line followed by an empty (blank) line, which they also ignore.
The + is explained in the man page:
The difference is that w+ allows reading and writing, whereas w only allows writing.