Should we use pandas.compat.StringIO or the Python 2/3 StringIO?

Posted 2019-01-20 19:45

Question:

StringIO is the file-like string buffer object we use when reading a pandas DataFrame from text, e.g. "How to create a Pandas DataFrame from a string?"
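For context, a minimal sketch of that use case on Python 3. The sample text, the `;` separator and the helper names `open_text` and `frame_from_text` are assumptions made for illustration; only `io.StringIO` and `pandas.read_csv` are real library calls.

```python
from io import StringIO  # Python 3 location of StringIO

# Illustrative data; the column names are made up for this sketch.
CSV_TEXT = "col1;col2\n1;a\n2;b\n"


def open_text(text):
    """Wrap a string so it can be read like an open text file."""
    return StringIO(text)


def frame_from_text(text, sep=";"):
    """Parse delimited text into a DataFrame (requires pandas)."""
    # Imported here so that open_text() stays usable without pandas installed.
    import pandas as pd
    return pd.read_csv(open_text(text), sep=sep)
```

With pandas installed, `frame_from_text(CSV_TEXT)` should give a two-row DataFrame with columns `col1` and `col2`, because `read_csv` accepts any file-like object in place of a path.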

Which of these two imports should we use for StringIO (within pandas)? This question has been open for more than four years without a clear resolution.

  1. StringIO.StringIO (Python 2) / io.StringIO (Python 3)
    • Advantage: more stable for future-proofing code. Disadvantage: it forces us to fork on the Python version, e.g. see the code at the bottom from EmilH.
  2. pandas.compat.StringIO
    • pandas.compat is a 2/3 compatibility package ("without the need for 2to3") introduced back in 0.13.0 (Jan 2014)
    • The pandas.compat package is still marked 'private' as of 0.22, with no plans to make it 'public'. The docs say: "Warning: The pandas.core, pandas.compat, and pandas.util top-level modules are considered to be PRIVATE. Stability of functionality in those modules is not guaranteed." In practice, though, these imports have essentially not broken since 0.13.
    • The pandas.compat source defines builtins, StringIO/cStringIO, BytesIO, cPickle, httplib, iterator versions of range, filter, map and zip, plus other elements needed for Python 3 compatibility; see the 0.13.0 whatsnew.
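Since pandas.compat is private, one way to hedge between the two options is to guard the import. This is only a sketch, assuming that falling back to `io.StringIO` is acceptable whenever `pandas.compat` does not expose `StringIO` (or pandas is not installed at all); it is not something the pandas docs recommend.

```python
try:
    # Private pandas API: may change or disappear between releases.
    from pandas.compat import StringIO
except ImportError:
    # Standard-library fallback. On Python 2 this class accepts only
    # unicode text, not byte strings.
    from io import StringIO

# Either import gives a file-like buffer with the same reading interface.
buf = StringIO(u"a,b\n1,2\n")
header = buf.readline()
```

The trade-off is that the code no longer breaks if the private name goes away, but which class you actually get now depends on the installed pandas version.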

Version 2/3 forking code for imports from the standard library (from EmilH):

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

# Note: this is very much a poor man's version of pandas.compat, which contains much more
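To give a feel for what "much more" means, here is a rough sketch of how the same fork could be extended to the iterator versions of range, filter, map and zip mentioned in the list above. It is a hypothetical local shim, not the pandas.compat source; the `PY2` flag and the layout are assumptions made for illustration.

```python
import sys

PY2 = sys.version_info[0] < 3  # illustrative flag name

if PY2:
    # Python 2: text buffers live in StringIO; lazy iterators live in itertools.
    from StringIO import StringIO
    from itertools import imap as map, izip as zip, ifilter as filter
    range = xrange  # noqa: F821 (this name exists on Python 2 only)
else:
    # Python 3: io.StringIO is the text buffer; the builtins are already lazy.
    from io import StringIO
    from builtins import range, map, zip, filter

# Code written against these names behaves the same on both versions.
pairs = zip(range(3), map(str, range(3)))
```

Every extra name handled this way is one more thing to maintain by hand, which is the practical argument for reusing an existing compatibility layer.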

Note:

  • pandas.compat has existed since pandas 0.13.0 (Jan 2014) as a subpackage within pandas
  • it also seems to have been released as a standalone package: 0.1.0 (Jun 10, 2017) and 0.1.1 (Jun 10, 2017)