Strip white spaces from CSV file

Posted 2019-07-19 22:20

I need to strip the white space from a CSV file that I read.

import csv

aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    # I need to strip the extra white space from each string in the row
    return(aList)

Answer 1:

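There is also a built-in formatting parameter, skipinitialspace (default False): http://docs.python.org/2/library/csv.html#csv-fmt-params. When it is True, whitespace immediately following the delimiter is ignored. Trailing whitespace is left in place.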

aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, skipinitialspace=True, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    return(aList)
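
A minimal sketch of what skipinitialspace does and does not remove. The sample data is made up for illustration, and io.StringIO stands in for the opened file:

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for the opened file.
sample = io.StringIO("a, b ,c \n")

# skipinitialspace=True drops the spaces right after each delimiter only.
rows = list(csv.reader(sample, skipinitialspace=True))
print(rows)

# Stripping each cell removes the trailing spaces as well.
stripped = [[cell.strip() for cell in row] for row in rows]
print(stripped)
```

In the first result the cells 'b ' and 'c ' keep their trailing space, so an explicit strip() is still needed if both sides must be clean.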


Answer 2:

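In my case, I only cared about stripping the whitespace from the field names (a.k.a. column headers, a.k.a. dictionary keys) when using csv.DictReader.

Create a class based on csv.DictReader and override the fieldnames property to strip the whitespace from each field name (a.k.a. column header, a.k.a. dictionary key).

Do this by getting the regular list of field names, then iterating over it while building a new list with the whitespace stripped from each field name, and setting the underlying _fieldnames attribute to this new list.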

import csv

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Initialize self._fieldnames by calling the parent property getter.
            # Note: in Python 2, DictReader is an old-style class, so super() can't be used there.
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames
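
On Python 3, where DictReader is a regular class, the same idea can be sketched with super(). This is a variant under that assumption rather than the answer's own code. It strips the names on every access instead of caching them, and the sample data is invented:

```python
import csv
import io

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        # The parent property reads the header row on first access.
        names = super().fieldnames
        if names is None:
            return None
        return [name.strip() for name in names]

# Hypothetical sample with padded column headers.
sample = io.StringIO(" name , age \nAlice,30\n")
rows = list(DictReaderStrip(sample))
print(rows[0]["name"], rows[0]["age"])
```

Only the keys are cleaned here. The cell values are passed through untouched, which matches the stated goal of this answer.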


Answer 3:

with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]


Answer 4:

You can do:

aList.append([element.strip() for element in row])


Answer 5:

You can create a wrapper object around your file that strips the spaces before the CSV reader sees them. This way, you can even use the CSV file with csv.DictReader.

import csv
import re

class CSVSpaceStripper:
  def __init__(self, filename):
    self.fh = open(filename, "r")
    # Raw strings keep the regex escapes intact
    self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
    self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")

  def close(self):
    self.fh.close()
    self.fh = None

  def __iter__(self):
    return self

  def __next__(self):
    # Raises StopIteration at end of file, which ends the iteration
    line = next(self.fh)
    line = self.surroundingWhiteSpace.sub(";", line)
    line = self.leadingOrTrailingWhiteSpace.sub("", line)
    return line

  next = __next__  # Python 2 name for the same method

Then use it like this:

o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")

I hard-coded ";" as the delimiter. Generalizing the code to any delimiter is left as an exercise to the reader.
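
One possible way to approach that exercise is sketched below. It is a generator function rather than a class, and the name strip_around_delimiter and the sample data are hypothetical. It assumes the delimiter is not itself whitespace and that no quoted field contains the delimiter:

```python
import csv
import io
import re

def strip_around_delimiter(lines, delimiter=";"):
    """Yield each line with whitespace removed around the delimiter and at both ends."""
    # re.escape makes delimiters such as "|" or "." safe inside the pattern.
    around = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
    for line in lines:
        # A function replacement avoids backslash handling in the delimiter.
        yield around.sub(lambda match: delimiter, line.strip())

# Hypothetical sample; io.StringIO stands in for an opened file.
sample = io.StringIO(" a | b |c \n d|e | f\n")
rows = list(csv.reader(strip_around_delimiter(sample, "|"), delimiter="|"))
print(rows)
```

Because csv.reader accepts any iterable of strings, the generator can be passed to csv.DictReader in the same way.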



Answer 6:

Read the CSV (or Excel file) using pandas and trim it with this custom function.

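import pandas as pd

# Strip leading/trailing whitespace from every string cell
def trim(dataset):
    strip_if_str = lambda x: x.strip() if isinstance(x, str) else x
    # Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
    return dataset.applymap(strip_if_str)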

Now you can apply trim to the (CSV/Excel) data in your code like this (as part of a loop, etc.):

dataset = trim(pd.read_csv(dataset))
dataset = trim(pd.read_excel(dataset))


Answer 7:

The most memory-efficient way to format the cells after parsing is through generators. Something like:

with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield (cell.strip() for cell in row)

But it may be worth moving this into a function that you can reuse for further munging, so you avoid repeating the loop later. For example:

nulls = {'NULL', 'null', 'None', ''}

def clean(reader):
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield clean(row)

Or it can be used as a factory that yields one dict per row:

def factory(reader):
    fields = next(reader)

    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield dict(zip(fields, clean(row)))
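
To see the factory run end to end, here is a small self-contained sketch. The helper is restated so the snippet runs on its own, and the sample data and the io.StringIO stand-in for a file are assumptions:

```python
import csv
import io

nulls = {'NULL', 'null', 'None', ''}

def factory(reader):
    fields = next(reader)  # the first row holds the column names

    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell

    for row in reader:
        yield dict(zip(fields, clean(row)))

# Hypothetical sample with padded cells, a NULL marker, and an empty cell.
sample = io.StringIO("name,city,note\n Alice , Paris ,NULL\nBob,,ok\n")
records = list(factory(csv.reader(sample)))
print(records)
```

Nothing is read until the generator is consumed, so list() (or a for loop) has to run while the underlying file is still open. Note that the header row itself is not stripped by this factory.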


Source: Strip white spaces from CSV file