I need to strip the whitespace from the fields of a CSV file that I am reading.
import csv
aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    # I need to strip the extra white space from each string in the row
    return(aList)
Answer 1:
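There is also a built-in formatting parameter: skipinitialspace (the default is False). When it is set to True, whitespace immediately following the delimiter is ignored; trailing whitespace is left alone. http://docs.python.org/2/library/csv.html#csv-fmt-params
aList=[]
with open(self.filename, 'r') as f:
    # skipinitialspace must be True for the leading whitespace to be dropped
    reader = csv.reader(f, skipinitialspace=True, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    return(aList)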
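To make the effect of the parameter concrete, here is a minimal sketch using an in-memory sample instead of a file. The sample data is made up for illustration. It shows that skipinitialspace=True only removes whitespace directly after a delimiter, so trailing whitespace survives and still needs strip() if you want it gone.

```python
import csv
import io

# Hypothetical sample data: spaces after the commas and before the line ends
sample = "a, b ,  c\n1,  2 ,3 \n"

reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
rows = list(reader)

# Leading spaces are gone, trailing spaces are still there
print(rows)  # [['a', 'b ', 'c'], ['1', '2 ', '3 ']]
```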
Answer 2:
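In my case, I only cared about stripping the whitespace from the field names (aka column headers, aka dictionary keys) when using csv.DictReader.
Create a class based on csv.DictReader and override the fieldnames property so that it strips the whitespace from each field name.
Do this by getting the regular list of field names, iterating over it to build a new list with the whitespace stripped from each name, and setting the underlying _fieldnames attribute to this new list.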
import csv
class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Initialize self._fieldnames
            # Note: DictReader is an old-style class, so can't use super()
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames
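As a rough usage sketch, the class can be exercised against an in-memory file. The class body is repeated so the snippet runs on its own, and the sample data is invented for illustration. Note that only the header names are stripped; the values keep their whitespace.

```python
import csv
import io

class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Let the base class read the header row first
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames

# Hypothetical sample: padded header names, one padded value
sample = io.StringIO(" name , age \nAlice, 30\n")

reader = DictReaderStrip(sample)
records = [dict(row) for row in reader]

# Keys are clean, but the value ' 30' still carries its leading space
print(records)
```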
Answer 3:
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
Answer 4:
You can do:
aList.append([element.strip() for element in row])
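For context, this is how that line might sit inside the loop from the question. It is only a sketch: the function name read_stripped is made up here, and it takes an open file-like object rather than self.filename so that it can run on its own.

```python
import csv
import io

def read_stripped(f):
    """Return every row of the file-like object f with each field stripped."""
    a_list = []
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        # Strip leading and trailing whitespace from each field in the row
        a_list.append([element.strip() for element in row])
    return a_list

# Hypothetical sample data with padding around every field
rows = read_stripped(io.StringIO(" a , b \n 1 , 2 \n"))
print(rows)  # [['a', 'b'], ['1', '2']]
```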
Answer 5:
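You can create a wrapper object around your file that strips away the spaces before the CSV reader sees them. This way, you can even use the CSV file with csv.DictReader.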
import re
class CSVSpaceStripper:
    def __init__(self, filename):
        self.fh = open(filename, "r")
        # Raw strings keep the regex escapes intact
        self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
        self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")

    def close(self):
        self.fh.close()
        self.fh = None

    def __iter__(self):
        return self

    def __next__(self):
        # Clean one line at a time before the csv module parses it
        line = next(self.fh)
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line

    next = __next__  # Python 2 looks for next() instead of __next__()
Then use it like this:
o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")
I hard-coded ";" as the delimiter. Generalizing the code to work with any delimiter is left as an exercise for the reader.
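One possible way to approach that exercise is sketched below. This is not the answerer's code: the generator strip_around_delimiter is a hypothetical helper, it assumes a non-whitespace delimiter, and like the original it ignores quoting, so a delimiter inside a quoted field would be touched as well.

```python
import csv
import io
import re

def strip_around_delimiter(lines, delimiter=";"):
    """Yield each line with whitespace removed around the delimiter and at both ends."""
    # re.escape makes delimiters such as "|" or "." safe to use in the pattern
    around = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
    for line in lines:
        yield around.sub(delimiter, line.strip())

# Hypothetical sample data using "|" as the delimiter
sample = io.StringIO(" a | b |c \n 1|2 | 3\n")

rows = list(csv.reader(strip_around_delimiter(sample, "|"), delimiter="|"))
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3']]
```

Because csv.reader accepts any iterable of strings, a plain generator is enough here and no wrapper class is needed.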
Answer 6:
Read the CSV (or Excel file) using pandas and trim it with this custom function.
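import pandas as pd

# Definition for stripping whitespace
def trim(dataset):
    # Strip only string cells; leave numbers, NaN, etc. untouched
    strip_cell = lambda x: x.strip() if type(x) is str else x
    # Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
    return dataset.applymap(strip_cell)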
Now you can apply trim to the CSV or Excel data in your code like this (as part of a loop, etc.):
dataset = trim(pd.read_csv(dataset))
dataset = trim(pd.read_excel(dataset))
Answer 7:
The most memory-efficient way to format the cells after parsing is through generators. Something like:
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield (cell.strip() for cell in row)
But it may be worth moving it into a function that you can use to keep munging the data and to avoid further passes over it. For instance:
nulls = {'NULL', 'null', 'None', ''}
def clean(reader):
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield clean(row)
Or it can be used to factor out a factory that builds a dict per row:
def factory(reader):
    fields = next(reader)
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield dict(zip(fields, clean(row)))
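A short usage sketch of the factory generator follows, run against an in-memory sample. The function is repeated so the snippet is self-contained, and the sample data is invented. Note that the header row is used as-is, so this assumes the column names themselves carry no extra whitespace.

```python
import csv
import io

nulls = {'NULL', 'null', 'None', ''}

def factory(reader):
    # The first row supplies the dictionary keys (not stripped here)
    fields = next(reader)
    def clean(row):
        for cell in row:
            cell = cell.strip()
            # Map null-like markers to None, keep everything else
            yield None if cell in nulls else cell
    for row in reader:
        yield dict(zip(fields, clean(row)))

# Hypothetical sample data: padded cells and a NULL marker
sample = io.StringIO("name,age\n Alice , 30 \nBob, NULL\n")

records = list(factory(csv.reader(sample)))
print(records)  # [{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': None}]
```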
Source: Strip white spaces from CSV file