Question:
I have a CSV file with about 2000 records.
Each record has a string and a category.
This is the first line, Line1
This is the second line, Line2
This is the third line, Line3
I need to read this file into a list that looks like this:
List = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]
How can I import this CSV into the list I need using Python?
Answer 1:
Use the csv module (Python 2.x):
import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)
print your_list
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]
If you need tuples:
import csv
with open('test.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = map(tuple, reader)
print your_list
# [('This is the first line', ' Line1'),
#  ('This is the second line', ' Line2'),
#  ('This is the third line', ' Line3')]
Python 3.x version (by @seokhoonlee below):
import csv
with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)
print(your_list)
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]
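The tuple output above keeps the space that follows each comma in the file (' Line1'). One way to drop it is the reader's skipinitialspace option. The following is only a minimal Python 3 sketch: the sample string is an assumption that stands in for the file from the question, and io.StringIO is used so the snippet runs without a file on disk.

```python
import csv
import io

# Assumed stand-in for the question's file (note the space after each comma).
sample = (
    "This is the first line, Line1\n"
    "This is the second line, Line2\n"
    "This is the third line, Line3\n"
)

# skipinitialspace=True makes the reader ignore whitespace right after a delimiter.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
rows = [tuple(row) for row in reader]
print(rows)
```

With a real file, the same option can be passed to csv.reader(f, skipinitialspace=True) inside the with block.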
Answer 2:
Update for Python 3:
import csv
with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)
print(your_list)
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]
Answer 3:
Pandas is pretty good at dealing with data. Here is one example of how to use it:
import pandas as pd
# Read the CSV into a pandas data frame (df).
# With a df you can do many things,
# most importantly: visualize data with Seaborn.
# (read_csv treats the first row as a header by default;
# pass header=None if the file has no header row.)
df = pd.read_csv('filename.csv', delimiter=',')
# Or export it in many ways, e.g. as a list of tuples
tuples = [tuple(x) for x in df.values]
# or export it as a list of dicts (one dict per row)
dicts = df.to_dict('records')
One big advantage is that pandas deals automatically with header rows.
If you haven't heard of Seaborn, I recommend having a look at it.
See also: How do I read and write CSV files with Python?
Pandas #2
import pandas as pd
# Get data (an example DataFrame stands in here for reading the CSV file)
import mpu.pd
df = mpu.pd.example_df()
# Convert
dicts = df.to_dict('records')
The content of df is:
     country   population  population_time    EUR
0    Germany   82521653.0       2016-12-01   True
1     France   66991000.0       2017-01-01   True
2  Indonesia  255461700.0       2017-01-01  False
3    Ireland    4761865.0              NaT   True
4      Spain   46549045.0       2017-06-01   True
5    Vatican          NaN              NaT   True
The content of dicts is:
[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
 {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
 {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
 {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
 {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
 {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]
Pandas #3
import pandas as pd
# Get data (an example DataFrame stands in here for reading the CSV file)
import mpu.pd
df = mpu.pd.example_df()
# Convert (despite the variable name, each row becomes a list, not a tuple)
tuples = [[row[col] for col in df.columns] for row in df.to_dict('records')]
The content of tuples is:
[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
 ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
 ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
 ['Ireland', 4761865.0, NaT, True],
 ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
 ['Vatican', nan, NaT, True]]
Answer 4:
If you are sure there are no commas in your input other than the one separating the category, you can read the file line by line, split each line on "," and then append the result to List.
That said, it looks like you are working with a CSV file, so you might consider using the modules made for it.
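To illustrate why the "no other commas" condition matters, here is a small Python 3 sketch comparing a plain split with the csv module. The record is a made-up example (not from the question) whose text field contains a comma and is therefore quoted.

```python
import csv
import io

# Hypothetical record: the text itself contains a comma, so it is quoted.
line = '"Hello, world", Greeting\n'

# A plain split cuts the quoted text in two and keeps the quote characters.
naive = tuple(line.rstrip("\n").split(","))

# csv.reader honours the quotes; skipinitialspace drops the space after the comma.
parsed = tuple(next(csv.reader(io.StringIO(line), skipinitialspace=True)))

print(naive)
print(parsed)
```

The plain split yields three pieces instead of two, while the csv reader returns the text and the category as intended.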
Answer 5:
# text is assumed to hold the whole file content as one string
result = []
for line in text.splitlines():
    result.append(tuple(line.split(",")))
Answer 6:
Update for Python 3:
import csv
from pprint import pprint
with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    l = list(map(tuple, reader))
pprint(l)
[('This is the first line', ' Line1'),
 ('This is the second line', ' Line2'),
 ('This is the third line', ' Line3')]
From the csv module documentation: "If csvfile is a file object, it should be opened with newline=''."
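As a rough illustration of what newline='' is for, the sketch below writes and re-reads a field that contains a line break. The rows and the temporary file name demo.csv are invented for the example; with newline='' on both open calls, the csv module handles the line endings itself and the embedded newline should survive the round trip.

```python
import csv
import os
import tempfile

# Made-up rows; the first text field contains an embedded line break.
rows = [["first line\nsecond line", "Line1"], ["plain text", "Line2"]]

# Temporary location so the example does not touch any real file.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# newline='' leaves line-ending handling to the csv module when writing...
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# ...and when reading, so quoted line breaks are passed through untouched.
with open(path, newline="") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)
```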
Answer 7:
A simple loop would suffice:
lines = []
with open('test.txt', 'r') as f:
    for line in f.readlines():
        # assumes exactly one comma per line
        l, name = line.strip().split(',')
        lines.append((l, name))
print(lines)
Answer 8:
Extending your requirements a bit, and assuming you do not care about the order of the lines and want them grouped by category, the following solution may work for you:
>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
...     for line in f:
...         text, cat = line.rstrip("\n").split(",", 1)
...         dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})
This way, all relevant lines are available in the dictionary, keyed by category.
Answer 9:
The following code uses the csv module, but extracts the contents of file.csv into a list of dicts, using the first line as the header of the CSV table (Python 2.x syntax):
import csv

def csv2dicts(filename):
    with open(filename, 'rb') as f:
        reader = csv.reader(f)
        lines = list(reader)
        # need a header line plus at least one data line
        if len(lines) < 2: return None
        names = lines[0]
        if len(names) < 1: return None
        dicts = []
        for values in lines[1:]:
            # reject the file if a row does not match the header length
            if len(values) != len(names): return None
            d = {}
            for i, _ in enumerate(names):
                d[names[i]] = values[i]
            dicts.append(d)
        return dicts
    return None

if __name__ == '__main__':
    your_list = csv2dicts('file.csv')
    print your_list
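For comparison, the standard library's csv.DictReader also builds one dict per row from a header line. The snippet below is only a Python 3 sketch: the header names text and category and the sample rows are assumptions made for the example, and io.StringIO stands in for an opened file.

```python
import csv
import io

# Assumed sample with a header row; the column names are hypothetical.
sample = (
    "text,category\n"
    "This is the first line,Line1\n"
    "This is the second line,Line2\n"
)

# DictReader takes the first row as field names and yields one mapping per data row.
reader = csv.DictReader(io.StringIO(sample))
dicts = [dict(row) for row in reader]
print(dicts)
```

Unlike csv2dicts, DictReader does not reject rows whose length differs from the header: missing values are filled with restval (None by default) and surplus values are collected under restkey.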
Answer 10:
As already said in the comments, you can use the csv library in Python. CSV stands for comma-separated values, which seems to be exactly your case: a label and a value separated by a comma.
Since the data is a category and a value, I would rather use a dictionary than a list of tuples.
Anyway, the code below shows both ways: d is the dictionary and l is the list of tuples.
import csv

file_name = "test.txt"
d = dict()
l = list()
try:
    # the with block closes the file even if reading fails
    with open(file_name, 'rt') as csvfile:
        csvReader = csv.reader(csvfile, delimiter=",")
        for row in csvReader:
            d[row[1]] = row[0]
            l.append((row[0], row[1]))
except IOError:
    print("File not found")
print(d)
print(l)
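One caveat with the dictionary variant: because it is keyed by the category (row[1]), a later row with the same category replaces the earlier one, whereas the list keeps every row. The sketch below shows this with made-up sample data (the texts and the CatA/CatB labels are assumptions for the example).

```python
import csv
import io

# Made-up sample in which the category CatA appears twice.
sample = "first text,CatA\nsecond text,CatB\nthird text,CatA\n"

d = {}
l = []
for text, category in csv.reader(io.StringIO(sample)):
    d[category] = text           # a repeated category overwrites the earlier entry
    l.append((text, category))   # the list keeps every row

print(d)
print(l)
```

If categories can repeat and every row matters, the list of tuples, or a dictionary of lists as in Answer 8, avoids losing rows.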