Just to be clear, I'm very new to programming and I'm using Python 3.3! Right now I have a lot of files in the same basic layout. Each file has 9 tab-delimited columns and a variable number of header lines (most have five, though). There are NO headings for the rows or columns!
Looks something like this:
#header1
#header2
#header3
#header4
#header5
ID1 asdf asdk asdfk asdfkl adsfkln askdlfn safsda asdf Notes1..
ID2 asdf asdk asdfk asdfkl adsfkln askdlfn safsda asdf Notes2..
ID3 asdf asdk asdfk asdfkl adsfkln askdlfn safsda asdf Notes3..
ID4 asdf asdk asdfk asdfkl adsfkln askdlfn safsda asdf Notes4..
The only information that I want is the first column, which contains the IDs, and the last column which contains notes about each ID. I'm shooting for a dictionary something like this
{'ID1': [notes1...]
'ID2': [notes2...]....
'ID1234': [notes1234...]}
But I would be happy with a list of dictionaries as well or something like that.
So I started by turning the text into a list of lists so that I can look up entries by index:
import csv
list_all = list(csv.reader(open(r'complex_tabbed_file.gff', newline=''), delimiter='\t'))  # text mode: Python 3's csv module needs strings, not bytes
d = dict()
ID = list_all[5][0] #starting at 5 to skip the header lines
notes = list_all[5][8]
d[ID]= notes
print (d)
This gives me the info I am looking for but only reads one entry at a time. I need to create a loop that will read through the entire file, which contains hundreds of entries. Any suggestions on a starting point?
I researched and found this: Read specific columns from a csv file with csv module?
which describes a similar situation but the coding is a little over my head. As I'm a NEWBIE, I'm having a hard time applying this example to my particular case =(
Here's what I have tried as far as iteration:
i=0
if i < 4:
    i= i+1
if i >= 5:
    ID = list_all[i][0]
    notes = list_all[i][8]
    i= i+1
print (d)
This returns an empty dictionary ( d={ } ) No good.
Also tried
d = dict()
i=5
for line in list_all:
    ID = list_all[i][0]
    notes = list_all[i][8]
    i = i+1
print (d)
which gives the oh so lovely "list index out of range" error message. I would really appreciate any suggestions, thanks!
You can solve it by iterating over each row and discarding those that have only one field (the headers):
Run it like:
That yields:
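A minimal sketch of the approach described in this answer: iterate over the rows and skip any that parse as a single field. It assumes the header lines contain no tabs (true for the sample in the question); the function name `read_notes` and the sample data are made up for illustration.

```python
import csv
import io

def read_notes(lines):
    """Map the first column (ID) to the last column (notes) of each data row."""
    notes_by_id = {}
    for row in csv.reader(lines, delimiter='\t'):
        # Header lines contain no tabs, so they parse as a single field,
        # and blank lines parse as an empty row; skip both.
        if len(row) < 2:
            continue
        notes_by_id[row[0]] = row[-1]
    return notes_by_id

# Made-up sample standing in for the real file (hypothetical values).
sample = io.StringIO(
    "#header1\n"
    "#header2\n"
    "ID1\ta\tb\tc\td\te\tf\tg\tNotes1\n"
    "ID2\ta\tb\tc\td\te\tf\tg\tNotes2\n"
)
print(read_notes(sample))
```

For a real file, something like `with open('complex_tabbed_file.gff', newline='') as f: d = read_notes(f)` should work in Python 3. Using `row[-1]` instead of `row[8]` avoids hard-coding the column count, and checking the row length avoids hard-coding the number of header lines.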
Reading your code does make me wonder whether you read the docs or not? The first, tiny example loops over all the entries/rows...: http://docs.python.org/2/library/csv.html
Anyway, looking into it, the csv module has no means of filtering out comments, but you can use Python's own filter:
You could possibly look into using DictReader instead of reader too...
Sometimes it is easier to skip the csv module entirely:
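One way this could look without the csv module, as a rough sketch: split each line on tabs yourself and skip the header lines. It assumes header lines start with `#` (as in the question's sample) and that no field contains an embedded tab; the name `parse_notes` and its `comment_prefix` parameter are hypothetical.

```python
def parse_notes(lines, comment_prefix='#'):
    """Build {ID: notes} from tab-delimited lines, skipping header lines."""
    notes_by_id = {}
    for line in lines:
        line = line.rstrip('\r\n')
        # Skip blank lines and header lines (assumed to start with '#').
        if not line or line.startswith(comment_prefix):
            continue
        fields = line.split('\t')
        # First field is the ID, last field is the notes column.
        notes_by_id[fields[0]] = fields[-1]
    return notes_by_id

# Any iterable of lines works: an open file object or a list of strings.
example_lines = [
    "#header1\n",
    "ID1\ta\tb\tc\td\te\tf\tg\tNotes1\n",
    "ID2\ta\tb\tc\td\te\tf\tg\tNotes2\n",
]
print(parse_notes(example_lines))
```

With a real file this would be called as `with open('complex_tabbed_file.gff') as f: d = parse_notes(f)`. Plain `str.split` does not handle quoted fields the way the csv module does, so it only suits files where that never comes up.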