IndexError: list index out of range, not sure why

Posted 2019-06-14 18:28

What I would like the program to do is take the sequences that start with a given barcode and compute statistics on them (average length and standard deviation of the sequences, after removing the barcode and the non-relevant text), grouped by barcode. I based it on a similar program, but I keep getting an IndexError. The idea is that all the sequences with the first barcode will be processed under barcodeCounter = 0, those with the second under barcodeCounter = 1, and so on. Hopefully that is enough info, sorry if it is messy.

Input:

import sys
import math

def avsterr(x):
        ave = sum(x)/len(x)
        ssq = 0.0
        for y in x:
                ssq += (y-ave)*(y-ave)
        var = ssq / (len(x)-1)
        sdev = math.sqrt(var)
        stderr = sdev / math.sqrt(len(x))

        return (ave,stderr)

barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
barcodeCounter = 0
for barcode in b:
        barcodeCounter = barcodeCounter + 1
        barcode = barcode.strip()
        print "barcode: %s" %  barcode
        handle = open(sequence, "r")
        for line in handle:
                print line
                seq = line.split(' ',1)[-1].strip()
                print "seq: %s" % seq
                potential_barcode = seq[0:len(barcode)]
                print "something"
                if potential_barcode == barcode:
                        print "Checking sequences"
                        outseq = seq.replace(potential_barcode, "", 1)
                        outseq_length = [len(outseq)]
#                       toprocess.append("")
#                       toprocess[barcodeCounter] += outseq.strip
                        toprocess[barcodeCounter].extend(outseq.strip)   #IndexError/line40
#                       toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq.strip
                        print "outseq: %s" % outseq
                        print "Barcodes to be processed: %s" % toprocess[barcodeCounter]
                        print "BC: %i" % barcodeCounter
        handle.close()
b.close()
one = len(toprocess[0])
#two = lengths[2]
#three = lengths[3]
print one
#(av,st) = avsterr(lengths)
#print "%f +/- %f" % (av,st)

Output:

barcode: ATTAG
S01 ATTAGAAAAAAA

seq: ATTAGAAAAAAA
something
Checking sequences
Traceback (most recent call last):
  File "./FinalProject.py", line 40, in <module>
    toprocess[barcodeCounter].extend(outseq.strip)
IndexError: list index out of range

This is the code I'm basing it on.

sequenceCounter = -1
for line in handle:
        if line[0] == ">":
                sequenceCounter = sequenceCounter + 1
#               print "seqid %s\n" % line
                seqidList.append(line)
                seqList.append("")
        if line[0] != ">":
                seqList[sequenceCounter] = seqList[sequenceCounter] + line.strip()

EDIT: Added the enumerate function and commented out barcodeCounter stuff.

barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
#barcodeCounter = -1
for barcodeCounter, barcode in enumerate(b):
#       barcodeCounter = barcodeCounter + 1
        barcode = barcode.strip()
        print "barcode: %s" %  barcode
        handle = open(sequence, "r")
        for line in handle:
                print line
                seq = line.split(' ',1)[-1].strip()
                print "seq: %s" % seq
                potential_barcode = seq[0:len(barcode)]
                print "something"
                if potential_barcode == barcode:
                        print "Checking sequences"
                        outseq = seq.replace(potential_barcode, "", 1)
                        outseq_length = [len(outseq)]
                        toprocess.append("")
#                       toprocess[barcodeCounter] += outseq.strip
                        toprocess[barcodeCounter].append(outseq.strip) #AttributeError line 40
#                       toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq.strip
                        print "outseq: %s" % outseq
                        print "Barcodes to be processed: %s" % toprocess[barcodeCounter]
                        print "BC: %i" % barcodeCounter
        handle.close()
b.close()

New error:

barcode: ATTAG
S01 ATTAGAAAAAAA

seq: ATTAGAAAAAAA
something
Checking sequences
Traceback (most recent call last):
  File "./FinalProject.py", line 40, in <module>
    toprocess[barcodeCounter].append(outseq.strip)
AttributeError: 'str' object has no attribute 'append'

Code without the issue:

barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
#barcodeCounter = -1
for barcodeCounter, barcode in enumerate(b):
#       barcodeCounter = barcodeCounter + 1
        barcode = barcode.strip()
        print "barcode: \n%s\n" %  barcode
        handle = open(sequence, "r")
        for line in handle:
                print line
                seq = line.split(' ',1)[-1].strip()
                print "seq: %s" % seq
                potential_barcode = seq[0:len(barcode)]
#               print "something"
                if potential_barcode == barcode:
                        print "Checking sequences"
                        outseq = seq.replace(potential_barcode, "", 1)
                        outseq_length = [len(outseq)]
                        toprocess.append("")
                        toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq

@abarnert You were helpful, thank you. I'm not the brightest when it comes to programming sometimes (most of the time). I also had to change the way I added the new sequences, because they are str, not list.

2 Answers
霸刀☆藐视天下
#2 · 2019-06-14 18:53

You actually have two problems here.


First, you're counting from 1 instead of 0. You start barcodeCounter at 0, then you increment it before using it. This means that if you have, say, 3 barcodes, you're trying to set toprocess[1], then toprocess[2], then toprocess[3], and the last one is going to be an IndexError.

Notice that the code you based it on starts with sequenceCounter = -1 rather than 0 to avoid this problem.

However, there's an even simpler solution to the problem: use enumerate to do the counting for you:

for barcodeCounter, barcode in enumerate(b):

No need to remember whether to start at -1, 0, or 1, or where to do the incrementing, or any of that; it just automatically yields the numbers 0, 1, 2, etc., up to one less than the number of lines in b.
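A minimal sketch of what enumerate yields, assuming the barcode file holds one barcode per line. The list `barcode_lines` is a hypothetical stand-in for the open file `b`, the barcodes other than ATTAG are made up, and `print(...)` is written in call form so the snippet also runs on Python 3:

```python
# Hypothetical stand-in for the lines read from the barcode file.
barcode_lines = ["ATTAG\n", "CCGTA\n", "GGACT\n"]

pairs = []
for barcodeCounter, barcode in enumerate(barcode_lines):
    # barcodeCounter runs 0, 1, 2 with no manual incrementing.
    pairs.append((barcodeCounter, barcode.strip()))

print(pairs)
```

The first barcode gets index 0, which matches the first slot of a list.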


Second, even if you counted correctly, toprocess is not the same size as b. In fact, it's completely empty, so toprocess[anything] is always going to raise an exception.

To append a new value to the end of a list, you call the append method:

toprocess.append(…)

Again, notice that the code you're basing it on always does a seqList.append("") before doing a seqList[sequenceCounter] =. (Notice that it's a bit tricky: sometimes it appends and increments sequenceCounter, sometimes it does neither and assigns to seqList[sequenceCounter] using the previous value of sequenceCounter.) You have to do the equivalent.
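One way to do the equivalent, sketched under the assumption that each sequence line looks like `S01 ATTAGAAAAAAA` as in the question's output: append one empty list per barcode before scanning the sequences, so that `toprocess[barcodeCounter]` exists even when a barcode matches nothing. The function name `group_by_barcode` and the sample data are hypothetical, not part of the original program.

```python
def group_by_barcode(barcodes, sequence_lines):
    """Return one list of trimmed sequences per barcode, in barcode order."""
    toprocess = []
    for barcodeCounter, barcode in enumerate(barcodes):
        barcode = barcode.strip()
        # Create the slot once per barcode, not once per matching line.
        toprocess.append([])
        for line in sequence_lines:
            # Keep only the text after the first space (the sequence itself).
            seq = line.split(' ', 1)[-1].strip()
            if seq.startswith(barcode):
                # Drop the leading barcode and keep the remainder.
                toprocess[barcodeCounter].append(seq[len(barcode):])
    return toprocess


# Hypothetical sample data in the format shown in the question.
barcodes = ["ATTAG\n", "CCGTA\n"]
sequence_lines = ["S01 ATTAGAAAAAAA\n", "S02 CCGTATTTT\n", "S03 ATTAGCC\n"]

groups = group_by_barcode(barcodes, sequence_lines)
lengths = [[len(s) for s in group] for group in groups]
print(groups)
print(lengths)
```

Each inner list of lengths could then be passed to the question's avsterr function. That function divides by len(x)-1, so it needs at least two sequences per barcode.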

\"骚年 ilove
#3 · 2019-06-14 18:59

The code

listVariable[indexNumber]

is used specifically to access something already existing in the list variable. The number you give it tells Python what part of the list you're looking for. Worth noting, the list starts counting from 0 and not 1. So the following code:

list = ["a","b","c","d"]
print list[0]
print list[3]
print list[1]
print list[-1]

will result in printing

a #index 0
d #index 3
b #index 1
d #index -1

(a minus index actually counts from the end, so -1 gives you d, and -2 would result in c)

An IndexError is what happens when you give a number that the list has nothing stored for. If I tried to access list[4] I'd get an IndexError since that element doesn't exist, just like I'd get an error if I tried to use a variable that doesn't exist.
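A short sketch of that failure, using the hypothetical name `letters` instead of `list` so the built-in type is not shadowed:

```python
letters = ["a", "b", "c", "d"]

try:
    letters[4]          # valid indices are 0..3 (or -4..-1)
except IndexError as err:
    message = str(err)

print(message)

empty = []
try:
    empty[0]            # an empty list has no valid index at all
except IndexError:
    empty_failed = True
```

The second case is the one in the question: toprocess starts out empty, so any index into it fails.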

Unlike with dictionaries, you can't set a list value by assigning to an index that doesn't exist yet. You need to use a method like append or extend on the list itself, not the way you did it, where you give an index and then call extend on the result. Strictly speaking

list[3].append("e")

is telling Python to take the value stored in list[3] and append an 'e' to that, not to the overall list itself.

list.append("e")

That's what would actually add e to my list.
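The same distinction likely explains the AttributeError in the question's edit. In this sketch (sample values are hypothetical), the slot holds a str, and str has no append method:

```python
toprocess = []
toprocess.append("")                # the list now holds one empty string

try:
    toprocess[0].append("AAAA")     # calls append on the str, not on the list
except AttributeError:
    str_has_append = False

# Strings are extended by concatenation instead.
toprocess[0] = toprocess[0] + "AAAA"

# If each slot is itself a list, indexing and then appending does work.
nested = [[]]
nested[0].append("AAAA")

print(toprocess)
print(nested)
```

Which form to use depends on whether each slot should be one joined string or a list of separate sequences. For per-sequence lengths, a list per slot is probably the more convenient choice.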
