What I would like the program to do is take the sequences related to a certain barcode and perform the defined function (average length and standard deviation of the sequences, minus the barcode and non-relevant text, identified by the same barcode). I have written something similar, basing it on a similar program, but I keep getting an IndexError. The idea is that all the sequences with the first barcode will be processed as barcodeCounter = 0, the second as barcodeCounter = 1, etc. Hopefully that is enough info, sorry if it is messy.
Input:
import sys
import math
def avsterr(x):
    ave = sum(x)/len(x)
    ssq = 0.0
    for y in x:
        ssq += (y-ave)*(y-ave)
    var = ssq / (len(x)-1)
    sdev = math.sqrt(var)
    stderr = sdev / math.sqrt(len(x))
    return (ave,stderr)
barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
barcodeCounter = 0
for barcode in b:
    barcodeCounter = barcodeCounter + 1
    barcode = barcode.strip()
    print "barcode: %s" % barcode
    handle = open(sequence, "r")
    for line in handle:
        print line
        seq = line.split(' ',1)[-1].strip()
        print "seq: %s" % seq
        potential_barcode = seq[0:len(barcode)]
        print "something"
        if potential_barcode == barcode:
            print "Checking sequences"
            outseq = seq.replace(potential_barcode, "", 1)
            outseq_length = [len(outseq)]
            # toprocess.append("")
            # toprocess[barcodeCounter] += outseq.strip
            toprocess[barcodeCounter].extend(outseq.strip) #IndexError/line40
            # toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq.strip
            print "outseq: %s" % outseq
            print "Barcodes to be processed: %s" % toprocess[barcodeCounter]
            print "BC: %i" % barcodeCounter
    handle.close()
b.close()
one = len(toprocess[0])
#two = lengths[2]
#three = lengths[3]
print one
#(av,st) = avsterr(lengths)
#print "%f +/- %f" % (av,st)
Output:
barcode: ATTAG
S01 ATTAGAAAAAAA
seq: ATTAGAAAAAAA
something
Checking sequences
Traceback (most recent call last):
File "./FinalProject.py", line 40, in <module>
toprocess[barcodeCounter].extend(outseq.strip)
IndexError: list index out of range
This is the code I'm basing it on.
sequenceCounter = -1
for line in handle:
    if line[0] == ">":
        sequenceCounter = sequenceCounter + 1
        # print "seqid %s\n" % line
        seqidList.append(line)
        seqList.append("")
    if line[0] != ">":
        seqList[sequenceCounter] = seqList[sequenceCounter] + line.strip()
EDIT: Added the enumerate function and commented out barcodeCounter stuff.
barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
#barcodeCounter = -1
for barcodeCounter, barcode in enumerate(b):
    # barcodeCounter = barcodeCounter + 1
    barcode = barcode.strip()
    print "barcode: %s" % barcode
    handle = open(sequence, "r")
    for line in handle:
        print line
        seq = line.split(' ',1)[-1].strip()
        print "seq: %s" % seq
        potential_barcode = seq[0:len(barcode)]
        print "something"
        if potential_barcode == barcode:
            print "Checking sequences"
            outseq = seq.replace(potential_barcode, "", 1)
            outseq_length = [len(outseq)]
            toprocess.append("")
            # toprocess[barcodeCounter] += outseq.strip
            toprocess[barcodeCounter].append(outseq.strip) #AttributeError line 40
            # toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq.strip
            print "outseq: %s" % outseq
            print "Barcodes to be processed: %s" % toprocess[barcodeCounter]
            print "BC: %i" % barcodeCounter
    handle.close()
b.close()
New error:
barcode: ATTAG
S01 ATTAGAAAAAAA
seq: ATTAGAAAAAAA
something
Checking sequences
Traceback (most recent call last):
File "./FinalProject.py", line 40, in <module>
toprocess[barcodeCounter].append(outseq.strip)
AttributeError: 'str' object has no attribute 'append'
Code without the issue:
barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
#barcodeCounter = -1
for barcodeCounter, barcode in enumerate(b):
    # barcodeCounter = barcodeCounter + 1
    barcode = barcode.strip()
    print "barcode: \n%s\n" % barcode
    handle = open(sequence, "r")
    for line in handle:
        print line
        seq = line.split(' ',1)[-1].strip()
        print "seq: %s" % seq
        potential_barcode = seq[0:len(barcode)]
        # print "something"
        if potential_barcode == barcode:
            print "Checking sequences"
            outseq = seq.replace(potential_barcode, "", 1)
            outseq_length = [len(outseq)]
            toprocess.append("")
            toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq
@abarnert You were helpful, thank you. I'm not the brightest when it comes to programming sometimes (most of the time). I also had to change the way I added the new sequences because they are str, not list.
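Since the stated goal is the average length per barcode, one alternative to concatenating strings is to keep one list of sequences per barcode. The following is only a minimal sketch under assumptions: the function name group_by_barcode and the sample barcodes and lines are hypothetical stand-ins for the two input files, and the barcode is removed by slicing rather than with replace. It is written so that it runs on both Python 2 and Python 3.

```python
import math

def avsterr(x):
    """Return (mean, standard error) of a list of numbers; needs len(x) >= 2."""
    ave = float(sum(x)) / len(x)          # float() avoids integer division on Python 2
    ssq = sum((y - ave) ** 2 for y in x)
    sdev = math.sqrt(ssq / (len(x) - 1))
    return ave, sdev / math.sqrt(len(x))

def group_by_barcode(barcodes, lines):
    """Hypothetical helper: one list of trimmed sequences per barcode."""
    toprocess = []
    for barcodeCounter, barcode in enumerate(barcodes):
        toprocess.append([])              # create the slot before indexing into it
        for line in lines:
            seq = line.split(' ', 1)[-1].strip()
            if seq.startswith(barcode):
                # keep only the part after the barcode
                toprocess[barcodeCounter].append(seq[len(barcode):])
    return toprocess

# Hypothetical sample data standing in for the barcode and sequence files.
barcodes = ["ATTAG", "CCGTA"]
lines = ["S01 ATTAGAAAAAAA", "S02 ATTAGAAAA", "S03 CCGTATTT", "S04 CCGTATTTTT"]

groups = group_by_barcode(barcodes, lines)
for barcode, seqs in zip(barcodes, groups):
    ave, stderr = avsterr([len(s) for s in seqs])
    print("%s: %f +/- %f" % (barcode, ave, stderr))
```

Here lines is a plain list, so it can be looped over once per barcode; with a real file you would reopen it (or read it into a list first), as the code above does.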
You actually have two problems here.

First, you're counting from 1 instead of 0. You start barcodeCounter at 0, then you increment it before using it. This means that if you have, say, 3 barcodes, you're trying to set toprocess[1], then toprocess[2], then toprocess[3], and the last one is going to be an IndexError.

Notice that the code you based it on starts with sequenceCounter = -1 rather than 0 to avoid this problem.

However, there's an even simpler solution to the problem: use enumerate to do the counting for you. No need to remember whether to start at -1, 0, or 1, or where to do the incrementing, or any of that; it just automatically gets the numbers 0, 1, 2, etc. up to one less than the number of lines in b.

Second, even if you counted correctly, toprocess is not the same size as b. In fact, it's completely empty, so toprocess[anything] is always going to raise an exception.

To append a new value to the end of a list, you call the append method. Again, notice that the code you're basing it on always does a seqList.append("") before doing a seqList[sequenceCounter] =. (Notice that it's a bit tricky: sometimes it appends and increments sequenceCounter, sometimes it does neither, and assigns to seqList[sequenceCounter] using the previous value of sequenceCounter.) You have to do the equivalent.

Indexing a list with square brackets, as in toprocess[barcodeCounter], is used specifically to access something that already exists in the list. The number you give it tells Python what part of the list you're looking for. Worth noting, the list starts counting from 0 and not 1. A negative index counts from the end, so in a four-item list that ends with c and d, -1 gives you d and -2 gives you c.
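A minimal sketch of the indexing behaviour just described, assuming a hypothetical four-item list named letters:

```python
letters = ['a', 'b', 'c', 'd']   # hypothetical example list

first = letters[0]               # indexes start at 0
fourth = letters[3]              # so the fourth item is at index 3
last = letters[-1]               # negative indexes count from the end
second_to_last = letters[-2]

print(first)           # a
print(fourth)          # d
print(last)            # d
print(second_to_last)  # c
```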
An IndexError is what happens when you give a number that the list has nothing stored for. If I tried to access list[4], I'd get an IndexError since that item doesn't exist, just like if I tried to use a variable that doesn't exist.
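To make that concrete, here is a small sketch (again with a hypothetical list named letters) that triggers an IndexError both when reading past the end of a list and when assigning into an empty one:

```python
letters = ['a', 'b', 'c', 'd']   # hypothetical example list

# Reading an index that has nothing stored for it raises IndexError.
try:
    letters[4]
    message = "no error"
except IndexError as err:
    message = "IndexError: %s" % err
print(message)

# Assigning to a missing index fails the same way; it does not grow the list.
empty = []
try:
    empty[0] = 'x'
    assigned = True
except IndexError:
    assigned = False
print(assigned)
```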
Unlike with dictionaries, you can't set a list value by providing a non-existent index. You need to use a method like append or extend, but not the way you did it, where you give an index and then call extend on the result. Strictly speaking, list[3].append('e') is telling Python to take the value stored in list[3] and append an 'e' to that, not to the overall list itself. Calling list.append('e') is what would actually add e to my list.
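The difference can be sketched as follows. The names letters, lookup and nested are hypothetical and only for illustration; the failing call reproduces the same kind of AttributeError shown in the question, because the element being appended to is a str.

```python
letters = ['a', 'b', 'c', 'd']   # hypothetical example list

# append on an element targets that element (a str here), not the list.
try:
    letters[3].append('e')
    element_error = None
except AttributeError as err:
    element_error = str(err)
print(element_error)

# append on the list itself adds a new item at the end.
letters.append('e')
print(letters)

# A dictionary, by contrast, creates a missing key on assignment.
lookup = {}
lookup[3] = 'e'

# Indexing and then appending does work when the element is itself a list.
nested = [[], []]
nested[1].append('e')
print(nested)
```

If each barcode should collect several sequences, the last form (a list of lists, with an empty list appended for each barcode first) is probably the closest match.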