I've got a string with words that are separated by spaces (all words are unique, no duplicates). I turn this string into a list:
s = "#one cat #two dogs #three birds"
out = s.split()
And count how many values are created:
print len(out) # Says 192
Then I try to delete everything from the list:
for x in out:
    out.remove(x)
And then count again:
print len(out) # Says 96
Can someone explain please why it says 96 instead of 0?
MORE INFO
Each line starts with '#' and is in fact a space-separated pair of words: the first in the pair is the key and second is the value.
So, what I am doing is:
for x in out:
    if '#' in x:
        ind = out.index(x) # Get current index
        nextValue = out[ind+1] # Get next value
        myDictionary[x] = nextValue
        out.remove(nextValue)
        out.remove(x)
The problem is that I cannot move all the key/value pairs into the dictionary, since the loop only iterates over 96 items.
I think you actually want something like this:
s = '#one cat #two dogs #three birds'
out = s.split()
entries = dict([(x, y) for x, y in zip(out[::2], out[1::2])])
What is this code doing? Let's break it down. First, we split s by whitespace into out, as you had.
Next we iterate over the pairs in out, calling them "x, y". Those pairs become a list of tuples. dict() accepts a list of two-element tuples and treats each one as key, val.
Here's what I got when I tried it:
$ cat tryme.py
s = '#one cat #two dogs #three birds'
out = s.split()
entries = dict([(x, y) for x, y in zip(out[::2], out[1::2])])
from pprint import pprint
pprint(entries)
$ python tryme.py
{'#one': 'cat', '#three': 'birds', '#two': 'dogs'}
As for what actually happened in the for loop:
From the Python for statement documentation:
The expression list is evaluated once; it should yield an iterable object. An iterator is created for the result of the expression_list. The suite is then executed once for each item provided by the iterator, in the order of ascending indices. Each item in turn is assigned to the target list using the standard rules for assignments, and then the suite is executed. When the items are exhausted (which is immediately when the sequence is empty), the suite in the else clause, if present, is executed, and the loop terminates.
I think it is best shown with the aid of an illustration.
Now, suppose you have an iterable object (such as a list) like this:
out = [a, b, c, d, e, f]
What happens when you do for x in out is that Python creates an internal indexer, which works like this (I illustrate it with the symbol ^):
[a, b, c, d, e, f]
 ^ <-- here is the indexer
What normally happens is that, as you finish each cycle of your loop, the indexer moves forward like this:
[a, b, c, d, e, f] #cycle 1
 ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 2
    ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 3
       ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 4
          ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 5
             ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 6
                ^ <-- here is the indexer
#finish, no element is found anymore!
As you can see, the indexer keeps moving forward until it reaches the end of your list, regardless of what happened to the list!
Thus when you do remove, this is what happens internally:
[a, b, c, d, e, f] #cycle 1
 ^ <-- here is the indexer
[b, c, d, e, f] #cycle 1 - a is removed!
 ^ <-- here is the indexer
[b, c, d, e, f] #cycle 2
    ^ <-- here is the indexer
[b, d, e, f] #cycle 2 - c is removed
    ^ <-- here is the indexer
[b, d, e, f] #cycle 3
       ^ <-- here is the indexer
[b, d, f] #cycle 3 - e is removed
       ^ <-- here is the indexer
#the for loop ends: the next position is past the end of the list
Notice that there are only 3 cycles instead of 6 (the number of elements in the original list). That's why you are left with half the len of your original list: that is the number of cycles it takes to complete the loop when you remove one element per cycle.
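The skipping can also be reproduced by hand. The following is a minimal sketch, assuming the list iterator behaves like a plain index counter that is compared against the current length on every cycle; the helper name remove_while_iterating is made up for this illustration and is not part of Python.

```python
def remove_while_iterating(items):
    """Mimic `for x in items: items.remove(x)` with an explicit index.

    Hypothetical helper for illustration only; it returns the elements
    that the loop actually visited.
    """
    visited = []
    i = 0                      # plays the role of the internal indexer
    while i < len(items):      # the length is re-checked on every cycle
        x = items[i]
        visited.append(x)
        items.remove(x)        # everything after x shifts left by one
        i += 1                 # the indexer still moves forward
    return visited


out = ['a', 'b', 'c', 'd', 'e', 'f']
visited = remove_while_iterating(out)
print(visited)  # ['a', 'c', 'e']
print(out)      # ['b', 'd', 'f']
```

Only every second element is visited, and the elements that were skipped are the ones left in the list.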
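If you want to clear the list, simply call clear() (available on Python 3.3+; on Python 2 use del out[:] instead). No emptiness check is needed, since clearing an empty list does nothing:
out.clear()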
Or, alternatively, to remove the elements one by one, you need to do it the other way around, from the end to the beginning. Use reversed:
for x in reversed(out):
    out.remove(x)
Now, why would reversed work? If the indexer keeps moving forward, shouldn't reversed fail too, since the number of elements is still reduced by one per cycle?
No, because reversed changes the way the internal indexer works: it makes the indexer move backward (from the end) instead of forward.
To illustrate, this is what happens with reversed when nothing is removed:
[a, b, c, d, e, f] #cycle 1
                ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 2
             ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 3
          ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 4
       ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 5
    ^ <-- here is the indexer
[a, b, c, d, e, f] #cycle 6
 ^ <-- here is the indexer
#finish, no element is found anymore!
And thus when you do one removal per cycle, it doesn't affect how the indexer works:
[a, b, c, d, e, f] #cycle 1
                ^ <-- here is the indexer
[a, b, c, d, e] #cycle 1 - f is removed
                ^ <-- here is the indexer
[a, b, c, d, e] #cycle 2
             ^ <-- here is the indexer
[a, b, c, d] #cycle 2 - e is removed
             ^ <-- here is the indexer
[a, b, c, d] #cycle 3
          ^ <-- here is the indexer
[a, b, c] #cycle 3 - d is removed
          ^ <-- here is the indexer
[a, b, c] #cycle 4
       ^ <-- here is the indexer
[a, b] #cycle 4 - c is removed
       ^ <-- here is the indexer
[a, b] #cycle 5
    ^ <-- here is the indexer
[a] #cycle 5 - b is removed
    ^ <-- here is the indexer
[a] #cycle 6
 ^ <-- here is the indexer
[] #cycle 6 - a is removed
 ^ <-- here is the indexer
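The backward walk can be checked with a few lines of code. This is a small sketch, assuming the elements are unique as stated in the question; the removed list is only there to record the order of visits.

```python
out = ['a', 'b', 'c', 'd', 'e', 'f']
removed = []  # records the order in which elements were visited

for x in reversed(out):
    removed.append(x)
    out.remove(x)  # x is the last element, so nothing shifts under the indexer

print(removed)  # ['f', 'e', 'd', 'c', 'b', 'a']
print(out)      # []
```

Every element is visited exactly once, because each removal only shortens the list at the end the indexer has already left behind.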
Hope the illustration helps you to understand what's going on internally...
The problem you're encountering is the result of modifying a list while iterating over it. When an item is removed, everything after it shifts down by one index, but the iterator does not account for the change and continues by incrementing the index it last accessed. The iterator thus skips every second element in the list, which is why you're left with half the number of elements.
The simplest direct solution to your problem is to iterate over a copy of out, using slice notation:
for x in out[:]:
    # ...
    out.remove(x)
However, there is a deeper question here: why do you need to remove items from the list at all? With your algorithm, you are guaranteed to end up with an empty list, which is of no use to you. It would be both simpler and more efficient to just iterate over the list without removing items.
When you're done with the list (after the for-loop block) you can explicitly delete it (using the del keyword) or simply leave it for Python's garbage collection system to deal with.
A further issue remains: you're combining direct iteration over a list with index-based references. The use of for x in out should typically be restricted to situations where you want to access each element independently of the others. If you want to work with indices, use for i in range(len(out)) and access elements with out[i].
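As a rough sketch of the index-based approach, the loop below builds the dictionary without removing anything. It assumes the list strictly alternates key, value, so it steps through the indices two at a time; that step of 2 is my assumption and not something the question's code does.

```python
out = '#one cat #two dogs #three birds'.split()

my_dictionary = {}
# Step by 2 so that i always lands on a key; out[i + 1] is its value.
# Stopping at len(out) - 1 avoids an IndexError if a trailing key has no value.
for i in range(0, len(out) - 1, 2):
    my_dictionary[out[i]] = out[i + 1]

print(my_dictionary)
```

Because the list is never modified inside the loop, every pair is visited.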
Furthermore, you can use a dictionary comprehension to accomplish your entire task in a one-line pythonic expression:
my_dictionary = {out[i]: out[i + 1] for i in range(len(out)) if "#" in out[i]}
Another pythonic alternative would be to make use of the fact that each even-numbered element is a key, and each odd-numbered element is a value (you'd have to assume that the list result of str.split() consistently follows this pattern), and use zip on the even and odd sub-lists.
my_dictionary = dict(zip(out[::2], out[1::2]))
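Since the zip approach silently relies on the key, value pattern, it may be worth checking that pattern first. The sketch below is one possible check, assuming every key starts with '#' and the word count is even; the error message and the choice of ValueError are mine.

```python
out = '#one cat #two dogs #three birds'.split()

keys, values = out[::2], out[1::2]

# Assumed sanity checks: an even number of words, and every key starts with '#'.
if len(out) % 2 != 0 or not all(k.startswith('#') for k in keys):
    raise ValueError('input is not a sequence of "#key value" pairs')

my_dictionary = dict(zip(keys, values))
print(my_dictionary)
```

Without the check, zip would simply drop an unpaired trailing word, and a misplaced word would shift every later key and value.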
If you just need to clear the list, use
out = []
or
out.clear()
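The two forms are not quite equivalent, which matters if anything else still refers to the list. A short sketch of the difference, where alias is just a second name introduced for the demonstration:

```python
out = ['a', 'b', 'c']
alias = out        # a second name for the same list object

out = []           # rebinds the name `out`; the original list is untouched
print(alias)       # ['a', 'b', 'c']

out = alias        # point `out` back at the original list
out.clear()        # empties the list in place (Python 3.3+; `del out[:]` on older versions)
print(alias)       # []
```

If out is the only reference to the list, either form works.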
Anyway, what you described happens because the list's remove method modifies the list while the loop is still iterating over it.
out = ['a', 'b', 'c', 'd', 'e', 'f']
for x in out:
    out.remove(x)
    print(x)
The result is shown below:
a
c
e
That is exactly half of the full list. So, in your case, you got 96 (half of 192) from 192.
The problem is that whenever you delete a value from a list, the list re-indexes its remaining values immediately.
That is, when you perform out.remove(x) and out.remove(nextValue), those values are deleted and the values that followed them shift down to fill the freed positions, so the loop skips over them.
Therefore, to avoid this, build the dictionary without removing anything inside the loop, as follows:
out = '#one cat #two dogs #three birds'.split()
print("The list is : {0} \n".format(out))
myDictionary = dict()
for x in out:
    if '#' in x:
        ind = out.index(x) # Get current index
        nextValue = out[ind+1] # Get next value
        myDictionary[x] = nextValue
out = [] # Empty the list only after the loop has finished
print("The dictionary is : {0} \n".format(myDictionary))
So, after you are done transferring the values from the list to the dictionary, you can safely empty out by using out = [].