I am trying to iterate over columns AND rows in Pandas to cross-reference a list I have and count the co-occurrences.
My dataframe looks like:
+-------+-----+-----+----+----+-------+-------+------+
| Lemma | Dog | Cat | Sg | Pl | Good | Okay | Bad |
+-------+-----+-----+----+----+-------+-------+------+
| Dog | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Cat | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-------+-----+-----+----+----+-------+-------+------+
I have a list like:
c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]
I want to go through every item in Lemma, find it in c, and then, for that list item, look for any of the column names. If a column name is found, I want to add 1 to that cell. I also want to add a count if the Lemma items occur within a 3-word window of each other.
I've tried something like the following (ignoring the word window issue):
for idx, row in df.iterrows():
    for columns in df:
        for i in c:
            if i[0]==row:
                if columns in c[1]:
                    df.ix['columns','row'] +=1
But I get the error: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
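A minimal reproduction of this error, assuming a small hypothetical frame with the same layout as the table above. iterrows() yields each row as a pandas Series, so comparing a string to row compares it against every cell and returns a boolean Series; an if statement cannot reduce that to a single True or False.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the question's layout (fewer columns)
df = pd.DataFrame({"Lemma": ["Dog", "Cat"], "Dog": [0, 0], "Sg": [0, 0]})

_, row = next(df.iterrows())   # row is a Series, not a string
mask = "Dog" == row            # element-wise comparison -> boolean Series
print(mask.tolist())

message = ""
try:
    if mask:                   # same pattern as `if i[0]==row:`
        pass
except ValueError as err:
    message = str(err)         # "The truth value of a Series is ambiguous..."
print(message)

# Comparing against the single Lemma cell gives a plain bool instead
print(row["Lemma"] == "Dog")
```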
My ideal results look like:
+-------+-----+-----+----+----+-------+-------+------+
| Lemma | Dog | Cat | Sg | Pl | Good | Okay | Bad |
+-------+-----+-----+----+----+-------+-------+------+
| Dog | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| Cat | 2 | 0 | 0 | 1 | 0 | 1 | 0 |
+-------+-----+-----+----+----+-------+-------+------+
Thanks!
- The ideal result shown in the question is not accurate. There should never be a cat in the dog column, and vice versa.
- I wouldn't iterate through the DataFrame. I'd unpack the list of lists into a dict, then load the dict into a DataFrame, as shown below.
Code:
import pandas as pd
c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad'],
     ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Okay'],
     ['dog', 'Sg', 'Good'], ['cat', 'Sg', 'Good'], ['dog', 'Pl', 'Bad'],
     ['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]
Lemma = {'dog': {'dog': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0},
         'cat': {'cat': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0}}
Note: Each value in a list from c is a key in Lemma (see the Python documentation on dictionaries). For example, with x = ['dog', 'Sg', 'Good'], Lemma[x[0]][x[2]] is the same as Lemma['dog']['Good']. The initial value of Lemma['dog']['Good'] is 0, so the first increment sets it to 0 + 1, the next one to 1 + 1, and so on.
for x in c:
    Lemma[x[0]][x[0]] = Lemma[x[0]][x[0]] + 1  # count the lemma itself
    Lemma[x[0]][x[1]] = Lemma[x[0]][x[1]] + 1  # count the number tag (Sg/Pl)
    Lemma[x[0]][x[2]] = Lemma[x[0]][x[2]] + 1  # count the rating (Good/Okay/Bad)
df = pd.DataFrame.from_dict(Lemma, orient='index')
Plot
df.plot(kind='bar', figsize=(6, 6))
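For reference, here is a sketch of the answer's steps as one runnable script, shortened to the question's three-item list. The inner loop over x replaces the three explicit increment lines with the same effect. Cells a lemma never receives (cat in the dog row and vice versa) come out as NaN because the inner dicts have different keys; the fillna(0) step is an addition for readability, not part of the answer.

```python
import pandas as pd

c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]

# One inner dict of counters per lemma
Lemma = {'dog': {'dog': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0},
         'cat': {'cat': 0, 'Sg': 0, 'Pl': 0, 'Good': 0, 'Okay': 0, 'Bad': 0}}

for x in c:
    for word in x:              # x[0], x[1], x[2]
        Lemma[x[0]][word] += 1  # same effect as the three explicit lines

df = pd.DataFrame.from_dict(Lemma, orient='index')

# 'cat' is not a key in the dog dict (and vice versa), so those cells are NaN
df = df.fillna(0).astype(int)
print(df)
```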
Create the dict programmatically. First, create sets of words for the dict keys from the list of lists:
outer_keys = set()
inner_keys = set()
for x in c:
    outer_keys.add(x[0])      # first word is outer key
    inner_keys |= set(x[1:])  # all other words
Then create a dict of dicts:
Lemma = {j: dict.fromkeys(inner_keys | {j}, 0) for j in outer_keys}
Resulting dict (key order may vary, because sets are unordered):
{'dog': {'Okay': 0, 'Pl': 0, 'Good': 0, 'Bad': 0, 'Sg': 0, 'dog': 0},
 'cat': {'Okay': 0, 'Pl': 0, 'Good': 0, 'Bad': 0, 'Sg': 0, 'cat': 0}}
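A sketch combining the programmatic construction with the counting loop, so nothing is hard-coded and any new lemma or tag in c gets its own row or column. Since the keys come from sets, the row and column order is not fixed; the sort_index calls are an addition here to give a stable layout.

```python
import pandas as pd

c = [['dog', 'Sg', 'Good'], ['cat', 'Pl', 'Okay'], ['dog', 'Pl', 'Bad']]

# Row labels are the first words; column labels are the remaining words
outer_keys = set()
inner_keys = set()
for x in c:
    outer_keys.add(x[0])
    inner_keys |= set(x[1:])

# Each lemma gets a counter for itself plus every tag, all starting at 0
Lemma = {j: dict.fromkeys(inner_keys | {j}, 0) for j in outer_keys}

for x in c:
    for word in x:
        Lemma[x[0]][word] += 1

df = (pd.DataFrame.from_dict(Lemma, orient='index')
        .fillna(0)          # cells for the other lemma are missing -> 0
        .astype(int)
        .sort_index()       # stable row order
        .sort_index(axis=1))  # stable column order
print(df)
```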
You have several things that need to be changed.
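1) Your list probably needs to have Dog instead of dog, and Cat instead of cat, since string comparison is case-sensitive and the values must match the Lemma entries and column names exactly.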
2) You probably want for column in df.columns instead of for columns in df.
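3) You probably want if i[0] == row['Lemma'] instead of if i[0]==row: (this is where it was breaking: row is the whole row as a Series, so comparing it to a string gives a Series of booleans rather than a single True/False).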
4) You probably want if column in i instead of if columns in c[1].
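5) You probably want df.ix[idx, column] += 1 instead of df.ix['columns','row'] +=1. In current pandas, where .ix has been removed (as of pandas 1.0), write df.loc[idx, column] += 1 instead.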
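With all five changes applied, the question's loop might look like the sketch below. It assumes the DataFrame has a Lemma column holding Dog and Cat (as in the question's table) and uses .loc, because .ix no longer exists in current pandas. The frame construction is hypothetical, built only to make the example runnable.

```python
import pandas as pd

# Hypothetical frame matching the question's table, all counts starting at 0
columns = ['Lemma', 'Dog', 'Cat', 'Sg', 'Pl', 'Good', 'Okay', 'Bad']
df = pd.DataFrame([['Dog'] + [0] * 7, ['Cat'] + [0] * 7], columns=columns)

# Capitalized to match the Lemma values and column names (change 1)
c = [['Dog', 'Sg', 'Good'], ['Cat', 'Pl', 'Okay'], ['Dog', 'Pl', 'Bad']]

for idx, row in df.iterrows():
    for column in df.columns:                # change 2
        for i in c:
            if i[0] == row['Lemma']:         # change 3: compare one cell
                if column in i:              # change 4: search this item
                    df.loc[idx, column] += 1  # change 5: .loc instead of .ix
print(df)
```

The Lemma column is never incremented, because the string 'Lemma' never appears in any list item.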