I would like to init a collections.Counter object from a text file of word frequency counts. That is, I have a file "counts.txt":
rank wordform abs r mod
1 the 225300 29 223066.9
2 and 157486 29 156214.4
3 to 134478 29 134044.8
...
999 fallen 345 29 326.6
1000 supper 368 27 325.8
I would like a Counter object wordCounts
such that I can call
>>> print(wordCounts.most_common(3))
[('the', 225300), ('and', 157486), ('to', 134478)]
What is the most efficient, Pythonic way to do this?
Here are two versions. The first takes your counts.txt
as a regular text file. The second treats it as a csv file (which is what it kind of looks like).
from collections import Counter

with open('counts.txt') as f:
    lines = [line.strip().split() for line in f]

# Skip the header row; column 1 is the wordform, column 2 the absolute count.
wordCounts = Counter({line[1]: int(line[2]) for line in lines[1:]})
print(wordCounts.most_common(3))
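If the file is large, you may prefer not to hold every line in a list first. Here is a minimal sketch of the same idea that skips the header and fills the Counter while streaming the file, assuming the same whitespace-separated counts.txt:

from collections import Counter

wordCounts = Counter()
with open('counts.txt') as f:
    next(f)  # skip the "rank wordform abs r mod" header
    for line in f:
        fields = line.split()
        wordCounts[fields[1]] = int(fields[2])  # wordform -> absolute count

print(wordCounts.most_common(3))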
If your data file somehow turned out to be delimited by a consistent character or string, you could use a csv.DictReader object to parse it. Shown below is how it could be done if your file were tab-delimited.
The data file (as edited by me to be tab-delimited):
rank wordform abs r mod
1 the 225300 29 223066.9
2 and 157486 29 156214.4
3 to 134478 29 134044.8
999 fallen 345 29 326.6
1000 supper 368 27 325.8
The code
from csv import DictReader
from collections import Counter

with open('counts.txt') as f:
    reader = DictReader(f, delimiter='\t')
    # DictReader takes its keys from the header row, so no manual skipping is needed.
    wordCounts = Counter({row['wordform']: int(row['abs']) for row in reader})

print(wordCounts.most_common(3))
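If you would rather index the columns by position than rely on the header names, a plain csv.reader works just as well. A sketch under the same tab-delimited assumption:

import csv
from collections import Counter

with open('counts.txt') as f:
    rows = csv.reader(f, delimiter='\t')
    next(rows)  # csv.reader does not consume the header row, so skip it
    wordCounts = Counter({row[1]: int(row[2]) for row in rows})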
import collections

words = dict()
with open('counts.txt') as fp:
    next(fp)  # skip the header line, whose "abs" column is not an integer
    for line in fp:
        items = line.split()
        words[items[1]] = int(items[2])

wordCounts = collections.Counter(words)
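Either way, the resulting Counter answers the query from the question directly; with the sample rows shown above, the top three entries would be:

>>> wordCounts.most_common(3)
[('the', 225300), ('and', 157486), ('to', 134478)]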