Python Counter from txt file

Posted 2019-09-10 09:11

Question:

I would like to init a collections.Counter object from a text file of word frequency counts. That is, I have a file "counts.txt":

rank  wordform         abs     r        mod
   1  the           225300    29   223066.9
   2  and           157486    29   156214.4
   3  to            134478    29   134044.8
...
 999  fallen           345    29      326.6
1000  supper           368    27      325.8

I would like a Counter object wordCounts such that I can call

>>> print wordCounts.most_common(3)
[('the', 225300), ('and', 157486), ('to', 134478)]

What is the most efficient, Pythonic way to do this?

Answer 1:

Here are two versions. The first reads your counts.txt as a regular whitespace-delimited text file. The second treats it as a CSV file (which is roughly what it looks like).

from collections import Counter

with open('counts.txt') as f:
    lines = [line.strip().split() for line in f]
    wordCounts = Counter({line[1]: int(line[2]) for line in lines[1:]})
    print(wordCounts.most_common(3))
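
The same idea can be sketched as a reusable function. This is only an illustration under some assumptions: Python 3, a single header row, and the word and count in the second and third columns. The name load_counts and its parameters are hypothetical, not part of the original answer. It streams the input instead of building a list first, and skips blank or malformed lines rather than raising.

```python
from collections import Counter


def load_counts(lines, word_col=1, count_col=2):
    """Build a Counter from whitespace-delimited rows (hypothetical helper).

    `lines` is any iterable of strings, e.g. an open file object.
    The first row is assumed to be a header and is skipped.
    """
    counts = Counter()
    rows = iter(lines)
    next(rows, None)  # skip the header row, if there is one
    for line in rows:
        fields = line.split()
        if len(fields) <= max(word_col, count_col):
            continue  # blank or truncated line
        try:
            counts[fields[word_col]] += int(fields[count_col])
        except ValueError:
            continue  # count column is not an integer
    return counts


# Sample rows copied from the question; with a real file you would pass
# the open file object instead, e.g. load_counts(open('counts.txt')).
sample = """rank  wordform         abs     r        mod
   1  the           225300    29   223066.9
   2  and           157486    29   156214.4
   3  to            134478    29   134044.8
""".splitlines()

word_counts = load_counts(sample)
print(word_counts.most_common(3))
# [('the', 225300), ('and', 157486), ('to', 134478)]
```

One difference from the dict comprehension above: if a word form appears on more than one line, this sketch adds the counts together instead of keeping only the last one.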

If your data file turned out to be delimited by a consistent character or string, you could use a csv.DictReader object to parse it.

Shown below is how it could be done if your file were tab-delimited.

The data file (as edited by me to be tab-delimited):

rank    wordform    abs r   mod
1   the 225300  29  223066.9
2   and 157486  29  156214.4
3   to  134478  29  134044.8
999 fallen  345 29  326.6
1000    supper  368 27  325.8

The code

from csv import DictReader
from collections import Counter

with open('counts.txt') as f:
    reader = DictReader(f, delimiter='\t')
    wordCounts = Counter({row['wordform']: int(row['abs']) for row in reader})
    print(wordCounts.most_common(3))
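
To try the DictReader approach without editing a file on disk, here is a minimal sketch that feeds tab-delimited text through io.StringIO. It assumes Python 3; the helper name counts_from_tsv and its parameters are made up for illustration, while the column names wordform and abs come from the header row in the question.

```python
import csv
import io
from collections import Counter


def counts_from_tsv(fileobj, word_field="wordform", count_field="abs"):
    """Build a Counter from a tab-delimited file object (hypothetical helper)."""
    reader = csv.DictReader(fileobj, delimiter="\t")
    # DictReader takes field names from the first row, so no manual header skip.
    return Counter({row[word_field]: int(row[count_field]) for row in reader})


# In-memory stand-in for the tab-delimited counts.txt shown above.
tsv_text = (
    "rank\twordform\tabs\tr\tmod\n"
    "1\tthe\t225300\t29\t223066.9\n"
    "2\tand\t157486\t29\t156214.4\n"
    "3\tto\t134478\t29\t134044.8\n"
    "999\tfallen\t345\t29\t326.6\n"
    "1000\tsupper\t368\t27\t325.8\n"
)

word_counts = counts_from_tsv(io.StringIO(tsv_text))
print(word_counts.most_common(3))
# [('the', 225300), ('and', 157486), ('to', 134478)]
```

With a real file in Python 3, the csv module documentation recommends opening it with newline='', for example open('counts.txt', newline='').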


Answer 2:

import collections

words = {}
with open('counts.txt') as fp:
    next(fp)  # skip the header row ("rank  wordform  abs  r  mod")
    for line in fp:
        items = line.split()  # split() already strips surrounding whitespace
        words[items[1]] = int(items[2])

wordCounts = collections.Counter(words)