how do I count unique words of text files in speci

Posted 2019-09-24 01:48

Question:

I'm writing a report and I need to count the unique words in some text files.

My texts are in D:\shakeall, and there are 42 files in total.

I know a little Python, but I don't know what to do next.

This is how I think it should work:

  1. read the files in the directory

  2. make a list of words from the texts

  3. count the total/unique words

That's all I know, plus a bit about for, while, lists and indexes, and variables.

What I want to do is write my own function library and use it to get the result.

I'd really appreciate any advice on this.

------ P.S.

I really know almost nothing about Python. All I can do is simple math or printing the words in a list. The given topic is too hard for me. Sorry.

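Answer 1:

# 'with' closes the file automatically when the block ends
with open('somefile.txt', 'r') as textfile:
    # split() with no argument splits on any whitespace and drops empty strings
    words = textfile.read().split()
# a set keeps only one copy of each word
unique_words = set(words)
print(len(unique_words))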

That's the general gist of it.
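
One thing plain splitting doesn't handle: "be," and "Be" would count as different words from "be". Here's a rough sketch that lowercases each word and strips punctuation from its ends before counting. Both of those choices are assumptions on my part (the question doesn't say how words should be compared), and the function name count_words_in_text is made up for illustration.

```python
import string

def count_words_in_text(text):
    """Return (total, unique) word counts for one string of text."""
    words = []
    for raw in text.split():
        # Assumption: compare words case-insensitively and ignore
        # punctuation attached to the start or end of a word.
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that were only punctuation
            words.append(word)
    return len(words), len(set(words))

total, unique = count_words_in_text("To be, or not to be: that is the question.")
print(total, unique)  # 10 8
```

Punctuation inside a word (like the apostrophe in "don't") is left alone, so this is only a first approximation of what counts as a word.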



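Answer 2:

import os
uniquewords = set()

# os.walk visits the directory and every subdirectory below it
for root, dirs, files in os.walk("D:\\shakeall"):
    for name in files:
        # 'with' closes each file once it has been read
        with open(os.path.join(root, name)) as f:
            uniquewords.update(f.read().split())

print(list(uniquewords))
print(len(uniquewords))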
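
Since the question asks for a small function library, the three steps (read the files, build a word list, count total/unique) could be split into functions roughly like this. It's only a sketch: it assumes Python 3 and UTF-8 text, the function names are invented here, and words are split on whitespace with no case or punctuation handling.

```python
import os
from collections import Counter

def read_files(directory):
    """Step 1: yield the text of every file under 'directory'."""
    for root, dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # Assumption: files are UTF-8; undecodable bytes are replaced
            # instead of stopping the whole run.
            with open(path, encoding="utf-8", errors="replace") as f:
                yield f.read()

def make_word_list(texts):
    """Step 2: turn an iterable of texts into one flat list of words."""
    words = []
    for text in texts:
        words.extend(text.split())
    return words

def count_words(words):
    """Step 3: return (total, unique) counts for a list of words."""
    counts = Counter(words)  # maps each word to how often it occurs
    return len(words), len(counts)

if __name__ == "__main__":
    words = make_word_list(read_files("D:\\shakeall"))
    total, unique = count_words(words)
    print(total, unique)
```

Keeping the Counter around instead of throwing it away would also give per-word frequencies, e.g. counts.most_common(10), if the report needs them.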