filter directory in python

Posted 2019-08-04 05:24

Question:

I am trying to get a filtered list of all text and Python files, like below:

from walkdir import filtered_walk, dir_paths, all_paths, file_paths
vdir = raw_input("enter directory: ")

files = file_paths(filtered_walk(vdir, depth=0,included_files=['*.py', '*.txt']))

I want to:

  1. know the total number of files found in given directory

    I have tried options like Number_of_files = len(files) and for n in files: n = n + 1, but they all fail because "files" is a "generator" object. I looked generators up in the Python docs but couldn't work out how to use one here.

  2. I also want to search for a string, e.g. "import sys", in the files found above and store the names of the files that contain it in a new file called "found.txt"

Answer 1:

I believe this does what you want. If I misunderstood your specification, please let me know once you have tested it. I've hardcoded the directory as searchdir, so you'll have to add the prompt for it yourself.

import os

searchdir = r'C:\blabla'
searchstring = 'import sys'

def found_in_file(fname, searchstring):
    with open(fname) as infp:
        for line in infp:
            if searchstring in line:
                return True
        return False

with open('found.txt', 'w') as outfp:
    count = 0
    search_count = 0
    for root, dirs, files in os.walk(searchdir):
        for name in files:
            (base, ext) = os.path.splitext(name)
            if ext in ('.txt', '.py'):
                count += 1
                # search only the .txt/.py files that were counted
                full_name = os.path.join(root, name)
                if found_in_file(full_name, searchstring):
                    outfp.write(full_name + '\n')
                    search_count += 1

print 'total number of files found %d' % count
print 'number of files with search string %d' % search_count

Using with to open a file also closes it automatically when the block ends.



Answer 2:

A Python generator is a special kind of iterator. It yields one item after another, without knowing in advance how many items there are. You can only know the total once you have reached the end.

It should be ok, though, to do

n = 0
for item in files:
    n += 1
    do_something_with(item)
print "I had", n, "items."
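
To see why len(files) fails and why the loop above works, here is a minimal sketch. The generator function fake_file_paths is a hypothetical stand-in for the object that walkdir returns (it is not part of walkdir), and the code is written so that it also runs on Python 3.

```python
def fake_file_paths():
    # Hypothetical stand-in for the generator returned by file_paths(...)
    for name in ["a.py", "b.txt", "c.py"]:
        yield name

def count_items(iterable):
    # Walk the iterable once and count what it yields
    return sum(1 for _ in iterable)

files = fake_file_paths()
try:
    len(files)                    # generators do not support len()
    len_supported = True
except TypeError:
    len_supported = False

total = count_items(files)        # this consumes the generator
leftover = list(files)            # nothing is left after one pass

print("len() supported: %s" % len_supported)
print("I had %d items." % total)
print("items left after counting: %d" % len(leftover))
```

Counting uses the generator up, so anything else you want to do with the items has to happen in the same pass, as in the loop above.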


Answer 3:

You can think of a generator (or, more generally, an iterator) as a list that gives you one item at a time. (No, it is not actually a list.) So you cannot tell how many items it will give you unless you go through them all, because you have to take them one by one. (This is just the basic idea. You should now be able to understand the docs, and I'm sure there are lots of questions here about generators too.)

Now, for your case, you used a not-so-wrong approach:

count = 0
for filename in files:
    count += 1

What you were doing wrong was incrementing the loop variable n, but n there is the filename! Adding 1 to a string makes no sense, and it raises an exception (a TypeError) as well.

Once you have these filenames, you have to open each individual file, read it, search for your string and return the filename.

def contains(filename, match):
    with open(filename, 'r') as f:
        for line in f:
            if match in line:
                return True
    return False

match_files = []
for filename in files:
    if contains(filename, "import sys"):
        match_files.append(filename)

# or, instead of the loop above, a one-liner:
match_files = [f for f in files if contains(f, "import sys")]
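
One thing to watch with the snippets above: files is a generator, so a counting loop uses it up and a later search loop would see nothing. A possible way around this, assuming the list of names fits comfortably in memory, is to copy the names into a list first. In this sketch the temporary directory and the file names a.py and b.txt are made up for the demonstration, and a generator expression stands in for the walkdir generator.

```python
import os
import tempfile

def contains(filename, match):
    # True if any line of the file contains the search string
    with open(filename, "r") as f:
        for line in f:
            if match in line:
                return True
    return False

# Build a throwaway directory with two small files (illustration only)
demo_dir = tempfile.mkdtemp()
samples = {"a.py": "import sys\nprint(sys.argv)\n", "b.txt": "nothing here\n"}
for name, text in samples.items():
    with open(os.path.join(demo_dir, name), "w") as out:
        out.write(text)

# A generator expression stands in for the walkdir generator
files = (os.path.join(demo_dir, name) for name in sorted(samples))

file_list = list(files)           # take the names out of the generator once
count = len(file_list)            # a list supports len()
match_files = [f for f in file_list if contains(f, "import sys")]

print("total files: %d" % count)
print("matching files: %d" % len(match_files))
```

A list can be traversed as often as you like, which is what makes counting and searching possible as two separate steps.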

Now, as an example of a generator (don't read this before you read the docs):

def matching(filenames):
    for filename in filenames:
        if contains(filename, "import sys"):
            # feed the names one by one, you are not storing them in a list
            yield filename
# usage:
for f in matching(files):
    do_something_with_the_files_that_match_without_storing_them_all_in_a_list(f)


Answer 4:

You should try os.walk

import os
dir = raw_input("Enter Dir:")
files = [file for path, dirname, filenames in os.walk(dir) for file in filenames if file.endswith((".py", ".txt"))]

nfiles = len(files)
print nfiles
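
Note that os.walk descends into subdirectories, whereas the question passes depth=0, which, as far as I understand walkdir, limits the walk to the top directory. If that is what you want, a minimal sketch with os.listdir and fnmatch could look like the following. The helper name top_level_files is my own, the demo directory exists only for the example, and the code is written to run on Python 3 as well.

```python
import fnmatch
import os
import tempfile

def top_level_files(directory, patterns=("*.py", "*.txt")):
    # Files directly inside `directory` whose names match any glob pattern
    result = []
    for name in sorted(os.listdir(directory)):
        full_name = os.path.join(directory, name)
        if not os.path.isfile(full_name):
            continue              # skip subdirectories
        if any(fnmatch.fnmatch(name, p) for p in patterns):
            result.append(full_name)
    return result

# Throwaway directory for the demonstration
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "sub"))
for rel in ["a.py", "notes.txt", "image.png", os.path.join("sub", "deep.py")]:
    with open(os.path.join(demo_dir, rel), "w") as out:
        out.write("import sys\n")

found = top_level_files(demo_dir)
print(len(found))
```

Because the function returns a list, len(found) gives the count directly.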

For searching for a string in a file, see the question "Search for string in txt file Python".

Combining the two, your code would look something like this:

import os

dir = raw_input("Enter Dir:")
print "Directory %s" %(dir) 
search_str = "import sys" 
count = 0
search_count = 0
write_file = open("found.txt", "w")
for dirpath, dirnames, filenames in os.walk(dir):
    for file in filenames:
        if file.split(".")[-1] in ["py", "txt"]:
            count += 1
            print dirpath, file
            full_name = os.path.join(dirpath, file)
            # 'with' closes the file even if reading fails
            with open(full_name) as f:
                if search_str in f.read():
                    search_count += 1
                    # one file name per line
                    write_file.write(full_name + "\n")

write_file.close()
print "Number of files: %s" %(count)
print "Number of files containing string: %s" %(search_count)
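
All the code in this thread is Python 2 (raw_input, print statements). As a rough sketch of the same idea on Python 3, the version below wraps the walk in a function. The name search_tree, its parameters, and the demo files are my own choices for illustration, and errors="ignore" is an assumption about how you want to treat bytes that cannot be decoded.

```python
import os
import tempfile

def search_tree(directory, search_str, out_path, extensions=(".py", ".txt")):
    # Count files with the given extensions under `directory` and write the
    # names of those containing `search_str` to `out_path`, one per line.
    count = 0
    search_count = 0
    with open(out_path, "w") as outfp:
        for dirpath, dirnames, filenames in os.walk(directory):
            for name in filenames:
                if os.path.splitext(name)[1] not in extensions:
                    continue
                count += 1
                full_name = os.path.join(dirpath, name)
                # errors="ignore" is an assumption: skip undecodable bytes
                with open(full_name, errors="ignore") as infp:
                    if search_str in infp.read():
                        outfp.write(full_name + "\n")
                        search_count += 1
    return count, search_count

# Throwaway directory tree for the demonstration
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "sub"))
samples = {
    "a.py": "import sys\n",
    "b.txt": "no match\n",
    "c.md": "import sys\n",
    os.path.join("sub", "d.py"): "import sys\n",
}
for rel, text in samples.items():
    with open(os.path.join(demo_dir, rel), "w") as out:
        out.write(text)

# Keep the result file outside the searched tree so it is not counted itself
out_path = os.path.join(tempfile.mkdtemp(), "found.txt")
count, search_count = search_tree(demo_dir, "import sys", out_path)
print("Number of files: %d" % count)
print("Number of files containing string: %d" % search_count)
```

Writing found.txt somewhere outside the directory being searched avoids counting and reading the result file during the walk.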