I'd like to browse through the current folder and all its subfolders and get all the files with .htm|.html extensions. I have found out that it is possible to find out whether an object is a dir or file like this:
import os
dirList = os.listdir("./") # current directory
for dir in dirList:
if os.path.isdir(dir) == True:
# I don't know how to get into this dir and do the same thing here
else:
# I got file and i can regexp if it is .htm|html
and in the end, I would like to have all the files and their paths in an array. Is something like that possible?
You can use
os.walk()
to recursively iterate through a directory and all its subdirectories:To build a list of these names, you can use a list comprehension:
In python 3 you can use os.scandir():
Use
newDirName = os.path.abspath(dir)
to create a full directory path name for the subdirectory and then list its contents as you have done with the parent (i.e.newDirList = os.listDir(newDirName)
)You can create a separate method of your code snippet and call it recursively through the subdirectory structure. The first parameter is the directory pathname. This will change for each subdirectory.
This answer is based on the 3.1.1 version documentation of the Python Library. There is a good model example of this in action on page 228 of the Python 3.1.1 Library Reference (Chapter 10 - File and Directory Access). Good Luck!
I had a similar thing to work on, and this is how I did it.
Hope this helps.
Slightly altered version of Sven Marnach's solution..