I have one project where I need to apply a dozen or so regexes to about 100 files using Python. After 4+ hours of searching the web for various combinations, including "(merge|concatenate|stack|join|compile) multiple regex in python", I haven't found any posts that address my need.
This is a mid-sized project for me. There are several smaller regex projects that I need, which take only 5-6 regex patterns applied over only a dozen or so files. While these will be a great aid in my work, the granddaddy project is applying a file of 100+ search-and-replace strings to any new file I get. (Spelling conventions in certain languages are not standardized, and being able to quickly process files will increase productivity.)
Ideally, the regex strings need to be updatable by a non-programmer, but that may be outside the scope of this post.
Here is what I have so far:
import os, re, sys  # Is "sys" necessary?

path = "/Users/mypath/testData"
myfiles = os.listdir(path)

for f in myfiles:
    # Split the filename and file extension for use in renaming the output file.
    file_name, file_extension = os.path.splitext(f)
    generated_output_file = file_name + "_regex" + file_extension

    # Only process certain types of files.
    if re.search(r"^\.(txt|doc|odt|htm|html)$", file_extension):
        # Declare input and output files, open them, and start working on each line.
        input_file = os.path.join(path, f)
        output_file = os.path.join(path, generated_output_file)
        with open(input_file, "r") as fi, open(output_file, "w") as fo:
            for line in fi:
                # I realize that the examples are not regex, but they are in my real data.
                # The important thing is that each of these is a substitution.
                line = re.sub(r"dog", "cat", line)
                line = re.sub(r"123", "789", line)
                # Etc.
            # Obviously this doesn't work, because it is only writing the last instance of line.
            fo.write(line)
Is this what you're looking for?
Unfortunately, you didn't specify how you decide which regexes are supposed to be applied, so I put them into a list of tuples (the first element is the regex, the second is the replacement text).
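A minimal sketch of that list-of-tuples idea, under a few assumptions: the names `RULES`, `EXTENSIONS`, `compile_rules`, `apply_rules`, `load_rules` and `process_directory` are made up for illustration, the files are UTF-8 plain text, and the optional rules file (one `pattern<TAB>replacement` per line, so a non-programmer can edit it) is a format I invented, not something from the question. I left `.doc` and `.odt` out of the extension filter because those are not plain-text formats, so reading them line by line would not give usable results.

```python
import os
import re

# Hypothetical rule list: (pattern, replacement) pairs, applied in order.
RULES = [
    (r"dog", "cat"),
    (r"123", "789"),
]

# Assumption: only plain-text file types are processed.
EXTENSIONS = {".txt", ".htm", ".html"}


def compile_rules(rules):
    """Compile each pattern once so it can be reused for every line."""
    return [(re.compile(pattern), replacement) for pattern, replacement in rules]


def apply_rules(text, compiled_rules):
    """Apply every substitution in turn, each one to the previous result."""
    for regex, replacement in compiled_rules:
        text = regex.sub(replacement, text)
    return text


def load_rules(rules_path):
    """Read rules from a file with one 'pattern<TAB>replacement' per line.

    Blank lines, lines starting with '#', and lines without a tab are skipped.
    """
    rules = []
    with open(rules_path, encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.rstrip("\n")
            if not raw or raw.startswith("#") or "\t" not in raw:
                continue
            pattern, replacement = raw.split("\t", 1)
            rules.append((pattern, replacement))
    return rules


def process_directory(path, rules):
    """Write a '<name>_regex<ext>' copy of each matching file in path."""
    compiled = compile_rules(rules)
    for name in os.listdir(path):
        base, ext = os.path.splitext(name)
        # Skip other file types and output files from an earlier run.
        if ext.lower() not in EXTENSIONS or base.endswith("_regex"):
            continue
        src = os.path.join(path, name)
        dst = os.path.join(path, base + "_regex" + ext)
        with open(src, encoding="utf-8") as fi, \
                open(dst, "w", encoding="utf-8") as fo:
            for line in fi:
                # Write each line as soon as all rules have been applied to it.
                fo.write(apply_rules(line, compiled))
```

Calling `process_directory("/Users/mypath/testData", RULES)` would use the hard-coded list; passing `load_rules(...)` instead reads the rules from a file. Order matters here, since each rule sees the output of the one before it, so a later pattern can match text that an earlier replacement produced.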