I have a few .xy files (two columns of x and y values). I'm trying to read all of them and paste the y values into a single Excel file (the x values are the same in all these files). The code I have so far reads the files one by one, but it's extremely slow: about 20 seconds per file, and since I have quite a few .xy files the time adds up considerably. The code I have so far is:
import os,fnmatch,linecache,csv
from openpyxl import Workbook
wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Sheet1"
def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1, row_count):
            data = linecache.getline(file_name, row)
            print data.strip().split()[1]
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])
            print file_name
            wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass
workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)
Any help is appreciated. Thanks.
I think your main issue is that you're writing to Excel and saving on every single line of the file, for every single file in the directory. I'm not sure how long it takes to actually write a value to Excel, but just moving the save out of the loop and saving only once everything has been added should cut some time. Also, how large are these files? If they're massive, then linecache may be a good idea, but assuming they aren't overly large, you can probably do without it.
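For what it's worth, here's a minimal sketch of that suggestion, not your exact code: it assumes a reasonably recent openpyxl (where ws.cell(row=..., column=...) is 1-indexed), reads each file once with a plain loop instead of linecache, creates a fresh workbook per file, and saves once per file. The os.path.join(root, file_name) is an extra guess so that files os.walk finds in subdirectories also open correctly.

import os
import fnmatch
from openpyxl import Workbook

def batch_processing(file_path):
    # Sketch of the question's function: build the sheet in memory,
    # then save once per file instead of once per line.
    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    with open(file_path) as f:
        for row, line in enumerate(f, start=1):
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            ws.cell(row=row, column=1).value = float(parts[0])
            ws.cell(row=row, column=2).value = float(parts[1])
    # One save per file, after all rows have been added.
    wb.save(os.path.splitext(file_path)[0] + ".xlsx")

workingdir = r"C:\Users\Mine\Desktop\P22_PC"
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        # Join with root so files in subdirectories resolve correctly.
        batch_processing(os.path.join(root, file_name))

Creating a new Workbook inside the function also means rows from one file can't leak into the next file's spreadsheet, which could happen with the single module-level workbook if a later file has fewer rows.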