I have thousands of zipped csv files named like this:
result-20120705-181535.csv.gz
181535 means 18:15:35. Now I want to merge these files on a daily basis (I have data over a week, all named like the example above), where a day runs from 2:00 am until 2:00 am the next morning, and then move the processed files into a folder called merged.

So in the current folder I have tons of .csv.gz files, and I want to scan the names and merge everything matching 20120705-02*, 20120705-03*, ... up to 20120706-01* into 20120705-result.csv.gz, then move those 20120705-02* ... 20120706-01* files into the merged folder, and move on to the next day's data: 20120706-02* ... 20120707-01*.
I am wondering whether to use Python or a bash script to do this, and how?
This answer is completely untested, but hopefully it will give you a place to work from:
Create a text file containing something like the following lines.
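This is only a sketch: it assumes GNU date for the date arithmetic, that the script is run from the folder holding the .csv.gz files, and that simply concatenating the gzipped files is acceptable (concatenated gzip members still decompress as a single stream):

    #!/bin/bash
    # Merge one "day" of result files (02:00 on $DAY up to 02:00 the
    # following morning) into a single $DAY-result.csv.gz, then move
    # the processed files into the merged/ folder.

    DAY=$1                                   # day to process, passed as an argument, e.g. 20120705
    NEXT=$(date -d "$DAY + 1 day" +%Y%m%d)   # the following calendar day (GNU date)

    mkdir -p merged

    # Timestamps in the file names are HHMMSS, so these globs select
    # hours 02-23 of $DAY and hours 00-01 of $NEXT.
    shopt -s nullglob
    FILES=(result-$DAY-0[2-9]*.csv.gz result-$DAY-1[0-9]*.csv.gz \
           result-$DAY-2[0-3]*.csv.gz result-$NEXT-0[01]*.csv.gz)

    if [ ${#FILES[@]} -gt 0 ]; then
        # Concatenated gzip members still form one valid gzip stream,
        # so a plain cat is enough to merge the compressed CSVs.
        cat "${FILES[@]}" > "$DAY-result.csv.gz"
        mv "${FILES[@]}" merged/
    fi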
and save it with a .sh extension (say, myscript.sh). Next, in a terminal, type
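    chmod +x myscript.sh

to make the script executable. Now you can type things like

    ./myscript.sh 20120705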
which will then do as you described.
To automatically execute this on a daily basis, you can put a line in your /etc/crontab file, something like the line below (yourname and the two paths are placeholders for your own user name, data folder and script location):
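    # m  h  dom mon dow  user      command
      2  2   *   *   *   yourname  cd /folder/with/the/csv/files && /path/to/myscript.sh

This runs the script at 02:02 every day, assuming creating the last .csv.gz file takes 1 minute, plus 1 extra minute just to be sure :)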
For this way of automation to work properly, the script above needs to be modified a bit: since the cron job fires just after 02:00, it should process the day that has just ended (yesterday's date) instead of taking it as an argument. Change the two lines defining the dates to something like this (again assuming GNU date):
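    DAY=$(date -d yesterday +%Y%m%d)    # the day whose 02:00-02:00 window just ended
    NEXT=$(date +%Y%m%d)                # today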
That should do. As always, test it thoroughly before automating it!