I have 10 different subdirectories with the same file names in each directory (20 files per directory), and column 0 is the index column in each file. For example:
**DIRECTORY A**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
**DIRECTORY B**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
**DIRECTORY C**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
Each of the above files contains 6 columns, uses column 0 as the index (`index_col=0`), and has NO column headers.
**DIRECTORY FILES_MERGED** (desired output)
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
I want to merge all the files with the SAME NAME from EACH subdirectory into one file with that same name, and save the new file in a NEW subdirectory, e.g. DIRECTORY FILES_MERGED, with INDEX = column 0. The merged file should have only one index column, followed by columns 1, 2, 3, 4, 5 from each same-named file in each directory.
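A minimal sketch of what "one index column plus columns 1 to 5 from each file" looks like in pandas, using two small hypothetical in-memory frames in place of the same-named file from two directories. `pd.concat(..., axis=1)` aligns the rows on the index and places the value columns side by side:

```python
import pandas as pd

# Hypothetical stand-ins for the same-named file read from two subdirectories.
# The index plays the role of column 0 (epoch time); columns 1-5 hold the values.
df_a = pd.DataFrame(
    {1: [1.0862, 1.0862], 2: [1.0863, 1.0863], 3: [1.08578, 1.08578],
     4: [1.08578, 1.0861], 5: [25, 10]},
    index=[1451606820, 1451608800],
)
df_b = df_a * 2  # placeholder values standing in for a second directory

# Align on the shared index: a single index, then columns 1-5 from each source.
merged = pd.concat([df_a, df_b], axis=1)
print(merged.shape)  # (2, 10)
```

The labels 1 to 5 repeat once per source; passing `keys=[...]` to `pd.concat` would label each block of columns by its directory instead. This sketch assumes the epoch values within each file are unique, since concatenating along columns can fail on duplicate index values.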
I have read a CSV file into a pandas DataFrame:

```python
import pandas as pd

df = pd.read_csv(filename, sep=",", header=None, usecols=[0, 1, 2, 3, 4, 5])
```
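Since column 0 should be the index, passing `index_col=0` to `read_csv` moves it into the index directly instead of leaving it as a regular column. A small check using two rows copied from the sample data, read from an in-memory string rather than a real file:

```python
import io

import pandas as pd

# Two rows copied from the sample data; header=None because the files have no header row.
sample = io.StringIO(
    "1451606820,1.0862,1.08630,1.08578,1.08578,25\n"
    "1451608800,1.0862,1.08630,1.08578,1.08610,10\n"
)

# index_col=0 turns the epoch column into the index,
# leaving the five value columns labelled 1..5.
df = pd.read_csv(sample, header=None, index_col=0)
print(df.columns.tolist())  # [1, 2, 3, 4, 5]
print(df.index.tolist())    # [1451606820, 1451608800]
```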
Here is the format of my original DataFrame:

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 1451606820 | 1.0862 | 1.08630 | 1.08578 | 1.08578 | 25 |
| 1 | 1451608800 | 1.0862 | 1.08630 | 1.08578 | 1.08610 | 10 |
| 2 | 1451608860 | 1.0862 | 1.08620 | 1.08578 | 1.08578 | 16 |
| 3 | 1451610180 | 1.0862 | 1.08630 | 1.08578 | 1.08578 | 27 |
| 4 | 1451610480 | 1.0858 | 1.08590 | 1.08560 | 1.08578 | 21 |
| 5 | 1451610540 | 1.0857 | 1.08578 | 1.08570 | 1.08578 | 2 |
| 6 | 1451610600 | 1.0857 | 1.08578 | 1.08570 | 1.08578 | 2 |
| 7 | 1451610720 | 1.0857 | 1.08578 | 1.08570 | 1.08578 | 2 |
| 8 | 1451610780 | 1.0857 | 1.08578 | 1.08570 | 1.08578 | 2 |
- Column 0: datetime in epoch time (seconds)
- Columns 1, 2, 3, 4, 5: values
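If human-readable timestamps are wanted, the epoch column can be converted with `pd.to_datetime(..., unit="s")`. This assumes the values are seconds since 1970-01-01 UTC, which fits their 10-digit magnitude:

```python
import pandas as pd

# Epoch values copied from the sample; 10 digits means seconds, hence unit="s".
epochs = pd.Series([1451606820, 1451608800, 1451608860])

# The result is timezone-naive; the wall-clock values correspond to UTC.
timestamps = pd.to_datetime(epochs, unit="s")
print(timestamps.iloc[0])  # 2016-01-01 00:07:00
```

On a frame read with `index_col=0`, the same call can be applied to the index: `df.index = pd.to_datetime(df.index, unit="s")`.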
This can be achieved in a much simpler way in the shell:
(Note: don't use a .csv extension for the output file, as it will cause inconsistencies with `find`. After the command finishes, the file can be renamed to .csv.)
There are many ways to do this; staying in pandas, I did the following.
With the following file structure:
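A sketch of the discovery step, assuming every source directory sits directly under one root folder next to `FILES_MERGED` (the function name `group_by_filename` is made up for illustration). It maps each file name to all paths that share it:

```python
from collections import defaultdict
from pathlib import Path


def group_by_filename(root, output_dir_name="FILES_MERGED"):
    """Map each CSV file name to the paths that share it across subdirectories."""
    groups = defaultdict(list)
    # Only look one level down: root/<subdirectory>/<file>.csv
    for path in sorted(Path(root).glob("*/*.csv")):
        if path.parent.name == output_dir_name:
            continue  # skip files already written to the output directory
        groups[path.name].append(path)
    return dict(groups)
```

Each value in the returned dict is the list of same-named files that would be merged into one output file.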
This code works; it's a little verbose for the sake of explanation, but you can shorten it in your own implementation.
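A compact sketch of the whole loop, assuming the same layout (source subdirectories and `FILES_MERGED` under one root). `merge_same_named_files` is an illustrative name, not from the original answer; it writes the result without a header row to match the input files:

```python
from pathlib import Path

import pandas as pd


def merge_same_named_files(root, output_dir_name="FILES_MERGED"):
    """Merge same-named CSVs from every subdirectory of `root` side by side on column 0."""
    root = Path(root)
    out_dir = root / output_dir_name
    out_dir.mkdir(exist_ok=True)

    # Assumption: every immediate subdirectory except the output one is a source.
    sources = sorted(p for p in root.iterdir() if p.is_dir() and p.name != output_dir_name)
    names = sorted({f.name for d in sources for f in d.glob("*.csv")})

    for name in names:
        frames = []
        for d in sources:
            path = d / name
            if path.exists():
                # No header row; column 0 (epoch time) becomes the index.
                frames.append(pd.read_csv(path, header=None, index_col=0))
        # Outer join on the index: one index column, then columns 1-5 per source file.
        merged = pd.concat(frames, axis=1)
        merged.to_csv(out_dir / name, header=False)
    return out_dir
```

`pd.concat(..., axis=1)` performs an outer join by default, so timestamps missing from one file become NaN in that file's columns; passing `join="inner"` keeps only timestamps present in every file.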