Splitting csv file of multiple objects over time b

2019-07-27 22:39发布

问题:

Here I have an example file of multiple objects each measured at the same time-points (also ND.T represents each unique time point). I would like to split this file into separate files (using a python script) containing all objects unique to each time-point still containing the header.

Original file:

ID  ND.T    Time [s]    Position X [%s] Position Y [%s] Speed [%s]  Area [%s]   Width [%s]  MeanIntensity
1   1        3.87         417.57          11.46          0.06        339.48       14.1          245.65
1   2        8.72         417.37          11.68          0.04        342.61       14.15         239.34
1   3        13.39        417.57          11.66          0.04        344.17       14.3          239.48
2   1        3.87         439.01           6.59          0.02        342.61       11.66         204.47
2   2        8.72         438.97           6.65          0.007       342.61       10.7          197.96
2   3        13.39        438.94           6.66          0.03        345.74       11.03         214.74

Time_3.87.csv

ID  ND.T    Time [s]    Position X [%s] Position Y [%s] Speed [%s]  Area [%s]   Width [%s]  MeanIntensity
1   1        3.87         417.57          11.46          0.06        339.48       14.1          245.65
2   1        3.87         439.01           6.59          0.02        342.61       11.66         204.47

Time_8.72.csv

ID  ND.T    Time [s]    Position X [%s] Position Y [%s] Speed [%s]  Area [%s]   Width [%s]  MeanIntensity
1   2        8.72         417.37          11.68          0.04        342.61       14.15         239.34
2   2        8.72         438.97           6.65          0.007       342.61       10.7          197.96

Time_13.39.csv

ID  ND.T    Time [s]    Position X [%s] Position Y [%s] Speed [%s]  Area [%s]   Width [%s]  MeanIntensity
1   3        13.39        417.57          11.66          0.04        344.17       14.3          239.48
2   3        13.39        438.94           6.66          0.03        345.74       11.03         214.74

Example 2:

ID  ND.T    Time [s]    Position X [%s] Position Y [%s] Speed [%s]  Area [%s]   Width [%s]  MeanIntensity
1   1   3.87    417.57  11.46   0.06    339.48  14.1    245.65
1   2   8.72    417.37  11.68   0.04    342.61  14.15   239.34
1   3   13.39   417.57  11.66   0.04    344.17  14.3    239.48
1   4   18.1    417.73  11.71   0.04    337.92  14.14   225.17
1   5   22.81   417.83  11.89   0.03    344.17  14.64   233.3
1   6   27.48   417.69  11.83   0.02    345.74  14.23   238
1   7   32.16   417.65  11.94   0.03    345.74  14.71   230.75
2   1   3.87    439.01  6.59    0.02    342.61  11.66   204.47
2   2   8.72    438.97  6.65    0.007   342.61  10.7    197.96
2   3   13.39   438.94  6.66    0.03    345.74  11.03   214.74
2   4   18.1    438.9   6.53    0.04    342.61  10.46   202.9
2   5   22.81   438.97  6.7 0.02    342.61  10.3    194.32
2   6   27.48   438.89  6.71    0.006   350.43  11  219.41
2   7   32.16   438.87  6.74    0.05    348.87  10.36   219.58

回答1:

You can use pandas to achieve this:

import pandas as pd
df = pd.read_csv(your_file)
df.groupby('Time [s]').apply(lambda x: x.to_csv(str(x.name) + '.csv'))

The above will load your csv using read_csv and then group on the Time [s] column and use this to name the file

You can see that the df is grouped on the Time [s]:

In [108]:
df.groupby('Time [s]').apply(lambda x: print(x))
   ID  ND.T  Time [s]  Position X [%s]  Position Y [%s]  Speed [%s]  \
0   1     1      3.87           417.57            11.46        0.06   
3   2     1      3.87           439.01             6.59        0.02   

   Area [%s]  Width [%s]  MeanIntensity  
0     339.48       14.10         245.65  
3     342.61       11.66         204.47  
   ID  ND.T  Time [s]  Position X [%s]  Position Y [%s]  Speed [%s]  \
0   1     1      3.87           417.57            11.46        0.06   
3   2     1      3.87           439.01             6.59        0.02   

   Area [%s]  Width [%s]  MeanIntensity  
0     339.48       14.10         245.65  
3     342.61       11.66         204.47  
   ID  ND.T  Time [s]  Position X [%s]  Position Y [%s]  Speed [%s]  \
1   1     2      8.72           417.37            11.68       0.040   
4   2     2      8.72           438.97             6.65       0.007   

   Area [%s]  Width [%s]  MeanIntensity  
1     342.61       14.15         239.34  
4     342.61       10.70         197.96  
   ID  ND.T  Time [s]  Position X [%s]  Position Y [%s]  Speed [%s]  \
2   1     3     13.39           417.57            11.66        0.04   
5   2     3     13.39           438.94             6.66        0.03   

   Area [%s]  Width [%s]  MeanIntensity  
2     344.17       14.30         239.48  
5     345.74       11.03         214.74  

Out[108]:
Empty DataFrame
Columns: []
Index: []

Here groupby will group on 'Time [s]' column, we then call apply to apply a lambda where we call the method to_csv on each grouping, we can access the group name using name attribute which is of dtype int so we cast to str and construct our csv name:

In [109]:
df.groupby('Time [s]').apply(lambda x: print(str(x.name) + '.csv'))

3.87.csv
8.72.csv
13.39.csv
Out[109]:
Empty DataFrame
Columns: []
Index: []