Column Manipulations with date-Time Pandas

I am trying to do some column manipulations with row and column at same time including date and time series in Pandas. Traditionally with no series python dictionaries are great. But with Pandas it a new thing for me.

Input file : N number of them.

File1.csv, File2.csv, File3.csv, ........... Filen.csv 

Ids,Date-time-1    Ids,Date-time-2  Ids,Date-time-1
56,4568          645,5545         25,54165
45,464           458,546

I am trying to merge the Date-time column of all the files into a big data file with respect to Ids

Ids,Date-time-ref,Date-time-1,date-time-2
56,100,4468,NAN
45,150,314,NAN
645,50,NAN,5495
458,200,NAN,346
25,250,53915,NAN

Check for date-time column - If not matched create one and then fill the values with respect to Ids by Subtracting the current date-time value with the value of date-time-ref of that respective Ids.
Fill in empty place with NAN and if next file has that value then replace the new value with NAN

If it were straight column subtract it was pretty much easy but in sync with date-time series and with respect to Ids seems a bit confusing.

Appreciate some suggestions to begin with. Thanks in advance.

标签： python date pandas time

1条回答

手持菜刀，她持情操

2楼-- · 2019-09-05 12:18

Here is one way to do it.

import pandas as pd
import numpy as np
from StringIO import StringIO

# your csv file contents
csv_file1 = 'Ids,Date-time-1\n56,4568\n45,464\n'
csv_file2 = 'Ids,Date-time-2\n645,5545\n458,546\n'
# add a duplicated Ids record for testing purpose
csv_file3 = 'Ids,Date-time-1\n25,54165\n645, 4354\n'
csv_file_all = [csv_file1, csv_file2, csv_file3]

# read csv into df using list comprehension
# I use buffer here, replace stringIO with your file path
df_all = [pd.read_csv(StringIO(csv_file)) for csv_file in csv_file_all]


# processing
# =====================================================
# concat along axis=0, outer join on axis=1
merged = pd.concat(df_all, axis=0, ignore_index=True, join='outer').set_index('Ids')


  Out[206]: 
        Date-time-1  Date-time-2
   Ids                          
   56          4568          NaN
   45           464          NaN
   645          NaN         5545
   458          NaN          546
   25         54165          NaN
   645         4354          NaN

# custom function to handle/merge duplicates on Ids (axis=0)
def apply_func(group):
    return group.fillna(method='ffill').iloc[-1]

# remove Ids duplicates
merged_unique = merged.groupby(level='Ids').apply(apply_func)


  Out[207]: 
        Date-time-1  Date-time-2
   Ids                          
   25         54165          NaN
   45           464          NaN
   56          4568          NaN
   458          NaN          546
   645         4354         5545

# do the subtraction
master_csv_file = 'Ids,Date-time-ref\n56,100\n45,150\n645,50\n458,200\n25,250\n'
df_master = pd.read_csv(io.StringIO(master_csv_file), index_col=['Ids']).sort_index()

# select matching records and horizontal concat
df_matched = pd.concat([df_master,merged_unique.reindex(df_master.index)], axis=1)

# use broadcasting
df_matched.iloc[:, 1:] = df_matched.iloc[:, 1:].sub(df_matched.iloc[:, 0], axis=0)



   Out[208]: 
        Date-time-ref  Date-time-1  Date-time-2
   Ids                                         
   25             250        53915          NaN
   45             150          314          NaN
   56             100         4468          NaN
   458            200          NaN          346
   645             50         4304         5495

0人赞添加讨论(0) 举报

Column Manipulations with date-Time Pandas

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间