Pandas - New Row for Each Day in Date Range

2019-09-20 06:03发布

I have a Pandas df with one column (Reservation_Dt_Start) representing the start of a date range and another (Reservation_Dt_End) representing the end of a date range.

Rather than each row having a date range, I'd like to expand each row to have as many records as there are dates in the date range, with each new row representing one of those dates.

See the two pics below for an example input and the desired output.

InputOutput

The code snippet below works!! However, for every 250 rows in the input table, it takes 1 second to run. Given my input table is 120,000,000 rows in size, this code will take about one week to run.

pd.concat([pd.DataFrame({'Book_Dt': row.Book_Dt,
                         'Day_Of_Reservation': pd.date_range(row.Reservation_Dt_Start, row.Reservation_Dt_End),
                         'Pickup': row.Pickup,
                         'Dropoff' : row.Dropoff,
                         'Price': row.Price}, 

                          columns=['Book_Dt','Day_Of_Reservation', 'Pickup', 'Dropoff' , 'Price']) 
                          for i, row in df.iterrows()], ignore_index=True)

There has to be a faster way to do this. Any ideas? Thanks!

标签: python pandas
1条回答
The star\"
2楼-- · 2019-09-20 06:46

pd.concat in a loop with a large dataset gets pretty slow as it will make a copy of the frame each time and return a new dataframe. You are attempting to do this 120m times. I would try to work with this data as a simple list of tuples instead then convert to dataframe at the end.

e.g.

Given a list list = []

For each row in the dataframe:

  • get list of date range (can use pd.date_range here still) store in variable dates which is a list of dates

  • for each date in date range, add a tuple to the list list.append((row.Book_Dt, dates[i], row.Pickup, row.Dropoff, row.Price))

Finally you can convert the list of tuples to a dataframe:

df = pd.DataFrame(list, columns = ['Book_Dt', 'Day_Of_Reservation', 'Pickup', 'Dropoff', 'Price'])

查看更多
登录 后发表回答