Question:
I want to split the calendar into two-week intervals starting at 2008-May-5, or any arbitrary starting point.
So I start with several date objects:
import datetime as DT
raw = ("2010-08-01",
       "2010-06-25",
       "2010-07-01",
       "2010-07-08")
transactions = [(DT.datetime.strptime(datestring, "%Y-%m-%d").date(),
                 "Some data here") for datestring in raw]
transactions.sort()
By analyzing the dates manually, I can work out which dates fall within the same fortnight interval. I want to get a grouping similar to this one:
# Fortnight interval 1
(datetime.date(2010, 6, 25), 'Some data here')
(datetime.date(2010, 7, 1), 'Some data here')
(datetime.date(2010, 7, 8), 'Some data here')
# Fortnight interval 2
(datetime.date(2010, 8, 1), 'Some data here')
Answer 1:
import datetime as DT
import itertools

start_date = DT.date(2008, 5, 5)

def mkdate(datestring):
    return DT.datetime.strptime(datestring, "%Y-%m-%d").date()

def fortnight(date):
    # Number of whole 14-day periods elapsed since start_date
    return (date - start_date).days // 14

raw = ("2010-08-01",
       "2010-06-25",
       "2010-07-01",
       "2010-07-08")
transactions = [(date, "Some data") for date in map(mkdate, raw)]
# groupby only merges adjacent items, so sort by date first
transactions.sort(key=lambda t: t[0])
for key, grp in itertools.groupby(transactions, key=lambda t: fortnight(t[0])):
    print(key, list(grp))
yields
# 55 [(datetime.date(2010, 6, 25), 'Some data')]
# 56 [(datetime.date(2010, 7, 1), 'Some data'), (datetime.date(2010, 7, 8), 'Some data')]
# 58 [(datetime.date(2010, 8, 1), 'Some data')]
Note that 2010-6-25 is in the 55th fortnight from 2008-5-5, while 2010-7-1 is in the 56th. If you want them grouped together, simply change start_date (to something like 2008-5-16).
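To check the effect of moving the origin, here is a minimal sketch that recomputes the fortnight index for the same four dates with the start set to 2008-5-16. The helper name fortnight_index and its origin parameter are made up for this illustration; they are not part of the answer's code.

```python
import datetime as DT

def fortnight_index(date, origin):
    """Number of whole 14-day periods between origin and date."""
    return (date - origin).days // 14

dates = [DT.date(2010, 6, 25), DT.date(2010, 7, 1),
         DT.date(2010, 7, 8), DT.date(2010, 8, 1)]

# Hypothetical shifted origin, taken from the suggestion above
shifted_origin = DT.date(2008, 5, 16)
indices = [fortnight_index(d, shifted_origin) for d in dates]
print(indices)  # [55, 55, 55, 57]
```

With this origin the first three dates share index 55 and 2010-8-1 lands in 57, which matches the grouping asked for in the question.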
PS. The key tool used above is itertools.groupby, which is explained in detail here.
Edit: The lambdas are simply a way to make "anonymous" functions. (They are anonymous in the sense that they are not given names like functions defined by def.) Anywhere you see a lambda, it is also possible to use a def to create an equivalent function. For example, you could do this:
import operator

# Sort by the first tuple element (the date)
transactions.sort(key=operator.itemgetter(0))

def transaction_fortnight(transaction):
    date, data = transaction
    return fortnight(date)

for key, grp in itertools.groupby(transactions, key=transaction_fortnight):
    print(key, list(grp))
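The keys produced by groupby are just integer indices. If you also want the calendar dates each fortnight covers, you can convert an index back into a date range. This is a rough sketch under the same 2008-5-5 origin; fortnight_bounds and PERIOD are hypothetical names introduced here, not part of the code above.

```python
import datetime as DT

start_date = DT.date(2008, 5, 5)
PERIOD = DT.timedelta(days=14)

def fortnight_bounds(index):
    """Return (first_day, last_day) of the fortnight with the given index."""
    first_day = start_date + index * PERIOD
    # The period is 14 days long, so the last day is 13 days after the first
    last_day = first_day + PERIOD - DT.timedelta(days=1)
    return first_day, last_day

print(fortnight_bounds(55))
# (datetime.date(2010, 6, 14), datetime.date(2010, 6, 27))
```

Both bounds are inclusive, so fortnight 56 begins on 2010-6-28, the day after fortnight 55 ends.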
Answer 2:
Use itertools.groupby with a lambda function that divides the distance from the starting point by the length of the period.
>>> from itertools import groupby
>>> for i, group in groupby(range(30), lambda x: x // 7):
...     print(list(group))
...
[0, 1, 2, 3, 4, 5, 6]
[7, 8, 9, 10, 11, 12, 13]
[14, 15, 16, 17, 18, 19, 20]
[21, 22, 23, 24, 25, 26, 27]
[28, 29]
So with dates:
import datetime as DT
import itertools as it

start = DT.date(2008, 5, 5)
lenperiod = 14  # days

# transactions must already be sorted by date
for fnight, info in it.groupby(transactions, lambda data: (data[0] - start).days // lenperiod):
    print(list(info))
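One thing to watch: groupby only merges items that are next to each other, so the transactions have to be sorted by date before grouping. A small sketch of what goes wrong otherwise (period_of and the sample list are made up for the illustration):

```python
import datetime as DT
from itertools import groupby

start = DT.date(2008, 5, 5)
lenperiod = 14  # days

def period_of(transaction):
    """Fortnight index of a (date, data) tuple."""
    return (transaction[0] - start).days // lenperiod

# Hypothetical sample data, deliberately out of date order
unsorted = [(DT.date(2010, 7, 1), "c"),
            (DT.date(2010, 8, 1), "a"),
            (DT.date(2010, 7, 8), "d")]

keys_unsorted = [key for key, _ in groupby(unsorted, period_of)]
keys_sorted = [key for key, _ in groupby(sorted(unsorted), period_of)]
print(keys_unsorted)  # [56, 58, 56]
print(keys_sorted)    # [56, 58]
```

In the unsorted case fortnight 56 shows up twice because its two members are separated by the 2010-8-1 entry.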
You can also use week numbers from strftime, with lenperiod given as a number of weeks (2 for a fortnight):
lenperiod = 2  # weeks
for fnight, info in it.groupby(transactions, lambda data: int(data[0].strftime('%W')) // lenperiod):
    print(list(info))
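Be aware that %W counts weeks within the calendar year (Monday as the first day of the week) and starts again from 0 each January, so periods built this way are tied to the year rather than to your chosen starting point. A quick sketch of the effect at a year boundary; week_fortnight is a hypothetical helper name:

```python
import datetime as DT

def week_fortnight(date, weeks_per_period=2):
    """Period index derived from the %W week-of-year number."""
    return int(date.strftime("%W")) // weeks_per_period

new_years_eve = DT.date(2010, 12, 31)
new_years_day = DT.date(2011, 1, 1)
print(week_fortnight(new_years_eve))  # 26
print(week_fortnight(new_years_day))  # 0
```

Two consecutive days end up in different periods, and the key 0 would also repeat for early January of every other year, so this variant is only safe when all the data falls within one calendar year.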
Answer 3:
Using a pandas DataFrame with resample works too. This uses the OP's data, but with "Some data here" replaced by the letters of 'abcd'.
>>> import datetime as DT
>>> raw = ("2010-08-01",
... "2010-06-25",
... "2010-07-01",
... "2010-07-08")
>>> transactions = [(DT.datetime.strptime(datestring, "%Y-%m-%d"), data) for
...                 datestring, data in zip(raw, 'abcd')]
>>> transactions
[(datetime.datetime(2010, 8, 1, 0, 0), 'a'),
 (datetime.datetime(2010, 6, 25, 0, 0), 'b'),
 (datetime.datetime(2010, 7, 1, 0, 0), 'c'),
 (datetime.datetime(2010, 7, 8, 0, 0), 'd')]
Now try using pandas. First create a DataFrame, naming the columns and setting the index to the dates.
>>> import pandas as pd
>>> df = pd.DataFrame(transactions,
...                   columns=['date', 'data']).set_index('date')
>>> df
           data
date
2010-08-01    a
2010-06-25    b
2010-07-01    c
2010-07-08    d
Now use the offset alias '2W-SUN' (see Offset Aliases in the pandas time series docs) to resample into two-week bins anchored on Sundays, and use sum to concatenate the strings in each bin.
>>> fortnight = df.resample('2W-SUN').sum()
>>> fortnight
           data
date
2010-06-27    b
2010-07-11   cd
2010-07-25    0
2010-08-08    a
Now drill into the data as needed, by date label (the Sunday that closes each bin)
>>> fortnight.loc['2010-06-27']['data']
'b'
or by position
>>> fortnight.iloc[0]['data']
'b'
or by a slice of positions
>>> data = fortnight.iloc[:2]['data']
>>> data
date
2010-06-27     b
2010-07-11    cd
Freq: 2W-SUN, Name: data, dtype: object
>>> data.iloc[0]
'b'
>>> data.iloc[1]
'cd'