Filter a DataFrame in pandas by a date column

Posted 2019-10-29 13:03

The data is available at the following link: http://www.fdic.gov/bank/individual/failed/banklist.html

I want to keep only the banks that closed in 2017. How can I do this in pandas?

import pandas as pd

failed_banks = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
failed_banks[0]

Which lines of code should I add after this to extract the desired result?

Answer 1:

Ideally, you would use:

# assuming pandas successfully parsed this column as datetime object
# and pandas version >= 0.16
failed_banks= pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')[0]
failed_banks = failed_banks[failed_banks['Closing Date'].dt.year == 2017]
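Whether the .dt accessor is available depends on the dtype of the column. The following is a minimal sketch of how one might check this before filtering; the two date strings are made-up sample values, and the 'October 13, 2017' layout is an assumption about how the dates are written:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Made-up sample values standing in for the 'Closing Date' column
closing = pd.Series(['October 13, 2017', 'September 23, 2016'], name='Closing Date')

# A column of plain strings is not a datetime column, so .dt would fail on it
print(is_datetime64_any_dtype(closing))

# After an explicit conversion the column is datetime64 and .dt.year works
converted = pd.to_datetime(closing, format='%B %d, %Y')
print(is_datetime64_any_dtype(converted))
print(converted.dt.year.tolist())
```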

But pandas does not parse Closing Date as a datetime column automatically, so we need to parse it ourselves:

failed_banks = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')[0]

def closed_in_2017(date_str):
    # The year is the text after the last ', ', e.g. 'October 13, 2017' -> '2017'
    return int(date_str.split(', ')[-1]) == 2017

failed_banks = failed_banks[failed_banks['Closing Date'].apply(closed_in_2017)]
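As an alternative to splitting the string by hand, the column can be converted once with pd.to_datetime and then filtered with .dt.year. This is only a sketch: it assumes the dates are written like 'October 13, 2017', and the three rows below are invented for illustration rather than taken from the FDIC table.

```python
import pandas as pd

# Hypothetical sample rows standing in for the FDIC table
sample = pd.DataFrame({
    'Bank Name': ['Bank A', 'Bank B', 'Bank C'],
    'Closing Date': ['October 13, 2017', 'May 26, 2017', 'September 23, 2016'],
})

# Convert the text column to datetime64 using an explicit format
sample['Closing Date'] = pd.to_datetime(sample['Closing Date'], format='%B %d, %Y')

# Keep only the rows whose closing year is 2017
closed_2017 = sample[sample['Closing Date'].dt.year == 2017]
print(closed_2017)
```

If the real table uses a different date layout, the format string would need to be adjusted accordingly.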


Answer 2:

Something like this should work.

Extract the closing year.

# using pd.to_datetime (yields integer years)
closing_year = pd.to_datetime(failed_banks[0]['Closing Date']).dt.year
# or by splitting the string (yields string years such as '2017')
closing_year = failed_banks[0]['Closing Date'].apply(lambda x: x.split(', ')[1])

Then select.

# astype(int) makes the comparison work whether closing_year holds ints or strings
failed_banks[0][closing_year.astype(int) == 2017]
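The two ways of extracting the year produce different types, which matters for the comparison in the selection step. A small sketch, using made-up date strings and assuming the 'Month day, year' layout, illustrates the difference:

```python
import pandas as pd

# Made-up date strings for illustration
dates = pd.Series(['October 13, 2017', 'September 23, 2016'])

# Route 1: parse to datetime, giving integer years
year_as_int = pd.to_datetime(dates, format='%B %d, %Y').dt.year

# Route 2: split the string, giving string years
year_as_str = dates.str.split(', ').str[1]

# Each mask must compare against a value of the matching type
int_mask = year_as_int == 2017
str_mask = year_as_str == '2017'

# Comparing integer years against the string '2017' matches nothing
mismatched = year_as_int == '2017'

print(int_mask.tolist(), str_mask.tolist(), mismatched.tolist())
```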


Source: filter dataframe in pandas by a date column