Pandas slicing excluding the end

Posted 2020-08-10 19:38

Question:

When slicing a dataframe using loc,

df.loc[start:end]

both start and end are included. Is there an easy way to exclude the end when using loc?

Answer 1:

loc includes both the start and the end label. One less-than-ideal workaround is to look up each label's integer position with get_loc and slice with iloc instead, since iloc excludes the end position (this assumes the index has no duplicates):

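import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# get_loc gives each label's integer position; the iloc slice stops before 'c'
df.iloc[df.index.get_loc('a'):df.index.get_loc('c')]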

#   A
#a  1
#b  2

df.loc['a':'c']

#   A
#a  1
#b  2
#c  3
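
As a sketch, the same get_loc/iloc idea can be wrapped in a small helper. The name loc_exclude_end is hypothetical, not a pandas API. Like the answer, it assumes both labels exist in a unique index; it raises on a non-unique index because get_loc can then return a slice or boolean mask instead of a single integer position.

```python
import pandas as pd


def loc_exclude_end(df: pd.DataFrame, start, end) -> pd.DataFrame:
    """Hypothetical helper: label-based slice that excludes the `end` label.

    Assumes `start` and `end` are both present in a unique index.
    """
    if not df.index.is_unique:
        raise ValueError("index must be unique so get_loc returns a single position")
    # On a unique index, get_loc returns the integer position of the label
    i = df.index.get_loc(start)
    j = df.index.get_loc(end)
    # iloc slices are half-open, so the row at position j (the end label) is left out
    return df.iloc[i:j]


df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])
result = loc_exclude_end(df, 'a', 'c')
print(result)
```

This returns the same rows as the get_loc/iloc expression above ('a' and 'b').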


Answer 2:

The easiest approach I can think of is df.loc[start:end].iloc[:-1].

This simply chops off the last row of the inclusive slice.
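
A quick check of this trick, reusing the small frame from Answer 1. It assumes a sorted index, because label slicing with a bound that is missing from the index relies on the index being sorted. The second case shows the limitation: when end is not an index label, the inclusive slice already stops before it, so iloc[:-1] drops a row that should have been kept.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# End label present: the inclusive slice ends on 'c', so dropping the last row excludes it
present = df.loc['a':'c'].iloc[:-1]

# End label absent ('bb' sorts between 'b' and 'c'): the inclusive slice already
# stops at 'b', so iloc[:-1] also drops 'b', which should have been kept
absent = df.loc['a':'bb'].iloc[:-1]

print(list(present.index))  # ['a', 'b']
print(list(absent.index))   # ['a']
```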



Answer 3:

None of the answers address the situation where end is not part of the index. The more general solution is simply to compare the index with start and end; that way you can make either bound inclusive or exclusive.

df[(df.index >= start) & (df.index < end)]

For instance:

>>> import pandas as pd
>>> import numpy as np

>>> df = pd.DataFrame(
...     {
...         "x": np.arange(48),
...         "y": np.arange(48) * 2,
...     },
...     index=pd.date_range("2020-01-01 00:00:00", freq="1H", periods=48)
... )

>>> start = "2020-01-01 14:00"
>>> end = "2020-01-01 19:30" # this is not in the index

>>> df[(df.index >= start) & (df.index < end)]

                    x   y
2020-01-01 14:00:00 14  28
2020-01-01 15:00:00 15  30
2020-01-01 16:00:00 16  32
2020-01-01 17:00:00 17  34
2020-01-01 18:00:00 18  36
2020-01-01 19:00:00 19  38
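
A minimal sketch of the same comparison idea with each bound switchable between inclusive and exclusive. The helper name slice_between and its include_start/include_end parameters are hypothetical, not part of pandas. It builds the index with pd.offsets.Hour() rather than the "1H" string alias used above, since newer pandas versions deprecate the uppercase "H" alias. Because it compares values instead of looking labels up, start and end do not need to be in the index.

```python
import numpy as np
import pandas as pd


def slice_between(df, start, end, include_start=True, include_end=False):
    """Hypothetical helper: select rows whose index lies between start and end.

    Each bound can be inclusive or exclusive. Because values are compared
    rather than labels looked up, start and end need not be in the index.
    """
    idx = df.index
    lower = (idx >= start) if include_start else (idx > start)
    upper = (idx <= end) if include_end else (idx < end)
    return df[lower & upper]


df = pd.DataFrame(
    {"x": np.arange(48), "y": np.arange(48) * 2},
    index=pd.date_range("2020-01-01 00:00:00", periods=48, freq=pd.offsets.Hour()),
)

# Half-open [start, end): same rows as the example above (14:00 through 19:00)
half_open = slice_between(df, "2020-01-01 14:00", "2020-01-01 19:30")

# Both bounds exclusive, with an end that *is* in the index: 15:00 through 18:00
open_interval = slice_between(
    df, "2020-01-01 14:00", "2020-01-01 19:00", include_start=False
)

print(half_open["x"].tolist())      # [14, 15, 16, 17, 18, 19]
print(open_interval["x"].tolist())  # [15, 16, 17, 18]
```

Unlike the get_loc/iloc approach, this mask does not require a unique index, though it does scan the whole index on every call.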