`DataFrame.resample()` works only with time-series data. I cannot find a way to get every nth row from non-time-series data. What is the best method?
I'd use `iloc`, which takes a row/column slice, both based on integer position and following normal Python slice syntax.
I had a similar requirement, but I wanted the nth item in a particular group. This is how I solved it.
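A minimal sketch of both ideas, assuming a small hypothetical DataFrame `df` with a `group` column. `iloc[::3]` takes every 3rd row by position. The group-wise code from the original answer was not preserved, so this sketch swaps in `GroupBy.nth`, which is one standard way to pick the nth row (0-based) within each group.

```python
import pandas as pd

# Hypothetical example data: a non-time-series frame with a column to group on.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "b"],
    "value": [10, 11, 12, 20, 21, 22, 23],
})

# Every 3rd row by position (rows 0, 3, 6), using a step slice on .iloc.
every_third = df.iloc[::3]

# The same selection with an explicit column slice: every 3rd row, all columns.
every_third_all_cols = df.iloc[::3, :]

# The 2nd row (n=1, 0-based) within each group, via GroupBy.nth (swapped-in technique).
second_in_group = df.groupby("group").nth(1)

print(every_third)
print(second_in_group)
```

In pandas 2.0 and later, `nth` acts as a filter and returns the matching rows with their original index labels; older versions index the result by the group key instead.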
There is an even simpler solution than the accepted answer that involves directly invoking `df.__getitem__`. For example, to get every 2nd row, you can do `df[::2]`.
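A short sketch of the `__getitem__` approach, assuming a hypothetical frame with a string index (so `resample()` does not apply). A plain slice inside `[]` selects rows by position, so `df[::2]` matches `df.iloc[::2]`.

```python
import pandas as pd

# Hypothetical frame with a non-numeric index.
df = pd.DataFrame({"x": range(6)}, index=list("abcdef"))

# df[::2] calls df.__getitem__(slice(None, None, 2)): every 2nd row by position.
every_other = df[::2]

# Calling __getitem__ explicitly gives the same result.
explicit = df.__getitem__(slice(None, None, 2))

# Equivalent positional selection with iloc.
via_iloc = df.iloc[::2]

print(every_other)
```

Note that a single label inside `[]` selects a column rather than a row; `iloc` makes the positional intent explicit when readability matters.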
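There's also `GroupBy.first`/`GroupBy.head`, where you group on the index. The index is floor-divided by the stride (2, in this case), so each group is one block of consecutive rows. If the index is non-numeric, group on the row positions (for example `np.arange(len(df)) // 2`) instead.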
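A minimal sketch of the grouping approach, assuming hypothetical frames with a default integer index and with a string index. `first()` labels the result by group key, while `head(1)` keeps the original index labels.

```python
import numpy as np
import pandas as pd

stride = 2

# Hypothetical frame with the default RangeIndex 0..5.
df = pd.DataFrame({"x": [10, 11, 12, 13, 14, 15]})

# Floor-divide the index by the stride: keys 0,0,1,1,2,2.
# first() then returns one row per block, i.e. the rows at positions 0, 2, 4.
by_first = df.groupby(df.index // stride).first()

# head(1) keeps the first row of each block and preserves the original labels.
by_head = df.groupby(df.index // stride).head(1)

# Hypothetical frame with a non-numeric index: build the keys from positions instead.
df_str = pd.DataFrame({"x": [10, 11, 12, 13, 14, 15]}, index=list("abcdef"))
by_position = df_str.groupby(np.arange(len(df_str)) // stride).first()

print(by_first)
print(by_head)
```

One caveat: `first()` takes the first non-null value per column, so with missing data it can combine values from different rows of a block; `head(1)` always returns an actual row.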
Though @chrisb's accepted answer does answer the question, I would like to add to it the following.
A simple method I use to select every nth row, or to drop every nth row, is to filter with a boolean mask built from the index modulo n. This arithmetic-based sampling also makes more complex row selections possible.
This assumes, of course, that the DataFrame has an index of ordered, consecutive integers starting at 0, such as the default RangeIndex.
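A sketch of the modulo approach, assuming a hypothetical frame with the default RangeIndex. The last part shows that, when the index does not start at 0, the same arithmetic on row positions (`np.arange(len(df))`) keeps the intended rows, while arithmetic on the labels does not.

```python
import numpy as np
import pandas as pd

n = 3

# Hypothetical frame with the default RangeIndex (consecutive integers from 0).
df = pd.DataFrame({"x": range(10)})

keep_every_nth = df[df.index % n == 0]   # keeps rows 0, 3, 6, 9
drop_every_nth = df[df.index % n != 0]   # drops rows 0, 3, 6, 9

# A more complex pattern: the first two rows of every block of 5.
two_of_five = df[df.index % 5 < 2]       # rows 0, 1, 5, 6

# Hypothetical frame whose index starts at 100, breaking the assumption.
shifted = pd.DataFrame({"x": range(10)}, index=range(100, 110))
by_label = shifted[shifted.index % n == 0]                   # labels 102, 105, 108
by_position = shifted[np.arange(len(shifted)) % n == 0]      # positions 0, 3, 6, 9
```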