How to slice a dataframe by selecting a range of c

2019-04-28 17:06发布

问题:

This is a follow-up question of the question I asked here. There I learned a) how to do this for columns (see below) and b) that the selection of rows and columns seems to be quite differently handled in R which means that I cannot use the same approach for rows.

So suppose I have a pandas dataframe like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                  columns=['c' + str(i) for i in range(6)],
                  index=["r" + str(i) for i in range(6)])

    c0  c1  c2  c3  c4  c5
r0   4   2   3   9   9   0
r1   9   0   8   1   7   5
r2   2   6   7   5   4   7
r3   6   9   9   1   3   4
r4   1   1   1   3   0   3
r5   0   8   5   8   2   9

then I can easily select rows and columns by their names like this:

print df.loc['r3':'r5', 'c1':'c4']

which returns

    c1  c2  c3  c4
r3   9   9   1   3
r4   1   1   3   0
r5   8   5   8   2

How would I do this in R? Given a dataframe like this

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

   c1 c2 c3 c4 c5 c6
r1  1  2  3  4  5  6
r2  2  3  4  5  6  7
r3  3  4  5  6  7  8
r4  4  5  6  7  8  9
r5  5  6  7  8  9 10
r6  6  7  8  9 10 11

Apparently, if I know the indexes of my desired rows/columns, I can simply do:

df[3:5, 1:4]

but I might delete rows/columns throughout my analysis so that I would rather select by name than by index. From the link above I learned that for columns the following would work:

subset(df, select=c1:c4)

which returns

  c1 c2 c3 c4
r1  1  2  3  4
r2  2  3  4  5
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
r6  6  7  8  9

but how could I also select a range of rows by name at the same time?

In this particular case I could of course use grep but how about columns that have arbitrary names?

And I don't want to use

df[c('r3', 'r4' 'r5'), c('c1','c2', 'c3', 'c4')]

but an actual slice.

回答1:

You can use which() with rownames:

subset(df[which(rownames(df)=='r3'):which(rownames(df)=='r5'),], select=c1:c4)


   c1 c2 c3 c4
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8


回答2:

You can write a function that will kinda give you the same behavior

'%:%' <- function(object, range) {
  FUN <- if (!is.null(dim(object))) {
    if (is.matrix(object)) colnames else names
  } else identity
  wh <- if (is.numeric(range)) range else which(FUN(object) %in% range)
  FUN(object)[seq(wh[1], wh[2])]
}

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

Use it like

df %:% c('c2', 'c4')
# [1] "c2" "c3" "c4"

rownames(df) %:% c('r2', 'r4')
# [1] "r2" "r3" "r4"

For your question

df[rownames(df) %:% c('r3', 'r5'), df %:% c('c1', 'c5')]
#    c1 c2 c3 c4 c5
# r3  3  4  5  6  7
# r4  4  5  6  7  8
# r5  5  6  7  8  9


回答3:

Use match to find the position of specific row names.

df[match("r3", rownames(df)):match("r5", rownames(df)), match("c1", colnames(df)):match("c4", colnames(df))]

   c1 c2 c3 c4
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8