In a pandas dataframe created like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                  columns=['c' + str(i) for i in range(6)],
                  index=["r" + str(i) for i in range(6)])
which could look as follows:
    c0  c1  c2  c3  c4  c5
r0   2   7   3   3   2   8
r1   6   9   6   7   9   1
r2   4   0   9   8   4   2
r3   9   0   4   3   5   4
r4   7   6   8   8   0   8
r5   0   6   1   8   2   2
I can easily select certain rows and/or a range of columns using .loc:
print(df.loc[['r1', 'r5'], 'c1':'c4'])
That would return:
    c1  c2  c3  c4
r1   9   6   7   9
r5   6   1   8   2
So I can select particular rows/columns with a list, and a range of rows/columns with a colon.
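To make the pandas side concrete, here is a small self-contained sketch of that behaviour, assuming Python 3 and a recent pandas/NumPy. The seed and the variable names (picked, trimmed, picked_after_drop, row_range) are my own choices for illustration. It shows that a label slice includes both endpoints and keeps working after a column is dropped, which is the property I am after:

```python
import numpy as np
import pandas as pd

# Seeded generator so the frame is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(10, size=(6, 6)),
                  columns=['c' + str(i) for i in range(6)],
                  index=['r' + str(i) for i in range(6)])

# A list of row labels plus a label slice of columns.
# Unlike positional slices, both endpoints of a label slice are included.
picked = df.loc[['r1', 'r5'], 'c1':'c4']
print(picked)

# The same label slice still works after a column in between is dropped,
# whereas the integer positions of the remaining columns would shift.
trimmed = df.drop(columns='c2')
picked_after_drop = trimmed.loc[['r1', 'r5'], 'c1':'c4']
print(picked_after_drop)

# A label slice over rows behaves the same way.
row_range = df.loc['r1':'r3', ['c0', 'c5']]
print(row_range)
```

The values printed depend on the random draw, but the selected labels do not: the first result has columns c1 to c4, the second has c1, c3 and c4, and the third has rows r1 to r3.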
How would one do this in R? In the answers here and here, the desired range of columns is always specified by index; I could not find a way to access the columns by name. To give an example:
df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
The command
df[c('r1', 'r5'),'c1':'c4']
does not work and throws an error. The only thing that worked for me is
df[c('r1', 'r5'), 1:4]
which returns
   c1 c2 c3 c4
r1  1  2  3  4
r5  5  6  7  8
But how would I select the columns by name rather than by index (which might matter when I drop certain columns during the analysis)? In this particular case I could of course use grep, but what about columns that have arbitrary names?
So I don't want to use
df[c('r1', 'r5'),c('c1','c2', 'c3', 'c4')]
but an actual slice.
EDIT:
A follow-up question can be found here.
A solution using the dplyr package is possible, but you need to specify the rows you want to select beforehand.
An alternative approach to subset, if you don't mind working with data.table, would be:
This still does not solve the problem of subsetting a row range, though.
This seems way too easy so perhaps I'm doing something wrong.
It looks like you can accomplish this with subset:
If you want to subset by a range of row names, this hack would do:
Adding onto @evan058's answer:
But note, the : operator will probably not work here; you will have to write out the name of each row you want to include explicitly. It might be easier to group by a particular value of one of your other columns or to create an index column, as @evan058 mentioned in the comments.