How can I use the row.names attribute to order the

Posted 2019-03-14 17:02

Question:

I created a random forest and predicted the classes of my test set, which are living happily in a dataframe:

row.names   class  
564028      1
275747      1
601137      0
922930      1
481988      1
...

The row.names attribute tells me which row is which, before I did various operations that scrambled the order of the rows during the process. So far so good.

Now I would like to get a general feel for the accuracy of my predictions. To do this, I need to take this dataframe and reorder it in ascending order according to the row.names attribute. This way, I can compare the observations, row-wise, to the labels, which I already know.

Forgive me for asking such a basic question, but for the life of me, I can't find a good source of information regarding how to do such a trivial task.

The documentation implores me to:

use attr(x, "row.names") if you need to retrieve an integer-valued set of row names.

but this leaves me with nothing but NULL.

My question is, how can I use row.names which has been loyally following me around in the various incarnations of dataframes throughout my workflow? Isn't this what it is there for?

Answer 1:

This worked for me:

new_df <- df[ order(row.names(df)), ]


Answer 2:

None of the solutions would actually work. It should be:

df[ order(as.numeric(row.names(df))),] #assuming the data frame is called df

Row names in R are character strings, so when the as.numeric part is missing, the rows are arranged as 1, 10, 11, and so on.
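
To illustrate the difference, here is a small sketch. The data frame, its row names, and the variable names `by_string` and `by_number` are made up for this example and are not taken from the question:

```r
# Hypothetical data frame whose row names are numeric-looking strings
df <- data.frame(class = c(1, 0, 1, 1),
                 row.names = c("10", "9", "100", "2"))

# Character ordering: the strings are compared, so "10" and "100" precede "2"
by_string <- df[order(row.names(df)), , drop = FALSE]

# Numeric ordering: convert the row names to numbers before ordering
by_number <- df[order(as.numeric(row.names(df))), , drop = FALSE]

rownames(by_string)  # "10"  "100" "2"   "9"
rownames(by_number)  # "2"   "9"   "10"  "100"
```

The `drop = FALSE` is only there because this toy data frame has a single column; it keeps the result a data frame.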



Answer 3:

For completeness:

@BondedDust's answer works perfectly for the rownames attribute, but your example does not use the rownames attribute. The output provided in your question indicates use of a column named "row.names", which isn't the same thing (all listed in @BondedDust's comment). Here is the answer if you wish to sort by the "row.names" column in the example given in your question (there is another posting on this, located here). This answer assumes you are using a dataframe named "df", with one column named "row.names":

ordered.df <- df[order(df$row.names),]   #this orders the df by the "row.names" column

Alternatively, to order by the first column (same thing if you're still using your example):

ordered.df <- df[order(df[,1]),]         #this orders the df by the first column
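
As a rough check of which case applies, the sketch below builds a hypothetical data frame in which "row.names" is an ordinary column (the values are borrowed from the question's output; the temporary column name `id` is an assumption made only to construct the example):

```r
# Hypothetical data in which "row.names" is an ordinary column
df <- data.frame(id = c(564028, 275747, 601137), class = c(1, 1, 0))
names(df)[1] <- "row.names"

"row.names" %in% names(df)   # TRUE: it is a column
rownames(df)                 # "1" "2" "3": the actual row names attribute

# Order by the column, as in the answer above
ordered.df <- df[order(df$row.names), ]
ordered.df$row.names         # 275747 564028 601137
```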

Hope this is helpful!



Answer 4:

This will be done almost automatically since the "[" function will display in lexical order of any vector that can be matched to rownames():

df[ rownames(df) , ]

You might have thought it would be necessary to use:

df[ order(rownames(df)) , ]

But that would have given you an ordering of 1:100 of 1,10,100, 12,13, ...,2,20,21, ... , because the argument to "[" gets coerced to character.



Answer 5:

Assuming your data frame is named 'df', you can create a new ordered data frame 'ord.df' that contains the row names of df as well as its values with the following line of code:

ord.df <- cbind(rownames(df)[order(rownames(df))], df[order(rownames(df)), ])


Answer 6:

new_df <- df[ order(row.names(df)), ]  

or something similar won't work. After this statement, new_df no longer has row names. I guess a better solution is to add the row names as a column, sort by that column, and then set it as the row names again.
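
A minimal sketch of the approach described here, assuming the row names are numeric-looking strings. The data and the helper column name `id` are hypothetical:

```r
# Hypothetical data frame with numeric-looking row names
df <- data.frame(class = c(1, 0, 1), row.names = c("30", "4", "200"))

df$id <- as.numeric(rownames(df))   # copy the row names into a helper column
df <- df[order(df$id), ]            # sort by that column
rownames(df) <- df$id               # set the row names again from the column
df$id <- NULL                       # drop the helper column

rownames(df)  # "4"   "30"  "200"
```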



Answer 7:

If you have only one column in your dataframe, as in my case, you have to add drop = FALSE (drop=F for short):

df[ order(rownames(df)) , , drop = FALSE]
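
To show why this matters, here is a short sketch with a made-up single-column data frame (the names `as_vector` and `as_frame` are chosen only for this example):

```r
# Hypothetical single-column data frame
df <- data.frame(class = c(1, 0, 1), row.names = c("b", "c", "a"))

as_vector <- df[order(rownames(df)), ]                # collapses to a plain vector
as_frame  <- df[order(rownames(df)), , drop = FALSE]  # stays a data frame

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
rownames(as_frame)        # "a" "b" "c"
```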


Answer 8:

You can simply sort your df using this:

df <- df[sort(rownames(df)),]

and then do what you want!