I wish to order a data frame based on different columns, one at a time. I have a character vector with the relevant column names on which the order should be based:
parameter <- c("market_value_LOCAL", "ep", "book_price", "sales_price", "dividend_yield",
"beta", "TOTAL_RATING_SCORE", "ENVIRONMENT", "SOCIAL", "GOVERNANCE")
I wish to loop over the names in parameter and dynamically select the column to be used to order my data:
Q1_R1000_parameter <- Q1_R1000[order(Q1_R1000$parameter[X]), ]
where X is 1:10 (because I have 10 items in parameter).
To make my example reproducible, consider the data set mtcars and some variable names stored in a character vector cols. When I try to select a variable from mtcars using a dynamic subset of cols, in a similar way as above (Q1_R1000$parameter[X]), the column is not selected:
cols <- c("cyl", "am")
mtcars$cols[1]
# NULL
If you want to select a column with a specific name, index the data frame by that name. You can run this in a loop as well.
The reverse also works for adding a column under a dynamic name, e.g. if A is a data frame and xyz is a column that should get the name stored in x.
Again, this can also be done in a loop.
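
The code from this answer did not survive, so the following is only a minimal sketch of both directions, not the answerer's original code. The names `selected`, `A`, `xyz` and `x` and the sample values are illustrative assumptions.

```r
cols <- c("cyl", "am")

# Select a column whose name is held in a variable.
selected <- mtcars[, cols[1]]        # same values as mtcars$cyl

# The same selection inside a loop over all names.
for (col in cols) {
  print(head(mtcars[, col]))
}

# Reverse direction: add a column under a name held in a variable.
A   <- data.frame(id = 1:3)          # made-up data frame
xyz <- c(10, 20, 30)                 # made-up column values
x   <- "score"                       # the dynamic column name
A[[x]] <- xyz
```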
Using dplyr provides an easy syntax for sorting data frames.
It can also be useful to pass the sort columns programmatically, so that the sort list can be built dynamically.
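
A hedged sketch of both ideas, assuming a reasonably recent dplyr (1.0 or later) is installed. It uses the `.data` pronoun for the dynamic case, which may differ from the interface the answer originally showed. The variable names are illustrative.

```r
library(dplyr)

# Fixed column names: sort by gear, then by descending mpg.
sorted_fixed <- mtcars %>% arrange(gear, desc(mpg))

# Column name held in a string: .data[[...]] looks the column up by name.
sort_col <- "mpg"
sorted_dynamic <- mtcars %>% arrange(.data[[sort_col]])
```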
Another solution is to use get:
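
As a rough illustration (not necessarily the exact call the answer used), `get` can look a name up inside the data frame when the data frame is passed as the place to search. `cyl_values` is an illustrative name.

```r
cols <- c("cyl", "am")

# Look up the object named by cols[1] inside mtcars.
cyl_values <- get(cols[1], mtcars)

# Same values as the direct selection.
identical(cyl_values, mtcars$cyl)
```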
You can't do that kind of subsetting with `$`. A comment in the source code (R/src/main/subset.c) states that only the first argument of `$` is evaluated; the second is a symbol that is matched, not evaluated.
Second argument? What?! You have to realise that `$`, like everything else in R (including, for instance, `(`, `+`, `^` etc.), is a function that takes arguments and is evaluated. df$V1 could be rewritten as `$`(df, V1) or indeed `$`(df, "V1").
But `$`(df, paste0("V1")), for instance, will never work, nor will anything else that must first be evaluated in the second argument. You may only pass a string, which is never evaluated.
Instead use `[` (or `[[` if you want to extract only a single column as a vector). For example, mtcars[["mpg"]] returns the column as a vector, whereas mtcars["mpg"] returns a one-column data frame.
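
The distinction above can be sketched as follows, using mtcars. The last part applies the same indexing to the ordering loop from the question. The names `as_vector`, `as_frame` and `ordered` are illustrative.

```r
var <- "mpg"

# mtcars$var looks for a column literally called "var", so it returns NULL.
as_vector <- mtcars[[var]]   # the column as a numeric vector
as_frame  <- mtcars[var]     # a data frame with one column

# Order the data once per column name, keeping each result in a named list.
cols <- c("cyl", "am")
ordered <- lapply(cols, function(col) mtcars[order(mtcars[[col]]), ])
names(ordered) <- cols
```

Each element of `ordered` is the full data frame sorted by one column, for example `ordered$cyl`.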
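You can perform the ordering without loops, using do.call to construct the call to order, for example df[do.call(order, df[sort_list]), ], where df is a data frame and sort_list is a character vector of the column names to sort by.
I had a similar problem due to some CSV files that had various names for the same column.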
This was the solution:
I wrote a function to return the first valid column name in a list, then used that...
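
The function itself is missing from this copy of the answer. A sketch of what such a helper might look like is given below. The name `first_valid_column`, the candidate column names and the sample data are all hypothetical.

```r
# Hypothetical helper: return the first candidate name that is a column of
# tbl, or NULL if none of the candidates is present.
first_valid_column <- function(tbl, candidates) {
  hits <- candidates[candidates %in% colnames(tbl)]
  if (length(hits) == 0) {
    return(NULL)
  }
  hits[1]
}

# Two made-up files that name the same column differently.
file_a <- data.frame(price = c(1.5, 2.0))
file_b <- data.frame(Price = c(3.25))
candidates <- c("price", "Price")

col_a <- first_valid_column(file_a, candidates)
col_b <- first_valid_column(file_b, candidates)

# The resolved names can then be used with [[ as usual.
all_prices <- c(file_a[[col_a]], file_b[[col_b]])
```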