Dynamic variable naming in R

Posted 2019-03-16 04:30

Question:

structure(list(Metrics = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 
1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 
5L, 6L), .Label = c("  LINESCOMM ", "  NCNBLOC_FILE ", "  RCYCLOMATIC ", 
"  RISK ", "  RMAXLEVEL ", "  RNOEXSTAT "), class = "factor"), 
    Project = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L
    ), .Label = c("  Demo_Architect ", "  Demo_May_10 ", "  Demo_May_14 ", 
    "  NPP "), class = "factor"), Value = c(1172, 1500, 142, 
    4.241, 24, 98, 1139, 1419, 128, 3.546, 22, 85, 1172, 1500, 
    142, 4.241, 24, 98, 115008, 148903, 14539, 105.914, 604, 
    15710)), .Names = c("Metrics", "Project", "Value"), row.names = c(NA, 
-24L), class = "data.frame")->agg

What I am trying to do: for each unique project name, I want to create a separate variable containing that project's rows.

This is the code I have tried:

x=data.frame()
attach(agg)
r<-as.character(unique(Project))
for(i in length(agg))
{
  x<-subset(agg,Project==r[i],select=c("Project","Metrics","Value"))
  assign() # This is where I go wrong: I don't know how to build the variable name dynamically
}

In other words, I want to create a separate variable each time the for loop executes.

NOTE: Preferably, the variables should be named after the values in the "Project" column.

Answer 1:

assign takes the name of the variable to create as its first argument and the value to give it as its second. Because your project names contain leading and trailing blanks, I also use str_trim to strip them.

library(stringr)
projects <- levels(agg$Project)
for (p in projects) {
  # rows belonging to the current project
  x <- subset(agg, Project == p)
  # create a variable named after the trimmed project name
  assign(str_trim(p), x)
}

Now you have the projects as variables in your workspace:

ls()
[1] "agg"            "Demo_Architect" "Demo_May_10"    "Demo_May_14"    "NPP"           
[6] "p"              "projects"       "x"

E.g.

> Demo_Architect
          Metrics           Project    Value
1      LINESCOMM    Demo_Architect  1172.000
2   NCNBLOC_FILE    Demo_Architect  1500.000
3    RCYCLOMATIC    Demo_Architect   142.000
4           RISK    Demo_Architect     4.241
5      RMAXLEVEL    Demo_Architect    24.000
6      RNOEXSTAT    Demo_Architect    98.000
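
Variables created with assign can be read back by name with get, or several at once with mget. The following is only a minimal sketch: it uses a four-row stand-in for agg (values copied from the question), base R's trimws in place of str_trim, and the illustrative names project_name, npp and both, none of which appear in the original answer.

```r
# Small stand-in for `agg`: same columns, fewer rows (values taken from the question)
agg <- data.frame(
  Metrics = c("  LINESCOMM ", "  RISK ", "  LINESCOMM ", "  RISK "),
  Project = c("  Demo_Architect ", "  Demo_Architect ", "  NPP ", "  NPP "),
  Value   = c(1172, 4.241, 115008, 105.914),
  stringsAsFactors = TRUE
)

# Create one variable per project, as in the loop above
# (trimws() is used here as a base-R substitute for str_trim())
for (p in levels(agg$Project)) {
  assign(trimws(p), subset(agg, Project == p))
}

# Look up a single variable whose name is held in a string
project_name <- "NPP"
npp <- get(project_name)

# Fetch several variables at once; mget() returns a named list
both <- mget(c("Demo_Architect", "NPP"))
names(both)
```

Having to go through get and mget to reach the data again is one reason a named list is often easier to work with than dynamically named variables.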


Answer 2:

Why not just split the data.frame and work with a list?

dflist <- split(agg, agg$Project)
str(dflist)
## List of 4
 ## $   Demo_Architect :'data.frame':   6 obs. of  3 variables:
 ## $   Demo_May_10    :'data.frame':   6 obs. of  3 variables:
 ## $   Demo_May_14    :'data.frame':   6 obs. of  3 variables:
 ## $   NPP            :'data.frame':   6 obs. of  3 variables:

names(dflist) <- paste0("project", seq_along(dflist))

And if you really want to have the list elements (new dfs) in your global environment, you can use list2env.

list2env(dflist, .GlobalEnv)
ls()
## [1] "agg"      "dflist"   "project1" "project2" "project3"
## [6] "project4"

head(project3)
##            Metrics        Project    Value
## 13      LINESCOMM    Demo_May_14  1172.000
## 14   NCNBLOC_FILE    Demo_May_14  1500.000
## 15    RCYCLOMATIC    Demo_May_14   142.000
## 16           RISK    Demo_May_14     4.241
## 17      RMAXLEVEL    Demo_May_14    24.000
## 18      RNOEXSTAT    Demo_May_14    98.000

Note that it is generally safer to keep the data frames in a list and work on them with lapply, sapply or a for loop than to create variables in the global environment.
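
As a rough illustration of the list-based approach, the sketch below splits a four-row stand-in for agg (values copied from the question) and processes each piece with sapply and lapply. The names max_value and risk_rows, and the choice of summaries, are assumptions made for the example only.

```r
# Small stand-in for `agg`: same columns, fewer rows (values taken from the question)
agg <- data.frame(
  Metrics = c("  LINESCOMM ", "  RISK ", "  LINESCOMM ", "  RISK "),
  Project = c("  Demo_Architect ", "  Demo_Architect ", "  NPP ", "  NPP "),
  Value   = c(1172, 4.241, 115008, 105.914),
  stringsAsFactors = TRUE
)

# One data.frame per project, stored in a named list
dflist <- split(agg, agg$Project)
names(dflist) <- trimws(names(dflist))

# sapply(): one number per project (here, the largest Value)
max_value <- sapply(dflist, function(df) max(df$Value))

# lapply(): one data.frame per project (here, only the RISK rows)
risk_rows <- lapply(dflist, function(df) {
  df[trimws(as.character(df$Metrics)) == "RISK", ]
})

# A single project is reached by name, without get() or assign()
dflist[["NPP"]]
```

Everything stays inside dflist, so nothing in the workspace can be overwritten by accident and the set of projects can be looped over with names(dflist).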

EDIT: If you want a different naming scheme:

names(dflist) <- paste0("project_", gsub("\\s+", "", levels(agg$Project)))
list2env(dflist, .GlobalEnv)
ls()
## [1] "agg"                    "dflist"                
## [3] "project_Demo_Architect" "project_Demo_May_10"   
## [5] "project_Demo_May_14"    "project_NPP"
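
If separate variables are wanted but the global environment should stay clean, list2env can also fill an environment of its own. This is only a sketch under assumptions: it uses a four-row stand-in for agg (values copied from the question), and projects_env is a hypothetical name that does not appear in the original answer.

```r
# Small stand-in for `agg`: same columns, fewer rows (values taken from the question)
agg <- data.frame(
  Metrics = c("  LINESCOMM ", "  RISK ", "  LINESCOMM ", "  RISK "),
  Project = c("  Demo_Architect ", "  Demo_Architect ", "  NPP ", "  NPP "),
  Value   = c(1172, 4.241, 115008, 105.914),
  stringsAsFactors = TRUE
)

dflist <- split(agg, agg$Project)
names(dflist) <- paste0("project_", gsub("\\s+", "", names(dflist)))

# `projects_env` is a hypothetical container; list2env() returns the environment it filled
projects_env <- list2env(dflist, envir = new.env())

# The per-project data frames live in projects_env rather than in .GlobalEnv
ls(projects_env)
projects_env$project_NPP
```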