Selecting column sequences and creating variables

Posted 2019-09-22 10:40

Question:

I was wondering if there was a way to select specific columns via a sequence and create new variables from this.

So for example, if I had 8 columns with n observations, how could I create 4 variables that each select 2 columns sequentially? My dataset is much larger than this: I have 1416 variables with 62 observations each (I have pasted a link to the spreadsheet below, in which the first column and row hold names). I would like to create new data frames from this, named site 1 to site 12. So site 1 = df[, 1:118]; site 2 = df[, 119:236] etc.

I plan to use this code for future datasets with even more variables, so some form of loop or sequence function would be ideal. Could anyone shed light on how to achieve this?

https://www.dropbox.com/s/p1a5cu567lxntmw/MyData.csv?dl=0

Thank you in advance.

James

P.S. @nrussell, I have copied and pasted the output of the code you mentioned below; it continues as a series of numbers like those displayed.

dput(z[ , 1:10]) structure(list(1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0.0311410340342049, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0207444023791158, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0312971643732546, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0376287494579976, 0, 0, 0, 0, 0, 0, 0),......... 10 = c(0, 0, 0, 0, 0.119280313679916, 0, 0, 0.301029995663981, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.715681882079494, 0.136831816210901, 0, 0, 0, 0.0273663632421801, 0, 0, 0, 0.0547327264843602, 0, 0, 0, 0, 0.0231561535126139, 0, 0, 0.0903089986991944, 0, 0, 0.0752574989159953, 0.159368821233872, 0.0272640716982664, 0.0177076468037636, 0, 0, 0.120411998265592, 0, 0, 0, 0, 0.0322532138211408, 0.0250858329719984, 0, 0, 0, 0.119280313679916, 0, 0.172922500085254, 0.225772496747986, 0, 0, 0, 0.0954242509439325, 0)), .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame", row.names = c(NA, -62L))

Answer 1:

We could split the dataset ('df'), which has 1416 columns, into 12 equal blocks of 118 columns each by creating a grouping index with gl and passing it to split:

 lst <- setNames(lapply(split(1:ncol(df), as.numeric(gl(ncol(df), 118,
            ncol(df)))), function(i) df[,i]), paste0('site', 1:12))
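
To see what the grouping index looks like, here is a minimal sketch on 8 columns with blocks of 2. The names 'n_col' and 'idx' are only for illustration and do not appear in the code above.

```r
# Illustrative only: 8 column positions, blocks of 2 columns each
n_col <- 8

# gl(n, k, length) repeats each level k times, truncated to 'length'
idx <- as.numeric(gl(n_col, 2, n_col))
idx
# [1] 1 1 2 2 3 3 4 4

# split() then groups the column positions by that index,
# giving one vector of positions per block
pos <- split(seq_len(n_col), idx)
pos
```

Each element of 'pos' holds the column positions for one site, which is what lapply then uses to subset the data frame.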

Or you can create the 'lst' without using the split

 lst <- setNames(lapply(seq(1, ncol(df), by = 118), 
            function(i) df[i:(i+117)]), paste0('site', 1:12))
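
Since the question asks for something reusable on larger datasets, the seq approach could be wrapped in a small function. This is only a sketch: 'split_cols', 'block_size' and 'prefix' are hypothetical names, and the min() guard is an added assumption for the case where the number of columns is not a multiple of the block size (the hard-coded i + 117 would fail with undefined columns there).

```r
# Hypothetical helper: split a data frame into consecutive column blocks
split_cols <- function(df, block_size, prefix = "site") {
  starts <- seq(1, ncol(df), by = block_size)
  blocks <- lapply(starts, function(i) {
    # min() keeps the last block in range when ncol(df) is not
    # a multiple of block_size; drop = FALSE keeps a data frame
    df[, i:min(i + block_size - 1, ncol(df)), drop = FALSE]
  })
  setNames(blocks, paste0(prefix, seq_along(blocks)))
}

# Toy data: 7 columns, so the last block holds a single column
toy <- as.data.frame(matrix(1:21, ncol = 7))
res <- split_cols(toy, 2)
names(res)
sapply(res, ncol)
```

For the dataset in the question, the call would be split_cols(df, 118), which should give the same 12 blocks as the code above.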

If we need to create 12 separate data frame objects in the global environment, list2env is an option (though I would prefer to work within 'lst' itself):

 list2env(lst, envir=.GlobalEnv)
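
As a rough illustration of why staying within the list can be convenient, the sketch below applies one summary to every site at once. 'lst_demo' is a made-up stand-in for 'lst', and colMeans is just an example operation.

```r
# Toy stand-in for 'lst': two small data frames in a named list
lst_demo <- list(site1 = data.frame(a = 1:3, b = 4:6),
                 site2 = data.frame(c = 7:9, d = 10:12))

# Apply the same summary to every site without creating global objects
site_means <- lapply(lst_demo, colMeans)
site_means$site1

# Individual sites remain accessible by name
dim(lst_demo[["site2"]])
```

The same lapply call keeps working unchanged if the number of sites grows, whereas separate global objects would each need their own line of code.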

Using a small dataset ('df1') with 8 columns, split into 4 blocks of 2:

  lst1 <- setNames(lapply(split(1:ncol(df1), as.numeric(gl(ncol(df1), 
         2, ncol(df1)))), function(i) df1[,i]), paste0('site', 1:4))
  list2env(lst1, envir=.GlobalEnv)

  head(site1,3)
  #  V1 V2
  #1  6 12
  #2  4  7
  #3 14 14

 head(site4,3)
 #  V7 V8
 #1 10  2
 #2  5  4
 #3  5  0

data

set.seed(24)
df1 <- as.data.frame(matrix(sample(0:20, 8*10, replace=TRUE), ncol=8))


Tags: r dataframe seq