Problems splitting data frame into a nested list

Posted 2019-02-28 19:40

I am new to R and am having trouble splitting a very large data frame into a nested list. I looked for help online, but without success.

Here is a simplified example of how my data are organized.

The headers are:

1. "station" (number)
2. "date.str" (date string)
3. "member"
4. "forecast.time"
5. "data1"

I am not sure my data example will display correctly, but it should look like this:

station  date.str  member  forecast.time  data1
6019     20110805  mbr000  06             77
6031     20110805  mbr000  06             28
6071     20110805  mbr000  06             45
6019     20110805  mbr001  12             22
6019     20110806  mbr024  18             66

I want to split the large data frame into a nested list by "station", "member", "date.str" and "forecast.time", so that mylist[[c(s,m,d,t)]] contains a data frame with the data for station "s", member "m", date.str "d" and forecast time "t", preserving the values of s, m, d and t.

My code is:

data.st <- list()
data.st.member <- list()
data.st.member.dato <- list()

data.st <- split(mydata, mydata$station)
data.st.member <- lapply(data.st, FUN = fsplit.member)

(fsplit.member is a function I wrote to split by "member".)

# Loop over station numbers
for (s in 1:S) {

  # Loop over members
  for (m in 1:length(members)) {
    tmp <- split(data.st.member[[s]][[m]], data.st.member[[s]][[m]]$date.str)

    # Loop over the different "date.str" values
    for (t in 1:length(no.date.str)) {
      data.st.member.dato[[s]][[m]][[t]] <- tmp
    }
  } # end m loop
} # end s loop

I would also like to split by forecast time (forecast.time), but I didn't get that far.

I have tried a couple of different configurations within the loops, so at the moment I don't have a consistent error message. I can't figure out what I am doing or thinking wrong.

Any help is much appreciated!

Regards Sisse

2 Answers
淡お忘
#2 · 2019-02-28 20:28

It's easier than you think. You can pass a list into split in order to split on several factors.

Reproducible example

with(airquality, split(airquality, list(Month, Day)))

With your data

data.st <- with(mydata,
  split(mydata, list(station, member, date.str, forecast.time))
)

Note: This doesn't give you a nested list like you asked for, but as Joran commented, you very probably don't want that. A flat list will be nicer to work with.
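As a rough illustration of what the flat list looks like, here is a minimal sketch using a stand-in for mydata built from the five rows in the question (the column types are an assumption: station and forecast.time numeric, the rest character). The name flat and the drop = TRUE argument are additions for the example; drop = TRUE simply discards combinations that never occur in the data.

```r
# Hypothetical stand-in for mydata, built from the rows shown in the question
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c(6, 6, 6, 12, 18),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

# Split on all four variables at once; drop = TRUE removes empty combinations
flat <- with(mydata,
  split(mydata, list(station, member, date.str, forecast.time), drop = TRUE)
)

# Element names are the factor levels joined with "."
names(flat)

# Look up one chunk by its combined name
flat[["6019.mbr000.20110805.6"]]
```

Each element is still a data frame that carries its own station, member, date.str and forecast.time columns, so nothing is lost by not nesting.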

Speculating wildly: did you just want to calculate statistics on different chunks of data? If so, then see the many questions here on split-apply-combine problems.
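If per-group statistics are indeed the goal, one possible base R route is aggregate, which splits, applies and recombines in a single call. This is only a sketch: the stand-in mydata is rebuilt from the question's rows, and the choice of mean and of grouping by station and member is an assumption made for illustration.

```r
# Hypothetical stand-in for mydata, built from the rows shown in the question
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c(6, 6, 6, 12, 18),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

# Mean of data1 for every station/member combination that occurs in the data
means <- aggregate(data1 ~ station + member, data = mydata, FUN = mean)
means
```

The result is an ordinary data frame with one row per group, which is usually easier to carry forward than a list of chunks.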

劳资没心,怎么记你
#3 · 2019-02-28 20:37

I also want to echo the others: this recursive data structure is going to be difficult to work with, and there are probably better ways. Do look at the split-apply-combine approach as Richie suggested. However, the constraints may be external, so here is an answer using the plyr package.

library(plyr)
mylist <- dlply(mydata, .(station), dlply, .(member), dlply, .(date.str), dlply, .(forecast.time), identity)

Using the snippet of data you gave for mydata,

> mylist[[c("6019","mbr000","20110805","6")]]
  station date.str member forecast.time data1
1    6019 20110805 mbr000             6    77
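If plyr is not available, the same nesting can probably be reproduced in base R with a small recursive helper. This is a sketch of a different technique from the dlply call, not a drop-in equivalent: nested_split is a hypothetical name, and the stand-in mydata is rebuilt from the question's rows with assumed column types.

```r
# Hypothetical stand-in for mydata, built from the rows shown in the question
mydata <- data.frame(
  station       = c(6019, 6031, 6071, 6019, 6019),
  date.str      = c("20110805", "20110805", "20110805", "20110805", "20110806"),
  member        = c("mbr000", "mbr000", "mbr000", "mbr001", "mbr024"),
  forecast.time = c(6, 6, 6, 12, 18),
  data1         = c(77, 28, 45, 22, 66),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on the first variable, then recurse on the rest
nested_split <- function(df, vars) {
  if (length(vars) == 0) return(df)          # no variables left: keep the chunk
  pieces <- split(df, df[[vars[1]]], drop = TRUE)
  lapply(pieces, nested_split, vars = vars[-1])
}

mylist <- nested_split(mydata, c("station", "member", "date.str", "forecast.time"))

# Recursive indexing with a character vector walks down the four levels
mylist[[c("6019", "mbr000", "20110805", "6")]]
```

Because each level only contains the values that actually occur beneath it, indexing with a combination that is absent from the data raises an error rather than returning an empty data frame.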