Loop through unique values of a df subset, and upd

Posted 2019-09-03 00:26

Question:

I am struggling to append a df with new variables that are computed from a subset of this df.

I have two goals:

  1. Merge many individual data sets into one. The index variable is "SID" (subject ID).
  2. Create new variables/columns in this master dataset.

The attached data is an example of what the merged dataset will look like, but it contains only 2 subjects (SIDs: 9003 and 1028).

I need to loop through each unique SID and perform several relatively easy computations on some of the variables in the dataframe in order to create and add new variables to it. There should be a unique value in each of these columns PER SID.

At present, I am able to successfully do this for variables that do NOT involve subsetting the df. For instance, the variables 'numPR', 'numDef', and 'numCoop' all exist as they should for each SID (when I comment out the Ingroup portion of the loop). However, when I try to subset the df and look at only rows where a given condition is met (I am using the subset function), I receive this error:

Error in df$numDef_IG[j] <- numDef_IG : replacement has length zero

In addition: Warning messages:

1: In if (df$numCoop[i] < 1) { : the condition has length > 1 and only the first element will be used

2: In numDefCoop_IG[j] <- aggregate(x = numIG, by = list(unique.numValues = numIG$Player_move), : number of items to replace is not a multiple of replacement length
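For reference, the first warning and the error can be reproduced in isolation. This is a minimal sketch on invented toy data (`toy`, `v` and `empty` are hypothetical names, not part of the script below); it shows the general mechanism only and does not verify where exactly the zero-length value arises in the loop.

```r
# Toy data, made up for illustration (not stackEx.csv)
toy <- data.frame(SID = c(1, 1, 2), numCoop = c(2, 2, 0))
i <- toy$SID == 1            # logical index selecting two rows

# toy$numCoop[i] holds one element per matching row, so the condition handed
# to if () has length 2 here. Older R versions warn and use only the first
# element; since R 4.2.0 this is an error.
cond <- toy$numCoop[i] < 1

# Assigning a zero-length value into a non-empty selection fails with
# "replacement has length zero".
v <- toy$numCoop
empty <- numeric(0)
msg <- tryCatch({ v[i] <- empty; "ok" },
                error = function(e) conditionMessage(e))
print(length(cond))
print(msg)
```

So any expression that ends up with length zero, for example one built from a column that does not exist and therefore evaluates to NULL, would trigger the same error on assignment.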

I feel it's something with the inner for-loop that is trying to access the subsetted data, and I also feel like there must be a much more elegant solution than a series of for-loops to achieve this. I have several more variables/columns that are commented out (starting at around line 86) to create based on subsets of the df as well.

Any help on this would be much appreciated.

data: stackEx.csv

code:

#Loading data
file = "stackEx.csv"
df = read.csv(file, header = T)
numIG = subset(df, OppGroupCode == 1, select = c("SID", "Player_move"))
numDefCoop_IG = NA

for (sid in unique(df$SID)) { 
  i = df$SID == sid # create a logical index

  #Getting # of PRs in game per SID.
  numPR = sum(df$PreviewRound[i]) # subset the data based on the index
  df$numPR[i] = numPR # assign the values only to those selected rows

  #__Creating new variables and adding to dataset----

  #____Decisions, Overall----
   #_____ defections----
  numDef=sum(df$Player_move[i])
  df$numDef[i] = numDef

  #_____cooperations----
  numCoop=length(df$Player_move[i]) - df$numDef[i]
  df$numCoop[i] = numCoop

  if (df$numCoop[i] < 1){
    df$numCoop[i] = 0
    }
      else df$numCoop[i] = df$numCoop[i]

#_____Ingroup----
  #unique.numValues: 0 = cooperation, 1 = defection. Also adding as column to dataset.

  for (s in unique(numIG$SID)) { 
    j = numIG$SID == s # create a logical index
    numDefCoop_IG[j] = aggregate(x = numIG, by = list(unique.numValues = numIG$Player_move), FUN = length)

    #______defections----
    numDef_IG = ifelse((length(numDefCoop_IG$unique.numValues) == 2) & (numDefCoop_IG$unique.numValues[2] == 1), numDefCoop_IG[2,2],
    ifelse((length(numDefCoop_IG$unique.numValues)== 1) & (numDefCoop_IG[1] == 1), numDefCoop_IG[1,2], 0)[j])

    df$numDef_IG[j]= numDef_IG

    #______cooperations----
    numCoop_IG= ifelse(numDefCoop_IG$unique.numValues[1] == 0, numDefCoop_IG[1,2], 0)[j]
    numCoop_IG = ifelse((length(numDefCoop_IG$unique.numValues) == 2) & (numDefCoop_IG$unique.numValues[1] == 0), numDefCoop_IG[1,2],
                        ifelse((length(numDefCoop_IG$unique.numValues)== 1) & (numDefCoop_IG[1] == 0), numDefCoop_IG[1,2], 0))[j]

    df$numCoop_IG[j]= numCoop_IG

  }
}  

View(df)
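
One possible loop-free alternative, sketched here on invented toy data rather than on stackEx.csv, is base R's `ave()`, which applies a function within each SID and returns a vector as long as its input. The sketch assumes `Player_move` is coded 0 = cooperation and 1 = defection (as the comment in the script says), that `OppGroupCode == 1` marks ingroup rows, and that there are no missing values. The values in the toy data frame are made up.

```r
# Toy data standing in for stackEx.csv; column names follow the post,
# values are invented for illustration.
df <- data.frame(
  SID          = c(9003, 9003, 9003, 1028, 1028, 1028),
  PreviewRound = c(1, 0, 1, 0, 0, 1),
  Player_move  = c(1, 0, 1, 0, 0, 1),  # assumed: 0 = cooperation, 1 = defection
  OppGroupCode = c(1, 1, 2, 1, 2, 2)   # assumed: 1 = ingroup opponent
)

# ave() computes FUN within each SID and recycles the result over that
# subject's rows, so every row of a subject gets the subject-level value.
df$numPR   <- ave(df$PreviewRound, df$SID, FUN = sum)
df$numDef  <- ave(df$Player_move, df$SID, FUN = sum)
df$numCoop <- ave(1 - df$Player_move, df$SID, FUN = sum)

# Ingroup counts: instead of subsetting (which changes the row count),
# zero out the non-ingroup rows so the result stays aligned with df.
ig <- df$OppGroupCode == 1
df$numDef_IG  <- ave(df$Player_move * ig, df$SID, FUN = sum)
df$numCoop_IG <- ave((1 - df$Player_move) * ig, df$SID, FUN = sum)

print(df)
```

Multiplying by the logical mask keeps every vector the same length as `df`, which avoids the length mismatches that come from indexing `df` with an index built on a subset. A subject with no ingroup rows simply gets 0 in both ingroup columns.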