I tried using the reshape package to reshape a data frame, but some numbers in the data frame are changed when they should not be.
The data frame contains several variables, each measured multiple times: every person has 6 rows, one for each of the 6 times that person was measured. I want to reshape the data frame so that there is only one row per person instead of 6, which means every variable should appear 6 times (once per measurement). This should be easy with the following code:
library(reshape2)
melteddata <- melt(daten, id.vars = c("IDParticipant", "looporder"))
datenrestrukturiert <- dcast(melteddata, IDParticipant ~ looporder + variable)
Here "daten" is the original data frame and "looporder" is the variable that gives the time of measurement (1-6). Here is an example (unfortunately I could not figure out how to post tables):
https://www.dropbox.com/s/8c9dm4rttedbzw1/daten.jpg?dl=0
or, as dput output:
daten <- structure(list(IDParticipant = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L), looporder = c(1L, 2L, 3L, 5L, 6L, 2L, 3L,
5L, 6L, 1L, 2L, 3L), pc_mean_1 = c(NA, 3.22222222222222, NA,
3.22222222222222, 3.22222222222222, 3.66666666666667, 3.66666666666667,
3.66666666666667, 3.66666666666667, 3.25, NA, 3.25), bd_mean_1 = c(NA,
2.88888888888889, NA, 2.88888888888889, 2.88888888888889, 2.75,
2.75, 2.75, 2.75, 4.08333333333333, NA, 4.08333333333333), sm = c(999,
4, 999, 3.66666666666667, 1, 4, 4, 5, 5, 5, 999, 5), cm = c(999,
1.33333333333333, 999, 2.33333333333333, 1, 2, 2, 2.33333333333333,
1, 3, 999, 1.66666666666667)), .Names = c("IDParticipant", "looporder",
"pc_mean_1", "bd_mean_1", "sm", "cm"), row.names = c(NA, 12L), class = "data.frame")
datenrestrukturiert looks like this:
https://www.dropbox.com/s/al93lnj76y1j266/datenrestrukturiert.jpg?dl=0
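A quick way to confirm that the two lines above behave as intended on duplicate-free data is to run them on part of the sample. This is a minimal sketch, assuming reshape2 is installed; to keep it short it uses only the two id columns and "sm" copied from the sample, and it passes value.var = "value" (the column melt creates) so dcast does not have to guess it.

```r
library(reshape2)

# Trimmed copy of the sample data: the two id columns plus "sm"
daten <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  looporder     = c(1L, 2L, 3L, 5L, 6L, 2L, 3L, 5L, 6L, 1L, 2L, 3L),
  sm            = c(999, 4, 999, 3.66666666666667, 1, 4, 4, 5, 5, 5, 999, 5)
)

# Long format: one row per participant, occasion and variable
melteddata <- melt(daten, id.vars = c("IDParticipant", "looporder"))

# Wide format: one row per participant, one column per occasion/variable pair.
# No IDParticipant/looporder pair occurs twice, so nothing is aggregated and
# the original values are copied into the cells (missing occasions become NA).
datenrestrukturiert <- dcast(melteddata, IDParticipant ~ looporder + variable,
                             value.var = "value")
print(datenrestrukturiert)
```

Each participant ends up in a single row with columns such as "2_sm", and occasions that were never measured show up as NA rather than 0.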
I do not want to aggregate anything, which is why I tried adding fun.aggregate = NULL, but nothing changed. I also always get the following message:
"Aggregation function missing: defaulting to length"
So far everything works, but there is one problem: when using dcast (as well as cast), some values of the variables are changed, mostly to 0 or 1, although they should be values like 3.44 or 4.77.
Does anybody have a hint as to why this happens?
Some more information that might help: when I import the dataset with read.csv2, the first variable always gets a strange name, with some extra symbols in front of the variable name that Excel does not show: "ï..IDParticipant". I rename it to "IDParticipant". Could that have anything to do with it?
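The "ï.." prefix is what a UTF-8 byte order mark (the bytes EF BB BF) looks like when the file is read as Latin-1/Windows-1252 and the name is then passed through make.names; once the column is renamed it is unlikely to be related to the changed values. A minimal sketch, assuming the file really starts with such a mark and R is 3.0.0 or later: fileEncoding = "UTF-8-BOM" drops the mark on import, so no renaming is needed. The temporary file and its contents are made up for the demonstration.

```r
# Write a small semicolon-separated file that starts with a UTF-8
# byte order mark (EF BB BF), followed by the header and two data rows
csv_file <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("IDParticipant;looporder;sm\n1;1;3,44\n1;2;4,77\n")),
         csv_file)

# "UTF-8-BOM" reads the file as UTF-8 and removes the mark, so the first
# column name comes out as "IDParticipant" without any extra symbols
daten <- read.csv2(csv_file, fileEncoding = "UTF-8-BOM")
print(names(daten))
```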
Another side note: with the sample data frame I provided, everything is fine. The original data frame has 1404 rows and 353 variables; could it be too big for R?
If you have duplicated combinations of your LHS and RHS variables (the variables on the left- and right-hand sides of the dcast formula), then you either need to (1) create a secondary level of IDs, or (2) perform some form of aggregation.
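To see how duplicates turn real values into 0s and 1s, here is a minimal sketch with a small hypothetical data frame (daten_dup is made up for illustration) in which one IDParticipant/looporder pair occurs twice. Without an aggregation function, reshape2's dcast falls back to length, so each cell holds a row count instead of a value: 1 for an ordinary cell, 2 for the duplicated one, and 0 for combinations that do not exist. Passing fun.aggregate = NULL is the same as the default, which is why adding it changes nothing.

```r
library(reshape2)

# Hypothetical data in which participant 1 has looporder 1 recorded twice
daten_dup <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 2L, 2L),
  looporder     = c(1L, 1L, 2L, 2L, 3L),
  sm            = c(3.44, 4.77, 4, 5, 5)
)

melteddata <- melt(daten_dup, id.vars = c("IDParticipant", "looporder"))

# The pair (1, 1) appears twice, so two values compete for one cell.
# dcast reports "Aggregation function missing: defaulting to length"
# and fills every cell with the number of rows that landed there.
counts <- dcast(melteddata, IDParticipant ~ looporder + variable,
                value.var = "value")
print(counts)
```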
You can test for duplicates by using any(duplicated(...)) on the id columns. Your existing sample of "daten" does not contain duplicates, so for it the check returns FALSE.
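The check can be run on the id columns copied from the sample; this is a minimal sketch, and has_dups and dup_rows are illustrative names. The second expression is an extra that lists every row involved in a duplicated pair, which helps when tracking down the duplicates in the full data.

```r
# Id columns copied from the sample "daten" in the question
daten <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  looporder     = c(1L, 2L, 3L, 5L, 6L, 2L, 3L, 5L, 6L, 1L, 2L, 3L)
)
ids <- daten[c("IDParticipant", "looporder")]

# TRUE if any IDParticipant/looporder pair occurs more than once
has_dups <- any(duplicated(ids))
print(has_dups)

# All rows that share their pair with another row (first and later copies)
dup_rows <- daten[duplicated(ids) | duplicated(ids, fromLast = TRUE), ]
print(dup_rows)
```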
However, since any(duplicated(...)) is giving you TRUE, your full data most likely contains some IDParticipant/looporder combinations that occur more than once. In this case, you can consider using getanID from my "splitstackshape" package to conveniently add a secondary "id" to your dataset.

Here is my solution based on Ananda's suggestions (thank you very much for that):
The data frame is "daten", containing many variables, e.g. "IDParticipant", "looporder" and "sm".
First, we create an object containing the id variables for later use in the melt and dcast functions:
library(reshape2)
library(splitstackshape)
idvars <- c("IDParticipant", "looporder")
As it turns out, there were duplicates in the data frame, i.e. rows with the same values in the two variables "IDParticipant" and "looporder". So we need to add another id variable to the data frame when melting it, which is done with getanID from the splitstackshape package:
melteddata <- melt(getanID(daten, idvars), c(".id", idvars))
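The .id column that getanID adds is a running counter within each IDParticipant/looporder combination (1 for the first occurrence, 2 for the second, and so on). If adding splitstackshape is not an option, the same counter can be built in base R with ave(); this is a swapped-in technique sketched on hypothetical data, not splitstackshape's own code.

```r
# Hypothetical data: participant 1 has looporder 1 recorded twice
daten <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 2L, 2L),
  looporder     = c(1L, 1L, 2L, 2L, 3L),
  sm            = c(3.44, 4.77, 4, 5, 5)
)

# Running counter within each IDParticipant/looporder group, in row order:
# the first occurrence gets 1, a repeat gets 2, and so on
daten$.id <- ave(seq_len(nrow(daten)), daten$IDParticipant, daten$looporder,
                 FUN = seq_along)
print(daten)
```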
After adding the extra id variable, we can finally cast the data frame we need, using the extra id variable together with the other variables:
datenrestrukturiert <- dcast(melteddata, .id + IDParticipant ~ variable + looporder)
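Putting the pieces together on a small hypothetical data frame shows that, with the extra id variable, the original values survive the cast. This minimal sketch assumes reshape2 is installed and uses the base-R ave() counter as a stand-in for getanID so that it runs without splitstackshape.

```r
library(reshape2)

# Hypothetical data: participant 1 has looporder 1 recorded twice
daten <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 2L, 2L),
  looporder     = c(1L, 1L, 2L, 2L, 3L),
  sm            = c(3.44, 4.77, 4, 5, 5)
)
idvars <- c("IDParticipant", "looporder")

# Stand-in for getanID(): counter within each id combination
daten$.id <- ave(seq_len(nrow(daten)), daten$IDParticipant, daten$looporder,
                 FUN = seq_along)

melteddata <- melt(daten, id.vars = c(".id", idvars))

# With .id on the left-hand side every cell receives at most one value,
# so dcast does not aggregate and copies the measurements unchanged
datenrestrukturiert <- dcast(melteddata, .id + IDParticipant ~ variable + looporder,
                             value.var = "value")
print(datenrestrukturiert)
```

The repeated measurement ends up in a second row for participant 1 (.id = 2) instead of being counted.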