I use the following code to summarize my data, grouped by Compound, Replicate and Mass:
summaryDataFrame <- ddply(reviewDataFrame, .(Compound, Replicate, Mass),
                          .fun = calculate_T60_Over_T0_Ratio)
An unfortunate side effect is that the resulting data frame is sorted by those fields. I would like to do this and keep Compound, Replicate and Mass in the same order as in the original data frame. Any ideas? I tried adding a "Sorting" column of sequential integers to the original data, but of course I can't include that in the .variables since I don't want to 'group by' that, and so it is not returned in the summaryDataFrame.
Thanks for the help.
I eventually ended up adding an 'indexing' column to the original data frame. It consisted of two columns pasted together with sep="_". Then I made another data frame containing only the unique members of the 'indexing' column and a counter 1:length(df). I ran my ddply() on the data, which returned a sorted data frame. Then, to get things back in the original order, I used merge() on the results data frame and the index data frame (making sure the columns are named the same thing makes this easier). Finally, I used order and removed the extraneous columns.
Not an elegant solution, but one that works.
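The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the data, the Value column, the summarise call, and the names Index, Position, indexDataFrame and restored are all hypothetical stand-ins, not the code actually used.

```r
library(plyr)

# Hypothetical data: groups deliberately appear in non-alphabetical order.
reviewDataFrame <- data.frame(
  Compound  = c("C", "C", "A", "A", "B", "B"),
  Replicate = c(1, 1, 2, 2, 1, 1),
  Value     = c(10, 20, 1, 3, 5, 7),
  stringsAsFactors = FALSE
)

# 1. Build the indexing column by pasting the grouping columns together.
reviewDataFrame$Index <- paste(reviewDataFrame$Compound,
                               reviewDataFrame$Replicate, sep = "_")

# 2. Lookup table: unique keys in order of first appearance, plus a counter.
keys <- unique(reviewDataFrame$Index)
indexDataFrame <- data.frame(Index = keys, Position = seq_along(keys),
                             stringsAsFactors = FALSE)

# 3. Summarise with ddply; the result comes back sorted by the grouping columns.
summaryDataFrame <- ddply(reviewDataFrame, .(Compound, Replicate), summarise,
                          MeanValue = mean(Value))

# 4. Rebuild the key on the summary, merge in the counter, and reorder.
summaryDataFrame$Index <- paste(summaryDataFrame$Compound,
                                summaryDataFrame$Replicate, sep = "_")
restored <- merge(summaryDataFrame, indexDataFrame, by = "Index")
restored <- restored[order(restored$Position), ]

# 5. Drop the helper columns and reset the row names.
restored <- restored[, c("Compound", "Replicate", "MeanValue")]
rownames(restored) <- NULL
```

Here restored lists the groups as C, A, B, the order in which they first appear in the input, rather than the A, B, C order that ddply returns.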
Thanks for the assist. It got me thinking in the right direction.
This came up on the plyr mailing list a while back (raised by @kohske no less) and this is a solution offered by Peter Meilstrup for limited cases:
Please do read the thread for Hadley's notes about why this functionality may not be general enough to roll into ddply, particularly as it probably applies in your case as you are likely returning fewer rows with each piece.
Edited to include a strategy for more general cases
If ddply is outputting something that is sorted in an order you do not like, you basically have two options: specify the desired ordering on the splitting variables beforehand using ordered factors, or manually sort the output after the fact.
For instance, consider the following data:
using strings, for now. ddply will sort the output, which in this case will entail the default lexical ordering:
If the resulting data frame isn't ending up in the "right" order, it's probably because you really want some of those variables to be ordered factors. Suppose that we really wanted x1 and x2 ordered like so:
Now when we use ddply, the resulting sort will be as we intend:
The moral of the story here is that if ddply is outputting something in an order you didn't intend, it's a good sign that you should be using ordered factors for the variables you're splitting on.
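As a minimal sketch of the ordered-factor strategy, assuming made-up data (the values, the y column, and the names dat, lexical and intended are hypothetical, and only one splitting variable is used for brevity):

```r
library(plyr)

# Hypothetical data; x1 starts out as plain strings.
dat <- data.frame(
  x1 = c("low", "low", "high", "high", "mid", "mid"),
  y  = 1:6,
  stringsAsFactors = FALSE
)

# With a character column, ddply sorts the groups lexically: high, low, mid.
lexical <- ddply(dat, .(x1), summarise, total = sum(y))

# Declare the intended order explicitly with an ordered factor.
dat$x1 <- factor(dat$x1, levels = c("low", "mid", "high"), ordered = TRUE)

# Now the groups come back in level order: low, mid, high.
intended <- ddply(dat, .(x1), summarise, total = sum(y))
```

The same conversion can be applied to each splitting variable in turn. The order of the levels argument is what determines the order of the output.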