Starting from this SO question.
Example data.frame:
df = read.table(text = 'ID Day Count Count_group
18 1933 6 15
33 1933 6 15
37 1933 6 15
18 1933 6 15
16 1933 6 15
11 1933 6 15
111 1932 5 9
34 1932 5 9
60 1932 5 9
88 1932 5 9
18 1932 5 9
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 1
6 1800 6 12
37 1800 6 12
98 1800 6 12
52 1800 6 12
18 1800 6 12
76 1800 6 12
55 1799 4 6
6 1799 4 6
52 1799 4 6
133 1799 4 6
112 1798 2 2
677 1798 2 2
778 888 4 8
111 888 4 8
88 888 4 8
10 888 4 8
37 887 2 4
26 887 2 4
8 886 1 2
56 885 1 1
22 120 2 6
34 120 2 6
88 119 1 6
99 118 2 5
12 118 2 5
90 117 1 3
22 115 2 2
99 115 2 2', header = TRUE)
The Count col shows the total number of ID values for each Day, and the Count_group col shows the sum of the Count values over each Day, Day - 1, Day - 2, Day - 3 and Day - 4.
e.g. Day 1933 has Count_group = 15 because Count 6 (1933) + Count 5 (1932) + Count 3 (1931) + Count 1 (1930) + Count 0 (1929) = 15.
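This definition can be checked on the example data with a small sketch (the helper objects days and check below are only illustrative, not part of the data):
days  <- unique(df$Day)
# recompute, for each Day, the number of ID values (i.e. the sum of Count)
# observed on Day, Day - 1, ..., Day - 4
check <- sapply(days, function(d) sum(df$Day <= d & df$Day >= d - 4))
all(check == df$Count_group[match(days, df$Day)])
# [1] TRUE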
What I need to do is to create duplicated observations for each Count_group and add them to it, in order to show, for each Count_group, its Day, Day - 1, Day - 2, Day - 3 and Day - 4.
e.g. Count_group = 15 is composed of the Count values of Days 1933, 1932, 1931, 1930 (and 1929, not present in the df), so these five days need to be included in Count_group = 15. The next one will be Count_group = 9, composed of 1932, 1931, 1930, 1929 and 1928; etc.
Desired output:
ID Day Count Count_group
18 1933 6 15
33 1933 6 15
37 1933 6 15
18 1933 6 15
16 1933 6 15
11 1933 6 15
111 1932 5 15
34 1932 5 15
60 1932 5 15
88 1932 5 15
18 1932 5 15
33 1931 3 15
13 1931 3 15
56 1931 3 15
23 1930 1 15

111 1932 5 9
34 1932 5 9
60 1932 5 9
88 1932 5 9
18 1932 5 9
33 1931 3 9
13 1931 3 9
56 1931 3 9
23 1930 1 9

33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 4

23 1930 1 1

6 1800 6 12
37 1800 6 12
98 1800 6 12
52 1800 6 12
18 1800 6 12
76 1800 6 12
55 1799 4 12
6 1799 4 12
52 1799 4 12
133 1799 4 12
112 1798 2 12
677 1798 2 12

55 1799 4 6
6 1799 4 6
52 1799 4 6
133 1799 4 6
112 1798 2 6
677 1798 2 6

112 1798 2 2
677 1798 2 2

778 888 4 8
111 888 4 8
88 888 4 8
10 888 4 8
37 887 2 8
26 887 2 8
8 886 1 8
56 885 1 8

37 887 2 4
26 887 2 4
8 886 1 4
56 885 1 4

8 886 1 2
56 885 1 2

56 885 1 1

22 120 2 6
34 120 2 6
88 119 1 6
99 118 2 6
12 118 2 6
90 117 1 6

88 119 1 6
99 118 2 6
12 118 2 6
90 117 1 6
22 115 2 6
99 115 2 6

99 118 2 5
12 118 2 5
90 117 1 5
22 115 2 5
99 115 2 5

90 117 1 3
22 115 2 3
99 115 2 3

22 115 2 2
99 115 2 2
(Note that the different groups of five days have been separated by a blank line to make them clearer.)
I have got different data.frames which are grouped by n days, and I would therefore like to adapt the code (by changing it a little) for each of them.
Thanks
A generalised version of my previous answer...
I attach a rather mechanical method, but I believe it is a good starting point. I have noticed that in your original table the entry

ID Day Count Count_group
18 1933 6 15

is duplicated; I have left it untouched for the sake of clarity.
Structure of the approach:
1. Read original data

We start with the original data frame df read in above.
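A minimal sketch of this step, assuming ord_day (the name used in the next step) is simply the vector of unique Day values ordered from the most recent day to the oldest:
# df is the data frame read in with read.table() above
ord_day <- sort(unique(df$Day), decreasing = TRUE)
ord_day
# [1] 1933 1932 1931 1930 1800 1799 1798  888  887  886  885  120  119  118  117  115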
2. Generate list of data frames, for each Day

For each element in ord_day we introduce a data.frame as an element of a list called df_new_aug. Such data frames are defined through a for loop over all values in ord_day, except ord_day[2] and ord_day[1], which are treated separately.

Idea behind the looping: for each unique ord_day[i] with i > 2 we check which of the days ord_day[i-1] and ord_day[i-2] (or both!) contribute (through the variable Count) to the value Count_group at ord_day[i]. We therefore introduce if/else statements in the loop. Here we go.
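As a sketch of the idea just described (the five-day window n_days and the direct reuse of df$Count_group below are simplifying assumptions, replacing the explicit if/else checks on ord_day[i-1] and ord_day[i-2]):
n_days <- 5                                   # width of the grouping window
df_new_aug <- vector("list", length(ord_day))
for (i in seq_along(ord_day)) {
  d <- ord_day[i]
  # all observations whose Day falls in the window d, d - 1, ..., d - (n_days - 1)
  block <- df[df$Day <= d & df$Day > d - n_days, ]
  # relabel them with the Count_group recorded for day d
  block$Count_group <- df$Count_group[df$Day == d][1]
  df_new_aug[[i]] <- block
}
Changing n_days adapts the same idea to data frames grouped by a different number of days, as asked at the end of the question.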
3. Generate final data frame, collapsing the list in 2.

We collapse df_new_aug through an ugly loop, but other solutions (for example with Reduce() and merge()) are possible.
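A compact sketch of the collapsing step, using do.call(rbind, ...) rather than the loop described:
df_result <- do.call(rbind, df_new_aug)  # equivalently: Reduce(rbind, df_new_aug)
rownames(df_result) <- NULL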
One arrives at df_result and the analysis is stopped.
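Under the sketches above, a quick sanity check against the desired output shown in the question:
nrow(df_result)   # 86, the number of rows in the desired output
head(df_result, 3)
#   ID  Day Count Count_group
# 1 18 1933     6          15
# 2 33 1933     6          15
# 3 37 1933     6          15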