I am new to data science and I am working on a model whose data look roughly like the sample data shown below. However, in the original data there are many id_num and Events. My objective is to predict the next 3 events of each id_num based on their previous Events.
Please help me solve this, or suggest a method I could use to solve it, in R.
The simplest "prediction" is to assume that the sequence of letters will repeat for each id_num. I hope this is in line with what the OP understands by "prediction".
The code
creates
data.table is used here because of its easy-to-use grouping function and because I'm acquainted with it.
Explanation
For each id_num, the existing sequence of letters is replicated 3 times using rep() to ensure there are enough values to fill at least the next 3 values. But only the first 3 values are taken, using head(). These 3 values are appended to the existing sequence for each id_num.
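As a sketch of the steps just described, assuming a small made-up data.table called DT (its values are illustrative, not the OP's data) and a variable n_pred holding the number of events to predict:

```r
library(data.table)

# Hypothetical sample data (not the OP's): two ids with short event sequences
DT <- data.table(
  id_num = c(1L, 1L, 1L, 2L, 2L),
  Events = c("A", "B", "C", "D", "E")
)

n_pred <- 3L  # number of events to predict per id_num (assumed name)

# For each id_num: repeat the existing sequence n_pred times, keep only the
# first n_pred values, and append them to the existing sequence
result <- DT[, .(Events = append(Events, head(rep(Events, n_pred), n_pred))),
             by = id_num]
result
```

With this sample, the group with two events gets its sequence cycled, so the three appended values start again from the first event of that group.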
Some tuning
There are two possible optimisations:
1. If the existing sequence is longer than n_pred (the number of values to predict), simply repeating the long sequence n_pred times is a waste.
2. append() can be avoided if the existing sequence is repeated one more time.
So, the optimised code looks like:
Note that .N is a special symbol in data.table syntax containing the number of rows in a group. head() now returns the original sequence plus the predicted values.
Data