Is it possible to split an .xdf file (in the Microsoft RevoScaleR context) into, say, a 75% training set and a 25% test set? I know there is a function called rxSplit(), but the documentation doesn't seem to cover this case. Most of the examples online add a column of random numbers to the dataset and split on that column.
Thanks. Thomas
You can certainly use rxSplit for this. Create a variable that defines your training and test samples, and then split on it. For example, using the mtcars toy dataset, you would end up with xdfList, a list containing 2 xdf data sources: one with (approximately) 75% of the data, and the other with 25%.

You can use rxDataStep to create the training and testing data sets from the original xdf. Check out this example: https://docs.microsoft.com/en-us/r-server/r/how-to-revoscaler-linear-model
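
The code from the first answer is not shown above, so here is a minimal sketch of how both suggestions might look, assuming RevoScaleR is installed and the working directory is writable. The file names (mtcars.xdf, train.xdf, test.xdf) and the variable name split are made up for illustration; they are not taken from the answers.

```r
library(RevoScaleR)

# Write mtcars to an xdf file and add a factor marking each row as
# "train" (probability 0.75) or "test" (probability 0.25).
# .rxNumRows is the number of rows in the chunk currently being processed.
# "split" and "mtcars.xdf" are illustrative names.
xdf <- rxDataStep(
  inData = mtcars,
  outFile = "mtcars.xdf",
  transforms = list(
    split = factor(
      ifelse(runif(.rxNumRows) < 0.75, "train", "test"),
      levels = c("train", "test")
    )
  ),
  overwrite = TRUE
)

# Option 1: rxSplit writes one xdf per level of the factor and
# returns a list of data sources.
xdfList <- rxSplit(xdf, splitByFactor = "split", overwrite = TRUE)

# Option 2: rxDataStep with rowSelection, one call per subset.
trainXdf <- rxDataStep(xdf, outFile = "train.xdf",
                       rowSelection = split == "train", overwrite = TRUE)
testXdf <- rxDataStep(xdf, outFile = "test.xdf",
                      rowSelection = split == "test", overwrite = TRUE)

# Inspect the row counts of the resulting files.
rxGetInfo(trainXdf)
rxGetInfo(testXdf)
```

Because the assignment is random per row, the split is only approximately 75/25, and on a dataset as small as mtcars (32 rows) the proportions can be noticeably off.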