I am trying to scale each row of a data.frame to the range 0 to 1 using the following code:
for(i in 1:nrow(data))
{
  x <- data[i, ]
  data[i, ] <- scale(x, min(x), max(x) - min(x))
}
Data:
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17
15 6 6 0 9 3 1 4 5 1 1 13 0 0 20 5 28
2 24 14 7 0 15 7 0 11 3 3 4 15 7 0 30 0 344
3 10 5 2 0 6 2 0 5 0 0 2 7 1 0 11 0 399
4 9 4 2 0 5 2 0 4 0 0 2 6 1 0 10 0 28
5 6 2 1 0 3 1 0 2 0 0 1 3 1 0 6 0 82
6 9 4 2 0 5 2 0 4 0 0 2 6 1 0 10 0 42
But I am getting the following error message:
Error in scale.default(x, min(x), max(x) - min(x)) (from #4) :
length of 'center' must equal the number of columns of 'x'
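A minimal sketch that reproduces the error. It uses a small, made-up data.frame (the columns a, b, c and their values are illustrative assumptions, not the question's data). Extracting a row of a data.frame gives a one-row data.frame rather than a vector, so scale() sees three columns but receives only one center value.

```r
# Hypothetical three-column data.frame standing in for the question's data
df <- data.frame(a = c(1, 5), b = c(2, 6), c = c(3, 7))

x <- df[1, ]            # a one-row data.frame, not a numeric vector
print(class(x))         # "data.frame"
print(ncol(x))          # 3 columns
print(length(min(x)))   # min() collapses the whole row to a single number

# scale() converts x to a 1 x 3 matrix and expects 3 centers, so this fails
msg <- tryCatch(
  scale(x, min(x), max(x) - min(x)),
  error = function(e) conditionMessage(e)
)
print(msg)
```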
Using this data (a matrix rather than a data.frame), your example works for me:
data <- matrix(sample(1:1000,17*6), ncol=17,nrow=6)
for(i in 1:nrow(data)){
x <- data[i, ]
data[i, ] <- scale(x, min(x), max(x)-min(x))
}
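The same loop on a small, deterministic matrix (the values are made up for illustration instead of using sample()) makes it easy to check that every row ends up spanning exactly 0 to 1. Extracting a row from a matrix yields a plain numeric vector, which scale() treats as a single column.

```r
# Made-up 3 x 4 matrix; deterministic so the result can be checked
data <- matrix(c(15,  6, 6, 0,
                 24, 14, 7, 0,
                 10,  5, 2, 0), nrow = 3, byrow = TRUE)

for (i in seq_len(nrow(data))) {
  x <- data[i, ]                                  # plain numeric vector
  data[i, ] <- scale(x, min(x), max(x) - min(x))  # (x - min) / (max - min)
}

print(data)
print(apply(data, 1, range))  # each column of this output is c(0, 1)
```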
Here is another option using scale, without a loop. You just need to provide a center and a scale with one value per column of your matrix:
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scale(data, center = mins, scale = maxs - mins)
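Since apply(data, 2, ...) works column by column, this version scales each column to [0, 1], whereas the loop in the question scales each row. A sketch of both directions on a made-up matrix; the row-wise variant (transpose, scale, transpose back) is my own assumption about how to get the question's behaviour from the same call. Column 4 is constant here on purpose: its range is 0, so it comes out as NaN.

```r
# Made-up matrix; column 4 is constant to show the zero-range case
data <- matrix(c(15,  6, 6, 0,
                 24, 14, 7, 0,
                 10,  5, 2, 0), nrow = 3, byrow = TRUE)

# Column-wise: each column is mapped to [0, 1] (column 4 becomes NaN)
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
col_scaled <- scale(data, center = mins, scale = maxs - mins)

# Row-wise (what the question's loop does): rows of data are the columns
# of t(data), so compute per-row mins/maxs, scale t(data), transpose back
row_mins <- apply(data, 1, min)
row_maxs <- apply(data, 1, max)
row_scaled <- t(scale(t(data), center = row_mins, scale = row_maxs - row_mins))

print(col_scaled)
print(row_scaled)
```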
EDIT: how to access the result.
scale returns a matrix with two extra attributes, "scaled:center" and "scaled:scale". To get a data.frame, you just need to coerce the result of scale to a data.frame:
dat.scale <- scale(data, center = mins, scale = maxs - mins)
dat.scale <- as.data.frame(dat.scale)
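A short sketch of what scale() hands back, using a made-up two-column matrix with column names x1 and x2 (illustrative only). The attribute names "scaled:center" and "scaled:scale" are the ones base R sets; the intermediate name scaled is just for this example.

```r
# Made-up matrix: x1 = 1, 4, 7 and x2 = 10, 40, 70
data <- matrix(c(1, 4, 7, 10, 40, 70), ncol = 2,
               dimnames = list(NULL, c("x1", "x2")))
mins <- apply(data, 2, min)
maxs <- apply(data, 2, max)

scaled <- scale(data, center = mins, scale = maxs - mins)
print(attributes(scaled))             # dim, dimnames, scaled:center, scaled:scale
print(attr(scaled, "scaled:center"))  # the per-column mins that were subtracted

dat.scale <- as.data.frame(scaled)    # plain data.frame with columns x1 and x2
print(dat.scale)
```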
The center and scale arguments to scale have to have length equal to the number of columns in x. It looks like data is a data.frame, so data[i, ] returns a one-row data.frame: your x has as many columns as data does, while min(x) is a single number, hence the conflict. You can get past this snag three ways:
- drop the row into an atomic vector before passing it to scale (which will then treat it as a single column): scale(as.numeric(x), ...)
- convert data into a matrix, whose row extractions are atomic vectors automatically.
- use @agstudy's apply suggestion, which works whether data is a data.frame or a matrix and is arguably the "right" way to do this in R.
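A sketch of the three fixes applied to a small made-up data.frame (columns a, b, c are placeholders). Each version scales every row to [0, 1], as the question's loop intends. The as.vector() call in the first fix and the t() in the third are my own additions to get the shapes to line up.

```r
df <- data.frame(a = c(15, 24, 10), b = c(6, 14, 5), c = c(0, 7, 2))

# 1. Drop each row to an atomic vector before calling scale()
fix1 <- df
for (i in seq_len(nrow(fix1))) {
  x <- as.numeric(fix1[i, ])  # one-row data.frame -> numeric vector
  # scale() returns an n x 1 matrix; as.vector() turns it back into a vector
  fix1[i, ] <- as.vector(scale(x, min(x), max(x) - min(x)))
}

# 2. Work on a matrix, whose row extractions are already vectors
fix2 <- as.matrix(df)
for (i in seq_len(nrow(fix2))) {
  x <- fix2[i, ]
  fix2[i, ] <- scale(x, min(x), max(x) - min(x))
}

# 3. apply() over rows; it returns one column per input row, so transpose back
fix3 <- t(apply(df, 1, function(x) (x - min(x)) / (max(x) - min(x))))

print(fix1)
print(fix2)
print(fix3)
```

The apply() version avoids the explicit loop and the data.frame/matrix distinction altogether, because apply() converts the data.frame to a matrix before iterating.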
There is also another way to scale the data: write your own min-max normalization function and apply it to each column with lapply:
data_norm <- function(x) (x - min(x)) / (max(x) - min(x))
variables_norm <- as.data.frame(lapply(data[1:17], data_norm))
summary(variables_norm)
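A runnable sketch of that function on a made-up data.frame; data[1:17] assumes the question's 17 columns, so here all columns are used instead. It also shows a caveat that applies to every approach above: a constant column (like x4 and x14 in the question's data) has zero range, so the division produces NaN.

```r
data_norm <- function(x) (x - min(x)) / (max(x) - min(x))

# Made-up data; x4 is constant (all zeros) to show the zero-range case
data <- data.frame(x1 = c(15, 24, 10), x2 = c(6, 14, 5), x4 = c(0, 0, 0))

# lapply() works column by column and keeps the column names
variables_norm <- as.data.frame(lapply(data, data_norm))
summary(variables_norm)
```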