Repeat vector to fill down column in data frame

Posted 2019-02-25 14:35

Question:

Seems like this very simple maneuver used to work for me, and now it simply doesn't. A dummy version of the problem:

df <- data.frame(x = 1:5) # create simple dataframe
df
  x
1 1
2 2
3 3
4 4
5 5

df$y <- c(1:5) # adding a new column with a vector of the exact same length. Works out like it should
df
  x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5

df$z <- c(1:4) # trying to add a new column, this time with a vector that has fewer elements than the data frame has rows.

Error in `$<-.data.frame`(`*tmp*`, "z", value = 1:4) : 
  replacement has 4 rows, data has 5

I was expecting this to work with the following result:

  x y z
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 1

I.e. the shorter vector should just start repeating itself automatically. I'm pretty certain this used to work for me (it's in a script that I've been running a hundred times before without problems). Now I can't even get the above dummy example to work like I want to. What am I missing?

Answer 1:

If the vector can be recycled evenly into the data.frame (the number of rows is a whole multiple of the vector's length), you do not get an error or a warning:

df <- data.frame(x = 1:10)
df$z <- 1:5

This may be what you were experiencing before.
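To make that rule concrete, here is a minimal sketch that tries both cases and reports whether the assignment went through. The helper name `try_assign` and the data frame `df10` are hypothetical and only used for illustration.

```r
# Hypothetical helper: attempts to add `values` as column z and
# returns TRUE on success, FALSE if the assignment raises an error.
try_assign <- function(df, values) {
  tryCatch({
    df$z <- values
    TRUE
  }, error = function(e) FALSE)
}

df10 <- data.frame(x = 1:10)
ok_even   <- try_assign(df10, 1:5)  # 10 rows is a whole multiple of 5
ok_uneven <- try_assign(df10, 1:4)  # 10 rows is not a multiple of 4
```

Here `ok_even` should be TRUE and `ok_uneven` FALSE, which matches the error message shown in the question.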

You can get your vector to fit as you mention with rep_len:

df$y <- rep_len(1:3, length.out=10)

This results in

df
    x z y
1   1 1 1
2   2 2 2
3   3 3 3
4   4 4 1
5   5 5 2
6   6 1 3
7   7 2 1
8   8 3 2
9   9 4 3
10 10 5 1

Note that in place of rep_len, you could use the more common rep function:

df$y <- rep(1:3, length.out = 10)

From the help file for rep:

rep.int and rep_len are faster simplified versions for two common cases. They are not generic.
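Applied to the example from the question, a sketch might look like the following. Passing `nrow(df)` instead of a hard-coded length is a choice made here, not something the original answer prescribes.

```r
df <- data.frame(x = 1:5)

# Recycle the shorter vector up to the number of rows in df.
# nrow(df) keeps the code correct if the data frame changes size.
df$z <- rep_len(1:4, nrow(df))

df$z
# 1 2 3 4 1
```

This gives the column the question was expecting, with the fifth row wrapping around to 1.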



Answer 2:

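If the number of rows is a multiple of the length of the new vector, recycling works fine. When it is not, recycling does not work everywhere: data.frame() raises an error, whereas cbind() on plain vectors builds a matrix and recycles with only a warning. You have probably relied on this kind of recycling with matrices: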

data.frame(1:6, 1:3, 1:4) # not a multiple
# Error in data.frame(1:6, 1:3, 1:4) : 
#   arguments imply differing number of rows: 6, 3, 4
data.frame(1:6, 1:3) # a multiple
#   X1.6 X1.3
# 1    1    1
# 2    2    2
# 3    3    3
# 4    4    1
# 5    5    2
# 6    6    3
cbind(1:6, 1:3, 1:4) # works even with not a multiple
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    2    2    2
# [3,]    3    3    3
# [4,]    4    1    4
# [5,]    5    2    1
# [6,]    6    3    2
# Warning message:
# In cbind(1:6, 1:3, 1:4) :
#   number of rows of result is not a multiple of vector length (arg 3)
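If you want the matrix-style recycling but in a data frame, one possible approach is to pad every column to a common length yourself before building the data frame. This is a sketch under assumptions: the column names `a`, `b`, `c` and the variable names `cols`, `n`, `padded` and `padded_df` are made up for illustration.

```r
# Columns of differing lengths, as in the cbind() example above
cols <- list(a = 1:6, b = 1:3, c = 1:4)

# Target length: the longest column
n <- max(lengths(cols))

# Recycle each column explicitly to n elements
padded <- lapply(cols, rep_len, length.out = n)

# All columns now have equal length, so this neither errors nor warns
padded_df <- as.data.frame(padded)
```

The result has six rows, with column `c` wrapping around as 1, 2, 3, 4, 1, 2, just like the third column of the cbind() output.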