Question:

I have a dataframe of 9 columns holding an inventory of factors. Each row can have all 9 columns filled (meaning that row holds 9 "things"), but most don't (most have 3 or 4). The columns aren't meaningful in themselves either: if item 200 shows up in column 1 or in column 3, it's the same thing. I'd like to create a binary matrix with one row per original row and one column per factor, where a 1 means the row contains that factor.

Example (shortened to 4 columns just to get the point across):

R1   3  4  5   8
R2   4  6  7  NA
R3   1  5 NA  NA
R4   2  6  8   9

Should turn into

     1  2  3  4  5  6  7  8  9 
r1   0  0  1  1  1  0  0  1  0
r2   0  0  0  1  0  1  1  0  0
r3   1  0  0  0  1  0  0  0  0
r4   0  1  0  0  0  1  0  1  1

I've looked into writeBin/readBin, K-clustering (which is something I'd like to do, but I need to get rid of the NAs first), fuzzy clustering, tag clustering. Just kinda lost about what direction to go.

I've tried writing two for loops that pull the data from the matrix by column/row and then save 0s and 1s respectively in a new matrix, but I think there were scope issues.

You guys are the best. Thanks!

Answer 1:

Here's a base R solution:

# Read in the data, and convert to matrix form
df <- read.table(text = "
3  4   5   8
4  6   7   NA
1  5  NA   NA
2  6   8   9", header = FALSE)
m <- as.matrix(df)

# Create a two-column matrix containing the row/column indices of the cells
# to be filled with ones
id <- cbind(rowid = as.vector(t(row(m))), 
            colid = as.vector(t(m)))
id <- id[complete.cases(id), ]

# Create output matrix
out <-  matrix(0, nrow = nrow(m), ncol = max(m, na.rm = TRUE))
out[id] <- 1
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,]    0    0    1    1    1    0    0    1    0
# [2,]    0    0    0    1    0    1    1    0    0
# [3,]    1    0    0    0    1    0    0    0    0
# [4,]    0    1    0    0    0    1    0    1    1
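
The matrix-indexing trick above can be wrapped in a small reusable function. This is only a minimal sketch, assuming every non-NA entry is a positive integer item ID; the name `to_indicator` and its `n_items` argument are made up for illustration and are not part of the answer.

```r
# Hypothetical helper: build a 0/1 indicator matrix from a matrix of item IDs.
# Assumes every non-NA entry of m is a positive integer.
to_indicator <- function(m, n_items = max(m, na.rm = TRUE)) {
  m <- as.matrix(m)
  keep <- !is.na(m)
  out <- matrix(0L, nrow = nrow(m), ncol = n_items,
                dimnames = list(rownames(m), as.character(seq_len(n_items))))
  # A two-column matrix used as an index addresses individual (row, column) cells
  out[cbind(row(m)[keep], m[keep])] <- 1L
  out
}

# The question's example data, with row names kept
m <- matrix(c(3, 4, 5, 8,
              4, 6, 7, NA,
              1, 5, NA, NA,
              2, 6, 8, 9),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("r", 1:4), NULL))
to_indicator(m)
```

Passing `n_items` explicitly (say, 200) would keep columns for items that never appear in the data, which may matter if several inventories need the same column layout.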

Answer 2:

This should do the trick:

# The Incantation
options(stringsAsFactors = FALSE)

library(reshape2)

# Your example data
dat <- data.frame(id = c("R1", "R2", "R3", "R4"),
                  col1 = c(3, 4, 1, 2),
                  col2 = c(4, 6, 5, 6),
                  col3 = c(5, 7, NA, 8),
                  col4 = c(8, NA, NA, 9)
)

# Melt it down
dat.melt <- melt(dat, id.vars = "id")

# Cast it back out, with the row IDs remaining the row IDs
# and the values of the columns becoming the columns themselves.
# dcast() will default to length to aggregate records - which means
# that the values in this data.frame are a count of how many times
# each value occurs in each row's columns (which, based on this data,
# seems to be capped at just once).
dat.cast <- dcast(dat.melt, id ~ value)

The result:

dat.cast
  id 1 2 3 4 5 6 7 8 9 NA
1 R1 0 0 1 1 1 0 0 1 0  0
2 R2 0 0 0 1 0 1 1 0 0  1
3 R3 1 0 0 0 1 0 0 0 0  2
4 R4 0 1 0 0 0 1 0 1 1  0

The NA column counts the empty slots in each row; passing na.rm = TRUE to melt() leaves it out.
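
For readers without reshape2, the same count-then-cap idea can be sketched in base R with `table()`. This is a swapped-in technique, not what the answer uses, and it assumes the data is held as a matrix with NA marking empty slots.

```r
# The question's example data as a matrix (NA marks an empty slot)
m <- matrix(c(3, 4, 5, 8,
              4, 6, 7, NA,
              1, 5, NA, NA,
              2, 6, 8, 9),
            nrow = 4, byrow = TRUE)

# table() cross-tabulates row index against value and drops NA by default,
# so each cell counts how often a value appears in a row
counts <- table(row = row(m), value = m)

# Cap the counts at 1 to get a strictly binary matrix
binary <- (unclass(counts) > 0) * 1
binary
```

One caveat: `table()` only creates columns for values that actually occur, so an item ID missing from every row gets no column here.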

Answer 3:

These are all great answers. Thought I'd contribute the original solution I wrote that a friend of mine modified to actually work.

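# Assumed setup (not shown in the original): x is the input matrix and
# y is a preallocated all-zero matrix with one column per item
y <- matrix(0, nrow = nrow(x), ncol = max(x, na.rm = TRUE))
for (i in seq_len(nrow(x)))
  for (j in seq_len(ncol(x)))
    if (!is.na(x[i, j])) y[i, x[i, j]] <- 1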

The two for loops work once some earlier parameters are set, but they're incredibly slow. Looks like the other solutions are much faster!
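
To check the speed difference on your own machine, something like the sketch below could be used. The test data, sizes, and the function names `loop_fill` and `index_fill` are all made up for illustration; timings will vary, so none are quoted here.

```r
set.seed(1)
# Hypothetical test data: 2000 rows, 9 slots, item IDs 1..50, about half empty
n_rows <- 2000; n_slots <- 9; n_items <- 50
x <- matrix(sample(n_items, n_rows * n_slots, replace = TRUE), nrow = n_rows)
x[sample(length(x), length(x) %/% 2)] <- NA

# Double for loop, filling a preallocated matrix cell by cell
loop_fill <- function(x, n_items) {
  y <- matrix(0, nrow = nrow(x), ncol = n_items)
  for (i in seq_len(nrow(x)))
    for (j in seq_len(ncol(x)))
      if (!is.na(x[i, j])) y[i, x[i, j]] <- 1
  y
}

# Vectorized fill using a two-column (row, column) index matrix
index_fill <- function(x, n_items) {
  y <- matrix(0, nrow = nrow(x), ncol = n_items)
  keep <- !is.na(x)
  y[cbind(row(x)[keep], x[keep])] <- 1
  y
}

# Both approaches should agree; only the running time differs
stopifnot(identical(loop_fill(x, n_items), index_fill(x, n_items)))
system.time(loop_fill(x, n_items))
system.time(index_fill(x, n_items))
```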

How to create a binary matrix of inventory per row
