Getting column name which holds a max value within

Posted 2019-03-25 11:05

Question:

For instance given:

dim1 <- c("P","PO","C","T")
dim2 <- c("LL","RR","R","Y")
dim3 <- c("Jerry1", "Jerry2", "Jerry3")
Q <- array(1:48, c(4, 4, 3), dimnames = list(dim1, dim2, dim3))

Within this array, I want to find the matrix (the dim3 slice) that has the largest value at the (3rd row, 4th column) location.

Once that matrix is identified, I want to return the name of the column that holds the maximum value in that matrix's 3rd row, across columns 1 to 3.

So I'd expect Jerry3 to be selected, because the number 47 is stored in its 3rd row, 4th column. Within Jerry3, the maximum number in row 3 (columns 1 to 3) is 43, and ultimately what I need returned (the only value I need) is the name of its column, "R".

That's what I need to know how to do: get that "R" and assign it to a variable, e.g. column_ref, such that column_ref <- "R".
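As a quick check of the numbers above, the two slices of interest can be printed directly. This sketch only inspects the data; the variable names at_3_4 and row3_jerry3 are illustrative and not part of the question.

```r
# Rebuild the example array from the question
dim1 <- c("P", "PO", "C", "T")
dim2 <- c("LL", "RR", "R", "Y")
dim3 <- c("Jerry1", "Jerry2", "Jerry3")
Q <- array(1:48, c(4, 4, 3), dimnames = list(dim1, dim2, dim3))

# Value at (row 3, column 4) in each slice, named by slice
at_3_4 <- Q[3, 4, ]
print(at_3_4)        # Jerry1 = 15, Jerry2 = 31, Jerry3 = 47

# Row 3, columns 1 to 3 of the "Jerry3" slice, named by column
row3_jerry3 <- Q[3, 1:3, "Jerry3"]
print(row3_jerry3)   # LL = 35, RR = 39, R = 43
```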

Please Please Please help.

Answer 1:

This should do it - if I understand correctly:

Q <- array(1:48, c(4,4,3), dimnames=list(
  c("P","PO","C","T"), c("LL","RR","R","Y"), c("Jerry1", "Jerry2", "Jerry3")))

column_ref <- names(which.max(Q[3,1:3, which.max(Q[3,4,])]))[1] # "R"

Some explanation:

which.max(Q[3,4,]) # returns the index of the "Jerry3" slice (3)
which.max(Q[3,1:3, 3]) # returns the index of the "R" column (3)

...and then names returns the name of the index ("R").
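The same one-liner can be unpacked into named steps and wrapped in a small function, which makes it easier to reuse with other rows or column ranges. This is a sketch: the function name col_of_max_in_best_slice and its arguments are hypothetical, it assumes cols is given as integer positions, and, like which.max(), it resolves ties to the first maximum.

```r
Q <- array(1:48, c(4, 4, 3), dimnames = list(
  c("P", "PO", "C", "T"), c("LL", "RR", "R", "Y"), c("Jerry1", "Jerry2", "Jerry3")))

# Hypothetical helper: choose the slice with the largest arr[row, ref_col, ],
# then return the name of the largest column among `cols` in `row` of that slice.
# `cols` is assumed to be integer positions; ties go to the first maximum.
col_of_max_in_best_slice <- function(arr, row, ref_col, cols) {
  best_slice <- which.max(arr[row, ref_col, ])    # index of the winning slice
  values <- arr[row, cols, best_slice]            # candidate cells in that slice
  dimnames(arr)[[2]][cols][which.max(values)]     # column name of the largest
}

column_ref <- col_of_max_in_best_slice(Q, row = 3, ref_col = 4, cols = 1:3)
print(column_ref)
```

Looking the name up through dimnames() rather than names() keeps the function working even when cols has length 1, where indexing the array drops the names.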



Answer 2:

This post helped me solve a more general data.frame problem.
I have repeated measures for groups G1 and G2.

> str(df)
'data.frame':   6 obs. of  15 variables:
$ G1       : num  0 0 2 2 8 8
$ G2       : logi  FALSE TRUE FALSE TRUE FALSE TRUE
$ e.10.100 : num  26.41 -11.71 27.78 3.17 26.07 ...
$ e.10.250 : num  27.27 -12.79 29.16 3.19 26.91 ...
$ e.20.100 : num  29.96 -12.19 26.19 3.44 27.32 ...
$ e.20.100d: num  26.42 -13.16 28.26 4.18 25.43 ...
$ e.20.200 : num  24.244 -18.364 29.047 0.553 25.851 ...
$ e.20.50  : num  26.55 -13.28 29.65 4.34 27.26 ...
$ e.20.500 : num  27.94 -13.92 27.59 2.47 25.54 ...
$ e.20.500d: num  24.4 -15.63 26.78 4.86 25.39 ...
$ e.30.100d: num  26.543 -15.698 31.849 0.572 29.484 ...
$ e.30.250 : num  26.776 -16.532 28.961 0.813 25.407 ...
$ e.50.100 : num  25.995 -14.249 28.697 0.803 27.852 ...
$ e.50.100d: num  26.1 -12.7 27.1 2.5 27.4 ...
$ e.50.500 : num  28.78 -9.39 25.77 2.73 23.73 ...

I need to know which measure (column) has the best (max) result in each row, ignoring the grouping columns.
I ended up with this expression:

apply(df[colIni:colFim], 1, function(x) colnames(df)[which.max(x)+(colIni-1)])
#colIni: first column to consider; colFim: last column to consider
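A vectorized alternative to the row-wise apply() is base R's max.col(), which returns the column index of each row's maximum. This is a swapped-in technique, not the answer's own, and the small df below is made up for illustration. Passing ties.method = "first" matters: the default, "random", breaks ties at random, whereas "first" matches which.max().

```r
# Made-up data frame: two grouping columns, then three measures
df <- data.frame(
  G1 = c(0, 0, 2),
  G2 = c(FALSE, TRUE, FALSE),
  m1 = c(1.5, -3.0, 7.0),
  m2 = c(2.5, -1.0, 6.0),
  m3 = c(0.5, -2.0, 7.0)
)

colIni <- 3   # first measure column
colFim <- 5   # last measure column

# Numeric matrix of the measure columns only (grouping columns excluded)
measures <- as.matrix(df[colIni:colFim])

# Column index of each row's maximum; "first" resolves ties like which.max()
best_idx <- max.col(measures, ties.method = "first")
best_col <- colnames(measures)[best_idx]
print(best_col)   # "m2" "m2" "m1" (row 3 is a tie between m1 and m3)
```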

Once you have the column name, another tiny function gets the max value:

apply(dfm,1,function(x) x[x[1]])
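The x[x[1]] trick works because apply() first converts dfm to a character matrix: x[1] is the stored column name, and x[x[1]] picks that cell by name. The value comes back as text, which is why mxCol() wraps it in as.numeric(), and because as.matrix() formats numeric columns during that conversion, digits beyond the default print precision may be lost. Since the winning column's value is by definition the row maximum, a sketch that stays numeric can take the row-wise maximum of the measure columns directly with pmax(). The measures data frame below is made up.

```r
# Made-up measure columns (grouping columns already excluded)
measures <- data.frame(
  m1 = c(1.5, -3.0, 7.0),
  m2 = c(2.5, -1.0, 6.0),
  m3 = c(0.5, -2.0, 7.0)
)

# pmax() takes the element-wise maximum across its arguments;
# do.call() passes each column as a separate argument, giving row-wise maxima
best_val <- do.call(pmax, measures)
print(best_val)   # 2.5 -1.0 7.0, still numeric
```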

And here is a function for similar problems that returns both the column name and the max value:

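mxCol=function(df, colIni, colFim){ #201609
  # default range: all columns
  if(missing(colIni)) colIni=1
  if(missing(colFim)) colFim=ncol(df)
  if(colIni>=colFim) { print('colIni>=ColFim'); return(NULL)}
  # prepend the name of the column holding each row's max within colIni:colFim
  dfm=cbind(mxCol=apply(df[colIni:colFim], 1, function(x) colnames(df)[which.max(x)+(colIni-1)])
           ,df)
  # prepend that max value, looked up by name in each row and converted back to numeric
  dfm=cbind(mxVal=as.numeric(apply(dfm,1,function(x) x[x[1]]))
           ,dfm)
  return(dfm)
}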

In this case, with the measures starting at column 3:

> mxCol(df,3)[1:11]
   mxVal     mxCol G1    G2 e.10.100 e.10.250 e.20.100 e.20.100d e.20.200 e.20.50 e.20.500
1 29.958  e.20.100  0 FALSE   26.408   27.268   29.958    26.418   24.244  26.553   27.942
2 -9.395  e.50.500  0  TRUE  -11.708  -12.789  -12.189   -13.162  -18.364 -13.284  -13.923
3 31.849 e.30.100d  2 FALSE   27.782   29.158   26.190    28.257   29.047  29.650   27.586
4  4.862 e.20.500d  2  TRUE    3.175    3.190    3.439     4.182    0.553   4.337    2.467
5 29.484 e.30.100d  8 FALSE   26.069   26.909   27.319    25.430   25.851  27.262   25.535
6 -9.962  e.30.250  8  TRUE  -11.362  -12.432  -15.960   -11.760  -12.832 -12.771  -12.810