Checking for equality

Posted 2020-05-09 22:48

Question:

I want to check for equality within a data set. The data set looks like this:

Equips <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8)
Notifs <- c(10,10,20,55,63,67,71,73,73,73,81,81,83,32,32,32,32,
47,48,45,45,45,51,51,55,56,69,65,88)
Comps <- c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
"Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
"Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
"Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube")
rank <- c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1)

df <- data.frame(Equips,Notifs,Comps,rank)

The data frame should be read line by line.

My problem is the following: I have a very big data set, and I want to check whether the same Comps appears in all ranks of one Equips.

To specify: Equips 1 has ranks 1 and 2, and I want to check whether there is a component listed in both rank 1 and rank 2 (in this example: YES).

Equips 2 has 3 ranks, and here there is no Comps that is listed in the first, second, and third rank.

Equips 5 has 4 ranks, and yes, here there is a Comps that is in every rank, namely "Lichtmaschine".

So what is my desired output? It would be enough to get an output with the number of each Equips and TRUE or FALSE (like the summary command).

TRUE should be the output if there is a Comps that is listed in every rank (within one Equips).

Two further notes: the data set is very big, so I need an automated solution and, if possible, one that uses only standard R without any packages.

Many thanks for your effort.

Charly

Answer 1:

Here is an answer which uses the plyr package:

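library(plyr)
# For each Equips, test whether at least one Comps occurs in every rank
ddply(df, .(Equips), function(d) {
  nb.comps <- length(unique(d$rank))   # number of distinct ranks (despite the name)
  tab <- table(d$rank, d$Comps) > 0    # rank x Comps presence matrix
  tab <- margin.table(tab, 2)          # per Comps: number of ranks it appears in
  return(sum(tab>=nb.comps)>0)         # TRUE if some Comps appears in all ranks
})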

Which gives:

  Equips    V1
1      1  TRUE
2      2 FALSE
3      3 FALSE
4      4 FALSE
5      5  TRUE
6      6  TRUE
7      7  TRUE
8      8  TRUE

If you really don't want to use plyr, you can use the by function:

by(df, df$Equips, function(d) {
  nb.comps <- length(unique(d$rank))
  tab <- table(d$rank, d$Comps) > 0
  tab <- margin.table(tab, 2)
  return(sum(tab>=nb.comps)>0)
})

df$Equips: 1
[1] TRUE
-------------------------------------------------------- 
df$Equips: 2
[1] FALSE
-------------------------------------------------------- 
df$Equips: 3
[1] FALSE
-------------------------------------------------------- 
df$Equips: 4
[1] FALSE
-------------------------------------------------------- 
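df$Equips: 5
[1] TRUE
-------------------------------------------------------- 
df$Equips: 6
[1] TRUE
-------------------------------------------------------- 
df$Equips: 7
[1] TRUE
-------------------------------------------------------- 
df$Equips: 8
[1] TRUE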
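
As a cross-check, the same question can also be phrased as a set intersection: split the Comps of each Equips by rank and intersect the resulting sets. The following is only a sketch in base R, not part of the original approach. The names common_comps, shared and has_shared are made up for illustration, and the sample data from the question is re-created (without Notifs, which is not needed here) so that the snippet runs on its own.

```r
# Sample data from the question (Notifs omitted, it plays no role here)
Equips <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8)
Comps <- c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
"Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
"Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
"Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube")
rank <- c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1)

# Keep Comps as character so that intersect() compares plain strings
df <- data.frame(Equips, Comps, rank, stringsAsFactors = FALSE)

# Hypothetical helper: the Comps that occur in every rank of one Equips
common_comps <- function(d) {
  Reduce(intersect, split(d$Comps, d$rank))
}

# One entry per Equips: the shared Comps (possibly none)
shared <- lapply(split(df, df$Equips), common_comps)

# Named logical vector: TRUE if at least one Comps is shared by all ranks
has_shared <- lengths(shared) > 0
```

Compared with the table-based version, this also tells you which component is shared: shared[["5"]] is "Lichtmaschine" and shared[["1"]] is "Motor". Note that an Equips with only a single rank (6, 7 and 8 in the sample data) trivially counts as TRUE under this definition.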

If you want to summarize the result you can do something like this:

result <- by(df, df$Equips, function(d) {
  nb.comps <- length(unique(d$rank))  # count distinct ranks, not distinct Comps
  tab <- table(d$rank, d$Comps) > 0
  tab <- margin.table(tab, 2)
  return(sum(tab>=nb.comps)>0)
})


data.frame(nb.equips=dim(result), nb.matched=sum(result))

Which gives:

  nb.equips nb.matched
1         8          5
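
If you would rather have one row per Equips with its TRUE/FALSE value, as asked for in the question, the by result can be flattened into a data frame. This is a minimal sketch on a small made-up data set (toy is hypothetical and much smaller than the sample data); the column names Equips and matched and the variable names flat and counts are likewise only illustrative.

```r
# Toy data (hypothetical): Equips 1 has "Motor" in both ranks, Equips 2 has no shared Comps
toy <- data.frame(
  Equips = c(1, 1, 1, 2, 2),
  Comps  = c("Motor", "Ventil", "Motor", "Ventil", "Motor"),
  rank   = c(1, 1, 2, 1, 2)
)

# Same test as in the snippets above, applied per Equips
result <- by(toy, toy$Equips, function(d) {
  nb.ranks <- length(unique(d$rank))        # number of distinct ranks
  tab <- table(d$rank, d$Comps) > 0         # rank x Comps presence matrix
  tab <- margin.table(tab, 2)               # per Comps: number of ranks it appears in
  any(tab >= nb.ranks)                      # TRUE if some Comps is in every rank
})

# One row per Equips; names(result) holds the Equips values as strings
flat <- data.frame(Equips = names(result), matched = as.vector(result))

# summary-like count of TRUE and FALSE
counts <- table(flat$matched)
```

Here flat has one row per Equips, and counts gives the number of TRUE and FALSE values, which is close to what summary prints for a logical vector.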


Tags: r subset