I am working on a text mining problem. I have extracted 23 sentences from a text file, along with 6 frequent words from the same file.
For the frequent words, I created a 1D cell array (Out1) that records, for each word, the sentences in which it occurs. I then took pairwise intersections to find the sentences in which each word occurs together with each of the other words:
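% One cell per pair of words; only cells with jj > ii are filled.
OccursTogether = cell(length(Out1));
for ii = 1:length(Out1)
    for jj = ii+1:length(Out1)
        % Sentences that contain both word ii and word jj
        OccursTogether{ii,jj} = intersect(Out1{ii}, Out1{jj});
    end
end
celldisp(OccursTogether)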
The output is something like this:
OccursTogether[1,1]= 4 3
OccursTogether[1,2]= 1 4 3
OccursTogether[1,3]= 4 3
Here, [1,1] shows that word 1 occurs with word 1 in sentences 4 and 3, [1,2] shows that words 1 and 2 occur together in sentences 1, 4 and 3, and so on.
What I want to do is implement an element absorption technique that removes every cell whose contents are a superset of another cell's contents. As we can see above, 4 and 3 in [1,1] form a subset of [1,2], so the OccursTogether[1,2] entry should be deleted and the output should be as follows:
OccursTogether[1,1]= 4 3
OccursTogether[1,3]= 4 3
Note that this should check every entry against every other entry, so that any entry that is a superset of another one is removed.
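To make the requirement concrete, here is a minimal sketch of the absorption step as I understand it, not a definitive implementation. It rests on two assumptions that are not stated above: empty cells are ignored (otherwise the empty set would absorb everything), and only proper supersets are removed, since [1,1] and [1,3] hold the same set and both are kept in the desired output. The sample data and the names nonEmpty and absorb are made up for illustration.

```matlab
% Hypothetical sample data mirroring the example output above.
OccursTogether = cell(3);
OccursTogether{1,1} = [4 3];
OccursTogether{1,2} = [1 4 3];
OccursTogether{1,3} = [4 3];

% Linear indices of the filled cells (assumption: empty cells are skipped).
nonEmpty = find(~cellfun(@isempty, OccursTogether));
absorb   = false(size(OccursTogether));   % marks cells to be cleared

for a = nonEmpty(:).'
    A = unique(OccursTogether{a});
    for b = nonEmpty(:).'
        B = unique(OccursTogether{b});
        % A is a proper superset of B: all of B lies in A, and A is larger.
        if a ~= b && numel(A) > numel(B) && all(ismember(B, A))
            absorb(a) = true;
            break
        end
    end
end

% Clear the absorbed cells but keep the shape of the cell array.
OccursTogether(absorb) = {[]};
celldisp(OccursTogether)
```

With this sample data only the [1,2] cell is cleared, while [1,1] and [1,3] keep 4 3. The nested loop compares every filled cell with every other filled cell, which is quadratic in the number of cells but should be fine for 6 words.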