I have already implemented my algorithm in MATLAB using cell arrays of strings, but I can't get it to work when reading from a file.
In MATLAB, I create a cell array of strings for each line; let's call each one line.
So I get
line= 'string1' 'string2' etc
line= 'string 5' 'string7'...
line=...
and so on. I have hundreds of lines to read.
What I'm trying to do is compare the words from the first line to itself. Then I combine the first and second lines and compare the words in the second line to the combined cell. I accumulate each cell I read and compare it with the last cell read.
Here is my code, with the lines named a, b, c, d, ...:
for(i=1:length(a))
    for(j=1:length(a))
        AA=ismember(a,a)
    end
end
combine=[a,b]
[unC,i]=unique(combine, 'first')
sorted=combine(sort(i))
for(i=1:length(sorted))
    for(j=1:length(b))
        AB=ismember(sorted,b)
    end
end
combine1=[a,b,c]
.....
When I read my file, I use a while loop that reads the whole file until the end. How can I implement my algorithm if all my cells of strings have the same name?
while ~feof(fid)
    out=fgetl(fid)
    if isempty(out)||strncmp(out, '%', 1)||~ischar(out)
        continue
    end
    line=regexp(out, ' ', 'split')
end
Suppose your data file is called data.txt and its content is:
A very easy way to retain only the first unique occurrence is:
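The original listing is not reproduced here. As a minimal sketch of one way to do this, assume the file holds space-separated words and follows the same blank-line and % comment rules as the loop in the question. The sample content and the names fname, words, firstIdx and firstUnique are made up for illustration. Instead of giving each line its own variable, every line is appended to a single cell array, and unique with the 'first' option then picks the first occurrence of each word:

```matlab
% Hypothetical sample content, written to a temporary file so the sketch runs on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'apple banana cherry\n%% a comment line\nbanana date apple\n\ndate egg\n');
fclose(fid);

% Read the file line by line, accumulating all words in one cell array.
words = {};
fid = fopen(fname, 'r');
while true
    out = fgetl(fid);
    if ~ischar(out)
        break                                   % fgetl returns -1 at end of file
    end
    if isempty(out) || strncmp(out, '%', 1)
        continue                                % skip blank and comment lines
    end
    words = [words, regexp(out, ' ', 'split')]; % append this line's words
end
fclose(fid);
delete(fname);

% Keep only the first occurrence of each word, in the order read.
[~, firstIdx] = unique(words, 'first');
firstUnique = words(sort(firstIdx));
```

Sorting firstIdx restores the reading order, because unique returns its results in alphabetical order. Growing words inside the loop is simple but can become slow for very large files.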
As already mentioned, this approach might not work if:
EDIT: solution for performance
Note that the resulting idx is:
The advantage of keeping it in this form is that you save space compared with a cell array (which imposes 112 bytes of overhead per cell). You can also store it as a sparse array to potentially reduce storage costs further.
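To illustrate the storage point, here is a small sketch under the assumption that idx is a logical mask marking the first occurrence of each word. The words array and the variable names are hypothetical and not taken from the original listing:

```matlab
% Hypothetical word list standing in for the words read from the file.
words = {'apple','banana','cherry','banana','date','apple','date','egg'};

% Build a logical mask that is true at the first occurrence of each word.
[~, firstIdx] = unique(words, 'first');
idx = false(1, numel(words));
idx(firstIdx) = true;

% The same mask stored as a sparse logical array.
idxSparse = sparse(idx);

% The full mask selects the first occurrences directly.
firstUnique = words(idx);

% Compare the storage of the two forms (sizes depend on the MATLAB version).
whos idx idxSparse
```

A full logical array costs one byte per element, so the sparse form is likely to pay off only when the mask is long and mostly false. For a short mask like this one the sparse version is probably larger.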
Another thing to note is that even if the logical array is longer than the array it indexes (e.g. a double array), you can still use it as long as the excess elements are false (and by construction of the above problem, idx satisfies this requirement). An example to clarify:
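The original example is missing here, so the following is a minimal stand-in with made-up values. It indexes a 3-element double array with a 5-element logical mask whose last two elements are false:

```matlab
A = [10 20 30];                  % double array with 3 elements
idx = logical([1 0 1 0 0]);      % logical mask with 5 elements, the excess ones false

B = A(idx);                      % selects elements 1 and 3, so B is [10 30]

% If any of the excess elements were true, e.g. logical([1 0 1 0 1]),
% the same indexing would raise an out-of-bounds error.
```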