Word search algorithm using an m.file

I have already implemented my algorithm in MATLAB using cell arrays of strings, but I can't get it to work when the data is read from a file.

In MATLAB, I create a cell array of strings for each line; let's call each one line.

So I get

     line= 'string1' 'string2' etc
     line= 'string 5' 'string7'...
     line=...

and so on. I have hundreds of lines to read.

What I'm trying to do is compare the words from the first line to itself. Then I combine the first and second lines and compare the words in the second line to the combined cell. I keep accumulating each cell I read and compare the accumulated cell with the last cell read.

Here is my code, where a, b, c, d, ... are the cells for the successive lines:

    % ismember is vectorized, so no loops over i and j are needed
    AA = ismember(a, a);

    combine = [a, b];
    [unC, i] = unique(combine, 'first');
    sorted = combine(sort(i));      % unique words in order of first appearance

    AB = ismember(sorted, b);

    combine1 = [a, b, c];

..... and so on. When I read my file, I use a while loop that reads the whole file until the end, so how can I implement my algorithm if all my cells of strings have the same name?

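    while ~feof(fid)
        out = fgetl(fid);
        % skip blank lines, comment lines, and the -1 returned at end of file
        if ~ischar(out) || isempty(out) || strncmp(out, '%', 1)
            continue
        end
        line = regexp(out, ' ', 'split');   % split the line just read
    end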

Tags: algorithm matlab wordsearch

1 answer

小情绪 Triste *

#2 · 2019-09-08 16:43

Suppose your data file is called data.txt and its content is:

string1 string2 string3 string4
string2 string3 
string4 string5 string6

A very easy way to retain only the first unique occurrence is:

% Parse everything in one go
fid = fopen('C:\Users\ok1011\Desktop\data.txt');
out = textscan(fid,'%s');
fclose(fid);

unique(out{1})
ans = 
    'string1'
    'string2'
    'string3'
    'string4'
    'string5'
    'string6'

As already mentioned, this approach might not work if:

your data file has irregularities
you actually need the comparison indices
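If the cumulative comparison from the question is what you need, one way to get around the "same variable name" problem is to keep a single running cell and update it on every pass. The following is only a minimal sketch under assumptions: the variable names lines, seen and inLine are hypothetical, and the file is replaced by an in-memory cell of lines so the snippet is self-contained.

```matlab
% Hypothetical stand-in for the lines read from data.txt
lines = {'string1 string2 string3 string4', ...
         'string2 string3 ', ...
         'string4 string5 string6'};

seen   = {};                 % words accumulated so far, in first-seen order
inLine = cell(size(lines));  % inLine{k}: which accumulated words occur in line k

for k = 1:numel(lines)
    % split on whitespace after trimming, so trailing blanks give no empty word
    words = regexp(strtrim(lines{k}), '\s+', 'split');

    % append only the words that have not been accumulated yet
    for w = words
        if ~any(strcmp(w{1}, seen))
            seen{end+1} = w{1};
        end
    end

    % compare the accumulated cell with the line just read
    inLine{k} = ismember(seen, words);
end
```

Inside a real reading loop, the body of the for loop would go where line is computed from fgetl, with a counter k incremented for each line that is kept.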

EDIT: solution for performance

% Parse in bulk and split (assuming you don't know maximum 
%number of strings in a line, otherwise you can use textscan alone)

fid = fopen('C:\Users\ok1011\Desktop\data.txt');
out = textscan(fid,'%s','Delimiter','\n');
out = regexp(out{1},' ','split');
fclose(fid);

% Preallocate unique comb
comb = unique([out{:}]); % you might need to remove empty strings from here

% preallocate idx
m   = size(out,1);
idx = false(m,size(comb,2));

% Loop for number of lines (rows)
for ii = 1:m
    idx(ii,:) = ismember(comb,out{ii});
end

Note that the resulting idx is:

idx =
     1     1     1     1     0     0
     0     1     1     0     0     0
     0     0     0     1     1     1

The advantage of keeping it in this form is that you save on space with respect to a cell array (which imposes 112 bytes of overhead per cell). You can also store it as a sparse array to potentially improve on storage costs.
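As a rough illustration of the sparse option, the logical matrix can be passed to sparse directly. This is a sketch using the small idx from above; for a matrix this small and dense the sparse form is unlikely to be smaller, and it should only pay off when most entries are false.

```matlab
% idx as shown above
idx = logical([1 1 1 1 0 0; ...
               0 1 1 0 0 0; ...
               0 0 0 1 1 1]);

sIdx = sparse(idx);      % sparse logical matrix, stores only the true entries
nTrue = nnz(sIdx);       % number of stored (true) entries
back = full(sIdx);       % convert back to a full logical matrix
```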

Another thing to note is that a logical index may be longer than the array it indexes (e.g. a double array), as long as the excess elements are all false; by construction of the above problem, idx satisfies this requirement. An example to clarify:

A = 1:3;
A([true false true false false])
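To connect this back to the word lists, a row of idx can be used as a logical index into comb to recover the words of a given line. A small sketch, with comb and idx typed in by hand to match the example data (the names wordsLine2 and B are only illustrative):

```matlab
comb = {'string1','string2','string3','string4','string5','string6'};
idx  = logical([1 1 1 1 0 0; ...
                0 1 1 0 0 0; ...
                0 0 0 1 1 1]);

% words that occur in the second line of the file
wordsLine2 = comb(idx(2,:));

% a logical index longer than the array is accepted when the excess is false
A = 1:3;
B = A([true false true false false]);
```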
