I have a text file with the following format of information:
Name1 34 25 36 46
Name1 23 53 15 86
Name1 25 25 87 35
Name2 76 22 44 55
Name2 88 88 88 88
Name3 11 11 11 11
Name3 55 66 88 88
Name3 88 88 88 88
Name3 00 00 00 00
There are different "Names" and I have to arrange each name into an array slot. I would then need another way to allocate the date associated with each row to that specific spot. So for example, the first Name1 may have array{0}, but I would also need to associate the 34, 25, 36, and 46 somehow. I would also need to distinguish the different names from each other. What is the best way to do this? A 2x2 array does not seem to be the solution.
What I have so far is something along the lines of this:
%# read the whole file to a temporary cell array
fid = fopen(filename,'rt');
tmp = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
%# remove the lines starting with headerline
tmp = tmp{1};
idx = strncmp(tmp, 'headerline', 10); %# also safe for lines shorter than 10 characters
tmp(idx) = [];
%# split and concatenate the rest
result = regexp(tmp,' ','split');
result = cat(1,result{:});
%# delete temporary array (if you want)
clear tmp
Courtesy: Read txt file in Matlab
Could someone please tell me the best way to arrange the information? Thanks, help is much appreciated.
Judging from the code, why don't you use
fid = fopen(filename,'rt');
tmp = textscan(fid, '%s %d %d %d %d', 'Headerlines', 10);
fclose(fid);
textscan uses space and newline as delimiters by default. If you give newline as the delimiter explicitly, you lose the space as a delimiter, and you lose portability (Windows often uses \r\n as a single newline, whereas Unix-derived OSes use \n). So, given your data, just leave it out.
Then you jump through hoops to remove 10 headerlines, while textscan already has a nice baked-in option for that. So, those steps aren't needed. You proceed by splitting the stuff by a pass through regexp with a space as delimiter, but since textscan already splits on space, that's not needed either.
So, using the three lines above, you'll get
tmp =
{9x1 cell} [9x1 int32] [9x1 int32] [9x1 int32] [9x1 int32]
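To try this without a file on disk, note that textscan also accepts a character vector in place of a file identifier. A minimal sketch, assuming three sample rows from the question and no header lines (so 'Headerlines' is left out here):

```matlab
% Three sample rows from the question, held in memory instead of a file.
txt = sprintf('Name1 34 25 36 46\nName1 23 53 15 86\nName2 76 22 44 55\n');

% The default whitespace delimiters split the columns; no 'Delimiter' needed.
tmp = textscan(txt, '%s %d %d %d %d');

names_col = tmp{1};   % 3x1 cell array of character vectors
first_col = tmp{2};   % 3x1 int32 column holding 34, 23, 76
```

Each %d column comes back as int32, which is why the display above shows int32 arrays.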
Now, how to store the data more conveniently? I can think of two ways:
- Cell arrays
- Structures
For both, you'll have to find the unique names first:
[names, inds] = unique(tmp{1});
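As a side note, unique can also return a third output that gives the group index of every row, which is sometimes handier than repeated string comparisons. A small sketch on a hand-written name column (col and grp are illustrative names of my own, not part of the code above):

```matlab
col = {'Name1'; 'Name1'; 'Name2'; 'Name3'; 'Name3'};  % stands in for tmp{1}

% names holds the sorted unique names; grp(k) is the index into names for row k.
[names, ~, grp] = unique(col);

rows_of_name3 = find(grp == 3);   % rows that belong to names{3}
```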
Using cell arrays
This will give you a cell-array of the data sorted by name:
data = [tmp{2:end}];
results = arrayfun(@(x) data(strcmp(tmp{1},x),:), ...
names, 'uniformoutput', false);
Now you can index into results as follows:
results{3}(1,4) %# for the 4th '11' for 'Name3'
Remember that Matlab is 1-based, so that a(3) indicates the 3rd element of a, not the 4th.
Breakdown of the command:
The function arrayfun loops through the elements of the input array, applies a function to each element, and collects the results in either a regular array (when every result is a scalar) or a cell-array (when given 'uniformoutput', false; without that option, non-scalar results raise an error). It's a bit like a foreach-construct.
Taking the input array equal to the unique names found in the first step, the trick is in the function to apply to each name. The function @(x) data(strcmp(tmp{1},x),:) first finds the indices for the given name in tmp{1} (array containing all names) using strcmp. These indices are then used to index data = [tmp{2:end}], i.e., all the other arrays.
The results for each individual unique name are then stored in the cell-array results.
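If the arrayfun one-liner is hard to read, the same grouping can be written as an explicit loop. This is only a sketch with small stand-in arrays (allNames plays the role of tmp{1}, and data the role of [tmp{2:end}]):

```matlab
% Small stand-ins for tmp{1} and [tmp{2:end}].
allNames = {'Name1'; 'Name1'; 'Name2'};
data     = int32([34 25 36 46; 23 53 15 86; 76 22 44 55]);
names    = unique(allNames);

% Same grouping as the arrayfun call, one name per iteration.
results = cell(numel(names), 1);
for k = 1:numel(names)
    mask = strcmp(allNames, names{k});  % logical mask of rows with this name
    results{k} = data(mask, :);         % keep those rows, all columns
end
```

Here results{1} holds the two Name1 rows and results{2} the single Name2 row.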
Using Structures
You can go one step further and use the cell-array results to have a more human-readable data structure. After applying all the previous steps, execute this:
for ii = 1:numel(names)
    output.(names{ii}) = results{ii};
end
Now you can refer to your data by name:
output.Name3(1,4) %# to index the 4th '11' from 'Name3'
The syntax your_struct.('someString') is called dynamic structure referencing. It references or creates a field in the structure your_struct called someString.
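A tiny, self-contained illustration of that syntax, with the field name held in a variable (s and key are placeholder names for this sketch):

```matlab
s   = struct();             % start from an empty structure
key = 'Name3';              % field name stored in a variable

s.(key) = [11 11 11 11];    % creates the field Name3
v = s.(key)(4);             % reads it back; same as s.Name3(4)

has_it     = isfield(s, key);   % check for the field by name
all_fields = fieldnames(s);     % list the fields that exist
```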
Now, if names{ii} contains underscores you want to get rid of, then you can define
camelCase = @(x) regexprep(x, '_+(\w?)', '${upper($1)}')
or
camelCase = @(x) regexprep(x, ' +(\w?)', '${upper($1)}')
for spaces. Then use
for ii = 1:numel(names)
    output.( camelCase(names{ii}) ) = results{ii};
end
Kudos to these guys for that last one.
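To see what the helper does, and to guard against names that still aren't usable as field names after conversion, something like the following could work. It is a sketch: the sample name 'first_name' is made up, and the isvarname check is my own addition.

```matlab
camelCase = @(x) regexprep(x, '_+(\w?)', '${upper($1)}');

raw   = 'first_name';       % made-up example name
field = camelCase(raw);     % underscore dropped, next letter upper-cased

% isvarname reports whether the text is a valid identifier,
% which is also what a structure field name has to be.
ok = isvarname(field);

s = struct();
if ok
    s.(field) = 1;          % only create the field when the name is valid
end
```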
First, you should definitely read the data in using the method suggested by Rody (+1 for Rody for pointing it out), so I'm going to assume you got that far and have a variable called tmp like in Rody's code example.
Now, if I understand the problem correctly, you need to be able to distinguish each row of your example dataset from the other rows (using dates?) but at the same time you also need to easily distinguish the different names, some of which will be the same across several rows (again, I'm getting this from your example dataset).
One possible way of approaching this (that does admittedly have one drawback) is to use a structure. I'm going to assume you have obtained the variable tmp in Rody's answer and we'll go from there. Use the code:
NameVec = unique(tmp{1, 1});
for i = 1:1:size(NameVec, 1)
Index = ismember(tmp{1, 1}, NameVec{i, 1});
Struct.(NameVec{i, 1}).Data = ...
[tmp{1, 2}(Index), tmp{1, 3}(Index), tmp{1, 4}(Index), tmp{1, 5}(Index)];
end
Struct.NameVec = NameVec;
This code will create a structure where the first level within the structure has a field name for each unique name in the dataset (I've also included in the code the variable NameVec in the first level of the structure so it can be used to reference the various fields later with a loop). Then within each field (Name1, Name2, and Name3 in this example), I've saved a data matrix containing the data associated with that name (where the individual rows are preserved).
The drawback to this approach is that if you want to get ALL the data back in one big array, you'll need to loop over the elements of Struct.NameVec and retrieve the data matrix associated with each unique name. Loops can be slow in MATLAB, so it really does depend on how you plan to use the data.
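For what it's worth, stitching everything back together doesn't take much code. A rough sketch on a tiny structure shaped like the one above (parts and allData are names I made up):

```matlab
% A tiny structure shaped like the one built above.
Struct.Name1.Data = [34 25 36 46; 23 53 15 86];
Struct.Name2.Data = [76 22 44 55];
Struct.NameVec    = {'Name1'; 'Name2'};

% Collect each name's matrix into a cell, then stack them vertically.
parts   = cellfun(@(n) Struct.(n).Data, Struct.NameVec, 'UniformOutput', false);
allData = vertcat(parts{:});   % 3x4 matrix, rows grouped by name
```

Note that the rows come back grouped by name, which matches the original file order only if the file was already grouped that way.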
Hope this helps!
ps, if you're not familiar with matlab structures, run this code:
tmp = cell(1, 5);
tmp{1, 1} = {'Name1'; 'Name1'; 'Name1'; 'Name2'; 'Name2'; 'Name3'; ...
'Name3'; 'Name3'; 'Name3';};
tmp{1, 2} = [34;23;25;76;88;11;55;88;00];
tmp{1, 3} = [25;53;25;22;88;11;66;88;00];
tmp{1, 4} = [36;15;87;44;88;11;88;88;00];
tmp{1, 5} = [46;86;35;55;88;11;88;88;00];
and then run the code I provided above on tmp. Then have a look at the resulting structure called Struct in the MATLAB variable editor. This should give you a feel for how they work.