I have a text file with the following format of information:
Name1 34 25 36 46
Name1 23 53 15 86
Name1 25 25 87 35
Name2 76 22 44 55
Name2 88 88 88 88
Name3 11 11 11 11
Name3 55 66 88 88
Name3 88 88 88 88
Name3 00 00 00 00
There are different "Names" and I have to arrange each name into an array slot. I would then need another way to allocate the date associated with each row to that specific spot. So for example, the first Name1 may have array{0}, but I would also need to associate the 34, 25, 36, and 46 with it somehow. I would also need to distinguish the different names from each other. What is the best way to do this? A 2x2 array does not seem to be the solution.
What I have so far is something along the lines of this:
%# read the whole file to a temporary cell array
fid = fopen(filename,'rt');
tmp = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
%# remove the lines starting with headerline
tmp = tmp{1};
idx = cellfun(@(x) strcmp(x(1:10),'headerline'), tmp);
tmp(idx) = [];
%# split and concatenate the rest
result = regexp(tmp,' ','split');
result = cat(1,result{:});
%# delete temporary array (if you want)
clear tmp
Courtesy: Read txt file in Matlab
Could someone please tell me the best way to arrange the information? Thanks, help is much appreciated.
First, you should definitely read the data in using the method suggested by Rody (+1 for Rody for pointing it out), so I'm going to assume you got that far and have a variable called tmp like in Rody's code example.
Now, if I understand the problem correctly, you need to be able to distinguish each row of your example dataset from the other rows (using dates?) but at the same time you also need to easily distinguish the different names, some of which will be the same across several rows (again, I'm getting this from your example dataset).
One possible approach (which admittedly has one drawback) is to use a structure, starting from the variable tmp obtained in Rody's answer. Use the code:
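A minimal sketch of code along these lines, assuming tmp is the cell array returned by textscan (names in tmp{1}, numeric columns in tmp{2:end}). The field layout follows the description in this answer, but the exact statements and the sample rows are assumptions:

```matlab
% Stand-in for the textscan output: a few rows from the question's data.
sample = sprintf('%s\n', 'Name1 34 25 36 46', 'Name1 23 53 15 86', 'Name2 76 22 44 55');
tmp = textscan(sample, '%s %d %d %d %d');

Struct = struct();
Struct.NameVec = unique(tmp{1});      % cell array of the distinct names
data = [tmp{2:end}];                  % N-by-4 matrix, one row per input line

for k = 1:numel(Struct.NameVec)
    name = Struct.NameVec{k};
    % Keep only the rows whose name matches; row order is preserved.
    Struct.(name) = data(strcmp(tmp{1}, name), :);
end
```

Note that the names must be valid MATLAB identifiers to be usable as field names.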
This code will create a structure where the first level has a field for each unique name in the dataset (I've also included the variable NameVec in the first level of the structure so it can be used to reference the various fields later in a loop). Then within each field (Name1, Name2, and Name3 in this example), I've saved a data matrix containing the data associated with that name (with the individual rows preserved).
The drawback to this approach is that if you want to get ALL the data back in one big array, you'll need to loop over the elements of Struct.NameVec and retrieve the data matrix associated with each unique name. Loops can be slow in MATLAB, so really, it does depend on how you plan to use the data.
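To illustrate that drawback, here is a rough sketch of collecting everything back into one matrix. The structure is built by hand with hypothetical values in the layout described above (a NameVec field plus one matrix per name):

```matlab
% Hypothetical structure: NameVec lists the fields that hold data.
Struct = struct();
Struct.NameVec = {'Name1'; 'Name2'};
Struct.Name1 = [34 25 36 46; 23 53 15 86];
Struct.Name2 = [76 22 44 55];

% Loop over the names and stack each matrix under the previous ones.
allData = [];
for k = 1:numel(Struct.NameVec)
    allData = [allData; Struct.(Struct.NameVec{k})]; %#ok<AGROW>
end
```

For a handful of names the loop cost is negligible; it only starts to matter with many fields or repeated reassembly.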
Hope this helps!
P.S. If you're not familiar with MATLAB structures, run this code:
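A hypothetical stand-in that builds a small tmp by hand, in the shape textscan returns for this file format (the values are rows taken from the question's data; the exact sample is an assumption):

```matlab
% One cell per column: names first, then four int32 columns.
tmp = { {'Name1'; 'Name1'; 'Name2'}, ...
        int32([34; 23; 76]), int32([25; 53; 22]), ...
        int32([36; 15; 44]), int32([46; 86; 55]) };
```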
and then run the code I provided above on tmp. Then have a look at the resulting structure called Struct in the MATLAB variable editor. This should give you a feel for how they work.

Judging from the code, why don't you use textscan directly, with a format string and its header-line option?
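A minimal sketch of such a read, assuming a hypothetical file with two header lines followed by rows of one name and four integers. The file name, the header count, and the format string are assumptions based on the sample data:

```matlab
% Write a small hypothetical file so the example is self-contained.
filename = [tempname '.txt'];
fid = fopen(filename, 'wt');
fprintf(fid, 'headerline 1\nheaderline 2\n');
fprintf(fid, 'Name1 34 25 36 46\nName1 23 53 15 86\nName2 76 22 44 55\n');
fclose(fid);

% The actual read: one string column, four integer columns, headers skipped.
fid = fopen(filename, 'rt');
tmp = textscan(fid, '%s %d %d %d %d', 'HeaderLines', 2);
fclose(fid);

delete(filename);   % clean up the temporary file
```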
textscan uses space and newline as delimiters by default. If you give newline as the delimiter explicitly, you lose the space as a delimiter, as well as portability (Windows often uses \r\n as a single newline, whereas Unix-derived OSes use \n). So, given your data, just leave it out.
Then you jump through hoops to remove the header lines, while textscan already has a nice baked-in option for that (HeaderLines). So, those steps aren't needed. You proceed by splitting everything with a pass through regexp with a space as delimiter, but since textscan already splits on space, that's not needed either.
So, using just three lines (fopen, textscan, fclose), you'll get a cell array tmp with one cell per column: the names in tmp{1} and the numeric columns in tmp{2:end}.
Now, how to store the data more conveniently? I can think of two ways: cell arrays and structures.
For both, you'll have to find the unique names first:
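A short sketch of that step, using a few rows of the question's data passed to textscan as text (the sample rows and their order are assumptions):

```matlab
sample = sprintf('%s\n', 'Name1 34 25 36 46', 'Name2 76 22 44 55', 'Name1 23 53 15 86');
tmp = textscan(sample, '%s %d %d %d %d');

% unique returns the distinct names as a sorted cell array.
names = unique(tmp{1});
```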
Using cell arrays
This will give you a cell-array of the data sorted by name:
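A sketch of the grouping step, built around the anonymous function discussed in this answer. The sample rows are assumptions, and the last two lines show one way to read values back out:

```matlab
sample = sprintf('%s\n', 'Name1 34 25 36 46', 'Name2 76 22 44 55', 'Name1 23 53 15 86');
tmp   = textscan(sample, '%s %d %d %d %d');
names = unique(tmp{1});
data  = [tmp{2:end}];     % all numeric columns side by side

% One cell per unique name, holding the rows that belong to that name.
results = arrayfun(@(x) data(strcmp(tmp{1}, x), :), names, 'UniformOutput', false);

secondName = results{2};        % all rows for the second name
oneValue   = results{1}(2, 3);  % first name, second row, third column
```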
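Now you can index into results with curly braces to select a name and parentheses to select rows and columns: results{1}(2,3) is the third value in the second row for the first name. Remember that MATLAB is 1-based, so a(3) indicates the 3rd element of a, not the 4th.
Breakdown of the command: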
The function arrayfun loops through the elements of the input array, applies a function to each element, and collects the results in either a regular array (if possible) or a cell-array (when given 'uniformoutput', false; without that option, non-uniform outputs raise an error). It's a bit like a foreach-construct.
Taking the input array equal to the unique names found in the first step, the trick is in the function to apply to each name. The function @(x) data(strcmp(tmp{1},x),:) first finds the indices for the given name in tmp{1} (array containing all names) using strcmp. These indices are then used to index data = [tmp{2:end}], i.e., all the other arrays.
The results for each individual unique name are then stored in the cell-array results.
Using Structures
You can go one step further and use the cell-array results to build a more human-readable data structure. After applying all the previous steps, loop over names and assign each results{ii} to a field named after names{ii}. You can then reference your data by name, e.g. S.Name1 for a structure S.
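A minimal sketch of that loop, with names and results filled in by hand so it runs on its own (the structure name S and the sample values are assumptions):

```matlab
% Hand-built stand-ins for the outputs of the previous steps.
names   = {'Name1'; 'Name2'};
results = {[34 25 36 46; 23 53 15 86]; [76 22 44 55]};

S = struct();
for ii = 1:numel(names)
    S.(names{ii}) = results{ii};   % create one field per name
end

firstRow = S.Name1(1, :);          % reference the data by name
```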
The syntax your_struct.('someString') is called dynamic structure referencing. It references or creates a field in the structure your_struct called someString.
Now, if names{ii} contains underscores you want to get rid of, you can define a small helper that strips them (or a similar one for spaces), and then apply it to names{ii} when creating the field.
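One possible way to do that cleanup, sketched with strrep. The helper names stripUnderscores and stripSpaces and the sample names are hypothetical:

```matlab
% Hypothetical helpers that remove characters unwanted in a field name.
stripUnderscores = @(s) strrep(s, '_', '');
stripSpaces      = @(s) strrep(s, ' ', '');

names   = {'Name_1'; 'Name 2'};
results = {[34 25 36 46]; [76 22 44 55]};

S = struct();
for ii = 1:numel(names)
    field = stripSpaces(stripUnderscores(names{ii}));
    S.(field) = results{ii};       % field becomes Name1, Name2
end
```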
Kudos to these guys for that last one.