Group variables based on lengths of specific array

Posted 2019-03-05 06:01

I have a long list of variables in a dataset that contains multiple time channels with different sampling rates, such as time_1, time_2, TIME, Time, etc. There are also many other variables, each of which depends on one of these time channels.

I'd like to list every channel whose name contains 'time' (a case-insensitive partial string search within the Workspace), work out which variable belongs to each item in this time list by comparing variable sizes, and then group them in a structure, together with the values of the variables, for later analysis.

For example:

  Name                     Size              Bytes  Class     

  ENGSPD_1            181289x1             1450312  double              
  Eng_Spd              12500x1              100000  double              
  Speed                41273x1              330184  double              
  TIME                 41273x1              330184  double              
  Time                 12500x1              100000  double              
  engine_speed_2        1406x1               11248  double              
  time_1              181289x1             1450312  double              
  time_2                1406x1               11248  double 

In this case, I have 4 time channels with different names & sizes and 4 speed channels which belong to each of these time channels.

The whos function is case-sensitive, and it returns only the names of the variables, not their values.

1 answer
SAY GOODBYE
#2 · 2019-03-05 06:47

As a preamble I'm going to echo my comment from above and earlier comments from folks here and on your other similar questions:

Please stop trying to manipulate your data this way.

It may have made sense at the beginning but, given the questions you've asked on SO to date, this isn't the first time you've run into issues trying to pull everything together, and if you continue this way it won't be the last. This approach is highly error-prone, unreliable, and unpredictable. Every step of the process requires you to make assumptions about your data that cannot be guaranteed (data sizes matching, variables being present and named predictably, etc.). Rather than trying to come up with creative ways to hack the data together, start over and output your data predictably from the beginning. It may take some time, but I guarantee it will save time in the future, and it will make sense to whoever looks at this in 6 months trying to figure out what is going on.

For example, it takes no significant effort to output your variables as:

outputstructure.EngineID.time = sometimeseries;
outputstructure.EngineID.speed = somedata;

Where EngineID can be any valid variable name. This is simple and it links your data together permanently and robustly.
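
As a rough illustration of that layout, the sketch below fills and then reads such a structure using dynamic field names. The engine IDs (EngineA, EngineB), the sample counts, and the random data are all made up for the example; only the outputstructure.EngineID.time / .speed shape comes from the lines above.

```matlab
% Hypothetical engine IDs and sample counts, for illustration only
outputstructure = struct();
engineIDs = {'EngineA', 'EngineB'};
nsamples  = [10, 20];

% Write each engine's time base and data side by side
for ii = 1:numel(engineIDs)
    id = engineIDs{ii};
    outputstructure.(id).time  = linspace(0, 1, nsamples(ii)).';
    outputstructure.(id).speed = rand(nsamples(ii), 1);
end

% Later analysis: loop over whichever engines are present
ids = fieldnames(outputstructure);
for ii = 1:numel(ids)
    channel = outputstructure.(ids{ii});
    fprintf('%s: %d samples, mean speed %.3f\n', ...
        ids{ii}, numel(channel.time), mean(channel.speed));
end
```

Because each speed array is stored next to its own time base, nothing has to be inferred from names or sizes afterwards.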


That being said, the following will bring a marginal amount of sanity to your data set:

% Build up a totally amorphous data set
ENGSPD_1       = rand(10, 1);
Eng_Spd        = rand(20, 1);
Speed          = rand(30, 1);
TIME           = rand(30, 1);
Time           = rand(20, 1);
engine_speed_2 = rand(5, 1);
time_1         = rand(10, 1);
time_2         = rand(5, 1);

% Identify time and speed variable using regular expressions
% Assumes time variables contain 'time' (case insensitive)
% Assumes speed variables contain 'spd', 'sped', or 'speed' (case insensitive)
timevars = whos('-regexp', '[Tt][Ii][Mm][Ee]');
speedvars = whos('-regexp', '[Ss][Pp][Ee]{0,2}[Dd]');

% Pair timeseries and data arrays together. Data is only coupled if
% the number of rows in the timeseries is exactly the same as the
% number of rows in the data array.
timesizes  = vertcat(timevars(:).size);   % Concatenate timeseries sizes
speedsizes = vertcat(speedvars(:).size);  % Concatenate speed array sizes

% Find intersection and their locations in the structures returned by whos
% By using intersect we only get the data that is matched
[sizes, timeidx, speedidx] = intersect(timesizes(:,1), speedsizes(:,1));

% Preallocate structure
ndata = length(sizes);
groupeddata(ndata).time = [];
groupeddata(ndata).speed = [];

% Unavoidable (without saving/loading data) eval loop :|
for ii = 1:ndata
    groupeddata(ii).time  = eval(timevars(timeidx(ii)).name);   % evaluate the variable name to get its values
    groupeddata(ii).speed = eval(speedvars(speedidx(ii)).name);
end
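
To make the index bookkeeping concrete, here is a small sketch of what intersect returns for the toy lengths used above. The vectors timelengths and speedlengths are written out by hand for illustration, assuming the alphabetical order in which whos lists the variables; they are not part of the code above.

```matlab
% Hand-written row counts for the toy data (assumed whos ordering)
timelengths  = [30; 20; 10; 5];   % e.g. TIME, Time, time_1, time_2
speedlengths = [10; 20; 30; 5];   % e.g. ENGSPD_1, Eng_Spd, Speed, engine_speed_2

% common holds the shared lengths in sorted order;
% ia and ib index into the first and second input respectively
[common, ia, ib] = intersect(timelengths, speedlengths);

% Each row pairs a time channel with the speed channel of equal length
pairs = [timelengths(ia), speedlengths(ib)];
disp(pairs)
```

Since the common values come back sorted, the groups end up ordered by length rather than by variable name.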

A non-eval method, by request:

ENGSPD_1       = rand(10, 1);
Eng_Spd        = rand(20, 1);
Speed          = rand(30, 1);
TIME           = rand(30, 1);
Time           = rand(20, 1);
engine_speed_2 = rand(5, 1);
time_1         = rand(10, 1);
time_2         = rand(5, 1);

save('tmp.mat')
oldworkspace = load('tmp.mat');
varnames = fieldnames(oldworkspace);

timevars = regexpi(varnames, '.*time.*', 'match', 'once');
timevars(cellfun('isempty', timevars)) = [];
speedvars = regexpi(varnames, '.*spe{0,2}d.*', 'match', 'once');
speedvars(cellfun('isempty', speedvars)) = [];

timesizes = zeros(length(timevars), 2);
for ii = 1:length(timevars)
    timesizes(ii, :) = size(oldworkspace.(timevars{ii}));
end
speedsizes = zeros(length(speedvars), 2);
for ii = 1:length(speedvars)
    speedsizes(ii, :) = size(oldworkspace.(speedvars{ii}));
end

[sizes, timeidx, speedidx] = intersect(timesizes(:,1), speedsizes(:,1));

ndata = length(sizes);
groupeddata(ndata).time = [];
groupeddata(ndata).speed = [];

for ii = 1:ndata
    groupeddata(ii).time  = oldworkspace.(timevars{timeidx(ii)});
    groupeddata(ii).speed = oldworkspace.(speedvars{speedidx(ii)});
end
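
Both versions rely on each time channel having a length that no other time channel shares, and on every channel having a partner. The following sketch is one possible sanity check to run before grouping; it is an addition for illustration, not part of the code above, and the length vectors are hypothetical stand-ins for timesizes(:,1) and speedsizes(:,1).

```matlab
% Hypothetical row counts standing in for timesizes(:,1) and speedsizes(:,1)
timelengths  = [30; 20; 10; 5];
speedlengths = [10; 20; 30; 5];

% Matching by size is ambiguous if two channels of one kind share a length
hasduplicates = @(x) numel(unique(x)) < numel(x);
ambiguous = hasduplicates(timelengths) || hasduplicates(speedlengths);

% Lengths that appear on one side only have no partner to group with
unmatchedtime  = setdiff(timelengths, speedlengths);
unmatchedspeed = setdiff(speedlengths, timelengths);

if ambiguous
    warning('Two channels share a length; matching by size is ambiguous.');
end
if ~isempty(unmatchedtime) || ~isempty(unmatchedspeed)
    warning('Some channels have no partner of the same length.');
end
```

With the toy data neither warning fires, but on real recordings a check like this can flag cases where size-based pairing would silently drop or mismatch channels.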

See this gist for timing.
