I have one dataset in which some timestamps are missing. I have written code so far as below,
x = table2dataset(Testing_data);
T1 = x(:,1);
C1 =dataset2cell(T1);
formatIn = 'yyyy-mm-dd HH:MM:SS';
t1= datenum(C1,formatIn);
% Creating 10 minutes of time interval;
avg = 10/60/24;
tnew = [t1(1):avg:t1(end)]';
indx = round((t1-t1(1))/avg) + 1;
ynew = NaN(length(tnew),1);
ynew(indx)=t1;
% replacing missing time with NaN
t = datetime(ynew,'ConvertFrom','datenum');
formatIn = 'yyyy-mm-dd HH:MM:SS';
DateVector = datevec(ynew,formatIn);
dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS');
ds = string(dt);
The testing data has three parameters shown here,
Time x y
2009-04-10 02:00:00.000 1 0.1
2009-04-10 02:10:00.000 2 0.2
2009-04-10 02:30:00.000 3 0.3
2009-04-10 02:50:00.000 4 0.4
Now as you can see, for intervals of 10 minutes, there are missing timestamps (2:20 and 2:40) so I want to added that time stamp. Then I want the x
and y
values to be NAN
. So My output would be like,
Time x y
2009-04-10 02:00:00.000 1 0.1
2009-04-10 02:10:00.000 2 0.2
2009-04-10 02:20:00.000 NaN NaN
2009-04-10 02:30:00.000 3 0.3
2009-04-10 02:40:00.000 NaN NaN
2009-04-10 02:50:00.000 4 0.4
As you can see from my code, I am just able to add NaN
with time stamp but now would like to take its corresponding x and y value which I desired.
Please note I have more than 3000 data rows in the above format, I want to perform the same for my all values.
it seems to be a contradiction in your question; you say tthat you are able to insert NaN
in place of the missing time string but, in the example of the expected output you wrote the time string.
Also you refer to missing time stamp (2:20) but, if the time step is 10 minutes, in your example data there is another missing time stamp (2:40)
Assuming that:
- you actually want to insert the missing time sting
- you want to manage all the missing timestamp
you could modify your code as follows:
- the
ynew
time is not needed
- the
tnew
time should be used in place of ynew
- to insert the
NaN
values in the x
and y
column you have to:
- extract them from the
dataset
- create two new array initializing them to
NaN
- insert the original
x
and y
data in the location identified by indx
In the following yu can find an updated version of your code.
- the
x
and y
data are stored in the x_data
and y_data
array
- the new
x
and y
data are stored in the x_data_new
and y_data_new
array
at the end of the script, two table are generate: the first one is generated using the time as string
, the second one as cellarray.
The comments in the code should identify the modifications.
x = table2dataset(Testing_data);
T1 = x(:,1);
% Get X data from the table
x_data=x(:,2)
% Get Y data from the table
y_data=x(:,3)
C1 =dataset2cell(T1);
formatIn = 'yyyy-mm-dd HH:MM:SS';
t1= datenum(C1(2:end),formatIn)
avg = 10/60/24; % Creating 10 minutes of time interval;
tnew = [t1(1):avg:t1(end)]'
indx = round((t1-t1(1))/avg) + 1
%
% Not Needed
%
% ynew = NaN(length(tnew),1);
% ynew(indx)=t1;
%
% Create the new X and Y data
%
y_data_new = NaN(length(tnew),1)
y_data_new(indx)=t1
x_data_new=nan(length(tnew),1)
x_data_new(indx)=x_data
y_data_new=nan(length(tnew),1)
y_data_new(indx)=y_data
% t = datetime(ynew,'ConvertFrom','datenum') % replacing missing time with NAN
%
% Use tnew instead of ynew
%
t = datetime(tnew,'ConvertFrom','datenum') % replacing missing time with NAN
formatIn = 'yyyy-mm-dd HH:MM:SS'
% DateVector = datevec(y_data_new,formatIn)
% dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS')
%
% Use tnew instead of ynew
%
dt = datestr(tnew,'yyyy-mm-dd HH:MM:SS')
% ds = char(dt)
new_table=table(dt,x_data_new,y_data_new)
new_table_1=table(cellstr(dt),x_data_new,y_data_new)
The output is
new_table =
dt x_data_new y_data_new
___________ __________ __________
[1x19 char] 1 0.1
[1x19 char] 2 0.2
[1x19 char] NaN NaN
[1x19 char] 3 0.3
[1x19 char] NaN NaN
[1x19 char] 4 0.4
new_table_1 =
Var1 x_data_new y_data_new
_____________________ __________ __________
'2009-04-10 02:00:00' 1 0.1
'2009-04-10 02:10:00' 2 0.2
'2009-04-10 02:20:00' NaN NaN
'2009-04-10 02:30:00' 3 0.3
'2009-04-10 02:40:00' NaN NaN
'2009-04-10 02:50:00' 4 0.4
Hope this helps.
Qapla'
This example is not too different from the accepted answer, but IMHO a bit easier on the eyes. But, it supports gaps larger than 1 step, and is a bit more generic because it makes fewer assumptions.
It works with plain cell arrays instead of the original table data, so that conversion is up to you (I'm on R2010a so can't test it)
% Example data with intentional gaps of varying size
old_data = {'2009-04-10 02:00:00.000' 1 0.1
'2009-04-10 02:10:00.000' 2 0.2
'2009-04-10 02:30:00.000' 3 0.3
'2009-04-10 02:50:00.000' 4 0.4
'2009-04-10 03:10:00.000' 5 0.5
'2009-04-10 03:20:00.000' 6 0.6
'2009-04-10 03:50:00.000' 7 0.7}
% Convert textual dates to numbers we can work with more easily
old_dates = datenum(old_data(:,1));
% Nominal step size is the minimum of all differences
deltas = diff(old_dates);
nominal_step = min(deltas);
% Generate new date numbers with constant step
new_dates = old_dates(1) : nominal_step : old_dates(end);
% Determine where the gaps in the data are, and how big they are,
% taking into account rounding error
step_gaps = abs(deltas - nominal_step) > 10*eps;
gap_sizes = round( deltas(step_gaps) / nominal_step - 1);
% Create new data structure with constant-step time stamps,
% initially with the data of interest all-NAN
new_size = size(old_data,1) + sum(gap_sizes);
new_data = [cellstr( datestr(new_dates, 'yyyy-mm-dd HH:MM:SS') ),...
repmat({NaN}, new_size, 2)];
% Compute proper locations of the old data in the new data structure,
% again, taking into account rounding error
day = 86400; % (seconds in a day)
new_datapoint = ismember(round(new_dates * day), ...
round(old_dates * day));
% Insert the old data at the right locations
new_data(new_datapoint, 2:3) = data(:, 2:3)
Output is:
old_data =
'2009-04-10 02:00:00.000' [1] [0.100000000000000]
'2009-04-10 02:10:00.000' [2] [0.200000000000000]
'2009-04-10 02:30:00.000' [3] [0.300000000000000]
'2009-04-10 02:50:00.000' [4] [0.400000000000000]
'2009-04-10 03:10:00.000' [5] [0.500000000000000]
'2009-04-10 03:20:00.000' [6] [0.600000000000000]
'2009-04-10 03:50:00.000' [7] [0.700000000000000]
new_data =
'2009-04-10 02:00:00' [ 1] [0.100000000000000]
'2009-04-10 02:10:00' [ 2] [0.200000000000000]
'2009-04-10 02:20:00' [NaN] [ NaN]
'2009-04-10 02:30:00' [ 3] [0.300000000000000]
'2009-04-10 02:40:00' [NaN] [ NaN]
'2009-04-10 02:50:00' [ 4] [0.400000000000000]
'2009-04-10 03:00:00' [NaN] [ NaN]
'2009-04-10 03:10:00' [ 5] [0.500000000000000]
'2009-04-10 03:20:00' [ 6] [0.600000000000000]
'2009-04-10 03:30:00' [NaN] [ NaN]
'2009-04-10 03:40:00' [NaN] [ NaN]
'2009-04-10 03:50:00' [ 7] [0.700000000000000]