`
Current Dataset- Has dates, ID and Values
Date | ID | Value
--------------------
2/4/17 | 3 | 4.4
2/4/17 | 9 | 6.2
2/9/17 | 3 | 4.7
2/9/17 | 4 | 7.4
2/9/17 | 9 | 9.4
2/11/17 | 3 | 9.7
2/11/17 | 7 | 12.4
`Expected result - I want to populate missing dates and peek each ID and their value until the next Date.
Expected Result
Date | ID | Value
--------------------
2/4/17 | 3 | 4.4
2/4/17 | 9 | 6.2
2/5/17 | 3 | 4.4
2/5/17 | 9 | 6.2
2/6/17 | 3 | 4.4
2/6/17 | 9 | 6.2
2/7/17 | 3 | 4.4
2/7/17 | 9 | 6.2
2/8/17 | 3 | 4.4
2/8/17 | 9 | 6.2
2/9/17 | 3 | 4.7
2/9/17 | 4 | 7.4
2/9/17 | 9 | 9.4
2/10/17 | 3 | 4.7
2/10/17 | 4 | 7.4
2/10/17 | 9 | 9.4
2/11/17 | 3 | 9.7
2/11/17 | 7 | 12.4
When having such case I'm usually using something like the script below.
Basically im "filling" the missing dates but cross joining all possible dates will all possible ID
values and then checking the previous Value
values.
Be aware that cross joining will produce big tables - for example having 10 Dates
and 10 ID
will result in table with 100 rows, 20 Dates
and 20 ID
- 400 rows etc and if we have too much history and too much IDs ... the things can go nasty quite quickly (keep an eye on the memory consumption)
// Load temp data
// just to be sure convert the original Date field to date
Data_Temp:
Load
date(date#(Date, 'M/D/YY')) as Date,
ID,
Value
;
Load * Inline [
Date , ID , Value
2/4/17 , 3 , 4.4
2/4/17 , 9 , 6.2
2/9/17 , 3 , 4.7
2/9/17 , 4 , 7.4
2/9/17 , 9 , 9.4
2/11/17 , 3 , 9.7
2/11/17 , 7 , 12.4
];
// Autogenerate all dates
// between min and max date
// from Data_Temp table
Dates:
Load
Date(MinDate+IterNo()-1) as Date
While
MinDate+IterNo()-1<=MaxDate;
;
Load
Min(Date) as MinDate,
Max(Date) as MaxDate
Resident
Data_Temp
;
// Join the distinct ID to the Dates table
// the result table will be cross join
// all dates for all IDs
join
Load
distinct
ID
Resident
Data_Temp
;
// join the "cross table" to the data table
// The result table will have the original data
// plus all missing dates per ID
join (Data_Temp)
Load * Resident Dates;
// we dont need this anymore
Drop Table Dates;
// This is the fun part ...
// 1. load the Data_Temp table and order it by ID and Date
// 2. for each row:
// 2.1. check if the current ID value is not equal to the previous row ID value
// if "yes" (new ID set) get the current Value
// if "no" check if the current value is > 0
// if "yes" keep the current value
// if "no" get the peek value of the current field (NewValue)
Data:
// // Uncomment the script section below
// // if we need to exclude the rows with NewValue = null
//Load
// *
//where
// NewValue > 0
//;
Load
ID,
Date,
Value,
if( ID <> peek(ID), Value,
if( Value > 0, Value, peek(NewValue) )
) as NewValue
Resident
Data_Temp
Order By
ID,
Date
;
// we dont need this anymore
Drop Table Data_Temp;
// // At this point we can drop the old Value field
// // and rename the NewValue to Value ... if needed
//Drop Field Value;
//Rename Field NewValue to Value;
After running the script this will be the result:
You can download an example QV file here