I've got some non-normalized data that I'm trying to unpivot the columns on, I'm hoping you all can help me figure out the best way to do this. I've done this using multiple union statements, but what I'm hoping to do is create a dynamic query that can do this over and over as more columns are added to the table. My data looks something like this: (Numerical columns go all the way through 50)
| Code | Desc | Code_0 | Desc_0| Period| 1 | 2 | 3 | 4 |
|-------|-------|--------|-------|-------|---------|--------|---------|----------|
| NULL | NULL | NULL | NULL | Date |29-Nov-13|6-Dec-13|13-Dec-13| 20-Dec-13|
|CTR07 |Risk | P1 | Phase1| P | 0.2 | 0.4 | 0.6 | 1.1 |
|CTR07 |Risk | P1 | Phase1| F | 0.2 | 0.4 | 0.6 | 1.1 |
|CTR07 |Risk | P1 | Phase1| A | 0.2 | 0.4 | 0.6 | 1.1 |
|CTR08 |Oper | P1 | Phase1| P | 0.6 | 0.6 | 0.9 | 2.7 |
|CTR08 |Oper | P1 | Phase1| F | 0.6 | 0.6 | 0.9 | 2.7 |
|CTR08 |Oper | P1 | Phase1| A | 0.6 | 0.6 | 0.9 | 2.7 |
Column Headers are the top Row. As you can see looking at the data, there are some oddities that need to be dealth with.
The first four NULL columns before the date fields start are a problem. Each column that has a numerical header (1-50) represents one week. The problem with that is that each week has not only the date field, but also the percentage values for that week in the same column. I'd like to get it pivoted down so it looks something like this:
| Code | Desc |Code_0 |Desc_0 | Period| Date |Percent|
|-------|-------|-------|-------|-------|---------|-------|
|CTR07 | Risk | P1 | Phase | P | 11/29/13| 0.2 |
|CTR07 | Risk | P1 | Phase1| F | 11/29/13| 0.2 |
|CTR07 | Risk | P1 | Phase1| A | 11/29/13| 0.2 |
|CTR08 | Oper. | P1 | Phase1| P | 11/29/13| 0.6 |
With each week's date in it's own column, and percentages grouped by their respective dates.
Keyed by distinct Code, Desc, CODE_0, Period and Date. I'd like to Separate the dates from the percentages that are in the numerical columns, then bring the numerical columns into their own column connected by date. As I said before, I've done it statically with UNION statements, but I'd like to write some kind of query that does it dynamically as the table expands. Any help would be greatly appreciated. Let me know if you need any additional information, this is my first question on StackOverflow, and I had two nice screenshots to show you, but I'm not up to 10 yet on this exchange. Only on Sci-Fi and Fantasy. I know right?
Code that I used in the union to statically create the bottom table:
select `Code`, `Desc`, `Code_0`, `Desc_0`, `Period`, (select STR_TO_DATE(`1`, '%d%b%y') from combined_complete where `1` = '29Nov13') as `Date`, `1` as `Percent`
from combined_complete
where period <> 'Date'
union
select `Code`, `Desc`, `Code_0`, `Desc_0`, `Period`, (select STR_TO_DATE(`2`, '%d%b%y') from combined_complete where `2` = '06Dec13') as `Date`, `2`
from combined_complete
where period <> 'Date'
union
select `Code`, `Desc`, `Code_0`, `Desc_0`, `Period`, (select STR_TO_DATE(`3`, '%d%b%y') from combined_complete where `3` = '13Dec13') as `Date`, `3`
from combined_complete
where period <> 'Date'
union
select `Code`, `Desc`, `Code_0`, `Desc_0`, `Period`, (select STR_TO_DATE(`4`, '%d%b%y') from combined_complete where `4` = '20Dec13') as `Date`, `4`
from combined_complete
where period <> 'Date'
For this suggestion I have created a simple 50 row table called TransPoser
, there may already be a table of integers available in MySQL or in your db, but you want something similar that will give your number 1 to N for those numbered columns.
Then, using that table, cross join to your non-normalized table (I call it BadTable) but restrict this to the first row. Then using a set of case expressions we pivot
those date strings into a column. It would be possible to convert to a proper date as we do this if needed (I would suggest it, but haven't included it).
This small transposition is then used as a derived table in the main query.
The main query ignores that first row, but also uses a cross join to force all original rows into the 50 rows (or 4 as we see in this example). This Cartesian product is then joined back to the derived table discussed above to supply the dates. Then it is another set of case expressions that transpose the percentages into a column, aligned to the date and various codes.
Example result (from sample data), blank lines added manually:
| N | CODE | DESC | CODE_0 | DESC_0 | THEDATE | PERCENTAGE |
|---|-------|------|--------|--------|-----------|------------|
| 1 | CTR07 | Risk | P1 | Phase1 | 29-Nov-13 | 0.2 |
| 1 | CTR07 | Risk | P1 | Phase1 | 29-Nov-13 | 0.2 |
| 1 | CTR07 | Risk | P1 | Phase1 | 29-Nov-13 | 0.2 |
| 1 | CTR08 | Oper | P1 | Phase1 | 29-Nov-13 | 0.6 |
| 1 | CTR08 | Oper | P1 | Phase1 | 29-Nov-13 | 0.6 |
| 1 | CTR08 | Oper | P1 | Phase1 | 29-Nov-13 | 0.6 |
| 2 | CTR07 | Risk | P1 | Phase1 | 6-Dec-13 | 0.4 |
| 2 | CTR07 | Risk | P1 | Phase1 | 6-Dec-13 | 0.4 |
| 2 | CTR07 | Risk | P1 | Phase1 | 6-Dec-13 | 0.4 |
| 2 | CTR08 | Oper | P1 | Phase1 | 6-Dec-13 | 0.6 |
| 2 | CTR08 | Oper | P1 | Phase1 | 6-Dec-13 | 0.6 |
| 2 | CTR08 | Oper | P1 | Phase1 | 6-Dec-13 | 0.6 |
| 3 | CTR07 | Risk | P1 | Phase1 | 13-Dec-13 | 0.6 |
| 3 | CTR07 | Risk | P1 | Phase1 | 13-Dec-13 | 0.6 |
| 3 | CTR07 | Risk | P1 | Phase1 | 13-Dec-13 | 0.6 |
| 3 | CTR08 | Oper | P1 | Phase1 | 13-Dec-13 | 0.9 |
| 3 | CTR08 | Oper | P1 | Phase1 | 13-Dec-13 | 0.9 |
| 3 | CTR08 | Oper | P1 | Phase1 | 13-Dec-13 | 0.9 |
| 4 | CTR07 | Risk | P1 | Phase1 | 20-Dec-13 | 1.1 |
| 4 | CTR07 | Risk | P1 | Phase1 | 20-Dec-13 | 1.1 |
| 4 | CTR07 | Risk | P1 | Phase1 | 20-Dec-13 | 1.1 |
| 4 | CTR08 | Oper | P1 | Phase1 | 20-Dec-13 | 2.7 |
| 4 | CTR08 | Oper | P1 | Phase1 | 20-Dec-13 | 2.7 |
| 4 | CTR08 | Oper | P1 | Phase1 | 20-Dec-13 | 2.7 |
The query:
select
n.n
, b.Code
, b.Desc
, b.Code_0
, b.Desc_0
, T.theDate
, case
when n.n = 1 then `1`
when n.n = 2 then `2`
when n.n = 3 then `3`
when n.n = 4 then `4`
/* when n.n = 5 then `5` */
/* when n.n = 50 then `50`*/
end as Percentage
from BadTable as B
cross join (select N from TransPoser where N < 5) as N
inner join (
/* transpose just the date row */
/* join back vis the number given to each row */
select
n.n
, case
when n.n = 1 then `1`
when n.n = 2 then `2`
when n.n = 3 then `3`
when n.n = 4 then `4`
/* when n.n = 5 then `5` */
/* when n.n = 50 then `50`*/
end as theDate
from BadTable as B
cross join (select N from TransPoser where N < 5) as N
where b.code is null
and b.Period = 'Date'
) as T on N.N = T.N
where b.code is NOT null
and b.Period <> 'Date'
order by
n.n
, b.code
;
for the above see this SQLFIDDLE
It really isn't fair to expect a fully prepared executable deliverable as the result of a question IMHO - it is "stretching the friendship". But to morph the above query into a dynamic query isn't too hard. it's just a bit "tedious" as the syntax is a bit tricky. I'm not that experienced with MySQL but this is how I would do it:
set @numcols := 4;
set @casevar := '';
set @casevar := (
select
group_concat(@casevar
,'when n.n = '
, n.n
,' then `'
, n.n
,'`'
SEPARATOR ' ')
from TransPoser as n
where n.n <= @numcols
)
;
set @sqlvar := concat(
'SELECT n.n , b.Code , b.Desc , b.Code_0 , b.Desc_0 , T.theDate , CASE '
, @casevar
, ' END AS Percentage FROM BadTable AS B CROSS JOIN (SELECT N FROM TransPoser WHERE N <='
, @numcols
, ') AS N INNER JOIN ( SELECT n.n , CASE '
, @casevar
, ' END AS theDate FROM BadTable AS B CROSS JOIN (SELECT N FROM TransPoser WHERE N <='
, @numcols
, ') AS N WHERE b.code IS NULL '
, ' AND b.Period = ''Date'' ) AS T ON N.N = T.N WHERE b.code IS NOT NULL AND b.Period <> ''Date'' ORDER BY n.n , b.code '
);
PREPARE stmt FROM @sqlvar;
EXECUTE stmt;
Demo of the dynamic approach
One rough way to go would be:
- Retrieve Columnames from
INFORMATION_SCHEMA.COLUMNS
of the table, either by a LIKE
-Pattern or ORDINAL_POSITION > 5
- For each of the columns execute a prepared statement that inserts the first five cols and the number col into a new table
- during insert, also sub-select the value for the date column by the NULL values