Mysql, reshape data from long / tall to wide

2019-01-08 23:29发布

问题:

I have data in a mysql table in long / tall format (described below) and want to convert it to wide format. Can I do this using just sql?

Easiest to explain with an example. Suppose you have information on (country, key, value) for M countries, N keys (e.g. keys can be income, political leader, area, continent, etc.)

Long format has 3 columns: country, key, value
  - M*N rows.
  e.g. 
  'USA', 'President', 'Obama'
   ...
  'USA', 'Currency', 'Dollar'

Wide format has N=16 columns: county, key1, ..., keyN
  - M rows
example: 
   country, President, ... , Currency
   'USA', 'Obama', ... , 'Dollar'

Is there a way in SQL to create a new table with the data in the wide format?

select distinct key from table;

// this will get me all the keys.

1) How do I then create the table using these key elements?

2) How do I then fill in the table values?

I'm pretty sure I can do this with any scripting language (I like python), but wanted to know if there is an easy way to do this in mysql. Many statistical packages like R and STATA have this command built in because it is often used.

======

To be more clear, here is the desired input output for a simple case:

Input:

country    attrName    attrValue     key  (these are column names)
US         President   Obama         2
US         Currency    Dollar        3
China      President   Hu            4
China      Currency    Yuan          5

Output

country    President    Currency    newPkey
US         Obama        Dollar      1
China      Hu           Yuan        2

回答1:

Cross-tabs or pivot tables is the answer. From there you can SELECT FROM ... INSERT INTO ... or create a VIEW from the single SELECT.

Something like:

SELECT country, 
       MAX( IF( key='President', value, NULL ) ) AS President,
       MAX( IF( key='Currency', value, NULL ) ) AS Currency,
       ...

FROM table 
GROUP BY country;

For more info: http://dev.mysql.com/tech-resources/articles/wizard/index.html



回答2:

I think I found the solution, which uses VIEWS and INSERT INTO (as suggested by e4c5).

You have to get your list of AttrNames/Keys yourself, but MYSQL does the other heavy lifting.

For the simple test case above, create the new_table with the appropriate columns (don't forget to have an auto-increment primary key as well). Then

CREATE VIEW a
AS SELECT country, attrValue
WHERE attrName="President";

CREATE VIEW b
AS SELECT country, attrValue
WHERE attrName="Currency";


INSERT INTO newtable(country, President, Currency)
SELECT a.country, a.attrValue, b.attrValue
FROM  a
INNER JOIN b  ON a.country=b.country;

If you have more attrNames, then create one view for each one and then adjust the last statement accordingly.

INSERT INTO newtable(country, President, Currency, Capital, Population)
SELECT a.country, a.attrValue, b.attrValue, c.attrValue, d.attrValue
FROM  a
INNER JOIN b  ON a.country=b.country
INNER JOIN c  ON a.country=c.country
INNER JOIN d  ON a.country=d.country;

Some more tips

  • use NATURAL LEFT JOIN and you don't have to specify the ON clause


回答3:

If you were using SQL Server, this would be easy using UNPIVOT. As far as I am aware, this is not implemented in MySQL, so if you want to do this (and I'd advise against it) you'll probably have to generate the SQL dynamically, and that's messy.