I have a table that looks like this:
 id | feh | bar
----+-----+-----
  1 |  10 | A
  2 |  20 | A
  3 |   3 | B
  4 |   4 | B
  5 |   5 | C
  6 |   6 | D
  7 |   7 | D
  8 |   8 | D
And I want it to look like this:
 bar | val1 | val2 | val3
-----+------+------+------
 A   |   10 |   20 |
 B   |    3 |    4 |
 C   |    5 |      |
 D   |    6 |    7 |    8
I have this query that does this:
SELECT bar,
MAX(CASE WHEN abc."row" = 1 THEN feh ELSE NULL END) AS "val1",
MAX(CASE WHEN abc."row" = 2 THEN feh ELSE NULL END) AS "val2",
MAX(CASE WHEN abc."row" = 3 THEN feh ELSE NULL END) AS "val3"
FROM
(
SELECT bar, feh, row_number() OVER (partition by bar) as row
FROM "Foo"
) abc
GROUP BY bar
This is a very makeshift approach and gets unwieldy if there are a lot of new columns to be created. I was wondering if the CASE statements can be made better to make this query more dynamic? Also, I'd love to see other approaches to doing this.
If you have not installed the additional module tablefunc, run this command once per database:
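That would be:

    CREATE EXTENSION tablefunc;

It provides the crosstab() function used below.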
Answer to question
A very basic crosstab solution for your case:
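A sketch of such a call, assuming the question's table "Foo" (the constant 1 serves as the dummy category, and the column definition list names the result columns):

    SELECT *
    FROM   crosstab(
              'SELECT bar, 1 AS cat, feh
               FROM   "Foo"
               ORDER  BY bar, feh')
           AS ct (bar text, val1 int, val2 int, val3 int);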
The special difficulty here is that there is no category (cat) in the base table. For the basic 1-parameter form we can just provide a dummy column with a dummy value serving as category. The value is ignored anyway.

This is one of the rare cases where the second parameter for the crosstab() function is not needed, because all NULL values only appear in dangling columns to the right by definition of this problem. And the order can be determined by the value.

If we had an actual category column with names determining the order of values in the result, we'd need the 2-parameter form of crosstab(). Here I synthesize a category column with the help of the window function row_number(), to base crosstab() on:
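A sketch of what that might look like, again assuming the question's table "Foo"; the synthesized labels val1 .. val3 and the VALUES list passed as the second parameter are illustrative:

    SELECT *
    FROM   crosstab(
              $$
              SELECT bar, val, feh
              FROM  (
                 SELECT *, 'val' || row_number() OVER (PARTITION BY bar ORDER BY feh) AS val
                 FROM   "Foo"
                 ) sub
              ORDER  BY 1, 2
              $$
            , $$VALUES ('val1'), ('val2'), ('val3')$$
           ) AS ct (bar text, val1 int, val2 int, val3 int);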
The rest is pretty much run-of-the-mill. Find more explanation and links in these closely related answers.
Basics:
Read this first if you are not familiar with the crosstab() function!
Advanced:
Proper test setup
That's how you should provide a test case to begin with:
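Something along these lines, with the data taken from the question; the table name tbl_org is an assumption reused in later examples:

    CREATE TEMP TABLE tbl_org (id int, feh int, bar text);
    INSERT INTO tbl_org (id, feh, bar) VALUES
       (1, 10, 'A')
     , (2, 20, 'A')
     , (3,  3, 'B')
     , (4,  4, 'B')
     , (5,  5, 'C')
     , (6,  6, 'D')
     , (7,  7, 'D')
     , (8,  8, 'D');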
Dynamic crosstab?
Not very dynamic, yet, as @Clodoaldo commented. Dynamic return types are hard to achieve with plpgsql. But there are ways around it - with some limitations.
So as not to further complicate the rest, I demonstrate with a simpler test case:
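A sketch of such a test case; the table and column names (tbl, row_name, attrib, val) are assumptions chosen to match the prose below:

    CREATE TEMP TABLE tbl (
       row_name text
     , attrib   text
     , val      int
    );
    INSERT INTO tbl (row_name, attrib, val) VALUES
       ('A', 'val1', 1), ('A', 'val2', 2), ('A', 'val3', 3)
     , ('B', 'val1', 4), ('B', 'val2', 5), ('B', 'val3', 6);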
Call:
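Presumably along these lines, against the assumed table tbl:

    SELECT *
    FROM   crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1, 2')
           AS ct (row_name text, val1 int, val2 int, val3 int);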
Returns:
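With the sample data assumed above, something like:

     row_name | val1 | val2 | val3
    ----------+------+------+------
     A        |    1 |    2 |    3
     B        |    4 |    5 |    6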
Built-in feature of tablefunc module

The tablefunc module provides a simple infrastructure for generic crosstab() calls without providing a column definition list. A number of functions written in C (typically very fast) are pre-defined: crosstab1() - crosstab4(). One minor point: they require and return all text, so we need to cast our integer values. But it simplifies the call:
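A sketch of such a call with the pre-defined crosstab4(), casting val to text:

    SELECT *
    FROM   crosstab4('SELECT row_name, attrib, val::text
                      FROM   tbl ORDER BY 1, 2');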
Result:
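With the assumed sample data, something like this (the column names category_1 .. category_4 and the text type come from the module's pre-defined row type, if I recall correctly):

     row_name | category_1 | category_2 | category_3 | category_4
    ----------+------------+------------+------------+------------
     A        | 1          | 2          | 3          |
     B        | 4          | 5          | 6          |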
Custom crosstab() function

For more columns or other data types, we create our own composite type and function (once).
Type:
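A sketch, assuming a composite type for up to five integer value columns; the name tablefunc_crosstab_int_5 is illustrative:

    CREATE TYPE tablefunc_crosstab_int_5 AS (
       row_name text
     , val1 int, val2 int, val3 int, val4 int, val5 int);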
Function:
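A sketch of a wrapper that exposes the module's C function with that return type (this mirrors how the extension defines its own crosstabN() functions; creating a LANGUAGE C function requires superuser privileges):

    CREATE OR REPLACE FUNCTION crosstab_int_5(text)
      RETURNS SETOF tablefunc_crosstab_int_5
      AS '$libdir/tablefunc', 'crosstab'
      LANGUAGE C STABLE STRICT;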
Call:
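For instance:

    SELECT * FROM crosstab_int_5('SELECT row_name, attrib, val FROM tbl ORDER BY 1, 2');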
Result:
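With the assumed sample data, the surplus columns are simply NULL:

     row_name | val1 | val2 | val3 | val4 | val5
    ----------+------+------+------+------+------
     A        |    1 |    2 |    3 |      |
     B        |    4 |    5 |    6 |      |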
One polymorphic, dynamic function for all
This goes beyond what's covered by the tablefunc module.

To make the return type dynamic, I use a polymorphic type with a technique detailed in this related answer:
1-parameter form:
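A sketch of what such a polymorphic wrapper might look like; the function name crosstab_n and the exact body are reconstructions from the description (it looks up the attributes of the passed row type in pg_attribute, builds the column definition list from them, and executes a crosstab() call with it):

    CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _rowtype anyelement)
      RETURNS SETOF anyelement
      LANGUAGE plpgsql AS
    $func$
    BEGIN
       -- build "SELECT * FROM crosstab(<query>) t(<column definition list>)"
       -- from the attributes of the row type passed as _rowtype, then run it
       RETURN QUERY EXECUTE (
          SELECT format('SELECT * FROM crosstab(%L) t(%s)'
                      , _qry
                      , string_agg(quote_ident(attname) || ' '
                                   || format_type(atttypid, atttypmod)
                                 , ', ' ORDER BY attnum))
          FROM   pg_attribute
          WHERE  attrelid = pg_typeof(_rowtype)::text::regclass
          AND    attnum > 0
          AND    NOT attisdropped
          );
    END
    $func$;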
Overload with this variant for the 2-parameter form:
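A matching sketch for the overload; the extra parameter carries the category query and is passed through as the second argument to crosstab():

    CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _cat_qry text, _rowtype anyelement)
      RETURNS SETOF anyelement
      LANGUAGE plpgsql AS
    $func$
    BEGIN
       RETURN QUERY EXECUTE (
          SELECT format('SELECT * FROM crosstab(%L, %L) t(%s)'
                      , _qry, _cat_qry
                      , string_agg(quote_ident(attname) || ' '
                                   || format_type(atttypid, atttypmod)
                                 , ', ' ORDER BY attnum))
          FROM   pg_attribute
          WHERE  attrelid = pg_typeof(_rowtype)::text::regclass
          AND    attnum > 0
          AND    NOT attisdropped
          );
    END
    $func$;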
pg_typeof(_rowtype)::text::regclass: There is a row type defined for every user-defined composite type, so that attributes (columns) are listed in the system catalog pg_attribute. The fast lane to get it: cast the registered type (regtype) to text and cast this text to regclass.

Create composite types once:
You need to define once every return type you are going to use:
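A couple of such types, named in sequence (the names are illustrative):

    CREATE TYPE tablefunc_crosstab_int_3 AS (
       row_name text, val1 int, val2 int, val3 int);
    CREATE TYPE tablefunc_crosstab_int_4 AS (
       row_name text, val1 int, val2 int, val3 int, val4 int);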
For ad-hoc calls, you can also just create a temporary table to the same (temporary) effect:
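For instance (a temporary table implicitly creates a row type of the same name):

    CREATE TEMP TABLE temp_xtype3 (
       row_name text, val1 int, val2 int, val3 int);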
Or use the type of an existing table, view or materialized view if available.
Call
Using above row types:
1-parameter form (no missing values):
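A sketch, passing a NULL of the desired row type to pick the return type:

    SELECT * FROM crosstab_n(
       'SELECT row_name, attrib, val FROM tbl ORDER BY 1, 2'
     , NULL::tablefunc_crosstab_int_3);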
2-parameter form (some values can be missing):
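And with an explicit category query as second parameter:

    SELECT * FROM crosstab_n(
       'SELECT row_name, attrib, val FROM tbl ORDER BY 1, 2'
     , $$VALUES ('val1'), ('val2'), ('val3')$$
     , NULL::tablefunc_crosstab_int_3);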
This one function works for all return types, while the crosstabN() framework provided by the tablefunc module needs a separate function for each.

If you have named your types in sequence like demonstrated above, you only have to replace the number in the type name. To find the maximum number of categories in the base table:
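One way to get that number (a sketch against the assumed test table tbl):

    SELECT max(ct) AS max_categories
    FROM  (SELECT count(*) AS ct FROM tbl GROUP BY row_name) sub;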
That's about as dynamic as this gets if you want individual columns. Arrays like demonstrated by @Clodoaldo, a simple text representation, or the result wrapped in a document type like json or hstore can work for any number of categories dynamically.

Disclaimer:
It's always potentially dangerous when user input is converted to code. Make sure this cannot be used for SQL injection. Don't accept input from untrusted users (directly).
Call for original question:
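A sketch of such a call for the question's data, reusing the assumed tbl_org table, the dummy-category trick and the tablefunc_crosstab_int_3 type from above:

    SELECT * FROM crosstab_n(
       'SELECT bar, 1, feh FROM tbl_org ORDER BY 1, 2'
     , NULL::tablefunc_crosstab_int_3);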
This is to complete @Damian's good answer. I had already suggested the JSON approach in other answers before 9.6's handy json_object_agg function; it just takes more work with the previous tool set.

Two of the cited possible drawbacks really are not drawbacks. The random key order is trivially corrected if necessary, and the missing keys, if relevant, take an almost trivial amount of code to address:
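A sketch of that idea against the assumed test table tbl: cross-join all row names with all attributes so every key appears in every object (with null where there is no value), and order the keys while aggregating:

    SELECT row_name, json_object_agg(attrib, val ORDER BY attrib) AS obj
    FROM  (SELECT DISTINCT row_name FROM tbl) r
    CROSS  JOIN (SELECT DISTINCT attrib FROM tbl) a
    LEFT   JOIN tbl USING (row_name, attrib)
    GROUP  BY row_name
    ORDER  BY row_name;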
For a final query consumer that understands JSON there are no real drawbacks. The only one is that the result cannot be consumed as a table source.
I'm sorry to come back to this old question, but the "Dynamic Crosstab" solution returns an erroneous result table: the valN values are wrongly "aligned to the left" and do not correspond to the column names when the input table has "holes" in the values, e.g. "C" has val1 and val3 but not val2. In that case the val3 value is placed in the column val2 (i.e. the next free column) of the final table.
In order to return correct cells, with "holes" in the right columns, the crosstab query requires a second SELECT inside the crosstab call that enumerates the categories, something like this:

    crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1, 2',
             'SELECT DISTINCT attrib FROM tbl ORDER BY 1')
Although this is an old question, I would like to add another solution made possible by recent improvements in PostgreSQL. This solution achieves the same goal of returning a structured result from a dynamic data set without using the crosstab function at all. In other words, this is a good example of re-examining unintentional and implicit assumptions that prevent us from discovering new solutions to old problems. ;)
To illustrate, you asked for a method to transpose data with the following structure:
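     id | feh | bar
    ----+-----+-----
      1 |  10 | A
      2 |  20 | A
      3 |   3 | B
      4 |   4 | B
      5 |   5 | C
      6 |   6 | D
      7 |   7 | D
      8 |   8 | D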
into this format:
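     bar | val1 | val2 | val3
    -----+------+------+------
     A   |   10 |   20 |
     B   |    3 |    4 |
     C   |    5 |      |
     D   |    6 |    7 |    8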
The conventional solution is a clever (and incredibly knowledgeable) approach to creating dynamic crosstab queries that is explained in exquisite detail in Erwin Brandstetter's answer.
However, if your particular use case is flexible enough to accept a slightly different result format, then another solution is possible that handles dynamic pivots beautifully. This technique, which I learned of here, uses PostgreSQL's new jsonb_object_agg function to construct pivoted data on the fly in the form of a JSON object.

I will use Mr. Brandstetter's "simpler test case" to illustrate:
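That would be the same assumed table as sketched earlier, repeated here:

    CREATE TEMP TABLE tbl (
       row_name text
     , attrib   text
     , val      int
    );
    INSERT INTO tbl (row_name, attrib, val) VALUES
       ('A', 'val1', 1), ('A', 'val2', 2), ('A', 'val3', 3)
     , ('B', 'val1', 4), ('B', 'val2', 5), ('B', 'val3', 6);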
Using the jsonb_object_agg function, we can create the required pivoted result set with this pithy beauty:
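A sketch of such a query against the assumed test table:

    SELECT row_name AS key, jsonb_object_agg(attrib, val) AS data
    FROM   tbl
    GROUP  BY row_name
    ORDER  BY row_name;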
Which outputs:
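With the assumed sample data, output along these lines:

     key |               data
    -----+-----------------------------------
     A   | {"val1": 1, "val2": 2, "val3": 3}
     B   | {"val1": 4, "val2": 5, "val3": 6}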
As you can see, this function works by creating key/value pairs in the JSON object from the attrib and val columns in the sample data, all grouped by row_name.

Although this result set obviously looks different, I believe it will actually satisfy many (if not most) real-world use cases, especially those where the data requires a dynamically generated pivot, or where the resulting data is consumed by a parent application (e.g., needs to be re-formatted for transmission in an HTTP response).
Benefits of this approach:
Cleaner syntax. I think everyone would agree that the syntax for this approach is far cleaner and easier to understand than even the most basic crosstab examples.
Completely dynamic. No information about the underlying data need be specified beforehand. Neither the column names nor their data types need be known ahead of time.
Handles large numbers of columns. Since the pivoted data is saved as a single jsonb column, you will not run up against PostgreSQL's column limit (≤1,600 columns, I believe). There is still a limit, but I believe it is the same as for text fields: 1 GB per JSON object created (please correct me if I am wrong). That's a lot of key/value pairs!
Simplified data handling. I believe that the creation of JSON data in the DB will simplify (and likely speed up) the data conversion process in parent applications. (You will note that the integer data in our sample test case was correctly stored as such in the resulting JSON objects. PostgreSQL handles this by automatically converting its intrinsic data types to JSON in accordance with the JSON specification.) This will effectively eliminate the need to manually cast data passed to parent applications: it can all be delegated to the application's native JSON parser.
Differences (and possible drawbacks):
It looks different. There's no denying that the results of this approach look different. The JSON object is not as pretty as the crosstab result set; however, the differences are purely cosmetic. The same information is produced--and in a format that is probably more friendly for consumption by parent applications.
Missing keys. Missing values in the crosstab approach are filled in with nulls, while the JSON objects are simply missing the applicable keys. You will have to decide for yourself if this is an acceptable trade-off for your use case. It seems to me that any attempt to address this problem in PostgreSQL will greatly complicate the process and likely involve some introspection in the form of additional queries.
Key order is not preserved. I don't know if this can be addressed in PostgreSQL, but this issue is mostly cosmetic also, since any parent applications are either unlikely to rely on key order, or have the ability to determine proper key order by other means. The worst case will probably only require an additional query of the database.
Conclusion
I am very curious to hear the opinions of others (especially @ErwinBrandstetter's) on this approach, especially as it pertains to performance. When I discovered this approach on Andrew Bender's blog, it was like getting hit in the side of the head. What a beautiful way to take a fresh approach to a difficult problem in PostgreSQL. It solved my use case perfectly, and I believe it will serve many others as well.
In your case I guess an array is good. SQL Fiddle
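A sketch of that idea (my own query, not necessarily the one in the linked fiddle), aggregating the feh values per bar into an array:

    SELECT bar, array_agg(feh ORDER BY feh) AS vals
    FROM   "Foo"
    GROUP  BY bar
    ORDER  BY bar;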