THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Maria Zakourdaev

How would you do dynamic pivot SQL in Amazon Redshift? (This elegant script might be used in many other databases as well)


Several days ago I have spent a few hours of my life figuring out how to do dynamic pivot in Amazon Redshift. To tell you the truth, I have expected much more from this DBMS SQL language.

Redshift is based on Postgre SQL 8.0.2 ( which was released in 2005 !!!! )

Anything you would want for this, not too difficult task,  does not exits.  No stored procedures. No JSON datatype. No variables outside of UDF, no queries inside UDFs. “UDF can be used to calculate values but cannot be used to call SQL functions”. Python UDFs also cannot query the data, only perform calculations.

Finally I have found one useful function LISTAGG that helped me to get distinct values of all pivoted columns.

For each group in a query, the LISTAGG aggregate function orders the rows for that group according to the ORDER BY expression, then concatenates the values into a single string.



LISTAGG( [DISTINCT] aggregate_expression [, 'delimiter' ] ) [ WITHIN GROUP (ORDER BY order_list) ]

As a next step, I have used Python UDF to dynamically build pivoting query ( I love Python more and more, this is such a nice language :-) . In addition, I have used Common Table Expression CTE ( yes, they have it! ) to pass a list of pivoted columns to Python UDF.


The last step, execution the SQL query, I am still trying to figure out – there is no dynamic query execution in Redshift SQL dialect. But my customer can execute this query from reporting tool.


Here is a setup and all queries.


Creating a table: 

create table maria_deleteme ( id int, year int, customer varchar, amount int);


Loading some test values:

insert into maria_deleteme values

( 12, 2010, 'customer1', 4),

( 2, 2010,'customer1', 80),

( 3, 2010,'customer2', 41),

( 4, 2011,'customer1', 45),

( 5, 2011,'customer2', 15),

( 6, 2011,'customer3', 18),

( 7, 2012,'customer1', 23),

( 8, 2012,'customer1', 1),

( 9, 2012,'customer1', 8),

( 10, 2012,'customer3', 98),

( 11, 2013,'customer2', 1);

This query gives us a list of pivoted columns:

       select listagg(distinct year','as years

       from maria_deleteme

The result will be :  2011,2013,2012,2010 


Using CTE over the above query and passing its result to Python UDF:

with vars

       as (

       select listagg(distinct year, ',') as years

       from maria_deleteme


select maria_pivoting ('maria_deleteme',years,'year','customer','amount')

from vars;


The above query will print out this:

select   sum (case when year = 2010  then amount else 0 end) as "2010" ,

          sum (case when year = 2012  then amount else 0 end) as "2012" ,

          sum (case when year = 2011  then amount else 0 end) as "2011" ,

          sum (case when year = 2013  then amount else 0 end) as "2013" ,


 from maria_deleteme group by customer ;


Which results in: 

# 2010 2012 2011 2013 customer

1 0 98 18 0 customer3

2 41 0 15 1 customer2

3 88 32 45 0 customer1 


Python UDF: 

DROP FUNCTION maria_pivoting (varchar,varchar,varchar,varchar)


CREATE FUNCTION maria_pivoting(tablename varchar, list_vals varchar, pivot_col varchar, groupby_col varchar, counted_col varchar )

RETURNS varchar(4000) IMMUTABLE AS $$


       vals = list_vals.split(",")

       pivot_query = " select  "


       for value in vals:

           pivot_query = pivot_query + ' sum (case when {} = {}  then {} else 0 end) as "{}" ,'.format(pivot_col,str(value),counted_col,str(value))


       pivot_query = pivot_query + ' {} from {} group by {} ;'.format(groupby_col,tablename,groupby_col)


       return pivot_query

$$ LANGUAGE plpythonu;


I will be glad to hear what do you think about Redshift SQL language ( Of course I know that this is very powerful database for data processing. And VERY expensive.)

Yours, Maria

Published Tuesday, November 7, 2017 10:39 AM by Maria Zakourdaev

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS



Rishi D said:

THis is really good solution. Thanks for this

November 10, 2017 2:47 PM

Maria Zakourdaev said:

Thank you for your feedback!

November 12, 2017 4:32 AM

Jonathan Kyrlach said:

Thank you for posting your solution to this very annoying product deficiency. I would like to point out that redshift is actually inexpensive. Try scoping out the price of a comparable cloud based columnar store, you'll see that Amz. Redshift is one of the most affordable. The tradeoff or that affordability is that customers don't have a lot of sway or influence to get annoying problems like this one addressed.

December 8, 2017 7:19 AM

Maria Zakourdaev said:

Thanks Johathan, there are indeed too much trade-offs but on the other hand, I think this database should not be used as an analytical database with it's concurrency limitations and lack of features. It was built to be a part of complex ETL flows.

December 17, 2017 6:55 AM

Sami Y said:

Hello! Thanks for this clever solution - I've implemented it and it's working, in the sense that it gives back the query string to run. Have you figured out how to have Redshift dynamically execute that query without me manually copying/pasting the query? Trying to implement this in an automated report. Many thanks!!!

February 13, 2018 8:27 AM

Bill W said:

Nice solution and it is interesting to see how folks use UDFs in Redshift.  I'm also a fan of listagg (and window functions).  However, I think there is a simpler solution:

select listagg(stmt, '\n') within group (order by ord) as query

from (

(select 0 as ord, 'select customer' as stmt)

union all

(select row_number() over (order by year) as ord, ', sum(decode("year",' || year || ',amount,0)) as "' || year || '"' as stmt

 from maria_deleteme

 group by year)

union all

(select 1000 as ord, 'from maria_deleteme group by customer order by customer;' as stmt)


No UDF required and the resulting query is  equivalent to yours (though I did neaten things up a bit).

select customer

, sum(decode("year",2010,amount,0)) as "2010"

, sum(decode("year",2011,amount,0)) as "2011"

, sum(decode("year",2012,amount,0)) as "2012"

, sum(decode("year",2013,amount,0)) as "2013"

from maria_deleteme group by customer order by customer;

March 2, 2018 3:10 PM

Matthew said:

Any luck on finding a way to execute the query dynamically? That would be really helpful.

July 12, 2018 8:17 AM

Shmuel Milavski said:

I will be happy if someone succeed to do so as Matthew asked.

August 27, 2018 7:10 AM

Leave a Comment


About Maria Zakourdaev

The shortest word in the English language that contains the letters: abcdef is… feedback! Let me know if I have touched something that is alive in the cosmos.
Privacy Statement