THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

Please visit the same blog at - comments are now disabled here but you can comment on the new blog (all blog posts available here are available there, too).
You can follow me on Twitter: @marcorus

Improve NTILE performance

Today I implemented a classification through a NTILE equivalent function. The T-SQL NTILE implementation has very bad perfomance when used against large datasets of rows, so I and Davide Mauri realized a better implementation that Davide posted some weeks ago on his blog. No reaction from Microsoft guys and no comments on Davide blog, so I retry here to catch more attention on this topic.

Instead to use NTILE function you can use this faster query:

   CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast

Note that I prefer to use the RANK function instead than the ROW_NUMBER: even if I could get non-uniform distribution of data among clusters, I get all elements with the same value in the same cluster and in many scenario you can prefer this technique to avoid cases with random cluster assignment for items with value equal to a cluster limit. This is different from NTILE result, but I prefer it for my cluster assignment.

   CEILING(RANK() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast

Please read this post of Davide Mauri for a complete explanation and let us know what do you think about it


Published Wednesday, April 19, 2006 7:27 PM by Marco Russo (SQLBI)



David said:

Thanks this will come in very handy.  I was mainly looking to avoid rows with the same value showing up in different NTiles, but the added performance boost is a nice plus too.

November 3, 2011 12:21 PM
New Comments to this post are disabled

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog



Privacy Statement