
Greg Low (The Bit Bucket: IDisposable)

Ramblings of Greg Low (SQL Server MVP, MCM and Microsoft RD) - SQL Down Under

  • Cannot determine next value for a sequence

    One of the most anticipated new features in SQL Server 2012 was the introduction of sequences. Prior to SQL Server 2012, developers had a choice of IDENTITY columns or a roll-your-own table mechanism.

    Sequences allow us to create a schema-scoped object that is not associated with any particular table. For example, if I have a Sales.HotelBookings table, a Sales.FlightBookings table, and a Sales.VehicleBookings table, I might want to have a common BookingID used as the key for each table. If more than the BookingID was involved, you could argue that there is a normalization problem with the tables but we'll leave that discussion for another day.

    Recently when working with sequences however, I found a problem with their implementation. It works as described but is not useful.

    So let's start by creating the schema and the sequence:
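    The code images from the original post are missing. A minimal sketch of the setup, with the START WITH and INCREMENT BY values assumed:

    ```sql
    -- Sketch only: names follow the article's example;
    -- START WITH and INCREMENT BY values are assumed
    CREATE SCHEMA Sales AUTHORIZATION dbo;
    GO

    CREATE SEQUENCE Sales.BookingID AS int
        START WITH 1
        INCREMENT BY 1;
    GO
    ```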



    We could then use this sequence to provide the default value for each of the three tables:
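    The original CREATE TABLE samples didn't survive. A minimal sketch for one of the tables, assuming the Sales.BookingID sequence above and an illustrative column:

    ```sql
    -- Sketch: using the sequence as the default for the table's key
    -- (column names other than BookingID are illustrative only)
    CREATE TABLE Sales.HotelBookings
    (
        BookingID int NOT NULL
            CONSTRAINT DF_HotelBookings_BookingID
                DEFAULT (NEXT VALUE FOR Sales.BookingID)
            CONSTRAINT PK_HotelBookings PRIMARY KEY,
        HotelName nvarchar(50) NOT NULL
    );
    -- Sales.FlightBookings and Sales.VehicleBookings would be defined the same way
    ```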


    All this is as expected. One question that often arises though, is "how do I know the last value for a given sequence". The answer provided is to query the sys.sequences view. We can do this as follows:
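    The query itself is missing from the capture; one likely form, filtering by the assumed sequence name:

    ```sql
    -- Sketch: reading the current value for the sequence
    SELECT current_value
    FROM sys.sequences
    WHERE name = N'BookingID'
    AND SCHEMA_NAME(schema_id) = N'Sales';
    ```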


    The current_value column in sys.sequences is defined as follows:

    Datatype: sql_variant NOT NULL

    The use of sql_variant here makes sense as the view needs to be able to provide the current value for all sequences, regardless of data type. Sequences can be created with any built-in integer type. According to BOL, the possible values are:

    • tinyint - Range 0 to 255
    • smallint - Range -32,768 to 32,767
    • int - Range -2,147,483,648 to 2,147,483,647
    • bigint - Range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
    • decimal and numeric with a scale of 0.
    • Any user-defined data type (alias type) that is based on one of the allowed types.

    The output of that column is described as:

    The last value obligated. That is, the value returned from the most recent execution of the NEXT VALUE FOR function or the last value from executing the sp_sequence_get_range procedure. Returns the START WITH value if the sequence has never been used.

    And this is where I have a problem with how it's defined. When you have never retrieved a value from the sequence, there is no last value obligated. What it does return is the first value that will be generated, but has not yet been generated:


    The documentation is correct but the behaviour is bizarre. I believe that this column should return NULL. Otherwise, there is no way to tell that this value has not yet been generated.

    If I generate a new value and then query it again ie:
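    The missing snippet would have looked something like this (assuming the sequence was created with START WITH 1):

    ```sql
    -- Sketch: obligate a value, then query sys.sequences again
    SELECT NEXT VALUE FOR Sales.BookingID;

    SELECT current_value
    FROM sys.sequences
    WHERE name = N'BookingID'
    AND SCHEMA_NAME(schema_id) = N'Sales';
    ```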


    Note that the same value is returned:


    It's only when I request it another time, that I see the expected value:


    So the problem is that when you read the current value from the sys.sequences view, there's no way to know if this is the last value obligated or the next one that will be obligated.

    I'd really like to see this behaviour changed. Given that the SQL Server team rates backwards compatibility highly, an alternative would be to add a new column to sys.sequences that indicates that the sequence has never been used. There is already a column named is_exhausted; at a pinch, that could be set for new sequences.


    If you agree, you can vote here:

  • Plan Cache Pollution: Avoiding it and Fixing it

    While SQL Server’s plan cache generally is self-maintaining, poor application coding practices can cause the plan cache to become full of query plans that have only ever been used a single time and that are unlikely to ever be reused. We call this “plan cache pollution”.


    The most common cause of these issues is programming libraries that send multiple variations of a single query. For example, imagine I have a query like:

    SELECT c.CustomerID, c.TradingName, c.ContactName, c.PhoneNumber
    FROM dbo.Customers AS c
    WHERE c.CustomerID = @CustomerID
    AND c.BusinessCategory = @BusinessCategory
    AND c.ContactName LIKE @ContactNameSearch
    ORDER BY c.CustomerID;

    The query has three parameters: @CustomerID, @BusinessCategory, and @ContactNameSearch. If the parameters are always defined with the same data types ie: @BusinessCategory is always nvarchar(35) and so on, then we will normally end up with a single query plan. However, if on one execution the parameter is defined as nvarchar(35), and on the next execution it is defined as nvarchar(20), and on yet another execution it is defined as nvarchar(15), each of these queries will end up with different query plans. A similar problem would also occur if any of the plan-affecting SET options are different on each execution ie: if DATEFORMAT was dmy for one execution, and mdy for the next, you’ll also end up with a different plan.

    For more details on the internal causes of this or for a list of plan-affecting SET options, you might want to read the whitepaper that I prepared for the MSDN site. The latest version was for SQL Server 2012 and can be found here: (Plan Caching and Recompilation in SQL Server 2012).

    So what on earth would cause someone to send parameters defined differently each time? The worst offenders are not queries that are written intentionally; they are queries generated by frameworks.

    As an example, while using the SqlCommand object in ADO.NET, it is convenient to use the AddWithValue(parametername, parametervalue) method of the Parameters collection. But notice that when you do this, you do not specify the data type of the parameter. ADO.NET has to derive an appropriate data type based on the data that you have provided. For string parameters, this can be particularly troubling. If the parameter value is initially “hello”, a query plan with an nvarchar parameter length of 5 will be cached after the command is executed. When the query is re-executed with a parameter value of “trouble”, the command will appear to be different as it has an nvarchar parameter with a length of 7.

    The more the command is executed, the more the plan cache will become full of plans for different length string parameters. This is particularly troubling for commands with multiple string parameters as plans will end up being stored for all combinations of all lengths of all the parameters. Some later variants of these libraries are improved by always deriving strings as nvarchar(4000). That’s not ideal but it’s much better than the previous mechanism.

    While someone coding with ADO.NET can use another method to add a parameter ie: one that allows specifying the data type as well, developers using higher level libraries do not have that option. For example, LINQ to SQL uses AddWithValue() within the framework. The user has no control over that. Ad-hoc queries generated by end-user query tools can also cause a similar problem where many combinations of similar queries can end up becoming cached.

    Avoiding Plan Cache Pollution

    As mentioned, to work around such a problem, the application should use a method to add the parameter that allows specifying the data type precisely.

    As an example, nvarchar(100) might be used as the data type for each execution in the above example, if we know that all possible parameter lengths are less than 100.
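    From the T-SQL side, the same idea can be sketched with sp_executesql, declaring a stable nvarchar(100) type for the string parameters so every execution matches a single cached plan. The parameter values here are hypothetical:

    ```sql
    -- Sketch: explicit, stable parameter types keep one plan in cache
    EXEC sp_executesql
        N'SELECT c.CustomerID, c.TradingName, c.ContactName, c.PhoneNumber
          FROM dbo.Customers AS c
          WHERE c.CustomerID = @CustomerID
          AND c.BusinessCategory = @BusinessCategory
          AND c.ContactName LIKE @ContactNameSearch
          ORDER BY c.CustomerID;',
        N'@CustomerID int, @BusinessCategory nvarchar(100), @ContactNameSearch nvarchar(100)',
        @CustomerID = 12,
        @BusinessCategory = N'Retail',
        @ContactNameSearch = N'%Smith%';
    ```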

    Treating Plan Cache Pollution

    There are several additional options that can help in dealing with plan cache pollution issues:


    FORCED PARAMETERIZATION can be set at the database level. SQL Server will often auto-parameterize queries by determining that a value looks like a parameter, even though you didn’t specify it as a parameter. Using the FORCED PARAMETERIZATION setting makes SQL Server become much more aggressive in deciding which queries to auto-parameterize. The down-side of this option is that it could potentially introduce parameter-sensitivity problems. (This option was added in SQL Server 2005).
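    For example, a sketch of enabling it for the current database:

    ```sql
    -- Sketch: turn on forced parameterization for this database
    ALTER DATABASE CURRENT SET PARAMETERIZATION FORCED;
    -- and to revert to the default behaviour:
    -- ALTER DATABASE CURRENT SET PARAMETERIZATION SIMPLE;
    ```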


    OPTIMIZE FOR ADHOC WORKLOADS is an sp_configure server level option. When set, SQL Server only caches a plan stub on the first execution of an ad-hoc query. The next time the same query is executed, the full plan is stored. Plan stubs are much smaller than query plans and this option ensures that the plan cache is not filled by query plans that have never been reused. (This option was added in SQL Server 2008). We tend to enable this option on most servers.
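    A sketch of enabling it; it is an advanced option, so 'show advanced options' must be on first:

    ```sql
    -- Sketch: enable optimize for ad hoc workloads at the server level
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'optimize for ad hoc workloads', 1;
    RECONFIGURE;
    ```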


    Sometimes you can get into a situation where you simply cannot stop the queries from creating this situation and you need to deal with it. DBCC FREESYSTEMCACHE can be used to clear the plan cache. One little-understood option, however, is that you can specify a particular Resource Governor resource pool, in which case it clears only the plans associated with that pool. (This command was first available in SQL Server 2005 but the option to clear a specific resource pool was added in SQL Server 2008).

    We often use this method to work around plan cache pollution issues. We try to isolate the badly-behaved applications or ad-hoc queries into one or more separate resource pools using Resource Governor. Then periodically, (perhaps every 5 or 10 minutes), we clear the plan cache for members of this “tough luck” pool.
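    A sketch of that periodic clear, using a hypothetical pool name:

    ```sql
    -- Sketch: clear cached plans only for one Resource Governor pool
    -- (ToughLuckPool is a hypothetical pool name)
    DBCC FREESYSTEMCACHE ('ALL', ToughLuckPool);
    ```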

    Best advice is to try to avoid the situation in the first place by appropriate coding techniques but that option isn’t available to everyone.

  • Mixing UNION and UNION ALL and other oddities seen in consulting

    I always say that one of the things that I love about consulting or mentoring work is that I see things (mostly code) that I would never have thought of, both good and bad. You could spend all day dreaming up bizarre ideas and never come close to the ones that I just happen to come across.

    A good example of this was a site I was at a while back where every table had a GUID name. Yes, I'm talking about tables named dbo.[3B38AB7E-FB80-4E56-9E5A-6ECED7A8FA17] and so on. They had tens of thousands of tables named this way. Query plans were close to unreadable.

    Another was a site where the databases were named by their location on the disk. Yes, they had a database named X:\Database Files\Data Files\CoreDB.mdf. And yes that does mean that you end up using it like:


    Not all odd things that I see are so blatant though. Today I saw a more subtle coding issue. With SQL Server, the UNION operation combines two rowsets into a single rowset. If UNION ALL is used then all rows are returned. With just UNION without the ALL, only distinct rows are returned. All good so far.

    But until today, I'd never stopped to think about what happens when you mix the two operations. For example, without running the code (or reading further ahead yet), what would you expect the output of the following command to be? (The real code read from a table but I've mocked it up with a VALUES clause to make it easier to see the outcome).
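    The original mock-up hasn't survived, but a query of the same shape (with made-up values) is:

    ```sql
    -- Sketch: mixing UNION and UNION ALL; the values are invented
    SELECT v.Value
    FROM (VALUES (1), (2), (3)) AS v(Value)
    UNION ALL
    SELECT v.Value
    FROM (VALUES (2), (3), (4)) AS v(Value)
    UNION
    SELECT v.Value
    FROM (VALUES (3), (4), (5)) AS v(Value)
    UNION ALL
    SELECT v.Value
    FROM (VALUES (4), (5), (6)) AS v(Value);
    ```

    The operators have equal precedence and apply left to right, so the UNION de-duplicates everything above it, and the final UNION ALL then appends its rows, duplicates included.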


    I was left wondering if there was some type of operation precedence between UNION and UNION ALL. The output rows are:


    It isn't a case of precedence. The operations are just being applied in order. You can see this as follows:

    Executing the first part:



    returns the following with no surprises:


    Executing the first two parts:


    returns all rows from both queries:


    Executing the first three parts:


    returns the following rows. This is formed by taking the previous result, executing the third query then performing a distinct operation on the whole combined rowset.


    Executing the entire query then takes this previous result set and appends (based on the UNION ALL), the results from the fourth part of the query.


    Regardless of how it actually works, I think it's important to avoid writing code where the outcome is less than obvious. In this case, it was just a bug but if the code as written was the intended code, adding some parentheses to this query might have made the intent clearer.

    And of course in this case, a sprinkle of extra DISTINCT and GROUP BY operations made it a thing of beauty:


    which actually returned:


    So what they really should have written in the first place was:



  • Data Camp Sydney (Free)

    Hi Folks, on the 5th June (yes that’s Friday next week), I’m running a Data Camp day for the local Microsoft team. It’s being held at Cliftons in the city.

    We’ve got four topics for the day:

    • Azure SQL DB
    • Azure DocumentDB
    • Azure Machine Learning
    • Azure Stream Analytics

    If you want to get your head around any/all of these, we’d love to see you there. Places are limited, so you must register; you can do so here:

  • Latest MVA Offerings for SQL Server and for Windows 10 (Yes it’s time to start looking at this)

    The team at Microsoft Virtual Academy (MVA) have pushed out some new content that’s relevant to database people.

    First, if you’re wondering about using Azure for SQL Server, the Jumpstart for SQL Server in Azure VMs is worth a look. Longer term, I suspect we’ll mostly end up using SQL Server as a platform service (Azure SQL DB) but in the short-term, implementing it in a VM will be more common as it’s probably both easier when migrating existing applications and a little more familiar to most.

    Next, if you have to deal with other databases (shock horror, yes there are others including open source ones), there is a course on Open Source Databases on Azure.

    Finally, you would have to have been living under a rock not to notice that Windows 10 is coming. But now, it’s time to start to get your head around what’s different. There’s a course that covers off the Fundamentals of the Technical Preview of Windows 10.

  • More Useful MVA Training Options

    I find many of the MVA courses great for quickly getting across concepts or areas that I haven’t worked with before.

    This month, the local MVP program has asked me to highlight the following MVA courses. The first two in particular look useful for SQL folk. The third one provides good background:

    Azure IaaS Technical Fundamentals

    Deep Dive into Networking Storage and Disaster Recovery Scenarios

    Embracing Open Source Technologies to Accelerate and Scale Solutions

    If you watch them, let me know what you think.

  • Azure Machine Learning Course–First Melbourne Offering

    Hi Folks, we’ve been working hard on a new Azure Machine Learning course.

    Come and spend a solid day finding out why Azure Machine Learning should be part of your arsenal.

    Our first Melbourne offering of Azure Machine Learning Core Skills is 31st July. I’d love to see you there:

  • Free eBook on Azure Machine Learning

    Have you looked into Azure Machine Learning as yet?

    If not, this might help. It’s a free eBook on getting started with it. Click the book to follow the link:


  • Out of Memory Errors in SSIS When Loading 32 bit DLLs

    Was speaking with a customer today about an issue where they were receiving “Out of Memory” exceptions when trying to load a 32 bit PostgreSQL ODBC driver from within an SSIS package.

    When the package was run from the command line using Dtexec, all was fine. When the package was run from within the SSIS Catalog, the same package refused to run. They had presumed it was some issue to do with 32 bit vs 64 bit drivers. The customer resolved it by installing the latest 64 bit PostgreSQL ODBC drivers.

    However, it’s important to know that when you see an “Out of Memory” error on attempting to load a 32 bit DLL, it usually doesn’t mean anything about memory at all.

    Under the covers, in 32 bit Windows, loading and accessing a function in a DLL was performed by:

    1. Making an API call to LoadLibrary() – this brought the DLL into memory if it wasn’t already present

    2. Making an API call to GetProcAddress() – because the DLL could be located anywhere in memory, there was a need to locate the actual memory address of the procedure in the DLL in its loaded location

    3. Making a call to the address returned by the GetProcAddress() call.

    With my previous developer hat on, there are several places where I’ve seen this go wrong.

    One is that people don't check the address returned by GetProcAddress(). It can return null if the procedure isn't found. So code that immediately calls the returned address without checking it for null would generate the previously infamous “this program has performed an illegal operation and will be shut down” message that we used to see.

    The less common problem was that LoadLibrary() had its own quirks. The worst was that if it could not locate the DLL, the error returned was “Out of Memory”. I always thought that was one of the silliest error messages to ever come out of Windows, but it’s entirely relevant here.

    When you see an “Out of Memory” error when attempting to load a 32 bit DLL, it’s time to check whether the DLL can be located by the process. The easiest option (although not the cleanest) would be to make sure the DLL is in the GAC (global assembly cache).

  • Invoke-SqlCmd4 - A Superior SQL Commandlet

    Recently, I wrote about one of the issues with the Invoke-SqlCmd commandlet where it sets the ApplicationName when you use the parameter that should set the host.

    Fellow MVP Ben Miller sent me a copy of the Invoke-SqlCmd3 that they were using. It was much better.

    I then realized that there were many other options missing from these commandlets (such as options for Database Mirroring and Availability Groups) and so I set about improving it.

    Ben has posted it to the PowerShell Code Repository. You’ll find Invoke-SqlCmd4 here:

    This version of the commandlet will allow setting many other parameters (and it sets the ApplicationName and HostName correctly). The parameters are:

    [Parameter(Position = 0, Mandatory = $true)] [string]$ServerInstance,

    [Parameter(Position = 1, Mandatory = $false)] [string]$DatabaseName,

    [Parameter(Position = 2, Mandatory = $false)] [string]$Query,

    [Parameter(Position = 3, Mandatory = $false)] [string]$UserName,

    [Parameter(Position = 4, Mandatory = $false)] [string]$Password,

    [Parameter(Position = 5, Mandatory = $false)] [Int32]$QueryTimeout = 600,

    [Parameter(Position = 6, Mandatory = $false)] [Int32]$ConnectionTimeout = 30,

    [Parameter(Position = 7, Mandatory = $false)] [string]$ApplicationName = "PowerShell SQLCMD",

    [Parameter(Position = 8, Mandatory = $false)] [string]$HostName,

    [Parameter(Position = 9, Mandatory = $false)] [ValidateSet("ReadOnly", "ReadWrite")] [string]$ApplicationIntent,

    [Parameter(Position = 10, Mandatory = $false)] [ValidateScript({Test-Path $_})] [string]$InputFile,

    [Parameter(Position = 11, Mandatory = $false)] [ValidateSet("DataSet", "DataTable", "DataRow")] [string]$OutputAs = "DataRow",

    [Parameter(Position = 12, Mandatory = $false)] [string]$FailoverPartnerServerInstance,

    [Parameter(Position = 13, Mandatory = $false)] [bool]$IsMultiSubnetFailover = $false

    Hope that helps someone.

  • Optimizing Dynamic Search Screens – Fixing the Performance of Common Code Paths

    One of the common challenges that arises in many applications is around how to optimize the SQL Server performance of dynamic search queries. For example, we've all seen applications with screens like this:


    Often, there will be many, many more fields to be searched. What I often hear from developers is that there is no way to optimize the query because you don't know in advance which options the user will choose. While this is partly true, it assumes that human behaviour is much more random than it really is. Let me explain:

    A few years back, I was doing performance tuning work at a company that books flights. They had a stored procedure that took a single parameter of a BookingNumber, and returned details of a customer's booking. It took around ½ second to execute.

    Eventually, one of the users said "I'd love to be able to look up the booking by LastName". So they changed the procedure to have an optional parameter to allow for that. Later another user asked for EmailAddress. And so on, and so on. When I arrived, they had a 4000 line procedure that allowed 14 optional parameters. Unfortunately, it now took 90 seconds to execute every time.

    What was truly sad was that in a week of testing, I never saw any value passed other than the BookingNumber. So they had created a situation where all the users were experiencing glacial response times all day long, for a situation that almost never occurred.

    I see this sort of scenario played out time and again. When confronted with a screen that allows the users to pick and choose search criteria, developers always seem to assume that people will choose a wide variety of options. Yet that isn't what happens. Generally, there will be a small set of options that they will choose again and again, and only rarely will they make other choices.

    Knowing that, you realise that even though you can't optimize for every possible code path, it's critical that the common paths are optimized. If necessary, provide alternate code paths for the high volume query types.

    Finally, for the atypical choices made by users, what do you do? A Swedish friend of mine (and fellow SQL Server MVP) Erland Sommarskog has some awesome articles at his web site. One in particular is on Dynamic Search Conditions. I recommend reading it if you have never done so.

  • Power BI Designer March Update includes Google Analytics Connector

    The Power BI team have released details of their March update to the standalone Power BI designer.

    You’ll find details of the update here:

    The first thing I noticed is just how much faster the tool operates. The blog post mentioned performance enhancements but I really, really noticed them.

    One particular enhancement that I wanted to call out was the addition of a connector for Google Analytics. I’ve been trying that this morning and have found it really easy to use. All the standard categories appear as available sets of data:


    I’ve picked those that I’m interested in:


    Then added them to a report:



    Very easy. And it’s clear which type of device/operating system we need to continue to ensure the best experience on at SQL Down Under.

  • Powershell Invoke-Sqlcmd –Hostname Parameter sets the Application Name instead of the Host Name

    Two of the parameters in SQL Server connections are the Application Name and the Host Name. You can see these in SQL Server if you execute the following command:
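    The screenshot of the command is gone; a likely form of the query is:

    ```sql
    -- Sketch: view the application and host names for current sessions
    SELECT s.session_id, s.program_name, s.host_name
    FROM sys.dm_exec_sessions AS s
    WHERE s.is_user_process = 1;
    ```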


    I’ve always been a fan of having applications identify themselves in their connection strings. It makes tasks like tracing much easier. The tools supplied with SQL Server do a reasonable job of that as you can see above. But many other tools don’t do such a good job.

    I was working at a site today where they are using Powershell to execute commands for monitoring. I noticed that the Powershell commands did not set the Application Name in the connection string when using Invoke-Sqlcmd. Note the following example:


    I then tried to work out how to set the Application Name. When I checked the documentation for Invoke-Sqlcmd, it shows that the –Hostname parameter corresponds to the SQLCMD option –H, which sets the host name.


    However, note that if you use the –Hostname option, it actually sets the Application Name and does not set the Host Name:



  • Suggest Compression Strategies for Tables and Indexes

    I make no secret of the fact that I think that table compression was the best thing added to SQL Server 2008.

    It seems like every week I’m talking to clients about it. They have Enterprise Edition but they weren’t using compression because they were familiar with O/S level disk compression and assumed that compression would mean smaller but slower. That often couldn’t be further from the truth for customers whose databases are currently I/O bound, or who don’t have enough memory for their working set to fit in the buffer cache.

    It’s important though, to make sure the correct form of compression is used for each table and index. In fact, the decision should be taken for each partition of each table and index if the table is partitioned. We often recommend different compression strategies for different partitions of the same table.

    To make it easier to decide which form of compression might suit, I’ve created another version of the script that I use as a first pass in determining compression strategies. The earlier version suggested ROW and/or PAGE compression. While this version can do the same, it also considers the clustered columnstore indexes that are part of the Enterprise Edition of SQL Server 2014. (I’ve included a switch to turn that function on or off in case you don’t have 2014 yet).

    The cutoff points are configurable but my default values are as shown.

    • CCI (Clustered Columnstore Index) will be recommended where the partition is scanned more than 95% of the time, updated less than 10% of the time, seeks and lookups are less than 5% of the time, and where there are at least 800,000 rows. It will also only be recommended if it is supported.
    • PAGE will be recommended where the partition is scanned more than 75% of the time and updated less than 20% of the time.
    • ROW will be recommended in all other cases. We believe that ROW should be the default in SQL Server across the board, instead of NONE.

    It is important that this script only be run after the system has been in use for long enough to have experienced typical usage patterns.

    I hope this helps you to get into using compression where it makes sense for you.


    -- Suggest data compression changes for tables and indexes

    -- Dr Greg Low

    -- March 2015

    DECLARE @ClusteredColumnstoreScansCutoff int = 95;
    DECLARE @ClusteredColumnstoreUpdatesCutoff int = 10;
    DECLARE @ClusteredColumnstoreSeeksLookupsCutoff int = 5;
    DECLARE @ClusteredColumnstoreTotalRowsCutoff bigint = 800000;

    DECLARE @PageCompressionScansCutoff int = 75;
    DECLARE @PageCompressionUpdatesCutoff int = 20;

    DECLARE @IsClusteredColumnstoreSupported bit = 1;

    WITH IndexUsageStats AS
    ( SELECT object_id AS ObjectID,
             index_id AS IndexID,
             COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0) AS Seeks,
             COALESCE(user_scans, 0) + COALESCE(system_scans, 0) AS Scans,
             COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0) AS Lookups,
             COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Updates,
             COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0)
               + COALESCE(user_scans, 0) + COALESCE(system_scans, 0)
               + COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0)
               + COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Operations
      FROM sys.dm_db_index_usage_stats
      WHERE database_id = DB_ID()
    ),
    PartitionUsageDetails AS
    ( SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,
             t.name AS TableName,
             i.name AS IndexName,
             i.index_id AS IndexID,
             i.type_desc AS IndexType,
             CASE WHEN COALESCE(Operations, 0) <> 0
                  THEN CAST((COALESCE(Seeks, 0) + COALESCE(Lookups, 0)) * 100.0
                            / COALESCE(Operations, 0) AS int)
                  ELSE 0
             END AS SeekLookupPercentage,
             CASE WHEN COALESCE(Operations, 0) <> 0
                  THEN CAST(COALESCE(Scans, 0) * 100.0 / COALESCE(Operations, 0) AS int)
                  ELSE 0
             END AS ScanPercentage,
             CASE WHEN COALESCE(Operations, 0) <> 0
                  THEN CAST(COALESCE(Updates, 0) * 100.0 / COALESCE(Operations, 0) AS int)
                  ELSE 0
             END AS UpdatePercentage,
             p.partition_number AS PartitionNumber,
             p.data_compression_desc AS CurrentCompression,
             p.rows AS TotalRows
      FROM sys.tables AS t
      INNER JOIN sys.indexes AS i
      ON t.object_id = i.object_id
      INNER JOIN sys.partitions AS p
      ON i.object_id = p.object_id
      AND i.index_id = p.index_id
      LEFT OUTER JOIN IndexUsageStats AS ius
      ON i.object_id = ius.ObjectID
      AND i.index_id = ius.IndexID
      WHERE i.index_id > 0
      AND t.is_ms_shipped = 0
      AND t.type = N'U'
    ),
    SuggestedPartitionCompressionTypes AS
    ( SELECT pud.*,
             CASE WHEN pud.ScanPercentage >= @ClusteredColumnstoreScansCutoff
                  AND pud.UpdatePercentage <= @ClusteredColumnstoreUpdatesCutoff
                  AND pud.SeekLookupPercentage <= @ClusteredColumnstoreSeeksLookupsCutoff
                  AND pud.TotalRows >= @ClusteredColumnstoreTotalRowsCutoff
                  AND @IsClusteredColumnstoreSupported <> 0
                  THEN N'CCI'
                  WHEN pud.ScanPercentage >= @PageCompressionScansCutoff
                  AND pud.UpdatePercentage <= @PageCompressionUpdatesCutoff
                  THEN N'PAGE'
                  ELSE N'ROW'
             END AS SuggestedCompression
      FROM PartitionUsageDetails AS pud
    )
    SELECT spct.SchemaName,
           spct.TableName,
           spct.IndexName,
           spct.PartitionNumber,
           spct.CurrentCompression,
           spct.SuggestedCompression
    FROM SuggestedPartitionCompressionTypes AS spct
    WHERE spct.SuggestedCompression <> spct.CurrentCompression
    ORDER BY spct.SchemaName,
             spct.TableName,
             CASE WHEN spct.IndexID = 1 THEN 0 ELSE 1 END,
             spct.IndexName,
             spct.PartitionNumber;

  • Rebuild Partitioned Clustered Columnstore Indexes

    A short time back, I provided an updated script that can be used to rebuild clustered columnstore indexes when necessary.

    One thing that wasn’t covered by that script was the ability to rebuild only individual index partitions. Below, I’ve provided an updated version of the script that will rebuild only affected partitions if the clustered columnstore index is partitioned.


    -- Rebuild clustered columnstore indexes when necessary

    -- Dr Greg Low v2.2

    -- rebuild when more than supplied % of rows have been deleted
    -- rebuild when any segments contain more than supplied % deleted rows
    -- rebuild if more than supplied number of segments are empty
    -- rebuild when segments are on average less than the supplied % of
    --     maximum capacity

    DECLARE @DeletedTotalRowPercentage int = 10;
    DECLARE @DeletedSegmentsRowPercentage int = 20;
    DECLARE @EmptySegmentsAllowed int = 0;
    DECLARE @SegmentPotentialFullnessPercentage int = 90;

    DECLARE @IndexesToRebuild TABLE (SchemaName sysname,
                                     TableName sysname,
                                     IndexName sysname,
                                     PartitionNumber int,
                                     IsPartitioned bit);

    WITH ClusteredColumnstoreIndexes AS
    ( SELECT t.object_id AS ObjectID,
             SCHEMA_NAME(t.schema_id) AS SchemaName,
             t.name AS TableName,
             i.name AS IndexName
      FROM sys.indexes AS i
      INNER JOIN sys.tables AS t
      ON i.object_id = t.object_id
      WHERE i.type = 5
    ),
    RowGroups AS
    ( SELECT csrg.object_id AS ObjectID,
             csrg.partition_number AS PartitionNumber,
             csrg.total_rows AS TotalRows,
             csrg.deleted_rows AS DeletedRows,
             csrg.deleted_rows * 100.0 / csrg.total_rows AS DeletedPercentage,
             CASE WHEN csrg.total_rows = csrg.deleted_rows
                  THEN 1 ELSE 0
             END AS IsEmptySegment
      FROM sys.column_store_row_groups AS csrg
      WHERE csrg.state = 3 -- Compressed (Ignoring: 0 - Hidden, 1 - Open, 2 - Closed, 4 - Tombstone)
    ),
    IndexPartitionStats AS
    ( SELECT cci.ObjectID,
             cci.SchemaName,
             cci.TableName,
             cci.IndexName,
             rg.PartitionNumber,
             SUM(CAST(rg.TotalRows AS decimal(18,0))) AS TotalRows,
             SUM(CAST(rg.DeletedRows AS decimal(18,0))) AS DeletedRows,
             SUM(CAST(rg.DeletedRows AS decimal(18,0))) * 100.0
               / SUM(CAST(rg.TotalRows AS decimal(18,0)))
               AS DeletedPercentage,
             SUM(rg.IsEmptySegment) AS EmptySegments,
             COUNT(rg.TotalRows) AS TotalSegments
      FROM ClusteredColumnstoreIndexes AS cci
      INNER JOIN RowGroups AS rg
      ON cci.ObjectID = rg.ObjectID
      GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName, rg.PartitionNumber
    )
    INSERT @IndexesToRebuild (SchemaName, TableName, IndexName, PartitionNumber, IsPartitioned)
    SELECT s.SchemaName,
           s.TableName,
           s.IndexName,
           s.PartitionNumber,
           CASE WHEN EXISTS (SELECT 1
                             FROM IndexPartitionStats AS ps
                             WHERE s.SchemaName = ps.SchemaName
                             AND s.TableName = ps.TableName
                             AND s.IndexName = ps.IndexName
                             AND ps.PartitionNumber > 1)
                THEN 1
                ELSE 0
           END
    FROM IndexPartitionStats AS s
    WHERE s.DeletedPercentage > @DeletedTotalRowPercentage
    OR s.EmptySegments > @EmptySegmentsAllowed
    OR EXISTS (SELECT 1 FROM RowGroups AS rg
               WHERE rg.ObjectID = s.ObjectID
               AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage)
    OR (s.TotalRows * 100 / (s.TotalSegments * 1048576.0))
                < @SegmentPotentialFullnessPercentage;

    DECLARE @SchemaName sysname;
    DECLARE @TableName sysname;
    DECLARE @IndexName sysname;
    DECLARE @PartitionNumber int;
    DECLARE @IsPartitioned bit;
    DECLARE @SqlCommand nvarchar(max);

    DECLARE IndexList CURSOR FAST_FORWARD
    FOR
      SELECT SchemaName, TableName, IndexName, PartitionNumber, IsPartitioned
      FROM @IndexesToRebuild
      ORDER BY SchemaName, TableName, IndexName, PartitionNumber;

    OPEN IndexList;

    FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName, @PartitionNumber, @IsPartitioned;

    WHILE @@FETCH_STATUS = 0
    BEGIN
      SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)
                      + N' ON ' + QUOTENAME(@SchemaName)
                      + N'.' + QUOTENAME(@TableName)
                      + N' REBUILD'
                      + CASE WHEN @IsPartitioned <> 0
                             THEN N' PARTITION = ' + CAST(@PartitionNumber AS nvarchar(10))
                             ELSE N''
                        END
                      + N';';
      PRINT @SqlCommand;

      FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName, @PartitionNumber, @IsPartitioned;
    END;

    CLOSE IndexList;
    DEALLOCATE IndexList;




