
Greg Low (The Bit Bucket: IDisposable)

Ramblings of Greg Low (SQL Server MVP, MCM and Microsoft RD) - SQL Down Under

  • More Useful MVA Training Options

    I find many of the MVA courses great for quickly getting across concepts or areas that I haven’t worked with before.

    This month, the local MVP program has asked me to highlight the following MVA courses. The first two in particular look useful for SQL folk. The third one provides good background:

    Azure IaaS Technical Fundamentals

    Deep Dive into Networking Storage and Disaster Recovery Scenarios

    Embracing Open Source Technologies to Accelerate and Scale Solutions

    If you watch them, let me know what you think.

  • Azure Machine Learning Course–First Melbourne Offering

    Hi Folks, we’ve been working hard on a new Azure Machine Learning course.

    Come and spend a solid day finding out why Azure Machine Learning should be part of your arsenal.

    Our first Melbourne offering of Azure Machine Learning Core Skills is on 31st July. I’d love to see you there:

    http://www.sqldownunder.com/Training/Courses/25

  • Free eBook on Azure Machine Learning

    Have you looked into Azure Machine Learning as yet?

    If not, this might help. It’s a free eBook on getting started with it. Click the book to follow the link:

    [image: eBook cover, linked to the download]

  • Out of Memory Errors in SSIS When Loading 32-bit DLLs

    Was speaking with a customer today about an issue where they were receiving “Out of Memory” exceptions when trying to load a 32-bit PostgreSQL ODBC driver from within an SSIS package.

    When the package was run from the command line using dtexec, all was fine, but the same package refused to run from within the SSIS Catalog. They had presumed it was some issue to do with 32-bit vs 64-bit drivers, and they resolved it by installing the latest 64-bit PostgreSQL ODBC drivers.

    However, it’s important to know that when you see an “Out of Memory” error on attempting to load a 32-bit DLL, it usually has nothing to do with memory at all.

    Under the covers, in 32-bit Windows, loading and calling a function in a DLL was performed by:

    1. Making an API call to LoadLibrary() – this brought the DLL into memory if it wasn’t already present

    2. Making an API call to GetProcAddress() – because the DLL could be located anywhere in memory, there was a need to locate the actual memory address of the procedure in the DLL in its loaded location

    3. Making a call to the address returned by the GetProcAddress() call.

    With my previous developer hat on, there are several places where I’ve seen this go wrong.

    One is that people don’t check the return value from GetProcAddress(). It can return null if the procedure isn’t found, so code that immediately calls the returned address without checking it for null would generate the infamous “this program has performed an illegal operation and will be shut down” message that we used to see.

    The less common problem was that LoadLibrary() had its own quirks. The worst was that if it could not locate the DLL, the error returned was “Out of Memory”. I always thought that was one of the silliest error messages ever to come out of Windows, but it’s entirely relevant here.

    When you see an “Out of Memory” error when attempting to load a 32-bit DLL, it’s time to check whether the DLL can be located by the process. The easiest option (although not the cleanest) is to make sure the DLL is somewhere the process will find it, such as the GAC (global assembly cache) for managed DLLs.
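
    You can see this behaviour for yourself with a minimal sketch that calls LoadLibrary() from PowerShell via P/Invoke. The DLL path and function name below are hypothetical; the point is the Win32 error code returned when the DLL can’t be found, and the null check after GetProcAddress():

    # A sketch only: surfacing the LoadLibrary()/GetProcAddress() failure modes
    $signature = '[DllImport("kernel32.dll", SetLastError = true)] public static extern IntPtr LoadLibrary(string lpFileName); [DllImport("kernel32.dll", SetLastError = true)] public static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);'
    Add-Type -MemberDefinition $signature -Name Kernel32 -Namespace Win32

    $handle = [Win32.Kernel32]::LoadLibrary('C:\NoSuchFolder\MissingDriver.dll')   # hypothetical path
    if ($handle -eq [IntPtr]::Zero)
    {
        # When the DLL can't be located, the Win32 error explains the failure,
        # even though the calling application may report it as "Out of Memory"
        $errorCode = [System.Runtime.InteropServices.Marshal]::GetLastWin32Error()
        Write-Output "LoadLibrary failed with Win32 error $errorCode"
    }
    else
    {
        # Always check the returned address for null before calling it
        $procAddress = [Win32.Kernel32]::GetProcAddress($handle, 'SomeFunction')   # hypothetical name
        if ($procAddress -eq [IntPtr]::Zero)
        {
            Write-Output 'Procedure not found in the DLL'
        }
    }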

  • Invoke-SqlCmd4 - A Superior SQL Commandlet

    Recently, I wrote about one of the issues with the Invoke-SqlCmd commandlet where it sets the ApplicationName when you use the parameter that should set the host.

    Fellow MVP Ben Miller sent me a copy of the Invoke-SqlCmd3 commandlet that he had been using. It was much better.

    I then realized that there were many other options missing from these commandlets (such as options for Database Mirroring and Availability Groups) and so I set about improving it.

    Ben has posted it to the PowerShell Code Repository. You’ll find Invoke-SqlCmd4 here:

    http://poshcode.org/5810

    This version of the commandlet will allow setting many other parameters (and it sets the ApplicationName and HostName correctly). The parameters are:

    param(
        [Parameter(Position = 0, Mandatory = $true)] [string]$ServerInstance,
        [Parameter(Position = 1, Mandatory = $false)] [string]$DatabaseName,
        [Parameter(Position = 2, Mandatory = $false)] [string]$Query,
        [Parameter(Position = 3, Mandatory = $false)] [string]$UserName,
        [Parameter(Position = 4, Mandatory = $false)] [string]$Password,
        [Parameter(Position = 5, Mandatory = $false)] [Int32]$QueryTimeout = 600,
        [Parameter(Position = 6, Mandatory = $false)] [Int32]$ConnectionTimeout = 30,
        [Parameter(Position = 7, Mandatory = $false)] [string]$ApplicationName = "PowerShell SQLCMD",
        [Parameter(Position = 8, Mandatory = $false)] [string]$HostName,
        [Parameter(Position = 9, Mandatory = $false)] [ValidateSet("ReadOnly", "ReadWrite")] [string]$ApplicationIntent,
        [Parameter(Position = 10, Mandatory = $false)] [ValidateScript({Test-Path $_})] [string]$InputFile,
        [Parameter(Position = 11, Mandatory = $false)] [ValidateSet("DataSet", "DataTable", "DataRow")] [string]$OutputAs = "DataRow",
        [Parameter(Position = 12, Mandatory = $false)] [string]$FailoverPartnerServerInstance,
        [Parameter(Position = 13, Mandatory = $false)] [bool]$IsMultiSubnetFailover = $false
    )
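
    For example, once the function has been dot-sourced into your session, a call like the following sets both names explicitly (the server, database, and application names here are just placeholders):

    . .\Invoke-SqlCmd4.ps1    # hypothetical path to wherever you saved the function
    Invoke-SqlCmd4 -ServerInstance 'MYSERVER\SQL2014' -DatabaseName 'master' `
                   -Query 'SELECT @@VERSION AS ServerVersion;' `
                   -ApplicationName 'MonitoringScript' -HostName $env:COMPUTERNAME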

    Hope that helps someone.

  • Optimizing Dynamic Search Screens – Fixing the Performance of Common Code Paths

    One of the common challenges that arises in many applications is around how to optimize the SQL Server performance of dynamic search queries. For example, we've all seen applications with screens like this:

    [image: example search screen with a long list of optional search criteria]

    Often, there will be many, many more fields to be searched. What I often hear from developers is that there is no way to optimize the query because you don't know in advance which options the user will choose. While this is partly true, it assumes that human behaviour is much more random than it really is. Let me explain:

    A few years back, I was doing performance tuning work at a company that books flights. They had a stored procedure that took a single parameter of a BookingNumber, and returned details of a customer's booking. It took around ½ second to execute.

    Eventually, one of the users said "I'd love to be able to look up the booking by LastName", so they changed the procedure to have an optional parameter to allow for that. Later, another user asked for EmailAddress. And so on, and so on. By the time I arrived, they had a 4,000-line procedure that allowed 14 optional parameters. Unfortunately, it now took 90 seconds to execute, every time.

    What was truly sad was that in a week of testing, I never saw any value passed except the BookingNumber. They had created a situation where all the users, all day long, were experiencing glacial response times, for a situation that almost never occurred.

    I see this sort of scenario played out time and again. When confronted with a screen that allows the users to pick and choose search criteria, developers always seem to assume that people will choose a wide variety of options. Yet that isn't what happens. Generally, there will be a small set of options that they will choose again and again, and only rarely will they make other choices.

    Knowing that, you realise that even though you can't optimize for every possible code path, it's critical that the common paths are optimized. If necessary, provide alternate code paths for the high volume query types.
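
    As a minimal sketch of that idea (assuming a hypothetical dbo.Bookings table, with BookingNumber the overwhelmingly common search), give the common path its own branch and cached plan, and let the rare combinations recompile:

    CREATE PROCEDURE dbo.SearchBookings
        @BookingNumber int = NULL,
        @LastName nvarchar(50) = NULL,
        @EmailAddress nvarchar(100) = NULL
    AS
    BEGIN
        IF @BookingNumber IS NOT NULL AND @LastName IS NULL AND @EmailAddress IS NULL
        BEGIN
            -- Common path: simple, sargable, and gets its own cached plan
            SELECT BookingNumber, LastName, EmailAddress
            FROM dbo.Bookings
            WHERE BookingNumber = @BookingNumber;
            RETURN;
        END;

        -- Rare paths: OPTION (RECOMPILE) lets the optimizer see the values
        -- actually supplied and prune the unused predicates each time
        SELECT BookingNumber, LastName, EmailAddress
        FROM dbo.Bookings
        WHERE (@BookingNumber IS NULL OR BookingNumber = @BookingNumber)
        AND (@LastName IS NULL OR LastName = @LastName)
        AND (@EmailAddress IS NULL OR EmailAddress = @EmailAddress)
        OPTION (RECOMPILE);
    END;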

    Finally, for the atypical choices made by users, what do you do? A Swedish friend of mine (and fellow SQL Server MVP) Erland Sommarskog has some awesome articles at his web site www.sommarskog.se. One in particular is on Dynamic Search Conditions. I recommend reading it if you have never done so.

  • Power BI Designer March Update includes Google Analytics Connector

    The Power BI team have released details of their March update to the standalone Power BI designer.

    You’ll find details of the update here: http://blogs.msdn.com/b/powerbi/archive/2015/03/26/7-new-updates-to-the-power-bi-designer-preview-march-2015.aspx

    The first thing I noticed is just how much faster the tool operates. The blog post mentioned performance enhancements but I really, really noticed them.

    One particular enhancement that I wanted to call out was the addition of a connector for Google Analytics. I’ve been trying it this morning and have found it really easy to use. All the standard categories appear as available sets of data:

    [image: Google Analytics categories shown as available data sets]

    I’ve picked those that I’m interested in:

    [image: the selected Google Analytics data sets]

    Then added them to a report:

    [image: the selected data added to a report]

     

    Very easy. And it makes clear which types of devices and operating systems we need to keep providing the best experience for at SQL Down Under.

  • Powershell Invoke-Sqlcmd –Hostname Parameter sets the Application Name instead of the Host Name

    Two of the parameters in SQL Server connections are the Application Name and the Host Name. You can see these for current connections in SQL Server by querying sys.dm_exec_sessions:

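    A sketch of the sort of query the original screenshot showed:

    -- Show how each current session identifies itself
    SELECT session_id, program_name, host_name
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1;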

    I’ve always been a fan of having applications identify themselves in their connection strings. It makes tasks like tracing much easier. The tools supplied with SQL Server do a reasonable job of that, but many other tools don’t.

    I was working at a site today where they are using Powershell to execute commands for monitoring. I noticed that the Powershell commands did not set the Application Name in the connection string when using Invoke-Sqlcmd. Note the following example:

    [image: Invoke-Sqlcmd session without a meaningful Application Name]

    I then tried to work out how to set the Application Name. When I checked the documentation for Invoke-Sqlcmd, it showed that the –Hostname parameter corresponds to the SQLCMD –H option, which is meant to set the workstation (host) name.

    [image: Invoke-Sqlcmd documentation for the –Hostname parameter]

    However, note that if you use the –Hostname option, it actually sets the Application Name and does not set the Host Name:

    [image: session details showing the –Hostname value appearing as the Application Name]
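
    You can reproduce this in a single call (a sketch, with a placeholder server name) by having the session report on itself:

    Invoke-Sqlcmd -ServerInstance 'MYSERVER' -HostName 'MyMonitoringHost' `
        -Query 'SELECT program_name, host_name FROM sys.dm_exec_sessions WHERE session_id = @@SPID;'
    # On the version I was using, this returns program_name = 'MyMonitoringHost',
    # while host_name still contains the actual machine name.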

     



  • Suggest Compression Strategies for Tables and Indexes

    I make no secret of the fact that I think that table compression was the best thing added to SQL Server 2008.

    It seems like every week I’m talking to clients about it. They have Enterprise Edition, but they weren’t using compression because they were familiar with O/S level disk compression and assumed that compressed would mean smaller but slower. That often couldn’t be further from the truth for customers whose databases are I/O bound, or who don’t have enough memory for the buffer cache to hold their working set.

    It’s important though, to make sure the correct form of compression is used for each table and index. In fact, the decision should be taken for each partition of each table and index if the table is partitioned. We often recommend different compression strategies for different partitions of the same table.

    To make it easier to decide which form of compression might suit, I’ve created another version of the script that I use as a first pass in determining compression strategies. The earlier version suggested ROW and/or PAGE compression. While this version can do the same, it also considers the clustered columnstore indexes that are part of the Enterprise Edition of SQL Server 2014. (I’ve included a switch to turn that function on or off in case you don’t have 2014 yet).

    The cutoff points are configurable but my default values are as shown.

    • CCI (Clustered Columnstore Index) will be recommended where the partition is scanned more than 95% of the time, updated less than 10% of the time, seeks and lookups are less than 5% of the time, and where there are at least 800,000 rows. It will also only be recommended if it is supported.
    • PAGE will be recommended where the partition is scanned more than 75% of the time and updated less than 20% of the time.
    • ROW will be recommended in all other cases. We believe that ROW should be the default in SQL Server across the board, instead of NONE.

    It is important that this script only be run after the system has been in use for long enough to have experienced typical usage patterns.

    I hope this helps you to get into using compression where it makes sense for you.

    -----------------------------------------------------------------------------

    -- Suggest data compression changes for tables and indexes

    --

    -- Dr Greg Low

    -- March 2015

    --

    -----------------------------------------------------------------------------

    DECLARE @ClusteredColumnstoreScansCutoff int = 95;

    DECLARE @ClusteredColumnstoreUpdatesCutoff int = 10;

    DECLARE @ClusteredColumnstoreSeeksLookupsCutoff int = 5;

    DECLARE @ClusteredColumnstoreTotalRowsCutoff bigint = 800000;

     

    DECLARE @PageCompressionScansCutoff int = 75;

    DECLARE @PageCompressionUpdatesCutoff int = 20;

     

    DECLARE @IsClusteredColumnstoreSupported bit = 1;

    -----------------------------------------------------------------------------

     

    WITH IndexUsageStats

    AS

    ( SELECT object_id AS ObjectID,

             index_id AS IndexID,

             COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0) AS Seeks,

             COALESCE(user_scans, 0) + COALESCE(system_scans, 0) AS Scans,

             COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0) AS Lookups,

             COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Updates,

             COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0)

               + COALESCE(user_scans, 0) + COALESCE(system_scans, 0)

               + COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0)

               + COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Operations

      FROM sys.dm_db_index_usage_stats

      WHERE database_id = DB_ID()

    ),

    PartitionUsageDetails

    AS

    ( SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,

             t.name AS TableName,

             i.name AS IndexName,

             i.index_id AS IndexID,

             i.type_desc AS IndexType,

             CASE WHEN COALESCE(Operations, 0) <> 0

                  THEN CAST((COALESCE(Seeks, 0) + COALESCE(Lookups, 0)) * 100.0

                            / COALESCE(Operations, 0) AS int)

                  ELSE 0

             END AS SeekLookupPercentage,

             CASE WHEN COALESCE(Operations, 0) <> 0

                  THEN CAST(COALESCE(Scans, 0) * 100.0 / COALESCE(Operations, 0) AS int)

                  ELSE 0

             END AS ScanPercentage,

             CASE WHEN COALESCE(Operations, 0) <> 0

                  THEN CAST(COALESCE(Updates, 0) * 100.0 / COALESCE(Operations, 0) AS int)

                  ELSE 0

             END AS UpdatePercentage,

             p.partition_number AS PartitionNumber,

             p.data_compression_desc AS CurrentCompression,

             p.rows AS TotalRows

       FROM sys.tables AS t

       INNER JOIN sys.indexes AS i

       ON t.object_id = i.object_id

       INNER JOIN sys.partitions AS p

       ON i.object_id = p.object_id

       AND i.index_id = p.index_id

       LEFT OUTER JOIN IndexUsageStats AS ius

       ON i.object_id = ius.ObjectID

       AND i.index_id = ius.IndexID

       WHERE i.index_id > 0

       AND t.is_ms_shipped = 0

       AND t.type = N'U'

    ),

    SuggestedPartitionCompressionTypes

    AS

    ( SELECT pud.*,

             CASE WHEN pud.ScanPercentage >= @ClusteredColumnstoreScansCutoff

                  AND pud.UpdatePercentage <= @ClusteredColumnstoreUpdatesCutoff

                  AND pud.SeekLookupPercentage <= @ClusteredColumnstoreSeeksLookupsCutoff

                  AND pud.TotalRows >= @ClusteredColumnstoreTotalRowsCutoff

                  AND @IsClusteredColumnstoreSupported <> 0

                  THEN N'CCI'

                  WHEN pud.ScanPercentage >= @PageCompressionScansCutoff

                  AND pud.UpdatePercentage <= @PageCompressionUpdatesCutoff

                  THEN N'PAGE'

                  ELSE N'ROW'

             END AS SuggestedCompression

      FROM PartitionUsageDetails AS pud

    )

    SELECT spct.SchemaName,

           spct.TableName,

           spct.IndexName,

           spct.PartitionNumber,

           spct.CurrentCompression,

           spct.SuggestedCompression

    FROM SuggestedPartitionCompressionTypes AS spct

    WHERE spct.SuggestedCompression <> spct.CurrentCompression

    ORDER BY spct.SchemaName,

             spct.TableName,

             CASE WHEN spct.IndexID = 1 THEN 0 ELSE 1 END,

             spct.IndexName;
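
    Once you have the output, acting on a suggestion is one statement per partition. For example (the table, index, and partition names here are hypothetical):

    -- Apply a PAGE suggestion to one partition of an index
    ALTER INDEX IX_Sales_OrderDate ON dbo.Sales
        REBUILD PARTITION = 3
        WITH (DATA_COMPRESSION = PAGE);

    -- Apply a CCI suggestion by converting the existing clustered index
    -- (the new index must keep the existing clustered index's name)
    CREATE CLUSTERED COLUMNSTORE INDEX CX_Sales ON dbo.Sales
        WITH (DROP_EXISTING = ON);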

     

  • Rebuild Partitioned Clustered Columnstore Indexes

    A short time back, I provided an updated script that can be used to rebuild clustered columnstore indexes when necessary.

    One thing that wasn’t covered by that script was the ability to rebuild only individual index partitions. Below, I’ve provided an updated version of the script that will rebuild only the affected partitions if the clustered columnstore index is partitioned.

    ----------------------------------------------------------------------

    -- Rebuild clustered columnstore indexes when necessary

    --

    -- Dr Greg Low v2.2

     

    ----------------------------------------------------------------------

    -- rebuild when more than supplied % of rows have been deleted

    -- rebuild when any segments contain more than supplied % deleted rows

    -- rebuild if more than supplied number of segments are empty

    -- rebuild when segments are on average less than the supplied % of

    --     maximum capacity

     

    DECLARE @DeletedTotalRowPercentage int = 10; 

    DECLARE @DeletedSegmentsRowPercentage int = 20;

    DECLARE @EmptySegmentsAllowed int = 0;

    DECLARE @SegmentPotentialFullnessPercentage int = 90;

    ----------------------------------------------------------------------

     

    DECLARE @IndexesToRebuild TABLE (SchemaName sysname,

                                     TableName sysname,

                                     IndexName sysname,

                                     PartitionNumber int,

                                     IsPartitioned bit);

     

    SET NOCOUNT ON;

     

    WITH ClusteredColumnstoreIndexes

    AS

    ( SELECT t.object_id AS ObjectID,

             SCHEMA_NAME(t.schema_id) AS SchemaName,

             t.name AS TableName,

             i.name AS IndexName

      FROM sys.indexes AS i

      INNER JOIN sys.tables AS t

      ON i.object_id = t.object_id

      WHERE i.type = 5

    ),

    RowGroups

    AS

    ( SELECT csrg.object_id AS ObjectID,

             csrg.partition_number AS PartitionNumber,

             csrg.total_rows AS TotalRows,

             csrg.deleted_rows AS DeletedRows,

             csrg.deleted_rows * 100.0 / csrg.total_rows AS DeletedPercentage,

             CASE WHEN csrg.total_rows = csrg.deleted_rows

                  THEN 1 ELSE 0

             END AS IsEmptySegment

      FROM sys.column_store_row_groups AS csrg

      WHERE csrg.state = 3 -- Compressed (Ignoring: 0 - Hidden, 1 - Open, 2 - Closed, 4 - Tombstone)

    ),

    IndexPartitionStats

    AS

    ( SELECT cci.ObjectID,

             cci.SchemaName,

             cci.TableName,

             cci.IndexName,

             rg.PartitionNumber,

             SUM(CAST(rg.TotalRows AS decimal(18,0))) AS TotalRows,

             SUM(CAST(rg.DeletedRows AS decimal(18,0))) AS DeletedRows,

             SUM(CAST(rg.DeletedRows AS decimal(18,0))) * 100.0

               / SUM(CAST(rg.TotalRows AS decimal(18,0)))

               AS DeletedPercentage,

             SUM(rg.IsEmptySegment) AS EmptySegments,

             COUNT(rg.TotalRows) AS TotalSegments

      FROM ClusteredColumnstoreIndexes AS cci

      INNER JOIN RowGroups AS rg

      ON cci.ObjectID = rg.ObjectID

      GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName, rg.PartitionNumber

    )

    INSERT @IndexesToRebuild (SchemaName, TableName, IndexName, PartitionNumber, IsPartitioned)

    SELECT s.SchemaName,

           s.TableName,

           s.IndexName,

           s.PartitionNumber,

           CASE WHEN EXISTS(SELECT 1

                            FROM IndexPartitionStats AS ps

                            WHERE s.SchemaName = ps.SchemaName

                            AND s.TableName = ps.TableName

                            AND s.IndexName = ps.IndexName

                            AND ps.PartitionNumber > 1)

                THEN 1

                ELSE 0

           END

    FROM IndexPartitionStats AS s

    WHERE s.DeletedPercentage > @DeletedTotalRowPercentage

    OR s.EmptySegments > @EmptySegmentsAllowed

    OR EXISTS(SELECT 1 FROM RowGroups AS rg

                       WHERE rg.ObjectID = s.ObjectID

                       AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage)

    OR (s.TotalRows * 100 / (s.TotalSegments * 1048576.0))

                < @SegmentPotentialFullnessPercentage;

     

    DECLARE @SchemaName sysname;

    DECLARE @TableName sysname;

    DECLARE @IndexName sysname;

    DECLARE @PartitionNumber int;

    DECLARE @IsPartitioned bit;

    DECLARE @SqlCommand nvarchar(max);

     

    DECLARE IndexList CURSOR FAST_FORWARD READ_ONLY

    FOR

      SELECT SchemaName, TableName, IndexName, PartitionNumber, IsPartitioned

      FROM @IndexesToRebuild

      ORDER BY SchemaName, TableName, IndexName, PartitionNumber;

     

    OPEN IndexList;

     

    FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName, @PartitionNumber, @IsPartitioned;

     

    WHILE @@FETCH_STATUS = 0

    BEGIN

      SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)

                      + N' ON ' + QUOTENAME(@SchemaName)

                      + N'.' + QUOTENAME(@TableName)

                      + N' REBUILD'

                      + CASE WHEN @IsPartitioned <> 0

                             THEN N' PARTITION = ' + CAST(@PartitionNumber AS nvarchar(10))

                             ELSE N''

                        END

                      + ';';

      PRINT @SqlCommand;

      EXEC(@SqlCommand);

      FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName, @PartitionNumber, @IsPartitioned;

    END;

     

    CLOSE IndexList;

    DEALLOCATE IndexList;
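
    Because the loop PRINTs each command before executing it, you can review (or capture) exactly what was run. The generated commands look like this (hypothetical table and index names):

    ALTER INDEX [CCI_SalesDetail] ON [dbo].[SalesDetail] REBUILD PARTITION = 2;
    ALTER INDEX [CCI_Customers] ON [dbo].[Customers] REBUILD;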

     

     

     

     

  • Lots of (Mostly Free) eBooks from Microsoft Virtual Academy

    I know that many of my readers appreciate finding out when free eBooks become available.

    Microsoft Virtual Academy has been releasing quite a few lately. You’ll find them here: http://www.microsoftvirtualacademy.com/eBooks?PRID=MVP-AU-7914

    In particular, SQL Server and database folk might want to take a look at the following:

    • Introducing Microsoft SQL Server 2014
    • Introducing Windows Server 2012 R2
    • Introducing Microsoft Azure HDInsight
    • Microsoft Azure Essentials: Fundamentals of Azure
    • Introducing Windows Azure for IT Professionals
    • Rethinking Enterprise Storage: A Hybrid Cloud Model
  • Interested in Winning a Trip to the Ignite Conference?

    This is the first year for Microsoft’s new Ignite conference. Many people are keen to go but figure it would break the bank.


    Here’s another option:

    The Microsoft Virtual Academy has its Heroes program running again. You get one entry in the draw just for signing up. Even better, take some courses: you’ll learn something worthwhile and increase your chances of winning the competition.


    Sign up and enter here now: SIGN UP (note: Australians only for this one, sadly)

  • Update on Options for Learning Chinese: eChineseLearning

    I seem to have quite a crowd of people who ask me, whenever they see me, how I’m going with learning Chinese. I promised that I would put together some posts on what has and hasn’t worked for me.

    Over the last few years, I’ve spent quite a bit of time trying to get my Mandarin fluency up. I finished the five levels of Rosetta Stone, which helped a lot with vocab, but I really don’t like what they’ve done with their online “Rosetta Studio” options: they aren’t good value any more, and you don’t get enough practice for them to be worthwhile. What I wanted was to build a better vocab and to get much more practice in conversation, as that’s where the real learning happens.

    I’ve tried a wide variety of options. My current top recommendation is:

    eChineseLearning – I have no idea where I first came across them (probably an online search), but this has been wonderful. It provides me with a structured, one on one conversational learning environment. Each week, I spend three separate sessions of just under an hour, one on one with a teacher from Wuhan (in central China). They have a booking site where available timeslots are shown, both for my primary teacher (Amy) and for alternates if I need a class at a time when Amy isn’t available. I was quite wary about signing up for a service like this, but I’m so happy I did. Amy is both interesting and interested in my learning, and provides a good structured approach to working towards proficiency.

    I think this is the quickest way to a good level of proficiency. In each lesson, we spend some time in general conversation, then a while working on new words, then some time on grammatical issues, and finally some time on listening practice. I like the way it is organised, and I like the flexibility of the booking. Most other sites I looked at required you to commit to the same time each week; with my schedule, that wouldn’t work. eChineseLearning allows me to pick times each week that work for me, so if I’m travelling or have family engagements, I can just work around them. You can change a booking up to 24 hours before each session.

    I could imagine this site being good for anyone who wants to start learning the language (i.e., a beginner), or who is keen to become more proficient.

    Like all these types of sites, they love referrals. They’d love you to tell them that you heard of them from me. If you do that, both you and I get free extra time. But that’s up to you.

    Either way, if you’ve wondered about learning another language like Mandarin or you have already tried and stalled, you might want to get a feel for it. They have a free trial lesson. You can sign up here: http://www.echineselearning.com/free-trial/index.html 

    Highly recommended!

  • Clustered columnstore index rebuild script–updated

    Recently I published a series of blog posts that provided scripts for rebuilding clustered columnstore indexes.

    Niko Neugebauer sent me a suggested update to the latest version of those scripts, and it makes sense. The change helps to filter out row groups that are not compressed. Niko also noted that as Microsoft has started to include more objects (like Tombstone values) in sys.column_store_row_groups in Azure SQL Database, this should help to prevent future SQL Server versions having problems with the scripts.

    Sounds good, so the latest version is shown below. Thanks Niko!

    ----------------------------------------------------------------------

    -- Rebuild clustered columnstore indexes when necessary

    -- Dr Greg Low v2.1

     

    ----------------------------------------------------------------------

    -- rebuild when more than supplied % of rows have been deleted

    -- rebuild when any segments contain more than supplied % deleted rows

    -- rebuild if more than supplied number of segments are empty

    -- rebuild when segments are on average less than the supplied % of

    --     maximum capacity

     

    DECLARE @DeletedTotalRowPercentage int = 10;  

    DECLARE @DeletedSegmentsRowPercentage int = 20;

    DECLARE @EmptySegmentsAllowed int = 0;

    DECLARE @SegmentPotentialFullnessPercentage int = 90;

    ----------------------------------------------------------------------

     

    DECLARE @IndexesToRebuild TABLE (SchemaName sysname,

                                     TableName sysname,

                                     IndexName sysname);

     

    WITH ClusteredColumnstoreIndexes

    AS

    ( SELECT t.object_id AS ObjectID,

             SCHEMA_NAME(t.schema_id) AS SchemaName,

             t.name AS TableName,

             i.name AS IndexName

      FROM sys.indexes AS i

      INNER JOIN sys.tables AS t

      ON i.object_id = t.object_id

      WHERE i.type = 5

    ),

    RowGroups

    AS

    ( SELECT csrg.object_id AS ObjectID,

             csrg.total_rows AS TotalRows,

             csrg.deleted_rows AS DeletedRows,

             csrg.deleted_rows * 100.0 / csrg.total_rows AS DeletedPercentage,

             CASE WHEN csrg.total_rows = csrg.deleted_rows

                  THEN 1 ELSE 0

             END AS IsEmptySegment

      FROM sys.column_store_row_groups AS csrg

      WHERE csrg.state = 3 -- Compressed (Ignoring: 0 - Hidden, 1 - Open, 2 - Closed, 4 - Tombstone)

    ),

    IndexStats

    AS

    ( SELECT cci.ObjectID,

             cci.SchemaName,

             cci.TableName,

             cci.IndexName,

             SUM(CAST(rg.TotalRows AS decimal(18,0))) AS TotalRows,

             SUM(CAST(rg.DeletedRows AS decimal(18,0))) AS DeletedRows,

             SUM(CAST(rg.DeletedRows AS decimal(18,0))) * 100.0

               / SUM(CAST(rg.TotalRows AS decimal(18,0)))

               AS DeletedPercentage,

             SUM(rg.IsEmptySegment) AS EmptySegments,

             COUNT(rg.TotalRows) AS TotalSegments

      FROM ClusteredColumnstoreIndexes AS cci

      INNER JOIN RowGroups AS rg

      ON cci.ObjectID = rg.ObjectID

      GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName

    )

    INSERT @IndexesToRebuild (SchemaName, TableName, IndexName)

    SELECT s.SchemaName, s.TableName, s.IndexName

    FROM IndexStats AS s

    WHERE s.DeletedPercentage > @DeletedTotalRowPercentage

    OR s.EmptySegments > @EmptySegmentsAllowed

    OR EXISTS(SELECT 1 FROM RowGroups AS rg

                       WHERE rg.ObjectID = s.ObjectID

                       AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage)

    OR (s.TotalRows * 100 / (s.TotalSegments * 1048576.0))

                < @SegmentPotentialFullnessPercentage;

     

    DECLARE @SchemaName sysname;

    DECLARE @TableName sysname;

    DECLARE @IndexName sysname;

    DECLARE @SqlCommand nvarchar(max);

     

    DECLARE IndexList CURSOR FAST_FORWARD READ_ONLY

    FOR

      SELECT SchemaName, TableName, IndexName

      FROM @IndexesToRebuild

      ORDER BY SchemaName, TableName, IndexName;

     

    OPEN IndexList;

     

    FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;

     

    WHILE @@FETCH_STATUS = 0

    BEGIN

      SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)

                      + N' ON ' + QUOTENAME(@SchemaName)

                      + N'.' + QUOTENAME(@TableName)

                      + N' REBUILD PARTITION = ALL;';

      PRINT @SqlCommand;

      EXEC(@SqlCommand);

      FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;

    END;

     

    CLOSE IndexList;

    DEALLOCATE IndexList;

     

  • PASS BA Analytics Conference for 2015

    I’m always telling people that the thing I love about BI projects is that they tend to appeal to the people who pay the bills. This is a good place to be doing project work.

    When I look at large companies like Amazon, I can imagine that the people who do the IT work relating to order processing, invoicing, shipping, and so on do a great job. But I’ll bet their lives are full of budget cutbacks, headcount pressures, and constant demands to improve productivity. By comparison, I’ll bet the projects that fund the parts of the site that make recommendations for possible purchases are funded at an entirely different level, because those projects more directly impact the profitability of the organization, and do so in a more visible way.

    When you stop and look at your career, you might want to consider whether you should be trying to move up the value stack within your organization.

    A great place to get ideas on this would be to attend the PASS BA Analytics Conference. This relatively new conference has a very different focus to the “standard” PASS Summit. If analytics interests you at all, this would be the place to be in April.

    And if you provide the registration code UKBA05, you’ll get a $150 discount as well.
