SQLblog.com - The SQL Server blog spot on the web

Greg Low (The Bit Bucket: IDisposable)

Ramblings of Greg Low (SQL Server MVP, MCM and Microsoft RD) - SQL Down Under

  • Data Tales #10: A Matter of Existence

    Hi Folks,

    I’ve been continuing with my Data Tales series up at sqlmag.com.

This time I’ve written about a pet hate: seeing code that counts rows when all it needs to do is check for existence. However, it’s not as bad as it seems, because the optimizer does a pretty good job of protecting us from ourselves. (At least the SQL Server optimizer does.)
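As a quick sketch of the pattern (dbo.Orders and its CustomerID column here are made-up names for illustration), the two approaches look like this. The optimizer will often treat the counting form as an existence check anyway, but the EXISTS form states the actual intent:

DECLARE @CustomerID int = 42; -- example value only

-- Counting rows just to test for existence: asks for more work than the question needs
IF (SELECT COUNT(*) FROM dbo.Orders WHERE CustomerID = @CustomerID) > 0
BEGIN
    PRINT 'Customer has orders';
END;

-- Checking for existence directly: states the actual question
IF EXISTS (SELECT 1 FROM dbo.Orders WHERE CustomerID = @CustomerID)
BEGIN
    PRINT 'Customer has orders';
END;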

    It’s here: http://sqlmag.com/sql-server/data-tales-10-it-s-matter-existence

    Enjoy !

  • Power Query Webinar this week

    Hi Folks,

    That Power Query webinar that I’m doing for the global Power BI team is on this week.

    Love to see you attend.

    Registration is here: https://info.microsoft.com/CO-PowerBI-WBNR-FY17-08Aug-25-PowerQuery-PowerBI-Registration.html

  • Database on a diet–final section (part 4)

    Hi Folks,

    I’ve been writing articles for SQL Server Magazine as part of my Data Tales series.

Over the last couple of months, I’ve been writing a mini-series about a database on a diet.

    Earlier parts described the need for size reduction, how ROW and PAGE compression come into play, how to choose between them, etc.

    Part 4 (the final part) is now online here:

    http://sqlmag.com/sql-server/data-tales-9-case-database-diet-part-4

It covers GZIP compression of large string fields, PDF size reduction in BLOBs, and more.
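The article has the details; as a small taste of the GZIP angle, SQL Server 2016 exposes the GZIP algorithm through the built-in COMPRESS and DECOMPRESS functions. A minimal sketch (the string is just sample data):

-- Build a long string (cast so REPLICATE isn't truncated at 4,000 characters)
DECLARE @Notes nvarchar(max) = REPLICATE(CAST(N'SQL Server 2016 ' AS nvarchar(max)), 1000);

-- COMPRESS applies GZIP and returns varbinary(max)
DECLARE @CompressedNotes varbinary(max) = COMPRESS(@Notes);

SELECT DATALENGTH(@Notes) AS OriginalBytes,
       DATALENGTH(@CompressedNotes) AS CompressedBytes,
       CAST(DECOMPRESS(@CompressedNotes) AS nvarchar(max)) AS RoundTrippedValue;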

    Enjoy !

  • Power Query Webinar

    Hi Folks,

    Want to get your head around Power Query in both Excel and Power BI Desktop?

    I've got a webinar happening for the Power BI team later this month: https://info.microsoft.com/CO-PowerBI-WBNR-FY17-08Aug-25-PowerQuery-PowerBI-Registration.html

  • SQL Server index updates vs PostgreSQL index updates

    The link below is to quite a fascinating article that on the surface is about why Uber moved to MySQL from PostgreSQL.

What it’s really about, though, is a severe limitation in how PostgreSQL indexes are updated.

With SQL Server, if you update a few columns in a row, only the indexes that include those columns need to be updated. In PostgreSQL, all of the table's indexes still need to be updated. I understand why they've done it that way, but it's not a good outcome.

    SQL Server shines in this regard.
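A rough way to see this behaviour for yourself (the table, index, and column names below are hypothetical) is to update a column that isn't part of any nonclustered index and then look at sys.dm_db_index_operational_stats; only the indexes that contain the updated column should show leaf-level update activity:

-- Hypothetical table with one nonclustered index that does NOT include the Status column
CREATE TABLE dbo.Orders
(
    OrderID int IDENTITY PRIMARY KEY,
    CustomerID int NOT NULL,
    Status tinyint NOT NULL
);
CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);

INSERT dbo.Orders (CustomerID, Status) VALUES (1, 0), (2, 0);

-- Update a column that appears only in the clustered index
UPDATE dbo.Orders SET Status = 1 WHERE OrderID = 1;

-- Only the clustered index should show a leaf-level update
SELECT i.name AS IndexName, os.leaf_update_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL) AS os
INNER JOIN sys.indexes AS i
ON i.object_id = os.object_id
AND i.index_id = os.index_id;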

    http://use-the-index-luke.com/blog/2016-07-29/on-ubers-choice-of-databases

  • Happy Birthday Power BI from all of us

    It’s hard to believe that it’s only a year, given how far Power BI has come.

    Many of us in the community got together under the lead of Paul Turley and Adam Saxton and created a short video for Power BI’s birthday.

    Looks good:

     


    https://youtu.be/An_8wf4RwBw

So to James Phillips and all of the Power BI team: happy birthday from Down Under!

     

(Good to see some locals in there too!)

  • SQL Server 2016 Polybase Services stuck in Change Pending state

    I’ve been using the Polybase services in SQL Server 2016 all throughout the pre-release period without any issue.

When I tried using them in the GA (general availability) release, I could never get the two Polybase services to start. They were always stuck in a “Change Pending” state, and trying to start them in SQL Server Configuration Manager made SSCM look like it had hung.

    Trying to use the Polybase services to create an external data source returned:

    TCP Provider: No connection could be made because the target machine actively refused it.
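For context, this was with an ordinary external data source definition along the lines of the following (the storage account and container names are placeholders), so nothing unusual in the DDL itself:

-- Example only: the sort of PolyBase statement that was returning the TCP error
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH
(
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net'
);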


    Fellow MVP John Paul Cook came to the rescue today. He previously had an issue with TCP Chimney support and noted that it looked similar.

True enough, the problem was that TCP wasn’t an enabled protocol on my SQL Server installation. Enabling TCP and restarting the services made everything good again.

    Thanks John !

  • Search for a String in all Tables of a Database

When I work at customer sites, there is quite a difference in how different sites get data for testing. But it basically comes down to a variation on one of these five:

    • Testing? That’s for wimps. (Well perhaps for users)
    • I type in all the test data that I need.
    • We use a data generator to produce our test data.
    • We just periodically get a copy of production data.
    • We get production data and it gets masked before we can use it.

It’s the last option that I’m concerned with today. Masking data is challenging, and usually much more difficult than imagined. This week, I was working with a masked database and we ended up finding all of the client details tucked inside a chunk of XML that was used to hold the metadata for a series of images. They had done a good job of removing the details from all the normal columns in all the tables in the database, but it’s so easy to miss extras like this.

    So after we fixed the issue, I was left wondering how effectively it really had all been masked.

The following script takes a while to run on a large database. It takes a string to search for, and finds any locations in the database that hold that string. Optionally, it also outputs a copy of every row containing the string. It covers the char, nchar, varchar, nvarchar (including the max types), text, ntext, and xml data types.

    I hope someone finds it useful.

-- Search for a string anywhere in a database
-- v1.0 Dr Greg Low, 11 June 2016

DECLARE @StringToSearchFor nvarchar(max) = N'Jones';
DECLARE @IncludeActualRows bit = 1;

DECLARE @SchemaName sysname;
DECLARE @TableName sysname;
DECLARE @ColumnName sysname;
DECLARE @IsNullable bit;
DECLARE @TableObjectID int;
DECLARE @Message nvarchar(max);
DECLARE @FullTableName nvarchar(max);
DECLARE @BaseDataTypeName sysname;
DECLARE @WereStringColumnsFound bit;
DECLARE @Predicate nvarchar(max);
DECLARE @SQL nvarchar(max);
DECLARE @SummarySQL nvarchar(max) = N'';
DECLARE @NumberOfTables int;
DECLARE @TableCounter int = 0;

IF OBJECT_ID(N'tempdb..#FoundLocations') IS NOT NULL
BEGIN
    DROP TABLE #FoundLocations;
END;

CREATE TABLE #FoundLocations
(
    FullTableName nvarchar(max),
    NumberOfRows bigint
);

SET @NumberOfTables = (SELECT COUNT(*) FROM sys.tables AS t
                       WHERE t.is_ms_shipped = 0
                       AND t.type = N'U');

DECLARE TableList CURSOR FAST_FORWARD READ_ONLY
FOR
SELECT SCHEMA_NAME(schema_id) AS SchemaName, name AS TableName, object_id AS TableObjectID
FROM sys.tables AS t
WHERE t.is_ms_shipped = 0
AND t.type = N'U'
ORDER BY SchemaName, TableName;

OPEN TableList;
FETCH NEXT FROM TableList INTO @SchemaName, @TableName, @TableObjectID;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @TableCounter += 1;
    SET @FullTableName = QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName);
    SET @Message = N'Checking table '
                 + CAST(@TableCounter AS nvarchar(20))
                 + N' of '
                 + CAST(@NumberOfTables AS nvarchar(20))
                 + N': '
                 + @FullTableName;
    PRINT @Message;

    SET @WereStringColumnsFound = 0;
    SET @Predicate = N'';

    -- Build a LIKE predicate covering every string (or xml) column in the current table
    DECLARE ColumnList CURSOR FAST_FORWARD READ_ONLY
    FOR
    SELECT c.name AS ColumnName, t.name AS BaseDataTypeName
    FROM sys.columns AS c
    INNER JOIN sys.types AS t
    ON t.system_type_id = c.system_type_id
    AND t.user_type_id = c.system_type_id -- note: want the base type not the actual type
    WHERE c.object_id = @TableObjectID
    AND t.name IN (N'text', N'ntext', N'varchar', N'nvarchar', N'char', N'nchar', N'xml')
    AND (c.max_length >= LEN(@StringToSearchFor) OR c.max_length < 0) -- allow for max types
    ORDER BY ColumnName;

    OPEN ColumnList;
    FETCH NEXT FROM ColumnList INTO @ColumnName, @BaseDataTypeName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @WereStringColumnsFound = 1;
        IF @Predicate <> N''
        BEGIN
            SET @Predicate += N' OR ';
        END;
        SET @Predicate += CASE WHEN @BaseDataTypeName = N'xml'
                               THEN N'CAST(' + QUOTENAME(@ColumnName) + N' AS nvarchar(max))'
                               ELSE QUOTENAME(@ColumnName)
                          END
                        + N' LIKE N''%' + @StringToSearchFor + N'%''';
        FETCH NEXT FROM ColumnList INTO @ColumnName, @BaseDataTypeName;
    END;

    CLOSE ColumnList;
    DEALLOCATE ColumnList;

    IF @WereStringColumnsFound <> 0
    BEGIN
        -- Count the matching rows in this table and record the result
        SET @SQL = N'SET NOCOUNT ON;
                     INSERT #FoundLocations (FullTableName, NumberOfRows)
                     SELECT N''' + @FullTableName + N''', COUNT_BIG(*) FROM '
                 + @FullTableName
                 + N' WHERE '
                 + @Predicate
                 + N';';
        EXECUTE (@SQL);

        IF (SELECT NumberOfRows FROM #FoundLocations WHERE FullTableName = @FullTableName) > 0
        BEGIN
            -- Remember a query that returns the actual matching rows for later
            SET @SummarySQL += N'SELECT * FROM ' + @FullTableName + N' WHERE ' + @Predicate + N';' + NCHAR(13) + NCHAR(10);
        END;
    END;

    FETCH NEXT FROM TableList INTO @SchemaName, @TableName, @TableObjectID;
END;

CLOSE TableList;
DEALLOCATE TableList;

SELECT *
FROM #FoundLocations
WHERE NumberOfRows > 0
ORDER BY FullTableName;

DROP TABLE #FoundLocations;

IF @SummarySQL <> N'' AND @IncludeActualRows <> 0
BEGIN
    EXECUTE (@SummarySQL);
END;

     

  • Auto-Injection of Enterprise Edition Features

    There’s an interesting scenario that’s come up in creating some new samples for SQL Server 2016.

    I know that for many, many years people have asked for a way to limit developer edition to only use standard edition features, or at least to have a way to flag them. I’ve often thought that I’d like a way to set a “target environment” and have the colorization change anything that I can’t use in standard to an obviously different color.

However, previously, if you used developer edition to create a database, then as long as you didn’t use any enterprise features, you could back up that database and restore it to a standard edition server.

    That’s no longer the case with SQL Server 2016.

If you create a temporal table in a database on developer edition, it automatically chooses PAGE compression for the history table, and there is no DDL option to avoid this. The database can then no longer be restored on a standard edition server. To work around it, you need to rebuild the history table without compression after it’s created.
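As a sketch of the behaviour and the workaround (the table names are mine, and depending on the build you may need to toggle SYSTEM_VERSIONING off before the rebuild), it looks something like this:

-- Create a system-versioned (temporal) table; the history table is created automatically
CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL CONSTRAINT PK_Customers PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomersHistory));

-- Check how the history table has been compressed
SELECT OBJECT_NAME(p.object_id) AS TableName, p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.CustomersHistory');

-- Workaround before restoring to standard edition: rebuild the history table without compression
-- (if your build blocks DDL on the history table, set SYSTEM_VERSIONING = OFF first, then back ON)
ALTER TABLE dbo.CustomersHistory REBUILD WITH (DATA_COMPRESSION = NONE);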

I see this as quite a change in behavior. I don’t think that features that are only available in enterprise (or perhaps Azure) editions should be “auto-injected”.

Clearly, if you script out the database and create a new database on standard edition using the script, the history table will be created without compression. But I suspect that many people create starter databases using developer edition and then deploy them to other editions that don’t support compression.

    I’d be interested in thoughts on how common this practice currently is.

  • Should my database have foreign key constraints?

    There is an endless discussion in the development community about whether databases should include declared foreign key constraints or not.

    As a consultant, I regularly visit a wide variety of client sites. Many of these have applications designed without constraints. When I ask why there are no constraints, the response is invariably one of the following:

    • The app ensures that it’s right
    • They don’t work well with our application development
    • They are too slow
    • What’s a foreign key constraint?

    Most of the sites that I work with have sizeable databases. This discussion is important in those situations.

    The app ensures that it’s right

In most enterprises, large databases end up being used by more than a single application, and each application will often be made up of many sub-applications or modules. That means that every place that accesses the data needs to apply the same rules. (The same issue applies to other forms of constraints as well.)

    Worse, the databases often end up being accessed by applications from different authors, and often with different technology stacks.

    ETL processes are often used to move data into the databases or to update that data.

    Even worse, in real production scenarios, data-fixes often get applied directly to the database.

    A view of the world that says that it will be ok because everything goes through a single application is a very narrow view of the world.

    They don’t work well with our application development

    What this really boils down to in most cases is that the developers didn’t want to work out the order for updates and it’s just easier for them without constraints as they can update any table they want, in any order.

That means that the data is periodically in invalid states and is, at best, eventually consistent. What happens with concurrent access during one of those invalid interim states? What happens to a reporting application that finds invoices for customers that don’t exist?

    It would be helpful if SQL Server supported deferred constraint checking but today it doesn’t. I’ve been formally asking them for it for over 10 years: https://connect.microsoft.com/SQLServer/feedback/details/124728/option-to-defer-foreign-key-constraint-checking-until-transaction-commit

    I still think it’s one of the most important enhancements that could be made to SQL Server from a development point of view.

    They are too slow

    So often I’m told “we can’t do it because it would be too slow”. Yet almost every time I ask if they’ve actually tested it, I’m told “no”.

    The reality is usually that someone’s brother’s cousin’s friend read it somewhere on the Internet so they decided it would be a problem.

    Whenever I really test it, I find very little impact as long as appropriate indexing is in place, and sensible options are chosen when bulk importing data.

I do occasionally find specific constraints that I decide to disable, but they are few and far between. And even then, I like to see the constraint still in place (so that it can be discovered by tools, etc.) but just not checked. It certainly never applies to all of the constraints in a database.
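Disabling rather than dropping is straightforward (the table and constraint names below are hypothetical). The constraint stays in the catalog where tools can still discover it, but SQL Server stops checking it on data changes:

-- Disable (but keep) a foreign key constraint
ALTER TABLE dbo.Invoices NOCHECK CONSTRAINT FK_Invoices_Customers;

-- Later, re-enable it and validate existing rows so the constraint is trusted again
ALTER TABLE dbo.Invoices WITH CHECK CHECK CONSTRAINT FK_Invoices_Customers;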

    What’s a foreign key constraint?

Sadly, this is also a common question, though fortunately mostly on smaller databases and applications. Some developers really just don’t understand the issue, but they are not the majority.

    Bottom Line

    When I do a detailed check on a system that has run without constraints for quite a while, I almost always find data integrity issues.

When I show the integrity issues to the customer, they look surprised, then say something like “ah, that’s right, we had that bug a few weeks back”. Pretty much every time, there are some issues somewhere.

    That’s ok if you’re building a toy application; not ok if you’re building a large financial application.

    At the very least, sites that decide not to have constraints should have a process in place that periodically checks data integrity. Most sites that don’t have constraints don’t do this either.
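A periodic integrity check doesn’t have to be elaborate. A query like the following (with hypothetical table and column names) finds child rows whose parent no longer exists, which is exactly the sort of problem a foreign key constraint would have caught immediately:

-- Find invoices that reference customers that don't exist
SELECT i.InvoiceID, i.CustomerID
FROM dbo.Invoices AS i
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Customers AS c
    WHERE c.CustomerID = i.CustomerID
);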

    It’s important to find issues as soon as they occur. Finding issues well after they have been created always makes them much harder to fix. At least with constraints, you get instant feedback that something’s wrong. If you have a bug in your ETL process that is messing up your data in some subtle way, you don’t want to find that out several weeks later after other processing has occurred on that data.

    Summary

    I like to see constraints in place until it’s proven that a particular constraint can’t be in place for a performance reason. Then, I like to see it (and only it) disabled rather than removed. A bonus is that it can usually still be discovered by tools.

  • Data Tales #8: Database on a Diet (Part 3)

    Hi Folks,

My series of articles for SQL Server Magazine continues, as does the short series within it about a large database that needed to go on a diet. Last time, I looked at the internals of ROW and PAGE compression to see what happens when they are used, and we saw the significant differences in how the two are implemented. So how do you decide which to use?

    This time, we look at when ROW and PAGE compression make sense, and provide detailed guidance on how to decide which should be used for which tables and indexes, or even for which partitions of which tables and indexes. A blended approach is usually the appropriate outcome.
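The article covers the decision process in detail; one input that often helps is SQL Server's built-in estimator, which samples a table and projects its size under a given compression setting (the schema and table names below are placeholders):

-- Estimate the space saving for ROW vs PAGE compression on one table
EXEC sys.sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'Transactions',
    @index_id = NULL,          -- all indexes
    @partition_number = NULL,  -- all partitions
    @data_compression = N'ROW';

EXEC sys.sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'Transactions',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';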

    http://sqlmag.com/sql-server/data-tales-8-case-database-diet-part-3

    Enjoy!

  • Avoiding drives going offline when adding nodes to availability groups or creating / validating Windows clusters

Several of my customers have mentioned lately that when they've added a node to a Windows failover cluster to support an Availability Group, they've ended up with an outage because the non-shared drives were taken offline by the installation process. All commented that it didn't use to happen on Windows Server 2008 R2, and they were taken by surprise when it happened on Windows Server 2012 R2.

I mentioned it on the MVP list, and one of my US buddies, Allan Hirt, said that we need to uncheck the relevant checkbox during the install.

I stepped through another install today and found the culprit: the default is now to add all eligible storage to the cluster.

That is perhaps a reasonable default when setting up most Windows failover clusters, but it is really not a sensible default when working with Availability Groups. It ends up adding all your other drives to the cluster and then taking them offline.

    Unchecking that box while performing installs/validations should avoid the issue.

    Thanks to Allan for pointing it out.

  • SQL Down Under Podcast 68–Guest Joe Yong–SQL Server 2016 StretchDB

    Hi Folks,

    One of the intriguing options that’s coming as part of SQL Server 2016 is StretchDB. I’ve been spending some time working with it and learning about it, and I’ve ended up with lots of questions.

    It was great to get an opportunity to speak to Joe Yong to get all my questions sorted.

    Hope you’ll enjoy it too. You’ll find it here: http://sqldownunder.azurewebsites.net/Podcasts

  • SQL Down Under podcast Show 67–Kasper DeJonge and SQL Server 2016 Analysis Services

    Last week I had an opportunity to record a new podcast with Kasper DeJonge from the SQL Server Analysis Services team. Kasper has also been working on Power Pivot and many other areas.

    SQL Server 2016 is lining up to be a great release for Analysis Services with significant enhancements for both tabular and multi-dimensional data models.

    I hope you enjoy the show. You’ll find it here:

    http://www.sqldownunder.com/podcasts

  • Very pleased to see free developer edition of SQL Server

I spend a lot of my time working with ISVs (or software houses) and with individual developers. For SQL Server to have a long-term future, we need to appeal to more and more new developers, and get them to understand that SQL Server is a great platform to build their applications on.

One of the most important aspects of this, right up front, is making it really frictionless to get started with SQL Server.

    In the past, it has at times been very difficult to purchase SQL Server Developer Edition. To me, that’s been ridiculous if we want to attract new developers to the platform. I have a friend who, a year or so back, wanted to get Developer Edition and spent weeks (literally) trying to find out how to buy it. In the USA, it was available in the Microsoft store. You could pay $40 and download it. If you had an MSDN subscription, it was also just available for download. But for some reason that I’ll never fathom, it wasn’t available in the Australian Microsoft store and you had to purchase it from a license vendor instead.

Because Developer Edition was a low-cost item, the license vendors weren’t interested in it. Even when I spoke to the local Microsoft sales subsidiary, after two weeks I still couldn’t find out how to buy it. The local sales team seemed to be goaled mostly on Enterprise Edition sales, so a Developer Edition license didn’t even feature on their radar.

    Sadly, by comparison, my friend downloaded PostgreSQL and was using it 15 minutes later. This situation could not continue.

    The Visual Studio team had been rightly proud of their Visual Studio Community Edition and how well it was going. It was just a free download.

    I was a very vocal member of a crowd asking “Where is the SQL Server Community Edition?”.

    Well this week, our pleas have been answered. Any developer can just join the Visual Studio Dev Essentials program and then download SQL Server Developer Edition for free.

I’ve got much more to post about what else we need to do to help bring in new developers, but credit must be given where it’s due. Thanks for listening, Microsoft!
