
Greg Low (The Bit Bucket: IDisposable)

Ramblings of Greg Low (SQL Server MVP, MCM and Microsoft RD) - SQL Down Under

  • Search for a String in all Tables of a Database

    When I work at customer sites, there is quite a difference in how different sites get data for testing. But it basically comes down to a variation on one of these five:

    • Testing? That’s for wimps. (Well perhaps for users)
    • I type in all the test data that I need.
    • We use a data generator to produce our test data.
    • We just periodically get a copy of production data.
    • We get production data and it gets masked before we can use it.

    It’s the last option that I’m concerned with today. Masking data is challenging and usually much more difficult than imagined. This week, I was working with a masked database and we ended up finding all the client details tucked inside a chunk of XML that was used to hold the metadata for a series of images. They had done a good job of removing them from all the normal columns in all tables of the database, but it’s so easy to miss extras like this.

    So after we fixed the issue, I was left wondering how effectively it really had all been masked.

    The following script takes a while to run on a large database. It takes a string to search for, and looks to find any locations in the database that hold that string. Optionally, it dumps out a copy of all rows containing the string. It does this for char, nchar, varchar, nvarchar (including the max types), text, ntext, and xml columns.

    I hope someone finds it useful.

    -- Search for a string anywhere in a database
    -- v1.0 Dr Greg Low, 11 June 2016

    DECLARE @StringToSearchFor nvarchar(max) = N'Jones';
    DECLARE @IncludeActualRows bit = 1;

    DECLARE @SchemaName sysname;
    DECLARE @TableName sysname;
    DECLARE @ColumnName sysname;
    DECLARE @IsNullable bit;
    DECLARE @TableObjectID int;
    DECLARE @Message nvarchar(max);
    DECLARE @FullTableName nvarchar(max);
    DECLARE @BaseDataTypeName sysname;
    DECLARE @WereStringColumnsFound bit;
    DECLARE @Predicate nvarchar(max);
    DECLARE @SQL nvarchar(max);
    DECLARE @SummarySQL nvarchar(max) = N'';
    DECLARE @NumberOfTables int;
    DECLARE @TableCounter int = 0;

    IF OBJECT_ID(N'tempdb..#FoundLocations') IS NOT NULL
    BEGIN
        DROP TABLE #FoundLocations;
    END;

    CREATE TABLE #FoundLocations
    (
        FullTableName nvarchar(max),
        NumberOfRows bigint
    );

    SET @NumberOfTables = (SELECT COUNT(*) FROM sys.tables AS t
                           WHERE t.is_ms_shipped = 0
                           AND t.type = N'U');

    DECLARE TableList CURSOR FAST_FORWARD READ_ONLY
    FOR
    SELECT SCHEMA_NAME(schema_id) AS SchemaName, name AS TableName, object_id AS TableObjectID
    FROM sys.tables AS t
    WHERE t.is_ms_shipped = 0
    AND t.type = N'U'
    ORDER BY SchemaName, TableName;

    OPEN TableList;
    FETCH NEXT FROM TableList INTO @SchemaName, @TableName, @TableObjectID;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @TableCounter += 1;
        SET @FullTableName = QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName);
        SET @Message = N'Checking table '
                     + CAST(@TableCounter AS nvarchar(20))
                     + N' of '
                     + CAST(@NumberOfTables AS nvarchar(20))
                     + N': '
                     + @FullTableName;
        PRINT @Message;

        SET @WereStringColumnsFound = 0;
        SET @Predicate = N'';

        DECLARE ColumnList CURSOR FAST_FORWARD READ_ONLY
        FOR
        SELECT c.name AS ColumnName, t.name AS BaseDataTypeName
        FROM sys.columns AS c
        INNER JOIN sys.types AS t
        ON t.system_type_id = c.system_type_id
        AND t.user_type_id = c.system_type_id -- note: want the base type not the actual type
        WHERE c.object_id = @TableObjectID
        AND t.name IN (N'text', N'ntext', N'varchar', N'nvarchar', N'char', N'nchar', N'xml')
        AND (c.max_length >= LEN(@StringToSearchFor) OR c.max_length < 0) -- allow for max types
        ORDER BY ColumnName;

        OPEN ColumnList;
        FETCH NEXT FROM ColumnList INTO @ColumnName, @BaseDataTypeName;

        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @WereStringColumnsFound = 1;
            IF @Predicate <> N''
            BEGIN
                SET @Predicate += N' OR ';
            END;
            SET @Predicate += CASE WHEN @BaseDataTypeName = N'xml'
                                   THEN N'CAST(' + QUOTENAME(@ColumnName) + N' AS nvarchar(max))'
                                   ELSE QUOTENAME(@ColumnName)
                              END
                            + N' LIKE N''%' + @StringToSearchFor + N'%''';
            FETCH NEXT FROM ColumnList INTO @ColumnName, @BaseDataTypeName;
        END;

        CLOSE ColumnList;
        DEALLOCATE ColumnList;

        IF @WereStringColumnsFound <> 0
        BEGIN
            SET @SQL = N'SET NOCOUNT ON;
                         INSERT #FoundLocations (FullTableName, NumberOfRows)
                         SELECT N''' + @FullTableName + N''', COUNT_BIG(*) FROM '
                     + @FullTableName
                     + N' WHERE '
                     + @Predicate
                     + N';';
            EXECUTE (@SQL);

            IF (SELECT NumberOfRows FROM #FoundLocations WHERE FullTableName = @FullTableName) > 0
            BEGIN
                SET @SummarySQL += N'SELECT * FROM ' + @FullTableName + N' WHERE ' + @Predicate + N';' + NCHAR(13) + NCHAR(10);
            END;
        END;

        FETCH NEXT FROM TableList INTO @SchemaName, @TableName, @TableObjectID;
    END;

    CLOSE TableList;
    DEALLOCATE TableList;

    SELECT *
    FROM #FoundLocations
    WHERE NumberOfRows > 0
    ORDER BY FullTableName;

    DROP TABLE #FoundLocations;

    IF @SummarySQL <> N'' AND @IncludeActualRows <> 0
    BEGIN
        EXECUTE (@SummarySQL);
    END;

     

  • Auto-Injection of Enterprise Edition Features

    There’s an interesting scenario that’s come up in creating some new samples for SQL Server 2016.

    I know that for many, many years people have asked for a way to limit developer edition to only use standard edition features, or at least to have a way to flag them. I’ve often thought that I’d like a way to set a “target environment” and have the colorization change anything that I can’t use in standard to an obviously different color.

    However, previously, if you used developer edition to create a database, as long as you didn’t use any enterprise features, you could then back up that database and restore it on a standard edition server.

    That’s no longer the case with SQL Server 2016.

    If you create a temporal table in a database on developer edition, it automatically chooses PAGE compression for the history table. There is no DDL option to avoid this. The database can then no longer be restored on a standard edition server. To work around it, you would need to rebuild the history table without compression after it’s created.
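    As a rough sketch of that workaround (the table and history table names here are made up purely for illustration), you let SQL Server create the compressed history table and then rebuild it without compression:

    CREATE TABLE dbo.Accounts
    (
        AccountID int NOT NULL PRIMARY KEY,
        AccountName nvarchar(100) NOT NULL,
        ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.AccountsHistory));
    GO

    -- The history table gets created with PAGE compression; rebuild it without
    -- compression so the database can still be restored on standard edition
    ALTER TABLE dbo.AccountsHistory REBUILD WITH (DATA_COMPRESSION = NONE);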

    I see this as quite a change in behavior. I don’t think that features that are only available in enterprise (or perhaps Azure) editions should be “auto-injected”.

    Clearly, if you script out the database and create a new database on standard edition using the script, it will create the history table without compression. But I suspect that many people create DBs as starter DBs using developer edition, and then deploy them to other editions that don’t support compression.

    I’d be interested in thoughts on how common this practice currently is.

  • Should my database have foreign key constraints?

    There is an endless discussion in the development community about whether databases should include declared foreign key constraints or not.

    As a consultant, I regularly visit a wide variety of client sites. Many of these have applications designed without constraints. When I ask why there are no constraints, the response is invariably one of the following:

    • The app ensures that it’s right
    • They don’t work well with our application development
    • They are too slow
    • What’s a foreign key constraint?

    Most of the sites that I work with have sizeable databases. This discussion is important in those situations.

    The app ensures that it’s right

    In most enterprises, most large databases end up being used by more than a single application, and often each application will be made up of many sub-applications or modules. That means that every place that accesses the data needs to apply the same rules. (The same issue applies to other forms of constraints as well.)

    Worse, the databases often end up being accessed by applications from different authors, and often with different technology stacks.

    ETL processes are often used to move data into the databases or to update that data.

    Even worse, in real production scenarios, data-fixes often get applied directly to the database.

    A view that says it will all be OK because everything goes through a single application is a very narrow view of the world.

    They don’t work well with our application development

    What this really boils down to in most cases is that the developers didn’t want to work out the order for updates and it’s just easier for them without constraints as they can update any table they want, in any order.

    That means that the data is periodically in invalid states and is, at best, eventually consistent. What happens with concurrent access while the data is in one of those interim, invalid states? What happens to a reporting application that finds invoices for customers that don’t exist?

    It would be helpful if SQL Server supported deferred constraint checking but today it doesn’t. I’ve been formally asking them for it for over 10 years: https://connect.microsoft.com/SQLServer/feedback/details/124728/option-to-defer-foreign-key-constraint-checking-until-transaction-commit

    I still think it’s one of the most important enhancements that could be made to SQL Server from a development point of view.

    They are too slow

    So often I’m told “we can’t do it because it would be too slow”. Yet almost every time I ask if they’ve actually tested it, I’m told “no”.

    The reality is usually that someone’s brother’s cousin’s friend read it somewhere on the Internet so they decided it would be a problem.

    Whenever I really test it, I find very little impact as long as appropriate indexing is in place, and sensible options are chosen when bulk importing data.

    I do occasionally find specific constraints that I decide to disable, but they are few and far between. And even then, I like to see the constraint still in place (so it can be discovered by tools, etc.) but just not checked. It certainly never applies to all constraints in a database.
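    For anyone wondering what that looks like, here’s a minimal sketch (the table and constraint names are hypothetical). The constraint stays declared and discoverable; it just isn’t checked:

    -- Stop checking the constraint but leave it declared (still discoverable by tools)
    ALTER TABLE dbo.InvoiceLines NOCHECK CONSTRAINT FK_InvoiceLines_Invoices;

    -- Later, re-enable it and re-validate existing rows so it becomes trusted again
    ALTER TABLE dbo.InvoiceLines WITH CHECK CHECK CONSTRAINT FK_InvoiceLines_Invoices;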

    What’s a foreign key constraint?

    Sadly, this is also a common question, though fortunately mostly only on smaller databases and applications. Some developers really just don’t understand the issue, but they are not the majority.
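    For those asking, here is a minimal example (with hypothetical tables). The constraint declares that every invoice must refer to a customer that actually exists, and SQL Server then enforces that on every insert, update, and delete:

    CREATE TABLE dbo.Customers
    (
        CustomerID int NOT NULL CONSTRAINT PK_Customers PRIMARY KEY,
        CustomerName nvarchar(100) NOT NULL
    );

    CREATE TABLE dbo.Invoices
    (
        InvoiceID int NOT NULL CONSTRAINT PK_Invoices PRIMARY KEY,
        InvoiceDate date NOT NULL,
        CustomerID int NOT NULL
            CONSTRAINT FK_Invoices_Customers FOREIGN KEY REFERENCES dbo.Customers (CustomerID)
    );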

    Bottom Line

    When I do a detailed check on a system that has run without constraints for quite a while, I almost always find data integrity issues.

    When I show the integrity issues to the customer, they look surprised then say something like “ah, that’s right, we had that bug a few weeks back”. Pretty much every time, there are some issues somewhere.

    That’s ok if you’re building a toy application; not ok if you’re building a large financial application.

    At the very least, sites that decide not to have constraints should have a process in place that periodically checks data integrity. Most sites that don’t have constraints don’t do this either.
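    Without declared constraints, that process ends up being queries along these lines (again with hypothetical tables), run on a schedule and investigated whenever they return rows:

    -- Orphaned invoices: rows that a foreign key constraint would have rejected
    SELECT i.InvoiceID, i.CustomerID
    FROM dbo.Invoices AS i
    LEFT JOIN dbo.Customers AS c
    ON c.CustomerID = i.CustomerID
    WHERE c.CustomerID IS NULL;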

    It’s important to find issues as soon as they occur. Finding issues well after they have been created always makes them much harder to fix. At least with constraints, you get instant feedback that something’s wrong. If you have a bug in your ETL process that is messing up your data in some subtle way, you don’t want to find that out several weeks later after other processing has occurred on that data.

    Summary

    I like to see constraints in place until it’s proven that a particular constraint can’t be in place for a performance reason. Then, I like to see it (and only it) disabled rather than removed. A bonus is that it can usually still be discovered by tools.

  • Data Tales #8: Database on a Diet (Part 3)

    Hi Folks,

    My series of articles for SQL Server Magazine continues. Last time, I continued a short series about a large database that needed to go on a diet, looking at the internals of row and page compression to see what happens when they are used. We saw the significant differences in how ROW and PAGE compression are implemented. So how do you decide which to use?

    This time, we look at when ROW and PAGE compression make sense, and provide detailed guidance on how to decide which should be used for which tables and indexes, or even for which partitions of which tables and indexes. A blended approach is usually the appropriate outcome.

    http://sqlmag.com/sql-server/data-tales-8-case-database-diet-part-3
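    If you want a quick way to experiment with the trade-off yourself, sp_estimate_data_compression_savings will estimate how large a table or index would be under either setting (the AdventureWorks table below is just an example):

    -- Estimate how large Sales.SalesOrderDetail would be with ROW vs PAGE compression
    EXEC sys.sp_estimate_data_compression_savings
        @schema_name = N'Sales',
        @object_name = N'SalesOrderDetail',
        @index_id = NULL,
        @partition_number = NULL,
        @data_compression = N'ROW';

    EXEC sys.sp_estimate_data_compression_savings
        @schema_name = N'Sales',
        @object_name = N'SalesOrderDetail',
        @index_id = NULL,
        @partition_number = NULL,
        @data_compression = N'PAGE';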

    Enjoy!

  • Avoiding drives going offline when adding nodes to availability groups or creating / validating Windows clusters

    Several of my customers have mentioned lately that when they’ve added a node to a Windows failover cluster supporting an Availability Group, they’ve ended up with an outage because the non-shared drives have been taken offline by the installation process. All commented that it didn’t use to happen on Windows Server 2008 R2 and that they were taken by surprise when it happened on Windows Server 2012 R2.

    I mentioned it on the MVP list and one of my US buddies, Allan Hirt, said that we need to uncheck the relevant checkbox during the install.

    I stepped through another one today and this is the culprit:

    [image]

    The default is now to add all eligible storage to the cluster.

    That is perhaps a reasonable default when setting up most Windows failover clusters, but it is really not a sensible default when working with Availability Groups. It ends up adding all your other drives and then puts them offline.

    Unchecking that box while performing installs/validations should avoid the issue.

    Thanks to Allan for pointing it out.

  • SQL Down Under Podcast 68–Guest Joe Yong–SQL Server 2016 StretchDB

    Hi Folks,

    One of the intriguing options that’s coming as part of SQL Server 2016 is StretchDB. I’ve been spending some time working with it and learning about it, and I’ve ended up with lots of questions.

    It was great to get an opportunity to speak to Joe Yong to get all my questions sorted.

    Hope you’ll enjoy it too. You’ll find it here: http://sqldownunder.azurewebsites.net/Podcasts

  • SQL Down Under podcast Show 67–Kasper DeJonge and SQL Server 2016 Analysis Services

    Last week I had an opportunity to record a new podcast with Kasper DeJonge from the SQL Server Analysis Services team. Kasper has also been working on Power Pivot and many other areas.

    SQL Server 2016 is lining up to be a great release for Analysis Services with significant enhancements for both tabular and multi-dimensional data models.

    I hope you enjoy the show. You’ll find it here:

    http://www.sqldownunder.com/podcasts

  • Very pleased to see free developer edition of SQL Server

    I spend a lot of my time working with ISVs (or software houses) and with individual developers. For SQL Server to have a long-term future, we need to be appealing to more and more new developers, to get them to understand that SQL Server is a great platform for building their applications on.

    One of the most important aspects of this, right up front, is making it really frictionless to get started with SQL Server.

    In the past, it has at times been very difficult to purchase SQL Server Developer Edition. To me, that’s been ridiculous if we want to attract new developers to the platform. I have a friend who, a year or so back, wanted to get Developer Edition and spent weeks (literally) trying to find out how to buy it. In the USA, it was available in the Microsoft store. You could pay $40 and download it. If you had an MSDN subscription, it was also just available for download. But for some reason that I’ll never fathom, it wasn’t available in the Australian Microsoft store and you had to purchase it from a license vendor instead.

    Because Developer Edition was a low-cost item, the license vendors weren’t interested in it. Even when I spoke to the local Microsoft sales subsidiary, after two weeks I still couldn’t find out how to buy it. The local sales team seemed to be basically goaled on Enterprise Edition sales, so a Developer Edition license didn’t even register on their radar.

    Sadly, by comparison, my friend downloaded PostgreSQL and was using it 15 minutes later. This situation could not continue.

    The Visual Studio team had been rightly proud of their Visual Studio Community Edition and how well it was going. It was just a free download.

    I was a very vocal member of a crowd asking “Where is the SQL Server Community Edition?”.

    Well this week, our pleas have been answered. Any developer can just join the Visual Studio Dev Essentials program and then download SQL Server Developer Edition for free.

    I’ve got much more to post about what else we need to do to help attract new developers, but credit must be given where it’s due. Thanks for listening, Microsoft!

  • Learning Mandarin vs Cantonese

    I had a former colleague ask me the other day if he should learn Cantonese or Mandarin. He was going to spend a few months in Hong Kong and southern China.

    Here are my opinions. I’m sure there are many who might differ but this is how I see it today.

    Hong Kong is a tricky one. A large number of people there can speak Mandarin but, given the difficult relationship between “mainland” China and Hong Kong, even locals who can speak it don’t treat you the same as if you speak Cantonese. Many just aren’t happy about aspects of how China now runs Hong Kong. But the way that I see it is that, resist as they might, they will be “integrated”.

    In the meantime, and probably for another generation, Cantonese is certainly where it’s at. Already though, China makes all children in Hong Kong learn Mandarin.

    People don’t seem to understand how determined the Chinese government is to achieve standardisation. In the end, it’s the only way you can run such a large country without anarchy. They even have a single time zone for the whole of China. I can’t see any chance of them supporting 900 or so dialects going forward. It would probably scare the Cantonese speakers today to think about it, but I think that eventually Cantonese will be like Aboriginal languages are in Australia: more of a curiosity. Clearly, that will take some time.

    Personally, I wouldn’t put any effort into Cantonese but you might decide differently if you are intending to be in a majority Cantonese-speaking country for quite a while. You certainly get treated better by locals if you use at least some of their dialect. Same thing happens in Shanghai. Add some Shanghai-ese into your Mandarin, and they are so happy.

    Another thing I’ve liked about learning Chinese is that the writing system is essentially the same for all dialects. So even when I see signs written in Hong Kong, I can still read most of it. I have no idea what the words are in Cantonese (although I can easily guess some), but I know what the sign means, and that’s what counts the most.

    The challenge with Chinese reading and writing is simplified vs traditional systems. In the 1950s and 1960s, the Chinese government decided to simplify the writing system because so many people were illiterate. So they took a bunch of characters and made simplified versions of them. The simplified ones are now used widely. However, once again, Hong Kong is a hold-out. Most people in Hong Kong or Taiwan still use traditional characters. So there are some characters where I have to stop and try to work out what they are.

    The younger people in Hong Kong are learning Simplified characters. Other countries like Singapore have standardised on Simplified. Even in the middle of China, however, there are movements to try to reinstate Traditional characters as they are seen as more meaningful. I can’t see that prevailing and simplified characters will be the future.

    Most older Chinese writing that I see in Australia is Traditional, as many of the Chinese who originally came here did so from Hong Kong, and many came before Simplified writing was introduced. Most new Chinese writing that you see in Australia is Simplified. That’s what all the Chinese tourists and students tend to use. When riding the trains in Melbourne, I used to predominantly hear Cantonese. Now I predominantly hear Mandarin, largely due to an influx of people from other areas of China, and particularly due to younger students.

    I’ve been using a variety of methods for learning Mandarin over the years. My main effort at present is to attend 3 x 1 hour one-on-one lessons from Hanbridge Mandarin each week. (http://www.hanbridgemandarin.com)

    I’ve got other posts that I’ll update soon, on the other methods that I’m using, but the online lessons (my current teachers are from ShenZhen) have increased my abilities the fastest of any method that I have tried.

    And I just speak to every Chinese person that I get the opportunity to.

  • SQL Down Under Show 66–Riccardo Muti and Chris Finlan–SQL Server 2016 Reporting Services

    Hi Folks,

    I recently got to record a new podcast with Riccardo Muti and Chris Finlan from the SQL Server Reporting Services team.

    Over recent years, Reporting Services seemed to be in a rut without forward momentum.

    2016 however promises to be a really big year for Reporting Services.

    You’ll find the show here: http://www.sqldownunder.com/podcasts

    Enjoy!

  • R Tools for Visual Studio

    In recent months, I’ve been brushing up my R skills. I’ve had a few areas of interest in this:

    * R in Azure Machine Learning

    * R in relation to Power BI and general analytics

    * R embedded (somewhat) in SQL Server 2016

    As a client tool, I’ve been using RStudio. It’s been good and very simple but it’s a completely separate environment. So I was excited when I saw there was to be a preview of new R tooling for Visual Studio.

    I’ve been using a pre-release version of R Tools for Visual Studio for a short while but I’ve already come to quite like it. It’s great to have this embedded directly within Visual Studio. I can do everything that I used to do in RStudio but really like the level of Intellisense, etc. that I pick up when I’m working in R Tools for Visual Studio.

    So today I was pleased to see the announcement that these tools have gone public. You’ll find more info here in today’s post from Shahrokh Mortazavi in the Azure Machine Learning blog: https://blogs.technet.microsoft.com/machinelearning/2016/03/09/announcing-r-tools-for-visual-studio-2/

  • Data Tales #7: Database on a diet (part 2)

    Hi Folks,

    My series of articles for SQL Server Magazine continues. Last month, I started a short series about a large database that needed to go on a diet. This month, I look at a bit of the internals of row and page compression, to see what happens when you use them.

    http://sqlmag.com/database-administration/data-tales-7-case-database-diet-part-2

    Enjoy!

  • Extra Power BI single day course in Melbourne May 10th

    We normally run our Power BI Core Skills class as the 2nd day of our 5-day BI Core Skills course.

    We’ve had extra demand for the Power BI day so we’ve added an extra one in Melbourne on May 10th. Details are here: http://www.sqldownunder.com/Training/Courses/20

    [image]

    Early bird pricing ends April 26th.

  • Interesting to get a report card–more on learning Chinese

    I’m continuing to learn Chinese (Mandarin) in my spare time. (Although I’m not sure I really have any).

    This year I made a change in how I’m doing things. I decided to give the team at Hanbridge Mandarin a try. I’m doing online Skype/Webex based video classes 3 times per week for about an hour, one on one with a teacher. Teachers for Hanbridge mostly seem to be ShenZhen based.

    It continues to be fun to keep on with my learning. Last year I managed to pass HSK3 and this year I plan to take the HSK4 exam. I’m a little concerned about it as each level seems to get twice as hard as the previous level. HSK5 seems to be about the level required for university entry in China.

    One thing that I’ve found interesting with Hanbridge is that they send a report card every few months. It feels like being back in school. I was happy with this one and am looking forward to expanding my abilities this year.

    [image]

  • BETWEEN vs >= and <=

    I love it when I get queries that are actually easy to answer.

    Today, one of my developer friends asked me if it was better to use BETWEEN or to use >= and <= when filtering for a range of dates.

    From a logic perspective, I like the idea that a single predicate expresses your intent rather than needing two predicates to do the same. For example, consider the following two queries:

    [image]
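    (The queries were shown as an image; the comparison was along these lines, sketched here against the AdventureWorks Sales.SalesOrderHeader table.)

    -- Single predicate expressing the date range
    SELECT SalesOrderID, OrderDate, TotalDue
    FROM Sales.SalesOrderHeader
    WHERE OrderDate BETWEEN '20110701' AND '20110731';

    -- The same range expressed as two predicates
    SELECT SalesOrderID, OrderDate, TotalDue
    FROM Sales.SalesOrderHeader
    WHERE OrderDate >= '20110701'
    AND OrderDate <= '20110731';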

    I’d argue that the first one expresses the intent slightly more clearly than the second query. The intent is to find orders in a particular range of dates. Having that as a single predicate expresses that intent slightly more clearly than having to assemble the intent from multiple predicates. At least I think so.

    But the bigger question is about performance. It’s easy to see that they are identical. If you enter the following query against the AdventureWorks database:

    [image]

    Then request an estimated execution plan (Ctrl-L), you’ll see this:

    [image]

     

    The missing index warning isn’t relevant to this discussion and if you hover over the Clustered Index Scan, you’ll see this:

    [image]

    Note under the Predicate heading that SQL Server has converted the original BETWEEN predicate into a pair of >= and <= predicates anyway. You’ll find it does the same for LIKE predicates as well. LIKE ’A%’ becomes >= ’A’ AND < ’B’.

    So performance is identical. It’s more of a style issue, and I think that BETWEEN is (only very) slightly more expressive so I’d prefer it.

    UPDATE: Aaron Bertrand posted a pertinent comment on this. I would only lean to using BETWEEN if I’m strictly working with dates or other types of discrete values (ints, etc.), not with datetime values that actually contain times. If that was the case, I’d definitely lean towards the separate predicates.
