THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

  • Beware the ß

    I stumbled upon an interesting little nuance of SQL Server’s behaviour over the past few days that I was not aware of and I figure its worth talking about it here so others are aware. It concerns the handling of the german character “ß” which I understand to be german shorthand for “ss” (I’m no expert on language or linguists so don’t beat me up if I’m wrong about that).

    In short, two NVARCHAR values that differ only by one using “ß” and the other using “ss” will be treated as the same. This is easily demonstrated using the following code snippet:

    SELECT 'Oktoberstrasse'
    UNION
    SELECT
    'Oktoberstraße';
    SELECT N'Oktoberstrasse'
    UNION
    SELECT
    N'Oktoberstraße';

    which returns:

    image

    (Collation on my database is set to ‘SQL_Latin1_General_CP1_CI_AS’)

    Notice that casting the values as NVARCHAR (which is what the N'' notation does) causes SQL Server to treat them as the same. Obviously this is going to cause a problem if you need to treat those as distinct values (such as inserting into a column with a unique key upon it – which is the problem I encountered that caused me to stumble across this)

    There is a bug submission to Connect regarding this issue at 'ß' and 'ss' are NOT equal in which a member of the SQL Server team says:

    Our current behavior follows the SQL/ISO standard and unless those standards are updated with the latest changes we don't intend to change the behavior in SQL Server. Changing existing SQL Server behavior has lot of implications and today we rely on Windows for all of our windows collation sorting capabilities. If and when in the future Windows adopts these new rules / Unicode standard we will incorporate it in SQL Server.

    In other words, SQL Server is merely following the behaviour as defined by the International Standards Organisation so its not a bug in SQL Server as such, just a nuance that one needs to be aware of. And now you are.

    @Jamiet

  • Really useful Excel keyboard shortcuts

    I love me a good keyboard shortcut and there’s some I would like to use in Excel all the time if only I could darn well remember them.

    CTRL+ Spacebar   Select the entire column containing the currently selected cell(s)
    Shift + Spacebar   Select the entire row containing the currently selected cell(s)
    CTRL + ‘+’   Insert cells
    CTRL + ‘-’   Delete cells

    If you combine these you’ll find they become really powerful. For example, CTRL+Spacebar followed by CTRL + ‘+’ inserts a new column into a worksheet (which is what I wanted to do this morning when I began googling this).

    I figured the only way I will ever ingrain these into my muscle memory is if I write them down somewhere, nowhere better than on my blog.

    If you’re a fellow keyboard shortcut fetishist and want to share any obscure ones that you know of then I’m all ears, please put them in the comments below.

    @Jamiet

  • Linqpad – bring the power of LINQ to SQL Server development

    One of my biggest laments about being a SQL Server developer is that the tools provided by Microsoft to do our thang compare woefully to the feature rich, (largely) bug-free, “it-just-works” impression that one gets from seeing the tools available to “other” developers in the Microsoft ecosystem (I’m thinking of our appdev brethren here, the ones that use C#, Javascript, and all that jazz). I could write a long blog post complaining about this (and perhaps one day I will) but in this blog post I want to shine a light on a tool called Linqpad.

    Linqpad enables you to issues queries written in LINQ against a database and in that respect is an alternative to SQL Server Management Studio (SSMS). What is LINQ? It stands for Language Integrated Query and is a technology that Microsoft brought out a few years ago for writing queries inside .Net code. The great thing about Linqpad is that it enables one to write LINQ queries without having to know anything about .Net.

    In the screenshots below I show a simple query against a database but written two ways, one using T-SQL that anyone reading this blog post will likely know, and one using LINQ:

    SNAGHTML618728e4

    Some things to notice here. The two queries look very similar in that they contain the same keywords {SELECT, FROM}. Second thing to notice is that the FROM clause comes before the SELECT clause and if you know anything about the logical order of execution of a SELECT query you’ll realise that this intuitively makes sense. Lastly the table is called [dbo].[BulletinLine] but in the LINQ query its called [BulletinLines], its been pluralised (a convention that is common to .Net developers) and there’s no [dbo] prefix. Other than those things its intuitively clear that these two queries are doing exactly the same thing and its worth pointing out that under the covers the LINQ query is converted into a T-SQL query.

    So OK, if you accept that LINQ can do pretty much anything that a T-SQL SELECT query can do the next obvious question is “Why should I bother when T-SQL already does what I need?” The answer, in a word, is productivity. Or, to put it another way, intellisense works properly. Let’s say for example I want to select a subset of all the columns, intellisense comes to our aid:

    image

    One might well retort “well that works in SSMS as well” but in my experience intellisense in SSMS is, at best, flaky. In some circumstances it simply doesn’t work and the worst part of this is that its often inexplicable as to why. (In case you can’t tell, intellisense in SSMS drives me up the wall and I’m sure I’m not the only one.)

    Some other nice things about LINQ. Here’s the equivalent of a WHERE clause to filter on [BulletinId]=6:

    image

    If you don’t know LINQ then (in my opinion) its not intuitively obvious what’s going on here. What the above query is doing can effectively be described as:

    Take the collection of BulletinLines, filter in WHERE BulletinId equals 6

    Where this gets really powerful is the ability to stack these things up like so:

    image

    Take the collection of BulletinLines, filter in WHERE BulletinId equals 6. From the resultant collection filter in WHERE Colour=”White”

    If we only want the top 3:

    image

    Take the collection of BulletinLines, filter in WHERE BulletinId equals 6. From the resultant collection filter in WHERE Colour=”White”. From the resultant collection take the first 3

    I love how expressive this is and when you get fully conversant with LINQ its wonderfully intuitive too. If I haven’t convinced then, well, that’s OK. If you’re not convinced but do want to be convinced then check out Why LINQ beats SQL. The overriding argument there is “LINQ is in most cases a significantly more productive querying language than SQL”. Make sure you check out the section entitled “Associations” too which demonstrates how you can filter based on what is known about an entity in other tables, when you grok what’s going on there you’ll realise the power that LINQ brings.

    If you want to give this a go then go and download Linqpad now from http://www.linqpad.net/. Its free however some of the more advanced features (such as intellisense) only light up when you pay for the pro or premium edition which are priced at $39 & $58 respectively for a perpetual license, a bit more than the app generation are used to paying but hardly bank-breaking either.

    Are any other SQL developers out there using Linqpad? Please share your experiences in the comments below, I’d really like to read them.

    @Jamiet

  • Don’t learn SSDT, learn about your databases instead

    Last Thursday I presented my session “Introduction to SSDT” at the SQL Supper event held at the offices of 7 Digital (loved the samosas, guys). I did my usual spiel, tour of the IDE, connected development, declarative database development yadda yadda yadda… and at the end asked if there were any questions. One gentleman in attendance (sorry, can’t remember your name) raised his hand and stated that by attempting to evangelise all of the features I’d missed the single biggest benefit of SSDT, that it can tell you stuff about database that you didn’t already know.

    I realised that he was dead right. SSDT allows you to import your whole database schema into a new project and it will instantly give you a list of errors and/or warnings pertaining to the objects in your database. Invalid references (e.g a long-forgotten stored procedure that refers to a non-existent column), unnecessary 3-part naming, incorrect case usage, syntax errors…it’ll tell you about all of ‘em! Turn on static code analysis (this article shows you how) and you’ll learn even more such as any stored procedures that begin with “sp_”, WHERE clauses that will kill performance, use of @@IDENTITY instead of SCOPE_IDENTITY(), use of deprecated syntax, implicit casts etc…. the list goes on and on.

    I urge you to download and install SSDT (takes a few minutes, its free and you don’t need SQL Server or Visual Studio pre-installed), start a new project:

    image

    right-click on your new project and import from your database:

    image

    and see what happens:

    image

    You may be surprised what you discover. Let me know in the comments below what results you get, total number of objects, number of errors/warnings, I’d be interested to know!

    @Jamiet

  • Clone an Azure VM using Powershell

    In a few months time I will, in association with Technitrain, be running a training course entitled Introduction to SQL Server Data Tools. I am currently working on putting together some hands-on lab material for the course delegates and have decided that in order to save time in asking people to install software during the course I am simply going to prepare a virtual machine (VM) containing all the software and lab material for each delegate to use. Given that I am an MSDN subscriber it makes sense to use Windows Azure to host those VMs given that it will be close to, if not completely, free to do so.

    What I don’t want to do however is separately build a VM for each delegate, I would much rather build one VM and clone it for each delegate. I’ve spent a bit of time figuring out how to do this using Powershell and in this blog post I am sharing a script that will:

    1. Prompt for some information (Azure credentials, Azure subscription name, VM name, username & password, etc…)
    2. Create a VM on Azure using that information
    3. Prompt you to sysprep the VM and image it (this part can’t be done with Powershell so has to be done manually, a link to instructions is provided in the script output)
    4. Create three new VMs based on the image
    5. Remove those three VMs

    SNAGHTML16add98b

    The script has one pre-requisite that you will need to install, Windows Azure Powershell. You also need to be a Windows Azure subscriber which, if you're reading this blog post, I'm assuming you already are.

    Simply download the script and execute it within Powershell, assuming you have an Azure account it should take about 20minutes to execute (spinning up VMs and shutting the down isn’t instantaneous). If you experience any issues please do let me know.

    There are additional notes below.

    Hope this is useful!

    @Jamiet 

    Notes:

    • Obviously there isn’t a lot of point in creating some new VMs and then instantly deleting them. However, this demo script does provide everything you need should you want to do any of these operations in isolation.
    • The names of the three VMs that get created will be suffixed with 001, 002, 003 but you can edit the script to call them whatever you like.
    • The script doesn’t totally clean up after itself. If you specify a service name & storage account name that don’t already exist then it will create them however it won’t remove them when everything is complete. The created image file will also not be deleted. Removing these items can be done by visiting http://manage.windowsazure.com.
    • When creating the image, ensure you use the correct name (the script output tells you what name to use):

    image

    • Here are some screenshots taken from running the script:

    image

    image

    • When the third and final VM gets removed you are asked to confirm via this dialog:

    image

    Select ‘Yes’

  • SQL Relay 2013R2 – Bigger, faster, SQL-ier

    Fresh from a successful UK tour in June the SQL Relay team are at it again. This time there are ten venues all around the UK between 11th and 29th November.

    image

    Head to http://www.sqlrelay.co.uk/ to sign-up. I’ll be at the London one, hope to see you there!

    @Jamiet

  • SSDT gotcha – Moving a file erases code analysis suppressions

    I discovered a little wrinkle in SSDT today that is worth knowing about if you are managing your database schemas using SSDT. In short, if a file is moved to a different folder in the project then any code analysis suppressions that reference that file will disappear from the suppression file. This makes sense if you think about it because the paths stored in the suppression file are no longer valid, but you probably won’t be aware of it until it happens to you. If you don’t know what code analysis is or you don’t know what the suppression file is then you can probably stop reading now, otherwise read on for a simple short demo.

    Let’s create a new project and add a stored procedure to it called sp_dummy.

    image

    Naming stored procedures with a sp_ prefix is generally frowned upon and hence SSDT static code analysis will look for occurrences of this and flag them. So, the next thing we need to do is turn on static code analysis in the project properties:

    image

    A subsequent build causes a code analysis warning as we might expect:

    image

    Let’s suppose we actually don’t mind stored procedures with sp_ prefixes, we can just right-click on the message to suppress and get rid of it:

    image

    That causes a suppression file to get created in our project:

    image

    Notice that the suppression file contains a relative path to the file that has had the suppression placed upon it. Now if we simply move the file within our project to a new folder notice that the suppression that we just created gets removed from the suppression file:

    image

    As I alluded above this behaviour is intuitive because the path originally stored in the suppression file is no longer relevant but you’re probably not going to be aware of it until it happens to you and messages that you thought you had suppressed start appearing again. Definitely one to be aware of.

    @Jamiet 

     

  • Enforce SSIS naming conventions using BI-xPress

    A long long long time ago (in 2006 in fact) I published a blog post entitled Suggested Best Practises and naming conventions in which I suggested a bunch of acronyms that folks could use to prefix object names in their SSIS packages, thus allowing easier identification of those objects in log records, here is a sample of some of those suggestions:

    SNAGHTMLf8e56b8

    If you have adopted these naming conventions (and I am led to believe that a bunch of people have) then you might like to know that you can now check for adherence to these conventions using a tool called BI-xPress from Pragmatic Works. BI-xPress includes a feature called the Best Practices Analyzer that scans your packages and assess them according to some rules that you specify. In addition Pragmatic Works have made available a collection of these rules that adhere to the naming conventions I specified in 2006

    image

    You can download this collection however I recommend you first read the accompanying article that demonstrates the capabilities of the Best Practices Analyzer. Pretty cool stuff.

    @Jamiet

  • SSIS ReportingPack v1.1.1.0 – Execution Stats visualised in a Gannt chart via usp_ssiscatalog

    There are many SSRS reports in SSIS Reporting Pack and the one I’ve perhaps spent most time on is the executable_duration report which provides this visualisation:

    image

    I happen to think this is really useful because it shows the start time and duration of each executable relative to all the other executables in an execution. It does it by leveraging the execution_path property of an executable (which I am on record as saying is, in my opinion, the best feature in SSIS 2012).

    I have wanted to provide a similar visualisation in usp_ssiscatalog for some time and today I have finally gotten around to implementing it. With this new release one can call up the executable stats using this query:

    EXEC usp_ssiscatalog
          
    @a                      =  
    'exec'
      
    ,   @exec_execution         =  
    0
      
    ,   @exec_events            =  
    0
      
    ,   @exec_errors            =  
    0
      
    ,   @exec_warnings          =  
    0
      
    ,   @exec_executable_stats  =  
    1

    And the returned dataset will include a new column called Gannt. This screenshot demonstrates the value that this provides:

    20130905gannt

    On the left we have the execution_path of each executable and on the right is a bar indicating the start time and duration relative to all the other executables. Collectively these bars provide a nifty Gannt chart-alike view of your execution thus allowing you to easily identify which executables are taking up the most time in your executions.

    In addition there is a similar view for the list of executions that is gotten by executing:

    EXEC usp_ssiscatalog @a='execs'

    That return a new column, relative_duration:

    SNAGHTML16daf47b

    which simply provides a bar for each execution indicating how long it took relative to all the other executions.

    I hope these new visualisations in usp_ssiscatalog prove useful to you. If they do (or even if they don’t) please let me know, I love getting feedback and I don’t get nearly as much as I would like for SSIS Reporting Pack. Get the goods from SSIS Reporting Pack v1.1.1.0.

    @Jamiet

  • Learn from me about SSDT in London, March 2014

    Microsoft released SQL Server Data Tools (SSDT) along with SQL Server 2012 in the Spring of 2012. Since then I’ve noticed an upward tick in both the number of organisations that are using SSDT and the number of questions that are getting asked about it on forums. There is some confusion about what SSDT actually is (Microsoft hasn’t helped there), why people should be using SSDT and, most importantly, how to make best use of it.

    Its clear that people want to learn more about SSDT so I have joined forces with Technitrain to offer a 2–day training course, in London, in March 2014 called Introduction to SQL Server Data Tools

    image

    image

    The course will cover:

    Day 1

    • SSDT Positioning
    • IDE Tour
    • Connected Database Development
    • Declarative, offline, database development
    • Publishing
    • Continuous Integration (CI) and Continuous Deployment

    Day 2

    • Data Publishing
    • Refactoring
    • Database unit testing
    • References and composite projects
    • Database Drift
    • Code analysis

    If this sounds like your bag then please sign up on the Technitrain website.

    @Jamiet

  • SSISReportingPack with usp_ssiscatalog v1.1 available – first release with integrated installer

    Since December 2010 I have been maintaining a suite of reports that can be used atop the SSIS Catalog in SSIS 2012, I called it the “SSIS Reporting Pack”. While building those reports I realised that there was also a need to make it easier to issue common but ad-hoc queries against the SSIS Catalog and that realisation gave rise to a stored procedure called sp_ssiscatalog which I first released in November 2012. Both of these pieces of open source software are hosted at http://ssisreportingpack.codeplex.com however they have always been distributed separately, with separate install mechanisms for each.

    Until today that is. As of version 1.1 a single installer exists that will provide all the artifacts required for both the reports and usp_ssiscatalog:

    image

    Head to http://ssisreportingpack.codeplex.com to get the goods. I have updated the home page in an attempt to make the installation instructions clearer for both the reports and usp_ssiscatalog.

     

    You probably want to know what new features are in this release. Well, not all that much if I’m honest. Most notable are:

    • A fix for a user reported issue where the reports would fail to deploy due to error The definition of the report '/folder-project-package' is invalid. The value expression for the text box 'breadcrumbFolder' referes to a non-existing report parameter 'FolderId'".
    • The eagle-eyed of you may have noticed that usp_ssiscatalog is a new name for this stored procedure, it used to be called sp_ssiscatalog. The reason for the change is simply to follow best-practice for naming of SQL Server stored procedures where any such object in a user database should not be prefixed “sp_”. The previous name came about because the original intention was for this stored procedure to live in [master], as it transpired this would not have been a wise decision however the name of the stored procedure had already been established. Any call to sp_ssicatalog will still work for the foreseeable future however I recommend that you should change to use usp_ssiscatalog instead, if you can.

    The big story here is that the reports and usp_ssiscatalog are now delivered in the same installer and to prove it here is a screenshot of the installer:

    image

    This lays a foundation for future enhancements that I want to make. Watch this space.

    @Jamiet

  • An ETL joke #fridayfun

    I received a text from my bank recently where they appear to have mistaken me from someone with a rather strange first name. As an ETL developer by trade there really was only one possible response:

    image

     

    This one amused me given the hype around Hadoop at the moment:

    image

    https://twitter.com/timoelliott/status/365737607126188032/photo/1

    Have a good weekend folks!

    @Jamiet

  • Declarative ETL

    T-SQL is a declarative programming language. This means that a T-SQL developer defines what (s)he wants the query to do, not how to do it. The secret sauce that turns the “what” into the “how” is a piece of software called the query optimiser and there are tomes dedicated to how one can make the query optimiser dance to their will. Generally the existence of a query optimiser is a good thing as in most cases it will do a better job of figuring out the “how” than a human being could*.

    SSIS dataflows on the other hand are an imperative programming language**, the data integration developer dataflow builds a data pipeline to move data exactly as (s)he desires it to happen. In SSIS, There is no equivalent of the query optimiser.

    I’ve often pondered whether there is an opportunity for someone to build a declarative data pipeline, that is, a method for a data integration developer to define what data should be moved rather than how to move it. Over the last few months I’ve come to the realisation that Power Query (formerly known as Data Explorer) actually fits that description pretty well. If you don’t believe me go and read Matt Masson’s blog post Filtering in Data Explorer in which he talks about query folding:

    [Power Query] will automatically push filters directly to the source query

    Even though we selected the full table in the UI before hiding the columns certain columns, we can see the source query only contains the columns we want

    “filters” aren’t the only type of query folding that Data Explorer can do. If you watch the queries, you’ll see that other operations, such as column removal, renaming, joins, and type conversions are pushed as close to the source as possible.

    Let’s not idly dismiss this. Query folding is very impressive and is (I think) analogous to predicate pushdown that is done by the query optimiser (for more on predicate pushdown read Nested Views, Performance, and Predicate Pushdown by Dave Wentzel). Does Power Query then have the equivalent of a query optimiser? I would say yes, it does although its more accurate to say that its the underlying query language called M for which this query optimiser exists!

    So, we have a declarative data integration language which is surfaced in Power Query and hence can only (currently) push data into an Excel spreadsheet. Imagine if M were extended were extended into a fully-fledged data integration tool that could move data between any heterogeneous source, would that constitute “declarative ETL” and is there a need for such a tool?

    I’ll leave that one out there for you to ponder. Comments would be most welcome.

    @Jamiet 

    *OK, you might debate whether the query optimiser can do a better job than a human but let’s save that for another day.

    **We can debate whether or not SSIS dataflows are a programming language or not but again, let’s save that for another day. For the purposes of this discussion just go with me, OK.

  • A fool leaves their computer unlocked. What to do next?

    A light hearted post for a Friday…

    A couple of days ago on Twitter I asked…

    image 

    The replies flew in thick and fast, there are some fairly vindictive people out there in the Twitter community let me tell you so i thought it’d be fun to show you a sample! Let’s go with a couple of harmless non-evasive suggestions first from Jens Vestergaard and Randi Borys:

    image

    image

    The screenshot one I particularly like. Those two are pretty tame though, if you want to get really nasty then try this suggestion from Brent Ozar:

    image

    Oh my word! In true “embrace and extend” style though Brent and Rob Farley built upon the idea:

    image

    image

    Nasty!!! Mr Farley upped the cruelty ante though with his next suggestion which I think might be my favourite:

    image

    Do that to a green SQL developer and soon they’ll be thinking they’re in the seventh circle of Microsoft-tool-hell! Not to be out-done Brent replied with the ever popular:

    image

    I can think of a few variations on that theme – the hosts file might be one thing you might want to target.

    Want to really make someone really angry? Kenneth Nielson has a suggestion:

    image

    If humiliation tactics are your bag Andy P has one for you:

    image

    And for the time-poor amongst you Koen Verbeeck offers:

    image

    Lastly if pure criminality is your bag, as it clearly is for Rowena Parsons, simply:

    image

    Check all of the suggestions here: https://twitter.com/jamiet/status/367264282045456384 and if you have any suggestions of your own please stick them in the comments below!

    @Jamiet

  • Is DQS-in-the-cloud on its way?

    LinkedIn profiles are always a useful place to find out what's really going on in Microsoft. Today I stumbled upon this little nugget from former SSIS product team member Matt Carroll:

    March 2012 – December 2012 (10 months)Redmond, WA

    Took ownership of the SQL 2012 Data Quality Services box product and re-architected and extended it to become a cloud service. Led team and managed product to add dynamic scale, security, multi-tenancy, deployment, logging, monitoring, and telemetry as well as creating new Excel add-in and new ecosystem experience around easily sharing and finding cleansing agents. Personally designed, coded, and unit tested in-memory trigram matching algorithm core to better performance, scale and maintainability. Delivered and supported successful private preview of the new service prior to SQL wide reorganization. 

    http://www.linkedin.com/profile/view?id=9657184

     Sounds as though a Data-Quality-Services-in-the-cloud (which I spoke of as being a useful addition to Microsoft's BI portfolio in my previous blog post Thoughts on Power BI for Office 365 ) might be on its way some time in the future. And what's this SQL wide reorganization? Interesting stuff.

    @Jamiet

     

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement