SQLBlog.com - The SQL Server blog spot on the web

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

  • Derive fiscal week using Excel

    I had to do something in Excel today that might be useful to other folks so I'm sharing it here.

    I've got an A4 sheet sellotaped to my desk that tells me the start and end date of each fiscal week in our fiscal year. It's useful, but it's only for 2015 and moreover I could do with the information in digital form rather than dead tree form. I figured I could achieve the same using Excel; the resultant workbook can be seen at: (Annoyingly, I have configured the pivot table to display the values as dates not numbers, but it seems Excel Online doesn't honour that, so you will have to download the document and simply refresh the pivot table in order to view it properly.)


    It allows you to pick the month in which the fiscal year starts (ours is March so that’s what is currently set in the workbook):


    and assumes that the first Monday of that month is the first day of the fiscal year.

    Doubtless there are easier ways to do this (there's probably a function in Excel that I don't know about) but nevertheless, this works for me and it was kinda fun figuring it out – there are lots of formulae involved here if you care to dig in to it. Thought I'd share it in case it was useful for anyone else. (Probably best not to trust that it's correct either; you might want to check it.)
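
    These aren't the workbook's formulae, just the same logic sketched in PowerShell for illustration; the function name is made up and it bakes in the same assumption, that the fiscal year starts on the first Monday of the chosen month:

    function Get-FiscalWeek {
        param (
            [datetime]$Date,
            [int]$FiscalStartMonth = 3   # March, as in the workbook
        )
        # first Monday of the fiscal start month in the given calendar year
        function Get-FiscalYearStart ([int]$year) {
            $firstOfMonth = (Get-Date -Year $year -Month $FiscalStartMonth -Day 1).Date
            $offset = ([int][DayOfWeek]::Monday - [int]$firstOfMonth.DayOfWeek + 7) % 7
            $firstOfMonth.AddDays($offset)
        }
        $fyStart = Get-FiscalYearStart $Date.Year
        # a date before this year's fiscal start belongs to the previous fiscal year
        if ($Date.Date -lt $fyStart) { $fyStart = Get-FiscalYearStart ($Date.Year - 1) }
        [math]::Floor(($Date.Date - $fyStart).Days / 7) + 1
    }

    Get-FiscalWeek -Date (Get-Date "2015-04-15")   # week number within the fiscal year starting first Monday of March 2015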


  • Iterate over a collection in Cronacle

    In my new life as a Hadoop data monkey I have been using a tool called Redwood Cronacle as a workflow/scheduler engine. One thing that has shocked me after years of working in the Microsoft ecosystem is the utter dearth of useful community content around Cronacle. There simply isn't anything out there about it: nobody blogs about Cronacle (the top link when googling for “cronacle blog” is …), there are no forums, and there are precisely zero questions (at the time of writing) on Stack Overflow tagged Cronacle… there's just nothing. It's almost as if nobody else out there is using it, and that's infuriating when you're trying to learn it.

    In a small effort to change this situation I've already posted one Cronacle-related blog post, Implementing a build and deploy pipeline for Cronacle, and in this one I'm going to cover a technique that I think is intrinsic to any workflow engine: iterating over a collection and carrying out some operation on each iterated value (you might call it a cursor). There's a wealth of blog posts on how to do this using SSIS's ForEach Loop container because it's a very common requirement (here is one I wrote 10 years ago) but I couldn't find one pertaining to Cronacle. Here we go…

    We have identified a need to be able to iterate over a dataset within Cronacle and carry out some operation (e.g. execute a job) for each iterated value. This article explains one technique to do it.

    My method for doing this has two distinct steps:

    1. Build a dataset and return that dataset to Cronacle
    2. Iterate over the recordset

    There are many ways to build a dataset (in the example herein I execute a query on Hadoop using Impala), hence the second of these two steps is the real meat of this article. That second step is, however, pointless without the first, so both steps will be explained in detail.

    Here's my Cronacle Job Chain that I built to demo this:


    Step 1 - Build the collection

    To emphasize a point made above, I could have used one of many techniques to build a collection to be iterated over; in this case I issued an Impala query:


    N.B. The beeline argument --showheader has no effect when used with the -e option (it only has an effect when a file is specified using the -f option). This is an important point, as you will see below.
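
    For illustration only (the connection string, table and query here are invented, not the ones in the Job Definition), the two invocation forms look like this:

    beeline -u "jdbc:hive2://somehost:10000/default" --showHeader=false -e "SELECT col FROM some_table"   # header row still comes through
    beeline -u "jdbc:hive2://somehost:10000/default" --showHeader=false -f query.hql                      # header row is suppressed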

    When this JobDefinition gets executed we can observe that the collection is assigned to the outParam parameter:


    outParam is of type String, not a String array. The values therein are delimited by a newline character ("\n")

    Step 2 - Iterate over the collection

    The output parameter from the first job in the Job Chain is fed into an input parameter of the second job in the Job Chain:


    From there we write Redwood Script (basically Java code) to split the string literal into an array and then execute an arbitrary Job Definition "JD_EchoInParameterValue_jamie_test" for each iterated value.


    Thus, the code shown above is the important part here. It takes a collection that has been crowbarred into a string literal, splits it by \n into a string array, then passes each element of that array to another job as a parameter. I've made the code available in a gist:
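
    Purely as an illustration of the idea (this is not the Redwood Script in the gist, and Submit-CronacleJob is a made-up stand-in for the Cronacle API call that submits a job), the same split-and-dispatch logic looks like this in PowerShell:

    $outParam = "col`n1`n2"                                        # the newline-delimited collection produced by step 1
    $values = $outParam -split "`n" | Where-Object { $_ -ne "" }   # string -> string array
    foreach ($value in $values) {
        Submit-CronacleJob -JobDefinition "JD_EchoInParameterValue_jamie_test" -InParameter $value
    }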

    When executed, observe that "JD_EchoInParameterValue_jamie_test" gets called three times, once for each value in the array ("col", "1", "2").




    I'm still a Cronacle beginner so it's quite likely that there is an easier way to do this. The method I've described here feels like a bit of a hack; however, that's probably more down to my extensive experience with SSIS, which has built-in support for doing this (i.e. the ForEach Loop container).

    Comments are welcome.

    You can read all of my blog posts relating to Cronacle at


    * SSIS is the tool that I used to use to do this sort of stuff in the Microsoft ecosystem

  • Fetch once, write many times

    As I've said before, these days I'm spending time working on Hadoop rather than SQL Server, and one of my colleagues today alerted me to a rather cool feature of Hive (in a nutshell, Hive is the technology that enables us to write SQL statements against Hadoop). Check this out:


    Pretty cool, no? From a single table scan I can summarise and insert into two destinations. It's the capability of an SSIS dataflow, but in SQL. If I ever make it back to the SQL Server world I'll be requesting this feature in T-SQL pronto.
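
    For illustration (the table and column names are invented, and submitting via beeline is just one way to run it), Hive's multi-table insert lets a single FROM clause feed several INSERTs:

    $hql = "FROM sales_staging s " +
           "INSERT OVERWRITE TABLE sales_by_store   SELECT s.store_id,   SUM(s.amount) GROUP BY s.store_id " +
           "INSERT OVERWRITE TABLE sales_by_product SELECT s.product_id, SUM(s.amount) GROUP BY s.product_id"
    beeline -u "jdbc:hive2://somehost:10000/default" -e $hql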


  • Tabular Data Packages and a BIML challenge

    Update: the challenge set below has been responded to by Scott Currie, who also happens to be the inventor of BIML. He has made his work available at

    In February 2004 I joined a company called Conchango and had the pleasure of working with some very very smart people (Mick Horne, Dan Perrin, Pete Spencer, Paul Mcmillan, Steve Wright amongst others) who schooled me in the ways and means of building data integration solutions. One piece of IP that Conchango had revolved around a means of transporting a collection of data files in a well-known format. We called it the "Generic Feed Interface" and it broke down like this:

    • Data was kept in CSV files. There was no limit to how many files were in a collection.
    • Each collection had two extra files, an inventory file and a metadata file. Both of these were also CSV files.
    • The inventory file listed all of the data files
    • The metadata file defined the data type of each column in each data file

    Here’s a very simple example of what this looked like, a collection consisting of one datafile:




    As well as this format for defining data, our tangible IP consisted of two DTS packages that could:

    • be pointed at a database and produce such a collection
    • be given a collection and import it into a specified database, creating target tables if required

    This was used to extract data from systems of record which could then later be imported elsewhere for additional processing (e.g. as a source for a data warehouse). We liked this approach because we were loosely-coupling our data warehouse from the systems-of-record, this brought benefits such as the ability to extract data from source at a different time from which it was loaded into the target. It was a great piece of IP and was used at a time which heralded a very successful period for us in the Microsoft BI space. Doubtless many of you reading this have invented and/or used similar mechanisms for moving data.

    It was with interest then that earlier this week I was alerted (by Matt Smith) to an effort to achieve the same called Tabular Data Package.


    Tabular Data Packages use a single JSON file for the same purpose that our Inventory and Metadata files served, but nevertheless it's the same basic concept: a well-known format for transporting data in human-readable files. Tabular Data Packages are advocated by the Open Data Institute; see Jeni Tennison's blog post on the subject, 2014: The Year of CSV.

    It occurs to me that it would be really useful to have a couple of SSIS packages that work with a Tabular Data Package in the same manner that our DTS packages worked with our data file collections all those years ago. Then again, using the SSIS object model to dynamically generate packages based on some known metadata (which is what would be required here) is notoriously difficult; better would be if there existed a code-generation tool for SSIS packages. Luckily such a thing exists: it's called BIML, it is free and it is becoming a very very popular means for developing SSIS solutions.

    Therefore I’m issuing a friendly challenge. Can anyone out there build a BIML script that can, given a database connection string, generate a Tabular Data Package containing all the data in that database? Furthermore, can someone also build a BIML script that can consume a Tabular Data Package and push all the data therein into a SQL Server database?

    The challenge is set. Is anyone game?


  • Why won’t you implement my little feature request, Microsoft?

    I’ve been an active member of the Microsoft developer community* for many years now and in that time one of my frustrations has been numerous futile attempts to get new features into SQL Server and SQL Server Integration Services. As of today, 27th March 2015, I’ve submitted 109 bugs and 289 suggestions


    to Microsoft Connect and while there are occasional successes (TOKEN function in SSIS sticks in my mind as one feature request that led directly to inclusion in the product**) the most common response is “Closed as won’t fix” (which is almost a meme in itself these days). I’m as guilty as anyone at getting irate at such responses.

    In the cold light of day I like to think I can be an empathetic kind of guy and can accept that getting a feature implemented in a behemoth product such as SQL Server perhaps is not as simple as a developer bashing out a few lines of code and committing to source control. With that in mind, today (thanks to Paul Stovell) I stumbled upon a very old blog post from Eric Lippert, How many Microsoft employees does it take to change a lightbulb?, which explains some of the contortions one has to go through to get a feature implemented in a Microsoft product; only one of these (the first one) actually involves writing the code.


    Now granted there is an argument to say they're making it more difficult than it needs to be, and 12 years on from that blog post shipping software is arguably easier than it was back then, but it's worth remembering that implementing features for millions of customers is likely a different category of problem from any that those of us who use these products have to deal with (the product I work on arguably counts its customers in tens rather than millions). Call me a Microsoft apologist if you like, you might be right, but I do think that some people should perhaps weigh up what their “5 minute fix” actually entails before they start ranting about “Closed as won't fix” on social media. I include myself in that.


    * assuming you consider SQL Server developers to be members of the Microsoft developer community. That is up for debate of course.

    **I remember Matt Masson telling me at the time that he implemented this on a plane journey home to Canada one day

  • Analysing SQLBlog using Power Query

    In December 2011 I wrote a blog post entitled Querying RSS feed subscriber count on Google Reader using Data Explorer in which I used a product called Data Explorer, which was in beta at the time, to find out how many people subscribed on Google Reader to various bloggers on sqlblog.com. Here were the results:

    It was a nice demo of what was possible using Data Explorer however it did have some limitations, mainly that I had to type in the list of bloggers and their URLs. It would have been much more elegant to have Data Explorer request a list of bloggers from sqlblog.com and iterate over that list, querying for information about each in turn.

    Times move on and today, 3 and a bit years later, Google Reader is defunct and Data Explorer has morphed into Power Query. It looks different and has lots of new features which make it much more conducive to garnering information from the web. I thought it might be time well spent (and kinda fun) to revisit my previous blog post and see if Power Query can collate more useful information regarding sqlblog.com than it could back then.


    If you can't be bothered reading the detail of this post then just take a look at this screenshot that I shared on Twitter earlier today of a Power BI dashboard that shows information pertaining to sqlblog.com:


    This dashboard shows:

    That screenshot is taken from the Power BI site, which is a place where Power BI reports (that are based on Power Query queries) can be stored and shared. Unfortunately they can only be shared with people whose email domain is the same as yours, so I can't share a link directly to the page, which is a real shame because Power BI reports are interactive and the benefit is much greater when one is able to interact with them. If you would like this situation to change (and you should) then go and vote for these requests:

    The real work here goes on in the Power Query query that retrieves the data that underpins these reports. If you’d like to know more about that, read on!

    The Power Query part

    Power Query is, to all intents and purposes, a draggy-droppy-pointy-clicky UI over a functional language called M. The M queries that I built to obtain the data depicted above I have shared in this GitHub gist: or you could download them as a Power BI Designer file: sqlblog.pbix. Whether you copy-and-paste the M code or you download the .pbix, you will need to download and install Power BI Designer in order to run the queries and see the resultant data.

    One caveat to running the queries: ensure you have created a Data Source setting on your machine for sqlblog.com; it should look just like this:


    Retrieving data from a website using Power Query is essentially a web scraping exercise. Here are the steps the M query basically goes through to retrieve this information:


      • For each link in the list of monthly archives
        • retrieve the monthly archive
        • for each blog post summary in the monthly archive
          • retrieve the title and number of comments

    That's pretty much it. The code looks a bit gnarly at first glance but all it's essentially doing is navigating the DOM and following a few hyperlinks.
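
    This is not the M query itself (that's what the gist and the .pbix contain); purely as a rough PowerShell illustration of the same outline, with the URL and link-matching patterns being guesses rather than the real ones:

    # grab the page that lists the monthly archives, then walk each archive page
    $landing = Invoke-WebRequest -Uri "http://sqlblog.com/blogs/jamie_thomson/default.aspx"
    $archiveLinks = $landing.Links | Where-Object { $_.href -match "/archive/\d{4}/\d{2}\.aspx$" }
    foreach ($archiveLink in $archiveLinks) {
        $archivePage = Invoke-WebRequest -Uri $archiveLink.href
        # each post summary links to the post; the comment count sits in the surrounding
        # summary text and needs a further bit of parsing
        $archivePage.Links |
            Where-Object { $_.href -match "/archive/\d{4}/\d{2}/\d{2}/" } |
            ForEach-Object { [pscustomobject]@{ Title = $_.innerText; Url = $_.href } }
    }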

    The reports

    The reports are Power View reports. Power View is fairly intuitive so I don't think it's worth talking about it here too much. I do however think it is worth showing some examples of the insights that Power View can provide on top of this data.

    How many blog posts and how many comments on SQLBlog


    Slicing by blogger

    Charting all of the blog posts and ordering by number of comments is useful, that’s how we see that Marco Russo’s post DateTool dimension: an alternative Time Intelligence implementation is the most popular:


    However if we introduce a slicer we can analyse a single blogger’s blog posts, let’s take Aaron Bertrand for example seeing as he has posted the most:


    Aaron’s most popular blog post (going by the number of comments) is his ditty from October 2010 Fun with software : uninstalling SQL Server 2008 R2 Evaluation Edition.

    How many bloggers ARE there on sqlblog.com?



    Taking it further

    There is a heck of a lot of other data available on sqlblog.com should one want to go and fetch it, and Power Query makes it really easy to do that. It would be interesting, for example, to chart the tally of blog posts per month and see when “peak blog month” occurred. I'll leave that as an exercise for the reader.


  • Implementing a build and deploy pipeline for Cronacle

    In my new role I am working on a system that makes heavy use of Redwood Software’s Cronacle. If you hit that link you’ll find a lot of marketing mumbo jumbo about Cronacle such as:

    • Cronacle gives you greater process consistency and quality
    • speed up your mission critical IT and business processes by as much as 90%

    Read into that what you will; to me Cronacle is a scheduler and orchestration tool. That might come across as slightly belittling but it's not intended to be: in my (limited) experience Cronacle does a couple of things and does them very well (certainly in comparison to the tool I'm more familiar with, SSIS; that's a comparison I hope to explain more fully in a future blog post).

    In the aforementioned post New year, new beginnings I said

    My aim is to instil my beliefs about continuous integration, unit testing, failing fast, on-demand provisioning into my new team

    which, to use a buzzword du jour, might well have been written as “My aim is to instil a devops culture into my team”. First step has been to integrate Cronacle into our build/deploy pipeline and in this post I’d like to describe how one goes about doing that.

    Scene setting

    We are using Cronacle v9. We are developing Job Chains and Job Definitions (aka Process Chains and Process Definitions) and it is those objects that we want to:

    1. store in a version control system (aka source control)
    2. compose into a single deployable artefact as part of a Continuous Integration (CI) build
    3. deploy to our various dev, test and prod environments in a repeatable fashion

    Storing in VCS

    The first pre-requisite to implementing a successful devops culture (in my opinion) is to use a version control system (VCS). I’m not fussy about which VCS one is using, as long as one uses one of them. We are using Subversion. My only stipulation is that no code in our build and deployment pipeline relies on us using Subversion because I don’t want to be tightly coupled to a particular technology.

    Typically one stores human-readable files in a VCS and I was keen that we did the same for our Cronacle Job Chains and Job Definitions. Cronacle enables one to export objects:


    when selecting that option a binary file with a .car extension is downloaded:


    .car file is shorthand for “Cronacle Jar”. Jar files are (as I understand them) used in the Java ecosystem as a mechanism for distributing stuff; they're basically just .zip files with a different extension. Hence, get hold of some compression software that understands the .zip format (I recommend 7zip*), unzip your .car file, and in the unzipped folder you'll find a JobDefinition folder containing an XML file that defines the object that was exported:


    That XML file is the human-readable file that we check in to our VCS.


    We use TeamCity on which to run our CI builds (it's my first time using TeamCity in anger and I have become an unabashed fan) in combination with an internally-built build automation framework called PSP.Build that is built atop PowerShell and Psake. PSP.Build is loosely based on Lloyd Holman's OneBuild and follows the principles that Lloyd sets out at Run the same build process everywhere. We also use Pester for unit testing our PowerShell code - I'm loving me some Pester.

    Building for Cronacle means reconstituting those XML files that we have in Subversion into a deployable .car file. The easiest way to do that is simply to zip them up for which, again, we are using 7zip. In addition there are some other files that need to be bundled into the .car file, one can see those files in the aforementioned .car file that is used for exporting from Cronacle.


    You can simply take a copy of those files; they don't need changing. (We bundle them inside PSP.Build so that we always have them available to our build process.)

    So in summary, in order to build a .car file you need three things:

    • XML files that define the Cronacle objects
    • A zip utility
    • com and META-INF folders

    Once you have them it just takes a bit of PowerShell to bundle them up. Here’s a direct copy of our code that does that:

    Remove-CarFiles -sourcePath $sourcePath #Simply deletes all the .car files it finds underneath $sourcePath

    Get-ChildItem -path "$basePath" -Recurse | Where-Object {$_.Name -eq "CronacleCarIncludes"} |
           Get-ChildItem | Copy-Item -Destination "$sourcePath\$nameofFolderToBeZippedUp" -Recurse -Force #fetches "com" & "META-INF" folders

    $zipperExe = Get-ChildItem -Path $basePath -Include "7za.exe" -Recurse |
           Select-Object -First 1 | foreach {$_.FullName} #find 7zip, which is bundled in our build framework, PSP.Build
    if ($zipperExe -eq $null) {throw "7za.exe could not be found in $basepath"}

    "Creating .car for sourcePath: $sourcePath\$nameofFolderToBeZippedUp" | Write-Verbose
    Push-Location "$sourcePath\$nameofFolderToBeZippedUp"
    & $zipperExe a -tzip "$targetCarFilePath" * #zip it all up! ($targetCarFilePath = full path of the .car to create; the variable name here is illustrative)


    [I'm slightly reluctant to show my PowerShell code in public as it nearly always gets criticised, so be gentle OK :) ]


    So, the output from building all our Cronacle objects is a .car file. How do we deploy that .car file to our Cronacle cluster? It's actually quite simple: we execute our .car file using java.exe and pass in a few parameters. The Cronacle documentation informs us of the generic form of the command and gives an example:


    (This documentation does not appear online anywhere so I can't link to it, I'm afraid.)
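
    Reconstructed from the arguments that the script below builds (rather than quoted from the documentation), the command takes roughly this shape; the .car name is a placeholder:

    & $javaExeFilePath -jar OurCronacleObjects.car -server $server -txt -ruleset $importRuleset -username $username -password $password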

    The smarts to say what to do with all the Cronacle object definitions is within that com folder that was mentioned above, hence it's imperative that that com folder is included in the .car file.

    Again, we wrote a load of PowerShell code to automate the process of building up the command-line and executing it, here’s that code if you’re interested:

    function Invoke-DeployCronacleCarFile {
        param (
            # parameters reconstructed from how they are used below; the types are assumptions
            [System.IO.FileInfo]$carFilePath,
            [string]$javaExeFilePath,
            [string]$server,
            [string]$importRuleset,
            [string]$username,
            [string]$password
        )
        try {
            if ($carFilePath.Extension -ne ".car") {throw "File: '$carFilePath' is invalid. File must have a .car extension."}
            if (-not (Test-Path $carFilePath)) {throw "File: '$carFilePath' does not exist."}

            $cmd = "$javaExeFilePath"
            $arguments = '-jar',"$carFilePath", '-server', "$server", '-txt','-ruleset',"$importRuleset"

            if (-not ($username -eq $null -or $username -eq "")){$arguments += '-username', "$username"}
            "cmd: $cmd" | Write-Verbose
            "arguments: $arguments" | Write-Verbose
            if (-not ($password -eq $null -or $password -eq "")){$arguments += "-password","$password"}
            Invoke-CommandLine -cmd $cmd -arguments $arguments
        }
        catch {
            throw    # catch body not shown here; a simple re-throw keeps failures visible
        }
    }

    function Invoke-CommandLine ([string]$cmd, [string[]]$arguments, [string]$errorMessage = "Error executing command: " + $cmd) {
        & $cmd @arguments #| write-host
        if ($LastExitCode -ne 0) {
            throw $errorMessage
        }
    }

    That may look slightly daunting but there’s not too much going on here. It takes the path to a .car file, the path to java.exe, server, username, password and an ImportRuleSet**, concatenates things together into a bunch of arguments, then sends it to Invoke-CommandLine to be executed.


    That's pretty much it. The nuts and bolts of this are relatively simple: you're simply zipping your definitions up and deploying that zip file using java.exe.

    Hope this helps.


    * We bundle 7zip inside PSP.Build so we don't have to rely on the actor who kicks off the build (which would be TeamCity or one of our team members) having it installed - I'm pretty fierce about reducing external dependencies for our build/deploy pipeline. If it's a pre-requisite then it gets bundled into PSP.Build.

    ** An ImportRuleSet defines environment-specific settings. A discussion of ImportRuleSets is outside the scope of this article however if you’ve come this far you’ll probably already know whether you need to be using them or not.

  • Test for a warning message using Pester

    Here's a little ditty that's worth throwing out (if nothing else so that I can find it later). If you're not using Pester to test your PowerShell code then this post probably isn't for you; if you're not using PowerShell, stop reading now.

    I wanted to write a Pester test that tested whether or not a piece of code threw a warning. I discovered from Understanding Streams, Redirection, and Write-Host in PowerShell that it is possible to redirect stuff in the PowerShell Warning (or Verbose or Debug) stream to the output stream. Here's an example of doing just that:

    "Here is a message that I will send to the warning stream then redirect back to the output stream" | Write-Warning 3>&1 | %{$_.message}


    The “3>&1” part says “take the contents of stream #3 (i.e. the Warning stream) and send it to stream #1 (i.e. the Output stream)”. Once it's in the output stream you can operate upon it as normal.

    In my Pester test I wanted to know whether or not a particular piece of code returned a warning matching a given string. Using the above technique it's simply:

    (Some-Code-That-Is-Getting-tested 3>&1) -match "Desired warning message" | Should Be $true
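
    To make that concrete, here's a minimal self-contained sketch; the function and the warning text are made up for the example:

    function Invoke-Widget {
        Write-Warning "Invoke-Widget is deprecated"
    }

    Describe "Invoke-Widget" {
        It "emits a deprecation warning" {
            # 3>&1 pushes the warning down the output stream so that -match can see it
            ((Invoke-Widget 3>&1) -match "deprecated") | Should Be $true
        }
    }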

    Cool stuff! I love Pester.


  • New OneDrive API, possibly unified with OneDrive for Business

    Microsoft offers two similar, but separate, services containing the OneDrive moniker. There’s OneDrive (which is free) and OneDrive for Business (which isn’t). Strangely (as my ex-colleague Mark Wilson points out at OneDrive for Business: lots of cloud storage; terrible sync client) the free offering is much better than the paid-for one.

    Microsoft state in their blog post Taking the Next Step in Sync for OneDrive on 7th January 2015:

    It was clear that the right approach was to converge to a single sync engine and experience that would be able to provide all of the benefits of the consumer and business service to all customers faster. We decided to start with the consumer sync engine foundation from Windows 7 and Windows 8 and add the right capabilities from the other two engines.

    That’s Microsoft-speak for “Yes, OneDrive for Business is shit so we’re dumping it in favour of the OneDrive stuff”. Good news.

    Back in October 2014 Microsoft announced the availability of a Files API that gave developers the ability to access files stored in Office 365:

    we’re enhancing the opportunity for developers with Office 365, with new APIs for mail, files, calendar and contacts. These new robust REST-based APIs empower all developers to leverage the more than 400 petabytes of data (as of March 2014) and popular services across Office 365 in any application.

    New Office 365 extensibility for Windows, iOS, Android and web developers

    I asked a couple of the guys that worked on the API whether the Files API would work against OneDrive and/or OneDrive for Business. The answer from Chris Johnson was “working on it”:


    That sounded like good news too.

    Yesterday Microsoft announced the availability of The new OneDrive API, which is a RESTful API. Given that they are attempting to unify OneDrive and OneDrive for Business, and also given Chris Johnson's response to my question in October, I wondered whether this API would cover OneDrive for Business as well, so I decided to check out the new OneDrive API using the API console and found something quite promising. A request to the endpoint that GETs a user's default drive returns:


    Note the mention of

    "driveType": "consumer"

    That says to me that there will be other driveTypes and so I assume that this API will be used to access OneDrive for Business stuff also. More good news if that’s the case. I tweeted that question but at the time of writing haven’t had an answer.

    Some questions still remain in my mind:

    • What's the difference between this new OneDrive API and the Files API announced back in October? Do we have two teams building essentially the same thing? That sounds like the Microsoft of old, so I do hope not.
    • Are we ever going to get a RESTful API that will enable us to push data into an Excel workbook that is stored on OneDrive? Chris Webb and I have been asking for this for five years now. Again I asked the question on Twitter and am still hoping for a response. I do have high hopes that the new Power BI REST API will sate my needs here (right now I wouldn’t know because if one isn’t in the US then one cannot access it).


  • Response to “SSDT limitations”

    A commenter on my blog post Considerations when starting a new SSDT database project asked me to comment on his blog post SSDT Limitations. I wrote a response but when I tried to post it I got told it was too long:


    so I’m posting my response here instead. I’d suggest taking a read of the commenter’s post before reading on.

    "SSDT does not handle multiple file groups"
    I haven't experienced this. I'm pretty sure I've used SSDT to deploy multiple filegroups. Can you clarify what is meant by "does not handle"?

    "SSDT wants to control everything"
    I'd restate that as "SSDT gives you the option to control everything". If you don't want to put all your database objects into your SSDT project, then don't.

    "Production will not look like UAT. For SSDT, everything looks the same."
    Yes, this is a problem. SSDT espouses a mantra of "build once, deploy anywhere" but this breaks down if you want something to be different on each environment - security is the obvious thing here. My approach is to specify roles in the database project that have permissions assigned to them and then later (perhaps in a post-deployment script) add users into those roles. I describe this more fully here: A strategy for managing security for different environments using the Database Development Tools in Visual Studio 2010

    "This means storing in version control production:
    -Logins & passwords
    -app role passwords
    If you're storing passwords in source control then that must mean you're using SQL authentication instead of Windows authentication - that's your decision and I don't see how that is a shortcoming of SSDT. If you don't want to store them in source control - don't. SSDT has mechanisms that allow you to specify values at deployment time (search for SSDT sqlcmd variables). Bottom line, if you're storing passwords in source control then more fool you - nothing in SSDT forces you to do this.

    "It is generally preferred to at least have the option to inspect what will be deployed"
    Correct. And that is why (as you pointed out) SSDT has "the option of either publishing directly to the database or generating a script to run for the deployment."

    "Unfortunately for SSDT the upgrade generates a single file. In practice this means is that if the database upgrade fails for any reason recovery is rather complex. You'll need to work out where the upgrade script failed and then what action you will take from there."
    True. There are things you can do to mitigate this:
    -Test your deployments on a copy of your production database first
    -Do small, frequent releases rather than the big bang approach of infrequently deploying large, monolithic sets of lots of changes

    "The product feels rather like a beta release, bringing even a simple project into production will expose some pretty serious limitation."
    Beta release (to me) implies that it doesn't behave the way it's been designed to. I put it to you that it behaves exactly the way it's been designed to; it just so happens that you don't like that behaviour. Fair enough if that's the case, you're not alone in that regard. There are other options for doing deployments if you don't like the way that SSDT espouses. Perhaps look at migrations, or the old-fashioned method of manually writing change scripts.

    "However the largest issue is more fundamental: the approach where you are expected to trust the magic box to generate a valid upgrade script."
    Yes, you do have to put your trust in SSDT. If you're not happy doing so, don't do it, or just use the option to generate the deployment script and not run it. I personally am happier putting my trust in a repeatable algorithm rather than human beings who tend to be error prone. That's just me, your mileage may vary.

    "It is very easy to generate scripts that will fail to upgrade the database successfully."
    There are certain changes that SSDT is not easily able to cater for (e.g. adding a non-nullable column to a table that contains data) however I'm not sure that constitutes "very easy to generate failing scripts". Personally I think SSDT is better at generating working deployment scripts than a human being is, because it's quicker and less error prone. As I've alluded to, it's imperative that you test your deployments before pushing to a production database.

    You asked me to comment and I have done. Clearly we have a difference of opinion about SSDT and that's OK. I hope you can find a way around your complaints because I honestly believe SSDT is the best tool out there for managing SQL Server deployments - I hope you come to the same conclusion one day.


  • Dumb querying, Impala vs Hive

    I alluded yesterday that I wouldn’t be posting much on this blog for the foreseeable future but I’ve come across something today that was quite interesting so I’m already back.

    I’m using Hadoop and using Hive (version 0.12) to query the contents of some tables. Here’s the output from issuing “SELECT COUNT(*) FROM tablename”


    Your eyes don't deceive you, that's 223 seconds to determine that this table has 0 rows in it. That staggered me. A colleague suggested I try the same query using Impala instead:


    Same result, but this time it came back in under 1 second.

    What's the moral here? I'm not really sure. I won't go out on a limb and say “Impala is quicker” because all I've proved is that it's quicker to count the number of rows in an empty table, which is a fairly pointless exercise; Hive is designed for running operations in a batch manner over massive data sets. Nonetheless I think it's interesting and I suspect some of my buddies in the SQL Server community (whom I suspect will be the only ones reading this) will find it interesting also. Note that Hive v0.12 is old (it came out in October 2013) and also that the Stinger initiative is in full flow, which promises much much faster response times to Hive queries like these:



  • New Year, new beginnings

    2014 was a year of great change in my personal life and 2015 has begun with a significant change in my work life: I have decided to end my 5-year sojourn into the world of freelancing by taking up a permanent role at Dunnhumby. This is a significant move for me and I wanted to explain why I'd done it (a number of folks have been asking) and to note the implications.

    As I alluded above, 2014 was a very turbulent year and one that I am glad to put behind me. My experiences in that year have caused me to re-evaluate my priorities in life and I have taken the decision to “go permie” because it means I can spend more time with my daughters, Bonnie and Nicki; Dunnhumby's office is not a long commute which means I can be home by 1730 every night – anyone who works in and around London will tell you that that is virtually unheard of, and for me it more than justifies losing the obvious benefits of being a freelancer.

    Working at Dunnhumby is a great move for me too. For 15 years I've built a career in the ETL/Data Integration space where the systems I worked on were all about deriving value from data yet were secondary to the client's core business. At Dunnhumby data integration is the business, and that excites me; I'm going to be Lead Developer on a core system that feeds data to some of the world's largest retailers (with a little research you can probably guess one of them) so there is a tangible opportunity to contribute to the bottom line. Moreover, it's refreshing to go into a meeting where everyone is already talking your language; when shifting, aggregating and managing data is what the company does, you invariably find that everyone you speak to has more than a passing appreciation of what you do – that's a welcome change for me. Can you name another company where someone in HR knows what ETL stands for?

    There will be opportunities to push our solution into the cloud, and anyone who knows me in a work capacity knows that that is an area that excites me. My first aim is to instil my beliefs about continuous integration, unit testing, failing fast, on-demand provisioning and other associated buzzwords into my new team and then lead them on a journey to a truly world class solution. Check back in a few years to see how I did!

    One negative aspect of the new role is that my 15-year association with SQL Server has reached a hiatus. The system I'm working on is built on Hadoop, there's no SQL Server in sight, and that saddens me. I've loved every minute of my involvement with SQL Server and have met some wonderful people on that journey, some of whom have become very close friends. On the positive side my SQL Server experience is not going to go to waste as I'll still be writing SQL day-in, day-out (on Hive and Impala) and the principles that I stand by when I build ETL solutions apply equally in a non-SQL Server landscape. I'm looking forward to learning more about how the other half live.

    Blogging about SQL Server and associated interests has been an intrinsic part of my life for the last ten years and in many ways has come to define who I am; I wonder now how it will continue. I admit that my desire to blog has waned somewhat since the heady days of 2006 and 2007 when I was averaging almost one blog post every two days but I'm hoping I still have that spark that persuades me to commit my learnings in writing from time to time. I'll also keep answering questions on the SSDT forum as I'm still a massive fan of that toolset and believe that its differentiating value to SQL Server hasn't been realised as it should be.

    Not much else to say on this subject really. See you on the other side. Or on Twitter, you can’t keep me quiet on there.


  • Parallel Foreach loops, one reason to use Powershell Workflow instead of SSIS

    Lately I've been using Azure Automation, which is a service on Azure for running PowerShell Workflow scripts, and as such I've been diving deep into PowerShell Workflow. What's PowerShell Workflow? I'm glad you asked:

    A workflow is a sequence of programmed, connected steps that perform long-running tasks or require the coordination of multiple steps across multiple devices or managed nodes. Windows PowerShell Workflow lets IT pros and developers author sequences of multi-device management activities, or single tasks within a workflow, as workflows. By design, workflows can be long-running, repeatable, frequent, parallelizable, interruptible, stoppable, and restartable. They can be suspended and resumed; they can also continue after an unexpected interruption, such as a network outage or computer restart.
    Getting Started with Windows PowerShell Workflow

    So PowerShell Workflow does exactly what it says on the tin: it's a workflow engine. Those of you that, like me, have been knee deep in SQL Server Integration Services (SSIS) for most of the past decade will know that SSIS too has its own workflow engine – it's got a different name, Control Flow, but it's still a workflow engine. One frequent ask of SSIS's Control Flow is a Parallel ForEach Loop but there seems little hope that we're going to get one (raised on Connect in March 2014, closed as Won't Fix 5 months later) without going to 3rd party solutions like Cozyroc's Parallel Loop Task.

    As the title of this post has already told you, PowerShell Workflow has a big advantage over SSIS: it includes a Parallel Foreach Loop out-of-the-box and you can read all about it at about_Foreach-Parallel. Or type “help about_Foreach-Parallel” into your nearest PowerShell window. From there:

    The Parallel parameter of the ForEach keyword runs the commands in a ForEach script block once for each item in a specified collection.
    The items in the collection, such as a disk in a collection of disks, are processed in parallel. The commands in the script block run sequentially on each item in the collection.

    That's fairly self-explanatory. If you're interested to know more I've put together a cool little PowerShell Workflow demo that makes calls to a URI, first serially (using a conventional Foreach Loop) then parallelly (which isn't, I'm pretty sure, a real word but I'm using it anyway) and hosted it as a GitHub gist: Parallel-vs-Serial.ps1. You can simply copy-paste the script into PowerShell ISE, hit F5 and you're away. On my machine the serial calls completed in 19 seconds, the parallel calls in 13 seconds. No brainer!
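
    If you just want the shape of foreach -parallel without opening the gist, here's a minimal sketch (the workflow name is made up and Start-Sleep stands in for the slow call to a URI):

    workflow Test-ParallelForeach {
        param([string[]]$Items)
        foreach -parallel ($item in $Items) {
            # each iteration runs concurrently; Start-Sleep simulates a slow web call
            Start-Sleep -Seconds 2
            "$item done"
        }
    }

    Measure-Command { Test-ParallelForeach -Items 'a','b','c','d','e' }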

    Take a look if you have 5 minutes. This is good stuff.



  • Learn from me about SSDT in London, February 2015

    Microsoft released SQL Server Data Tools (SSDT) along with SQL Server 2012 in the Spring of 2012. Since then I’ve noticed an upward tick in both the number of organisations that are using SSDT and the number of questions that are getting asked about it on forums. There is some confusion about what SSDT actually is (Microsoft hasn’t helped there), why people should be using SSDT and, most importantly, how to make best use of it. If you want to know more then a good place to start is my blog series 10 days of SSDT or Get to Know SQL Server 2012's SQL Server Data Tools on

    It's clear that people want to learn more about SSDT, so if those articles don't satiate you, know that I have joined forces with Technitrain to offer a 2-day training course in London in February 2015 called Introduction to SQL Server Data Tools.



    The course will cover:

    Day 1

    • SSDT Positioning
    • IDE Tour
    • Connected Database Development
    • Declarative, offline, database development
    • Publishing
    • Continuous Integration (CI) and Continuous Deployment

    Day 2

    • Data Publishing
    • Refactoring
    • Database unit testing
    • References and composite projects
    • Database Drift
    • Code analysis

    If this sounds like your bag then please sign up on the Technitrain website. I ran this course for Technitrain in March 2014 after which 100% rated the trainer and the course content as outstanding. 100% also rated the course overall as outstanding.


  • Archiving Azure Automation logs to Azure BLOB Storage

    For the past few months at work I’ve been diving headlong into a service on Microsoft Azure called Azure Automation which is, to all intents and purposes, a service for hosting and running Powershell scripts (properly termed “runbooks”). I’ve got a lot to say and a lot to share about Azure Automation and this is the first such post.

    Each time Azure Automation runs a runbook (a running instance of a runbook is termed a job) it stores a log of that job, however that log isn't readily available as a log file on a file system as one might be used to. There are two ways to access the log: either via the Azure Management Portal or programmatically using PowerShell cmdlets. Both are handy, however I wanted the logs available as a file in Azure BLOB Storage so that they could be easily viewed and archived (in the future we'll be building Hive tables over the top of the logs – remind me to blog about that later once it's done). I have written a runbook called Move-AzureAutomationRunbookLogs that will archive logs for a given runbook and for a given time period into a given Azure BLOB Storage account, and I've made it available on the Azure Automation gallery:


    It does pretty much what it says on the tin so if you’re interested hit the link and check it out. For those that are simply interested in reading the code I’ve provided a screenshot of it below too (suggestions for how to copy-paste Powershell code into Live Writer so that it keeps its formatting are welcome!!). There’s a screenshot of the output too if that’s the sort of thing that floats your boat.
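
    The gallery item is the place to get the real thing; what follows is only a rough sketch of the approach, using the ASM-era Azure module cmdlets, and every variable here is a placeholder:

    $ctx = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey
    $jobs = Get-AzureAutomationJob -AutomationAccountName $automationAccountName -RunbookName $runbookName |
            Where-Object { $_.StartTime -ge $fromDate }

    foreach ($job in $jobs) {
        # a short summary of the job itself goes under <root>/job/<runbook>/
        $summaryFile = Join-Path $env:TEMP "$($job.Id).txt"
        $job | Out-File $summaryFile
        Set-AzureStorageBlobContent -File $summaryFile -Container $containerName `
            -Blob "$rootFolder/job/$runbookName/$($job.Id).txt" -Context $ctx -Force

        # the actual output emitted by the run goes under <root>/joboutput/<runbook>/
        $outputFile = Join-Path $env:TEMP "$($job.Id)_output.txt"
        Get-AzureAutomationJobOutput -AutomationAccountName $automationAccountName -Id $job.Id -Stream Any |
            Out-File $outputFile
        Set-AzureStorageBlobContent -File $outputFile -Container $containerName `
            -Blob "$rootFolder/joboutput/$runbookName/$($job.Id).txt" -Context $ctx -Force
    }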

    After this archiving runbook has run you’ll find two folders under the root folder that you specified:


    and inside each of those is a folder for each runbook whose logs you are collecting (the folder contains a file per job). The “job” folder contains a short summary of each job however the “joboutput” folder contains the good stuff – the actual output from the run.

    One last point: if you're going to use this runbook I highly recommend doing so in conjunction with a technique outlined by Eamon O'Reilly at Monitoring Azure Services and External Systems with Azure Automation. Eamon's post describes a technique for running operations such as these on a schedule and it's something that I've found to be very useful indeed.

    Hope this is useful!



