SQLblog.com - The SQL Server blog spot on the web

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

  • Prompt for a password with a mask using Powershell

    Here’s some code that I absolutely know I’m going to need again in the future, what better place to put it than on my blog!

    If you need to prompt the user for a password when using Powershell then you want to make sure that the value typed in isn't visible on the screen. That's quite easy using the -AsSecureString parameter of the Read-Host cmdlet; however, it's not quite so easy to retrieve the supplied value. The following code shows how to do it:

    $response = Read-Host "What's your password?" -AsSecureString
    $password = [Runtime.InteropServices.Marshal]::PtrToStringAuto([Runtime.InteropServices.Marshal]::SecureStringToBSTR($response))

    I don’t know of a quick and easy way to format Powershell code for a blog post so here’s a screenshot instead:

    [screenshot of the code above]

    I’ve also put this on pastebin: http://pastebin.com/2D6xaz0U

    All credit goes to Paul Williams for his post Converting System.Security.SecureString to String (in PowerShell)
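
    Building on that, here's a slightly fuller sketch of my own (not part of the original snippet) that also zeroes and frees the unmanaged BSTR once the plain-text value has been copied out, which is generally good hygiene when unwrapping a SecureString:

    # Prompt with a masked input
    $response = Read-Host "What's your password?" -AsSecureString

    # Copy the SecureString into an unmanaged BSTR, read it back as a .NET string,
    # then zero and free the unmanaged memory
    $bstr = [Runtime.InteropServices.Marshal]::SecureStringToBSTR($response)
    try {
        $password = [Runtime.InteropServices.Marshal]::PtrToStringAuto($bstr)
    }
    finally {
        [Runtime.InteropServices.Marshal]::ZeroFreeBSTR($bstr)
    }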

    @Jamiet

  • Azure Automation – the beginning of cloud ETL on Azure?

    I have maintained a watching brief on what I refer to as “cloud ETL”, that is the ability to build ETL routines in a cloud environment and therefore leverage all the benefits that the cloud model brings*. Thus far my main opinion piece in this area is What would a cloud-based ETL tool look like? in which I laid out what features I thought a cloud ETL tool should have:

    • Data transformation would be done “in the cloud” i.e. I wouldn’t need to own my own hardware in order to run it
    • Ability to consume data from/push data to <many different data protocols>
    • Adapters (possibly with a plug-in model) for cloud storage and API providers
    • Job scheduler
    • Workflow. (e.g. Do this, then do that. Do these things in parallel. Only do this if some condition is true. Restart from here in case of failure.)
    • An IDE (open to debate whether the IDE should be “in the cloud” as well)
    • Ability to carry out common transformations (join, aggregate, sort, projection) on those heterogeneous data sources
    • Ability to authenticate using different authentication mechanisms
    • Configurable logging
    • Ability to publish transformed data in a manner that makes it consumable rather than insert it into another data store

    http://sqlblog.com/blogs/jamie_thomson/archive/2013/02/12/what-would-a-cloud-based-etl-tool-look-like.aspx

    Given that I have spent the majority of my career working with Microsoft technologies (in particular their ETL tool, SSIS) I am interested to know whether Microsoft will offer a cloud ETL tool. With that in mind I was interested to discover a new service on Azure that is currently in preview called Azure Automation (read Announcing Microsoft Azure Automation Preview). Azure Automation is essentially a cloud-based workflow tool and, as I said above, workflow is a feature that I believe a cloud-based ETL tool should encompass:

    • Workflow. (e.g. Do this, then do that. Do these things in parallel. Only do this if some condition is true. Restart from here in case of failure.)

    SSIS developers will of course be aware that SSIS has its own workflow tool (termed the Control Flow). It always kind of bugged me that different Microsoft tools had their own workflow technology. SSIS had one, I believe BizTalk had one, there was another called Windows Workflow Foundation (WWF) and in fact there was a possibility within the SQL Server 2008 timeframe that SSIS would replace its Control Flow with WWF (that never happened and the Program Manager that wanted to do it has since left the SSIS product team).

    Azure Automation is built upon Powershell Workflow which in turn is built upon WWF (now simply called Workflow Foundation – WF). It certainly seems as though WF is becoming the foundational workflow technology to rule them all within Microsoft and that is no bad thing in my opinion – it seems foolish to reinvent the wheel every time. Powershell Workflow has the following keywords and activities for building workflows:

    • Workflow
    • Parallel
    • ForEach -Parallel
    • Sequence
    • InlineScript
    • Checkpoint-Workflow
    • Suspend-Workflow

    Those are all fairly self-explanatory. Of particular interest to me is ForEach -Parallel (we’ve been asking for a native Parallel ForEach Loop in SSIS for years) and that might be even more useful in a scale-out infrastructure such as can be offered by the cloud (imagine firing off multiple FTP tasks in parallel, all working on different Azure nodes). Checkpoint-Workflow also sounds very interesting:

    A checkpoint is a snapshot of the current state of the workflow, including the current values of variables, and any output generated up to that point, and it saves it to disk. You can add multiple checkpoints to a workflow by using different checkpoint techniques. Windows PowerShell automatically uses the data in newest checkpoint for the workflow to recover and resume the workflow if the workflow is interrupted, intentionally or unintentionally.
    http://technet.microsoft.com/en-us/library/jj574114.aspx

    Stateful restartability that you can control, all out-of-the-box. How cool is that? So much better than the awful checkpointing feature within SSIS.
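
    To make the syntax concrete, here is a minimal sketch of my own (not taken from the Azure Automation documentation) of a PowerShell workflow using the keywords listed above; the workflow name, file names and the work done inside InlineScript are purely illustrative:

    workflow Copy-StagingFiles
    {
        param([string[]]$FileNames)

        # Process each file in parallel rather than one after the other
        foreach -parallel ($file in $FileNames)
        {
            InlineScript
            {
                # Placeholder for real work, e.g. an FTP transfer or a data copy
                $f = $Using:file
                Write-Output "Processing $f"
            }
        }

        # Persist the workflow state to disk; if the workflow is interrupted it
        # can resume from here rather than starting again from scratch
        Checkpoint-Workflow

        Sequence
        {
            Write-Output "All files processed"
        }
    }

    # Invoke it like any other cmdlet
    Copy-StagingFiles -FileNames @('FactInternetSales.csv','DimCustomer.csv')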

    It certainly appears to me that Azure Automation could satisfy my desire for a workflow engine for the purposes of cloud-ETL. Now if only Microsoft were working on cloud-based dataflows too we’d have something akin to SSIS-in-the-cloud ;-)

    @Jamiet

    *My own personal opinion is that the benefits of the cloud model can be summed up simply as “OPEX not CAPEX”. You may have your own definition, and that’s OK.

  • Restart Framework added to SSIS Reporting Pack

    On 31st March 2014 I released version 1.2.0.0 of SSIS Reporting Pack, my open source project that aims to enhance the SSIS Catalog that was introduced in SSIS 2012. This is a big release because it includes an entirely new feature - the Restart Framework.

    Introduction

    The Restart Framework exists to cater for a deficiency within SSIS, that being the poor support for restartability. Let's define what I mean by restartability:

    An SSIS execution that fails should, when re-executed, have the ability to start from the previous point of failure.

    SSIS provides a feature called checkpoint files that is intended to help in this scenario but I am of the opinion that checkpoint files are an inadequate solution to the problem; I explain why in my blog post Why I don't use SSIS checkpoint files.
    The Restart Framework was designed to overcome the many shortcomings of checkpoint files.

    One of the fundamental tenets of the Restart Framework is that the packages that you, the developer, build for your solution should not be required to contain any variables, parameters, tasks, or event handlers in order to make them work with the Restart Framework. In fact your packages should be agnostic of the fact that they are being executed by the Restart Framework.

    TL;DR: A video that demonstrates the installation and base functionality of the Restart Framework can be viewed at https://www.youtube.com/watch?v=syV0Wpwhlnk

    Terminology

    Let's define some important terms that you will need to become familiar with if you are going to use the Restart Framework.

    ETLJob

    An ETLJob is the definition of some work that an end-to-end ETL process needs to perform. An ETLJob would typically incorporate many SSIS packages. Each ETLJob has a name (termed ETLJobName) which can be any value you want, some example ETLJobNames might include:
    • Nightly Data Warehouse Load
    • Monthly Reconciliation
    • All backups

    ETLJobStage

    Each ETLJob contains one or more ETLJobStages. These are the "building blocks" of your solution and for each ETLJobStage there must exist a package in your SSIS project with a matching name. For example, an ETLJobStage with the name "FactInternetSales" will require an SSIS package called "FactInternetSales.dtsx".

    The Restart Framework allows the declaration of dependencies between ETLJobStages - an ETLJobStage cannot start until all ETLJobStages with a lower ETLJobStageOrder have completed successfully. This is a fundamental tenet of the Restart Framework as it needs to know the order in which ETLJobStages need to occur in order that it can restart execution from the previous point of failure.

    The Restart Framework provides some stored procedures that should be used to define ETLJobs, ETLJobStages and the dependencies between them.

    One important point to make about ETLJobStages is that the Restart Framework only supports restartability of a failed ETLJobStage, the Restart Framework has no control (and, indeed, does not care) what occurs within that ETLJobStage. The implication therefore is that the onus is on the package developer to ensure that each ETLJobStage is re-runnable from the start of that package in the event of failure; in other words an ETLJobStage must be idempotent.

    ETLJobHistory

    Each time an ETLJob is executed a record is inserted into a table called ETLJobHistory and a unique ETLJobHistoryId is assigned. Crucially, when a previously-failed ETLJob is restarted it retains the same ETLJobHistoryId; compare this to SSIS' own execution_id which will be different whenever an ETLJob is restarted.

    The ETLJobHistoryId can be particularly useful when used for lineage purposes in a data warehouse loading routine. Every inserted or updated record can have the ETLJobHistoryId stored against it which is useful for providing lineage information such as when the record was inserted/updated.
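
    As a hedged illustration of that lineage idea (the table, column and staging-schema names below are invented, and in a real package the id would be passed in by the framework rather than hard-coded), a load statement might stamp each row like this:

    -- Hypothetical example: stamp every loaded row with the ETLJobHistoryId
    DECLARE @ETLJobHistoryId INT = 12345;

    INSERT INTO dbo.FactInternetSales (ProductKey, OrderDateKey, SalesAmount, ETLJobHistoryId)
    SELECT ProductKey, OrderDateKey, SalesAmount, @ETLJobHistoryId
    FROM   staging.InternetSales;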

    What's included

    SsisReportingPack database

    This is the same database that houses usp_ssiscatalog and all of its supporting code modules. All of the database objects that support the Restart Framework are in a schema called RestartFramework.


    [screenshot: the RestartFramework schema objects in the SsisReportingPack database]

    SSIS packages

    The Restart Framework consists of two packages that must be included in every SSIS project that intends to use the Restart Framework; hence they will need to be added into your SSIS project within Visual Studio.
    Root.dtsx

    This package must be executed in order to have any execution be managed by the Restart Framework. It takes a single parameter, ETLJobName, to indicate which ETLJob it should execute. Root.dtsx will interrogate the Restart Framework metadata in the SsisReportingPack database to determine which ETLJobStages are included.

    For each ETLJobStage Root.dtsx will fire off a new instance of ThreadController.dtsx, passing it a ThreadID and an ETLJobStageOrder.

    Root.dtsx can fire off eight concurrent instances of ThreadController.dtsx. This number is configurable; however, eight is the maximum. You could easily extend Root.dtsx to fire off more than eight if you so desired.

    Here is a screenshot of Root.dtsx control flow:
    [screenshot: Root.dtsx control flow]

    ThreadController.dtsx
    This package is responsible for calling your packages that actually do some work. It receives a ThreadId and ETLJobStageOrder from Root.dtsx which it uses to interrogate the database to get a list of ETLJobStageNames that it needs to execute. It loops over that list and executes a package of the same name from the current project.
    When an ETLJobStage completes successfully it is the job of ThreadController.dtsx to update the database to indicate that this has occurred.
    Here is a screenshot of ThreadController.dtsx control flow:
    [screenshot: ThreadController.dtsx control flow]
  • SSDT gets some small enhancements for SQL Server 2014

    Release-to-manufacturing of SQL Server 2014 was announced today and with it comes some welcome news about some small enhancements to SSDT:

    The SSDT team has been heads down getting our SQL Server 2014 release ready to go.  However we've made time to make some targeted improvements in this area.  In our upcoming release for SQL Server 2014 the Change Connection part of this blog will be implemented.  The F6 functionality (included moving between multiple result sets) didn't make it, but will be in the update after that. – Patrick Sirr

    http://sqlblog.com/blogs/jamie_thomson/archive/2013/03/19/connected-development-in-ssdt-versus-ssms.aspx#53271

    and

    I've implemented F6 and shift-F6 to cycle through the tabs as SSMS does (including for multiple result sets). However the fix won't make it into our upcoming SQL Server 2014 release due in a couple weeks. It'll be in the next update.

    http://connect.microsoft.com/SQLServer/feedback/details/780990/ssdt-f6-to-move-between-panes-in-a-query-window

    Hopefully there’ll be a few more enhancements in the SSDT Connected Development arena other than the two mentioned here (more scripting options would be nice).

    Who says submitting to Microsoft Connect never works? :-)

     

    Don’t forget that you don’t need to have SQL Server 2014 in order to get these new features. SSDT is free and supports all versions back to SQL Server 2005.

    @Jamiet

  • OneNote API – finally! Now, about that Excel API…

    I’m a frequent user of OneNote and so was delighted with today’s news that there is now a public API available so that third party apps and services can put stuff into your OneNote notebooks (an API? welcome to the modern web, OneNote). One of those third party services is ifttt so I’ve set up a few ifttt recipes to dump stuff into OneNote.

    All very nice thanks very much.

    I do have a few quibbles though (otherwise why would I be writing a blog post, right? :-) ). Firstly, the API only allows you to create pages; it cannot append to existing ones. Second, and more importantly, you can’t choose which notebook section to create the page in. I find this really annoying; take the example of my ifttt recipe above that bungs all my blog posts into OneNote – how much more useful would it be if we could choose which section to put them into? As it stands right now I would have to go and move them all after the event. Still, credit where credit is due, the API exists and I harbour hopes that it will improve over time.


    A OneNote API is nice and all but one thing I’ve been craving for years is an API that allows me to insert data into an Excel spreadsheet residing on OneDrive. I’ve written in the past about the Excel Services REST API where I lamented:

    Although I haven't demonstrated it here Excel Services' REST API does provide a makeshift way of altering the data by changing the value of specific cells however what it does not allow you to do is add new data into the workbook. Google Docs allows this.
    Exploring the Excel Services REST API

    Chris Webb (who has joined me in this crusade) raised a forum thread in June 2010 entitled Excel Web App API? where he requested such a thing; nearly four years later we’re still waiting.

    Ifttt allows recipes that trigger every time you tweet; how cool would it be if this could be used to insert a new row into an Excel spreadsheet on OneDrive for each of my tweets*? Well I would like that anyway and the existence of this new OneNote API rekindles my hope that one day such an API for Excel might exist – please don’t let me wait another 4 years though, Microsoft!

    @Jamiet 

    * Before anyone leaves a comment telling me so, I’m already aware that I can use ifttt to insert all my tweets into a Google Docs spreadsheet and indeed I’m already doing so. I’d just prefer it for Excel, that’s all.

  • Thoughts on Office 365, Windows Azure Active Directory, Yammer & Power BI

    This week a SharePoint conference took place somewhere and I took more than a passing interest because it clearly wasn't a SharePoint conference; it was an Office 365/Yammer conference and as far as I can discern the big takeaways were:

    It was interesting to me because Power BI is something that is on my radar and which is delivered via Office 365. This got me thinking about scenarios where Power BI & Yammer could play together more effectively.

    The BI delivery team that I currently work for is trying to find ways to make the information that we produce more discoverable, more accessible and to promote the use of the information that we provide throughout the company. The company is an Office 365 customer; however, they pretty much use it only as an email & IM provider - none of the SharePoint-y stuff is used. The company is also a Yammer customer.

    The confluence of Yammer and Power BI might make an interesting story here. Imagine, for example, the ability to build a Power View report using Power BI and then share that throughout the organisation using Yammer, perhaps via a Yammer group. Anyone viewing their Yammer feed would be able to view and interact with that Power View report without leaving Yammer. I’m not talking about simply viewing an image of a report either – I’d want to be able to slice’n’dice that report right within my Yammer feed.

    I’ve long thought that we need to think of new ways of delivering BI to the masses and I believe social collaboration tools present a great opportunity to do that. I’m excited about what Yammer + Power BI could bring, let’s hope Microsoft don’t royally screw it up.

    I still believe that Microsoft’s Master Data Services (MDS) should be offered through Power BI and again the opportunity to collaboratively compile and discuss data that resides in MDS is compelling. I see no reason why people wouldn’t want to change MDS data from within their Yammer feed – why would we force them to go elsewhere? Again I opine, bring the data to wherever your users are, don’t make them go somewhere else.


    Hidden away behind all of the announcements was the implicit assertion that Windows Azure Active Directory is critical to Microsoft’s cloud efforts. Office 365 sits on top of Windows Azure Active Directory and I don’t think many people realise the significance of that. Whoever manages your company’s employees’ identities has a huge opportunity for selling new stuff to you and that’s why Windows Azure Active Directory is free. This is not a new play for Microsoft; over the past 20 years or so they’ve become a huge player in the corporate landscape and that’s in no small way down to Active Directory – own the identity and you can sell them other stuff like SharePoint, Windows, SQL Server etc… By allowing you to extend your Active Directory into the cloud and have pervasive groups it’s not far off being a no-brainer for companies to use Windows Azure & Office 365.

    Active Directory in the cloud, public and private groups, identity management, developer APIs … those are the big plays here and this is very much like what I described in my blog post Windows Live Groups predictions and “Active directory in the cloud”. The names and players have changed but the concepts I outlined there are now happening. Back then I said:

    [This] gives rise to the idea of Groups becoming something analogous to an "active directory in the cloud". This is a disruptive idea partly because it could become the mechanism by which Microsoft grant access to their online properties in the future.

    Even more powerful is the idea that 3rd party websites that authenticate visitors … could use Groups to determine what each user can do on that site. Groups will become part of an authentication infrastructure that anyone in the world can leverage.

    This "active directory in the cloud" idea relies on a robust API that allows a 3rd party site to add and remove people from groups.

    Believe it or not that was six years ago. Don’t want to say I told you so, but…

    @Jamiet

  • Capturing query and IO statistics using Extended Events

    The commands

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    return information about query executions and are very useful when doing performance tuning work as they inform how long a query took to execute and the amount of IO activity that occurred as a result of that query.

    These are very effective features; however, to my mind they do have a drawback in that the information they provide is not accessible from within the actual query that was executed. This means the results cannot be collected, stored in a table, and then queried – such information would have to be manually copied and pasted from the messages pane into (say) a spreadsheet for further analysis.

    This is dumb. I’m a SQL Server developer, I want my data available so that I can bung it into a table in SQL Server and issue queries against it. That is why, a couple of weeks ago, I submitted a request to Microsoft Connect entitled Access to STATS TIME & STATS IO from my query in which I said:

    I recently was doing some performance testing work where I was evaluating the effect of changing various settings on a particular query. I would have liked to simply run my query inside a couple of nested loops in order to test all permutations but I could not do that because every time I executed the query I had to pause so I could retrieve the stats returned from STATISTICS IO & STATISTICS TIME and manually copy and paste (yes, copy and paste) the information into a spreadsheet.

    This feels pretty dumb in this day and age. Why can we not simply have access to that same information within my query? After all, we have @@ROWCOUNT, ERROR_MESSAGE(), ERROR_NUMBER() etc... that provide very useful information about the previously executed statement, how about @@STATISTICS for returning all the IO & timing info? We can parse the text returned by that function to get all the info we need.
    Better still, provide individual functions e.g.:
    @@QUERYPARSETIME
    @@QUERYCOMPILETIME
    @@QUERYEXECUTIONTIME
    @@SCANCOUNT
    @@LOGICALREADS
    @@PHYSICALREADS
    @@READAHEADREADS

    Ralph Kemperdick noticed my submission and correctly suggested that the same information could be accessed using Extended Events. Based on this I’ve written a script (below) that issues a series of queries against the AdventureWorks2012 sample database, captures similar stats to those that would be captured by SET STATISTICS, then presents them back at the end of the query. Here are those results:

    [screenshot: the captured duration, cpu_time and IO figures for each query]

    The information is not as comprehensive as what you would get from SET STATISTICS (no Read-Ahead Reads for example, and no breakdown of IO per table) but should be sufficient for most purposes.

    You can adapt the script accordingly for whatever information you want to capture; the important part of the script is the creation of the XEvents session for capturing the queries, then reading and shredding the XML results thereafter.

    Hope this is useful!

    @Jamiet

    UPDATE: Turns out you don't need all of this. I've just been informed that Richie Rump has written a parser at http://statisticsioparser.com/ that does all of this for you. Simply paste in your STATISTICS IO output and press the button - it will do all the hard work for you and give you the results back in a nice readable graph. You can paste in multiple results at once too.

    --Create the event session
    CREATE EVENT SESSION [queryperf] ON SERVER
    ADD EVENT sqlserver.sql_statement_completed
    ADD TARGET package0.event_file(SET filename=N'C:\temp\queryperf.xel',max_file_size=(2),max_rollover_files=(100))
    WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS,
          MAX_DISPATCH_LATENCY=120 SECONDS,MAX_EVENT_SIZE=0 KB,
          MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON);

    --Set up some demo queries against AdventureWorks2012 in order to evaluate query time & IO
    USE AdventureWorks2012;
    DECLARE @SalesPersonID INT;
    DECLARE @salesTally INT;
    DECLARE mycursor CURSOR FOR
        SELECT soh.SalesPersonID
        FROM   Sales.SalesOrderHeader soh
        GROUP  BY soh.SalesPersonID;
    OPEN mycursor;
    FETCH NEXT FROM mycursor INTO @SalesPersonID;
    ALTER EVENT SESSION [queryperf] ON SERVER STATE = START;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DBCC FREEPROCCACHE;
        DBCC DROPCLEANBUFFERS;
        CHECKPOINT;
        SELECT @salesTally = COUNT(*)
        FROM   Sales.SalesOrderHeader soh
        INNER JOIN Sales.[SalesOrderDetail] sod ON soh.[SalesOrderID] = sod.[SalesOrderID]
        WHERE  SalesPersonID = @SalesPersonID;
        FETCH NEXT FROM mycursor INTO @SalesPersonID;
    END
    CLOSE mycursor;
    DEALLOCATE mycursor;
    DROP EVENT SESSION [queryperf] ON SERVER;

    --Extract query information from the XEvents target
    SELECT q.duration,q.cpu_time,q.physical_reads,q.logical_reads,q.writes--,event_data_XML,statement,timestamp
    FROM   (
            SELECT  duration=e.event_data_XML.value('(//data[@name="duration"]/value)[1]','int')
                   ,cpu_time=e.event_data_XML.value('(//data[@name="cpu_time"]/value)[1]','int')
                   ,physical_reads=e.event_data_XML.value('(//data[@name="physical_reads"]/value)[1]','int')
                   ,logical_reads=e.event_data_XML.value('(//data[@name="logical_reads"]/value)[1]','int')
                   ,writes=e.event_data_XML.value('(//data[@name="writes"]/value)[1]','int')
                   ,statement=e.event_data_XML.value('(//data[@name="statement"]/value)[1]','nvarchar(max)')
                   ,TIMESTAMP=e.event_data_XML.value('(//@timestamp)[1]','datetime2(7)')
                   ,*
            FROM   (
                    SELECT CAST(event_data AS XML) AS event_data_XML
                    FROM   sys.fn_xe_file_target_read_file('C:\temp\queryperf*.xel', NULL, NULL, NULL)
                   ) e
           ) q
    WHERE  q.[statement] LIKE 'select @salesTally = count(*)%' --Filters out all the detritus that we're not interested in!
    ORDER  BY q.[timestamp] ASC;

  • Why don’t app stores offer subscriptions?

    Accepted wisdom when one purchases an app from an app store is that one gets free updates for life. This is, quite obviously, an unsustainable business model and I suspect is the main reason why so many apps use advertising to generate income.

    There is though, in the enterprise world at least, a move to a subscription-based business model (i.e. renting software), the most obvious examples of which are Office 365 and Adobe Creative Cloud, and I’m left wondering why app stores don’t offer a similar option.

    Today I installed an app called Tweetium that offers a (paid for) premium option; here is why the premium option exists:

    [screenshot: Tweetium’s explanation of why its premium option exists]

    Again this strikes me as unsustainable. The customer pays once yet Tweetium has to pay TweetMarker every month. Forever. It doesn’t take an expert mathematician to realise that eventually Tweetium’s monthly outlay could exceed the income they have saved up from purchases.

    It seems to me there is a simple solution to all this. App stores could offer an option for customers to rent apps rather than buy them. It’s more sustainable for the app vendor and the app store provider gets a more predictable income stream (which CFOs seem to like). Why do app stores not do this? Seems like a no-brainer to me.

    Just a random thought for a Sunday morning.

    @Jamiet

    UPDATE: Apparently iOS & Android app stores *do* offer subscription models, I just wasn't aware of it.
  • My SQL Server gripe at SQL Supper

    On 17th February 2014 (3 days ago) I visited an event called SQL Supper held at Microsoft’s central London office, Cardinal Place. The event was basically a QnA session with Mark Souza, Conor Cunningham, Nigel Ellis, Hatay Tuna & Ewan Fairweather and one part of the evening was loosely termed the gripe session where the attendees were invited to stick their hand in the air and when asked have a good old whinge about something in SQL Server that, well, frankly pissed them off. Given the members of the panel this was inevitably focused on the database platform in SQL Server rather than the BI stuff and this is what I was only too happy to gripe about:

    Microsoft seem to have dropped the ball on database developer productivity, both in the language and the tooling. A decade ago this is something that SQL Server was renowned for; I put it to you that this is no longer the case. SSDT came out with SQL Server 2012 and it’s a great tool, I love it, but in the two years since there have been various maintenance releases but hardly any new features. SSMS has hardly changed for years; extensibility is still not truly supported. Intellisense does not work properly 100% of the time. As far as I can recall T-SQL has had only two major features (TRY/CATCH & windowing functions) in the last ten years.

    Please fix this. Show database developers some love again.

    I could write pages and pages of gripes just under the banner of developer productivity but I’ll leave you with that concise summary. It is of course a matter of opinion, feel free to agree or disagree.

  • Dacpac braindump - What is a dacpac?

    In this week’s earlier blog post First release of my own personal T-SQL code library on Github I talked of how one could use a dacpac to distribute a bunch of code to different servers. Upon reading the blog post Jonathan Allen (of SQL Saturday Exeter fame), with whom I’ve been discussing dacpacs on and off recently, sent me this email:

    Hi Jamie,

    The DacPac thing I emailed about in December hasnt taken off yet but I have just downloaded your code library to take a look and I like the way the dacpac works. Should I be able to open that in VS or is the dacpac compiled/built in VS? The video you linked to didnt cover dapac at all so I am in the dark on how to create one/them.

    If I can build a database and create a dacpac simply then this could be really useful.

    Jonathan’s email made me realise that there is perhaps a lot of confusion about what dacpacs are, what they can be used for and how they can be used so I figured a braindump of what I know about them might be useful, that’s what you’re getting in this blog post.

     

    What is a dacpac?

    A dacpac is a file with a .dacpac extension.

    [screenshot: a file with a .dacpac extension in Windows Explorer]

    In that single file is a collection of definitions of objects that one could find in a SQL Server database, such as tables, stored procedures and views, plus some instance-level objects such as logins too (the complete list of supported objects for SQL Server 2012 can be found at DAC Support For SQL Server Objects and Versions). The fact that a dacpac is a file means you can do anything you could do with any other file, store it, email it, share it on a file server etc… and this means that they are a great way of distributing the definition of many objects (perhaps even an entire database) in a single file. Or, as Microsoft puts it, a self-contained unit of SQL Server database deployment that enables data-tier developers and database administrators to package SQL Server objects into a portable artifact called a DAC package, also known as a DACPAC. That in itself is, I think, very powerful.

    Ostensibly a dacpac is a binary file so you can’t just open it up in your favourite text editor and look at the contents of it. However, what many people do not know is that the format of a dacpac is simply the common ZIP compression format and hence we can add .zip to the end of a dacpac filename:

    [screenshot: the same file renamed with a .zip extension]

    and open it up like you would any other .zip file to have a look inside. If you do so you will see this:

    [screenshot: the contents of the dacpac when opened as a zip file]

    The contents of that zip file conform to something called the Open Packaging Convention (OPC). OPC is a standard defined by Microsoft for, well, for zipping up files basically. You have likely used files conforming to OPC before without knowing it; .docx, .xlsx, .pptx are the most common ones that you might recognise if you use Microsoft Office and there are some more obscure ones such as .ispac (SSIS 2012 developers should recognise that). (For a more complete list of OPC-compliant file types see the wikipedia page).

    Notice in the screenshot above showing the innards of TSQLCodeLibrary.dacpac the biggest file is model.xml. This is the file that contains the definition of all our SQL Server objects. I won’t screenshot that here but I encourage you to get hold of a .dacpac file (here’s one) and have a poke around to see what’s in that model.xml file.
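
    If you'd rather script that rename-and-unzip step, here's a hedged PowerShell sketch (Expand-Archive requires PowerShell 5 or later, and the file name here is just an example):

    # Copy the dacpac with a .zip extension and unpack it
    Copy-Item .\TSQLCodeLibrary.dacpac .\TSQLCodeLibrary.zip
    Expand-Archive -Path .\TSQLCodeLibrary.zip -DestinationPath .\TSQLCodeLibrary_unpacked

    # List the contents; model.xml is the file holding the object definitions
    Get-ChildItem .\TSQLCodeLibrary_unpacked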

    What are dacpacs for?

    Dacpacs are used for deploying SQL Server objects to an instance of SQL Server. That’s it. If your job does not ever involve doing that then you probably don’t need to read any further.

    Dacpac pre-requisites

    A .docx file (i.e. A Microsoft Word document) isn’t much use to someone if they don’t have the software (i.e. Microsoft Word) to make use of it and so the analogy holds for dacpacs; in order to use them you need to have some software installed and that software is called the Data-tier Application Framework (or DAC Framework for short, or DacFx for even shorter).

    Incidentally, you may be wondering what DAC stands for at this point. I think it’s “Data-Tier Application”, in which case you may be thinking that the acronym DAC is a stupid one, especially as DAC also stands for something else in SQL Server (the Dedicated Administrator Connection); I would agree!

    DacFx is available to download for free however you’ll probably never need to do that as installation of DacFx occurs whenever you install SQL Server, SQL Server client tools or SQL Server Data Tools (SSDT). If DacFx is installed you should be able to see it in Programs and Features:

    [screenshot: the Data-Tier Application Framework listed in Programs and Features]

    How does one deploy a dacpac?

    In dacpac nomenclature the correct term for deploying a dacpac is publishing; however, the two generally get used interchangeably. There are two methods of publishing a dacpac which I’ll cover below.

    Publish via SSMS

    In SSMS’s Object Explorer right-click on the databases node and select “Deploy Data-tier Application…” (told you they used those terms interchangeably):

    [screenshot: the “Deploy Data-tier Application…” option in Object Explorer]

    This launches a wizard that prompts you to choose a dacpac, fill in some particulars (e.g. database name) and then deploys it for you by calling out to DacFx. Unfortunately this wizard is not very good because it doesn’t (currently) support all features of dacpacs; namely, if your dacpac contains any sqlcmd variables (I won’t cover those here but they are commonly used within dacpacs) a value needs to be supplied for each of them, and the wizard doesn’t prompt you for a value, hence the deployment fails.

    This. Is. Stupid. Microsoft should be suitably lambasted for not providing this basic functionality. Anyway, due to this limitation you’re most likely to be using the other method which is…

    Publish via command-line

    One component distributed in DacFx is a command-line tool called sqlpackage.exe which will quickly become your best friend if you use dacpacs a lot. sqlpackage.exe can do a lot of things and those “things” are referred to as actions; one of those actions is publishing a dacpac. Here’s the syntax for publishing a dacpac using sqlpackage.exe:

    "%ProgramFiles(x86)%\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe"
          /action:Publish
          /SourceFile:<path to your dacpac>
          /TargetServerName:<SQL instance you are deploying to>
          /TargetDatabaseName:<Name of either (a) the database to create or (b) the existing database to deploy into>
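
    Incidentally, this is also how you get around the sqlcmd variable limitation mentioned above: sqlpackage.exe accepts a /Variables: argument (short form /v:) for each variable in the dacpac. A hedged example, in which the dacpac name and the variable names Environment and DataFilePath are purely illustrative:

    "%ProgramFiles(x86)%\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe"
          /action:Publish
          /SourceFile:MyDatabase.dacpac
          /TargetServerName:MyServer
          /TargetDatabaseName:MyDatabase
          /v:Environment=Dev
          /v:DataFilePath=D:\Data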

    Publishing is idempotent

    Notice from my comment above for TargetDatabaseName that you can deploy to an existing database. You might ask why you might want to publish into an existing database; after all, the objects you are publishing might already exist. This segues nicely into what I see as the biggest benefit of dacpacs and DacFx: the software interrogates the target database to determine whether or not the objects already exist and if they do not it will create them. If they do already exist it will determine whether the definition has changed or not and if it has, it will make those changes. DacFx will always protect your data so if it determines that an operation would cause data destruction (e.g. removing a column from a table) then it will (optionally) throw an error and fail. You never again need to write an ALTER statement or first check that an object exists in order to change an object definition, DacFx will do it for you. To put it another way, publishing using dacpacs and DacFx is idempotent.

    How does one create a dacpac?

    Of course in order to publish a dacpac you’re first going to have to create one and one of Jonathan’s questions above pertained to exactly this. There are two distinct ways to create a dacpac.

    Use an SSDT Project

    SQL Server Data Tools (SSDT) projects are basically a project type within Visual Studio that provide a way of building DDL for SQL Server databases. I’m not going to cover SSDT projects in any detail here except to say that when such a project is built the output is a dacpac. Note that SSDT can also publish the dacpac for you; however, I didn’t mention that above as the publish operation is essentially another wrapper around the same DacFx functionality used by sqlpackage.exe.

    Create from an existing database

    One can right-click on a database in SSMS and click on “Extract Data-tier Application…” to create a dacpac containing the definition of all objects in that database:

    [screenshot: the “Extract Data-tier Application…” option in Object Explorer]
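
    The same extraction can also be scripted using sqlpackage.exe and its Extract action; a hedged example (the server, database and file names are illustrative):

    "%ProgramFiles(x86)%\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe"
          /action:Extract
          /SourceServerName:MyServer
          /SourceDatabaseName:AdventureWorks2012
          /TargetFile:AdventureWorks2012.dacpac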

    Wrap-up

    Should you be using dacpacs? I can’t answer that question for you but hopefully what I’ve done is given you enough information so that you can answer it for yourself. Some people might like the way dacpacs encapsulate many objects into a single file and their idempotent deployment; others may prefer good old simple, handcrafted T-SQL scripts which don’t have any pre-requisites other than SQL Server itself. The choice is yours.

    Further reading

    UPDATE

    David Atkinson from Redgate has been in touch to tell me about another dacpac feature that I didn’t know about. It is possible to right-click on a dacpac in Windows Explorer and choose to unpack it:

    [screenshot: the Unpack option when right-clicking a dacpac in Windows Explorer]

    That essentially unzips it but what you also get is a file called Model.sql that will create all of the objects in the dacpac:

    [screenshot: the unpacked dacpac contents, including Model.sql]

    Very useful indeed! David tells me that Redgate use this functionality to enable comparison of a dacpac using their SQL Compare tool as you can read about at Using a DACPAC as a data source.

  • First release of my own personal T-SQL Code Library on github

    Like many (most???) T-SQL developers I keep a stash of useful code that I’ve garnered down the years because I know it’s all going to come in useful at some point in the future. It includes code I’ve written myself and also code that others have shared on their own blogs. For example my code library includes the following:

    I’ve never seen the point of keeping one’s code library to one’s self, might as well share it in case anyone else might find it useful, so up to now I’ve kept my collection of scripts publicly available on SkyDrive (go see it if you like).

    That’s all fine and dandy but I figured this could be improved. SkyDrive is a file sharing site and whilst it includes a nice code viewer/editor it is not an ideal solution for storing code; code should be stored in a version control system (e.g. Git, TFS, Subversion, etc.). I opted to make my code library available on Github at https://github.com/jamiekt/TSQLCodeLibrary/ because it provides:

    • file version history
    • releases
    • ability for anyone else to fork my code library and build upon it to maintain their own code library
    • lots of tools necessary for modern code development

    and moreover all the cool kids seem to be using Github so I figured I’d give it a bash as well.

    The code library exists as a collection of views, functions and stored procedures in an SSDT project. I’m a massive fan of SSDT so there were many reasons for my choosing to do this but the overriding reason was that SSDT provides a single binary (i.e. a dacpac file) containing the entire code library that can be distributed as easily as emailing the file to someone. Deploying a dacpac is pretty simple and so is a great method for sharing T-SQL code.

    What’s in my T-SQL code library?

    In this first release, not much. There are only nine objects though I hasten to add that this is only a first release and I have a backlog of stuff that I need to add in there. One of the many advantages of using SSDT is that it makes it easy to add extended properties to describe the objects and the code library includes a view that surfaces all of that extended property information:

    SELECT schema_name,object_name,object_type,CodeLibraryDescription
    FROM jt.[vwCodeLibraryDescriptions]

    [screenshot: the output of querying jt.vwCodeLibraryDescriptions]

    How do you install the code library?

    Download the two binaries:

    • master.dacpac
    • TSQLCodeLibrary.dacpac

    and store them together in a folder. Open a command prompt at that folder and type:

    "%ProgramFiles(x86)%\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe"
     /action:Publish
     /SourceFile:TSQLCodeLibrary.dacpac
     /TargetServerName:"<your_sql_instance>"
     /TargetDatabaseName:<preferred_database_name>

    (replacing <your_sql_instance> with the name of the SQL Server instance where you want to create the code library and <preferred_database_name> with whatever you want the database to be called. Get rid of the line feeds as well; they are just used here for clarity)

    This will create a SQL Server database containing my code library:


    [screenshot: the deployed code library database in Object Explorer]

    If any of the code in my code library proves useful to you then that’s great however my wish here is that some of you other folk out there feel motivated to share your own code in a similar manner. If you do so please post a comment below and let me know.

    @Jamiet

  • Schema Compare or Publish [SSDT]

    Yesterday on Twitter Ryan Desmond asked “Is there a good read for #SSDT regarding deploying changes via schema compare vs solution deployment?”

    [screenshot of the tweet]

    I don’t know of any article that covers this so in this blog post I offer my opinion on the subject.

    First some background. When building databases offline using the database project type (.sqlproj) in SSDT you have two options for deploying the DDL code in your project into a physical database:

    • Schema Compare
    • Publishing

    Under the covers both do the same basic operation: build a dacpac from your project, compare it to the target database, build a script that will make the requisite changes to the target database, then execute that script.

    Ryan was asking which of these one should use. I suggested that publishing was a better option and here are two reasons why:

    1. Publish will include your pre and post deployment scripts as well whereas Schema Compare will not. (And if your retort is that you cannot run those scripts more than once then you’re doing it wrong, rewrite them.)
    2. If the debug target for your project is configured correctly then a publish operation can be triggered simply by pressing F5. That’s massively more productive than the point-and-click nature of Schema Compare. It’s even better if you have multiple SSDT projects in your solution as you can publish all of them with a single key stroke.

    Does anyone out there have a different opinion? Let me know in the comments.

    @Jamiet

  • Beware the ß

    I stumbled upon an interesting little nuance of SQL Server’s behaviour over the past few days that I was not aware of and I figure it’s worth talking about it here so others are aware. It concerns the handling of the German character “ß” which I understand to be German shorthand for “ss” (I’m no expert on language or linguistics so don’t beat me up if I’m wrong about that).

    In short, two NVARCHAR values that differ only by one using “ß” and the other using “ss” will be treated as the same. This is easily demonstrated using the following code snippet:

    SELECT 'Oktoberstrasse'
    UNION
    SELECT 'Oktoberstraße';

    SELECT N'Oktoberstrasse'
    UNION
    SELECT N'Oktoberstraße';

    which returns:

    [screenshot: the VARCHAR query returns two rows, the NVARCHAR query returns one]

    (Collation on my database is set to ‘SQL_Latin1_General_CP1_CI_AS’)

    Notice that casting the values as NVARCHAR (which is what the N'' notation does) causes SQL Server to treat them as the same. Obviously this is going to cause a problem if you need to treat those as distinct values (such as inserting into a column with a unique key upon it – which is the problem I encountered that caused me to stumble across this)
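
    If you do need the two spellings to be treated as distinct (for that unique key, say), one hedged option is to compare or constrain the values using a binary collation, which compares code points rather than applying linguistic equivalence rules; a quick sketch:

    -- Returns two rows: under a binary collation 'ss' and 'ß' are different code points
    SELECT N'Oktoberstrasse' COLLATE Latin1_General_BIN2
    UNION
    SELECT N'Oktoberstraße' COLLATE Latin1_General_BIN2;

    Bear in mind that switching to a binary collation also changes sorting and case sensitivity, so it is a trade-off rather than a free fix.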

    There is a bug submission to Connect regarding this issue at 'ß' and 'ss' are NOT equal in which a member of the SQL Server team says:

    Our current behavior follows the SQL/ISO standard and unless those standards are updated with the latest changes we don't intend to change the behavior in SQL Server. Changing existing SQL Server behavior has lot of implications and today we rely on Windows for all of our windows collation sorting capabilities. If and when in the future Windows adopts these new rules / Unicode standard we will incorporate it in SQL Server.

    In other words, SQL Server is merely following the behaviour as defined by the ISO SQL standard so it’s not a bug in SQL Server as such, just a nuance that one needs to be aware of. And now you are.

    @Jamiet

  • Really useful Excel keyboard shortcuts

    I love me a good keyboard shortcut and there are some I would like to use in Excel all the time if only I could darn well remember them.

    CTRL + Spacebar   Select the entire column containing the currently selected cell(s)
    Shift + Spacebar   Select the entire row containing the currently selected cell(s)
    CTRL + ‘+’   Insert cells
    CTRL + ‘-’   Delete cells

    If you combine these you’ll find they become really powerful. For example, CTRL+Spacebar followed by CTRL + ‘+’ inserts a new column into a worksheet (which is what I wanted to do this morning when I began googling this).

    I figured the only way I will ever ingrain these into my muscle memory is if I write them down somewhere, nowhere better than on my blog.

    If you’re a fellow keyboard shortcut fetishist and want to share any obscure ones that you know of then I’m all ears, please put them in the comments below.

    @Jamiet

  • Linqpad – bring the power of LINQ to SQL Server development

    One of my biggest laments about being a SQL Server developer is that the tools provided by Microsoft to do our thang compare woefully to the feature-rich, (largely) bug-free, “it-just-works” impression that one gets from seeing the tools available to “other” developers in the Microsoft ecosystem (I’m thinking of our appdev brethren here, the ones that use C#, Javascript, and all that jazz). I could write a long blog post complaining about this (and perhaps one day I will) but in this blog post I want to shine a light on a tool called Linqpad.

    Linqpad enables you to issue queries written in LINQ against a database and in that respect is an alternative to SQL Server Management Studio (SSMS). What is LINQ? It stands for Language Integrated Query and is a technology that Microsoft brought out a few years ago for writing queries inside .Net code. The great thing about Linqpad is that it enables one to write LINQ queries without having to know anything about .Net.

    In the screenshots below I show a simple query against a database but written two ways, one using T-SQL that anyone reading this blog post will likely know, and one using LINQ:

    [screenshot: the same query written in T-SQL and in LINQ]

    Some things to notice here. The two queries look very similar in that they contain the same keywords {SELECT, FROM}. Second thing to notice is that the FROM clause comes before the SELECT clause and if you know anything about the logical order of execution of a SELECT query you’ll realise that this intuitively makes sense. Lastly the table is called [dbo].[BulletinLine] but in the LINQ query it’s called [BulletinLines]; it’s been pluralised (a convention that is common to .Net developers) and there’s no [dbo] prefix. Other than those things it’s intuitively clear that these two queries are doing exactly the same thing and it’s worth pointing out that under the covers the LINQ query is converted into a T-SQL query.

    So OK, if you accept that LINQ can do pretty much anything that a T-SQL SELECT query can do the next obvious question is “Why should I bother when T-SQL already does what I need?” The answer, in a word, is productivity. Or, to put it another way, intellisense works properly. Let’s say for example I want to select a subset of all the columns; intellisense comes to our aid:

    [screenshot: intellisense listing the available columns]

    One might well retort “well that works in SSMS as well” but in my experience intellisense in SSMS is, at best, flaky. In some circumstances it simply doesn’t work and the worst part of this is that it’s often inexplicable as to why. (In case you can’t tell, intellisense in SSMS drives me up the wall and I’m sure I’m not the only one.)

    Some other nice things about LINQ. Here’s the equivalent of a WHERE clause to filter on [BulletinId]=6:

    [screenshot: the LINQ query with a Where filter on BulletinId]

    If you don’t know LINQ then (in my opinion) it’s not intuitively obvious what’s going on here. What the above query is doing can effectively be described as:

    Take the collection of BulletinLines, filter in WHERE BulletinId equals 6

    Where this gets really powerful is the ability to stack these things up like so:

    [screenshot: the LINQ query with stacked Where filters]

    Take the collection of BulletinLines, filter in WHERE BulletinId equals 6. From the resultant collection filter in WHERE Colour=”White”

    If we only want the top 3:

    [screenshot: the LINQ query taking only the first 3 rows]

    Take the collection of BulletinLines, filter in WHERE BulletinId equals 6. From the resultant collection filter in WHERE Colour=”White”. From the resultant collection take the first 3
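
    Since those queries only appear in the screenshots, here is a hedged reconstruction in LINQ method syntax of the stacked query just described (the table and column names are as described above; everything else is illustrative):

    BulletinLines
        .Where(b => b.BulletinId == 6)     // filter in WHERE BulletinId equals 6
        .Where(b => b.Colour == "White")   // from the resultant collection, filter in WHERE Colour equals "White"
        .Take(3)                           // from the resultant collection, take the first 3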

    I love how expressive this is and when you get fully conversant with LINQ it’s wonderfully intuitive too. If I haven’t convinced you then, well, that’s OK. If you’re not convinced but do want to be convinced then check out Why LINQ beats SQL. The overriding argument there is “LINQ is in most cases a significantly more productive querying language than SQL”. Make sure you check out the section entitled “Associations” too which demonstrates how you can filter based on what is known about an entity in other tables; when you grok what’s going on there you’ll realise the power that LINQ brings.

    If you want to give this a go then go and download Linqpad now from http://www.linqpad.net/. It’s free; however, some of the more advanced features (such as intellisense) only light up when you pay for the pro or premium edition which are priced at $39 & $58 respectively for a perpetual license, a bit more than the app generation are used to paying but hardly bank-breaking either.

    Are any other SQL developers out there using Linqpad? Please share your experiences in the comments below, I’d really like to read them.

    @Jamiet
