
Andy Leonard

Andy Leonard is a Data Philosopher at Enterprise Data & Analytics, an SSIS trainer and consultant, developer of the Data Integration Lifecycle Management (DILM) Suite, a Business Intelligence Markup Language (Biml) developer and BimlHero, and a SQL Server database and data warehouse developer, community mentor, engineer, and farmer. He is a co-author of SQL Server Integration Services Design Patterns and the Stairway to Integration Services. Keep up with Andy by joining his mailing list!

  • Please Blog

    My name is Andy and I’m a blogger.

    I encourage you to blog, too. Why? Because you know stuff. I need to learn some of the stuff you know. You’ve experienced stuff. I’d like to read your experiences – again, to learn from them. Others feel the same way.

    “I don’t have anything to say.”

    That’s simply not true. You have plenty to say. Maybe it’s hard to get started. I get that. So here’s a starter post for you: Write a blog post about starting a blog. If you want, you can mention this post. It’s ok if you’d rather not. But write that post.

    “I don’t know where to go to get started.”

    Search Google or Bing or your favorite search engine for “how to start a blog” or just click one of those links (I’ve already saved the search). Why don’t I just tell you how to start a blog? You’re going to have to learn how to learn (if you don’t already know how to learn). To paraphrase a famous person, “blogger, teach thyself.”

    I only know how to start one blog (well, a couple, but still). I don’t know how to start your blog. You’ll have to figure that out on your own. I’ve given you one idea for one post (above).

    “All the good stuff has already been written.”

    Probably not. Think about this: If you were writing this very blog post instead of me, would you use the same grammar? Punctuation? Alliteration? Someone reading this post is experiencing the equivalent of nails on a chalkboard because of my writing style. (I know you’re there. Thank you for reading anyway. I love you. I’m sorry.)

    Even if all the good stuff has been written, it hasn’t been written in your style, in your voice.

    You ran into that issue just the other day, didn’t you? The one you had to search to find the solution? That quirky error with the error message that might as well have been written in ancient Sumerian for all the help it provided? Write about that. If another blog post helped you, link to that blog in your post. Most bloggers appreciate the link.

    “I’m too busy.”

    Really? How much television do you watch per week?

    Peace,
    Andy

  • The Recording for the SSIS Catalog Compare Version 2 Launch Event is Available

    The recording for the SSIS Catalog Compare Version 2 Launch Event is now available for viewing (registration required).

    Enjoy!

    :{>

    Learn More:

    SSIS Catalog Compare
    Join the DILM Suite Mailing List

  • Why Automate?

    Because, as Jen Underwood (jenunderwood.com | LinkedIn | @idigdata) states in an upcoming podcast: The future of data science is automation.

    If automation is the future, how do we decide what to automate? We look for the long pole. What’s the long pole in data science?

    [Image: data wrangling quote]

    Data Integration is the Long Pole

    According to Lars Nielsen in his book Unicorns Among Us:

    “As intellectually stimulating as the work might be, there are also mundane aspects to the data scientist's job. In fact, there is pure drudge work that can take from 50 to 80 percent of a data scientist's time: This involves the collecting, cleaning and organizing of unruly, disparate, messy, unstructured data before it can be mined for the gold of actionable BI. Some call this aspect of the job ‘data wrangling.’" (emphasis mine)

    The long pole in data science is “data wrangling” or “data munging,” also known as data integration.

    “How Do We Automate Data Integration, Andy?”

    I’m glad you asked! My answer is:

    1. Eliminate redundancy.
    2. Automate execution.
    3. Practice DILM (Data Integration Lifecycle Management).

    First, eliminate the cut-and-paste drudge work from data integration development.

    What is the “cut-and-paste drudge work?” A significant amount of data integration is based on repeatable patterns. One can automate SSIS design patterns, such as truncate-and-load and incremental load, using Biml (Business Intelligence Markup Language). Hours, days, and weeks of work can be reduced to minutes by using Biml and Biml Frameworks, such as BimlFlex and the Biml Express Metadata Framework.
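
    To make the pattern concrete, here is a minimal T-SQL sketch of the incremental-load logic such a generated package implements – insert rows that are new, update rows that changed. The stg.Widgets and dbo.Widgets tables are hypothetical; a Biml-generated SSIS package expresses the same logic with Lookup and Destination components.

    -- Hypothetical incremental-load pattern: insert new rows, update changed rows.
    Insert Into dbo.Widgets (WidgetID, WidgetName, ListPrice)
    Select s.WidgetID, s.WidgetName, s.ListPrice
    From stg.Widgets s
    Where Not Exists (Select 1 From dbo.Widgets d Where d.WidgetID = s.WidgetID);

    Update d
    Set d.WidgetName = s.WidgetName
      , d.ListPrice = s.ListPrice
    From dbo.Widgets d
    Join stg.Widgets s
      On s.WidgetID = d.WidgetID
    Where d.WidgetName <> s.WidgetName
       Or d.ListPrice <> s.ListPrice;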

    Second, automate data integration execution.

    Data integration execution has long been grouped into processing jobs or batches. A best practice in SSIS development is to build small, unit-of-work packages that perform specific load or transformation operations. Why? Smaller packages are easier to develop, test, manage, and maintain. Unit-of-work packages promote code re-use, as some packages can be used in multiple batched operations.

    There’s no free lunch, and building unit-of-work packages presents a new issue: Now one has lots more SSIS packages to execute. An execution framework addresses this issue. Enterprise Data & Analytics offers a free execution framework, the SSIS Framework Community Edition.

    The SSIS Framework Community Edition includes a Parent SSIS package that executes packages collected into batches called “SSIS Applications.” Metadata about SSIS Applications is stored in tables integrated (although isolated by a custom schema) into the SSISDB database. Data integration professionals can configure metadata for three (or three hundred) packages in a single batch, and this batch can be executed by starting the Parent SSIS package and overriding a single parameter.
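
    For illustration, here is a hedged T-SQL sketch of what “starting the Parent SSIS package and overriding a single parameter” looks like when the package runs from the SSIS Catalog. The catalog.* procedures are standard SSISDB stored procedures; the folder and project names are assumptions, and ApplicationName is the Parent package parameter described later on this page.

    Use SSISDB
    go

    Declare @execution_id bigint

    -- Create a Catalog execution for the Framework's Parent.dtsx package.
    -- The folder and project names below are hypothetical:
    Exec [catalog].create_execution
        @folder_name = N'Framework'
      , @project_name = N'SSIS Framework Community Edition'
      , @package_name = N'Parent.dtsx'
      , @use32bitruntime = 0
      , @execution_id = @execution_id Output

    -- Override the single package parameter (object_type 30 = package parameter):
    Exec [catalog].set_execution_parameter_value
        @execution_id = @execution_id
      , @object_type = 30
      , @parameter_name = N'ApplicationName'
      , @parameter_value = N'Stage EDW Data'

    Exec [catalog].start_execution @execution_id = @execution_id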

    Consider this quote, from a CNN article, by a senior official with the Obama 2012 US re-election campaign:

    [Image: quote on data integration from a senior Obama 2012 campaign official]

    Third, data integration needs enterprise lifecycle management, like all other software development. (Data integration development is software development, after all.)

    Repeatable, collaborative, and communicable processes form the heart of enterprise DevOps. Repeatable releases and source control for SSIS are no longer optional because they improve code quality and reduce downtime. Enterprises need at least three lifecycle “tiers” – Development, Test, and Production. Why three? Development and Production environments are rarely in question, so consider the third tier. It’s not important what this tier is called – or even whether there are more tiers between Development and Production. This tier is important because it’s not Production and it’s not Development.

    All software works in Development. Software is built in Development and the design-time defaults all point to Development resources. Enterprises do not want the first deployment of any software to be the Production deployment. Instead, a test deployment – to a different environment (not Development and not Production) – will assure all external parameters are properly configured and included in the deployment plan. A successful test deployment to an environment (lifecycle tier) that matches Production dramatically improves confidence that the Production deployment will succeed.

    When deploying SSIS to the SSIS Catalog, though, you need to be sure your test deployment tier is closely aligned to the Production environment. That can be… challenging, but SSIS Catalog Compare detects – and can script and deploy – differences between SSIS Catalogs residing in different Data Integration Lifecycle tiers. Catalog Compare generates scripts for externalized parameters – parameters that override the design-time defaults – by scripting SSIS Catalog Environments, Environment Variables, Project and Package References, and Reference Mappings.
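
    As a point of reference, the configuration metadata being scripted lives in standard SSISDB catalog views. A query like the following (a sketch, not Catalog Compare's implementation) surfaces the environment variables that supply values to externalized parameters:

    Select f.name As FolderName
         , e.name As EnvironmentName
         , ev.name As EnvironmentVariableName
         , ev.value As EnvironmentVariableValue  -- sensitive values are not returned in clear text
    From SSISDB.[catalog].environments e
    Join SSISDB.[catalog].folders f
      On f.folder_id = e.folder_id
    Join SSISDB.[catalog].environment_variables ev
      On ev.environment_id = e.environment_id
    Order By f.name, e.name, ev.name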

    Conclusion

    Why is automation important? Automating data integration changes the dynamics for data science.

    Jen’s right. Lars is right. Automation is the future of data science and automating the long pole – data integration – is the place to begin. For SSIS automation, DILM Suite is a solution.

    What can we accomplish by automating data science? We can change the world.

    :{>

    Learn More:
    From Zero to Biml - 19-22 Jun 2017, London 
    IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
    IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago

    Tools:
    SSIS Framework Community Edition
    Biml Express Metadata Framework
    SSIS Catalog Compare
    DILM Suite

    Recordings and Posts:
    SSIS Lifecycle Management
    Advanced SSIS Execution
    SSIS Academy: Using the SSIS Catalog Day 1 - Create the Catalog and Deploy
    SSIS Academy: Using the SSIS Catalog Day 2 - Package Execution and Monitoring

    The Data Integration Lifecycle Management (DILM) Blog Series:
    An Example of Data Integration Lifecycle Management with SSIS, Part 0
    An Example of Data Integration Lifecycle Management with SSIS, Part 1
    An Example of Data Integration Lifecycle Management with SSIS, Part 2
    An Example of Data Integration Lifecycle Management with SSIS, Part 3
    An Example of Data Integration Lifecycle Management with SSIS, Part 4

  • Parsing SSIS Catalog Messages for Lookup Performance, v2

    A couple years ago I wrote Administering SSIS: Parsing SSIS Catalog Messages for Lookups. I’ve updated the script to return all Lookup Transformation performance messages captured by the SSIS Catalog with Basic logging enabled (the SSIS Catalog’s default Logging Level). Some smart data integration people recommend setting the SSIS Catalog Logging Level to Verbose and querying catalog schema views to retrieve row counts and performance metrics. I’m not a fan of running a Verbose logging level by default due to the overhead. Granted, the overhead is relatively light but it’s not nothing. Sometimes I need all the cycles!

    I prefer to passively gather performance metrics – even if it’s a little convoluted and more work (for me). I don’t often need these metrics immediately available, so I execute queries similar to the query below and store the performance metrics in a table. I can (and do) build dashboards to track SSIS performance (perhaps I should blog about them…) using these passively collected performance metrics.

    Without further ado, the updated query:

    Use SSISDB
    go

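    -- The patterns below match the Lookup Transformation messages the SSIS Catalog logs
    -- at the Basic logging level. The "- 2" in the *Len variables discounts the two '%'
    -- wildcard characters, leaving only the length of the literal text being matched.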
    declare @LookupIdString varchar(100) = '% rows in the cache. The processing time was %'
    declare @LookupNameStartSearchString varchar(100) = '%:Information: The %'
    declare @LookupNameStartLen int = Len(@LookupNameStartSearchString) - 2
    declare @LookupNameEndSearchString varchar(100) = '% processed %'
    declare @LookupNameEndLen int = Len(@LookupNameEndSearchString) - 2
    declare @ProcessingTimeString varchar(100) = 'The processing time was '
    declare @ProcessingTimeSearchString varchar(100) = '%' + @ProcessingTimeString + '%'
    declare @CacheUsedString varchar(100) = 'The cache used '
    declare @CacheUsedSearchString varchar(100) = '%' + @CacheUsedString + '%'

    Select
    SubString(om.[message]
            , (PatIndex(@LookupNameStartSearchString, om.[message]) + @LookupNameStartLen)
            , (PatIndex(@LookupNameEndSearchString, om.[message]) - (PatIndex(@LookupNameStartSearchString, om.[message]) + @LookupNameStartLen))
              ) As LookUpName
    , Convert(bigint, Substring(om.[message]
                              , (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)
                              , ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
                                  -
                                 (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)
                                )
                               )
             ) As LookupRowsCount
    , Convert(decimal(16, 3), Substring(om.[message]
                                      , (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)
                                      , ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
                                         -
                                         (PatIndex(@ProcessingTimeSearchString, om.[message])+ Len(@ProcessingTimeString) + 1)
                                        )
                                       )
             ) As LookupProcessingSeconds
    , Convert(bigint, Substring(om.[message]
                             , (Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)
                             , ((Charindex(' ', om.[message], Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1))
                                 -
                                (Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1))
                               )
             ) As LookupBytesUsed
    , Convert(decimal(16, 3), (Convert(bigint, Substring(om.[message]
                                                       , (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)
                                                       , ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
                                                          -
                                                          (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen))
                                                        )
                                      )
                                     /
                                      (Convert(decimal(16, 3), Substring(om.[message]
                                                                       , (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)
                                                                       , ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1))
                                                                          -
                                                                          (PatIndex(@ProcessingTimeSearchString, om.[message])+ Len(@ProcessingTimeString) + 1 ))
                                                                        )
                                               )
                                      )
                               )
            ) As LookupRowsThroughput
    , ex.start_time As ExecutionStartTime
    , ex.folder_name As FolderName
    , ex.project_name As ProjectName
    , ex.package_name As PackageName
    --Into rpt.LookupMetrics -- drop table rpt.LookupMetrics
    From [catalog].operation_messages om
    Join [catalog].executions ex
      On ex.execution_id = om.operation_id
    Where om.message_source_type = 60 -- Data Flow Task
      And om.[message] Like @LookupIdString
      -- data type-checking
      And IsNumeric(Substring(om.[message], (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen ), ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)) - (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)))) = 1 -- Check LookupRowsCount
      And IsNumeric(Substring(om.[message], (PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1 ), ((CharIndex(' ', om.[message], PatIndex(@ProcessingTimeSearchString, om.[message]) + Len(@ProcessingTimeString) + 1)) - (PatIndex(@ProcessingTimeSearchString, om.[message])+ Len(@ProcessingTimeString) + 1 )))) = 1 -- Check LookupProcessingSeconds
      And IsNumeric(Substring(om.[message], (Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1 ), ((Charindex(' ', om.[message], Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)) - (Patindex(@CacheUsedSearchString, om.[message]) + Len(@CacheUsedString) + 1)))) = 1 -- Check LookupBytesUsed
      -- catch 0-valued denominator rows
      And Convert(bigint, Substring(om.[message], (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen ), ((CharIndex(' ', om.[message], PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)) - (PatIndex(@LookupNameEndSearchString, om.[message]) + @LookupNameEndLen)))) <> 0 -- Check LookupRowsCount
    Order By operation_id DESC

    :{>

    Related Training:
    IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
    From Zero to Biml - 19-22 Jun 2017, London
    IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago

    Learn More:
    SSIS Design Patterns and Biml: A Day of Intelligent Data Integration – Boston SQL Saturday precon, 24 Feb 2017
    Save Time and Improve SSIS Quality with Biml
    An Example of Data Integration Lifecycle Management with SSIS, Part 4
    The Recordings for SSIS Academy: Using the SSIS Catalog are Available
    SSIS Catalog Compare v2.0 Launch Event 7 Mar 2017!

  • Use Amazon’s Alexa to Create a Copy of a Database Using Redgate SQL Clone…


    …and win an Amazon Echo Dot and one lucky winner will score a free SQL Clone license!

    About SQL Clone


    I can hear you thinking, “What’s SQL Clone, Andy?” I’m glad you asked!

    SQL Clone is a database provisioning tool that removes much of the time and disk space needed to create multiple database copies. With SQL Clone, database copies can be created in seconds and each copy uses around 40MB of disk space, no matter the size of the original database. Provisioning requests can be dealt with very quickly, and teams can work locally on up-to-date, isolated copies of the database to speed up development, accurately test code, and fix issues faster.

    About the Amazon Echo Dot

    I can hear you thinking, “So how do I win an Amazon Echo Dot, Andy?” I’m glad you asked!

    Amazon’s Alexa Voice Service is the engine behind the Amazon Echo, Tap, and Dot devices. The talented developers at Redgate built an Alexa Skill – a capability that enables interaction between people and devices and services – to facilitate voice-activated database cloning using SQL Clone. You can view a cool video of SQL Clone in action via Alexa here!

    “How Do I Enter to Win?”

    To enter the competition, post a comment stating what “skill” you would build with an Echo Dot (if time and money were no object). Maybe you’d like to know when a backup fails or how many deployments have been made to a database. Or maybe you would like a reminder to record your favorite TV show! It doesn’t necessarily have to be database-related.

    The contest ends 30 April. Winners will be announced 5 May on the Redgate blog, so please check there to see if you've won. The full contest terms and conditions are here.

    So have at it! Leave a comment to enter to win a free Amazon Echo Dot. Redgate is giving away five Dots! And one lucky winner will receive a SQL Clone license!

    :{>

    PS – Learn how Redgate created an Alexa skill for SQL Clone.

  • Presenting Use Biml to Automate SSIS Design Patterns at SQL Saturday–Richmond!

  • Presenting Use Biml to Automate SSIS Design Patterns at SQL Saturday–Raleigh!

  • Creating SSIS Packages with the SQL Server Import and Export Wizard

    This material was originally posted on the Linchpin People blog.

    In this post, I demonstrate how to use the SQL Server 2012 Import and Export Wizard to create a SQL Server 2012 Integration Services (SSIS 2012) package. The demo is created on a virtual machine running Windows Server 2012.

    You can use the SQL Server Import and Export Wizard to learn more about SQL Server Integration Services. Often, one will reach the “Save and Run Package” page just before executing the Wizard. The “Run immediately” checkbox is checked and the “Save SSIS Package” checkbox is unchecked by default. If you check the “Save SSIS Package” checkbox, you can choose to store the SSIS package in the MSDB database (the “SQL Server” option) or the file system. You may also select the Package Protection Level:

    ImportExportWizard1

    If you select these options, the next step is selecting the target location of the package:

    ImportExportWizard2

    The next page of the Import and Export Wizard displays a summary of the selected options:

    ImportExportWizard3

    Clicking the Finish button creates, saves, and executes the SSIS package as configured, and the results are displayed:

    ImportExportWizard4

    Cool. Now what?

    Now you have an SSIS package saved in the file system. In this case, the file is named ImportExportWizard.dtsx. It can be opened in SQL Server Data Tools by right-clicking the file and selecting “Edit”:

    ImportExportWizard5

    Windows Server 2012 prompts for which program to use to perform the Edit. I select Microsoft Visual Studio Version Selector:

    ImportExportWizard6

    SQL Server 2012 Data Tools uses the Visual Studio 2010 Integrated Development Environment (IDE) at the time of this writing. Note the “Run” (Play) button is disabled on the toolbar:

    ImportExportWizard7

    We have an SSIS package created with the Import and Export Wizard, but we cannot re-execute it.

    Bummer. But all is not lost.

    Visual Studio needs a debugger to execute packages in debug mode. When we open an SSIS (*.dtsx) file all by itself, Visual Studio doesn’t load a debugger. To have Visual Studio load the SSIS debugger, we can create a Solution containing a Project. Begin by clicking File, New, Project:

    ImportExportWizard8

    Name the new solution and project, and select a folder:

    ImportExportWizard9

    Once the solution and project are created, right-click the SSIS Packages virtual folder in Solution Explorer, and select “Add Existing Package”:

    ImportExportWizard10

    Navigate to the location where you stored the SSIS package in the final steps of the Import and Export Wizard:

    ImportExportWizard11

    When you click the OK button, the SSIS package is copied into the project folder and added to the new project:

    ImportExportWizard12

    When you open the SSIS package in the context of a Visual Studio project and solution, the Debug button is enabled on the toolbar and the SSIS package can be executed in debug mode:

    ImportExportWizard13

    You can use your knowledge of the Import and Export Wizard to learn more about SSIS package development. Happy Integrating!

    :{>

  • Long Poles and Critics

    As a consultant, I get calls to complete projects started by someone else or extend projects completed by someone else. When I look at someone else's work it's sometimes tempting to say, "Wow - they did that wrong." But I don't. Instead I say, "I'm not sure why they built it this way." That may sound back-handed but I make sure it's not by asking follow-up questions. Which questions? My favorite is, "What was the problem they were trying to solve?" It’s entirely possible my predecessor delivered precisely what he was asked to deliver. Plus software projects evolve (especially elegant software projects). If software solves a problem it's common for "new" problems to come into focus.

    We speak about this in terms like "long pole," the subject of this article by Raymond Chen, gifted thinker and author of The Old New Thing (which, I am embarrassed to admit, just found its way onto my Kindle). If I'm taking down a tent, I may decide to take down the tallest (longest) pole first. That makes the tent noticeably shorter and provides clear evidence to onlookers that I'm doing my job (aka “highly visible”). But then, *another* pole becomes the long pole. And the cycle repeats.

    Some things to keep in mind before criticizing:

    1. Delivering software is a collaboration between the service provider and the customer. It's never 50/50. There's always an imbalance, even if it's as little as 49/51 – and this applies to success as well as failure. If you criticize – especially early on – you may be speaking to the person on the 51% side of a failure. You may be unwittingly feeding the beast with all new criticisms, which leads to my next consideration...
    2. If I criticize the work of others, I am priming my customer to criticize the next bit of work she sees. Who's on deck? Me and my work.
    3. “But what if the person before me did something truly horrible, Andy?” That’s a fair question and I have a question for you, “Are you aware of 100% of the factors that went into the decisions made by your predecessor?” I’m certain the answer is “no.” Are you aware of 50%? Again, no. At best, you’re speaking to one side of an incomplete project. You will most likely have no opportunity to speak with your predecessor and the person with whom you are speaking is not going to tell you all of their side. You’re not going to get even half of the story! Now, your predecessor could have delivered something dangerous, illegal, insecure, of poor quality, not up to modern standards and best practices, or merely a solution of which you do not approve. They could well and truly be 100% to blame. Your customer may indicate that they want you to disparage the work of your predecessor. I advise you to resist the temptation to do so. Again, my advice is to fall back to “I don’t understand why they built it this way,” or (perhaps), “Based on what I’ve heard today, I would have designed this differently,” or, “I would have delivered the solution so that ______.” Why? Because you don’t know the other side of the story.

    Maybe I’m wrong about this. Maybe you’ve built a reputation as an expert by disparaging the work of others, thinking that you will get all of the work and everyone will think you’re a rock star. Maybe. Or maybe John Nash and I are right about coopetition; there’s more work out there than you can handle alone, and you have, unwittingly, introduced errors into your deliverables and primed your customers to criticize the mistakes of consultants. (Even you.)

    Time will tell.

    Peace.

    :{>

    Learn More:
    SSIS Design Patterns and Biml: A Day of Intelligent Data Integration – Boston SQL Saturday precon, 24 Feb 2017
    Save Time and Improve SSIS Quality with Biml
    An Example of Data Integration Lifecycle Management with SSIS, Part 4
    The Recordings for SSIS Academy: Using the SSIS Catalog are Available
    SSIS Catalog Compare v2.0 Launch Event 7 Mar 2017!

    Related Training:
    IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
    IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago
    From Zero to Biml - 19-22 Jun 2017, London

  • The Last SSIS Book You Read

    Every now and then I get an email or meet someone at an event and they say something like, “I learned SSIS from your book!” I’m not going to lie to you, that makes me feel pretty good. Since I’ve written most of my books as part of a team, I usually reply, “It was a team effort and our team was a bunch of really smart people… plus me!”

    These folks tell me they read other books about SSIS (SQL Server Integration Services) but they didn’t really grok SSIS until they read the book authored by the team and me. I suppose that could mean the team and I are awesome. I believe that’s mostly true, but I am definitely biased…

    Here’s what I believe is really happening, though. I think these folks – and many others I’ve never met – learned more than they realized from those other books. I think our book was the last or latest SSIS book they read. I think the other books exposed these readers to complex data integration concepts, perhaps for the very first time. I still remember learning data integration and it was painful for me – and I believe I have a knack for it!

    I think our book is merely the book they read after the others. I’m willing to bet other folks have read our books first, then read books by other authors, and told those authors the same thing. I’m cool with that. As an author, I’m happy you’re reading books about the topic.

    :{>

    Learn More:
    SSIS Design Patterns and Biml: A Day of Intelligent Data Integration – Boston SQL Saturday precon, 24 Feb 2017
    Save Time and Improve SSIS Quality with Biml
    An Example of Data Integration Lifecycle Management with SSIS, Part 4
    The Recordings for SSIS Academy: Using the SSIS Catalog are Available
    SSIS Catalog Compare v2.0 Launch Event 7 Mar 2017!

    Related Training:
    IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
    IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago
    From Zero to Biml - 19-22 Jun 2017, London

  • Intelligent Data Integration, SSIS Design Patterns, and Biml

    On Thursday, 15 Jun 2017, I have the privilege and honor of delivering a full-day precon titled SSIS Design Patterns and Biml: A Day of Intelligent Data Integration as part of SQL Saturday Dublin. If you’re interested in attending, you can learn more here.

    “Isn’t This The Same Presentation You Delivered Before, Andy?”

    Yes and no. It has the same title but…

    I’ve focused on Biml presentations for the past two years. Over the past eighteen months I’ve built the DILM Suite. These facts intersect: My goal is to facilitate DevOps and Continuous Integration (CI) with SQL Server Integration Services (SSIS) and Biml plays an important role; namely, automated SSIS code generation. The DILM Suite development work has impacted my webinars and presentations – especially this precon. I delivered SSIS Design Patterns and Biml: A Day of Intelligent Data Integration once before, over a year ago in Atlanta. Since then I’ve delivered modules of that presentation in Enterprise Data & Analytics webinars. With each delivery the DILM Suite development work has informed and inspired changes to the content of the modules; the content has evolved and the 15 Jun delivery will be different.

    This evolution-of-content has happened to many of my Biml presentations. In some cases the updates are such that today’s version of the presentation is a radical departure from the first delivery. Why? I’m constantly learning. Writing the DILM Suite has intensified my learning. As I’ve shepherded this vision and watched it come to fruition, I’ve discovered new possibilities and more use cases.

    “Mini-Cycles”

    I catch a glimpse of what’s possible and develop until it’s reality. As I develop, the glimpse becomes more defined and I add and refine features in response. This “mini-cycle” continues until I reach a good stopping point with a solution, product, or utility. By then I’ve caught a glimpse of a solution to another problem and begin developing a different solution… and the mini-cycle repeats for this other solution, product, or utility.

    With DILM Suite, I catch a glimpse of an Euler diagram (I think visually, in graphs) showing how two or more of the solutions, products, and utilities work together to facilitate more complex DevOps and SSIS CI scenarios. This started in early 2016 around the time I began releasing a handful of free utilities. There will be more free utilities (one within the next 30 days), but the list of free DILM Suite stuff at the time of this writing is:

    • SSIS Catalog Browser
    • Catalog Reports
    • SSIS Framework Community Edition
    • Framework Browser
    • Biml Express Metadata Framework

    The blog post titled An Example of Data Integration Lifecycle Management with SSIS, Part 4 provides a glimpse of how one might use four of these free tools together (everything except the Biml Express Metadata Framework – which hadn’t been released at that time). Today, at the time of this writing, that glimpse is my latest “pinnacle.” The Euler diagrams in my mind, though, are already two pinnacles beyond that – and working on a third. It’s likely the 15 Jun delivery of the SSIS Design Patterns and Biml: A Day of Intelligent Data Integration precon will contain material beyond these five free tools.

    The delivery after 15 Jun will likely contain still more material. I’m continuously integrating my Continuous Integration and DevOps-for-SSIS thoughts, and then building tools and designing best practices and patterns to support the latest version of my vision.

    I don’t expect to stop.

    Ever.

    “Is the SSIS Design Patterns and Biml: A Day of Intelligent Data Integration Precon Just One Big Commercial for the DILM Suite, Andy?”

    Goodness no.

    In the first part I’m going to share everything I know about using what’s-in-the-box to deliver enterprise-class data integration with SSIS – some of which Kent Bradshaw and I covered in the 3-part webinar series titled SSIS Academy: Using the SSIS Catalog (we stayed “in the box” for those three webinars). In the second part I’ll point out some gaps in the out-of-the-box solutions and demonstrate some ways to close them. Some (not all) of the example solutions are free DILM Suite tools.

    Conclusion

    I hope to see you at SQL Saturday Dublin 17 Jun! If you’re interested in learning more about DevOps and Data Integration Lifecycle Management, I also hope to see you at the SSIS Design Patterns and Biml: A Day of Intelligent Data Integration precon.

    You might enjoy engaging Enterprise Data & Analytics consultants because we like helping teams learn more about DevOps and CI for SSIS.

    :{>

    Learn More:
    An Example of Data Integration Lifecycle Management with SSIS, Part 4
    The Recordings for SSIS Academy: Using the SSIS Catalog are Available
    Save Time and Improve SSIS Quality with Biml
    SSIS Catalog Compare v2.0 Launch Event 7 Mar 2017!

    Related Training:
    From Zero to Biml - 19-22 Jun 2017, London
    IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
    IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago

  • On DevOps…

    DevOps is getting a lot of attention in SQL Server and Developer communities these days. Two friends have blogged about the outage and accompanying data loss experienced by GitLab.

    I’m sure there are other posts; these are the two I read most recently. Like Brent and Mike, my first impression on hearing about the incident was: I admire their transparency. It’s ongoing. GitLab posted about the incident and they were, in my opinion, incredibly honest with themselves. That’s integrity. And it’s worth its weight in californium.

    Brent’s post translated the episode into SQL Server-speak, which is awesome. Mike’s post, also awesome, used the GitLab incident to reinforce some DevOps best practices.

    Mike emphasizes the balance needed between the Dev side of DevOps and the Ops side. I’m going to kick that up a notch.

    If you are not balancing Development and Operations efforts, you are not practicing DevOps.

    What is DevOps?

    I (and likely Brent and Mike) have seen a lot of organizations struggle with this question. I’ve helped companies implement DevOps (Enterprise Data & Analytics helps organizations implement DevOps – learn more here). In my experience, companies that struggle with DevOps most often misunderstand the difference between DevOps and Agile methodologies. When well-intentioned, good-hearted, intelligent people confuse DevOps and Agile, they try to implement something like Scrum in Operations. In my humble opinion and based on experience, this either fails outright or morphs (in order to survive) into something other than Agile Operations. (Note: One can label their operations activities “Agile.” That doesn’t mean they’re practicing agile.)

    Operations is its own practice. Even in DevOps. Can we use some of the same tools and best practices designed for managing and monitoring software development in operations? Goodness yes. But that doesn’t make Operations agile.

    “What Does DevOps Mean in Operations, Then, Andy?”

    Automation. Scripted and documented, idempotent (re-executable without harming state), repeatable automation.
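
    A minimal sketch of idempotence in practice (object names here are illustrative): every statement guards itself, so the script can be re-run at any time without changing the outcome.

    If Not Exists (Select 1 From sys.databases Where [name] = N'MyStagingDatabase')
      Create Database MyStagingDatabase;
    go

    Use MyStagingDatabase
    go

    If Not Exists (Select 1 From sys.schemas Where [name] = N'stage')
      Exec(N'Create Schema stage;');  -- CREATE SCHEMA must be the only statement in its batch
    go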

    Documentation

    DevOps documentation includes but is not limited to vendor and internal team documentation, wikis, knowledge bases, how-to guides, and run books. Run books should include practices, policies, and procedures. Additional Operations documentation covers Outage Response Management, which includes on-call schedules and escalation policies and procedures.

    Most Operations teams use Service Desk ticketing software such as Jira Service Desk or ZenDesk. This too is part of Operations documentation.

    Automation

    Creating scripts and/or using tools to manage previously-manual or menial tasks – that’s automation. Automation impacts both Development and Operations in the DevOps enterprise. Regarding data-related technology, tools like Business Intelligence Markup Language (Biml) save SSIS package development time and improve quality. For managing SSIS, DevOps enterprises should take a look at the DILM Suite – a collection of tools and utilities that surface SSIS-related configurations, generate scripts, and automate deployments between Development, Test, QA, and Production instances of SSIS Catalogs.

    The Cost

    Creating and maintaining this level of Operations documentation costs time and money. It impacts time to market. But it is as necessary as unit testing software.

    I’m a huge fan of Agile methodologies. I can produce witnesses – ask anyone who’s worked with me. I’ve led many Agile Data Warehouse and Agile Business Intelligence projects. I like Agile because it’s faster and cleaner than waterfall methodologies. I like Agile because it places decision-making power for development where it belongs – in the hands of the software developers (a concept Agile borrows from Kanban, the Japanese just-in-time scheduling system).

    Creating and maintaining much of the Operations documentation is the responsibility of the development team. Your Scrum Board (or software development project plan) needs a marker for this documentation. More importantly, the project needs to be given the time to create (or update) this documentation. Granted, not all documentation needs to be complete before the software goes live. Run books describing execution, monitoring, and Operational responses, though? That information is vital at go-live.

    How much does it cost to build this level of documentation? It depends.

    One thing’s for sure: It’s cheaper than a single catastrophic failure.

    :{>

    You might like working with Enterprise Data & Analytics because we grok DevOps and SSIS.

    Learn More:
    Broken References in the SSIS Catalog
    SSIS Academy: Using the SSIS Catalog Day 1 - Create the Catalog and Deploy
    SSIS Academy: Using the SSIS Catalog Day 2 - Package Execution and Monitoring
    SSIS Academy: Using the SSIS Catalog Day 3 - SSIS Configuration
    SSIS Catalog Management
    SSIS Lifecycle Management

    Related Training:
    IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
    IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago
    From Zero to Biml - 19-22 Jun 2017, London

  • The Recording for Save Time and Improve SSIS Quality with Biml is Available

  • BimlExpress Metadata Framework

    I’m not good at naming things. “Biml Express Metadata Framework” is descriptive but the acronym – BEMF – is difficult to pronounce (for me, at least). Kent Bradshaw wants to name it George. We’ll keep working on a name…

    What is the BimlExpress Metadata Framework?

    The Biml Express Metadata Framework uses Business Intelligence Markup Language (Biml) to:

    • Read metadata stored in a SQL Server database.
    • Build SSIS projects that create the target database, schemas, and tables.
    • Build an SSIS project that contains one SSIS package per table, plus a Controller package that executes each table-package.
    bemf_0

    I demonstrated this solution in the webinar Save Time and Improve SSIS Quality with Biml. Both the webinar recording and the framework are free (although registration is required to view the recording). The framework is the latest addition to the DILM Suite. The zip file contains backups of the AdventureWorks2014 and BimlMetadata databases, plus the SQL Server Data Tools (SSDT) solution BimlExpressMetadataFramework. You may download the zip file here.
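
    Conceptually, the “metadata stored in a SQL Server database” is simply a list of the tables to build and load. The sketch below is a simplified, hypothetical illustration; the actual schema ships in the BimlMetadata database backup included in the download.

    -- Hypothetical metadata table; the Biml scripts iterate rows like these to emit
    -- one staging table and one SSIS loader package per source table.
    Select SourceSchemaName   -- e.g. N'Person'
         , SourceTableName    -- e.g. N'Person'
         , TargetSchemaName
         , TargetTableName
    From dbo.TableMetadata    -- hypothetical table name
    Order By SourceSchemaName, SourceTableName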

    Once open, the SSDT solution appears as shown here:

    bemf_1

    To begin, please open the text file named README_BimlExpressMetadataFramework.txt found in the Miscellaneous virtual folder:

    bemf_2

    This file will contain the latest execution notes for the solution:

    bemf_3

    Generating the Build Destination Database SSIS Package

    To generate the SSIS Package named 10_Build_Destination_Databases.dtsx multi-select the Biml files 0_00_BuildConnections.biml and 0_10_BuildDestinationDatabaseMain.biml. Right-click the selected files and click Generate SSIS Packages:

    bemf_4

    If there are no errors, the Project Connection Manager named __master__AdventureWorks2014_Stage.conmgr and the SSIS Package named 10_Build_Destination_Databases.dtsx are generated:

    bemf_5

    Generating the Build Destination Schemas SSIS Package

    To generate the SSIS Package named 20_Build_Destination_Schemas.dtsx multi-select the Biml files 0_00_BuildConnections.biml and 0_20_BuildDestinationSchemasMain.biml. Right-click the selected files and click Generate SSIS Packages:

    bemf_6

    If there are no errors, the Project Connection Manager named AdventureWorks2014_Stage.conmgr and the SSIS Package named 20_Build_Destination_Schemas.dtsx are generated:

    bemf_7

    At any time during this process, you may see a dialog similar to the one shown here:

    bemf_9

    If you see this dialog, always Select All and click the Commit button.

    You may also see this dialog:

    bemf_10

    If you see this dialog, always click the Yes to All button.

    Why? The BimlExpress Metadata Framework was not originally developed for BimlExpress, the free SSIS package code-generation tool from Varigence. Originally, this framework was built in Mist (BimlStudio) and ported to BimlExpress. One unfortunate result of the port is that some SSIS artifacts are recreated during the process (note: this does not happen in the commercial version of the Biml Framework, currently sold as an implemented solution available from Enterprise Data & Analytics).

    Generating the Build Destination Tables SSIS Package

    To generate the SSIS Package named 30_Build_Destination_Tables.dtsx multi-select the Biml files 0_00_BuildConnections.biml and 0_30_BuildDestinationTablesMain.biml. Right-click the selected files and click Generate SSIS Packages:

    bemf_8

    If there are no errors, the SSIS Package named 30_Build_Destination_Tables.dtsx is generated:

    bemf_12

    Executing the Build SSIS Packages

    Open and execute the SSIS Package named 10_Build_Destination_Databases.dtsx:

    bemf_11

    If the database already exists, the SSIS Package named 10_Build_Destination_Databases.dtsx will succeed and do nothing.

    Open and execute the SSIS Package named 20_Build_Destination_Schemas.dtsx:

    bemf_13

    If the schemas already exist, the SSIS Package named 20_Build_Destination_Schemas.dtsx will succeed and do nothing.

    Open and execute the SSIS Package named 30_Build_Destination_Tables.dtsx:

    bemf_14

    There’s an Execute SQL Task for each table. The T-SQL statement in each Execute SQL Task drops the table if it exists and then creates the table. Once created, the database, schemas, and tables appear in SQL Server Management Studio (SSMS) Object Explorer:

    bemf_15
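
    For reference, the statement inside each of those Execute SQL Tasks looks something like this hypothetical example (the actual statements are generated from the metadata by Biml):

    If Exists (Select 1 From sys.tables t
               Join sys.schemas s On s.schema_id = t.schema_id
               Where s.[name] = N'Person' And t.[name] = N'Person')
      Drop Table Person.Person;

    Create Table Person.Person
    (
        BusinessEntityID int Not Null
      , FirstName nvarchar(50) Not Null
      , LastName nvarchar(50) Not Null
      -- ... remaining columns from the source table ...
    );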

    Delete the existing SSIS artifacts. This is another side-effect of porting this framework from BimlStudio to BimlExpress. Before proceeding, we need to delete the existing Project Connection Managers and SSIS Packages:

    bemf_16

    You will need to confirm your intention to delete these artifacts:

    bemf_17

    Generating the SSIS Project That Contains the Loader SSIS Packages

    To generate the SSIS packages that load the AdventureWorks2014_Staging database, multi-select the following Biml files:

    • 0_00_BuildConnections.biml
    • 0_10_BuildDestinationDatabaseMain.biml
    • 0_20_BuildDestinationSchemasMain.biml
    • 0_30_BuildDestinationTablesMain.biml
    • 1_99_Main.biml

    Right-click the selected files and click Generate SSIS Packages:

    bemf_18

    The SSIS loader packages are generated.

    bemf_19

    A Controller SSIS package is also generated, named 99_Execute_AdventureWorks2014_Stage_Loader_Staging_Packages.dtsx:

    bemf_20

    The Controller package uses Execute SSIS Package Tasks to call the SSIS loader packages.

    Conclusion

    The BimlExpress Metadata Framework may not have the coolest name but it’s free and performs a lot of work in a short amount of time.

    :{>

    Learn More:
    Save Time and Improve SSIS Quality with Biml
    SSIS Catalog Compare v2.0 Launch Event 7 Mar 2017!
    The Basics of Biml – Populating the Biml Relational Hierarchy

    Related Training:
    IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
    IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago
    From Zero to Biml - 19-22 Jun 2017, London

  • An Example of Data Integration Lifecycle Management with SSIS, Part 4

    In this post I will demonstrate advanced options for executing SSIS packages in the SSIS Catalog. I’ll also demonstrate SSIS Catalog Browser, SSIS Framework Community Edition, Framework Browser, and Catalog Reports – free utilities and solutions from DILM Suite that may help as you implement your enterprise Data Integration Lifecycle Management (DILM).

    The Problem I am Trying to Solve

    I am staging data from flat files using SSIS packages in two SSIS Projects. Here are the projects containing the packages as viewed in SSIS Catalog Browser:

    dilm4_0a

    I want to execute these packages from the SSIS Catalog in the following order:

    1. LoadWidgetsFlatFile.dtsx
    2. ArchiveFile.dtsx (to archive the Widgets flat file)
    3. LoadSalesFlatFile.dtsx
    4. ArchiveFile.dtsx (to archive the Sales flat file)

    I’ve followed an SSIS development best practice and developed several SSIS projects containing function-focused, unit-of-work SSIS packages. I cannot use the Execute Package Task because it only allows me to execute SSIS packages contained within the same SSIS project:

    dilm4_1

    What to do? Well, I could go all custom task on you…

    dilm4_2

    … but I am getting way ahead of myself. I’ll write more later about the ALC Execute Catalog Package Task. Promise.

    Three (Well, Four) SQLAgent Job Steps

    I can create a SQLAgent job with four job steps (I want to execute the ArchiveFile.dtsx package twice, once per flat file):

    dilm4_3

    One may schedule a job step for each SSIS package one wishes to execute. I’ve seen SQLAgent jobs that execute other SQLAgent jobs which in turn execute SSIS packages from the SSIS Catalog. There’s nothing wrong with this approach and there are benefits to using SQLAgent as an SSIS workflow manager.

    “But…” (you saw this “but” coming, didn’t you?)

    While I like using SQLAgent as a scheduler, I prefer to only use it as a scheduler. I prefer to use (or build) a data integration workflow management solution to manage data integration workflow.

    SSIS Framework Community Edition

    Kent Bradshaw and I work together at Enterprise Data & Analytics. We built the SSIS Framework to manage data integration workflow. Our frameworks are available in three flavors: Community Edition, Commercial Edition, and Enterprise Edition. The Community Edition is not only free, it’s open source! The documentation (SSIS Framework Community Edition Documentation and User Guide.docx) is a primer on designing your very own SSIS Framework.

    Like all editions of our frameworks, SSIS Framework Community Edition is integrated into the SSIS Catalog. Why? We want our frameworks to participate in your Data Integration Lifecycle, and we believe the SSIS Catalog is a vital component of enterprise DILM (Data Integration Lifecycle Management).

    Many enterprise data integration platforms include built-in framework functionality. Why? Because framework functionality is necessary for enterprise DILM (Data Integration Lifecycle Management). The SSIS Catalog is a framework but it lacks some of the functionality found in competing data integration platforms. You can implement the missing functionality in SSIS. The SSIS Framework Community Edition implements some of the missing functionality.

    How can SSIS Framework Community Edition help?

    It may help to understand some fundamentals of our framework first. Our Framework contains three entities: Applications, Packages, and Application Packages. The core object is the Application Package. The metadata contained in the custom.ApplicationPackages table isn’t terribly exciting:

    dilm4_4

    An Application Package represents a Framework Package that will execute as part of a Framework Application, as you can glean from the metadata in the screenshot above. The last four Application Packages shown represent four packages that execute as part of Application ID 2. The Package IDs for these four packages are 4, 5, and 6. “But wait, Andy. You wrote four packages. Why are there only three PackageID values?” That is an excellent question. The answer is: PackageID 6 appears twice.

    The Framework Package table is straightforward and simple. It contains metadata used to identify the Package in the SSIS Catalog. All we need is the Catalog “path” to the SSIS Package. The Catalog path is a four-part hierarchy: Catalog\Folder\Project\Package. For now, there can be only one SSIS Catalog per instance of SQL Server and it is always named “SSISDB.” So we don’t store that value in the SSIS Framework Community Edition:

    dilm4_5

    The last three packages listed are PackageIDs 4, 5, and 6: LoadWidgetsFlatFile.dtsx, LoadSalesFlatFile.dtsx, and ArchiveFile.dtsx. These are the packages that are part of our Framework Application. You can think of a Framework Application as a collection of Framework Packages configured to execute in a specified order:

    dilm4_6

    The “Stage EDW Data” Framework Application is identified by ApplicationID 2. If you recall, ApplicationID 2 is mapped to PackageIDs 4, 5, and 6 (LoadWidgetsFlatFile.dtsx, LoadSalesFlatFile.dtsx, and ArchiveFile.dtsx) in the ApplicationPackages table shown above.

    The cardinality between Framework Applications and Framework Packages is many-to-many. We see an Application can contain many Packages. The less-obvious part of the relationship is represented in this example: a single Package can participate in multiple Applications or even in the same Application more than once. Hence the need for a table that resolves this many-to-many relationship. I hope this helps explain why Application Package is the core object of our SSIS Frameworks.
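
    Here is a hedged sketch of the three Framework tables and the relationship the Application Packages table resolves. Column names are illustrative; the actual DDL ships with SSIS Framework Community Edition.

    Create Table custom.Applications
    (
        ApplicationID int Identity(1,1) Primary Key
      , ApplicationName nvarchar(256) Not Null
    );

    Create Table custom.Packages
    (
        PackageID int Identity(1,1) Primary Key
      , FolderName nvarchar(128) Not Null    -- Catalog "path": Folder\Project\Package
      , ProjectName nvarchar(128) Not Null
      , PackageName nvarchar(260) Not Null
    );

    Create Table custom.ApplicationPackages
    (
        ApplicationPackageID int Identity(1,1) Primary Key
      , ApplicationID int Not Null References custom.Applications(ApplicationID)
      , PackageID int Not Null References custom.Packages(PackageID)
      , ExecutionOrder int Not Null
    );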

    The Parent.dtsx SSIS Package

    In SSIS Framework Community Edition (and the other editions of our SSIS Framework), an SSIS package named Parent.dtsx serves as the workflow engine:

    dilm4_7

    Community Edition’s Parent.dtsx package has a single Package Parameter named ApplicationName:

    dilm4_8

    The ApplicationName parameter is used in the first Execute SQL Task – named “SQL Get Framework Packages” – which selects the Application Packages based on ApplicationName and ordered by the ExecutionOrder attribute stored in the Application Packages table:

    dilm4_9
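
    A hedged approximation of that query follows (the actual query ships inside Parent.dtsx; column names match the table sketch above):

    Select p.FolderName
         , p.ProjectName
         , p.PackageName
    From custom.ApplicationPackages ap
    Join custom.Applications a
      On a.ApplicationID = ap.ApplicationID
    Join custom.Packages p
      On p.PackageID = ap.PackageID
    Where a.ApplicationName = ?   -- parameter marker shown for an OLE DB connection
    Order By ap.ExecutionOrder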

    The query, executed to select the Application Packages contained in the “Stage EDW Data” Framework Application, returns these results:

    dilm4_10

    The “SQL Get Framework Packages” Execute SQL Task stores the full resultset in an SSIS Object data-type variable named User::ApplicationPackages:

    dilm4_11

    A Foreach Loop named “FOREACH Application Package” uses a Foreach ADO Enumerator to shred the contents of the User::ApplicationPackages object variable:

    dilm4_12

    The Foreach ADO Enumerator “points” at each row in the resultset in the order the rows are retrieved. It copies the values from each field into other SSIS variables, as shown here:

    dilm4_13

    Inside the “FOREACH Application Package” Foreach Loop Container, an Execute SQL Task named “SQL Execute Child Package” calls a stored procedure named custom.execute_catalog_package (we’ll take a closer look at this stored procedure in a bit):

    dilm4_16

    Custom.execute_catalog_package requires three parameters: @package_name, @project_name, and @folder_name. They are supplied by the “SQL Execute Child Package” Execute SQL Task from the SSIS variables mapped in the “FOREACH Application Package” Foreach Loop Container:

    dilm4_15

    As a result of the “FOREACH Application Package” Foreach Loop Container variable mappings, the “SQL Execute Child Package” Execute SQL Task executes the custom.execute_catalog_package stored procedure, passing it the values from each row, in order, of the resultset returned by the “SQL Get Framework Packages” Execute SQL Task’s query.
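
    For the curious, a procedure like custom.execute_catalog_package can be sketched as a thin wrapper around the standard SSISDB execution procedures. This is a hedged sketch only; the actual procedure ships with SSIS Framework Community Edition and does more than what is shown here.

    Create Procedure custom.execute_catalog_package
        @package_name nvarchar(260)
      , @project_name nvarchar(128)
      , @folder_name nvarchar(128)
    As
    Begin
        Declare @execution_id bigint;

        Exec [catalog].create_execution
            @folder_name = @folder_name
          , @project_name = @project_name
          , @package_name = @package_name
          , @use32bitruntime = 0
          , @execution_id = @execution_id Output;

        Exec [catalog].start_execution @execution_id = @execution_id;
    End;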

    There’s more, but your head is hurting enough for one blog post.

    A Test Execution

    Executing the Parent.dtsx package in SSDT succeeds:

    dilm4_17

    A quick glance at Catalog Reports confirms all four Application Packages executed:

    dilm4_18

    Framework Browser

    There’s one last thing. You can view the contents of the SSIS Framework Community Edition using Framework Browser, another free utility from DILM Suite.

    When you first open Framework Browser you may be prompted for “Defaults.” If so, click File > Open on the Defaults form and navigate to a file named CommunityEdition.json:

    dilm4_19

    When you open CommunityEdition.json, the Defaults form will appear similar to this:

    dilm4_20

    Click the Save button to return to the Framework Browser main form.

    In the Catalog Instance textbox, enter the name of the SQL Server instance that hosts your SSIS Framework Community Edition and click the Connect button. Your Framework metadata should load and appear similar to that shown in this screenshot:

    dilm4_21

    Framework Browser surfaces SSIS Framework Community Edition metadata.

    Conclusion

    In this post I demonstrated advanced options for executing SSIS packages in the SSIS Catalog. I demonstrated SSIS Catalog Browser, SSIS Framework Community Edition, Framework Browser, and Catalog Reports – free utilities and solutions that make up the DILM Suite that may help you as you implement your enterprise Data Integration Lifecycle Management (DILM). I even snuck in a screenshot of a custom SSIS task – the Execute Catalog Package task – of which I’ll write more later.

    :{>

    You might like working with Enterprise Data & Analytics because we like helping teams learn more about the SSIS Catalog.

    Learn More:
    Advanced SSIS Execution
    SSIS Academy: Using the SSIS Catalog Day 1 - Create the Catalog and Deploy
    SSIS Academy: Using the SSIS Catalog Day 2 - Package Execution and Monitoring

    Previous Posts in this Series:
    An Example of Data Integration Lifecycle Management with SSIS, Part 0
    An Example of Data Integration Lifecycle Management with SSIS, Part 1
    An Example of Data Integration Lifecycle Management with SSIS, Part 2
    An Example of Data Integration Lifecycle Management with SSIS, Part 3

    Related Training:
    SSIS Lifecycle Management (free recording, registration required) 
    IESSIS1: Immersion Event on Learning SQL Server Integration Services – April 2017, Chicago 
    IESSIS2: Immersion Event on Advanced Integration Services – Oct 2017, Chicago
    From Zero to Biml - 19-22 Jun 2017, London
