
Andy Leonard

Andy Leonard is a Data Philosopher at Enterprise Data & Analytics; an SSIS trainer, consultant, and developer; a Business Intelligence Markup Language (Biml) developer and BimlHero; a SQL Server database and data warehouse developer; and a community mentor, engineer, and farmer. He is a co-author of SQL Server Integration Services Design Patterns and the author of Managing Geeks - A Journey of Leading by Doing and the Stairway to Integration Services.

  • BI’s Not Dead

    Timo Elliott [Blog | @timoelliott] wrote yesterday that BI is Dead and Julie Koesmarno [Blog | @mssqlgirl] was kind enough to tweet about it. The compelling point in Mr. Elliott’s post is:

    “The most charitable view is that Gartner feels it has to exaggerate the demise of BI in order to get customers to pay attention to the changes before it’s too late.”

    Mr. Elliott goes on to point out Gartner is redefining the term “BI” to apply only to what most BI folks now refer to as “self-service BI.” But Gartner also admits:

    “’IT-modeled data structures… promote governance and reusability across the organization’ and ‘In many organizations, extending IT-modeled structures in an agile manner and combining them with any of the sources listed above is a core requirement’”

    If you’re not confused, you’re reading it wrong. I can hear you asking, “Andy, what do you think?” I’m glad you asked!

    What I Think

    I won’t pretend to understand the subtleties of Gartner’s logic at the time of this writing. I promise to dig more and try to get my head around what they’re saying and writing. I doubt, however, that Gartner is being wishy-washy. I think they’re trying to tell us something about a trend in Business Analytics.

    What’s the Trend?

    <trend>Self-service BI is gaining traction.</trend>

    I’ve been in the Business Intelligence field since before I knew what it was called. When I did learn what it was called, we called it Decision Support Systems (DSS) and (in manufacturing) Manufacturing Execution Systems (MES). Later, I learned that the part I really enjoyed was called Data Acquisition, a vital part of Supervisory Control and Data Acquisition (SCADA). That led me to a career in Data Integration.

    I’ve heard variations on this theme of “data integration is dead” for two decades. It’s not true. I believe it may one day become true, but today’s not the day.

    “Why Isn’t the Report of BI’s Demise Accurate, Andy?”

    Consider Grant Fritchey’s [Blog | @grfritchey] recent foray into R. Grant is, in my humble opinion, one of the smartest and most capable data minds on the planet. He’s an engineer who gets the value of analysis. Don’t take my word for it; ask anyone who’s read his books or heard Grant present. I promise I am not picking on Grant; I am using his post to make a point. And that point is this:

    “Data analytics is often hard.” – Andy, circa 2016

    Self-service BI is in better shape today than at any time in the past. The plethora of available tools is awesome and empowers users like never before.

    But…

    Raw data is rarely in a format that lends itself to consumption by self-service BI tools.

    I believe this is the lesson of Grant’s post.
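    To make that concrete, here’s a minimal sketch – the table and column names are hypothetical – of the kind of reshaping a raw feed typically needs before a self-service BI tool can chart it:

        -- Hypothetical raw feed: dates arrive as text, codes are inconsistently
        -- cased, and amounts include currency symbols.
        SELECT
            TRY_CONVERT(date, r.SaleDateText, 101)                     AS SaleDate,   -- 'MM/DD/YYYY' text to a real date
            UPPER(LTRIM(RTRIM(r.RegionCode)))                          AS RegionCode, -- normalize casing and whitespace
            TRY_CAST(REPLACE(r.AmountText, '$', '') AS decimal(18, 2)) AS Amount      -- strip symbols, type the number
        FROM dbo.RawSalesFeed AS r
        WHERE r.SaleDateText IS NOT NULL;

    Multiply that by dozens of sources and feeds, and the integration work doesn’t disappear just because the reporting tool got friendlier.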

    It’s Not Just Big Data, Either

    There’s a lot of buzz surrounding Big Data, and the buzz is right and proper most of the time. But most of my customers do not have data volumes that reach the threshold of Big Data. As I wrote a few years back, Little Data Remains Important and most of my customers have Complex (or Difficult, or Hard) Data – not Big Data.

    Just Add Garbage…

    Data Quality is important because garbage in = garbage out. But what you may not realize is that data collected into any store for analysis is going to be aggregated. Small variations in data quality, when summed or multiplied, bend predictive analytics lines in less-than-accurate directions – so much so that, sometimes, analytics solutions become useless for prediction even when the ratio of signal to noise (good-quality data to bad-quality data) is north of 99%. Note: this isn’t the data quality threshold for all solutions, but it is for some. All data-based solutions become useless once data quality dips below some threshold, and I’d hazard a guesstimate that most solutions cross that line with a good:bad ratio somewhere in the ninety-something percent range.
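    Here’s a contrived T-SQL sketch of the effect: a feed that is 99% clean, where the one percent of bad rows were keyed at 100x the true value (think cents-versus-dollars mistakes), nearly doubles the aggregate:

        -- 1,000 rows at $10.00 each; every 100th row is bad (a 99:1 good:bad ratio).
        WITH n AS (
            SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
            FROM sys.all_objects
        )
        SELECT
            SUM(CASE WHEN rn % 100 = 0 THEN 10.00 * 100 ELSE 10.00 END) AS reported_total, -- 19,900.00
            SUM(10.00)                                                  AS true_total      -- 10,000.00
        FROM n;
        -- The 10 bad rows contribute more to the sum than the 990 good rows combined.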

    Combined, complexity and data quality limit the effectiveness of self-service BI. It’s not the fault of the individual or the individual self-service BI tool. It’s the data. As I stated, data analytics is often hard.

    Conclusion

    In my opinion, Gartner is right to raise the flag about the shift towards self-service BI because this shift will impact the workloads for many IT shops. For now, there’s still plenty of data collection, master data management, integration, and cleansing work to be had; much of it in support of self-service BI tools.

    :{>

    Learn more:

    Related Training:
    IESSIS1: Immersion Event on Learning SQL Server Integration Services
    IESSIS2: Immersion Event on Advanced SQL Server Integration Services

    Related Articles:
    Stairway to Integration Services
    Stairway to Biml

    Related Books
    SSIS Design Patterns

    Help!
    Enterprise Data & Analytics – We are here to help.

  • SQL Server Developer Edition is FREE!

    I’ve blogged about the value of SQL Server Developer Edition for years. It’s Enterprise Edition with an End User License Agreement (EULA) that states you cannot use the SKU in production. Developer Edition used to sell for around $50 USD, but it is now free!

    Download SQL Server 2014 Developer Edition today!

    :{>

  • Announcing Biml Academy!

    Announcing Enterprise Data & Analytics Biml Academy – 5 Days of Free Training – 9-13 May!

    Welcome to Biml Academy! Join me for a five-day online event and learn Biml now!

    [Recording] Biml Academy – Lesson 0 – SSIS Design Patterns – Before we dive into code generation, let’s discuss and demonstrate some SSIS Design Patterns.
    [Recording] Biml Academy – Lesson 1 – Build Your First SSIS Package with Biml – Learn how to build your first SSIS package using Biml.
    [Recording] Biml Academy – Lesson 2 – Use Biml with SSIS Design Patterns – Learn how to use Biml with SSIS Design Patterns.
    [Recording] Biml Academy – Lesson 3 – Use Biml to Build and Load a Staging Database – Learn how to build and load a staging database using Biml.
    [Recording] Biml Academy – Lesson 4 – Automate SSIS Design Patterns with Biml to Load a Dimension – Learn how to automate SSIS Design Patterns with Biml to load a star schema dimension table.

    Updates

    I estimate each lesson will average 1 hour. 

    The lessons will be recorded. 

    :{>

  • The Recording for DILM with SSIS Catalog Compare is now Available

    The recording for the (free!) Enterprise Data & Analytics webinar “DILM with Catalog Compare” is now available. Registration is required.

    Enjoy!

    :{>

  • Free Webinar 28 Mar at 1:00 PM EDT - DILM with Catalog Compare

    Join me 28 Mar 2016 at 1:00 PM EDT for DILM with Catalog Compare - another (FREE!) webinar from Enterprise Data & Analytics!

    DILM (Data Integration Lifecycle Management) is how data integration is done in the modern DevOps enterprise. Join Andy Leonard, Data Philosopher at Enterprise Data & Analytics, for a look at a helpful and exciting new utility – SSIS Catalog Compare – that will help you manage enterprise data integration with SSIS.

    Register today!

    :{>

  • Recording Available for “Biml 201: Batched Incremental Load in the SSIS Framework”

    You can view the recording for Biml 201: Batched Incremental Load in the SSIS Framework - a free webinar from Enterprise Data & Analytics (registration required).

    The source code is available here.

    Enjoy!

    :{>

    Next webinar: DILM with Catalog Compare - 28 Mar 2016, 1:00 PM EDT. Register today!

  • Free Webinar–“Biml 201: Batched Incremental Load in the SSIS Framework” 7 Mar

    Join me 7 Mar 2016 at 12:00 PM ET for a (FREE!) webinar titled Biml 201: Batched Incremental Load in the SSIS Framework.

    While delivering the (FREE!) webinar titled SSIS Framework Community Edition, my friend John Sterrett (blog | @johnsterrett) asked about using Biml to generate SSIS packages that execute in an SSIS Framework like the SSIS Framework Community Edition (download the code and documentation here). I said, “No, but I can whip one up!” So I did. And now it’s scheduled for 7 Mar at noon ET.

    I hope to see you there!

    :{>

  • The Recording “Introducing SSIS Framework Community Edition” is Available!

    The recording for the free webinar titled Introducing SSIS Framework Community Edition is now available! Registration is required. You may download the code and documentation here.

    Enjoy!

    :{>

  • Easy SSIS Migration with SSIS Catalog Compare Recording is Available!

    The recording for the webinar titled Easy SSIS Migration with SSIS Catalog Compare is now available!

    Enjoy!

    :{>

  • Microsoft is Listening

    There’s a back-story to my post titled SSIS 2016 CTP 3 and Data Flow Column LineageID. I’ll get to the story in a minute, but first I’d like to say that every company has excellent customer service. You don’t believe me? Ask them, “Company, do you have excellent customer service?” They will all answer, “Yes. Yes we do.”

    Microsoft has been engaging the software development community and MVPs for years – decades, even. As a former SQL Server MVP, I was repeatedly told Microsoft is listening and that my thoughts and suggestions were welcome. So I shared my thoughts and suggestions, like many of my colleagues in the community. Most of our suggestions were not implemented. Many were marked “Works as designed” or “Will not fix” at Microsoft Connect. Granted, no one can implement every feature or even correct every single bug. It’s a) not possible; and b) not cost-effective to do so. But after quite a bit of time making requests and up-voting the excellent requests of others – and then seeing disappointing responses – I (and many others) realized some other force was driving the agenda for Microsoft’s development teams and overriding our suggestions. Many became jaded as a result. I will confess some jading occurred on my part.

    Recently, I and others began hearing fresh calls for thoughts and suggestions. My first (somewhat jaded) thought was, “That’s nice. That’s real nice.” (…which is a reference to an old Southern US joke that I will classify as “unkind”…)

    The Story

    About the time SQL Server 2016 CTP 3.1 was released, I read about the features I covered in the post titled SSIS 2016 CTP 3 and Data Flow Column LineageID. As I worked through some demo packages, however, I found some things did not appear to work – at least not yet; not in the Visual Studio 2015 SSIS template. I communicated with the SSIS team and was pleasantly surprised to receive an email stating they would attempt to work the changes into CTP 3.3. I was optimistic, if cautiously so.

    After SQL Server 2016 CTP 3.3 was released, I received a follow-up email from the team member who had addressed the concerns I identified – apologizing for the long wait and pointing to a detailed blog post describing the implementation of the updated features. </Applause!>

    Conclusion

    It’s one thing to say you’re listening to your community. It’s a different thing to actually listen to your community. One is some words. The other is some action.

    I’m happy to report Microsoft is, in fact, listening. I couldn’t be happier. Thank you!

    :{>

  • SSIS 2016 CTP 3 and Data Flow Column LineageID

    Back When We Used to Carve Our Own Chips Out of Wood…

    Back in the old days (2005), SSIS Data Flow LineageIDs were created at design-time. Here’s a screenshot of an SSIS 2005 package’s XML:

    [Screenshot: SSIS 2005 package XML showing integer LineageID values assigned at design time]

    When an error occurred, the ID property of the column in error was supplied to the ErrorColumn field of the Error Output, as shown here:

    [Screenshot: Error Output with the column’s ID supplied in the ErrorColumn field]

    Although it wasn’t simple, it was possible to use the ID supplied in the ErrorColumn field to identify the offending column in a Data Flow Task. It was a manual process that involved:

    • Opening the package’s XML
    • Finding the column ID value shown in the ErrorColumn field of the Error Output
    • Reading the LineageID value for that same column
    • Tracing the LineageID back to its original assignment in the Data Flow Task:

    [Screenshot: tracing LineageID 49 back to its original assignment in the package XML]

    The LineageID “49” maps to the “name” column in the image at the top of this post. The “name” column raised an error because someone (me) made the destination column way too small to hold any name values.

    Back in the day, some folks came up with some pretty clever ways to automate identifying the name of the column causing SSIS Data Flow errors.

    In Later Versions of SSIS…

    The LineageID property of Data Flow columns changed in later versions of SSIS. In SSIS 2012, the LineageID property is still there, but the value looks more like an SSIS package path than its SSIS 2005 counterpart:

    [Screenshot: SSIS 2012 package XML with a path-style LineageID value]

    The LineageID attribute in SSIS 2016 CTP 3.3 appears similar:

    [Screenshot: the LineageID attribute in SSIS 2016 CTP 3.3 package XML]

    A New Beginning…

    SSIS 2016 CTP 3.3 offers a solution. First, there are two new columns in the SSIS Data Flow Component Error Output – ErrorCode – Description and ErrorColumn – Description:

    [Screenshot: the new ErrorCode – Description and ErrorColumn – Description columns in the Error Output]

    The new columns provide extremely useful (plain language) error metadata that will, in my opinion, greatly reduce the amount of time required to identify data-related load failures in the Data Flow Task.

    But that’s not all. If you configure a log to capture the DiagnosticEx event, you will receive a message that provides the Data Flow column ID. To have a look, add a new log to an SSIS package that contains a configured Data Flow Task. On the Details tab, select the DiagnosticEx event:

    [Screenshot: selecting the DiagnosticEx event on the Details tab of the log configuration]

    When the package runs, a DiagnosticEx event will record XML describing the DTS:PipelineColumnMap. Viewing the XML in either a text file or SQL Server is no fun, but if you copy the XML and paste it into an XML file in Visual Studio (SSDT), you can format it nicely, as shown below:

    [Screenshot: the DTS:PipelineColumnMap XML formatted in Visual Studio]
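    If you’d rather query the log than copy and paste, here’s a sketch. It assumes you used the SQL Server log provider writing to its default dbo.sysssislog table; casting the message to xml gives you the same click-to-format view in SSMS:

        -- Pull the most recent DiagnosticEx message and cast it to xml
        -- so SSMS renders the DTS:PipelineColumnMap as a formatted document.
        SELECT TOP (1)
               l.starttime,
               TRY_CAST(l.message AS xml) AS pipeline_column_map
        FROM dbo.sysssislog AS l
        WHERE l.event = N'DiagnosticEx'
        ORDER BY l.starttime DESC;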

    It’s possible to call a new method (GetIdentificationStringByID) on the ComponentMetadata class in a Script Component, passing it the LineageID of a Data Flow column, and get the IdentificationString of the column. You can learn more about the GetIdentificationStringByID method – and everything I’ve written in this post – by reading Bo Fan’s (excellent) blog post on the subject here.

    :{>

  • Congratulations to BimlHero Cathrine Wilhelmsen!

    [Image: Cathrine Wilhelmsen – BimlHero]

    I’m happy to pass along the news that my friend Cathrine Wilhelmsen (blog | @cathrinew) has been recognized as a BimlHero! Congratulations, Cathrine!

    You can learn more about Biml from Cathrine and Scott Currie at SQLBits in May 2016. On 4 May, Scott and Cathrine deliver a full day of Biml training titled Learn Biml Today: Start Using It Tomorrow. Note, at the time of this writing I am also listed as a co-presenter but I will, unfortunately, be unable to attend SQLBits this year. :( But go anyway! See Scott and Cathrine!

    :{>

  • Data Integration is the Foundation

    Unless you live under a rock, you’ve seen the buzz about Data Lakes, Big Data, Data Mining, Cloud-tech, and Machine Learning. I watch and read the reports from two perspectives: as an engineer and as a consultant.

    As a Consultant

    If you watch CNBC, you won’t hear discussions about ETL Incremental Load or Slowly Changing Dimension Design Patterns. You will hear words like “cloud” and “big data,” though. That means people who watch and respect the people on CNBC are going to hire consultants who are knowledgeable about cloud technology and Big Data.

    As an Engineer

    I started working with computers in 1975. Since that time, I believe I’ve witnessed about one major paradigm shift per decade. I believe I am now witnessing two at the same time: 1) a revolution in Machine Learning and all the things it touches (which includes Big Data and Data Lakes); and 2) the Cloud. These two are combining in some very interesting ways. Data Lakes and Big Data appliances and systems are the sources for many systems; Machine Learning and Data Mining solutions are but a couple of their consumers. At the same time, much of this technology and storage is either migrating to the Cloud, or is being built there (and in some cases, only there). But all of this awesome technology depends on something…

    Data

    In order for Machine Learning or Data Mining to work, there has to be data in the Data Lake or in the Big Data appliance or system. Without data, the Data Lake is dry. Without data, there’s no “Big” in Big Data. How do these solutions acquire data?

    It Depends

    Some of these new systems have access to data locally. But many of them – most, if I may be so bold – require data to be rounded up from myriad sources. Hence my claim that data integration is the foundation for these new solutions.

    What is Data Integration and Why is it Important?

    Data integration is the collection of data from myriad, disparate sources into a single repository (or a minimal number of repositories). It’s “shipping” the data from where it is to someplace “nearer.” Why is this important? Internet connection speeds are awesome these days. I have – literally – 20,000 times more bandwidth than when I first connected to the internet. But modern internet connection speeds are hundreds to millions of times slower than the networks running inside data centers. Computing power – measured in cycles or flops per second – is certainly required to perform today’s magic with Machine Learning. But if the servers must wait hours (or longer) for data – instead of milliseconds? The magic happens in slow motion. In slow motion, magic doesn’t look awesome at all.
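    Some back-of-the-envelope math makes the point. The numbers below are illustrative – a 100 Mbps internet link versus a 10 Gbps data center network, ignoring protocol overhead – but the orders of magnitude are what matter:

        -- How long does 1 TB take to move?
        DECLARE @bits bigint = CAST(1024 AS bigint) * 1024 * 1024 * 1024 * 8;  -- 1 TB expressed in bits
        SELECT
            @bits / (100.0 * 1000000)   / 3600 AS hours_at_100_Mbps,   -- roughly 24 hours
            @bits / (10000.0 * 1000000) / 60   AS minutes_at_10_Gbps;  -- roughly 15 minutes
        -- The same terabyte: a full day over the wire, a coffee break inside the data center.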

    Trust me, speed matters.

    Data integration is the foundation on which most of these systems depend. Some important questions to consider:

    • Are you getting the most out of your enterprise data integration?
    • Could your enterprise benefit from faster access to data – perhaps even near real-time business intelligence?
    • How can you improve your enterprise data integration solutions?

    :{>

    Learn more:

    Enterprise Data & Analytics
    Stairway to Integration Services
    IESSIS1: Immersion Event on Learning SQL Server Integration Services
    EnterpriseDNA Training

  • Deploying a Single Package First in SSIS 2016

    I gathered recently with some friends who are geeks to discuss geeky stuff. As is often the case, there were a lot of “what if” questions, and we answered most of them by firing up laptops and VMs for testing. I had some trouble with the VM running my instance of the December preview (Visual Studio 2015 on Windows Server 2016 Technical Preview 4), so I was unable to answer one curious question until later. When I did answer it, I grabbed some screenshots and decided to blog about it. And here we are. :)

    The question was:

    “What Happens if Your First SSIS Project Deployment is a Single-Package Deployment?”

    To test, I created a new SSIS project named “DeploymentTest1” and added three simple SSIS packages. I right-clicked Package3.dtsx and clicked Deploy Package. The Integration Services Deployment Wizard started, as expected. Since I was deploying to a pristine Catalog, I created a new Catalog Folder named “Deployment”. There were no projects in my catalog, and I was curious how the Deployment Wizard would handle this.
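    (As an aside, the wizard isn’t the only way to create that folder. Here’s a sketch of the T-SQL equivalent, using the SSISDB catalog’s create_folder stored procedure:)

        -- Create the "Deployment" Catalog Folder, just as the wizard does.
        EXEC SSISDB.catalog.create_folder @folder_name = N'Deployment';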

    Once the folder was created, I saw a new button: “New project…”. I couldn’t click the OK button until I created a Catalog Project to contain my SSIS package:

    [Screenshot: the Deployment Wizard’s “New project…” button, with OK disabled until a Catalog Project exists]

    So I created a new Catalog Project. Just to see if I could do it, I named the Catalog Project something different (“Deployment_Testing_1”) from my SSIS project (“DeploymentTest1”):

    [Screenshot: creating the Catalog Project “Deployment_Testing_1”]

    I have to admit I was a little surprised that it worked:

    [Screenshot: the mismatched Catalog Project name accepted by the wizard]

    Once I clicked the OK button, the Integration Services Deployment Wizard displayed a three-part Path, “/SSISDB/<FolderName>/<CatalogProjectName>” instead of the usual two-part Path (“/SSISDB/<FolderName>”):

    [Screenshot: the three-part Path in the Integration Services Deployment Wizard]

    Once deployed, Package3 appeared content, if alone, in the SSIS Catalog node of SSMS:

    [Screenshot: Package3 alone under the SSIS Catalog node in SSMS]

    I wondered what might happen if I now deployed the SSIS Project from SQL Server Data Tools – Business Intelligence Edition (SSDT-BI), so I tried it. When the Integration Services Deployment Wizard reached the Select Destination step, the Catalog Project I created when deploying Package3 was remembered:

    [Screenshot: the Select Destination step remembering the Catalog Project]

    This makes perfect sense. I just didn’t know what would happen. Hence, testing.

    Once the SSIS project was deployed from SSDT-BI, Package3 was no longer alone. And that was a good thing.

    [Screenshot: Package3 no longer alone in the Catalog Project]

    Still, I was curious how SSIS 2016 tracks the target Catalog Project. So Kent Bradshaw and I conducted a wee bit of testing. Here’s what we learned. There are three new attributes stored inside the SSIS 2016 version of the <SSIS Project Name>.dtproj.user file:

    [Screenshot: the new attributes in the <SSIS Project Name>.dtproj.user file]

    ServerName, PathOnServer, and Run64BitRuntime aren’t found in previous versions of the file (at least not in the couple we checked). Again, this makes perfect sense with single-package deployment. It especially makes sense given that you must create a new Catalog Project to perform a single-package deployment first.

    We also tested deploying the SSIS Project as a whole first, and SSIS 2016 appears to work just like SSIS 2012 and 2014.

    Conclusion

    As we chatted some more, Kent and I realized this decoupling of the SSIS Project name and the Catalog Project name has the potential to confuse developers not familiar with the new functionality. While we really like the idea of being able to deploy a single package to perform a quick fix, we realize this opens the door to heretofore unseen issues. It’s not just the coupling between Catalog Projects and SSIS Projects in development; this potentially decouples Catalog Projects from source control. That has Kent and me concerned. How does one know an SSIS Project has been deployed to Production under a different Catalog Project name? We don’t know the answer to that question at this time.

    For now, we’re going to suggest SSIS developers adhere to a practice of naming Catalog Projects with the same name as the SSIS Project, even though we’ve proven it’s not technically necessary. We won’t call it a “best practice” – at least not yet. But we will call it an “Andy and Kent practice.” In the event someone doesn’t adhere to our suggestion, it’s possible to update the <SSIS Project Name>.dtproj.user file and correct this for future development of the SSIS project. (We tested!) If you make this change, do yourself a favor and check it into source control right afterwards.
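    Until there’s a better answer, a periodic audit helps. Here’s a sketch that inventories every folder, Catalog Project, and package in a Catalog (using the SSISDB catalog views), so a Catalog Project name that drifted from its SSIS Project name stands out:

        -- List the Catalog's contents; compare catalog_project_name
        -- against the project names in source control.
        SELECT f.name   AS folder_name,
               p.name   AS catalog_project_name,
               pkg.name AS package_name,
               p.last_deployed_time
        FROM SSISDB.catalog.folders  AS f
        JOIN SSISDB.catalog.projects AS p   ON p.folder_id   = f.folder_id
        JOIN SSISDB.catalog.packages AS pkg ON pkg.project_id = p.project_id
        ORDER BY f.name, p.name, pkg.name;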

    As always, I welcome your thoughts in the Comments (although comments are moderated and may not appear for a day).

    Thoughts?

    :{>

  • The Recording for DILM DevOps: SSIS Frameworks + Design Patterns is available!

    The recording for the Enterprise Data & Analytics webinar “DILM DevOps: SSIS Frameworks + Design Patterns” is available here. It’s free! But registration is required.

    Enjoy!

    :{>
