<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'SSIS' and 'Heartland Developers' Conference'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=SSIS,Heartland+Developers%27+Conference&amp;orTags=0</link><description>Search results matching tags 'SSIS' and 'Heartland Developers' Conference'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>HDC08 Omaha Demos Posted</title><link>http://sqlblog.com/blogs/kent_tegels/archive/2008/10/17/9546.aspx</link><pubDate>Fri, 17 Oct 2008 20:17:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:9546</guid><dc:creator>ktegels</dc:creator><description>Slides and demos for my hdc08omaha talks are now available for download from &lt;A href="http://tinyurl.com/5oykgj"&gt;http://tinyurl.com/5oykgj&lt;/A&gt; Thanks to all who attended!</description></item><item><title>Upcoming talks on SQL Spatial and SSIS</title><link>http://sqlblog.com/blogs/kent_tegels/archive/2008/06/05/7153.aspx</link><pubDate>Thu, 05 Jun 2008 12:42:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:7153</guid><dc:creator>ktegels</dc:creator><description>&lt;P&gt;Yesterday I drove down from Sioux Falls to Omaha so that I could catch a talk being given by Sudhir Gajre to the &lt;A href="http://www.omahamtg.com/" target=_blank&gt;Omaha SQL Server/BI Interest Group.&lt;/A&gt; I first got to know Sudhir when he, I and Luke Schollmeyer were restarting a SQL Server Users Group in Omaha. Sudhir is a stud at SQL Server performance tuning and helped write one of the best papers on it (see &lt;A href="http://sqlcat.com/whitepapers/archive/2007/12/16/microsoft-sql-server-2005-tuning-tips-for-peoplesoft8-x.aspx" target=_blank&gt;Microsoft SQL Server 2005 Tuning Tips for PeopleSoft8.x&lt;/A&gt;. &lt;/P&gt;
&lt;P&gt;I am the next scheduled speaker for that group, so I took a couple of minutes last night to ask them what they would like me to talk about. Response was a little slow, so I suggest that I could give my Spatial Computing with SQL Server talk. Almost no reaction. &lt;/P&gt;
&lt;P&gt;Somebody in the group suggest talking about the new MERGE statement. I liked that said, "What if I wrapped that into a talk about "What is new in Integration Services 2008." The room came back to life. &lt;/P&gt;
&lt;P&gt;So here's an initial outline of what that talk will probably cover: &lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The changes in the SSIS pipeline architecture for back-pressure and thread scheduling The scripting environment changes &lt;/LI&gt;
&lt;LI&gt;Working with the Cache transformation &lt;/LI&gt;
&lt;LI&gt;Using T-SQL MERGE with SSIS &lt;/LI&gt;
&lt;LI&gt;Using Change Data Capture with SSIS &lt;/LI&gt;
&lt;LI&gt;(time permitting) Using the Data Profiler task &lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;We clustered about for a while following Sudhir's talk discussing just how much BI is taking off for Microsoft and how useful a talk like this would be. I have to chuckle every time I get into a conversation like this: I my opinion, SSIS isn't a BI tool that developers just happen to be interested in, it's a Developer's tool that just happens to very helpful to the BI specialist. Really understanding and applying many of the changes for SSIS 2008 is made easier, I think, if you have a Developer mindset about it. &lt;/P&gt;
&lt;P&gt;I am going to submit&amp;nbsp;the "Whats new in SSIS 2008"&amp;nbsp;for the Heartland Developer's Conferences to be held later in the year in &lt;A href="http://www.heartlanddc.com/Omaha/%22" target=_blank&gt;Omaha&lt;/A&gt; and &lt;A href="http://www.heartlanddc.com/Minneapolis/" target=_blank&gt;Minneapolis&lt;/A&gt; too.&lt;/P&gt;
&lt;P&gt;If you are a user group leader and would be interested in having me come visit and give a talk, please feel free to contact me through this site.&lt;/P&gt;</description></item><item><title>What the SSIS Masters Know, Part 3</title><link>http://sqlblog.com/blogs/kent_tegels/archive/2007/10/15/2976.aspx</link><pubDate>Mon, 15 Oct 2007 04:32:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:2976</guid><dc:creator>ktegels</dc:creator><description>&lt;p&gt;Shunryu Suzuki, a Zen master who came America in 1959, once wrote that "in the beginner's mind there are many possibilities, but in the expert's mind there are few." This definitely seems to be to of SQL Server 2005 Integration Services (SSIS). But not always in a good way. If you are coming to Integration Services from SQL Server 2000's Data Transformation Services there are a number of such "moments of Zen" waiting for you. One of the most significant ones is that the two products are designed around two different patterns. SSIS is designed to support the Extract, Transform and Load (ETL) pattern where and DTS is geared towards Extract, Load and Transform (ELT).&lt;/p&gt;
&lt;p&gt;The difference in these two is much more than simple word order. In simple terms, the ELT pattern relies on the power of a database store to do the work transforming extracted data from the source schema to the destination schema. The ETL pattern, conversely, normally uses a specialized engine to stream data from the source schema, transform it then load it to a destination database. Both patterns have different strengths and weakness. For example ELT maximizes the data store but normally lacks the ability to difficult transformations easily. ETL engines have more sophisticated transformation capabilities but tend to heavily tax system resources while running.&lt;/p&gt;
&lt;p&gt;This difference in patterns tends to frustrate those who are making the transition.&lt;/p&gt;  
&lt;p&gt;But as most any Zen practitioner will tell you, Zen functions in non-duality. So to really understand the Zen of SSIS, we must understand that it functions in non-duality as well. True, SSIS is designed around the ETL pattern but that does not mean that you must -- or even should -- avoid using the ELT pattern with it sometimes. In this particular project, we have an example of that. One of the things that SSIS does not currently have a multi-value collection (e.g., a sorted key-value dictionary) that can be passed between tasks. In cases where we want that, it makes sense to leverage a database table. &lt;/p&gt;
&lt;p&gt;A hallmark of DTS is as being a simple tool for getting simple tasks done quickly. However, its limitations make doing complex tasks fairly hard. SSIS conversely makes doing complex things easier but because of the ramp up it takes to learn the new pattern and new tools, it is fair to say that it makes doing easy thing initially harder.&lt;/p&gt;
&lt;p&gt;I believe one the key things that the SSIS masters understand is this Zen. They know the duality of when to use what and how while understanding when not use something and why. They seek to understand the strengths and weakness of the patterns and the tools. At the same time, they seek to integrate these understandings into a highly effective suchness.&lt;/p&gt;</description></item><item><title>What The SSIS Masters Know, Part 2</title><link>http://sqlblog.com/blogs/kent_tegels/archive/2007/10/10/2933.aspx</link><pubDate>Wed, 10 Oct 2007 19:45:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:2933</guid><dc:creator>ktegels</dc:creator><description>&lt;P&gt;One of the first tasks we must do well as SSIS developers is really understand our source data. That seems rather obvious but it when we fail to do this task well, we frequently end up having to re-plan, re-work and re-do the packages we have designed. Generally speaking there are five facets to data that we need to have correct understanding of before building a package:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Schema and the sub-facets of cardinality and ordinality&lt;/LI&gt;
&lt;LI&gt;Parse-ability&lt;/LI&gt;
&lt;LI&gt;Volume&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Schema that is what entities we have, what attributes those entities have and positional relationship do the attributes have. When we think about source data entities, we natural gravitate towards understanding them in terms of their persistence medium. We have things like Comma Separated Value files, database tables and so on. I think that is somewhat dangerous -- data is more that its container. The important thing to understand here what whole thing is is that persisted trying to represent. Any given entity may have many attributes and there is no guarantee that any one persistence has all or even most of the attributes. In such cases, we must be especially careful about how we design our packages since integrating streams of entities from multiple data sources can become a very expensive process when done incorrectly. &lt;/P&gt;
&lt;P&gt;Attributes normally represent scalar facets of an entity. Attributes themselves will have many properties like name, value and the data type chosen to express that value in. A key but many times easily overlooked property of an attribute is if any given instance of that attribute is allowed to be undefined (null in database terms). A difficult and hard problem to solve occurs when the source data is allowed to have undefined values for a given attribute but its destination is not.&lt;/P&gt;
&lt;P&gt;For all but the most trivial of data integration projects, I feel it absolutely essential to gather and build a correct understanding of both the logical and physical data model of the sources and destinations before attempting to construct a package. Once this process is done, it becomes easier to see how entities and their attributes map. Do this at the logical level first, then at the physical level. This avoids not seeing the forest for the trees.&lt;/P&gt;
&lt;P&gt;Two other aspects of the Schema deserve special attention: Cardinality and Ordinality. &lt;/P&gt;
&lt;P&gt;By cardinality I ask what are the restrictions on distinct values occurring with an attribute or attribute set as well as what are the relations between entities. For example, most database products allow for constraints requiring unique values with one or more columns within the context of the table itself and between tables. If a destination does not allow for repeated attribute value but a source does, we know we will need to address this somehow in our work.&lt;/P&gt;
&lt;P&gt;A less commonly considered aspect of data, especially when persisted, is it ordinality. By this I mean what order the persisted values are in and especially if they are a given sort order. Consider a source -- simple CSV file -- that represents a key-value pair. One of the things we definitely want to know about that source is if the keys have been written in any order and, if so, is it ascending or descending. Of course, the persisted data may be written in such a way that preserves order by many attribute values. Certain SSIS transformations require sorted data. If a data source is already in order, then this one less expensive task we need to do in the transformation process. &lt;/P&gt;
&lt;P&gt;The importance of being able to parse a persisted data set properly should be obvious, yet one non-obvious thing about SSIS is that it really struggles with dynamic data sources. In cases where the data source where the schema is unknown in advance or varies within the persistence, we have to plan to do additional work. SSIS optimizes the data flow pipeline assuming a series of statically defined buffers are passed (in a sense) between components. It is ultimately up to us bridge the gap between a dynamic data source or destination and the static nature of the data flow engine.&lt;/P&gt;
&lt;P&gt;The sheer volume data must be considered. The default settings for memory and CPU allocation are based on source data load of one gigabyte and small number of transformational steps. Specific transformations like sorting and aggregation can easily and rapidly consume memory to the point of exhaustion for larger data sources. Identifying such potential bottlenecks before coding the solution can make the package development process go faster.&lt;/P&gt;
&lt;P&gt;I have four data sources from my demonstration:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Two XML files containing information about known postal codes in the United States, one of the files suffers from truncated data for the Postal Code.&lt;/LI&gt;
&lt;LI&gt;A Comma Separated Value formatted file representing a list of Postal Codes, City, State and County names.&lt;/LI&gt;
&lt;LI&gt;A list of State names derived by query from the AdventureWorks database&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;The destination is a SQL Server database and this Postal Code data will be loaded into three tables:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;States, key by the State's abbreviation and containing the name of given state&lt;/LI&gt;
&lt;LI&gt;Cities, keyed by a CityID and having the name of the city and the abbreviation for the matching state&lt;/LI&gt;
&lt;LI&gt;ZipCodeList, key by the Postal Code and having a CityID and the Latitude and Longitude&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;The States table is populated by query during the construction of the database using the AdventureWorks Database. The other two tables will be populated from the CSV and XML files. However, there are two problems found in that data:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;In one of the XML files, the Postal Code for some locations were written without a leading zero&lt;/LI&gt;
&lt;LI&gt;There are more Postal Codes known in the XML files than in the CSV file, so we will need to invent a way discover the city and state that is most local to that Postal Code. In the presentation, I will show a way to do this using Microsoft's MapPoint 2006 via its COM APIs.&lt;/LI&gt;&lt;/UL&gt;</description></item><item><title>What The SSIS Masters Know, Part 1</title><link>http://sqlblog.com/blogs/kent_tegels/archive/2007/10/02/2822.aspx</link><pubDate>Tue, 02 Oct 2007 05:52:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:2822</guid><dc:creator>ktegels</dc:creator><description>&lt;P&gt;Hello world and yes, I am alive. I have finally caught up enough to begin regularly blogging about SQL Server again. It is good to be back!&lt;/P&gt;
&lt;P&gt;As you may know, I will be giving a talk at this year's Heartland Developers' Conference. (&lt;A href="http://www.heartlanddc.com/"&gt;http://www.heartlanddc.com&lt;/A&gt;) My presentation in entitled "What the SSIS Masters Know." I picked this topic since I feel that many developers are not aware of power that the SQL Server Integration Services (SSIS) offers them. I have also seen many developers struggle with learning curve. As with any powerful tool, there is bound to be a fair amount of complexity. In upgrading a number of SQL Server 2000 DTS packages to SSIS and reading the major books available on the topic I came to realize just how amazing this tool really can be. &lt;/P&gt;
&lt;P&gt;A few months ago, I was having dinner with a friend. She asked, "So what do the SSIS masters know that the rest of us don't?" No answer came immediately to mind so I told her I would have to think about that. She smiled and asked me just to talk. About an hour later -- after our meals had come and gone -- she had written down these bullet points for me:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The Zen of SSIS: ELT vs. ETL&lt;/LI&gt;
&lt;LI&gt;Understanding Control Flows and Data Flows&lt;/LI&gt;
&lt;LI&gt;Static pipeline, dynamic flow&lt;/LI&gt;
&lt;LI&gt;Understanding asynchronous vs. synchronous components&lt;/LI&gt;
&lt;LI&gt;How to Optimize Data Flows&lt;/LI&gt;
&lt;LI&gt;Duct-Tape Scripting&lt;/LI&gt;
&lt;LI&gt;Using Configurations&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;She asked a few more questions, not really wanting to know the answers for herself, but because she knew it would get me thinking about this list. What did I mean by "Zen?" Why does the difference in the types of flows really matter? What problems does the design of the Data Flow present? Where do performance problems come from? Is there a technique for analyzing A Data Flow for potential issues? Why did I get so exciting about scripting? What do the developers ignore but the Database Administrators fret about most?&lt;/P&gt;
&lt;P&gt;Thankfully, she is very attentive.&amp;nbsp;And &lt;EM&gt;incredibly patient&lt;/EM&gt; when I go rambling off. &lt;/P&gt;
&lt;P&gt;The next day I sent this list in with a little garnish describing the talk as topic submission. The talk got the green light and started thinking about how to present these topics to that audience. &lt;/P&gt;
&lt;P&gt;The problem here is that these topics can be abstract. I loathe giving talks like that. Like good writing, a good presentation should show, not tell. It should demonstrate in concrete terms why these things mater to the SSIS developer. In the next post, we will talk about data set and database design that will help us put the seven things that master SSIS developers know into context.&lt;/P&gt;
&lt;P&gt;Since many of you will not be able to join us that that conference, I thought I would write a series of blog posts covering the essential parts of that presentation. At the end of this series, I will post a URL where you can down load the presentation and the demo files.&lt;/P&gt;</description></item></channel></rss>