THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Eric Johnson

My Thoughts on SQL Server and Database Related Technologies.

SSIS 2008 and the New Lookup

    SSIS is a pretty useful tool for designing ETL processes. One of the transformations I was disappointed with in 2005 was the lookup. I found it a little better than the lookup functionality in DTS, but not too much better. Well, Microsoft must have read my mind because SQL Server 2008 has a new shiny Lookup transformation that is, in my opinion, much improved.

    In 2005, the lookup had cache options, but they were really heavy handed. And you could deal with a lookup not finding a match by using the error output, but then how would you know the difference between a non-match and a real error? Let's start with the cache. There are now three cache options and two choices for your cache source.

    clip_image001

    • Full Cache - This will load the entire reference dataset into memory before the first lookup is performed. This can be very efficient for small tables, but think what would happen if your lookup table was a few GB in size….that's a lot to load into cache.
    • Partial Cache - This is probably what most people think of when they conceptualize lookup cache in their head. At first, the cache is empty and each time a row matches or doesn't match for that matter, the row, or the fact that the lookup didn't find a row, is cached. Subsequent rows with the same lookup fields will find their data, or lack thereof, in cache.
    • No Cache - The lookup will generate the reference dataset each time the transformation runs.

    As for data sources for your cache, you can still go to the database, but now you can also set up your own custom cache connection. This gives you the ability to build your cache in a separate step and then reference it with your lookup. This is a little involved so I won't go into more detail here, but keep your eyes open for another blog entry... ;) Furthermore, if you use the partial cache, you can manage your cache size on the Advanced page of the Lookup's properties.

    So I like the changes to caching, but I like the output options even more. You can now specify that rows with no match be redirected to their own output, or be ignored all together. This now gives you three outputs to work with; one for a match, one for no match, and another for errors. I put together a small sample package and added a screen shot below, you can see that I am using all three outputs. In this case, when an e-mail address is not found in the lookup, I use a derived column to put "NA" into the email address column before I load. I then union this data with the rows that found a match and do an insert into my destination table.

    In the end, this new lookup is much cleaner and easier to use than its 2005 predecessor. Plus the addition of the Cache Connection Manager is a big win…..more on that coming soon.

    clip_image002

     

Eric Johnson
SQL Server MVP
Co-Host CS Techcast
Published Wednesday, July 01, 2009 4:33 PM by ejohnson2010
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Bradley said:

i would be nice to see the source for this

May 26, 2011 4:05 PM
 

DNS Lookup said:

This gives you the ability to build your cache in a separate step and then reference it with your lookup.

_________________

Mary

August 4, 2011 2:26 AM

Leave a Comment

(required) 
(required) 
Submit

About ejohnson2010

Eric (Microsoft SQL MVP) is the co-founder of Consortio Services, and the primary Database Technologies Consultant. His background in Information Technology is diverse, ranging from operating systems and hardware to specialized applications and development. He has even done his fair share of work on networks. Since IT is really just a way to support business processes, he has also acquired his MBA. All in all, he has 10 years of experience with IT, a great amount of which has been working with Microsoft SQL Server. Eric has managed and designed databases of all shapes and sizes. He has delivered numerous SQL Server training classes and webcasts as well as presentations at national technology conferences. Most recently, he presented at TechMentor on SQL Server 2005 Replication, Reporting Services, and Integration Services. In addition, he is active in the local SQL Server Community, serving as the President of the Colorado Springs SQL Server Users Group.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement