The Lookup transformation in SSIS has changed a lot in SQL Server 2008. One of the best new features is the ability to pre-build your cache which gives you a lot of control over what is cached and how the cached data is managed. The basic lookup offers additional cache features such as Full, Partial, or even No Cache, but the real power comes in the new Cache Transformation. The Cache Transformation uses the new Cache Connection manager to allow you to build your cache before it is needed. The connection manager and the Cache Transformation are pretty simple. For the Cache Manager Connection, you just need to specify the columns that will be stored and which columns will be indexed, as shown below. The indexed columns will be the only columns you can use as a lookup column in a Lookup Transformation.
Once you have the connection manager configured, you just use a Cache Transformation to pump data into the cache. The Cache Transformation is very simple; you specify a Cache Connection manager and set up the mapping of the incoming data to the columns in the cache. Last but not least, you pull data from your source and send it into the Cache Transformation. I have a very simple example data flow below which is populating a lookup cache; and the source in this example uses the T-SQL shown.
SELECT ContactID, EmailAddress
WHERE EmailAddress IS NOT NULL
AND EmailPromotion <> 0
This allows me to populate my cache with all email addresses that are not blank, where the person has not opted out of email promotion. Remember, the data flow to populate your cache can be as complex as you want. This can really allow you to pare down large sets of data to a smaller, easier to use data set that is appropriate for caching.
Once you have populated your cache, all you have left to do is use it in a Lookup Transformation. On the first page of your Lookup Transformation properties, specify a Connection Type of Cache Manager and on the connection page, specify your Cache Connection Manager object. Everything else is just like using the Lookup with a regular data source. You will need to set up your mappings and the column (or columns) that you want to return. One other note: when using a Cache Connection Manager with a lookup, you have to select Full Cache as the cache mode. The other two are disabled and for good reason. You have taken control of your cache at this point and don’t want SSIS doing anything further.
So there you have it, managing your own lookup cache should allow you to make your SSIS packages a little more streamlined and allow them to run a bit faster. Not to mention, you should be able to minimize the memory hogging packages that occurred when older lookups got a little out of hand.