THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Have SSIS' differing type systems ever caused you problems?

One thing that has always infuriated me about SSIS is the fact that every package has three different type systems; to give you an idea of what I am talking about consider the following:

  1. The SSIS dataflow's type system is made up of types called DT_*  (e.g. DT_STR, DT_I4)
  2. The SSIS variable type system is based on .Net datatypes (e.g. String, Int32)
  3. The types available for Execute SQL Task's parameters are based on something else - I don't exactly know what (e.g. VARCHAR, LONG)

Speaking euphemistically ... this is not an optimum situation (were I not speaking euphemistically I would be a lot ruder) and hence I have submitted a suggestion to Connect at [SSIS] Consolidate three type systems into one requesting that it be remedied. This accompanying blog post is not however a request for votes (though that would be nice); the reason is actually subtler than that. Let me explain.

I have been submitting bugs and suggestions pertaining to SSIS for years and have, so far, submitted over 200 Connect items. If that experience has taught me anything it is this - Connect items are not generally actioned because they are considered "nice to have". No, SSIS Connect items get actioned because they cause customers grief and if I am perfectly honest I must admit that, other than being a bit gnarly, SSIS' three type system architecture has never knowingly caused me any significant problems.

The reason for this blog post is to ask if any reader out there has ever encountered any problems on account of SSIS' three type systems or have you, like me, never found them to be a problem? Errors or performance degredation caused by implicit type conversions would, I believe, present a strong case for getting this situation remedied in a future version of SSIS so if you HAVE encountered such problems I would encourage you to leave a comment on the Connect submission accordingly. Let me know in the comments too - I would be interested to hear others' opinions on this.

@Jamiet

Published Monday, March 14, 2011 6:46 PM by jamiet

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

$100 says the Connect submission will be closed "by design."

March 14, 2011 1:47 PM
 

SQLChap said:

Yes, it's very annoying. Most of my problems seem to come from text files needing converting to unicode, lookup parameters being in the wrong type and of course anything that's come from excel changing type as it feels like.

March 14, 2011 1:59 PM
 

Julie Smith said:

I DEFINITELY have encountered problems with SSIS' three flavors of data types.  It is one of my least favorite things about my favorite platform!  WHY?????

March 14, 2011 3:05 PM
 

Michael K. Campbell said:

Yeah - No surprises there. The bug is already closed as "Resolved - won't fix". (Just as Adam called - and I doubt he would have found anyone to bet against him.)

And, that's sort of fair - because this isn't a bug.

But it is a GIGANTIC pain in the butt. And it IS busted - no matter how you slice it. It's counter-intuitive and silly and makes packages insanely brittle and much harder to create, maintain, and extend.

In other words, this works now 'as designed'... but the design is a nightmare and since Microsoft LOVES to talk about how they listen to customers, they should fix/address this issue because it's something that newbies clear on up to gurus all fight and hate.

March 14, 2011 3:12 PM
 

jamiet said:

Michael,

I didn't file it as a bug, I filed it as a suggestion.

-Jamie

March 14, 2011 4:01 PM
 

Adam Machanic said:

Wow, I can't believe they closed it so quickly. Heads completely in the sand. Or perhaps somewhere else. But that would be inappropriate for me to mention here.

March 14, 2011 4:12 PM
 

Steve said:

definitely have ran into issues, all the time. very hard to manage and keep straight, and when something changes upstream, watch out.

March 14, 2011 4:35 PM
 

Sudeep Raj said:

Never faced an issue as such, Have just got used to it. Now for me yes it is a nice to have, but to a person new to SSIS it becomes difficult to explain the rationale behind this. Thats the only issue I face.

March 14, 2011 5:22 PM
 

AlexK said:

Jamie,

Thank you for writing this up. So far our system does not use any SSIS, all loads are developed in C#, and I think this way we are saving money on maintenance.

There are several discussions on stackoverflow, with many answerers recommending not to use SSIS. Next time someone wants to use SSIS, I will refer them to your excellent blog post series.

March 14, 2011 5:22 PM
 

Bhavik Merchant said:

I've had issues with this for years. The date type mismatches are the worst!

In the ssis course I teach written by an mvp it states the Exec SQL task is based on COM data types

March 14, 2011 9:17 PM
 

Bhavik Merchant said:

Wasn't specific enough... yes I agree that its just frustrating to work with but definitely should be resolved/consolidated. Other than that they may be some internal performance overhead associated with the conversions.. and this simply can't be avoided wit the current architecture

March 14, 2011 9:27 PM
 

Julian Kuiters said:

While I've had frustration with it myself, you have to remember the SSIS is about INTEGRATION (or more so migrating data to SQL Server) from many different types of data sources.

I use it a lot for importing flat file text and binary data from oracle, db2, custom databases and even websites. None of the data arrives in varchar or bigint datatypes, its raw bits or strings or an undetermined length of numbers.

SSIS is amazing at importing these different formats because it reads them into .Net data types that can be manipulated before loading into SQL Server. (If you have a good look you'll find the DT_STR and other types are wrappers around the .Net String and other types with more database type functions that were in the .Net framework when it SSIS was launched).

Remember the power of SSIS is that you are not constrained by the T-SQL language or data types - you are working *outside* SQL Server. It just wouldn't be appropriate to restrict the datatypes to only what sql server supports.

.... and you can't use the native .Net data types like "long" because they don't support nullability    (you use "long?" to support null)

.... just imagine the Connect questions if it was all nullable .net types....... "why do all the data types end in a question mark ? "

March 15, 2011 4:18 AM
 

jamiet said:

Julian,

You say:

"It just wouldn't be appropriate to restrict the datatypes to only what sql server supports."

I didn't say that it should. In fact if you look at the Connect submission you will see that my suggestion was to use a heterogenous type system (i.e. one that is not tied to a database platform).

From what you have you appear to think I am suggesting SSIS should use the same datatypes as SQL Server when in fact nothing could be further from the truth.

In other words I think we're in full agreement. :)

-Jamie

March 15, 2011 5:22 AM
 

Frank Szendzielarz said:

If I remember rightly there was additional overhead in working with columns in a custom data transform component. There was also some problem with understanding the relationship between the different datetimes (DT_DATE, datetime2(precision),etc).

Otherwise, it hasn't really caused us  much trouble, when I think of it. The additional metadata stringency and paranoia has had the opposite effect: being nervous enough to give data quality the time and attention it deserves.

March 15, 2011 8:10 AM
 

John Schmid said:

I'm new to SSIS (a couple weeks now). The variables are bad as mentioned above. I find it infuriating that a data conversion tool is not able to get at the SQL Server error code when a DB insert fails. What good does the SSIS error '-10790384801 - Could not insert row' do for anyone? Don't get me started on the fact that VS 2010 support is late and VS 2008 SSIS support for Team Foundation leaves a lot to be desired. I am currently struggling trying to create packages that convert a database table to a new format but that can be restarted with a corrected CSV file from a prior conversion attempt. Good luck with that!

March 15, 2011 9:45 AM
 

Frank Szendzielarz said:

John Schmid

You might find the following helpful: http://ssisctc.codeplex.com/

Personally, I am enchanted with SSIS. I found that once I became aware of the idiosyncracies, things fell into place. The combination with C# in script components in general makes life quite simple for me, as well as offering the kind of performance and data validation that would otherwise be very difficult to achieve.

In fact, on the topic of variables, one thing that would make me think twice about using SSIS would be when .NET native types had explicit precision, and when parallelism becomes visual and intuitive.

March 15, 2011 11:49 AM
 

John Schmid said:

Thanks for the link Frank! My post sounded more like a rant than I intended. :) Are there any 3rd party tools you recommend or use for your SSIS development? I also like dipping into the script, but it seems like doing so as often as neccesary would lend itself better to building a specialized C# app that does what I want.

March 15, 2011 12:33 PM
 

jamiet said:

John,

You are not alone in disliking SSIS based on early impressions. Take a read of this: http://ayende.com/Blog/archive/2007/07/15/SSIS-15-Faults.aspx and my follow-ups: http://consultingblogs.emc.com/jamiethomson/archive/2007/07/27/SSIS_3A00_-The-backlash-continues.aspx & http://consultingblogs.emc.com/jamiethomson/archive/2007/07/30/SSIS_3A00_-A-response-from-Microsoft-to-the-growing-criticism.aspx

What I generally find is that some people (such as Frank) persevere with SSIS enough until they discover that once you understand it, it is very productive. Others don't reach that tipping point and they end up writing blogs like the one I pointed to above. No-one is right or wrong - it comes down to personal preference.

JT

March 15, 2011 12:51 PM
 

John Schmid said:

Jamie,

I'm not afraid of investing a bit of time in this, though the learning curve seems greater than it first looked. I'm being asked to convert 60 SQL Svr source tables to a new table structure in the same DB for an upgraded product to use. Our old tables did not have referential integrity while the new tables do. I am converting tables sequentially to not violate the new constraints. My requirement is that rows that cannot be inserted should be redirected to a CSV file with a message for correction which will fail the pkg. The thought is the CSV should be corrected and the pkg restarted to process corrected rows, followed by the remainder of the tables in the sequence.

I really don't understand why I can't get the SQL error that prevented the insert. Did I try and force a null where I shouldn't, cause a duplicate, violate the FK constraint...? With the exception of Oren, I can't even find where anyone is even asking about this in SSIS land.

I’d also like to declare an alternate source for the same pkg and load from my corrected CSV file instead of the source DB table. I am having a difficult time doping that out and was hoping maybe a third party control might handle it.

I can probably live with everything else. I’ll be making a major commitment in writing SSIS packages to do this. It would be nice to have a sense that this product is important and will receive attention in later updates for these issues, but I’m not sure that’s the case.

March 15, 2011 3:05 PM
 

jamiet said:

John,

The learning curve is definitely steep - certainly no-one that has ever used SSIS will dispute that.

In the next version SSIS will have many improvements that you might lump under "usability" - lots of these improvements are already public knowledge.

For your specific problem, I'm not really sure what to say. You should know that if you redirect rows that fail on insertion into another data sink then the dataflow will not fail. The error message is available - just not *easily* available; see here for more details: http://consultingblogs.emc.com/jamiethomson/archive/2005/08/08/1969.aspx

I have no idea whether this will help you or not - I'll hope that it does.

regards

Jamie

March 15, 2011 5:10 PM
 

Frank Szendzielarz said:

John

Two responses: one that we use BIDS helper. The other that there is some component in the link I sent that decodes some error messages for you. You can get at the errors.

An alternative is to use a try catch in a script component and INSERT manually.

My philosophy with C# and SSIS is simply to feel fluent in both and combine at will. You can make an entire data processing app in C# and just use the SSIS data source connected to a data flow script component for no other reason than to get the rows read and automatically generate the row as strongly typed properties (and read with parallelism). It doesn't matter most of the time, the only down side is that you need to think more carefully about breaking down work in a source control situation.

For the configurable source, you could try a number of things: use both sources and a Union or Merge. Make the source sql configured to an expression if you want them mutually exclusive. Another option is to use a stage table and have one of two previous data flows move source A to stage or source B to stage depending on a switch in the control flow.

March 15, 2011 7:17 PM
 

John Schmid said:

Hi Jamie,

I had stumbled onto that post and actually tried it out. The problem is that when a row cannot be inserted I keep getting an SSIS error: - 1071610801. Then when I retrieve the description for that, I get ‘An error has occurred while sending this row to destination data source’ I have received this while forcing a duplicate row, or violating a FK constraint (when I missed performing a validating lookup). It seems like this catchall is all that is available though I know the DBMS has more to tell. Why can’t I get at: -2627 – Violation of primary key. Cannot insert duplicate, or -547 Insert stmt conflicted with FK constraint?

I’m aware that sending the failed rows to a CSV destination will return pkg success. I think I can detect this and fail the pkg myself in script code. An option to ‘Redirect row and fail package’ would be on my nice to have list.

Hi Frank,

Thanks for the excellent ideas. I’m sure C# can get the INSERT information I’m looking for. I was trying too hard to stay in SSIS. I’ll also have a look at a union of the two sources. I was thinking along those lines, but was hoping someone more experienced was going to point me to something magical.

-John

March 16, 2011 10:02 AM
 

AlexK said:

John,

If you have decent command of C#, you might get your job done much faster using it. IMO SSIS was not designed as a tool for strong developers.

Regarding the rows that cannot be inserted, it might be easier to keep them in staging tables in the same database - that allows for easier analysis, detection of duplicates etc.

Good luck.

March 16, 2011 11:53 AM
 

Frank Szendzielarz said:

John,

It's just about thinking in 'batches' instead of rows. You can use two data flows, going to a stage, in the control flow. One can be on and one can be off, depending on a variable. You don't even have to use a db stage - you can use the raw or recordset targets. A third dataflow would consume the output of the first two (whichever of the first two executes). No magic required. Hope it helps.

March 17, 2011 4:41 AM
 

Mike Sexton said:

"were I not speaking euphemistically I would be a lot ruder"

Ahh British understatement.  The issue I've seen arise when you mess up the conversions or if someone changes a source or destination without telling the ETL people (and that NEVER  happens, right? - American sarcasm)

If you are experienced you know to look to conversions as part of routine debugging.  For more junior people - particularly people who have learned to rely on implicet conversions in SQL Server - this can be harder to track down.  What would improve matters is a more detailed error message that points the less experienced folks to the conversion process as a potential source of errors.

My $.02

Mike  

March 17, 2011 11:40 AM
 

SDC said:

I would say with this, first, yes, I find it annoying (to follow your example and understate wildly how I really feel about it).

In terms of 'does it cause problems?' it's really hard to quantify such a thing. People learn to 'live with it' but I don't doubt in aggregate it's caused significant lost productivity and generated a lot of ill will toward SSIS (Jamie T provides some helpful links re: that above).

Dismissively closing Connect items also generates some ill will, I'm sure...as we see here even in these comments.

March 17, 2011 3:15 PM
 

P Williams said:

Yes Jamie, this was one of my biggest annoyances when I first started using SSIS nearly five years ago now. However, as my knowledge increased I understood more of why MS developed this way & then accepted it more. Understanding that SSIS needs its own native datatypes helped me but I could never understand why MS didn't document this or explain this in far more detail in the Help files - after all SSIS deals with data right ? - so understanding how datatypes are used is fundamental (still I'm not sure why variables have to have their own datatype format different to the data workflows).

In then end as my knowledge has increased this has become more of the odd annoyance rather than anything else. What I ended up doing was creating a matrix of how to convert between the different datatypes and filled it in as I came across experiences(issues ;-)).

So I think MS could have helped themselves a lot here if initially they had provided a lot more information on understanding why this was done along with more detailed information.

March 18, 2011 7:56 AM
 

jamiet said:

P Williams,

" I understood more of why MS developed this way "

Call me cynical but I suspect that the "why" is : "The dataflow and the workflow engines were built by two seperate teams and they never bothered agreeing on a universal type system until it was too late!"

I have trouble believing that this was a "design decision" :)

JT

March 18, 2011 8:11 AM
 

John Schmid said:

Frank, Jamie,

It looks like raw files are still the method to move bulk data between data flow tasks via a control flow. Is that correct? Or are there any more articles on the subject you could point me at for further reading? (I was reading Jamie's Data Flow Mechanics post but that predates 2008R2)

Thanks,

- John

March 18, 2011 11:20 AM
 

jamiet said:

John,

Raw files are indeed a great way to move data between dataflows. Another method is to drop the data into a database table. yet another way is to store the data in an object in a recodset in memory and I discuss this approach here: http://consultingblogs.emc.com/jamiethomson/archive/2006/06/28/SSIS_3A00_-Comparing-performance-of-a-raw-file-against-a-recordset-destination.aspx

The Data Flow Mechanics post is still very relevant today by the way.

Regards

Jamie

March 18, 2011 12:22 PM
 

jlevy said:

My biggest problem (and it's not a performance issue) has been trying to find which type I need to use in which place.  I'm relatively new to SSIS (only been working with it off and on for about 18 months) and was, until about a month ago, the only one in my department who had any experience with the product.  My first encounter with moving between type systems was on a sizeable project with several files and data sets being sent back and forth to various places as well as multiple stored procedures being called with variables passed.  It led to several hours of wasted development time while I searched BOL to find the charts for and explanations of each of the type systems, followed by several more hours of trial and error while I figured out which type system I needed to use in each instance that was throwing warnings.

I'm sure there's documentation available on the various forums and blogs that's easier to follow now that I know what I'm looking for, but at the time, I was grasping at straws trying to figure out the warnings.

Jennifer

March 21, 2011 11:12 AM
 

SSIS Junkie said:

Yesterday I received a notification that one of my submissions to http://connect.microsoft.com had received

April 7, 2011 10:12 AM
 

Geoff said:

Once I learned SSIS, I haven't had any "real" problems with it, just annoyances like having to set a variable using a Script Task instead of an expression if it's longer than 4000 characters, etc.

However, learning SSIS is much harder because of this. Getting parameter-based SQL statements to run, understanding the metadata in a data flow task...the differing (and more or less unexplained!) type systems cause problems here. Given the Microsoft philosophy ("Here are the docs...now go use Google to find out how it REALLY works!"), it seems unlikely this will be fixed (until MS buys Informatica ;)).

May 27, 2011 8:40 AM
 

Elmer Parica said:

There were less issues with DTS versus SSIS and realy mind bogging that Microsoft unable to address this supposed to be lot easier and developer friendly than DTS.

July 18, 2011 2:51 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement