THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Rob Farley

- Owner/Principal with LobsterPot Solutions (a MS Gold Partner consulting firm), Microsoft Certified Master, Microsoft MVP (SQL Server), APS/PDW trainer and leader of the SQL User Group in Adelaide, Australia. Rob is a former director of PASS, and provides consulting and training courses around the world in SQL Server and BI topics.

The SSIS tuning tip that everyone misses

Hi! - Great that you've found this page, but it's no longer here! You can find the content over at: http://blogs.lobsterpot.com.au/2011/02/17/the-ssis-tuning-tip-that-everyone-misses/

Published Thursday, February 17, 2011 5:19 PM by Rob Farley

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

jamiet said:

Superb post Rob, and thank you for taking the time to write it up in such detail.

JT

February 17, 2011 2:13 AM
 

Paul White said:

I agree with Jamie, this is an excellent post.  More technical stuff please, Rob :)

Paul

February 17, 2011 2:26 AM
 

Boyan Penev said:

Thanks, Rob - awesome post!

February 17, 2011 2:45 AM
 

Alex Whittles said:

Great post thanks Rob.

Time to review a couple of my existing source queries!!

February 17, 2011 3:55 AM
 

mark blakey said:

Ah ha, thats what it was....

February 17, 2011 6:22 AM
 

Dant said:

great post!

first time i seen the FAST option.

i'm not a SSIS heavy user, but this certainly has other uses

February 17, 2011 8:54 AM
 

Julie Smith said:

Excellent!  This option also gives the developer more freedom to possibly do more transformations using TSQL, whereas when you have a huge data source, often the choice is to avoid any extra work on the SQL Engine.  Thanks!

February 17, 2011 11:08 AM
 

kendra little said:

That's really interesting! It's another way of looking at thinking about what your customer/consumer actually needs, rather than just doing what the normal approach is.

February 17, 2011 12:54 PM
 

Rob Farley said:

Dant: Indeed - I used SSIS because it's commonly used, but the same principle applies to any streaming of data.

Everyone: Thanks for your compliments! I just hope it makes people think... :)

February 17, 2011 4:02 PM
 

TheSmilingDBA said:

That was a great technical example. Please keep them coming.

Thomas

February 17, 2011 5:00 PM
 

FishinDBA said:

Thanks Rob.  Pure magic.

February 17, 2011 10:04 PM
 

andyleonard said:

Hi Rob,

  Awesome tip! And yes, I was one of the people that missed this.

:{>

February 18, 2011 8:20 AM
 

MeganathanK said:

Thanks a million Rob :) Nice article

February 18, 2011 10:46 AM
 

B4PJS said:

Cheers Rob, I never even knew this existed! Good stuff :)

February 18, 2011 11:03 AM
 

TheSQLGuru said:

Being a relational engine performance tuning consultant I just have to say that this type of activity can CRUSH a SQL server so be VERY careful with it's usage unless you have COMPLETE control of everything that is (trying to) happening on the box!!

February 18, 2011 5:17 PM
 

Rob Farley said:

No worries everyone. ;)

Kevin: Agreed - especially if "this type of activity" is SSIS in general, which can be known to create cursors for inserts, or update a million rows one...at...a...time. And if they decide to turn on Transactions, then the locking can be just nasty.

I don't want people to put the FAST 10000 hint on all their Source Queries. Instead, I want people to think about their execution plans as the first part of the data flow, and consider what kind of optimising can be done on that - remembering that it's not always best to pick the query that will finish first.

You'll know as well as anyone that sometimes the fastest queries can be more expensive on the system because of locks, parallelism, tempdb usage, and other factors - and that choosing a fast plan is not always as ideal as choosing a plan which is less CPU intensive, or less IO intensive, etc. We've all seen cases where we hurt an individual query (eg, MAXDOP 1-ing it) for the sake of concurrency.

Your point is excellent - thank you for making it. Far too often, people tune individual situations without giving thought to the rest of the story. Just today I was talking to a fellow MVP about the performance impact of FKs. They do have a negative impact on the performance of inserts, updates and deletes, but under consideration of the benefits, we generally choose to keep them. I hope people consider the FAST 10000 as a useful tool, but not see every query as the proverbial nail.

February 18, 2011 6:34 PM
 

Koen Verbeeck said:

Great article!

I knew optimising your sources was important in SSIS, but I had never thought about the FAST option. Great tip!

February 21, 2011 2:55 AM
 

Brad Schulz said:

I've been out of commission for several weeks... Finally read your blog post today...

EXCELLENT stuff, Rob!  Keep 'em coming!

--Brad

March 1, 2011 3:42 PM
 

Jorge Segarra said:

Great post! As someone still learning the ropes of SSIS this was great for me to learn and think about. Tried it out briefly on a data flow that was pulling 6.7 million rows and I don't know if it was my imagination but it "seemed" to go faster when watching it pass rows inside BIDS. I'll have to try it out more later, thanks again for great post and keep up the good work!

March 2, 2011 4:43 PM
 

Rob Farley said:

Thanks guys.

Jorge - make sure you look at the execution plan of your query and investigate if it's helping, and consider the suitability of the hint.

If you have a blocking transformation in your SSIS flow, the data won't be released past it until all the data has reached it. This means that FAST won't help the speed of the SSIS package, only how fast buffers start appearing. In fact, it's quite likely to make it slower.

It's all about managing the bottlenecks, and this post is giving a suggestion to those people who haven't considered that the flow really starts in the execution plan.

I'll try to put a few more posts together about blocking operations, and the similarities between tuning SSIS Data Flows and tuning queries.

March 2, 2011 8:25 PM
 

Valentino Vranken said:

Very nice post Rob!  As you've pointed out, the FAST option certainly has its use in specific situations.

Looking forward to your upcoming posts :-)

Best regards, Valentino.

March 4, 2011 9:39 AM
 

Rob Farley said:

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if

March 7, 2011 7:01 PM
 

Mona Hasavari said:

Hi Rob, Could you please put some documents regading how to speed up  the import from oracle to Sql server.

Thanks,

Mona

May 4, 2011 10:16 AM
 

Rob Farley said:

My guys are great! When PASS started accepting abstract submissions for their Summit (in October this

May 6, 2011 12:06 AM
 

Feodor Georgiev said:

Rob,

this is a great post!

Do you have any idea about how to do this if you have a CTE as a query for your data source?

Feodor

October 11, 2011 8:26 AM
 

David said:

Hi Rob,

Great article.  I've managed to cut our data warehouse load time by 50% thanks to your tips.

Cheers

December 14, 2011 3:39 AM
 

Rob Farley said:

Feodor - Using a CTE in a query makes little difference to the way it runs, it's pretty much the same as using a view.

David - Great news! That's the kind of comment I love!

December 14, 2011 4:40 AM
 

kvvr said:

great post

May 14, 2013 5:05 AM
 

Karel Van Camp said:

Brilliant,

A STG package that took 08:40m now runs in 21 seconds.

Simply by adding this 1 line to the end of the query.

That is about 25 times faster.

Thanks man.

October 10, 2013 4:33 AM
 

KVC said:

Sorry, 08:40:00 against 00:00:21.

That is about 1500x rather than 25x

(before we had a Buffers Spooled = 1 problem)

October 10, 2013 4:36 AM
 

Rob Farley said:

Karel, I'm pleased it helped!

October 10, 2013 6:44 AM
 

Mike G said:

Thanks for keeping this posted. You have changed my life. My pkg wrote 183,000 records in about 12 minutes vs. the same 183,000 in less than 2 seconds!! Now I will no longer have time to get coffee anymore. ;o)

November 19, 2013 11:42 AM
 

Nate West said:

This is a great IT example of how bottlenecks are explained in the excellent business book "The Goal". IMHO, it should be read by anyone doing data movement!

January 21, 2014 12:16 PM
 

Vinodhini said:

So useful topic.superb explanation.Thanks!

November 26, 2014 7:35 AM
 

Sam said:

Awesome tip. I have a related question. First I want to share some data

My Performance Analysis:

Table config:

# of rows in source query - 945,000

size: 556 MB

HINT              | DefaultBufferMaxRows | Time

OPTION (FAST 100) | 10,000               | 1:16

OPTION (FAST 100) | 100,000              | 1:16

OPTION (FAST 1000)| 100,000              | 1:54

OPTION (FAST 1000)| 500,000              | 1:54

OPTION (FAST 100) | 500,000              | 1:16

OPTION (FAST 1)   | 500,000              | 1:16

OPTION (FAST 1)   | 50,000               | 1:33

OPTION (FAST 100) | 50,000               | 1:33

I am trying to make sense out of this data. I understand the OPTION (FAST XX) but not clear how buffer is being used and why it makes a difference.

So my question is:

Is there a way (if yes then how) to find out (other than just experimenting) that what are the optimal values for:

OPTION FAST

DefaultBufferMaxRows, DefaultBufferSize

Rows per batch, Maximum insert commit size

July 14, 2015 12:41 PM
 

Sam said:

Another thing to add to above question...

How do you know when to use modulo approach to divide the data source so they can be processed in parallel.

When I added the modulo to best scenario above and divided the data into 3 groups I gained some more performance

OPTION (FAST 100, MAXDOP 1) | 10,000 | 1:03 (key % 3 = 0,  key % 3 = 1, key % 3 = 2)

July 14, 2015 12:53 PM
 

Rob Farley said:

Hi Sam,

I find that setting FAST to be the same as the buffer size can help, as you're telling it to fill up that first buffer as quickly as possible, but sometimes smaller FAST values can help encourage Nested Loops, which can be faster in some situations.

But the details about how large the buffer size should be are blurry and dependent on a lot of factors.

In your second comment, you mention parallelism, but also use MAXDOP 1. It's definitely worth working out ways that you can encourage parallelism in your main query as well as in SSIS. As far as "how do you know when...?" is concerned, it's really down to what you are seeing as the bottleneck.

Ultimately, it's good that you've tried a few options to see what works for you.

Rob

July 14, 2015 8:16 PM
 

Sam said:

Thanks Rob for a quick response. This is a great post. Good Job!

July 15, 2015 11:50 AM
 

JQ said:

How is the DISTINCT clause in your query going to prevent duplicates in the destination data source if you are pulling and inserting the first 10000 rows already?

December 13, 2015 9:11 PM
 

Rob Farley said:

Hi JQ - I'm not sure I fully understand your question. The point of the post is to explore the impact of removing blocking operators from the query acting as the source. I used DISTINCT to produce a quick example of a blocking operator. There are various ways to prevent duplicates in your destination, but that's more about how you handle your inserts than how you pull the data out of the source, as the source would have no idea what's already in your destination.

December 13, 2015 9:28 PM
 

chandan kumar said:

after all my source has 36 million records but in one batch it fetches only 2,00,000 records only, how can i improve data retrieval rate if we have to join 9 to 10 tables in source query.

July 22, 2016 6:52 AM
 

Rob Farley said:

36M rows is quite a bit, but 18 batches isn't many. Hopefully there aren't too many things like Sorts in there. I would suggest you explore general tuning, as the difference between performance for pulling 2M compared to 36M probably isn't going to produce a very different plan.

July 22, 2016 6:59 AM
 

Jan said:

I suppose FIRST_ROWS hint is the Oracle equivalent MSSQL OPTION FAST, but has anyone tested it? If so, I'd be interested to see the results. (Too busy to test it myself at this time, sorry.)

July 29, 2016 3:41 AM
 

RJ said:

WOW

August 3, 2016 6:53 AM
 

Abeljda said:

Rob, this reduced the duration of one of the loads here by 70%!! Realize this is a 5 yr old post but your still helping out with this. Thank you.

August 18, 2016 2:21 PM
 

Richard said:

Hi Rob,

Will this still apply in SSIS 2016 with the Data Flow Buffer Auto Sizing capability ?

Thanks

Rich

September 20, 2016 3:33 PM
 

Ayusman said:

Absolutely fantastic and quite evident...

May 15, 2017 3:27 AM
 

Anis said:

This is a great tip for optimization.

I was using this in SQL and not in SSIS.

September 6, 2017 6:40 AM
 

Michael said:

Riddle me this, cause i like what your saying here....

Whats the secret to loading CLOBS from an Oracle datasource to a SQL Server destination?

cause this Hint has no effect...

September 11, 2017 11:12 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Tags

No tags have been created or used yet.

News

News? Haven't you read my blog?

My Company


Can't find something?

Contact Me

IM: rob_farley@hotmail.com
Twitter: @rob_farley
Skype: rob_farley
E: rob_farley@hotmail.com

MVP (SQL Server)




Certifications








Adelaide SQL UG

Privacy Statement