THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Page Free Space

See also my SQL Server technical articles on SQLperformance.com

Parallel Execution Plans Suck

Published Thursday, May 3, 2012 7:57 AM by Paul White

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alejandro Mesa said:

Paul,

Very good catch, and well done dissection of the problem.

I did a test with MAXDOP=16, and it consumed the whole fruit (1,000,000,000 rows). Sadly, I do not have SQL Server 2012 in a big box to do the trsting. I would expect the same if we use the new OFFSET ... FETCH functionality.

Here is the version:

Microsoft SQL Server 2008 (SP2) - 10.0.4266.0 (X64)   Nov  5 2010 16:11:39   Copyright (c) 1988-2008 Microsoft Corporation  Developer Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2)

This item got my vote, and thanks for sharing this with us.

--

AMB

May 3, 2012 1:38 PM
 

Piotr Rodak said:

Excellent post Paul,

We in fact encountered serious performance issue of potentially similar nature on SQL 2008.

We had an insert..select query that was inserting rows from a join of few tables into a temp table. Everything was fine and the query finished in 20 seconds when there was no PK index on the temp table. When the index was created before insert, the query was running for 20 minutes and execution plan was showing some crazy rowcounts, accompanied by nested loops joins.

The same query worked fine on SQL 2005.

Piotr

May 4, 2012 5:16 PM
 

Paul White said:

@AMB: Thank you for that.  After the post about seeks that use linear interpolation, I am always a little nervous that someone else will find different behaviour on a different version or processor architecture!

@Piotr: That is interesting!  It would have been interesting to see an execution plan for that so we could be sure if it was the same issue or not.  No doubt too much time has elapsed for that now (though if not, please feel free to drop me a mail).  Always fascinating to find instances where an 'upgrade' turns out to be be so much 'up' after all :)

May 4, 2012 9:00 PM
 

jchang said:

there are 2 elements to the performance problem here. One is that the parallel loop join does not scale. I have complained about this before suggesting that this was because each access to the inner source was latching the root level index page instead of just once per thread or query, but Conor insists that this was absolutely necessary.

Second is that a hash join requires an equality join condition. We could easily replace the CROSS JOIN with an INNER JOIN  -- ON L.n = R.n, but with the virtual table, the QO will not consider this to be a join predicate. On the other hand, if we dump N1 into a real table, temp or permanent, the execution plan will show hash joins, which does scale with parallelism.

May 5, 2012 6:14 AM
 

Paul White said:

Hello Joe,

I think it's pretty clear the performance problem is due to processing hundreds of millions of rows on one thread when the final output only requires one million, wouldn't you say? ;c)

On the subject of parallel loops join and scaling, Adam Machanic and I have found the *complete reverse* to be true: parallel hash (and merge) join does not scale at all well, compared with *properly written* parallel nested loops code.

We should probably discuss this point off-line between the three of us, to avoid getting too far off-topic in these comments.  (Batch-mode parallel hash joins with column store in SQL Server 2012 is a different story).

I have updated my post to show a *parallel hash join* example that exhibits the same problem, and runs for 101 seconds on my test machine.  I should stress that this bug does not occur in *every* parallel plan that includes TOP after a parallel join (of whatever type).

Paul

May 5, 2012 8:57 AM
 

a.m. said:

Joe,

Parallel hash join (outside of Columnstore, as Paul mentioned) does not tend to scale well, due to the overhead of repartitioning exchanges. Parallel nested loop, on the other hand, can be made to scale extremely well if you're careful with it.

Latching of index pages is not a problem in my experience. I don't see any latch contention if I run large workloads using a parallel nested loop pattern.

--Adam

May 6, 2012 10:47 AM
 

Woodler said:

I have checked your bug report for this issue and I was disappointed.  They will not plan to fix it :

Posted by Microsoft on 4/26/2013 at 10:25 AM

Hello,

Thank you for submitting this feedback. After carefully evaluating all of the bugs in our pipeline, we are closing bugs that we will not fix in the current or future versions of SQL Server. Thank you for submitting this feedback. After carefully evaluating all of the bugs in our pipeline, we are closing bugs that we will not fix in the current or future versions of SQL Server. The reasons for closing this bug is because the scenarios reported in the bug are not common enough and so unfortunately it does not meet the bar for the current release.

If you have more supporting information and details or feel strongly that we should reconsider, please do resubmit this item for review.

Thanks again for reporting the product issue and continued support in improving our product.

Gus Apostol, SQL Server Program Manager

April 27, 2013 9:04 AM
 

Paul White said:

Hi Woodler,

Connect is an imperfect system; being closed as Won't Fix doesn't necessarily mean it will never be fixed, just that there are no immediate plans to do so. There are lots of things that need doing, and this item has only attracted 29 votes so far. I am personally hopeful it will be fixed some day.

April 27, 2013 9:59 PM
 

Paul White: Page Free Space said:

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This

June 16, 2013 2:09 PM
 

Chris said:

I know this is an old thread, but I'm looking into a TOP Sort issue so I'm leaving this comment for others.

SQL 2016 SP1 adds a new option hint, either of which may help in this situation.

OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL'));

--or

OPTION (MAXDOP 1);

--or

OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL'), MAXDOP 1);

April 5, 2017 10:41 PM

Leave a Comment

(required) 
(required) 
Submit
Privacy Statement