
Joe Chang

Query Optimizer Gone Wild - Full-Text

Overall, SQL Server has become a very capable and mature product, with a very powerful engine and a sophisticated query optimizer. Still, every now and then, a certain query structure throws the optimizer for a loop, resulting in an execution plan that will take forever. The key to identifying this type of problem begins with the execution plan. First, the plan cost does not tell the whole story. It is necessary to know which execution plan operations run well on modern server systems and which do not. Solving the problem can be a simple matter of rewriting the SQL to get a different execution plan, one built from good execution components.

Of course, when working with third-party applications that do not use stored procedures, it is necessary to convince the ISV to make the change, often first talking to someone who does not write code, let alone someone with any understanding of the SQL Server query optimizer.

Anyway, the topic here is Full-Text Search, in particular CONTAINS and CONTAINSTABLE. CONTAINS is "a predicate used in a WHERE clause" per Microsoft documentation, while CONTAINSTABLE acts as a table, that is, a row source that can be joined or aggregated.

Consider the two queries below: the first is an example of CONTAINS, the second an example of CONTAINSTABLE.

[Image: the two example queries, one using CONTAINS and one using CONTAINSTABLE]
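In rough form, the two queries look something like the sketch below; the column name DocContent and the search term 'PowerPoint' are placeholders, not the actual names from the application.

    -- Sketch: CONTAINS is a predicate in the WHERE clause; the count comes from the base table
    SELECT COUNT(*)
    FROM dbo.Document
    WHERE CONTAINS(DocContent, 'PowerPoint');

    -- Sketch: CONTAINSTABLE is a row source; the count can be taken directly from its output
    SELECT COUNT(*)
    FROM CONTAINSTABLE(dbo.Document, DocContent, 'PowerPoint') ft;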

We might intuitively think that there should be no difference between the two, which is why in SQL Server, we should never even bother with intuition and instead always, repeat always, focus on the execution plan.

[Image: execution plan for the CONTAINS query]

[Image: execution plan for the CONTAINSTABLE query]

Both queries perform a Full-Text search, but the CONTAINS query must also scan an index on the source table to get the count. The CONTAINSTABLE function, on the other hand, being a row source, can be aggregated directly. In this example, the Document table is on the order of 60GB excluding LOB structures stored outside the table, the index in question is 150MB, and there are 16M rows in the table. Both queries run in a little over 2 sec elapsed. The first consumes 6 CPU-sec running in 2.2 sec, while the second consumes 2.6 CPU-sec in 2.6 sec, as it is not a parallel plan. OK, so the first query runs slightly faster with parallel execution on the Stream Aggregate, while the second is single-threaded. But the Full-Text function itself is not multi-threaded, and probably accounts for the bulk of the first query's 2.2 sec. So how much does the CONTAINS form really benefit?

Before jumping to the title topic - Query Optimizer Gone Wild - let's look at another query, shown below.

[Image: administrative query with search arguments on two non-indexed columns]
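In rough form, with hypothetical column names and values, the query looks something like:

    SELECT COUNT(*)
    FROM dbo.Document
    WHERE DocType = 'Presentation'           -- hypothetical non-indexed column
       OR FileExtension IN ('ppt', 'pptx');  -- hypothetical non-indexed column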

Below is the query plan. Note that neither column in the search argument is indexed, because this is an administrative query that the executive director runs once every month as a Key Performance Indicator, which is also probably related to why I am not an executive. So the execution plan is a table scan.

[Image: execution plan for the administrative query - a clustered index (table) scan]

The IO portion of the full table (Clustered Index) scan is 5677 (1350 pages, or 10.5MB, has an IO cost of 1 in a scan operation, so 5677 corresponds to roughly the 60GB table size). For this particular example, the Fulltext Match Table Valued Function is assessed a plan cost of 1.6. Combined with the other components, Stream Aggregate and Filter, the total plan cost of the Full-Text search query is 4.14.

On this particular system, a Xeon E7-48xx, with max degree of parallelism set to 8, the table scan query consumes 25 CPU-sec running in 3.8 sec when data is in memory. At MAXDOP 20, the query consumes 37 CPU-sec running in 2.1 sec. This is why I emphasized earlier that plan cost is not hugely relevant.

(In case you were curious, the 60GB, 16M row table scan consumes 23 CPU-sec at DOP 1; 24.5 CPU-sec, 12.3 sec elapsed at DOP 2; the same 24.5 CPU-sec, 6.6 sec elapsed at DOP 4 - i.e., excellent scaling to DOP 8, and good continued scaling to DOP 20. This is an amazing 2.6GB per sec per core, and 700,000 rows per sec per core. Of course, this is a wide table with 175 columns averaging 3750 bytes per row.)

The Wild Plan

The actual query we are interested in is none of those discussed above. Due to the nature of the application, a PowerPoint document can be indicated by an expression on any of three columns, one of which is part of a Full-Text catalog, as expressed by the query below.

[Image: query combining predicates on three columns with OR, one of them a CONTAINS predicate]
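With the same hypothetical column names and search term as before, the combined query looks something like:

    SELECT COUNT(*)
    FROM dbo.Document
    WHERE DocType = 'Presentation'
       OR FileExtension IN ('ppt', 'pptx')
       OR CONTAINS(DocContent, 'PowerPoint');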

(It actually turns out that this query is not entirely correct from the technical perspective, but it is correct by executive direction; also, part of the reason why I will never be an executive.)

Given that this is a relatively simple SQL expression, and that the two elements of this query are known to run quickly, we might intuitively expect the composite query to also run quickly. But as I said earlier, do not even bother with intuition; always, always focus on the execution plan, shown below.

[Image: execution plan for the combined query - a loop join with the Full-Text search on the inner source]

Can you say: "We are sooo screwed!"

What is wrong with this plan? Let us compare this plan with the table scan plan from above. Both plans have approximately equal cost, as the 60GB table scan dominates. The extra Table Valued function contributes very little, as shown below.

[Image: plan cost detail showing the Table Valued Function contributing very little to the total]

The problem with this execution plan is that there is a fat arrow (indicating very many rows, 16M in fact) coming from the outer source (top) with the Full-Text search in the inner source (bottom). For each row from the outer source, the inner source is evaluated.

This is why I said not to pay much attention to the plan cost, or to which components carry high cost relative to the others.

Instead, it is important to focus on the SQL operations and the number of rows and pages involved, along with our knowledge of how each operation behaves in non-parallel and parallel execution plans, and the difference between data in memory versus on hard drive, and now also SSD storage.

This execution plan will attempt to perform 16M Full-Text searches. We have already established that this particular Full-Text search takes about 2 sec, so the full query might take 32M sec. There are 86,400 seconds per day, so we should expect this query to complete in about 370 days, assuming there is no need to reboot the OS after a critical security patch. And oh, by the way, we need to run this query next month too - every month, as a matter of fact.

Note, in the first Query Optimizer Gone Wild, the topic was a loop join with a table scan on the inner source. So this is another loop join example.

The Tamed Plan

Now that we have identified the problem, and we know exactly what to look for in the execution plan, it is time to solve it. Because we have been working with SQL Server and other DBMS engines built around a cost-based optimizer for many years, we know exactly what to do. The solution is to rewrite the SQL to get a good execution plan in which the two base operations, whose run times we know to be reasonable, are each executed only once.

The query below meets this objective.

[Image: the rewritten query]
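A sketch of this kind of rewrite, again with hypothetical names, is below; the join key, the LEFT JOIN shape, and the RANK test are guesses (the comments below indicate the actual query filters on the CONTAINSTABLE RANK column).

    -- Sketch: the table scan and the Full-Text search each execute exactly once,
    -- combined with a single join rather than one Full-Text search per outer row
    SELECT COUNT(*)
    FROM dbo.Document d
    LEFT OUTER JOIN CONTAINSTABLE(dbo.Document, DocContent, 'PowerPoint') ft
        ON ft.[KEY] = d.DocumentID
    WHERE d.DocType = 'Presentation'
       OR d.FileExtension IN ('ppt', 'pptx')
       OR ft.RANK > 0;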

The execution plan is below.

[Image: execution plan for the rewritten query - a hash join, with the table scan and the Full-Text search each executed once]

This query consumes 37 CPU-sec and 6.9 sec elapsed. Given that the two component elements of this query combined for 27 CPU-sec and 6.4 sec elapsed, the hash join and the two parallelism Repartition Streams components increased the true cost by 10 CPU-sec, but a minuscule 0.5 sec of elapsed time.

I suppose that I should file this on Connect, but Microsoft locked my account out, and does not want to send me the unlock code. So I am posting this here.

Published Sunday, February 19, 2012 9:05 PM by jchang


Comments

 

Scott McFadden said:

Good article. One question though...

Why Rank > 1?  Why not Rank > 0?

thanks

February 20, 2012 8:55 PM
 

Alexander Kuznetsov said:

I liked your solution, but an even better one would be to download source code, fork it, fix it and such. Fixing the root cause is often more interesting than finding a workaround, is it not?

February 21, 2012 4:29 PM
 

jchang said:

I imagine the SQL Server source code would be an enormously complex collection, given that sqlservr.exe is 60MB, plus all the associated DLLs. It would probably take a lot of time just to find the relevant code.

So as much as I would like to fix various broken aspects of SQL Server myself, it would be quicker to direct the problem to the code section owner and have him/her fix it.

PS: there is an even more interesting aspect of Full-Text search with complex AND/OR combinations that is completely broken, which I will describe later.

February 22, 2012 10:06 PM
 

jchang said:

Scott: I am thinking rank greater than 0 but less than 1 is for phrases where the match is not exact. For a single-word exact match, rank should be 1? Perhaps an FT expert could weigh in.

AK: if there were not broken aspects of SQL Server, I might not have a job. All products have hidden land mines. I just like having a map of the danger spots.

February 22, 2012 10:09 PM
 

mmiirroorr mmaann said:

"but Microsoft locked my account out"

I can't imagine why..... SQLBits 256 slide deck maybe?

;)

February 24, 2012 5:19 PM
 

Jesse G said:

I'm fighting this exact issue right now.

It seems that the optimizer grossly underestimates the cost of doing fulltext searches inside of a loop join. Short of using a plan guide to force a hash or merge join, is there any other way to prevent a fulltext search inside of a loop join? (on the outside is ok)

The only other solution I've found is to use a temp table to hold the results of the contains query and change my where query

from:

... AND CONTAINS(SearchCol, @containsCondition)

to:

... AND id IN ( SELECT id FROM #containsMatches)
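For example, the temp table can be populated once from the full-text index (the table name here is a placeholder):

    SELECT [KEY] AS id
    INTO #containsMatches
    FROM CONTAINSTABLE(MyTable, SearchCol, @containsCondition);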

March 1, 2012 10:57 AM
 

jchang said:

Sorry, the point of this and the http://www.qdpma.com/CBO/CBO_GoneWild.html discussion is for the MS SQL Server team to change the query optimizer so that it does not generate execution plans with completely catastrophic consequences.

The upside is a minuscule saving in the loop join if and only if there is just one row from the outer source, but death when there are many rows from the outer source.

At least this helps you diagnose an execution plan that seems reasonable, but will never complete.

March 1, 2012 4:16 PM
 

Jesse G said:

SQL 2012 appears to do a better job. I'm mostly seeing eager table spools when the fulltext is inside the loop join.

I only get the problematic plan with very low cardinalities on the outer source (< 3500 rows seems to be the tipping point to produce the bad plan for my dataset/queries).  

May 13, 2012 1:00 AM


About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
