THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Joe Chang

Top Clause and Other Factors in Problematic Execution Plans

Three years ago, I conducted an extensive investigation on a SQL Server system running kCura's Relativity document e-discovery application. It was fascinating to see the broad range of problematic queries all from one application. This provided good material for my presentation Modern Performance which focuses on the more spectacular problems that can occur with a cost based query optimizer.

(I am not sure why the images are not showing up. See the alternate link Top Clause)

It should be pointed that unlike a typical application in which the key queries can be rigorously tuned, Relativity must generate the SQL from options in the UI. The objective of Relativity is to support complicated searches, with heavy emphasis on queries of the form: IN set A but NOT IN set B. This causes problems in row estimation because there is no generally valid logic to assess whether there is overlap between Set A and Set B even if row estimates on the individual sets are possible.

If this were not difficult enough, there can also be nested AND/OR combinations. SQL Server has difficulty generating good execution plans even for a single AND/OR combination. It is unfortunate that the most direct conversion of natural (user-oriented) logic to a SQL expression (that Relativity employs) just happens to be the form for which the SQL Server query optimizer has difficulties in producing a good execution plans.

Relativity 8.1

This article is based on a brief observation of Relativity version 8.1. The first Relativity article was originally based on version 7.3-7.5, and later updated with observations of version 8.0.

Relativity Architecture

The architecture of Relativity first employs a SELECT COUNT query to determine the number of rows that meet a specific search query. A second call is then made in the form of a SELECT TOP 1000 query to retrieve just the document identity column (ArtifactID) values. There should be a third query that retrieves all the desired columns for specific ArtifactID values, but this query is very low cost and does not warrant discussion here. If more than 1000 rows are needed, then the next query will be TOP 6000.

One observation is that even when the COUNT query indicates that there are fewer than 1000 rows, the ArtifactID query is still issued with the TOP 1000 clause. In earlier versions of Relativity (7.x), the TOP 1000 query was issued even when the COUNT query indicated zero rows. Perhaps an oversight by the developers in not realizing zero rows are returned when COUNT is zero.

One aspect of Relativity architecture that is correct is the not enabling plan reuse. The expectation is that individual searches probably have high CPU for execution than for compile and that each parameter set could have skewed distribution. However, in issuing explicit SQL, there is also no opportunity to fix problematic SQL. Hence it is important the Relativity anticipates as many problems as possible and employ good strategies for know issues. We could wish for such, and we would still be wishing.

Problematic SQL in Relativity 8.1

There are several difficulties in Relativity queries, three of which are covered here. One is due to the SELECT COUNT followed by TOP 1000 architecture. The second is due to the use of the direct translation of natural (user) logic to SQL rather than a form of SQL for which the SQL Server query optimizer happens to produce very good execution plans. The third is in parallelism strategy. The data in Relativity is expected to have heavily skewed distribution so is necessary to watch for situations where this produces ineffective parallel execution plans.

Relativity Query Example

Below is the COUNT form of an example search query to be studied in greater detail.

SELECT COUNT( [Document].[ArtifactID])
FROM eddsdbo.[Document] WITH(NOLOCK)
LEFT JOIN eddsdbo.[Custodian] AS [o1000022_f2154932] (NOLOCK)
ON [o1000022_f2154932].[ArtifactID] = [Document].[Custodian]
WHERE [Document].[AccessControlListID_D] IN (1,1000062,1000063,...)
AND
(
  [o1000022_f2154932].[ArtifactID] IN (22552945, 22552935, 22552925, 22552915, 22552905, 14768277,...)
 OR (EXISTS(
  SELECT [f2154930f2154931].[f2154931ArtifactID]
  FROM eddsdbo.[f2154930f2154931] (NOLOCK)
  LEFT JOIN eddsdbo.[Custodian] (NOLOCK)
  ON [Custodian].[ArtifactID] = [f2154930f2154931].[f2154931ArtifactID] 
  WHERE [f2154930f2154931].[f2154930ArtifactID] = [Document].[ArtifactID]
  AND [Custodian].[ArtifactID] IS NOT NULL
  AND ([Custodian].[ArtifactID] IN (22552945, 22552935, 22552925, 22552915, 22552905, 14768277,...))
 ))
)

The actual number of values in the AccessControlListID_D IN clause is 113. The actual number of values in each of the Custodian.ArtifactID IN clauses is somewhat over 100, with both sets being identical.

Below is subsequent TOP 1000 query that is issued after the COUNT query. In the 7.x versions, the TOP 1000 clause is present regardless of the number of rows indicated by the COUNT query, and is in fact issued even if COUNT reports zero rows. It is also very possible that either one of the COUNT and TOP queries or both are very expensive regardless of the actual number of rows.

SELECT TOP 1000 [Document].[ArtifactID]
FROM eddsdbo.[Document] WITH(NOLOCK)
LEFT JOIN eddsdbo.[Custodian] AS [o1000022_f2154932] (NOLOCK)
ON [o1000022_f2154932].[ArtifactID] = [Document].[Custodian]
WHERE [Document].[AccessControlListID_D] IN (1,1000062,1000063,...)
AND
(
  [o1000022_f2154932].[ArtifactID] IN (22552945,22552935, 22552925, 22552915, 22552905, 14768277,..)
 OR (EXISTS(
  SELECT [f2154930f2154931].[f2154931ArtifactID]
  FROM eddsdbo.[f2154930f2154931] (NOLOCK)
  LEFT JOIN eddsdbo.[Custodian] (NOLOCK)
  ON [Custodian].[ArtifactID] = [f2154930f2154931].[f2154931ArtifactID] 
  WHERE [f2154930f2154931].[f2154930ArtifactID] = [Document].[ArtifactID]
  AND  [Custodian].[ArtifactID] IS NOT NULL
  AND ([Custodian].[ArtifactID] IN (22552945, 22552935, 22552925, 22552915, 22552905, 14768277,...))
 ))
)
ORDER BY [Document].[ArtifactID]

Alternative Queries

For the above search with the actual data set, there are in fact 232 rows. Three alternatives queries are examined here. One is the above query without the TOP clause. The second alternative is the above query (including the TOP clause) but with an index hint applied. The third alternative tested is the form below, with the OR clause replaced by a UNION.

SELECT [Document].[ArtifactID]
FROM eddsdbo.[Document] WITH(NOLOCK)
LEFT JOIN eddsdbo.[Custodian] AS [o1000022_f2154932] (NOLOCK)
ON [o1000022_f2154932].[ArtifactID] = [Document].[Custodian]
WHERE [Document].[AccessControlListID_D] IN (1,1000062,1000063)
AND
(
  [o1000022_f2154932].[ArtifactID] IN (22552945, 22552935, 22552925, 22552915, 22552905, 14768277,...)
)
UNION
SELECT  [Document].[ArtifactID]
FROM eddsdbo.[Document] WITH(NOLOCK)
LEFT JOIN eddsdbo.[Custodian] AS [o1000022_f2154932] (NOLOCK)
ON [o1000022_f2154932].[ArtifactID] = [Document].[Custodian]
WHERE [Document].[AccessControlListID_D] IN (1,1000062,1000063,...)
AND
(
 (EXISTS(
  SELECT [f2154930f2154931].[f2154931ArtifactID]
  FROM eddsdbo.[f2154930f2154931] (NOLOCK)
  LEFT JOIN eddsdbo.[Custodian] (NOLOCK)
  ON [Custodian].[ArtifactID] = [f2154930f2154931].[f2154931ArtifactID] 
  WHERE [f2154930f2154931].[f2154930ArtifactID] = [Document].[ArtifactID]
  AND  [Custodian].[ArtifactID] IS NOT NULL
  AND ([Custodian].[ArtifactID] IN (22552945, 22552935, 22552925, 22552915, 22552905, 14768277,...))
 ))
)
ORDER BY [Document].[ArtifactID]

For this particular case, the UNION will produce exactly the same set of rows as the original query because only the Document table ArtifactID column is in the select list and this column is the primary key of the Document table. In the more general case with columns from more than one table in the SELECT list, the UNION form could have a different row set than the original OR form. The architecture of Relativity is such that the OR forms can be converted to UNION while maintaining correct results.

Earlier, it was mentioned that the SQL Server query optimizer can have problems in producing a good execution plan for queries with a combination of AND and OR clauses. The form shown above is set A OR set B, with A having an AND clause. The other form of this is set A AND set B, with either A or B having an OR clause. The second form could probably be handled by converting the AND to an INNER JOIN, which would require both conditions to be true, but a full study has not been done on this.

Relativity Query Execution Plans

Now consider the estimated execution plans for the 5 queries: 1) COUNT, 2) TOP, 3) no TOP, 4) index hint, and 5) UNION.

The SELECT COUNT estimated execution plan is shown below (full plan).

CountEst

The SELECT TOP 1000 estimated execution plan (full plan).

Note top right operation in the above execution plan for the TOP 1000 query. The operation is an Index Scan on index EV_ArtifactID. The index lead key is ArtifactID, and has all the columns required for this query.

Below left is the Index Scan detail. Notice that the Estimated I/O cost is 85, corresponding to approximately 892MB (I/O cost 1 = 1350 pages or 10.5MB). Below right is the COUNT query Nested Loops operation at the top left of the plan.

Below are the left most operator showing total plan cost for the COUNT and TOP 1000 queries respectively.

,

The TOP 1000 plan cost is 0.31269 even though the Index Scan on Document table index EV_ArtifactID has an I/O of 85.38. This is because the execution plan is expecting the first 1111 rows from Documents to be sufficient to meet the total required 1000 rows. The plan cost for the COUNT query is much higher at 519 as all rows must be evaluated to test the match to the search arguments. The Document Index Scan in the COUNT query uses the IX_AccessControlListID_D index because this index is narrow with cost of 35.46 (272MB)

Below is the estimated plan for the SELECT query without the TOP clause (full plan)

,

The estimated row count is 11,569,800, exactly the same as in the COUNT query at the Nested Loops operation just before the rows are aggregated into a COUNT value.

There are 12.8M rows in the Document table, so the query optimizer believes that about 90% of row meet the search argument (with OR clause). The execution plan is the same as the COUNT query except that the return list requires a sort operation contributing to the higher overall plan cost of 720.

Below is the estimated plan for the TOP query but with an index hint applied (full plan)

The use of the index hint results in the query optimizer not changing the join order. Hence when using hints, it is also necessary to write the query in the form with the desired join order. The hinted index lead key does not match in the ORDER BY clause, so the entire index is scanned and then sorted. The plan cost 197 is more than the original TOP query, but less than the no TOP query. The reason is that the query optimizer believes it can exit the execution plan once the TOP 1000 rows have been found.

Below is the estimated execution plan for the UNION query. (full plan)



The Estimated Number of Rows is 3.2M and the plan cost is 92.8.

Curiously, the number of rows from each of the two sub-queries is much less at 23,764 and 923,036.

One would think that the UNION of the two result sets should at least equal to the larger and could be as high as the sum.

Relativity Query Execution Times

Below are the execution time in milli-seconds for both CPU and elapsed times of each of the 5 queries. The first is the COUNT query, followed by the Relativity standard TOP 1000 query. Next are the 3 variations considered: without the TOP clause, with an Index Hint and finally a UNION structure instead of OR in the WHERE clause.

QueryCPU timeelapsed time
COUNT13827969109
TOP 1000152148152298
no TOP14397371857
index hint16767122016
UNION1665324

The standard Relativity TOP query does not have a parallel execution plan as the plan cost (0.31269) is below the cost threshold for parallelism. Even though some of the operations in the plan show a cost greater than 0.3, the Top clause believes that the query can terminate after few rows from Documents (estimate 1111) have been examined.

In this example, the WHERE clause arguments are highly selective, and fewer than 1000 rows meet the conditions. So what happens is that the full set of 12.8M rows from the right most loop join outer source must be processed, as the Top clause "exit" criteria is never reached. Furthermore, the execution in not parallel, because it is believed to be a low cost plan.

Note the fat arrows in the SELECT TOP 1000 actual execution plan (full plan) below.

There is not a significant difference in CPU times between the COUNT, regular TOP, no TOP, and TOP + index hint query plans. The differences are mostly in the elapsed time. The regular TOP query has the longest elapsed time being a non-parallel plan as explained above. The COUNT and no TOP queries have elapsed times approximately one-half of the CPU time, corresponding to a 2X speed-up with parallelism even though the actual degree of parallelism was 8 (on separate physical cores).

Parallel Execution Plans

There are two critical factors/challenges in parallel execution plans. One is to divide the work evenly among multiple threads (each running on different cores). The second is to have sufficiently large granularity of work on each thread before some form of synchronization is required. My recollection was that SQL Server 2000 employed a method that ensured even distribution by having each thread process one page at a time(?) This was fine in the Pentium II/III 500MHz generation, but was far too small by the NetBurst and Core2 architecture processors.

SQL Server 2005 employed a different methods with the strategy of allowing larger granularity and reduced need for synchronization between threads(?) but could also result in having highly uneven distribution of work between threads. Some of this was reported to have been fixed in SQL Server 2008 sp1, see Using Star Join and Few-Outer-Row Optimizations to Improve Data Warehousing Queries but apparently this is still an issue in SQL Server execution plans?

In the COUNT and no TOP queries, the Constant Scan operation acts as the outer source in a loop join to Document. This is the artificial rowset from the IN clause on AccessControlListID_D. It is known to SQL Server from statistics that this column has highly skewed distribution resulting in the row-thread split shown below.

The distribution on the other joins are also skewed but not as strongly?

The query with the TOP clause and an index hint just happens to prevent the SQL Server query optimizer from using a Constant Scan as a loop join outer source. The index hint does nothing to improve the CPU efficiency of this query. In fact it is more than 10% less efficient in terms of CPU. The positive effect is that work is evenly distributed among threads as shown below resulting in nearly linear scaling with parallelism (7.6 to 1).

Technically the problem here is not all few outer rows, but just uneven few outer rows? Microsoft seems to be aware of the problem and has fixed some aspects? but apparently now all. Perhaps the SQL Server engine could implement a flag or hint to avoid execution plans that are sensitive uneven distributions? For now, the only work-around is to look for this situation and take appropriate action.

UNION in place of AND/OR Combination

The execution plan that has outstanding results in terms of CPU efficiency is the UNION replacing the OR clause. It was previously documented that the SQL Server query optimizer has difficulty in generating a good execution plan for queries with a combination of AND/OR conditions, but has no problems when an alternative structure is employed.

There are other forms for which the SQL Server query optimizer does not produce good execution plans. Once these can be catalogued with appropriate alternative SQL expression strategies, there are probably not any searches that cannot be handled.

COUNT + TOP Architecture

This actually encompasses several sub-topics. Presumably the purpose of this architecture is to know the exact number of documents that match a search, but also so as to not over-whelm the application server with data. But we should consider that 1) the query with a rowset only has an integer column, 2) even a large case should have no more than tens of millions of documents (a small number in the modern world) and 3) the application server today has many gigabytes of memory. So this is not absolutely necessary?

There is also a client/application-side work around for this. Simply issue the query to return the identity column for all rows in the search, load the first X into an array, then continue to read but not store the remaining rows.

Consider the COUNT + TOP architecture. There are 4 possibilities. One is that both queries are inexpensive in which case this is not important. Second is that the COUNT query is expensive but not the TOP. This could happen when many rows match the search, and the TOP clause allows the query to exit quickly.

Third is that the COUNT is not expensive but the TOP is. This happens when the query optimizer estimates many rows but in fact there are few rows. In this case, an appropriate high-volume parallel plan is employed for the COUNT query, but a non-parallel plan is used for the TOP query relying on the expectation of exiting quickly. The exit condition is never met, and the non-parallel plan must process the full set of rows with only a single thread. Consider also that this non-parallel plan was formulated based on low start-up overhead (loop joins) rather than volume efficiency (hash joins).

The fourth possibilities is that both the COUNT and TOP queries are inherently expensive. In this case, we now have to execute two expensive queries all so that the developer can avoid a few lines of code on the application side? (Many SQL/database disasters have been traced to lazy/incompetent coders.)

Summary

As I said in the first Relativity article, all of the database problems appear to be solvable, but most require action in the application code where the SQL is generated, including the architectural strategy of COUNT + TOP. Some problems from Relativity 7.x appear to have been resolved, such as the data type mismatch between a varchar column and nvarchar search parameter. This by itself was a nuisance, but when combined with the TOP clause had significant negative consequence.

The full set of details along with a test database and queries to both reproduce and fix the problems in Relativity 7.x were sent to kCura. Most of the advice seems to have been disregarded.

It would seem that kCura was aware that there were problems. In version 7.x, there was a CodeArtifact table (3 columns CodeTypeID, CodeArtifactID, AssociatedArtifactID) that was frequently involve in search queries that would take forever, as in 30min (or whatever the web page time-out is) to 30 months (estimated) to complete. It is possible that the long running read queries and write activity also resulted in blocking and deadlocks despite prolific use of NOLOCK (without the WITH keyword?).

For version 8.x kCura went to great effort to split this table into multiple tables with names of the form ZCodeArtifact_xxx, one for each of the group values (xxx)? The new tables have columns CodeArtifactID and AssociatedArtifactID, so perhaps there is one table for each CodeTypeID in the version 7.x table? (This topic is covered in ZCodeArtifact & Statistics, as there were a series of performance problems in queries to these tables related to statistics.)

The problem was that blocking and deadlocks were the symptoms of Relativity problems, not the cause. The causes were the topics discussed here: 1) injudicious use of the TOP clause in situations in which the query optimizer makes a serious error in the estimated number of rows when the actual row count is already known from the COUNT query, 2) generating a complicated SQL expression with combination AND/OR clauses instead of JOIN and UNION, and 3) ineffective parallel execution plans due of skewed distribution. One more point, kCura also went to the effort of changing the SQL a from single expression form to one using CTEs. This may contribute to the clarity of the expression, but does not impact the execution plan problems.

Another problem seen in 8.1 is a search involving both conventional search arguments and Full-Text elements. The execution plan had the Full-Text Search (FTS) operation as a loop join inner source (see FTS), probably because the estimate number of rows from the outer source showed 1 (which could mean 0). This might have been because the ZCodeArtifact_xxx table was newly generated and statistics were not updated? The query produced a good execution plan after statistics were updated and had quick actual execution time as well.

Appendix

 
(232 row(s) affected) SQL Server Execution Times: CPU time = 152148 ms, elapsed time = 152298 ms. TOP 1000
(232 row(s) affected) SQL Server Execution Times: CPU time = 143973 ms, elapsed time = 71857 ms. no TOP
(232 row(s) affected) SQL Server Execution Times: CPU time = 167671 ms, elapsed time = 22016 ms. Index hint
SQL Server Execution Times: CPU time = 1665 ms, elapsed time = 324 ms. UNION
SQL Server Execution Times: CPU time = 138279 ms, elapsed time = 69109 ms. COUNT

The full list of Document.AccessControlListID_D and o1000022_f2154932.ArtifactID values are below
/* [AccessControlListID_D]
1,1000062,1000063,1000064,1000065,1000066,1000067,1000099,1000100,1000101,1000102,1000103,1000104,1000105, 1000106,1000107,1000108,1000109,1000110,1000111,1000112,1000113,1000114,1000115,1000116,1000117,1000118, 1000119,1000120,1000121,1000122,1000123,1000124,1000125,1000126,1000127,1000128,1000129,1000130,1000131, 1000132,1000133,1000134,1000135,1000136,1000137,1000138,1000139,1000140,1000141,1000142,1000143,1000144, 1000145,1000146,1000147,1000148,1000149,1000150,1000151,1000152,1000153,1000154,1000155,1000156,1000157, 1000158,1000159,1000160,1000161,1000162,1000163,1000164,1000165,1000166,1000167,1000168,1000169,1000170, 1000171,1000172,1000173,1000174,1000175,1000176,1000177,1000178,1000179,1000180,1000181,1000182,1000183, 1000184,1000185,1000186,1000187,1000193,1000206,1000240,1000244,1000245,1000246,1000259,1000288,1000315, 1000483,1000535,1000536,1000550,1000599,1000617,1000763,1000770

[o1000022_f2154932].[ArtifactID]
22552945,22552935,22552925,22552915,22552905,14768277,14768265,5375743,5375742,5375741,5375740,22552947, 22552937,22552927,22552917,22552907,22552894,22552884,22552933,22552923,22552913,22552903,22552890, 22552880,14768269,5375739,5375738,22552951,22552941,22552931,22552921,22552911,22552901,22552896, 22552886,14768275,22552892,22552882,22552939,22552929,22552919,22552909,14768273,14768267,22552949, 28315630,5549264,22552953,22552898,22552888,22552878,22552954,22552944,22552934,22552924,22552914, 22552904,22552950,22552940,22552930,22552920,22552910,22552900,14768271,22552948,22552938,22552928, 22552918,22552908,22552952,22552942,22552946,22552936,22552926,22552916,22552906,22552895,22552885, 22552932,22552922,22552912,22552902,22552891,22552881,14768268,22552943,22552899,22552889,22552879, 14768270,5549202,22552897,22552887,14768274,14768272,22552893,22552883,14768266,14768276,14768264

Note: a Test database was created with just these tables and limited columns for in-depth investigation.

Published Wednesday, April 23, 2014 5:51 PM by jchang

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

jchang said:

I am not sure why the images are not showing up. See the alternate link http://www.qdpma.com/CBO/TopClause.html

April 23, 2014 5:19 PM
 

jchang said:

Apparently the reason images did not show up at first was that I had a space on both sides of the equal char in <img src = "png" >, but without the spaces, it works! who knew? ok, may be everybody but me knew.

April 26, 2014 1:22 PM

Leave a Comment

(required) 
(required) 
Submit

About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement