THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Adam Machanic

Adam Machanic, SQL Server Practice Lead for The Pythian Group, shares his experiences with programming, performance tuning, and optimizing SQL Server 2000, 2005, and 2008, in conjunction with related technologies such as .NET.

SQL Server Query Processing Puzzle: LIKE vs ?

How creative are you with manipulating your queries to produce more efficient plans? Try the following puzzle and e-mail your solution to me at [<my last name> @ pythian.com]. Make sure to include an explanation of why it works, as well as your mailing address. The best two solutions/explanations win a free copy of Expert SQL Server 2005 Development, a wonderful feeling of accomplishment, plus eternal fame and glory when I reveal your solutions here on the blog.

Run the following T-SQL to create two tables in TempDB:

USE TempDB
GO

CREATE TABLE b1 (blat1 CHAR(5) NOT NULL)
CREATE TABLE b2 (blat2 VARCHAR(200) NOT NULL)
GO

INSERT b1
SELECT LEFT(AddressLine1, 5) AS blat1
FROM AdventureWorks.Person.Address

INSERT b2
SELECT AddressLine1 AS blat2
FROM AdventureWorks.Person.Address
GO
Now consider the following query:
SELECT *
FROM b1
JOIN b2 ON
b2.blat2 LIKE b1.blat1 + '%'

This query takes around three minutes to run on my notebook, and does over 1.8 million logical reads. Can you figure out a way to re-write it so that it performs better? No modification of the base tables or addition of any other objects is allowed (sorry, no indexed views!) -- the challenge is to tune this by doing nothing more than re-writing the query.

Good luck! I'll leave the contest open for submissions until May 1.


Published Tuesday, April 22, 2008 11:18 AM by Adam Machanic

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

To people responding: Don't just send me a solution.  Explain to me WHY it works.  Why does your change allow the query processor to work more efficiently?  The goal of this challenge is not just to modify a query and happen upon a faster version, but rather to think about the reasons behind it and gain a deeper understanding of the QP.  Enjoy!

April 22, 2008 1:05 PM
 

Alexander Kuznetsov said:

Hi Paul,

Why did you post your solution here instead of e-mailing it to Adam? Don't you think it might have skewed the contest just a little bit?

April 23, 2008 10:55 AM
 

Adam Machanic said:

Paul D., next time think before you post.

April 23, 2008 11:37 AM
 

david wei said:

Adam, do you think change blat1 to VARCHAR(5) will make this more challenging?  People may take the advantage of CHAR(5). :-)

April 23, 2008 2:17 PM
 

Adam Machanic said:

David:

If you want more of a challenge, try changing all of the 5s to 7s ... I'm not sure if this will be more or less challenging than the VARCHAR(5) change; I'll let you tell me :-)

April 23, 2008 3:05 PM
 

david wei said:

Adam, not the length issue, is the trailing space issue.

Trailing space makes the difference.

for example:

if 'abcde' like 'a%'

print 'Y' -- returns Y

if 'abcde' like 'a  %'

print 'Yes' -- does not return Yes

So if we know the blat1 is fixed 5 characters, we can simply use the left(balt2,5) = balt1 for the join, and a simple HASH join hint will only sacn two tables once, avoid nested loop joins on two 19K rows tables(which caused 1.8M reads).

But if we change blat1 to VARCHAR(5), and truncate the trailing space to make the length vary (from 1 to 5), this will add the chalenge, hence you can not just compare the first 5 characters :-)

INSERT b1

SELECT RTRIM(LEFT(AddressLine1, 5)) AS blat1

FROM AdventureWorks.Person.Address

-- Note. you can comment out some sensitive part in my post if you think this will skew the contest.

April 23, 2008 3:51 PM
 

Adam Machanic said:

David,

Length is an issue here as well, due to some of the data.  Try creating everything with length 7, and play with some of the solutions (assuming you've come up with one -- I don't see a submission from you in my inbox).  If your solution happens to be the most common one people have been sending, you'll find that it will no longer return the same results as the example query.

April 23, 2008 4:07 PM
 

Denis Gobo said:

David, that didn't make a difference, I am getting the same number of rows back

SELECT blat1,len(blat1),datalength(blat1),b1.blat1 ,left(b2.blat2,5),

len(blat2),datalength(blat2)

FROM b1

JOIN b2 ON  <very secret indeed>

len and datalength reveal that there are some rows with 5 and 4 chars

April 23, 2008 4:21 PM
 

david wei said:

Adam, still the trailing space issue, because the minimal length in b2 is 5, not 7.

select min(len((blat2))) from b2

So when you change to 7, blat1 automatic added 2 trailing space to make it CHAR(7); that cause result difference.

I have sent my solution.

April 23, 2008 4:24 PM
 

Michelle said:

So Adam,

Do we need to resend you a new solution using CHAR(7), or do we still have a chance to win...?   :-)

Thanks,

Michelle.

April 23, 2008 4:36 PM
 

david wei said:

Denis, it does make the difference. The len() does not count the trailing space, but they affect the like operator.

Try create the b3 table use trimmed varchar(5) as below:

(CREATE TABLE b3 (blat3 VARCHAR(5) NOT NULL)

INSERT b3

SELECT RTRIM(LEFT(AddressLine1, 5)) AS blat1

FROM AdventureWorks.Person.Address

SELECT count(*) FROM b3 inner JOIN b2 ON    b2.blat2 LIKE b3.blat3 + '%'

--Trimmed VARCHAR(5) returns 111181 rows

SELECT count(*) FROM b1 inner JOIN b2 ON    b2.blat2 LIKE b1.blat1 + '%'

-- original CAHR(5) returns 109984 rows

April 23, 2008 4:44 PM
 

Denis Gobo said:

David,

I see I was talking about LEFT not LIKE, with LEFT it doesn't matter if it is char or varchar

April 23, 2008 4:51 PM
 

Fabiano Neves Amorim said:

Umm, Denis, I think with the answer you "spilled your beans" about your secret :-(

April 25, 2008 5:51 AM
 

SQL Server på svenska said:

April 30, 2008 5:48 PM
 

Gordon said:

It's all about iteration.

May 2, 2008 1:23 AM
 

Igor Mikhalyev said:

So where are three best queries?  Who are the winners?

May 2, 2008 12:35 PM
 

Adam Machanic said:

Igor: Patience, please.  I got a -lot- of responses... Trying to figure out who won now.

May 5, 2008 12:14 PM
 

too embarrassed to say said:

Criminy. I didn't even NOTICE the char vs. varchar part. Now my answer could be wrong. Blah.

May 9, 2008 5:23 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About Adam Machanic

Adam Machanic is the SQL Server Practice Lead for The Pythian Group, a leading provider of wholly and partially outsourced remote administration of the enterprise database tier, for over 100 customers around the world. He has been involved in dozens of SQL Server implementations for both high-availability OLTP and large-scale data warehouse applications, and has optimized data access layer performance for several data-intensive applications. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "Expert SQL Server 2005 Development" (Apress, 2007) and "Inside SQL Server 2005: Query Tuning and Optimization" (Microsoft Press, 2007). Adam regularly speaks at user groups, community events, and conferences on a variety of SQL Server and .NET-related topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server and a Microsoft Certified IT Professional (MCITP).
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement