THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Denis Gobo

Interview With Craig Freedman About Indexing, Query Plans And Performance

I asked for some names of people who you would like to see interviewed and Craig Freedman's name popped up a couple of times. I contacted Craig and he was kind enough to take time out from his busy schedule to answer these questions.  So, here are the questions.

What are the most important things a person can do to master SQL Server?

Use the product!  I recall once hearing an expert explain that it takes about ten years of practice to master a skill – and this refers to any skill whether it is mastering a software product, a sport, a musical instrument, etc.  Mentors, books, and classes certainly help whether you are just getting started or expanding your existing skill set, but the reality is there is no substitute for hands on experience.

What are the most important things a person can do to master Transact-SQL?

See my answer to the last question!

One thing that is worth pointing out is how different SQL (not the “Transact” part – just the query language itself) is from most other programming languages.  Most languages like C, C++, C#, VB, and JavaScript are procedural.  This is how programmers are taught to think.  First do A, then do B, then do C, etc.  Moreover, because most procedural languages share the same common elements, programming skills are relatively transferable from one language to another.  A good programmer can learn multiple languages fairly quickly.

SQL is much different.  SQL is a declarative language.  You tell the database what you want without specifying how to get it.  It’s a different paradigm and years of experience with procedural languages will do little to help you master SQL.  To become a SQL expert, you need to train yourself to think differently.

What SQL Server books are on your bookshelf?

Surprisingly (or perhaps not surprisingly depending on your point of view), I have only Kalen’s “Inside SQL Server 2005” series.  Much of what I need to know about SQL Server I learn either by checking Books Online, asking other developers (okay, I’m lucky that I have this option!), reviewing the source code (again, lucky me!), and searching the web.

What new technologies introduced in SQL Server 2005 do you think are the most beneficial for performance?

This is an interesting question to ask a developer.  When asked this question, most developers will tend to migrate toward the features or technologies that they work on and understand.  However, like most developers, I rarely use the product (other than for testing purposes) and I’m not a consultant.  So I actually have few strong opinions about what really works or doesn’t work from a customer perspective.  Instead, I prefer to hear from customers what they really like and dislike and what they would like to see in the product.

That said, I do think that the 64-bit and 64-way support that went mainstream with SQL Server 2005 is critical for large enterprise customers who need the most scalable and largest servers.  The DMVs that were added to SQL Server 2005 also have made a huge difference in identifying and addressing a host of performance issues.  Of course there were also a whole lot of smaller features that, depending on your application or workload, could make a huge difference for you but I couldn’t possibly name them all here.

What new technologies in SQL Server 2008 do you think are the most beneficial for performance?
Name three things that are new in SQL Server 2008 that you find are the most valuable?


I’m going to take these two questions together.  As with SQL Server 2008, there are many new features that depending on your application could have a major impact: compression, plan freezing (which should make service packs and upgrades far less painful going forward), the new MERGE statement, star join performance, parallel scans of partitioned tables, and the list goes on.

What is the biggest mistake people make when looking at a query plan?

I’m not sure about the biggest mistake.  What I would do is to encourage everyone to learn as much as possible about how to read query plans, to understand the different types of operators and how they work, and to be careful not to jump to any conclusions before fully understanding a query, schema, and plan.  These skills are essential for evaluating whether a query plan is good or bad or whether a better plan exists.

There are many things that can go wrong with query plans.  For example, a bad plan could be due to cardinality and costing errors, missing indexes, bad join orders, a poor choice of join or aggregation types, too little memory, etc.  At the same time, there are few, if any, absolute rights and wrongs.  Sometimes scans are better while other times seeks are better.  There is no best join type and no join type is inherently bad.

Why did you decide to co-author Inside Microsoft SQL Server 2005: Query Tuning and Optimization?

After I started blogging, Kalen approached me to see whether I was interested in contributing any of my material to her book.  I’d never authored a book before and was really excited about the opportunity to try something new.  So, I jumped at the opportunity.  It was a lot of work but a great experience.  Thanks, Kalen!

Can you name some best practices in terms of creating indexes?

Most importantly, before you start trying to tune your indexes, I think you need to know your workload, establish performance requirements and service level agreements, and understand your query plans.  For example, is your workload mostly read only (in which case the extra indexes may be mostly free aside from the disk space requirements) or a mix of reads and writes (so that the extra indexes need to be maintained)?  Is it more important for your reporting queries to run quickly or for your data loads to complete in the minimum time?  Is this an OLTP system or a data warehouse?

Once you know what you want to accomplish, you can focus on where you can get the biggest gains for the minimum cost.  Generally, it’s best to have a clustered index (rather than a heap).  If you create a heap and one non-clustered index, you have two objects to maintain.  If you create a single clustered index, you have only one object to maintain.  Better yet, clustered indexes tend to perform better than heaps for scans.

Look for large scans with highly selective filters.  Consider whether an index might enable a much more efficient seek.  Look for sorts that might be replaced by an ordered scan.  Look for bookmark lookups over many rows that might be eliminated by adding included columns.  Consider whether creating a different clustered index might eliminate more bookmark lookups.  If you have foreign key constraints, consider whether you need indexes on the foreign keys to support the constraint validation efficiently.  While experimenting, beware that creating new indexes may change plans in unexpected ways.  In SQL Server 2005, the missing indexes DMV and the Database Tuning Advisor (DTA) can help identify possible candidate indexes, but beware as they tend to be rather aggressive with their suggestions.

Finally, try not to overdo it.  Factor the cost of maintaining indexes into your decisions.  Make sure that all of your indexes and all of the keys and included columns are really used.  Get rid of any extraneous indexes.

Since we now have partitioning functions is there still a need for partitioned views?

In almost all cases, partitioned tables are more efficient, easier to manage and use, and overall the best choice.  However, there are still two cases where partitioned views can be useful.  The first case is for distributed partitioned views.  Partitioned tables cannot span multiple SQL Server instances while partitioned views can.  The second case is for tables where the partitions are heavily skewed.  With partitioned tables, the query plan shares the same operator(s) for each partition.  This design results in efficient compilation times and compact plans.  With partitioned views, the optimizer is free to optimize each partition independently and, if the partitions differ in terms of cardinality or cost, to choose different operators for each partition.  This design leads to longer compilation times and larger plans but can yield more efficient plans if the partitions are not uniform.

Are you working on any upcoming books?

No.  After completing my chapters for “Inside SQL Server 2005: Query Tuning and Optimization,” I realized just how much work it is to write a book.  As I said, it was a great experience and I’m grateful for the opportunity, but I’ve decided to prioritize and put my family first!

When should you use a clustered index and when should you use a non clustered index?

As I mentioned earlier, it’s almost always best to have a clustered index rather than a heap.  For the clustered index, try to choose a key that does not get updated.  Updating the clustered index key tends to lead to more fragmentation and to expensive update plans that touch every index.  Since the clustered index covers all columns in the table, you’ll also probably want to look at how you can choose the key to minimize bookmark lookups or sorts.  For example, if you have a set of common queries that all seek on a particular key and then touch a large number of columns and/or rows in the table, it may be best to choose that key for the clustered index.

If you have a query that seeks on a different key, runs less frequently, or only touches a small number of columns and/or rows, that might be a better candidate for a non-clustered index.  Remember that you can include extra columns in a non-clustered index and avoid a bookmark lookup.  But, don’t forget that a bookmark lookup is not necessarily a problem over a small set of rows.

How much does fragmentation affect query performance?

It depends.  For rotating media, random I/Os are much slower than sequential I/Os.  The problem with fragmentation is that it turns what would be sequential I/Os into random I/Os.  Fragmentation won’t affect performance if your workload already includes mostly random I/Os (for example, lots of small index seeks), if most of your data fits in memory so there is little or no I/O at all, or if your data is stored on SSDs where random and sequential I/Os perform comparably.  However, fragmentation could hurt performance if your workload includes many sequential table or index scans.

Unfortunately, the problem doesn’t stop there.  SQL Server reports logical fragmentation.  In most real world scenarios, the details of the physical storage subsystem are abstracted and hidden from SQL Server by all of the layers of hardware and software including the Windows file system and device drivers, SANs, and even drive firmware.  A file that is logically sequential may be stored in a physically fragmented manner while a table that is fragmented may be cached in SAN memory.  In the end, defragmenting an index may or may not be necessary and may or may not help.

What tools are available for developers to use when doing query performance tuning

The only tools I’m familiar with are those that ship with SQL Server: Management Studio, Profiler, DMVs, DTA, and the SQL Server 2005 Performance Dashboard Reports.  You can do quite a lot with these tools.  You can view query plans and try out different alternatives.  You can identify expensive queries and possible index alternatives.

Published Friday, May 23, 2008 8:39 PM by Denis Gobo

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

wharrislv said:

Great questions, a really nice read!

May 23, 2008 8:47 PM
 

Jason said:

May 24, 2008 1:01 AM
 

Kalen Delaney said:

You're welcome, Craig and Thank YOU!!

~Kalen

May 25, 2008 5:06 PM

Leave a Comment

(required) 
(required) 
Submit

About Denis Gobo

I was born in Croatia in 1970, when I was one I moved to Amsterdam (and yes Ajax is THE team in Holland) and finally in 1993 I came to the US. I have lived in New York City for a bunch of years and currently live in Princeton, New Jersey with my wife and 3 kids. I work for Dow Jones as a Database architect in the indexes department, one drawback: since our data goes back all the way to May 1896 I cannot use smalldates ;-( I have been working with SQL server since version 6.5 and compared to all the other bloggers here I am a n00b. Some of you might know me from http://sqlservercode.blogspot.com/ or even from some of the newsgroups where I go by the name Denis the SQL Menace If you are a Tek-Tips user then you might know me by the name SQLDenis, I am one of the guys answering SQL Questions in the SQL Programming forum.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement