THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Denis Gobo

So do you think deciding between a clustered or non clustered index is difficult?

You are looking at the data in your table, most of the queries are ranges queries but every now and then a query gets executed for exactly one row. How would you index this? This is a very tough decision, right? You have 2 types of indexes to choose from, and if the index is a composite (multi column) index then you also have to worry about which column to place first so that you get a lean mean index seek.

 

With Sybase IQ you have these indexes to choose from

 

The Default column index

The Low_Fast (LF) index type

The High_Group (HG) index type

The High_Non_Group (HNG) index type

The Compare (CMP) index type

The Containment (WD) index type

The Date (DATE) index type

The Time (TIME) index type

The Datetime (DTTM) index type

The JOIN Index (Linear joins and Star joins)

 

Yes, a lot more than two, Sybase IQ does not have clustered or non clustered indexes. You can specify clustered index but it will be ignored.

 

What do you think? Do we need more indexes in SQL Server? What about BitMap indexes?

Published Thursday, April 24, 2008 1:40 PM by Denis Gobo
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Greg Linwood said:

What we really need in SQL Server is table hash clusters, which replace the need for indexing keys. These are what makes Oracle so super fast on TPC-C.

April 24, 2008 1:20 PM
 

jchang said:

if the data is in memory, then the actual key lookup cost (not the plan cost) is about 20% higher to a Clustered Index than to a heap organized table.

So for a clustered index to make sense, it has to make about 20% of key lookups go away

This is guaranteed on any table with 5 or fewer indexes, assuming you pick the clustered index correctly.

On table with more than 6 or more indexes, the question is 1 index responsible for more than 20% of key lookups

if data is on disk, the clustered index is beneficial for reasons I will explain in a separate post.

Each type of index in Sybase has a specific mission. The large majority of people will not understand or correctly implement special index types, not mention ordinary indexes

Additional index types would could be helpful if accompanied by the right tool, something better than DTA

April 24, 2008 1:21 PM
 

Alexander Kuznetsov said:

jchang,

How did you come up with the following estimate:

"the actual key lookup cost (not the plan cost) is about 20% higher to a Clustered Index than to a heap organized table"

April 24, 2008 1:41 PM
 

jchang said:

Greg must mean TPC-H, TPC-C should not use hash joins. I do my own measurements on what the SQL Server CBO formulas should be, for in-memory, disk bound, by processor architecture, by platform architecture and all that. Its not that I do not trust MS, its just that I trust no one. Besides, my calibration techniques and analysis is better than anything MS has. There are many very serious issues in SQL 2000, and some serious issuess in 2005.

I published the 2000 stuff below, with updates at CMG and HP World. I never published the SQL 2005 model because people did not seem interested in knowledge, only what magic knobs would make their problems go away, without touching their crappy code

http://www.sql-server-performance.com/articles/per/quantative_analysis1_p1.aspx

http://www.sql-server-performance.com/articles/jchang.aspx

April 24, 2008 4:58 PM
 

Jason Massie said:

SQL creates on the fly bitmap temp indexes when it thinks it should. Do any other RDMS have filtered indexes?

April 24, 2008 5:06 PM
 

Alexander Kuznetsov said:

jchang,

I guess Greg Linwood meant Oracle's Multi-table Index Cluster Tables. Google up "Multi-table Index Cluster Tables". Your 20% estimate sounds counterintuitive to me, but cannot investigate right now.

April 24, 2008 5:50 PM
 

Kalen Delaney said:

I had been thinking of this post all day, and then noticed that Denis wrote a post with almost the same

April 24, 2008 5:51 PM
 

Greg Linwood said:

Joe, I think Oracle actually use hash clusters in TPC-C. Checking an audit report would confirm but I'm pretty sure they do.

April 24, 2008 6:03 PM
 

Kalen Delaney said:

I had been thinking of this post all day, and then noticed that Denis wrote a post with almost the same

April 24, 2008 6:13 PM
 

jchang said:

the first item below is part of an Oracle script for the hist table in TPC-C, the second is part of the script for the lineitem table in TPC-H. I do not see the value of using a hash in the index key for TPC-C. The queries should affect a small number of rows, 10-15 except for the stock level, for which the plan should be a loop join, which does not benefit. TPC-H has many hash joins, fow which having the hash already computed, ie, built, has a big benefit.

The Oracle table for Stock does include hashkey values, which seems to be for cardinality

create table hist (

h_c_id number

, h_c_d_id number

, h_c_w_id number

, h_d_id number

, h_w_id number

, h_date date

, h_amount number

, h_data varchar2(24)

)

partition by range(h_w_id) (

partition hist_0 values less than (5460) pctfree 5 initrans 4 storage (buffer_pool recycle)

tablespace hist_0,

partition hist_1 values less than (10920) pctfree 5 initrans 4 storage (buffer_pool recycle)

tablespace hist_0,

partition hist_2 values less

drop table lineitem;

create table lineitem(

l_shipdate ,

l_orderkey NOT NULL,

l_discount NOT NULL,

l_extendedprice NOT NULL,

l_suppkey NOT NULL,

l_quantity NOT NULL,

l_returnflag ,

l_partkey NOT NULL,

l_linestatus ,

l_tax NOT NULL,

l_commitdate ,

l_receiptdate ,

l_shipmode ,

l_linenumber NOT NULL,

l_shipinstruct ,

l_comment

)

pctfree 1

pctused 99

initrans 10

storage (freelist groups 4 freelists 84)

parallel

nologging

partition by range (l_shipdate)

subpartition by hash(l_partkey)

subpartitions 128

April 24, 2008 8:05 PM
 

jchang said:

why does it seem counter intuitive for the cost of a key look to a clustered index to be more expensive than a heap? the non clustered index on a heap table includes the file, page and row pointers, so its just go to this location. The non clustered index key on a table with a clustered index includes the cluster key, so it must navigate the clustered index. Even for a small table, ie, few rows, 1 leaf page, it must first go to the root level, then the leaf. For a large table, there are 1 or more intermediate levels

April 24, 2008 8:09 PM
 

Greg Linwood said:

Joe, I totally agree with you re CIX key lookup cost. I think MS & the SQL Server community really has this issue the wrong way around. The performance impacts also aren't just limited to IO - locking is also a major factor.

Re the Oracle hash clusters - this script demos table partitions but not hash clusters. Look for CREATE CLUSTER... then CREATE TABLE() CLUSTER.. for hash clusters. I have definitely seen these in TPC-C scripts in the past but haven't checked for a year or so..

April 25, 2008 7:53 AM
 

Alexander Kuznetsov said:

The moment I read the following "the actual key lookup cost (not the plan cost) is about 20% higher to a Clustered Index than to a heap organized table", I suspected it was not completely correct to put it mildly. Clearly the relative cost of a bookmark lookup depends on the size of bookmarks as well of on the depth of the clustered index. I ran some benchmarks and did see noticeable differences when I changed bookmark width or table size or both.

It is neither 20% nor about 20% - it does depend, it may be less than 10% in some, it may be more in other ones. It usually depends in query optimization - that's what makes it so interesting.

April 25, 2008 2:59 PM

Leave a Comment

(required) 
(required) 
Submit

About Denis Gobo

I was born in Croatia in 1970, when I was one I moved to Amsterdam (and yes Ajax is THE team in Holland) and finally in 1993 I came to the US. I have lived in New York City for a bunch of years and currently live in Princeton, New Jersey with my wife and 3 kids. I work for Dow Jones as a Database architect in the indexes department, one drawback: since our data goes back all the way to May 1896 I cannot use smalldates ;-( I have been working with SQL server since version 6.5 and compared to all the other bloggers here I am a n00b. Some of you might know me from http://sqlservercode.blogspot.com/ or even from some of the newsgroups where I go by the name Denis the SQL Menace If you are a Tek-Tips user then you might know me by the name SQLDenis, I am one of the guys answering SQL Questions in the SQL Programming forum.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement