THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Tibor Karaszi

Is statistics over non-indexed columns updated by index rebuild?

Short answer: no

This question came up today in the MCT discussion group. My gut instinct said no, but I wanted to test it to be certain. But first a brief background:

You can rebuild an index using DBCC DBREINDEX or in 2005 (and now preferred) ALTER INDEX ... REBUILD. Rebuilding an index internally creates a new index and when that has been done, drops the old index.

So it is pretty obvious that we also get new statistics for that index (based on all data, not sampled, just as when we do CREATE INDEX). As an aside, reorganizing does *not* update the statistics...

But what about statistics over non-indexed columns? SQL Server can create this by itself, assuming you didn't turn off this database option. These are named something like _WA_sys. And you can also create these explicitly usinf CREATE STATISTICS.

A few words about below script: I wanted to use the STATS_DATE function to retrieve datetime for when the statistics was built/updated. But STATS_DATE doesn't seem to work on statistics only; it expect an id for an index... So, this is why I use DBCC SHOW_STATISTICS instead. And, unfortunately, DBCC SHOW_STATISTICS only display the statistics build time with minute precision. This is why I have a WAITFOR with > 1 minute in between the operations.

I got the same resuld whether or not I rebuild a clustered or non-clustered index on the table or even when specifying ALL indexes. Script:

USE tempdb
SET NOCOUNT ON
IF OBJECT_ID('t') IS NOT NULL DROP TABLE t
CREATE TABLE t(c1 int identity, c2 char(5))
INSERT INTO t (c2)
 SELECT TOP 10000 'Hello' FROM syscolumns a, syscolumns b

CREATE CLUSTERED INDEX x1 ON t(c2)
CREATE STATISTICS s1 ON t(c1)

SELECT ' ' AS "Before mass modification"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS

WAITFOR DELAY '00:01:02'

INSERT INTO t (c2)
 SELECT TOP 10000 'Hi' FROM syscolumns a, syscolumns b
SELECT ' ' AS "Before index rebuild"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS

WAITFOR DELAY '00:01:02'

--ALTER INDEX x1 ON t REBUILD
ALTER INDEX ALL ON t REBUILD
SELECT ' ' AS "After index rebuild"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS

 

 

Published Thursday, August 09, 2007 6:18 PM by TiborKaraszi
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

James Luetkehoelter said:

Nice to point out - I find it amazing how often the two (indexes and statistics) are misunderstood. Also, when indexes or statistics are not updated - doing mass data loads, etc.

Well done, a simple question with a clear answer, as well as a more technical discussion of why. Great post.

August 10, 2007 12:06 PM
 

Alejandro Mesa said:

Hi Tibor,

Good question, short answer, great explanation and test script.

Microsoft should be more precise describing STATS_DATE function. The [index_id] could be also the [stats_id]. As you know, previous to 2005, they use to be in [sysindexes], so I guess that is why they continue mentioning just [index_id].

select [name], stats_date([object_id], index_id) as [stats_date]

from

(

select [name], [object_id], index_id from sys.indexes where [name] = 'x1' and [object_id] = object_id('t')

union all

select [name], [object_id], stats_id from sys.stats where [name] = 's1' and [object_id] = object_id('t')

) as t

Cheers,

AMB

August 22, 2007 3:15 PM
 

TiborKaraszi said:

Thanks for the kind words, James and Alejandro. :-)

Alejandro:

I did consider what you say above before I posted the blog. But, MS seems to have goofed on this. The problem is that the STATS_DATE function only takes two parameters: table_id and index_id. And, you can find a row in sys.stats with the same "index id" as a row in sys.indexes. So, how to tell these apart? Basically, STATS_DATE() needs a 3:rd parameter like "index" or "statistics". Try below (you might need to try a few tables before you find "duplicates"):

USE Adventureworks

DECLARE @o_id int

SET @o_id = OBJECT_ID('Production.Product')

SELECT 'Index' AS "type", [name] AS "Index or stats name", index_id AS "Index or stats id" FROM sys.indexes WHERE [object_id] = @o_id

UNION ALL

SELECT 'Stats' AS "type", [name] AS "Index or stats name", stats_id AS "Index or stats id" FROM sys.stats WHERE [object_id] = @o_id

ORDER BY "Index or stats id"

On my machine, for id 1 - 4, we have both statistics and indexes. And for id 5 - 10 only statistics. An interesting thing is whether STATS_DATE accepts an id for which we *dont* have an index?

SELECT STATS_DATE(OBJECT_ID('Production.Product'), 7)

Hmm, yes it does and probably reports the datetime for that statistics. I don't like this, at all. Makes me wonder what STATS_DATE reports if we have both an index and statistics with the same id? I would guess the index, but even if a test would show that, we can't say for sure. Time for a connect entry...

August 23, 2007 3:13 AM
 

TiborKaraszi said:

August 23, 2007 3:25 AM
 

Alejandro Mesa said:

Hi Tibor,

I noticed that the rows matching between sys.indexes ans sys.stats, for the same [object_id], are really related, in other words, the statistics of the indexes. Let us tweak your last query a little bit:

USE Adventureworks

DECLARE @o_id int

SET @o_id = OBJECT_ID('Production.Product')

select

*

from

(

SELECT [object_id], 'Index' AS "type", [name] AS "Index or stats name", index_id AS "Index or stats id"

FROM sys.indexes

WHERE [object_id] = @o_id

) as si

inner join

(

SELECT [object_id], 'Stats' AS "type", [name] AS "Index or stats name", stats_id AS "Index or stats id"

FROM sys.stats

WHERE [object_id] = @o_id

) as ss

on si.[object_id] = ss.[object_id]

and si.[Index or stats id] = ss.[Index or stats id]

go

If we can find matching rows, where the statistic is not related to the index, then we have a case.

What do you think?

Regards,

AMB

August 23, 2007 7:54 AM
 

TiborKaraszi said:

Mon dieu! But of course... you are correct!

(Sorry, I've been watching too much Poirot lately ;-) ).

A certain value for an index/stats id refers to the *same* index/statistics, not different. I didn't even pay attention to the index/stats name from my earlier query...

I guess I was stuck in 2000-mode somehow, and figured a UNION ALL would basically give us old sysindexes. Not the case.

So, no problem with the STATS_DATE function, then. It is just a matter of cleaning up the BOL description. I'll update my connect entry and post a new BOL feedback.

Thanks Alejandro for sticking with me and sorting it out... :-)

So, no problem with the STATS_DATE function, then. It is just a matter of cleaning up the BOL description. I'll update my connect entry.

August 23, 2007 10:48 AM
 

Alejandro Mesa said:

As you mentioned, It just a matter of cleaning up BOL description. BOL should refer to [stats_id], instead [index_id], for STATS_DATE function. Personally, I think that the name of the function implies it.

Cheers,

AMB

August 23, 2007 12:14 PM
 

Denis Gobo said:

A year in review, The 21 + 1 best blog posts on SQLBlog Best posts according to me, it might have been

December 27, 2007 4:11 PM
 

Kal said:

Hi Tibor,

This behaviour seems to be different when you use DBCC DBREINDEX instead of ALTER INDEX ... REBUILD. When I used DBCC DBREINDEX ('Sales.CustomerAddress') on AdventureWorks, STATS_DATE showed me most recent date for all statistics, including the auto-generated stats (_WA_Sys..) on this table.

Have you tested your theory using DBCC DBREINDEX?

February 4, 2008 6:15 PM
 

TiborKaraszi said:

Hi Kal,

Hmm, yes indeed. DBCC DBREINDEX does update statistics over the non-indexed column, something which ALTER INDEX ALL ... REBUILD do not do. Thanks for catching that, I would have assumed that these two behaves the same...

February 12, 2008 6:14 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement