
Linchi Shea

Checking out SQL Server via empirical data points

T-SQL Exercise: Simplest data-loading script to produce worst query performance

Here is a T-SQL scripting exercise in case you have a few minutes to spare or are bored with whatever else you are doing. 

 

Objective

The task is to write a simple T-SQL script to generate and load 4,000,000 rows into a test table. The objective is to make the following simple test query have the worst performance in terms of elapsed time:

 

DBCC DROPCLEANBUFFERS

go

SELECT COUNT(*) FROM dbo.test;

 

The longer the elapsed time, the worse it is.
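
If you want to capture the elapsed time precisely, a minimal timing sketch (assuming the test table is dbo.test, as in the query above) is to wrap the test query with SET STATISTICS TIME:

-- Timing sketch; assumes the test table is dbo.test
DBCC DROPCLEANBUFFERS;          -- start from a cold buffer cache
GO
SET STATISTICS TIME ON;         -- report CPU time and elapsed time
SELECT COUNT(*) FROM dbo.test;
SET STATISTICS TIME OFF;
GO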

 

Constraints

Of course, there must be a number of constraints, and they are as follows:

 

- The test table can have any number of columns of any fixed-length data types. However, the first column must be an integer column with values from 1 through 4,000,000, inclusive.

- The sum of all the column widths must be 1,000 bytes plus or minus 20 (i.e. between 980 and 1,020 bytes).

- No NULLs are allowed in any column.

- No undocumented features are allowed in the script.

- When the test query is run, the test table must meet the following conditions (a validation sketch follows this list):

  - The test table has 4,000,000 rows.

  - The total size of data and index is less than 7.5GB, as measured by sp_spaceused.

  - Avg. page density is greater than 50%, as reported by DBCC SHOWCONTIG.

- The test query shown above must be run as is, without any change.
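
For reference, here is a minimal validation sketch (assuming the test table is dbo.test, as in the test query) that checks the conditions above:

-- Validation sketch; assumes the test table is dbo.test
SELECT COUNT(*) FROM dbo.test;                 -- must return 4,000,000
EXEC sp_spaceused 'dbo.test';                  -- reserved size must be under 7.5GB
DBCC SHOWCONTIG ('test') WITH TABLERESULTS;    -- Avg. Page Density (full) must be > 50%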

 

Why?

Well, it'll reveal a lot about how data is stored and how the simple test query is processed.

 

If you would like to share your solution, and I hope you do, please post it here in the comments. Or, if you want it better formatted, I can append it to the main text of this post.

Published Friday, May 22, 2009 4:21 PM by Linchi Shea


Comments

 

Denis Gobo said:

here is my attempt, this was run on SQL 2008  :-)

create table Sometest (
    id int identity,
    Somecol2 uniqueidentifier primary key clustered, -- random GUIDs as the clustering key fragment the table
    Somecol3 char(100) default '1',
    Somecol4 char(100) default '1',
    Somecol5 char(100) default '1',
    Somecol6 char(100) default '1',
    Somecol7 char(100) default '1',
    Somecol8 char(100) default '1',
    Somecol9 char(100) default '1',
    Somecol10 char(100) default '1',
    Somecol11 char(100) default '1',
    Somecol12 char(60) default '1')

--loop in batches to fragment the hell out of the table
declare @i int
set @i = 1
while @i <= 40
begin
    insert Sometest (Somecol2)
    select top 100000 newid() from master..spt_values s1
    cross join master..spt_values s2
    set @i = @i + 1
end

--Run these to validate
DBCC SHOWCONTIG('Sometest')
DBCC SHOWCONTIG('Sometest') with tableresults --check average rowsize (1002)
exec sp_spaceused 'Sometest'
select count(*) from Sometest

May 22, 2009 4:21 PM
 

Linchi Shea said:

Denis;

The performance of the data loading script itself is pretty atrocious. But the test query finished in about 60% of the time it takes to finish with a benchmark data-load I use.

May 22, 2009 5:26 PM
 

Adam Machanic said:

I think I could do a bit better, but here's a quick shot:

---

create table test (x int not null, y char(896) not null default (''), z char(120) not null default(''))

insert test (x)
select r
from
(
    select row_number() over (order by (select 1)) r
    from master..spt_values a, master..spt_values b
) p
where r <= 4000000

-- fillfactor 51 leaves each page just over half full, so the table spreads across
-- nearly twice as many pages while still meeting the >50% page density constraint
create clustered index ix_x on test (x, y) with fillfactor=51

---

May 22, 2009 5:32 PM
 

TiborKaraszi said:

I'm putting my hopes in forwarding pointers and fragmentation. I hope your system doesn't have an SSD :-). Using Adam's script as a base. The DDL and population took some 35 minutes on my desktop machine:

CREATE TABLE test (x int NOT NULL, y char(10) NOT NULL DEFAULT (''), z char(10) NOT NULL DEFAULT(''))

INSERT test (x)
SELECT r
FROM
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) r
    FROM master..spt_values a, master..spt_values b
) AS p
WHERE r <= 4000000
GO
-- Widen the columns after the load: each row grows, no longer fits on its page,
-- and is moved, leaving a forwarding pointer behind on the heap
ALTER TABLE test ALTER COLUMN y char(20)
GO
ALTER TABLE test ALTER COLUMN y char(896)
GO
ALTER TABLE test ALTER COLUMN z char(120)
GO
ALTER TABLE test ALTER COLUMN z char(500)
GO
ALTER TABLE test ALTER COLUMN z char(120)

May 23, 2009 10:03 AM
 

Adam Machanic said:

Try again, Tibor.  Yours is over 100x faster than mine on my test system :-)

May 23, 2009 11:14 AM
 

Adam Machanic said:

And Denis's is 50% faster than Tibor's on my end.

May 23, 2009 11:27 AM
 

TiborKaraszi said:

That is weird, Adam. Yours ran in some 2:xx minutes on my machine and mine took some 15:yy minutes. You didn't create any index, I hope (i.e., the table is a heap)? Also, you included those ALTER TABLE statements? The point with mine was to have so many forwarding pointers that following each one results in a random I/O. How many forwarded records do you have?

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('test'), NULL, NULL, 'DETAILED') -- check forwarded_record_count

I have 3,951,020 forwarded records.

May 23, 2009 12:38 PM
 

Adam Machanic said:

Interesting.  On my end yours runs in 2 seconds; mine takes almost 3 minutes.  I tested twice...

May 23, 2009 12:49 PM
 

TiborKaraszi said:

That is weird. Do you still have my table? If so, can you check number of pages and number of forwarded records? And did you really empty the cache? The table should be some 6.2GB and that should not be handled in 2 seconds (physical reads), even without jumping back-and-forth.

May 23, 2009 12:58 PM
 

Adam Machanic said:

The box I'm testing on is hooked up to a SAN device with a good amount of cache, set to 50% read, so that's probably influencing things somewhat... I'm re-testing now.

May 23, 2009 1:04 PM
 

Adam Machanic said:

Same results -- 2 seconds.  And yes, I did DBCC DROPCLEANBUFFERS first.  Forwarded record count: 3951020

May 23, 2009 1:05 PM
 

TiborKaraszi said:

OK, that would probably be the SAN cache making the difference here. I wonder why yours would take as much as 3 minutes? Could it be that my slightly smaller dataset fit into the SAN cache, while yours just happened to not fit? You had (output from sp_spaceused, reserved) 7,314,536 KB and I had 6,479,704 KB. My query should be executed as an IAM scan that also follows forwarded records, while yours should be done using the clustered index linked list (most probably).

May 23, 2009 1:36 PM
 

Adam Machanic said:

Aha!  STATISTICS IO output gave me the answer:

--Mine

Table 'test'. Scan count 9, logical reads 937564, physical reads 18272, read-ahead reads 911821

--Yours

Table 'test'. Scan count 9, logical reads 4760879, physical reads 0, read-ahead reads 2

... OK, now the timings make more sense.  After doing a CHECKPOINT and yet another DBCC DROPCLEANBUFFERS, we get:

--Yours, try #2

Table 'test'. Scan count 9, logical reads 4760879, physical reads 99043, read-ahead reads 18616

Note that this one only took 23 seconds (which makes sense if you notice that mine had to do a lot more physical reads) -- but this still doesn't make sense, because your table has 809859 pages.

So now the question is, what is really happening here?  Why did mine not require a checkpoint to clear the buffers, while yours did?  And why do the buffers seem to not all be getting cleared with yours?  And how should Linchi test to determine the actual "winner" of this challenge?  Seems there are a large number of factors at play.
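
For what it's worth, a minimal cold-cache harness along the lines described above (assuming the test table is dbo.test) would be:

-- Cold-cache run sketch; assumes the test table is dbo.test
CHECKPOINT                      -- flush dirty pages so they become clean buffers
GO
DBCC DROPCLEANBUFFERS           -- only clean buffers are dropped, hence the CHECKPOINT first
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT COUNT(*) FROM dbo.test;
GO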

May 23, 2009 1:49 PM
 

Adam Machanic said:

Actually disregard that part about the "more physical reads"--not sure what I was thinking.  I believe the SAN has 8 GB of read cache, so I'll try another test where I'll build 16 GB of other data to totally flush things before running each test.  Will post back later or tomorrow.

May 23, 2009 1:55 PM
 

TiborKaraszi said:

OK, I've done some new tests. First, I've adapted the script so I now have roughly the same number of pages as your script generated (my script 7,244,056 KB, yours 7,314,536 KB). In order to reduce page density (I ended up with 54%), I had to fill the pages more and then remove rows. I ended up with a tad fewer than 4,000,000 rows, but that is not the point. So basically we now have a test between following the linked list of an index with potentially a lot of read-ahead, and jumping back and forth following forwarding pointers (I now have 3,647,030 forwarded records) - with roughly the same number of pages.

Running the query (first emptying the cache) on your data set took about 2:45, and on my data set (again first emptying the cache) it took 6:25 (hmm, earlier it took 15 minutes; I wonder where the inconsistency comes from?).

I think the result here is highly dependent on the system in general, for instance the disk cache. Anyhow, here's my current DDL:

CREATE TABLE test (
    x int NOT NULL
    ,x2 int NOT NULL
    ,y char(10) NOT NULL DEFAULT ('')
    ,z char(10) NOT NULL DEFAULT('')
)

DECLARE @rows int = 6666666, @toKeep int = 4000000

INSERT test (x, x2)
SELECT TOP(@rows)
    ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS r
    ,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) % 10 AS s
FROM master..spt_values a CROSS JOIN master..spt_values b

-- Widen the columns to expand the rows (creating forwarded records on the heap),
-- then delete a chunk of rows to bring page density down toward 50%
ALTER TABLE test ALTER COLUMN y char(892)
ALTER TABLE test ALTER COLUMN z char(100)
CHECKPOINT
DELETE TOP(@rows - @toKeep) FROM test WHERE x2 IN(2, 4, 6, 8)

May 24, 2009 12:38 PM
 

Linchi Shea said:

This is a quick update on the T-SQL exercise I posted a few days ago. The goal was to write a simple

May 26, 2009 8:54 AM
 

Istvan Safar said:

use master
if DB_ID('fregmenteddatabase') is not null
begin
    drop database fregmenteddatabase;
end
go

create database fregmenteddatabase;
go

use fregmenteddatabase;
go

if exists (select * from sysobjects where name = 'test')
    drop table test;
go

create table test(
    c1 int identity(1,1),
    c2 int,
    c3 char(200) default ' ',
    c4 char(700) default ' ',
    c5 char(96) default 'x');
go

create clustered index ix_test on test(c3,c4);
go

set nocount on;
go

-- insert one row at a time, issuing a checkpoint after every row
declare @i int, @max int;
set @max = 4000000;
begin tran
set @i = 1
while @i <= @max
begin
    insert into test WITH (TABLOCK) (c2,c3,c4) select 1, NEWID(), NEWID()
    checkpoint
    set @i = @i + 1
end
commit tran
go

May 28, 2009 8:40 AM
 

Linchi Shea said:

This is another follow-up on the T-SQL exercise. So the test query below is rather simple: DBCC DROPCLEANBUFFERS

May 29, 2009 12:24 PM
 

