THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

Without ORDER BY, there is no default sort order.

Sounds trivial? Right, but different flavors of this myth still persist. Yesterday I noticed a thread on stackoverflow discussing SELECT TOP without a where clause, where it was suggested that "The order will be defined based on the clustered key in that table."

http://stackoverflow.com/questions/882195/what-does-top-1-mean-in-an-sql-query/882690#882690

Because apparently many visitors agreed with this myth, I decided to post a repro script which demonstrates that this is simply not true. Here you go:

CREATE SCHEMA Data AUTHORIZATION dbo
GO
CREATE TABLE Data.Numbers(Number INT NOT NULL PRIMARY KEY)
GO
DECLARE @ID INT;
SET NOCOUNT ON;
SET @ID 1;
WHILE @ID 100000 BEGIN
 INSERT INTO 
Data.Numbers(Number)
 
SELECT @ID;
 
SET @ID @ID+1;
END

CREATE TABLE Data.WideTable(ID INT NOT NULL 
CONSTRAINT PK_WideTable PRIMARY KEY,
RandomInt INT NOT NULL,
CHARFiller CHAR(1000))
GO
CREATE VIEW dbo.WrappedRand
AS
SELECT 
RAND() AS random_value
GO
CREATE ALTER FUNCTION dbo.RandomInt()
RETURNS INT
AS
BEGIN
DECLARE 
@ret INT;
SET @ret (SELECT random_value*1000000 FROM dbo.WrappedRand);
RETURN @ret;
END
GO

INSERT INTO Data.WideTable(ID,RandomInt,CHARFiller)
SELECT Numberdbo.RandomInt(), 'asdf'
FROM Data.Numbers
GO
CREATE INDEX WideTable_RandomInt ON Data.WideTable(RandomInt)
GO
SELECT TOP 100 ID FROM Data.WideTable 

1407
253
9175
6568
4506
1623
581

As you have seen, the optimizer has chosen to use a non-clustered index to satisfy this SELECT TOP query.

Clearly you cannot assume that your results are ordered unless you explicitly use ORDER BY clause.

 

 This post continues my series on defensive database programming. My next post:
When you add an index and your query blows up

Published Wednesday, May 20, 2009 9:31 AM by Alexander Kuznetsov

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

SkullKiller said:

/*

Hi

Your final conclusion isn't correct.

-- 1st the table doens't have a clustered index, so it can't be ordered by it.

-- 2nd you didn't select the column randomint, so you didn't prove it's ordered by that column

-- 3rd the real order when you don't have a clustered index the statements (Select, Update, Delete) orders the rows by the order of insertion in the table. (If a clustered index existed previously on the table it will order by the clustered index for the records that existed before the deletion of the clustered index)

An odd behaviour is that if you delete a record after you delete another the "gap" in rows left by the deletion is refiled by the new row. (See in the script)

A conclusion: If you SUD a table without a order by condition it always order the rows by the order that they exist on the pages of the table and it uses "gap" fill in the pages (the space left by a deletion is filled by the next inserted row).

A Little Script that demonstrates it (It ha only one page, so the "gap" refill isn't tested here when a table has more than one page

*/

CREATE TABLE #testeIndexOrder (

   InsertionOrder      INTEGER,

   IndexOrder          INTEGER,

   ClusteredIndexOrder INTEGER

);

INSERT INTO #testeIndexOrder

(InsertionOrder, IndexOrder, ClusteredIndexOrder)

VALUES(1,2,3);

INSERT INTO #testeIndexOrder

(InsertionOrder, IndexOrder, ClusteredIndexOrder)

VALUES(2,3,2);

INSERT INTO #testeIndexOrder

(InsertionOrder, IndexOrder, ClusteredIndexOrder)

VALUES(3,1,1);

SELECT * FROM #testeIndexOrder

CREATE INDEX IX_testeIndexOrder ON #testeIndexOrder(IndexOrder)

SELECT * FROM #testeIndexOrder

CREATE CLUSTERED INDEX IX_testeClusteredIndexOrder ON #testeIndexOrder(ClusteredIndexOrder)

SELECT * FROM #testeIndexOrder

DROP INDEX IX_testeClusteredIndexOrder ON #testeIndexOrder

INSERT INTO #testeIndexOrder

(InsertionOrder, IndexOrder, ClusteredIndexOrder)

VALUES(4,0,0);

SELECT * FROM #testeIndexOrder

DELETE FROM #testeIndexOrder

WHERE InsertionOrder = 3

INSERT INTO #testeIndexOrder

(InsertionOrder, IndexOrder, ClusteredIndexOrder)

VALUES(3,1,1);

SELECT * FROM #testeIndexOrder

DELETE FROM #testeIndexOrder

WHERE InsertionOrder = 3

INSERT INTO #testeIndexOrder

(InsertionOrder, IndexOrder, ClusteredIndexOrder)

VALUES(5,1,1);

SELECT * FROM #testeIndexOrder

INSERT INTO #testeIndexOrder

(InsertionOrder, IndexOrder, ClusteredIndexOrder)

VALUES(3,1,1);

SELECT * FROM #testeIndexOrder

DROP TABLE #testeIndexOrder

May 20, 2009 10:42 AM
 

Alexander Kuznetsov said:

-- 1st the table doens't have a clustered index, so it can't be ordered by it.

Of course the table does have a clustered index. Note that I created a primary key. By default, primary keys are implemented as clustered indexes. You can look it up in INFORMATION_SCHEMA views and see for yourself

-- 2nd you didn't select the column randomint, so you didn't prove it's ordered by that column

I never intended to "prove it's ordered by that column". I intended to demonstrate that it is not ordered by clustered index, which I did.

-- 3rd the real order when you don't have a clustered index the statements (Select, Update, Delete) orders the rows by the order of insertion in the table.

This is also completely false. I will provide a repro demonstrating otherwise soon, explicitly inserting rows one by one.

May 20, 2009 11:12 AM
 

Alexander Kuznetsov said:

Here you go. No clustered index, fixed order of inserting, still random order of selected ID:

CREATE TABLE Data.WideTableNoClusteredIndex(ID INT NOT NULL,

RandomInt INT NOT NULL,

CHARFiller CHAR(1000))

GO

CREATE VIEW dbo.WrappedRand

AS

SELECT RAND() AS random_value

GO

CREATE FUNCTION dbo.RandomInt()

RETURNS INT

AS

BEGIN

DECLARE @ret INT;

SET @ret = (SELECT random_value*1000000 FROM dbo.WrappedRand);

RETURN @ret;

END

GO

DECLARE @ID INT;

SET NOCOUNT ON;

SET @ID = 1;

WHILE @ID < 100000 BEGIN

 INSERT INTO Data.WideTableNoClusteredIndex(ID,RandomInt,CHARFiller)

 SELECT @ID, dbo.RandomInt(), 'asdf';

 SET @ID = @ID+1;

END

GO

CREATE INDEX WideTableNoClusteredIndex_RandomInt_ID ON Data.WideTableNoClusteredIndex(RandomInt,ID)

GO

SELECT TOP 100 ID FROM Data.WideTableNoClusteredIndex  

83864

67961

94434

53062

87404

65429

47053

91600

66222

72454

...

May 20, 2009 11:18 AM
 

Stuart Ainsworth said:

Actually, SkullKiller, your first statement is incorrect; Alexander specified a non-specific PRIMARY KEY on a table without a clustered index, which will build an underlying clustered index on that key.  I wish Alexander had written the script in such a way that it was easy to install (it depends on a Data schema and a table of Numbers), but if he had, you could examine the execution plan to show that order is indeed determined by the column RandomInt.

Here's an old script of mine from a presentation I did a few years ago; it's a slight variation on Alexander's, but the basic gist is the same and it's portable; in short, you cannot rely on the clustered index to return a consistent sort.  There are times when it's LIKELY to return data in the order of the clustered index, but not always.  Furthermore, what do you do if you decide to change the clustered index at a later date?

IF EXISTS (SELECT * FROM Information_schema.Tables

WHERE Table_name = 'ClusterTest')

DROP TABLE ClusterTest;

CREATE TABLE Clustertest

         (Cluster int NOT NULL,

          Other int NOT NULL,

          Filler char(2000) NOT NULL,

          PRIMARY KEY CLUSTERED (Cluster));

CREATE INDEX ix_Other ON ClusterTest(Other);

DECLARE @i int;

SET @i = 1;

WHILE @i < 10000

BEGIN

 INSERT INTO ClusterTest (Cluster, Other, Filler)

 SELECT @i, CASE WHEN @i % 789 = 0 THEN @i % 987 ELSE 0 END, '';

 SET @i = @i + 1;

END;

*/

--This query will use a clustered index seek,

--and will PROBABLY return rows in the order of the index.

SELECT *

FROM ClusterTest

WHERE Cluster BETWEEN 100 AND 4500

--This query will perform an index scan

--and order will be in the method the optimizer thinks most effecient.

SELECT Cluster

FROM ClusterTest

WHERE Cluster BETWEEN 100 AND 4500

May 20, 2009 11:27 AM
 

Denis Gobo said:

May 20, 2009 11:35 AM
 

Alexander Kuznetsov said:

Stuart,

I added CREATE SCHEMA, CREATE TABLE, and populate it to my script. Thank you for the suggestion.

May 20, 2009 11:40 AM
 

Alexander Kuznetsov said:

Denis,

A great link, thanks! Somehow I failed to google it up.

May 20, 2009 11:42 AM
 

GrumpyOldDBA said:

Examples can be tricky to script and I agree with Stuart on that point. I thought most people knew about random orders without an order by; didn't it change in sql server 7.0 ?; but obviously not < grin >  It can be interesting when using TOP with updates though.

May 20, 2009 11:44 AM
 

Rajiv said:

Alexander,

Your last example does not prove it. I dropped index WideTableNoClusteredIndex_RandomInt_ID. I ran query:

SELECT TOP 100 ID FROM Data.WideTableNoClusteredIndex  

and selected table was returning in it's natural order 1,2,3,4,5. I mean in order the loop inserted records into WideTableNoClusteredIndex table. It is natural order of table.

May 22, 2009 6:27 PM
 

AaronBertrand said:

Rajiv, that doesn't prove that it will always come back in its "natural order" (which there is no such thing)... only that the optimizer chose to return it in that order that time.  There is no guarantee that it will return in that order every time... way too many factors can change the optimizer's choice.

May 22, 2009 11:16 PM
 

Toby said:

Obviously if ORDER BY isn't specified, the ordering is entirely up to the RDBMS. To discuss "accidental results" is to misunderstand the semantics of SQL. Can somebody explain the value in discussing this further? Thanks Alexander for taking a swing at the zombie...

May 23, 2009 3:59 PM
 

Clueless said:

I would like to know if there is a way to order within a query? I mean, to create a view so that by selecting the view the results will be already sorted...

May 24, 2009 1:19 PM
 

AaronBertrand said:

Clueless, no, once again, you cannot.  If you do not specify ORDER BY on the query, then the optimizer is free to return rows in any order it chooses.  You can get pretty reliable behavior in certain scenarios, but this is NOT GUARANTEED.  If you want a specific order, then write your queries correctly.

May 24, 2009 2:50 PM
 

GeorgeP said:

A simpler example is here:

create table #test (

ID int identity primary key,

Number int unique

)

insert into #test (Number) values (7)

insert into #test (Number) values (2)

insert into #test (Number) values (3)

select top 3 * from #test

drop table #test

ID Number

2 2

3 3

1 7

The results are sorted by the unique index (at least in my tests), which doesn't match neither the clustered key nor the insertion order.

May 25, 2009 10:37 AM
 

Alexander Kuznetsov said:

George,

Can you explain why the optimizer has chosen a non clustered index in your query? I cannot - this looks like a random choice to me. I made sure that in my script the choice is clear, because the table is much wider than the index, and there are enough rows to select, so the difference in real execution costs between scanning the table and the index is significant.

May 26, 2009 9:32 AM
 

Charles Kincaid said:

You indeed can specify ORDER BY in a view.  That said you still can't rely on that all the time.  If you JOIN to your view the final result set the order of rows may not match any underlying structure.

If you need it in a certain order then ask for it.  If the optimizer can get it without a sort then fine.  If you don't specify the ORDER BY then just be glad you got your data at all.

You grow to expect coffee, then soup, then salad, then entre, then desert.  But that is custom.  I often get the salad and then the soup seconds later.  Since I did not specify then I take it as it comes and am grown up about it.

May 26, 2009 1:33 PM
 

Alexander Kuznetsov said:

Some T-SQL code is written under the assumption that either a TRY block successfully completes or a CATCH

October 9, 2009 5:15 PM
 

Aaron Bertrand said:

In my last post in this series , I talked about the common habit of creating an IDENTITY column on every

February 8, 2010 10:02 PM
 

Gabe said:

Just want to say that this issue stumbled us. We have a lookup table with just 3 records. In testing, it was being sorted in alphabetical order. In our dev environment, it was being sorted by the clustered key. Unfortunately, we had that query in about 20 reports, which involved adding that one line tediously to all of them.

I had thought this issue only occurred with large tables but even this 3-record table was affected.

August 1, 2012 3:11 PM
 

Angier said:

It's possible by materialized views,

just by create a view and ordering (and/or create clustered index on it)

March 6, 2014 7:02 AM
 

AlexK said:

Angier,

Regarding materialized views, order is not guaranteed as well - they are not different from tables in that regard. It is very easy to come up with an example.

March 6, 2014 11:41 AM

Leave a Comment

(required) 
(required) 
Submit

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works as an agile developer.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement