THE SQL Server Blog Spot on the Web


Denis Gobo

  • Teaser: Left Join..the SQL ego crusher

    One of the more difficult things to master for people coming to SQL seems to be the outer join. The common mistake is, of course, putting a condition on the outer-joined table in the WHERE clause when it should have been in the JOIN itself. There was a question about this on Stack Overflow, and a person claimed he did have a LEFT JOIN because...well....he had LEFT JOIN in his SQL statement...


    Let's take a look at some code; first create these two tables


    CREATE TABLE Home( HomeID INT not null,

    City VARCHAR(200) not null,

    Taxes DECIMAL(16,2) not null)



    INSERT Home VALUES(1,'New York City',12000.00)

    INSERT Home VALUES(2,'Sacramento',11000.00)

    INSERT Home VALUES(3,'Wichita',6000.00)

    INSERT Home VALUES(4,'Tampa',9000.00)



    CREATE TABLE HomePhotos(HomePhotosID INT not null,

    HomeID INT not null,

    HasColorPhoto BIT not null)



    INSERT HomePhotos VALUES(1,1,1)

    INSERT HomePhotos VALUES(2,1,1)

    INSERT HomePhotos VALUES(3,3,0)

    INSERT HomePhotos VALUES(4,3,0)

    INSERT HomePhotos VALUES(5,4,1)


    If we do a regular join


    SELECT h.*,hp.HomePhotosID FROM Home h

    join HomePhotos hp ON h.HomeID = hp.HomeID

    Our output is this:

    HomeID  City           Taxes     HomePhotosID
    1       New York City  12000.00  1
    1       New York City  12000.00  2
    3       Wichita        6000.00   3
    3       Wichita        6000.00   4
    4       Tampa          9000.00   5


    A left join

    SELECT h.*,hp.HomePhotosID FROM Home h

    LEFT join HomePhotos hp ON h.HomeID = hp.HomeID

    Gives us this:

    HomeID  City           Taxes     HomePhotosID
    1       New York City  12000.00  1
    1       New York City  12000.00  2
    2       Sacramento     11000.00  NULL
    3       Wichita        6000.00   3
    3       Wichita        6000.00   4
    4       Tampa          9000.00   5

    Now what will these 2 queries bring back?


    SELECT h.*,hp.HomePhotosID,hp.HasColorPhoto

    FROM Home h

    LEFT join HomePhotos hp

    ON h.HomeID = hp.HomeID

    AND h.Taxes <= 11000





    SELECT h.*,hp.HomePhotosID,hp.HasColorPhoto

    FROM Home h

    LEFT join HomePhotos hp

    ON h.HomeID = hp.HomeID

    AND h.Taxes = -555


    Did you guess correctly?


    It is helpful to know the order of logical query processing; here is what happens in general:

    FROM (including JOINs and their ON clauses; outer rows are added back after ON is applied)
    WHERE
    GROUP BY
    HAVING
    SELECT
    ORDER BY

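    The crux of the teaser can be sketched directly against the two tables above; this is the gotcha from the opening paragraph, with the filter on the outer-joined table, and the comments describe the expected behavior:

```sql
-- Filter in the ON clause: the condition is applied while joining,
-- so homes with no matching color photo still appear, with NULLs
SELECT h.HomeID, h.City, hp.HomePhotosID
FROM Home h
LEFT JOIN HomePhotos hp
  ON h.HomeID = hp.HomeID
 AND hp.HasColorPhoto = 1

-- Filter in the WHERE clause: the NULLs produced by the outer join
-- fail the predicate, so unmatched homes are thrown away and the
-- LEFT JOIN silently behaves like an INNER JOIN
SELECT h.HomeID, h.City, hp.HomePhotosID
FROM Home h
LEFT JOIN HomePhotos hp
  ON h.HomeID = hp.HomEID
WHERE hp.HasColorPhoto = 1
```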

  • A couple of reasons I won't be moving my databases to the cloud anytime soon

    Let's beat this horse to death once again, shall we? Paul Nielsen already wrote about SQL in the cloud here: where he made the following prediction:

    In five years time, hosting your own SQL data on your own servers will seem as obsolete as running your own dial-up BBS.

    I am not so sure about that! 

    I attended the Microsoft Enterprise Developer Conference in New York City on Tuesday and Wednesday. Last year it was all about High Performance Computing; this time it was all about the cloud (I know, big surprise, right?)

    I am going to focus this post on SQL Server Data Services. The SSDS session was presented by David Robinson, a senior program manager on the SQL team. As you can imagine, David is the perfect person to ask all kinds of questions about what you can and can't do in SSDS. If you would like to watch this session yourself then visit this link:

    Oh, and that annoying person sitting in the front row asking all those questions......yeah that's me


    Here are the reasons I think some customers won’t be able to move to the cloud just yet


    1) 10GB limit

    This is a biggie for me, since I have almost no databases that are smaller than this limit. You can create many databases and implement some kind of sharding, but you need to do this yourself. There is no support for distributed/federated views across databases. You also cannot use the USE syntax. Remember all that dynamic SQL you wrote that starts with USE DatabaseName... probably not a good idea.


    2) No execution plan

    Okay, so how do you check if your query will perform well? You can't run a SET SHOWPLAN command either.


    3) Costly queries will be terminated

    I might have a query at the end of the month that calculates correlations for 90,000 indexes and runs for 5 minutes; a query like this could be terminated.


    4) No CLR

    So you have that superfast split function coded in the CLR? That won't fly in the cloud, mister; you are out of luck. Spatial data is another thing that is not supported.


    5) No analysis or reporting services

    Probably not as big of a deal, but it would still mean that you need a separate license to run SSAS or SSRS on premises.


    6) No DMVs

    If you use DMVs or catalog views to build logic to do certain things in SQL you will need to rethink that


    I didn’t hear (and forgot to ask) about scheduling jobs or if you have access to msdb/sql agent in any form. I also don’t know if you can run SSIS packages in the cloud.

     Below is a picture that shows you what is and what is not available for V1

    SQL Server Data Services Compatibility


    So what do you think? Will you move to the cloud anytime soon?

  • Who has been a great leader in your career and what made them a great leader?

    I have been tagged by Aaron Alton as part of the latest meme and the question this time around is: Who has been a great leader in your career and what made them a great leader?

    This is a tough one, and this will probably sound like a cop-out, but it is not. I will actually pick two: one of them is a real person, and the other 'one' is more than one single person.
    Let's start with my previous manager, Mike. Mike is a manager who still writes code daily; he understands the life of a developer. When you tell him that something will take two weeks, he will ask you if you are sure, because it sounds like that isn't enough time. Mike was never a person who would pressure you into committing to a tighter schedule. When you arrived at work Mike was there; when you went home Mike was still there... I really felt bad leaving half the time, knowing that Mike was still in his cube while I was heading home. If you needed a tool, there was never a problem getting it. Mike was the person who got our first real bug-tracking software (Jira by Atlassian), and he also got us Subversion servers.

    Scrum was introduced to our team by Mike. I remember a time when it was common for me to go over documentation with the rest of the team for 3 weeks straight because we had a 9-month project. After 9 months the client would of course tell you that it wasn't really what they wanted. Scrum changed things drastically: we pushed out two projects, each of them three or four three-week sprints. The feedback was immediate, and there was no time wasted implementing features that nobody really wanted; stuff we weren't sure about was put in the 'nice to have' bucket. You needed books? No problem; an email would get you what you needed.

    The second 'person' is 'the wisdom of the crowds' or 'collective intelligence'.
    What do I mean by that? These are the people who are active in the newsgroups, forums, blogs, email lists, and user groups, who write books, etc. Not one of these people made a really huge impact on their own, but all of them combined did.

    One of the better ways to learn SQL is still to participate in newsgroups/forums. In the beginning you will probably ask more than you answer, but after a while you will see that 50% of the questions are the same 20 questions asked over and over again with slight variations. You can just lurk in these newsgroups and take in all the great information that gets passed around. There are many people who taught me some really good stuff; here is just a small sample: Aaron Bertrand, Adam Machanic, Denny Cherry, vongrunt, George Mastros, Jonathan Kehayias, Itzik Ben-Gan, David Portas, Ted Krueger, Emtucifor, Uri Dimant, Dan Guzman................

    Blogs have really made a difference for developers in the last 5 years. There are blogs that specialize in just one aspect of a product. If you want to learn about CHECKDB and indexing, you visit Kimberly Tripp and Paul Randal's blogs. If you are interested in admin-related stuff, then Denny Cherry is the man. If you want to get someone's perspective on large data sets, then Michelle Ufford is your woman. Does dynamic SQL interest you and you want to learn more? Erland Sommarskog has the definitive page. Datetime data? Tibor Karaszi has all the answers. Then there are blogs where a bunch of really good bloggers all blog on the same site; sqlblog is one of these.

    Authors and content creators
    All this online stuff is nice, but I still prefer to hold a 'real' tree-disappearing book in my hand. Every SQL developer should own a bunch of these books; following is a short list: Ken Henderson's The Guru's Guide to SQL, the Inside SQL Server series by Kalen Delaney, Itzik Ben-Gan, and others, Louis Davidson's design book, Adam Machanic's Expert SQL book, and Joe Celko's SQL for Smarties and puzzle books. In terms of video/audio content you should check out Greg Low's SQLDownUnder podcasts, the podcast on SQLServerPedia, DotNetRocks, and Channel 9.

    The SQL community
    All the #sql folks on twitter who make everybody suffer from ADD these days, yes Brent Ozar and Jason Massie I am talking about you.
    And I  will tag no one  :-)

  • From the I did not know this would work department

    I have used IN many times in my life; usually I would use hardcoded values, something like these two queries

    SELECT *

    FROM SomeTable

    WHERE state in ('AZ','NY','NJ')


    SELECT *

    FROM SomeTable

    WHERE value in (1,2,3,4,5)


    Today I saw a query on a forum where a person used column names inside IN. This got me interested, and I fired up this query in the AdventureWorks database



    SELECT *

    FROM Person.Contact

    WHERE 'thomas' in(Firstname,LastName,LEFT(emailaddress,7))


    That query will return all the rows where LastName or FirstName or the first 7 characters of the email address have the value Thomas.

    Here is what the execution plan looks like

      |--Clustered Index Scan(OBJECT:([AdventureWorks].[Person].[Contact].[PK_Contact_ContactID]),
    WHERE:(substring([AdventureWorks].[Person].[Contact].[EmailAddress],(1),(7))=N'thomas'
    OR N'thomas'=[AdventureWorks].[Person].[Contact].[LastName]
    OR N'thomas'=[AdventureWorks].[Person].[Contact].[FirstName]))

    The way I usually write a query like that is like this


    WHERE Firstname ='thomas'

    OR lastname ='thomas'

    OR left(emailaddress,7) ='thomas'

    Here is the plan for that

    |--Clustered Index Scan(OBJECT:([AdventureWorks].[Person].[Contact].[PK_Contact_ContactID]),
    WHERE:([AdventureWorks].[Person].[Contact].[FirstName]=N'thomas'
    OR [AdventureWorks].[Person].[Contact].[LastName]=N'thomas'
    OR substring([AdventureWorks].[Person].[Contact].[EmailAddress],(1),(7))=N'thomas'))

    Or better yet like this


    WHERE Firstname ='thomas'

    OR lastname ='thomas'

    OR emailaddress LIKE 'thomas%'

     The plan for that query is below

     |--Clustered Index Scan(OBJECT:([AdventureWorks].[Person].[Contact].[PK_Contact_ContactID]),
    WHERE:([AdventureWorks].[Person].[Contact].[FirstName]=N'thomas'
    OR [AdventureWorks].[Person].[Contact].[LastName]=N'thomas'
    OR [AdventureWorks].[Person].[Contact].[EmailAddress] like N'thomas%'))

    So my question to you is: do you ever write queries with columns inside IN?

  • A Quick Look At Compression In SQL 2008

    To use compression you have to understand what your data looks like. If your data is mostly unique, then compression might not really help a lot; if your data is repeated on a page, then it could be beneficial. In my own tests compression improved performance with some tables and not so much with others. I won't explain how compression works; this is all explained in Books Online. The goal of this post is for you to have some code that you can run to see how page- and row-level compression differ compared to the original data.

    Here is some code that you can run to see how compression works. The data I have used is similar to what we use at my job (but not the same; I used fake tickers, so don't try to invest in those ;-)). I have used just the 10 industry groups from the Industry Classification Benchmark and left out supersectors, sectors, and subsectors; otherwise we would have a lot more rows per table. I also did not create fake ISIN, SEDOL, and CUSIP columns, since that would have bloated the code in this post. The code will generate 2,780,650 rows per table, which is a nice size to test with.
    Compression is really nice when you get charged $30 per GB for the SAN, so besides performance there is also a cost benefit in storage.

    Here is the code
    This block of code will create the database, set up the lookup tables, and create the tables for our tests


    use master


    create database CompressionTest2


    use CompressionTest2


    --our Industry Classification Benchmark table

    Create table IndustryGroup(GroupCode char(4) not null primary key,

    GroupDescription varchar(40))


    --just industry groups

    insert IndustryGroup values('0001','Oil & Gas' )

    insert IndustryGroup values('1000','Basic Materials' )

    insert IndustryGroup values('2000','Industrials')

    insert IndustryGroup values('3000','Consumer Goods')

    insert IndustryGroup values('4000','Healthcare')

    insert IndustryGroup values('5000','Consumer Services')

    insert IndustryGroup values('6000','Telecommunications')

    insert IndustryGroup values('7000','Utilities')

    insert IndustryGroup values('8000','Financials')

    insert IndustryGroup values('9000','Technology')


    --currency table

    Create table Currency (CurrencyCode char(3) not null primary key, CurrencyDescription varchar(30))


    --just handful of currencies

    insert currency values('USD','U.S. Dollar')

    insert currency values('AUD','Australian Dollar')

    insert currency values('CAD','Canadian Dollar')

    insert currency values('JPY','Japanese Yen')

    insert currency values('MXN','Mexican Peso')

    insert currency values('GBP','U.K. Sterling')

    insert currency values('EUR','European Euro')

    insert currency values('ISK','Iceland Krona')

    insert currency values('BGN','Bulgarian Lev')

    insert currency values('RON','Romanian Leu')

    insert currency values('INR','Indian Rupee')

    insert currency values('RUB','Russia Rubles')

    insert currency values('BHD','Bahrain Dinar')

    insert currency values('EGP','Egypt Pounds')

    insert currency values('JOD','Jordan Dinars')

    insert currency values('KWD','Kuwait Dinars')

    insert currency values('MAD','Morocco Dirham')

    insert currency values('OMR','Oman Rial')

    insert currency values('QAR','Qatari Rial')


    --market cap table

    create table MarketCap (MarketCap varchar(20) not null primary key,MarketCapDescription varchar(100) )


    --left out a bunch of market caps

    insert MarketCap values('ALL','broad')

    insert MarketCap values('MID','mid cap')

    insert MarketCap values('MCR','micro cap')

    insert MarketCap values('SML','small cap')

    insert MarketCap values('LRG','large cap')


    --calendar table

    create table Calendar (CalendarDate date not null primary key)


    insert Calendar

    select dateadd(d,number,'19920101') from master..spt_values

    where type = 'p'

    and datepart(dw,dateadd(d,number,'19920101')) not in (1,7)


    insert Calendar

    select dateadd(d,number,'19970810') from master..spt_values

    where type = 'p'

    and datepart(dw,dateadd(d,number,'19970810')) not in (1,7)





    --the table that we will test against

    create table IndexCloseValues (CalendarDate date not null,

    IndexSymbol varchar(30) not null,

    GroupCode char(4) not null,

    CurrencyCode char(3) not null,

    MarketCap varchar(20) not null,

    CloseValue decimal(30,10),

    TotalReturnClose decimal(30,10))


    --2780650 rows

    insert IndexCloseValues

    select CalendarDate,

    CurrencyCode + left(GroupCode,1) + MarketCap as Symbol, --fake tickers

    GroupCode,

    CurrencyCode,

    MarketCap,

    ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) * 0.00100,

    ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) * 0.247001

    from IndustryGroup i cross join currency c

    cross join MarketCap

    cross join calendar



    --create a copy for page compression

    select * into IndexCloseValuesPage

    from IndexCloseValues


    --create a copy for row compression

    select * into IndexCloseValuesRow

    from IndexCloseValues




    --add unique constraint

    ALTER TABLE IndexCloseValues

    ADD CONSTRAINT UQ_IndexCloseValues UNIQUE Clustered (CalendarDate,IndexSymbol)


    --add unique constraint

    ALTER TABLE IndexCloseValuesPage

    ADD CONSTRAINT UQ_IndexCloseValuesPage UNIQUE Clustered (CalendarDate,IndexSymbol)


    --page compression

    ALTER TABLE IndexCloseValuesPage

    REBUILD WITH (DATA_COMPRESSION = PAGE)


    --add unique constraint

    ALTER TABLE IndexCloseValuesRow

    ADD CONSTRAINT UQ_IndexCloseValuesRow UNIQUE Clustered (CalendarDate,IndexSymbol)


    --row compression

    ALTER TABLE IndexCloseValuesRow

    REBUILD WITH (DATA_COMPRESSION = ROW)


    Now that everything is set up, we can look at how big the tables are. Run the code below (old school, I know)

    dbcc showcontig('IndexCloseValues')

    dbcc showcontig('IndexCloseValuesPage')

    dbcc showcontig('IndexCloseValuesRow')

    Table: 'IndexCloseValues' (213575799); index ID: 1, database ID: 16
    TABLE level scan performed.
    - Pages Scanned................................: 23767
    - Extents Scanned..............................: 2972
    - Extent Switches..............................: 2971
    - Avg. Pages per Extent........................: 8.0
    - Scan Density [Best Count:Actual Count].......: 99.97% [2971:2972]
    - Logical Scan Fragmentation ..................: 0.01%
    - Extent Scan Fragmentation ...................: 0.17%
    - Avg. Bytes Free per Page.....................: 23.3
    - Avg. Page Density (full).....................: 99.71%

    Table: 'IndexCloseValuesPage' (245575913); index ID: 1, database ID: 16
    TABLE level scan performed.
    - Pages Scanned................................: 9307
    - Extents Scanned..............................: 1165
    - Extent Switches..............................: 1164
    - Avg. Pages per Extent........................: 8.0
    - Scan Density [Best Count:Actual Count].......: 99.91% [1164:1165]
    - Logical Scan Fragmentation ..................: 0.04%
    - Extent Scan Fragmentation ...................: 0.77%
    - Avg. Bytes Free per Page.....................: 10.7
    - Avg. Page Density (full).....................: 99.87%

    Table: 'IndexCloseValuesRow' (261575970); index ID: 1, database ID: 16
    TABLE level scan performed.
    - Pages Scanned................................: 13255
    - Extents Scanned..............................: 1657
    - Extent Switches..............................: 1656
    - Avg. Pages per Extent........................: 8.0
    - Scan Density [Best Count:Actual Count].......: 100.00% [1657:1657]
    - Logical Scan Fragmentation ..................: 0.01%
    - Extent Scan Fragmentation ...................: 0.36%
    - Avg. Bytes Free per Page.....................: 11.4
    - Avg. Page Density (full).....................: 99.86%

    As you can see, compression really reduced the number of pages for the table; page-level compression looks really good here.
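    As a rough sanity check on those numbers, SQL Server pages are 8 KB each, so the page counts above translate to approximate sizes like this (simple arithmetic, not output from sp_spaceused):

```sql
-- Back-of-the-envelope sizes from the DBCC SHOWCONTIG page counts above,
-- at 8 KB per page
SELECT 23767 * 8 / 1024.0 AS UncompressedMB,   -- roughly 186 MB
       9307  * 8 / 1024.0 AS PageCompressedMB, -- roughly 73 MB
       13255 * 8 / 1024.0 AS RowCompressedMB   -- roughly 104 MB
```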

    First let's look at the execution plans, running the following block of code....

    select * from IndexCloseValues

    where CalendarDate = '19920101'


    select * from IndexCloseValuesPage

    where CalendarDate = '19920101'


    select * from IndexCloseValuesRow

    where CalendarDate = '19920101'


    ...will result in this plan


    Increasing the query to return three weeks of data instead of a day


    select * from IndexCloseValues

    where CalendarDate between '19920101' and '19920121'


    select * from IndexCloseValuesPage

    where CalendarDate between '19920101' and '19920121'


    select * from IndexCloseValuesRow

    where CalendarDate between '19920101' and '19920121'

    will result in this plan.



    You can draw your own conclusions from those images


    Let's look at some reads; first turn statistics io on

    SET STATISTICS IO ON

    select * from IndexCloseValues

    where CalendarDate = '19920101'


    select * from IndexCloseValuesPage

    where CalendarDate = '19920101'


    select * from IndexCloseValuesRow

    where CalendarDate = '19920101'

    (950 row(s) affected)

    Table 'IndexCloseValues'. Scan count 1, logical reads 12, physical reads 0......

    (950 row(s) affected)

    Table 'IndexCloseValuesPage'. Scan count 1, logical reads 7, physical reads 0......

    (950 row(s) affected)

    Table 'IndexCloseValuesRow'. Scan count 1, logical reads 8, physical reads 0......


    Those are some nice numbers for the reads; now we will increase the date range to three weeks

    select * from IndexCloseValues

    where CalendarDate between '19920101' and '19920121'


    select * from IndexCloseValuesPage

    where CalendarDate between '19920101' and '19920121'


    select * from IndexCloseValuesRow

    where CalendarDate between '19920101' and '19920121'

    (14250 row(s) affected)

    Table 'IndexCloseValues'. Scan count 1, logical reads 125, physical reads 0......

    (14250 row(s) affected)

    Table 'IndexCloseValuesPage'. Scan count 1, logical reads 52, physical reads 0......

    (14250 row(s) affected)

    Table 'IndexCloseValuesRow'. Scan count 1, logical reads 69, physical reads 0......


    When selecting more data the numbers look even better.

    Turn statistics io off again

    SET STATISTICS IO OFF

    So as you can see, page compression reduces the reads by more than half.
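    Before rebuilding a big production table, SQL Server 2008 can also estimate the savings on a sample of the data for you via a stored procedure; a minimal sketch (the parameter values assume the tables from this post):

```sql
-- Estimate page-compression savings for the uncompressed table
-- without actually rebuilding it
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'IndexCloseValues',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE'
```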

  • SQL Teaser Count That Trancount

    Today's teaser is very simple

    First create this table

    CREATE TABLE #Tran (TranCountDefault int DEFAULT (@@TranCount),TranCountPassedIn int)


    As you can see, that table has two columns; one column has a default of @@TRANCOUNT. Now run this piece of code

    DECLARE @trancount int

    SELECT @trancount = @@TranCount

    INSERT #Tran (TranCountPassedIn) VALUES (@trancount)

    We assigned @@TRANCOUNT to the @trancount variable and passed that in, so TranCountDefault has the default value and TranCountPassedIn has the value we passed in.

    Now when you do a select from the table, will the two columns have the same value?

    SELECT * FROM #Tran

  • SQL Server In The Cloud Vaporware Or Inevitable?

    After Paul Nielsen's post SQL in the Cloud yesterday, I decided to put on my Nostradamus hat and give you my take on this.

    What needs to happen before we can move to the cloud?
    There are a couple of things, here is a partial list

    Performance has to be good
    If performance is bad then nobody will move to the cloud. Since the data is external, you probably want to have a secure connection, and thus you will slow things down.

    Security has to be implemented
    Can you give user A only read access and user B read/write access? What if you don't want certain users to see the salary column?

    Bulk uploading of data has to be implemented
    The ability to upload a file and specify the target object to load the data into has to be implemented. We get files via FTP all the time; nobody wants to parse the files locally and insert row by row.

    The ability to do merge replication with disconnected handheld devices
    This is not so much an issue anymore, since almost all devices these days are online, but if you have some legacy junk this would be nice.

    The ability to restore to a point in time
    You uploaded some bad or old data and hosed everything. On a SQL box you would just do a point-in-time restore (if you had valid backups and the correct recovery model, of course) and you would be done.

    Alerts and Notifications
    You want to know every time some product level falls below a certain threshold, someone adds or modifies a table, etc.

    Auditing, SOX, HIPAA, GAAP and all your other favorite acronyms
    Unless this stuff is implemented no public company will move to the cloud.

    Bandwidth cost has to be low
    There are tons of FoxPro and Access applications, Excel sheets, and in-house-developed applications that connect directly to SQL Server, get a ton of data, and then do something with it. When this is all in your office you don't have bandwidth costs; if your data lives in the cloud, this could add up.
    I will give you another example: we all compress our content before it is pushed to the client from our websites, right (rhetorical, don't answer)? Guess what? Amazon doesn't do compression, because they get paid for bandwidth; compressing would lower their revenue.

    You need to have the ability to profile
    There has to be a way to profile and performance-tune your cloud database; after all, that is the most fun part of being a database developer, and you can't take away instant gratification.

    What will go in the cloud first?
    The first things that will ascend into the clouds are your typical internet applications; this also includes RIAs (or as I call them, Chubby Clients) built on Adobe AIR, JavaFX, and Silverlight. Ever heard of TweetDeck? It is a Twitter client written in Adobe AIR; stuff like that is perfect, because you don't care where the data is stored, you just want to call a RESTful API and get your data. Websites that don't store (or don't care about) sensitive information are going to store their stuff in the cloud; now we have real C2C (Cloud to Cloud), since websites live in the cloud to begin with.

    What will not go in the cloud any time soon?
    Any app that is directly connected to a database server and does some crazy manipulation/calculation of data; the bandwidth cost might just be too high.
    Departmental (Small Business edition) database servers or SQL Server Express edition: this stuff is cheap enough that I can't imagine there being an incentive to move to the cloud. Shops like this usually have one person managing all the hardware to begin with, so they would not save anything.
    I am also having a hard time seeing those lovely Excel sheets that are connected to SSAS cubes connecting to the cloud instead.

    We cannot really predict anything unless we have some pricing. I also think virtualization of databases will be mainstream sooner than the cloud: one huge box with 20 SQL Servers on it... only one box to manage.
    If you want, you can vote (you need to register); right now it is mostly no cloud.
    So what is your opinion on this: The Cloud, is it vaporware or inevitable?

  • So You Like To Use ANSI SQL...

    We all know that CURRENT_TIMESTAMP is ANSI and carries the Celko stamp of approval, while GETDATE() is proprietary. So now run the following line of code
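    The statement creates a table with a default constraint that uses CURRENT_TIMESTAMP; a minimal reconstruction looks like this (the table and constraint names come from the queries below in this post; the column name SomeDate is illustrative):

```sql
-- duh2 and df_duh match the names queried later in this post;
-- the column name is an assumption
CREATE TABLE duh2 (SomeDate DATETIME CONSTRAINT df_duh DEFAULT CURRENT_TIMESTAMP)
```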



    No problem, right? Let's take a look at what was actually created; run the query below


     SELECT object_definition(id) FROM sysobjects WHERE name = 'df_duh'

    Interesting: the object definition for the constraint shows getdate(), not current_timestamp.

    Let's verify that by using sp_help. Run the query below, scroll all the way down, and look at the constraint_keys column

    SP_HELP 'duh2'
    Same thing; it shows getdate().
    What happens when you script out the table?
    /****** Object:  Table [dbo].[duh2]    Script Date: 02/22/2009 20:58:41 ******/
    CREATE TABLE [dbo].[duh2](
    ) ON [PRIMARY]
    As you can see, SQL Server changes CURRENT_TIMESTAMP to GETDATE().
    Is this a bug/feature or something else? What if you use the same table on more than one RDBMS and want to quickly script the table from SQL Server so that you can create it somewhere else?



  • Things you know now but wish you knew when you started

    I got tagged by Go Daddy girl Michelle Ufford in her post Things you know now…. I also got tagged by Denny Cherry in his post Things You Know Now...

    So here is my little post. Before I start, I will say that I thought of several things that other people who were tagged already wrote about. One of them is of course participating in newsgroups/forums; another is having a blog and writing about the technologies you are using. I will not cover those two here, for the reason stated above.


    If you can buy it don’t write it.

    What do I mean by that? Take, for example, having to write code that checks for any schema or data changes between staging and production servers. Sure, you can spend a couple of days writing that and then another couple of days testing/debugging, or you can have your boss pay half a day of your salary (this depends on location) for SQL Compare or something similar. This enables you to work on real stuff like those quarter-end reports. Your work will be done faster, and the tool will probably be much better and do so much more than you could ever write.


    First get to know a variety of technologies and then zoom in on one of them.

    If you are someone who knows a lot of different things but is not an expert in any of them, then it is time to specialize. There was a time when I worked with SQL Server, ColdFusion, JSP, ASP, HTML, XML, JavaScript, etc. I was decent in all of those (or so I thought), but I wasn't an expert in any. When companies are looking for people, there is always one thing that they are really looking for, along with a bunch of required/nice-to-have things. If you know something about lots of things but not a lot about one thing, then most likely you will not pass the technical interview, because your knowledge is not deep enough.


    Use the internet to see if it exists before you reinvent it

    Six or seven years ago I wrote some paging code in which you could pass in what page you want, how many rows per page, etc. A year or so later I discovered that Aaron Bertrand had something very similar on his aspfaq site. If I had known this beforehand, it could have saved me hours of debugging and perfecting this code.


    Know your product well

    Believe it or not, I once wrote a whole bunch of code to parse a proc and get all the parameters, their data types, and their positions in the proc in order to document it. This took me a whole day or so... later on I realized I could have just used INFORMATION_SCHEMA.PARAMETERS and been done in two minutes. It is important to get to know the inner workings of your product; for SQL Server this means you need to study the system tables and the dynamic management views. There is so much information exposed through the dynamic management views that it will help you with a ton of stuff: metadata about your system, performance problems, etc.
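    The two-minute alternative boils down to a query along these lines (a sketch; add a WHERE on SPECIFIC_NAME to document a single proc):

```sql
-- List every parameter of every proc/function, with type and position
SELECT SPECIFIC_SCHEMA,
       SPECIFIC_NAME,     -- the proc or function
       PARAMETER_NAME,
       DATA_TYPE,
       ORDINAL_POSITION   -- position of the parameter in the signature
FROM INFORMATION_SCHEMA.PARAMETERS
ORDER BY SPECIFIC_SCHEMA, SPECIFIC_NAME, ORDINAL_POSITION
```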


    Technology is a curse and blessing at the same time

    It is a blessing because it always changes, so you can work with the latest and greatest. It is a curse because you need another workday after work just to keep up with all the changes. When SQL Server 2000 came out, not a whole lot changed compared to SQL Server 7: we got indexed views, user-defined functions, uniqueidentifiers, and a couple of other things. From SQL Server 2000 to SQL Server 2005 was a whole other story. Just figuring out SSIS took me longer than all the new stuff in SQL Server 2000, and that is partly my fault, because I was also messing around with CTEs, ranking functions, DMVs, partition functions, and a whole lot more. I did not make this mistake with 2008; once the first CTP came out, I loaded it up and started to mess around with it. Unfortunately, SQL Server 2008 is so massive now that you will have to specialize within the product. Are you a DBA, hybrid DBA, developer, BI developer, data warehouse architect, performance guru... you get the point.

     There is one more thing....I wish I knew the lotto numbers when the jackpot hit over 100 million $$$$ :-)

    I will not tag anyone since I know you have to prepare for Valentine's Day.

  • SQL Server Proximity Search

    George, a good friend of mine, created a blog post showing how you can do a SQL Server zip code latitude/longitude proximity distance search without using the geography data type. I searched the internet to see if anyone had something similar with the geography data type available in 2008 and could not find anything useful. Yes, there is a lot available with geometry and polygons, but nothing like what I was looking for. George challenged me, and I did the 2008 version.

     There is nothing really complicated about it. If you run this

    DECLARE @g geography;
    DECLARE @h geography;
    SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326);
    SET @g = geography::STGeomFromText('POINT(-77.36160 38.85570)', 4326);
    SELECT @g.STDistance(@h)/1609.344;
    You will see that the distance in miles between those two points is 8.8490611480890067 (STDistance returns meters, hence the division by 1609.344, the number of meters in a mile).
    In the end the code runs in between 15 and 60 milliseconds to get all the zip codes within 20 miles of zip code 10028, pretty impressive if you ask me 
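    The shape of that proximity query is roughly the following. This is a hedged sketch, not the actual posted code: the ZipCodes table and its GeogPoint geography column are assumed names for illustration.

    ```sql
    -- Find the geography point for the center zip code,
    -- then return every zip code within 20 miles of it
    DECLARE @center geography;
    SELECT @center = GeogPoint FROM ZipCodes WHERE ZipCode = '10028';

    SELECT  ZipCode,
            @center.STDistance(GeogPoint)/1609.344 AS DistanceInMiles
    FROM    ZipCodes
    WHERE   @center.STDistance(GeogPoint)/1609.344 <= 20
    ORDER BY DistanceInMiles;
    ```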
    All the code including sample data for all the zip codes in the US can be found here
    The reason I did not post the code here is that George created the 2000 version, so it would be weird if the 2008 version lived somewhere else. But no worries, I will have a post here tomorrow about "what do you wish you knew when you were starting?"  Michelle Ufford (aka SQLFool) tagged me, so look forward to that



  • Some Simple Code To Show The Difference Between Newid And Newsequentialid


    In SQL Server 2000 we got the uniqueidentifier data type and the newid() function

    Lots of people thought that newid() would be very handy for creating unique values across all databases.
    Newid() is nice, but it has a little side effect: it causes terrible page splits because it generates random values. SQL Server 2005 introduced newsequentialid(), where each value generated by the function is always greater than the previous value.
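    You can see the randomness for yourself. One thing worth knowing: newsequentialid() is only allowed in a DEFAULT constraint, so you cannot call it in a plain SELECT the way you can with newid().

    ```sql
    -- NEWID() hands back a different random GUID on every call
    SELECT NEWID() AS FirstCall, NEWID() AS SecondCall

    -- NEWSEQUENTIALID() can only appear in a DEFAULT constraint;
    -- calling it directly in a SELECT raises an error
    ```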

    Let's take a look

    First we will create these two tables

    CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER not null DEFAULT newid(),
    SomeDate DATETIME, batchNumber BIGINT)
    CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER not null DEFAULT newsequentialid(),
    SomeDate DATETIME, batchNumber BIGINT)
    Now run this block of code to insert 1000 rows into each table

    INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),1)
    go 1000
    INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),1)
    go 1000

    Create these two clustered indexes
    CREATE CLUSTERED INDEX ix_id1 ON TestGuid1(id)
    CREATE CLUSTERED INDEX ix_id2 ON TestGuid2(id)
    Run the code below
    DBCC showcontig ('TestGuid1') WITH tableresults
    DBCC showcontig ('TestGuid2')  WITH tableresults
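    On SQL Server 2005 and up you can also pull the same numbers from a DMV (a sketch; DBCC SHOWCONTIG is deprecated in favor of sys.dm_db_index_physical_stats, and the 'DETAILED' mode is needed to populate the page-space column):

    ```sql
    -- Page count, page fullness and fragmentation for TestGuid1,
    -- straight from the dynamic management function
    SELECT  OBJECT_NAME(object_id) AS TableName,
            page_count,
            avg_page_space_used_in_percent,
            avg_fragmentation_in_percent
    FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('TestGuid1'),
                                           NULL, NULL, 'DETAILED')
    ```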

    You will see that AverageFreeBytes is the same for both tables

    What about the inserts themselves? Run this code below

    SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
    FROM TestGuid1
    GROUP BY batchNumber
    SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
    FROM TestGuid2
    GROUP BY batchNumber

    The table with the Newsequentialid did the inserts about 40% faster.

    Now we will insert 1000 rows in each table again

    INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),2)
    go 1000
    INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),2)
    go 1000

    Now we will look again at what happened

    DBCC showcontig ('TestGuid1') WITH tableresults
    DBCC showcontig ('TestGuid2')  WITH tableresults

    Wow, the first table uses 21 pages while the second one uses only 12.
    AverageFreeBytes is 3524 bytes per page in the first table and only 96 bytes per page in the second table. Obviously newsequentialid is the better choice.

    Running the code below, you will see that newid is still slower than newsequentialid, by about 16% or so, for batch 2
    SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
    FROM TestGuid1
    GROUP BY batchNumber
    SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
    FROM TestGuid2
    GROUP BY batchNumber

    Now we will insert 10000 rows and then look at freespace and duration again

    INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),3)
    go 10000
    INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),3)
    go 10000

    DBCC showcontig ('TestGuid1') WITH tableresults
    DBCC showcontig ('TestGuid2')  WITH tableresults

    The first table uses 117 pages while the second one uses 80.
    AverageFreeBytes is 2574 bytes per page in the first table and only 21 bytes per page in the second table.

    Running the code below, you will see that newid is still slower than newsequentialid, by about 40% or so, for batch 3
    SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
    FROM TestGuid1
    GROUP BY batchNumber
    SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
    FROM TestGuid2
    GROUP BY batchNumber

    Clean up
    DROP TABLE TestGuid2,TestGuid1

    So that is all. Do you use newid or newsequentialid, and if you do use newid, did you experience fragmentation or performance problems because of it?
  • SQL Teaser: Missing FROM

    Take a look at this code: will the select return anything or not? Look at the subquery and notice that there is no FROM clause there.


    USE tempdb

    CREATE TABLE TestQuery (TestID int)

    SELECT * FROM information_schema.columns
    WHERE column_name = (SELECT column_name
                         WHERE column_name = 'TestID')

  • Will we need many DBAs or no DBAs at all in the future?

    This is a very exciting/confusing/gloomy period for SQL Server Developers/DBAs

    On one hand there is the ‘cloud’, which will take our precious boxes away from us; we won’t be needed to manage them anymore.

    On the other hand there is an avalanche of technologies bundled with every release of SQL Server.


    Back in the 2000 days all you really needed to know was

    Profiler/QA and other tools


    I can just see the skills required for a SQL Server DBA these days

    Entity Framework

    SSMS/ Profiler/QA/BI Studio and other tools



    With Kilimanjaro this list will only get longer.


    Soon your IT shop will look like a hospital.

    You need some SSIS with C#? Talk to DBA A

    You need some Powershell automation with SMO? Talk to DBA B since DBA A doesn’t know Powershell

    You need some DTS? Talk to DBA C because DBA A and DBA B started to use SQL Server with version 2005 (9) and don’t know anything about DTS


    Is there really one super DBA who knows all these technologies? I don’t think so; it is time to start specializing. Besides the stuff you absolutely need to know like T-SQL, the internals and maintenance, you will have to pick your direction.


    So I ask you the reader: what brings the future for the SQL Server DBA?

  • A year in review, The 31 best blog posts on SQLBlog for 2008

    Wow, it has already been a year since I wrote A year in review, The 21 + 1 best blog posts on SQLBlog last year

    Here we are with another fascinating collection of links, some chosen for their content and some for the great comments.

    1. A Gift of Script for 2008: Who's Active, What Are They Doing, and Who is Blocked? New version published today here A Gift of Script for 2009: Who is Active, Redux
    2. Partitioning enhancements in SQL Server 2008
    3. Five Different Ways To Return Data From One Table Which Doesn't Exists In another Table
    4. When GETDATE() is not a constant
    5. SQL Cache Dependencies
    6. Storage Performance for SQL Server
    7. Is the 8KB page obsolete or aging?
    8. Let's deprecate UPDATE FROM!
    9. Geek City: Nonclustered Index Keys
    10. Which to use: "<>" or "!="?
    11. Execution Plan Costs
    12. The Most Important Performance Monitor Counters for SQL Server
    13. SQL Teaser: Some Things Are Not What They Seem
    14. SQL Server Query Processing Puzzle: LIKE vs ?
    15. Performance / Storage Comparisons : MONEY vs. DECIMAL
    16. SQL Server 2008 Page Compression: Using multiple processors
    17. Did You Know? I have a question for you on DBA Blunders!
    18. I have a question for you on DB development blunders!
    19. Triggers...Evil?
    20. Indexing Foreign Keys - should SQL Server do that automatically?
    21. Data modeling: art or science?
    22. Are inserts quicker to heap or clustered tables?
    23. Large Query Performance and Query Parallelism
    24. Large Query Performance from SQL Server 2000 to 2008, 32 & 64-bit
    25. Geek City: Reading the Transaction Log
    26. An open letter to all 3rd-party vendors: DO NOT USE SA ANYWHERE IN YOUR APPLICATION
    27. Defensive database programming: qualifying column names.
    28. Defensive database programming: rewriting queries with NOT IN().
    29. My stored procedure "best practices" checklist
    30. Geek City: When is FULL Recovery not Really FULL Recovery
    31. LINQ to SQL: Does it have much of a future?

    Which was your favorite post of the past year?

  • SQL Quiz: Toughest Challenges

    Chris Shaw posted a new SQL Quiz where he asks: "What are the largest challenges that you have faced in your career and how did you overcome those?"

    Denny Cherry (@mrdenny on twitter) tagged me, so here is my story:

    The biggest problem I had was when a developer I worked with went on vacation. It was agreed upon by us and the client that once that developer came back from vacation we would make her changes 'live'. Needless to say, a week after she left the client needed this code now because the tax code changed (or something like that). I did not really know what she had worked on specifically, nor did I know whether whatever the latest thing was on development should be pushed out to production. We had some code on the QA box, but that was not the latest and greatest either. The biggest problem was that I didn't really know which procs had changed between dev/QA and production. I tried to do this manually, but after 4 hours or so I got frustrated because I was scared I would mess up something on the production box.

    This is where third-party tools come in :-) I fired up SQL Compare and SQL Data Compare (for the lookup tables mostly), saw what needed to be changed on production, scripted out everything, and I was in business. Yes, I dropped tables on a production box (one table really) and messed up data, but that I could fix fairly easily with a point-in-time restore, and I knew what needed to be done. The problem is when you are faced with the unknown; what you do then will decide whether you sink deeper into the hole you are already in or come out of that hole saving the company/CEO/data.....


    So now it is my turn to tag some victims. The first person I am tagging is George Mastros, an amazing SQL developer and a good friend. Next up is Alexander Kuznetsov, because he was giving me a hard time with the interview questions post. Another person I am tagging is Aaron Bertrand (@aaronbertrand on twitter), because I learned a lot from him on the newsgroups when I was lurking. Finally I am tagging Andy Leonard because he is on twitter (okay, so I had to come up with a reason, @andyleonard)
