|
|
|
|
-
One of the more difficult things to master for people coming to SQL seems to be the Outer Join. The common mistake people make is of course putting the condition in the WHERE clause from the outer joined table which should have been in the JOIN itself. There was a question about this on Stackoverflow and a person claimedhe did have an Left Join because...well....he had Left Join in his SQL statement...
Let's take a look at some code, create these 2 tables
CREATE TABLE Home( HomeID INT not null,
City VARCHAR(200) not null,
Taxes DECIMAL(16,2)not null)
go
INSERT Home VALUES(1,'New York City',12000.00)
INSERT Home VALUES(2,'Sacramento',11000.00)
INSERT Home VALUES(3,'Wichita',6000.00)
INSERT Home VALUES(4,'Tampa',9000.00)
go
CREATE TABLE HomePhotos(HomePhotosID INT not null,
HomeID INT not null,
HasColorPhoto BIT not null)
go
INSERT HomePhotos VALUES(1,1,1)
INSERT HomePhotos VALUES(2,1,1)
INSERT HomePhotos VALUES(3,3,0)
INSERT HomePhotos VALUES(4,3,0)
INSERT HomePhotos VALUES(5,4,1)
go
If we do a regular join
SELECT h.*,hp.HomePhotosID FROM Home h
join HomePhotos hp ON h.HomeID = hp.HomeID
Our output is this HomeID City Taxes HomePhotosID 1 New York City 12000.00 1 1 New York City 12000.00 2 3 Wichita 6000.00 3 3 Wichita 6000.00 4 4 Tampa 9000.00 5
A left join
SELECT h.*,hp.HomePhotosID FROM Home h
LEFT join HomePhotos hp ON h.HomeID = hp.HomeID
Gives us this
HomeID City Taxes HomePhotosID 1 New York City 12000.00 1 1 New York City 12000.00 2 2 Sacramento 11000.00 NULL 3 Wichita 6000.00 3 3 Wichita 6000.00 4 4 Tampa 9000.00 5
Now what will these 2 queries bring back?
SELECT h.*,hp.HomePhotosID,hp.HasColorPhoto
FROM Home h
LEFT join HomePhotos hp
ON h.HomeID = hp.HomeID
AND h.Taxes <= 11000
SELECT h.*,hp.HomePhotosID,hp.HasColorPhoto
FROM Home h
LEFT join HomePhotos hp
ON h.HomeID = hp.HomeID
AND h.Taxes = -555
Did you guess correctly?
It is helpful to know the order of logical query processing, here is what happens in general FROM (incl joins) ON OUTER WHERE GROUP BY HAVING SELECT ORDER BY TOP
|
-
Let's beat this horse to death once again shall we, Paul Nielsen wrote about SQL in the cloud already here: http://sqlblog.com/blogs/paul_nielsen/archive/2009/02/24/sql-in-the-cloud.aspx where he made the following prediction:
In five years time, hosting your own SQL data on your own servers will seem as obsolete as running your own dial-up BBS.
I am not so sure about that!
I attended the Microsoft Enterprise Developer Conference in New York City on Tuesday and Wednesday. Last year it was all about High Performance Computing, this time it was all about the cloud(I know, big surprise right?)
I am going to focus this post on SQL Server Data Services. The SSDS session was presented by David Robinson who is a senior program manager on the SQL team. As you can imagine David is the perfect person to ask all kind of questions about what you can and can't do in SSDS. If you would like to watch this session yourself then visit this link: http://entdevcon.istreamplanet.com/video.asp?v=36
Oh, and that annoying person sitting in the front row asking all those questions......yeah that's me
Here are the reasons I think some customers won’t be able to move to the cloud just yet
1) 10GB limit
This is a biggie for me since I have almost no databases that are smaller than the size limit. You can create many databases and implement some kind of sharding but you need to do this yourself. There is no support for distributed/federated views across databases. You also cannot use USE syntax. Remember all that dynamic SQL that you wrote that started with USE DatabaseName..probably not a good idea.
2) No execution plan
Okay so how do you check if your query will perform well?You can’t do a SET SHOWPLAN command either
3) Costly queries will be terminated
I might have a query at the end of the month that calculates correlations for 90000 indexes and the query runs for 5 minutes, a query like this could be terminated
4)No CLR
So you have that superfast split function coded in CLR,that won't fly in the cloud mister, you are out of luck. Spatial data is another thing that is not supported
5) No analysis or reporting services
Probably not as big of a deal but it would still mean that you now need a separate license to run SSAS or SSRS on campus
6) No DMVs,
If you use DMVs or catalog views to build logic to do certain things in SQL you will need to rethink that
I didn’t hear (and forgot to ask) about scheduling jobs or if you have access to msdb/sql agent in any form. I also don’t know if you can run SSIS packages in the cloud.
Below is a picture that shows you what is and what is not available for V1

So what do you think? Will you move to the cloud anytime soon?
|
-
I have been tagged by Aaron Alton as part of the latest meme and the question this time around is: Who has been a great leader in your career and what made them a great leader?
This is a tough one and this will probably sound like a cop out but it is not. I will actually pick two people one of them is real and the other ‘one’ is more than one single person
Let’s start with my previous manager Mike. Mike is a manager who still writes code daily, he understands the life of a developer. When you tell him that something will take 2 weeks he will ask you if you are sure because it sounds like that isn’t enough time. Mike was never a person who would pressure you into committing to a tighter schedule. When you arrived at work Mike was there, when you went home Mike was still there….I really felt bad leaving half the time knowing that Mike was still in his cube and here I am leaving to go home. If you need a tool there was never a problem getting it. Mike was the person who got our first real bug tracking software (Jira by Atlassian) and he also got us Subversion servers.
Scrum was introduced to our team by Mike. I remember a time when it was common for me to go over documentation with the rest of the team for 3 weeks straight because we had a 9 month project. After 9 months the client would of course tell you that it wasn’t really what they wanted. Scrum changed things drastically, we pushed out two projects, and each was a three or four three week sprint. The feedback was immediate and there was no time wasted implementing feature that nobody really wanted, stuff we weren’t sure about were put in the ‘nice to have’ bucket. You needed books, no problem; an email would get you what you need.
The second ‘person’ is ‘the wisdom of the crowds’ or ‘collective intelligence’
What do I mean by that? These are the people who are active in the newsgroups, forums, blogs, email lists, user groups, write books etc etc. Not one of these people made a reaaly huge impact but all of them combined did.
Newsgroups/forum
One of the better ways to learn SQL is still to participate in newsgroups/forums. In the beginning you will probably ask more than you answer but after a while you will see that 50% of the questions are the same 20 questions asked over and over again with a slight variation in the question itself. You can just lurk in these newsgroups and take in all the great information that has been passed around. There are many people that taught me some really good stuff; here is just a small number: Aaron Bertrand, Adam Machanic, Denny Cherry, vongrunt, George Mastros, Jonathan Kehayias, Itzik Ben-Gan, David Portas, Ted Krueger, Emtucifor, Uri Dimant, Dan Guzman................
Bloggers/website
Blogs have really made a difference for developers in the last 5 years. There are blogs that specialize just in one thing in terms of a product. If you want to learn stuff about checkdb and indexing, then you visit Kimberly Tripp and Paul Randal’s blog. If you are interested in admin related stuff then Denny Cherry is the man. If you want to get someone’s perspective on large data sets then Michelle Ufford is your woman. Dynamic SQL interest you and you want to learn more? Erland Sommarskog has the definitive page. Datetime data? Tibor Karaszi has all the answers. Then there are blogs where a bunch of really good bloggers all blog on the same site; sqlblog is one of these
Authors and content creators
All this online stuff is nice but I still prefer to hold a ‘real’ tree disappearing book in my hand. Every SQL developer should own a bunch of these books, following is a short list. Ken Henderson’s The Guru’s guide to SQL, The Inside SQL Server series by Kalen Delaney, Itzik Ben-Gan and others, Louis Davidson design book, Adam Machanic’s Expert SQL book, Joe Celko’s for smarties and puzzle books. In terms of Video/Audio content you should check out Greg Low’s SQLDownunder podcasts, the podcast on SQLServerPedia, DotNetRocks and Channel 9
The SQL community
All the #sql folks on twitter who make everybody suffer from ADD these days, yes Brent Ozar and Jason Massie I am talking about you.
And I will tag no one :-)
|
-
I have used IN many times in my life, usually I would use hardcoded values. Something like these two queries
SELECT *
FROM SomeTable
WHERE state in ('AZ','NY','NJ')
SELECT *
FROM SomeTable
WHERE value in (1,2,3,4,5)
Today I saw a query on a forum where a person used column names, this got me interested and I fired up this query in the adventureworks database
SELECT * FROM person.contact
WHERE 'thomas' in(Firstname,LastName,LEFT(emailaddress,7))
That query will return all the rows where Lastname or Firstname or the first 7 characters of the emaill address have the value Thomas
Here is what the execution plan looks like
|--Clustered Index Scan(OBJECT:([AdventureWorks].[Person].[Contact].[PK_Contact_ContactID]), WHERE:(N'thomas'=substring([AdventureWorks].[Person].[Contact].[EmailAddress],(1),(7)) OR N'thomas'=[AdventureWorks].[Person].[Contact].[LastName] OR N'thomas'=[AdventureWorks].[Person].[Contact].[FirstName]))
The way I usually write a query like that is like this
SELECT * FROM person.contact
WHERE Firstname ='thomas'
OR lastname ='thomas'
OR left(emailaddress,7) ='thomas'
Here is the plan for that
|--Clustered Index Scan(OBJECT:([AdventureWorks].[Person].[Contact].[PK_Contact_ContactID]), WHERE:([AdventureWorks].[Person].[Contact].[FirstName]=N'thomas' OR [AdventureWorks].[Person].[Contact].[LastName]=N'thomas' OR substring([AdventureWorks].[Person].[Contact].[EmailAddress],(1),(7))=N'thomas'))
Or better yet like this
SELECT * FROM person.contact
WHERE Firstname ='thomas'
OR lastname ='thomas'
OR emailaddress LIKE 'thomas%'
The plan for that query is below
|--Clustered Index Scan(OBJECT:([AdventureWorks].[Person].[Contact].[PK_Contact_ContactID]), WHERE:([AdventureWorks].[Person].[Contact].[FirstName]=N'thomas' OR [AdventureWorks].[Person].[Contact].[LastName]=N'thomas' OR [AdventureWorks].[Person].[Contact].[EmailAddress] like N'thomas%'))
So my question to you is do you ever write a query with columns inside IN?
|
-
To use compression you have to understand what your data looks like, if your data is mostly unique then compression might not really help a lot. If your data is repeated on a page then it could be beneficial. In my own test compression would improve performance with some tables and not so much with other tables. I won't explain how compression works, this is all explained in books on line. the goal of this post is for you to have some code that you can run and then see how page and row level compression differs compared to the original data.
Here is some code that you can run to see how compression works. the data I have used is similar to what we use at my job(but not the same, i used fake tickers so don't try to invest in those ;-0). I have used just the 10 industry groups from the Industry Classification Benchmark, I left out supersectors,sectors and subsectors otherwise we would have a lot more rows per table. I also did not create fake ISIN Code,SEDOL and CUSIP columns since that would have bloated the code in this post. The code will generate 2780650 rows per table, this is a nice size to test with. Compression is really nice when you get charged $30 per GB for the SAN, so besides performance there is also a cost benefit in storage
Here is the code This block of code will create the database, setup the lookup tables and create the tables for our tests
use master
go
create database CompressionTest2
go
use CompressionTest2
go
--our Industry Classification Benchmark table
Create table IndustryGroup(GroupCode char(4) not null primary key,
GroupDescription varchar(40))
GO
--just industry groups
insert IndustryGroup values('0001','Oil & Gas' )
insert IndustryGroup values('1000','Basic Materials' )
insert IndustryGroup values('2000','Industrials')
insert IndustryGroup values('3000','Consumer Goods')
insert IndustryGroup values('4000','Healthcare')
insert IndustryGroup values('5000','Consumer Services')
insert IndustryGroup values('6000','Telecommunications')
insert IndustryGroup values('7000','Utilities')
insert IndustryGroup values('8000','Financials')
insert IndustryGroup values('9000','Technology')
GO
--currency table
Create table Currency (CurrencyCode char(3) not null primary key, CurrencyDescription varchar(30))
GO
--just handful of currencies
insert currency values('USD','U.S. Dollar')
insert currency values('AUD','Australian Dollar')
insert currency values('CAD','Canadian Dollar')
insert currency values('JPY','Japanese Yen')
insert currency values('MXN','Mexican Peso')
insert currency values('GBP','U.K. Sterling')
insert currency values('EUR','European Euro')
insert currency values('ISK','Iceland Krona')
insert currency values('BGN','Bulgarian Lev')
insert currency values('RON','Romanian Leu')
insert currency values('INR','Indian Rupee')
insert currency values('RUB','Russia Rubles')
insert currency values('BHD','Bahrain Dinar')
insert currency values('EGP','Egypt Pounds')
insert currency values('JOD','Jodan Dinars')
insert currency values('KWD','Kuwait Dinars')
insert currency values('MAD','Morocco Dirham')
insert currency values('OMR','Omam Rial')
insert currency values('QAR','Qatari Rial')
GO
--market cap table
create table MarketCap (MarketCap varchar(20) not null primary key,MarketCapDescription varchar(100) )
GO
--left out a bunch of market caps
insert MarketCap values('ALL','broad')
insert MarketCap values('MID','mid cap')
insert MarketCap values('MCR','micro cap')
insert MarketCap values('SML','small cap')
insert MarketCap values('LRG','large cap')
--calendar table
create table Calendar (CalendarDate date not null primary key)
GO
insert Calendar
select dateadd(d,number,'19920101') from master..spt_values
where type = 'p'
and datepart(dw,dateadd(d,number,'20080101')) not in (1,7)
union
select dateadd(d,number,'19970810') from master..spt_values
where type = 'p'
and datepart(dw,dateadd(d,number,'19970810')) not in (1,7)
--the table that we will test against
create table IndexCloseValues (CalendarDate date not null,
IndexSymbol varchar(30) not null,
GroupCode char(4) not null,
CurrencyCode char(3) not null,
MarketCap varchar(20) not null,
CloseValue decimal(30,10),
TotalReturnClose decimal(30,10))
--2780650 rows
insert IndexCloseValues
select CalendarDate,
CurrencyCode + left(GroupCode,1) + MarketCap as Symbol, --fake tickers
GroupCode ,CurrencyCode,MarketCap,
ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) *0.00100,
ROW_NUMBER() OVER( ORDER BY CalendarDate,GroupCode,CurrencyCode,MarketCap ) *0.247001
from IndustryGroup i cross join currency c
cross join MarketCap
cross join calendar
GO
--create a copy for page compression
select * into IndexCloseValuesPage
from IndexCloseValues
Go
--create a copy for row compression
select * into IndexCloseValuesRow
from IndexCloseValues
GO
--add unique constraint
ALTER TABLE IndexCloseValues WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValues UNIQUE Clustered (CalendarDate,IndexSymbol)
GO
--add unique constraint
ALTER TABLE IndexCloseValuesPage WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValuesPage UNIQUE Clustered (CalendarDate,IndexSymbol)
GO
--page compression
ALTER TABLE IndexCloseValuesPage
REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
--add unique constraint
ALTER TABLE IndexCloseValuesRow WITH NOCHECK
ADD CONSTRAINT UQ_IndexCloseValuesRow UNIQUE Clustered (CalendarDate,IndexSymbol)
GO
--row compression
ALTER TABLE IndexCloseValuesRow
REBUILD WITH (DATA_COMPRESSION = ROW);
GO
Now that everything is setup we can look how big the tables are. Run the code below (old school I know)
dbcc showcontig('IndexCloseValues')
dbcc showcontig('IndexCloseValuesPage')
dbcc showcontig('IndexCloseValuesRow')
Table: 'IndexCloseValues' (213575799); index ID: 1, database ID: 16 TABLE level scan performed. - Pages Scanned................................: 23767 - Extents Scanned..............................: 2972 - Extent Switches..............................: 2971 - Avg. Pages per Extent........................: 8.0 - Scan Density [Best Count:Actual Count].......: 99.97% [2971:2972] - Logical Scan Fragmentation ..................: 0.01% - Extent Scan Fragmentation ...................: 0.17% - Avg. Bytes Free per Page.....................: 23.3 - Avg. Page Density (full).....................: 99.71%
Table: 'IndexCloseValuesPage' (245575913); index ID: 1, database ID: 16 TABLE level scan performed. - Pages Scanned................................: 9307 - Extents Scanned..............................: 1165 - Extent Switches..............................: 1164 - Avg. Pages per Extent........................: 8.0 - Scan Density [Best Count:Actual Count].......: 99.91% [1164:1165] - Logical Scan Fragmentation ..................: 0.04% - Extent Scan Fragmentation ...................: 0.77% - Avg. Bytes Free per Page.....................: 10.7 - Avg. Page Density (full).....................: 99.87%
Table: 'IndexCloseValuesRow' (261575970); index ID: 1, database ID: 16 TABLE level scan performed. - Pages Scanned................................: 13255 - Extents Scanned..............................: 1657 - Extent Switches..............................: 1656 - Avg. Pages per Extent........................: 8.0 - Scan Density [Best Count:Actual Count].......: 100.00% [1657:1657] - Logical Scan Fragmentation ..................: 0.01% - Extent Scan Fragmentation ...................: 0.36% - Avg. Bytes Free per Page.....................: 11.4 - Avg. Page Density (full).....................: 99.86%
As you can see compression really reduced the pages for the table, page level compression looks really good here.
First let's look at the execution plans, running the following block of code....
select * from IndexCloseValues
where CalendarDate = '19920101'
select * from IndexCloseValuesPage
where CalendarDate = '19920101'
select * from IndexCloseValuesRow
where CalendarDate = '19920101'
...will result in this plan
Increasing the query to return a month of data instead of a day
select * from IndexCloseValues
where CalendarDate between '19920101' and '19920121'
select * from IndexCloseValuesPage
where CalendarDate between '19920101' and '19920121'
select * from IndexCloseValuesRow
where CalendarDate between '19920101' and '19920121'
will result in this plan.
You can draw your own conclusions from those images
Let's look at some reads, first turn statistics io on
SET STATISTICS IO ON
go
select * from IndexCloseValues
where CalendarDate = '19920101'
select * from IndexCloseValuesPage
where CalendarDate = '19920101'
select * from IndexCloseValuesRow
where CalendarDate = '19920101'
950 row(s) affected)
Table 'IndexCloseValues'. Scan count 1, logical reads 12, physical reads 0......
(950 row(s) affected)
Table 'IndexCloseValuesPage'. Scan count 1, logical reads 7, physical reads 0......
(950 row(s) affected)
Table 'IndexCloseValuesRow'. Scan count 1, logical reads 8, physical reads 0......
Those are some nice numbers for the reads, now we will increase the date range to one month
select * from IndexCloseValues
where CalendarDate between '19920101' and '19920121'
select * from IndexCloseValuesPage
where CalendarDate between '19920101' and '19920121'
select * from IndexCloseValuesRow
where CalendarDate between '19920101' and '19920121'
(14250 row(s) affected)
Table 'IndexCloseValues'. Scan count 1, logical reads 125, physical reads 0......
(14250 row(s) affected)
Table 'IndexCloseValuesPage'. Scan count 1, logical reads 52, physical reads 0......
(14250 row(s) affected)
Table 'IndexCloseValuesRow'. Scan count 1, logical reads 69, physical reads 0......
When selecting more data the numbers look even better.
Turn statistics io off again
SET STATISTICS IO OFF
go
So as you can see compression reduces the reads by over half when using page compression.
|
-
Today's teaser is very simple First create this table CREATE TABLE #Tran (TranCountDefault int DEFAULT (@@TranCount),TranCountPassedIn int) As you can see that table has two columns, one column has a default of @@TRANCOUNT. Now run this piece of code DECLARE @trancount int SELECT @trancount = @@TranCount INSERT #Tran (TranCountPassedIn) VALUES (@trancount) We assigned @@TRANCOUNT to the @trancount variable and we passed that in, so TranCountDefault has the default value and TranCountPassedIn has the value we passed in Now when you do a select from the table will the 2 columns have the same value? SELECT * FROM #Tran
|
-
After Paul Nielsen post SQL in the Cloud yesterday I decided to put on my Nostradamus hat and give you my take on this.
What needs to happen before we can move to the cloud? There are a couple of things, here is a partial list
Performance has to be good If performance is bad then nobody will move to the cloud. Since the data is external you probably want to have a secure connection and thus you will slow things down
Security has to be implemented Can you give user A only read access and user B read\write access. What if you don't want certain users to see the salary column?
Bulk uploading of data has to be implemented The ability to upload a file and specify the target object to load the data into has to be implemented. We get files via ftp all the time, nobody wants to parse the files locally and insert row by row.
The ability to do merge replication with disconnected handheld devices This is not so much an issue anymore since almost all devise these days are online but if you have some legacy junk this would be nice.
The ability to restore to a point in time You uploaded some bad or old data and hosed everything. On a SQL box you would just do a point in time restore (if you had backups that were valid of course and the correct recovery model) and you would be done.
Alerts and Notifications You want to know every time some product level falls below a certain threshold, someone added or modified a table etc etc.
Auditing, SOX, HIPAA, GAAP and all your other favorite acronyms Unless this stuff is implemented no public company will move to the cloud.
Bandwidth cost has to be low There are tons of FoxPro, Access Applications Excel sheets and in house developed applications that connect directly to SQL Server, get a ton of data and then do something with that. When this is all in your office you don't have bandwidth cost. If your data lives in the cloud this could add up. I will give you another example, we all compress our content before it is pushed to the client from our websites right(rhetorical don't answer)? Guess what? Amazon doesn't do compression because they get paid for bandwidth, compressing would lower their revenue.
You need to have the ability to profile There has to a way to profile and performance tune your cloud database, after all that is the most fun part of a database developer, you can’t take away instant gratification
What will go in the cloud first? The first thing that will ascend into the clouds are your typical internet applications, this will also include RIA (or as I call them Chubby Clients) that are built on Adobe AIR, JavaFX and Silverlight. Ever heard of TweetDeck? It is a twitter client written in Adobe AIR, stuff like that is perfect because you don't care where the data is stored; you just want to call a RESTful API and get your data. Websites that don’t store (or don’t care) about sensitive information are going to store their stuff in the cloud, now we have real C2C (Cloud to Cloud) since websites live in the cloud to begin with.
What will not go in the cloud any time soon? Any app that is directly connected to a database server and does some crazy manipulation/calculation of data, the bandwidth cost might just be too high Departmental (Small business edition) database servers or SQL Server Express edition, this stuff is cheap enough that I can’t imagine there being an incentive to move to the cloud. Shops like this usually have one person to manage all the hardware to begin with so they would not save anything. I am also having a hard time seeing those lovely Excel sheets that are connected to SSAS cubes connecting to the cloud instead.
We cannot really predict anything unless we have some pricing. I aslo think virtualization of databases will be mainstream sooner than the cloud, one huge box 20 SQL Servers on it...only one box to manage If you want you can vote (you need to register), right now it is mostly no cloud So what is your opinion on this, The Cloud is it vaporware or inevitable?
|
-
We all know that CURRENT_TIMESTAMP is ANSI and carries the Celko stamp of approval while GETDATE() is proprietary. So now run the following line of code
CREATE TABLE duh2(id DATETIME CONSTRAINT df_duh DEFAULT CURRENT_TIMESTAMP) No problem right? Let's take a look at what is actually created, run the query below
SELECT object_definition(id) FROM sysobjects WHERE name = 'df_duh' Interesting, the object definition for the constraint shows getdate() not current_timestamp. Let's verify that by using sp_help, run the query below and scroll all the way down, look at the constraint_keys column
SP_HELP 'duh2' Same thing,it shows getdate() What happens when you script out the table?
/****** Object: Table [dbo].[duh2] Script Date: 02/22/2009 20:58:41 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[duh2]( [id] [DATETIME] NULL CONSTRAINT [df_duh] DEFAULT (GETDATE()) ) ON [PRIMARY]' As you can see SQL Server changes CURRENT_TIMESTAMP to GETDATE(). Is this a bug/feature or something else? What if you use the same table on more than one RDBMS and want to quickly script the table from SQL Server so that you can create it on somewhere else?
|
-
I got tagged by Go Daddy girl Michelle Ufford in her post
Things you know now…. I also got tagged by Denny Cherry in his post Things You Know Now...
So here is my little post. Before I start I will say that I thought of several things that are the same like the other people that were tagged already wrote. One of them is of course participating in newsgroups/forums, another is have a blog and writing about the technologies you are using. I will not cover those two here for the reason stated above.
If you can buy it don’t write it.
What do I mean by that? Take for example having to write code that checks for any schema or data changes between staging and production servers. Sure you can spend a couple of days writing that and then another couple of days for testing/debugging or you can have your boss pay a half day of your salary(this depends on location) for SQL Compare or something similar. This enables you to work on real stuff like those quarter end reports. Your work will be done faster and the tool will probably be much better and do so much more than you could ever write.
First get to know a variety of technologies and then zoom in on one of them.
If you are someone who know a lot of different things but you are not an expert in any of them then it is time to specialize. There was a time when I worked with SQL Server, Coldfusion, JSP, ASP, HTML, XML, JavaScript etc etc. I was decent in all of those (or so I thought) but I wasn’t an expert in any. When companies are looking for people there is always one thing that they are really looking for with a bunch of required/nice to have things. If you know something about lots of things but not a lot about one thing then most likely you will not pass the technical interview because your knowledge is not deep enough.
Use the internet to see if it exists before you reinvent it
6 or 7 years ago I wrote some paging code in which you could pass in what page you want, how many rows per page etc. A year or so later I discovered that Aaron Bertrand had something very similar on his aspfaq site. If I knew this beforehand it could have saved me hours of debugging and perfecting this code
Know your product well
Believe it or not I wrote once a whole bunch of code to parse a proc and to get all the params and their data type and position in the proc to document this. This took me a whole day or so…..later on I realized I could have just used INFORMATION_SCHEMA.PARAMETERS and be done in two minutes. It is important to get to know the inner workings of your product, for SQL Server this means you need to study the system tables and the dynamic management views. There is so much information that is exposed through the dynamic management views that it will help you with a ton of stuff like metadata about your system, performance problems etc
Technology is a curse and blessing at the same time
It is a blessing because it changes always so you can work with the latest and greatest. It is a curse because you need another workday after work just to keep up with all the changes. When SQL Server 2000 came out there didn’t really change a whole lot compared to SQL Server 7. We got indexed views, user defined functions, uniqueidentifiers and a couple of other things. From SQL Server 2000 to SQL Server 2005 was a whole other story. Just to figure out SSIS took me a lot longer than the new stuff in SQL Server 2000. This is partly my fault because I was messing around with CTEs, Ranking functions, DMVs, partitioning functions and a whole lot more stuff. I did not make this mistake with 2008, once the first CTP came out I loaded it up and started tomes around with it. Unfortunately SQL Server 2008 is so massive now that you will have to specialize within that product. Are you a DBA, Hybrid DBA, Developer, BI Developer, Data Warehouse Architect, Performance Guru….you get the point
There is one more thing....I wish I knew the lotto numbers when the jackpot hit over 100 million $$$$ :-)
I will not tag anyone since I know you have to prepare for Valentines Day
|
-
George, a good friend of mine created a blog post showing how you can do a SQL Server Zipcode Latitude/Longitude proximity distance search without using the geography data type. I searched the internet to see if anyone had something similar with the geography data type available in 2008 and could not find anything that showed some useful stuff. Yes there is a lot available with geometry and polygons but nothing I was looking for. George challenged me and I did the 2008 version. There is nothing really complicated. if you run this DECLARE @g geography; DECLARE @h geography; SET @h = geography::STGeomFromText('POINT(-77.36750 38.98390)', 4326); SET @g = geography::STGeomFromText('POINT(-77.36160 38.85570)', 4326); SELECT @g.STDistance(@h)/1609.344; You will see that the distance in miles between those two points is 8.8490611480890067 In the end the code runs between 15 and 60 millisecond to get all the zipcodes within 20 miles of zipcode 10028, pretty impressive if you ask me
All the code including sample data for all the zip codes in the US can be found here The reason I did not post it here is because George created the 2000 version so it would be weird if the 2008 version was somewhere else, but no worries I will have a post here tomorrow about "what do you wish you knew when you were starting?" Michelle Ufford (aka SQLFool) tagged me so look forward to that
|
-
Some Simple Code To Show The Difference Between Newid And Newsequentialid
In SQL Server 2000 we got the uniqueidentifier data type and the newid() function
Lots of people thought that newid() would be very handy to create some unique values across all databases. Newid() is nice but it has a little side effect; it causes terrible page splits because it is a random value. SQL Server 2005 introduced newsequentialid() where each value generated by the function is always greater than the previous value.
Let's take a look
First we will create these two tables CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER not null DEFAULT newid(), SomeDate DATETIME, batchNumber BIGINT) CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER not null DEFAULT newsequentialid(), SomeDate DATETIME, batchNumber BIGINT) now run this block of code to insert 1000 rows in each table SET NOCOUNT ON INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),1) go 1000 SET NOCOUNT ON INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),1) go 1000 Create these two clustered indexes CREATE CLUSTERED INDEX ix_id1 ON TestGuid1(id) CREATE CLUSTERED INDEX ix_id2 ON TestGuid2(id) Run the code below DBCC showcontig ('TestGuid1') WITH tableresults DBCC showcontig ('TestGuid2') WITH tableresults You will see that AvgerageFreeBytes is the same for both tables What about the inserts themselves? Run this code below SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) FROM TestGuid1 GROUP BY batchNumber SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) FROM TestGuid2 GROUP BY batchNumber The table with the Newsequentialid did the inserts about 40% faster. Now we will insert 1000 rows in each table again SET NOCOUNT ON INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),2) go 1000 SET NOCOUNT ON INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),2) go 1000 No we will look again what happened DBCC showcontig ('TestGuid1') WITH tableresults DBCC showcontig ('TestGuid2') WITH tableresults Wow the first table uses 21 pages while the second one uses 12. AvgerageFreeBytes is 3524 bytes per page in the first table and only 96 bytes per page in the second table. Obviously newsequentialid is the better choice. Running the code below you will see that newid is still slower than newsequentialid by about 16% or so for batch 2 SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) FROM TestGuid1 GROUP BY batchNumber SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) FROM TestGuid2 GROUP BY batchNumber Now we will insert 10000 rows and then look at freespace and duration again SET NOCOUNT ON INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),3) go 10000 SET NOCOUNT ON INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),3) go 10000 DBCC showcontig ('TestGuid1') WITH tableresults DBCC showcontig ('TestGuid2') WITH tableresults The first table uses 117 pages while the second one uses 80. AvgerageFreeBytes is 2574 bytes per page in the first table and only 21 bytes per page in the second table. Running the code below you will see that newid is still slower than newsequentialid by about 40% or so for batch 3 SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) FROM TestGuid1 GROUP BY batchNumber SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) FROM TestGuid2 GROUP BY batchNumber Clean up DROP TABLE TestGuid2,TestGuid1 So that is all, do you use newid or newsequentialid and if you do use newid did you experience fragmentation or performance problems because of it?
|
-
Take a look at this code, will the select return anything or not? Look at the subquery and notice that there is no FROM there.
USE tempdb
GO
CREATE TABLE TestQuery (TestID int)
GO
SELECT * FROM information_schema.columns
WHERE column_name =( SELECT column_name
WHERE column_name = 'TestID')
|
-
This is a very exciting/confusing/glooming period for SQLServer Developers/DBAs On one hand there is the ‘cloud’ which will take our precious boxes away from us and we won’t be needed to manage it anymore. On the other hand is an avalanche of technologies bundled with every release of SQL Server. Back in the 2000 days all you really needed to know was VB/VBA/VBS DTS T-SQL MDX SSAS DOS WSH Profiler/QA and other tools I can just see the skills required for a SQL Server DBA these days VB/VBA/VBS C# SSIS DTS Powershell T-SQL MDX SSAS SSRS SSNS LINQ Entity Framework SSDS SSMS/ Profiler/QA/BI Studio and other tools With Kilimanjaro this list will only get longer. Soon your IT shop will look like a hospital. You need some SSIS with C#? Talk to DBA A You need some Powershell automation with SMO? Talk to DBA B since DBA A doesn’t know Powershell You need some DTS? Talk to DBA C because DBA A and DBA B started to use SQL Server with version 2005 (9) and don’t know anything about DTS Is there really one super DBA who knows all these technologies? I don’t think so; it is time to start specializing. Besides the stuff you absolutely need to know like T-SQL, the internals and maintenance, you will have to pick your direction. So I ask you the reader: what brings the future for the SQL Server DBA?
|
-
-
Chris Shaw posted a new SQL Quiz where he asks: "What are the largest challenges that you have faced in your career and how did you overcome those?"
Denny Cherry (@mrdenny on twitter)tagged me so here is my story:
The biggest problem I had was when a developer I worked with went on vacation, it was agreed upon by us and the client that once that developer came back from vacation we would make her changes 'live'. Needless to say a week after she left the cliend needed this code now because the tax code changed (or something like that) I did not really know what she worked on specifically, nor did I know if whatever the latest thing was on development should be pushed out to production. We had some code on the QA box but that was not the greatest and latest either. The biggest problem was that I didn't really know what procs had changed between dev/qa and production. I tried to do this manually but after 4 hours of so got frustated because I was scared I would mess up something on the production box.
This is where third party tools come in :-) I fired up SQL Compare and SQL Data Compare (for the lookup tables mostly) saw what needed to be changed on production, scripted out everything and I was in business. Yes I dropped tables on a production box (one table really) and messed up data but that I could fix fairly easy with a point in time restore and I knew what needed to be done. The problem is when you are faced with the unknown, what you do then will decide if you sink deeper into the hole you are already in or if you come out of that hole saving the company/CEO/data.....
So now it is my turn to tag some victims. first person I am taging is George Mastros, an amazing SQL developer and a good friend. Next up is Alexander Kuznetsov because he was giving me a hard time with the interview questions post. Another person I am tagging is Aaron Bertrand (@aaronbertrand on twitter) because I learned a lot of him from the newsgroups when I was lurking. Finally I am tagging Andy Leonard because he is on twitter (okay so I had to come up with a reason, @andyleonard)
|
|
|
|
|
|