I asked for some names of people who you would like to see interviewed here at Sqlblog and Erland Sommarskog's name popped up a couple of times. I contacted Erland and he was kind enough to take time out from his busy schedule to answer these questions. So, here are the questions.
What are the most important things a person can do to master SQL Server?
That is a very difficult question, because SQL Server is such a vast product.
What do you aim at? Being an infrastructure DBA that builds clusters and sets up replication, but don't know much about the business? Or do you want to do database design and work closely to the business analysts? Do you want to be a performance specialist? Or be a good SQL programmer? What sort of applications do you want to work with? OLTP? OLAP? Maybe you want to be in the ETL trade and work a lot with Integration Services? Or do you want to be a Reporting Services developer?
It goes without saying that depending on what you aim at, the exact answer will be different. If you want to be an infrastructure DBA, you need to get a very good understanding of RAIDs, SANs and all that, but for other roles this is of less interest.
Nevertheless, there are still some commons: whatever you aim at, you need experience. You can read books, you can attend classes and they can help you enter the next level. But it is through hands-on experience you learn. And not at least you learn from your own mistakes.
In terms of reading, there is one source that I like to highlight as better than many other: follow a public forum in the area you are interested in. You will see questions, and hopefully good answers to the questions. And sometimes you may be able to contribute with an answer yourself. What is great with the newsgroups is that questions reappears. As they say: repetition is the mother of all learning. If you read a book or a blogpost, your memory fades after a while. But if you see the same answer reoccurring, one day not only will you be able to answer the question yourself, but also use it for your own work.
What are the most important things a person can do to master Transact-SQL?
Obviously, you need learn think in sets, and resist the temptations to solve problems in loops. This can indeed be a challenge. I have recent years rewritten procedures that originally were designed for one at a time, and in some cases it was quite an undertaking. The reward at the end of the rainbow was vastly increased performance for big volumes.
You also need to get a very good understanding of indexing, and when an index is useful or not. This is because this has such a big impact on performance.
But more exactly how do you learn this well? Let me return to what I said above: Experience. When you have query that seems to work, don't stop there. What happens if you change that part of the query? What happens if you add that row that holds some funky data? How does the query plan look like? Does it use an index? If it doesn't, why not?
When it comes to understanding indexing and query plans, I have a tip: imagine that you had the data on a bunch of record cards, and you needed to traverse it yourself to find the desired data. How would you do this effectively?
What SQL Server books are on your bookshelf?
My time to read books is fairly limited. Mainly I read books when I'm travelling, and when I go basking lakeside in summer. So I don't go buying books by the dozen. And least of all SQL Books, I do like reading about other topics too.
But I have the Inside SQL Server 2005 series, which I definitely recommend for anyone who has some experience of SQL Server and want to raise to a higher level. Itzik's two books learn you some very valuable query techniques, and Kalen's two books give you a deeper look into the internals and query-tuning techniques. (Actually, I'm only on chapter 3 of the last book, but what I've seen this far looks promising.)
I have a few more titles; of which several hitherto are unread I will have to admit. The SQL Server-related book I return to the most often is the OLE DB Reference Manual, would you believe it.
Erland, the first time I came across your name was in the Microsoft public SQL Server programming group. Can you explain to us why you participate in newsgroups and forums?
I started to participate on Usenet groups when I got access to Usenet with my first job in the mid-eighties. In the beginning it was mainly for leisure, including the technical groups I followed. By time I became very active, and a search on Google news will reveal posts in newsgroups about music, linguistics, politics and Usenet itself.
All the time, I subscribed to technical newsgroups that related to what I was working with. So I subscribed to comp.databases.sybase when I worked with Sybase, and when I moved on to MS SQL Server I went over to comp.databases.ms-sqlserver. The idea was simple: ask a question when I needed help, and if there was a question to which I knew the answer, I answered it. And in between that I learnt things from other people's posts and answers.
At some point I found that I answered far more questions than I asked, and when I asked a question, I did not always get an answer any more. I also realized that the traffic was a bit low, and one I summer I started to visit microsoft.public.sqlserver.programming to find where the real action was going on.
These days I only follow SQL Server newsgroups with one exception, rec.games.trivia. And this is definitely the best era in my Usenet career. I help people, and the tone in newsgroups I follow is generally polite. Every once in a while I can feel that I answer the same questions, but it's still rewarding to know that I have been able to help someone. Not only with his urgent problem, but, I hope, he or she has also learnt something for the future. And every once there is a challenging problem from which I can learn myself.
Why do you have a site, why did you not write a book instead?
There are a couple of reasons. One is that I originally wrote these articles so that I could refer to them in my posts, rather than having to type the same thing over and over again. I could never dream of answering people by suggesting them to pay money to get the answer.
Another is that once a book is printed, it's printed. If there is an error, an omission, or I simply make new revelations on the topic, I cannot magically update all copies out there. That is easier to do on a web site.
Finally, there is a general feeling that a web site is more low key. Since no one pays for anything, I can produce articles in the pace I like, and I can permit myself to keep down the level of ambition a bit.
Which article was the hardest to write and can you explain why?
I think the article I have spent the most time on is the one on Arrays and Lists. With the performance appendix, this is by far the longest article, and one problem was that people kept suggesting me new methods all the time. Of which some were very good. (These days, I review all new ideas fairly critically, and it has to be really innovative to make the article. It's already long enough.) Add to this it takes some time to run all those performance tests and evaluate them.
But I think the most difficult to write was the one on error handling. It's definitely the most difficult to revise for SQL 2005, which is testified by the fact that I haven't started yet. One reason that error handling in SQL Server is so difficult to write about is that it is so inconsistent. TRY-CATCH has made this easier, but to understand the fine details, you still need to learn about statement-aborting errors, batch-aborting errors and all that jazz.
What is the most popular article on your site?
That seems to be "The Curse and Blessings of Dynamic SQL".
What new technologies in SQL Server 2005 do you think are the most beneficial for performance?
The row_number() function! It may seem like a simple thing, but there are so many problems that can be solved more efficiently with row_number(). Now, if Microsoft would only implement the missing parts of the OVER() clause!
Statement recompile is also very important. In SQL 2000, I often faced problems with long stored procedures that were recompiled several times during a single execution. That damage is recuded considerably with statement recompile.
Snapshot isolation is also a valuable addition, although it's not always the right thing. I recently learnt that as long as snapshot isolation (including read committed snapshot) is enabled, SQL Server adds 14 bytes of overhead to each row. But as snapshot isolation can reduce blocking considerably, it can still be worthwhile for some applications.
I guess it's inevitable to mention plan guides. This is a feature that I have stayed away from myself, as I feel that it is a truly advanced feature. But as a last resort it is very valuable that it's available.
What new technologies in SQL Server 2008 do you think are the most beneficial for performance?
The MERGE statement makes it possible to replace combinations of INSERT WHERE NOT EXISTS and UPDATE with a single statement. A very valuable addition.
There are various new features to make tables smaller: table/index compression, sparse columns and filtered indexes. They can improve your performance if you use them correctly.
This is not really my area, but I believe that the fact that full-text is now a first-class SQL Server citizen, will be very beneficial for full-text applications.
I like to add for both these questions I have made my choices for things that are in my realm. Improved star-schema join is probably a great thing, but as I am not into data warehousing, this is not a feature that I can evaluate.
Name three things that are new in SQL Server 2005 that you find are the most valuable for developers?
Two features stand out ahead of everything else: TRY-CATCH and the row_number() function. Error handling that was very complicated and tiresome in SQL 2000 is now significantly easier. And there are so many problems that are a lot easier to solve with help of row_number().
Beside these two, there were many valuable additions to SQL 2005, and picking just one ahead of the other is difficult. But, OK, today my vote goes to the XML enhancements. (FOR XML, the xml data type, XQuery.)
Name three things that are new in SQL Server 2008 that you find are the most valuable developers?
The new date/time data types. A date-only data type in SQL Server is long overdue. And I mean long as in loooooooooooooooooooooong.
The MERGE statement, that I've already mentioned. Beside the replacement for INSERT+UPDATE+sometimes DELETE, some of my MVP colleagues have found that MERGE has advantages over just a single INSERT or UPDATE statement.
Table-valued parameters. This new feature will make my article on arrays on lists and in SQL Server far less interesting. It's too bad that TVPs are read-only, but for passing structured data from a client to SQL Server it is a tremendous addition. Just pass a dataset, and that's all.
What would you like to see in the next version of SQL Server?
As you may guess, my wishlist to Microsoft exceeds what most kids sends to Santa Claus. But let me select a couple of important ones that all have a common theme: they all enhance the programmability of SQL Server in one way or another.
To start with, I want Microsoft to implement the OVER clause in full. There are a class of problems like running sums, sliding aggregates etc that are very difficult to implement with set-based logic and good performance today. If you are not acquainted with these functions, I recommend that you read Itzik Ben-Gan's and Sujata Metha's paper
You also find links there to the Connect requests for these features in this document. Go there and vote!
One of the biggest misfeatures Microsoft added in SQL 7 was deferred name resolution, and we are still paying the price. I have a Connect request about SET STRICT_CHECKS ON, that covers the problem with deferred name resolution, and a lot of things that MS could inform the programmer when he creates the procedure, rather than bombing in production: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=260762
I plan a longer development on that Connect item, that will appear as http://www.sommarskog.se/strict_checks.html in the not too distant future.
SQL 2008 added table-valued parameters, but it was a big disappointment that they are input-only. The situation where I would have most use for table-valued parameters is when passing data between stored procedures -both in and out.
A long-standing wish that I have is to be able to access the call-stack one way or another. This can help to make logging that I do for debug purposes more accurate. It can also help you to things like "if this trigger is called from this procedure, skip this action".
Can you list any third party tools that you find useful to have as a SQL Server developer/admin?
The most important third-party tool to me is AbaPerls, but then I wrote it myself. :-) We use it to build our databases, load our stored procedures and build our update scripts. Unfortunately, I am currently not able to put it in the public domain.
If you work with a data model of any size, using a good data-modeling tool is a great asset to keep the model documented. I have mainly used PowerDesigner from Sybase, but I guess ERwin and Embrocadero are good too. Unfortunately, these tools tend to be pricy.
Apart from that... I know there are a lot of interesting third-party tools out there, I some I would really like to try out in practice. But unfortunately time has never permitted me.
What is the biggest Database in GB/PB that you have ever worked with, how many rows was your biggest table ever?
Nothing impressive. The biggest customer database for the system I work with is just over 500 GB. The biggest table in that database has 62 million rows.
Which feature of SQL Server(in any version) do you like most?
SHUTDOWN WITH NOWAIT!
Seriously, if I am to mention a feature that I have not mentioned yet, I pick derived tables.
Why don't you have a blog?
CREATE TABLE Week (Hour int NOT NULL CHECK (Hour <= 168))
Of course, it's all a matter of priorities. But I don't follow any blogs myself, and call me an old fart, but I have not really understood all this blogging hype. In the end, it seems a bit unstructured to me. I have seen blog authors ask about things they had posted themselves in their blog some months back...
So I prefer to answer questions on the newsgroups, and when time permits compose longer articles for my web site.
I understand that you speak a lot of languages, what is your fascination with languages?
I live in a small country, so I know that my own language is not enough.
What particularly interests me with languages is how they relate to each other, how the differ from each other in a systematic way. I learnt French in school, and then I learnt Italian on my own with help of my knowledge in French. I spent a summer in Italy when I was a student, and studied transformation rules, so that if I saw a word in Italian I could see how the word would be in French, and thereby get the meaning of the word.
And I would not really say that I speak a lot of languages. Beside English and Swedish, I can speak German, French, Italian and to some extent Spanish. There are a few more languages that I can read with some difficulty. But they are all Germanic or Romance languages. From other language families... I've studied some Polish and Russian but far from enough to claim that I speak them. Still, it is fascinating how even wee bits of knowledge can be useful. I was travelling in Bulgaria - a very nice country - and that small knowledge of Slavic was really useful for me.
How many variants of you name have been offered? James Luetkehoelter has a long list of mispronunciations. I bet that you have some beauties too - especially from the states.
I haven't kept a count, but I don't think it's that many. "Sommerskog" is probably the most common. Strangely, I seem to see this as often from fellow Swedes than from abroad. Since "Sommar-" simply is Swedish for "summer", one would think that they can't fail.
What is a little more puzzling to me is that so many English-speaking people get my first name flat wrong - they call me Eric!