- Owner/Principal with LobsterPot Solutions (a MS Gold Partner consulting firm), Microsoft Certified Master, Microsoft MVP (SQL Server), APS/PDW trainer and leader of the SQL User Group in Adelaide, Australia. Rob is a former director of PASS, and runs training courses around the world in SQL Server and BI topics.
It’s very easy to get in the habit of imagining the way that a query should work based on the Logical Order of query processing – the idea that the FROM clause gets evaluated first, followed by the WHERE clause, GROUP BY, and so on – finally ending with whatever is in the SELECT clause. We even get in the habit of creating indexes that focus on the WHERE clause, and this is mostly right.
But it’s only mostly right, and it will often depend on statistics.
There are other situations where statistics have to play a major part in choosing the right plan, of course. In fact, almost every query you ever run will use statistics to work out the best plan. What I’m going to show you in this post is an example of how the statistics end up being incredibly vital in choosing the right plan. It also helps demonstrate an important feature of the way that Scans work, and how to read execution plans.
I’m going to use AdventureWorks2012 for this example. I’m going to ask for the cheapest product according to the first letter of the product name. This kind of query:
SELECT MIN(ListPrice)
FROM Production.Product
WHERE Name LIKE 'H%';
Don’t run it yet. I want to ask you how you’d solve it on paper.
Would you prefer I give you a list of the products sorted by name, or would you prefer I give you that list sorted by price?
If you want the ‘sorted by name’ option, then you’ll have to look through all the products that start with H, and work out which is the cheapest (notice that my predicate is not an equality predicate – if I knew what the name had to be exactly, then I could have an index which ordered by name and then price, and very quickly find the cheapest with that name). This approach could be good if you don’t have many products starting with that particular letter. But if you have lots, then finding them all and then looking for the cheapest of them could feel like too much work. Funnily enough, this is the way that most people would imagine this query being run – applying the WHERE clause first, and applying the aggregate function after that.
On the other hand, if you have lots of products with that particular letter, you might be better off with your list sorted by price, looking through for the first product that starts with the right letter.
Let me explain this algorithm a little more.
If you’re at a restaurant and are strapped for cash, you might want to see what the cheapest thing is. You’d pick the “sorted by price” menu, and go to the first item. But then if you saw it had peanut in, and you have an allergy, then you’d skip it and go to the next one. You wouldn’t expect to have to look far to find one that doesn’t have peanut, and because you’ve got the “sorted by price” menu, you have the cheapest one that satisfies your condition after looking through just a few records.
It’s clearly not the same algorithm as finding all the things that satisfy the condition first, but it’s just as valid. If you’re only going to have to look through a handful of products before you find one that starts with the right letter, then great! But what if there are none? You’d end up having to look through the whole list before you realised.
The Query Optimizer faces the same dilemma, but luckily it might have statistics, so it should be able to know which will suit better.
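The dilemma can be boiled down to a toy cost model. This Python sketch is only an illustration, not how the Query Optimizer really costs plans (it works with I/O and CPU estimates, not raw row counts). It assumes that matching names sit in random positions within the price order. Under that assumption, the expected position of the first match in a price-ordered scan is (total + 1) / (matching + 1). The name-ordered approach, by contrast, has to read every matching row. The function names are hypothetical, and the sample counts (10 and 91 matches out of 504) are the AdventureWorks2012 figures reported further on in this post.

```python
def rows_read_seek_by_name(matching: int) -> float:
    """Name-ordered index: seek to the matching range, read every row, then take MIN."""
    return float(matching)


def rows_read_scan_by_price(total: int, matching: int) -> float:
    """Price-ordered index: scan in order and stop at the first matching name.

    Assumes matching names are in random positions within the price order, in which
    case the expected position of the first match is (total + 1) / (matching + 1).
    With no matches at all, every row is read before we find out.
    """
    if matching == 0:
        return float(total)
    return (total + 1) / (matching + 1)


def choose_plan(total: int, matching: int) -> str:
    """Hypothetical chooser: prefer whichever index is expected to read fewer rows."""
    if rows_read_seek_by_name(matching) <= rows_read_scan_by_price(total, matching):
        return "seek by name"
    return "ordered scan by price"


print(choose_plan(504, 10))  # seek by name (10 rows vs about 46)
print(choose_plan(504, 91))  # ordered scan by price (91 rows vs about 5.5)
print(choose_plan(504, 0))   # seek by name (0 rows vs all 504)
```

The only input that changes the decision is `matching`. The optimizer never knows that value exactly and has to estimate it from statistics, so stale statistics can tip it towards the wrong choice.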
Let’s create the two indexes – one sorted by Name, one sorted by Price. Both will include the other column, so that the query will only need one of them.
CREATE INDEX ixNamePrice ON Production.Product (Name) INCLUDE (ListPrice);
CREATE INDEX ixPriceName ON Production.Product (ListPrice) INCLUDE (Name);
Now let’s consider two queries. Both queries give the same result – $0.00. But that’s not important, I’m only interested in how they run.
SELECT MIN(ListPrice)
FROM Production.Product
WHERE Name LIKE 'I%';

SELECT MIN(ListPrice)
FROM Production.Product
WHERE Name LIKE 'H%';
The two queries are almost identical, but they run quite differently.
Ok, they’re fairly similar – they both use a Stream Aggregate operator, for example. And they have similar cost. But significantly, one is performing a Seek, while the other is doing a Scan. Different indexes, but nevertheless a Scan and a Seek.
People will tell you that Scans are bad and Seeks are good, but it’s not necessarily the case. Here, we see that the Scan plan is no more expensive than the Seek plan – it’s just different. We should consider why.
Those two indexes are the two different stories that I described earlier. There are very few products that start with the letter ‘I’, and quite a number that start with ‘H’, and so the Query Optimizer has chosen differently.
There are exactly 10 products that start with I. From a total of 504. That’s less than 2% of the products.
There are 91 products that start with H. That’s 18%. You might not have expected it to be that high, but that’s okay – if SQL has been maintaining statistics for you on this, it hopefully won’t be as surprised as you.
18% – nearly 1 in 5. So by the time you’ve looked at, oh, a dozen records, you will have almost certainly found one that starts with an H. (Actually, the chance of NOT finding one in the first 12 would be power(.82, 12), which is 0.09. That’s just 9%.) If I do a bit of digging into the internals, I can discover that the pages in my index typically have over a hundred records on them each. The chance of not finding a product that starts with an H on that first page – you’d need lottery-scale luck (1 in 444 million).
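The arithmetic in that paragraph is easy to check. This Python snippet makes the same assumption the paragraph does, that names starting with H are scattered randomly through the price order and each row is an independent draw. That is a slight simplification, since rows are really drawn without replacement. It also assumes a page of exactly 100 rows. The 12-row figure uses the rounded 0.82, as in power(.82, 12), and the page figure uses the exact fraction 413/504, which is what gives 1 in 444 million.

```python
# Products whose names do NOT start with H: (504 - 91) / 504, roughly 0.82.
p_not_h = (504 - 91) / 504

# The post's power(.82, 12): chance that none of the first 12 rows starts with H.
miss_first_12 = 0.82 ** 12
print(f"{miss_first_12:.2f}")  # 0.09

# Chance that a whole page of (assumed) 100 rows has no H at all.
miss_first_page = p_not_h ** 100
print(f"1 in {int(1 / miss_first_page / 1e6)} million")  # 1 in 444 million
```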
On the other hand, finding the cheapest value among 91 records is a lot more expensive than finding the cheapest among just 10. And getting all 10 records should take only a small number of reads too.
But a Scan! Really? It has to look through the whole table, right?
No. That’s not how it works.
You see, execution plans run from left to right: each operator asks the operator on its right for rows, one at a time. If you start reading these plans from the right, you’ll start thinking that the whole index has been scanned, when it’s simply not the case. That Top operator asks for a single row from the index, and that’s all it provides. Once that row has been found, the Scan operation stops.
For this information, I don’t even need to pull up the Properties window for the Scan (but I would recommend you get in the habit of doing that). No – this is all available in the Tool Tip. Look at the “Actual Number of Rows” – it’s just one.
A predicate is applied – it looks through the index for rows that start with H – but it’s doing this in Order (see Ordered = True), and it’s stopping after the first row is found. Remember I mentioned that there are actually 91 rows that satisfy the predicate? The Scan doesn’t care – it only needs one and it stops right then.
You might figure this is because we are using MIN. What if we needed the MAX though? Well, that’s just the same, except that the Direction of the Scan is BACKWARD (you’ll need F4 for that one).
MIN goes forward, because it’s most interested in the ‘smallest’ ones, MAX will go backward because it wants the ‘largest’. (And as you’d probably expect, if you’d created your index to be descending, then it would be reversed.)
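Python generators behave like plan operators in one useful way: nothing is produced until the consumer asks for it. That makes them a handy way to see why the Scan stops early. In this sketch, `ordered_scan` stands in for the Ordered Index Scan and `top_one_matching` stands in for the Top operator together with the scan's predicate. A counter records how many rows the scan actually produced. The index rows are made up, not AdventureWorks data, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[float, str]  # (ListPrice, Name)

# Made-up rows, not AdventureWorks data: an index keyed on ListPrice that includes Name.
PRICE_INDEX: List[Row] = [
    (1.00, "Axle"),
    (2.50, "Hook"),
    (3.00, "Bell"),
    (4.75, "Helmet"),
    (9.99, "Chain"),
    (12.00, "Handlebar"),
]


def ordered_scan(index: List[Row], direction: str, counter: List[int]) -> Iterator[Row]:
    """Stand-in for an Ordered Index Scan; counter[0] counts rows actually produced."""
    rows: Iterable[Row] = index if direction == "FORWARD" else reversed(index)
    for row in rows:
        counter[0] += 1
        yield row  # execution pauses here until the caller asks for another row


def top_one_matching(rows: Iterator[Row], prefix: str) -> Optional[Row]:
    """Stand-in for Top over the scan's predicate: take the first match, then stop asking."""
    return next((row for row in rows if row[1].startswith(prefix)), None)


# Like MIN(ListPrice) ... WHERE Name LIKE 'H%': scan FORWARD from the cheapest.
read = [0]
print(top_one_matching(ordered_scan(PRICE_INDEX, "FORWARD", read), "H"), read[0])
# (2.5, 'Hook') 2

# Like MAX(ListPrice): the same plan shape, but the scan direction is BACKWARD.
read = [0]
print(top_one_matching(ordered_scan(PRICE_INDEX, "BACKWARD", read), "H"), read[0])
# (12.0, 'Handlebar') 1
```

If no name matches, the generator is drained completely. That is the "look through the whole list before you realised" case from earlier.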
But again – being able to tell which is the better algorithm depends entirely on your statistics being known.
I see so many systems with bad statistics for one reason or another, typically because the data most frequently queried is the newest data, which makes up such a small percentage of the table. The statistics will think that there is almost no data for ‘today’, as they probably haven’t been updated for at least a few hours.
When you look at how a query is running, always have a think about how you’d solve it on paper, and remember that you might actually have a better (or worse) picture of the statistics than what the Query Optimizer has.
And remember that a Scan is not necessarily bad. I might do another post on that soon as well.
The last PASS board meeting for the year has happened, and the portfolio handovers are well under way.
Sadly, having new board members elected means having existing board members step down, and this last board meeting was the last one for both Allen Kinsel (@sqlinsaneo) and Kendal van Dyke (@sqldba). In 2012, these guys had the portfolios of local chapters and SQL Saturdays, respectively.
Newly elected board member Wendy Pastrick (@wendy_dance) is taking over from Allen on local chapters, while I’m taking over SQL Saturdays from Kendal. In 2012, my portfolio was 24 Hours of PASS, which is being rolled into the Virtual Chapters portfolio, headed still by Denise McInerney (@denisemc06).
I have to admit that I’m really excited that the 24HOP portfolio is being merged with Virtual Chapters, as the two are so linked. I had been on the 24HOP committee before I joined the PASS board, and had recommended that the two portfolios be merged around the time I was elected to the board. During my term I even recruited Virtual Chapter leaders to be on the committee for 24HOP, as I believe their experience with online events makes them best suited to influence PASS’ premier online event – the semi-annual 24HOP.
2012 was a good year for 24HOP, although it was the riskiest for some time as well.
Two of the more obvious changes that we made were to look at a new platform, and to return to the 24-hours straight format (rather than two 12-hour blocks). This more continuous format meant that numbers dropped (the largest audience is in the US, so any sessions that are overnight for the US are obviously going to have smaller attendance). However, this format meant we reached over 100 different countries, which I think was really significant. Comparing the first 2012 event with the first 2011 event (which used the 2x12 format), we jumped from reaching 54 countries in 2011 to 104 in 2012.
While I was still on the committee, we had discussed the need for a new platform, as the LiveMeeting platform wasn’t coping well with the numbers we were seeing. A number of options had been considered, some too expensive, some not capable of scaling sufficiently, and a decision had been made to use a platform called IBTalk. It was obviously more expensive than LiveMeeting (which had been available for free), but looked like it was going to scale much more nicely. We used it for both 2012 events and it will also be used for the next event (on Jan 30). The decision to use IBTalk was very risky, but as an experiment it seemed to work okay. There were both good and bad elements of the platform, which I’m not going to go into in a forum like this, although the second event that we used IBTalk for ended up being much smoother than the first, and I anticipate that the Jan 30 event will be even smoother still.
I felt like the first event of 2012 was dominated by the new platform. It was held two weeks after the SQL Server 2012 launch, which had also been a large virtual event using a new platform. I guess experimenting with new platforms was a large topic of discussion that month. One thing that didn’t really work for us was the closed captioning. It turns out that when someone is providing closed captioning live, any typos that come through, or anything misheard by the person providing the service, mean the output doesn’t always work well as a feed for a translation service. We tried, and it was good to try – but it didn’t work so well. Despite all that, PASS members can view the session recordings at http://www.sqlpass.org/24hours/spring2012/Home.aspx
The main 24HOP event in the second half of the year was the annual Summit Preview event. We didn’t try to pursue the closed captioning again, but we did continue with IBTalk. Going back to LiveMeeting was never going to be an option for us, and we wanted to take a second look at the platform, in light of the various things we’d learned from the experience in Q1. It was a better experience from a number of perspectives, and we certainly got to test the scalability.
Over the course of the day, we had good numbers – only a handful shy of 10,000 attendees (okay, a handful if you count fingers, toes, and were inbred – we had 9979). The lowest attendances were around the 100 mark, but the largest reached 1421 attendees. The highest from any previous events was around the 800 mark, so this was a significant improvement – and the platform handled it just fine. If we’d had that many people trying to access the LiveMeeting platform it simply wouldn’t’ve coped, and the IBTalk platform deserves praise for that.
The platform decision isn’t over yet. A new search has taken place in light of what we’ve learned in the past year, and including a lot of what people have expressed to us on platforms such as Twitter. There are platforms that are way out of our price range (it can be very expensive to present 10,000 man-hours of content using some platforms), and there are ones that won’t cope with some of the things we would like to do. With some of the Virtual Chapters growing fast, a new platform needs to be able to cope with them too, with a wide variety of attendances needing to be handled. I wish Denise all the best for that, and have been able to happily assure her that the PASS HQ team that does most of the work behind the scenes for 24HOP (particularly Jane and Vicki) is excellent and will make her look brilliant this year.
Another change in 2012 was the sponsorship story. For a long time, Dell had been a major sponsor of 24HOP, and I want to thank them for that. However, 24HOP wasn’t a priority for them in 2012, and new sponsors needed to be found. The first event saw sponsorship come in from Microsoft, SQL Sentry and Idera, with Idera being replaced by RSSBus for the second event. But what really excited me was to see a second tier of sponsors join the fray, with Melissa Data and Confio joining Idera as ‘Alliance Sponsors’. It was really good to have six fantastic companies sponsoring the event, and providing extra options for them.
I haven’t even mentioned the non-English events that have taken place! PASS has seen 24HOP events in Russian, Portuguese and Spanish this year, although my personal involvement with those events has been somewhat less. Again, the PASS HQ staff have been great in enabling these events, and helping them run smoothly.
So I leave 24HOP in capable hands.
Instead, I pick up the SQL Saturday portfolio – another fast-growing facet of PASS. Already the 200th SQL Saturday event has been scheduled, and I’m finding myself getting onto a moving train. Luckily, I won’t be battling anyone on the roof Bond-style, but there are a lot of things that will need attention to make sure that the SQL Saturday model can continue to be successful.
The PASS HQ staff most responsible for the SQL Saturdays that happen all around the world are Karla and Niko. If you’ve ever met either of these two, you’ll know that they run quickly and are nothing if not achievers. I suspect that I could just tell them to keep doing their thing and the portfolio would be considered successful. This is incredibly useful to me, because I should be able to focus on identifying and solving some of the things that might need to change as these events become larger in both size and number. I’m keen to look into some of the edge cases, such as international events (including non-English), and both the larger and smaller events that are around – but all the time trying to serve Niko, Karla and all the community leaders in what they do.
I spent some time explaining SQL Server Replication to someone recently. They said they hadn’t ever really understood the concepts, and that I’d managed to help. It’s inspired me to write a kind of post that I wouldn’t normally write – a “101” post. I’m not trying to do a fully comprehensive piece on replication, just enough to be able to help you get the concepts.
The way I like to think about replication is by comparing it to magazines. The analogy only goes so far, but let’s see how we go.
The things being replicated are articles. A publication (the responsibility of a publisher) is a collection of these articles. At the other end of the process are people with subscriptions. It’s just like when my son got a magazine subscription last Christmas. Every month, the latest set of articles got delivered to our house. (The image here isn’t my own – but feel free to click on it and subscribe to FourFourTwo – excellent magazine, particularly when they’re doing an article about the Arsenal.) Most of the work is done by agents, such as the newsagent that gets it to my house.
In SQL Server, these same concepts hold. The objects which are being replicated are articles (typically tables, but also stored procedures, functions, view definitions, and even indexed views). You might not replicate your whole database – just the tables and other objects of interest. These articles make up a publication. Replication is just about getting that stuff to the Subscribers.
Of course, the magazine analogy breaks down quite quickly. Each time my son got a new edition, the articles were brand new – material he’d never seen before. In SQL Replication, the Subscribers probably have data from earlier. But this brings us to look at a key concept in SQL Replication – how the stupid thing starts.
Regardless of what kind of replication you’re talking about, the concept is all about keeping Subscribers in sync with the Publisher. You could have the whole table move across every time, but more than likely, you’re going to just have the changes go through. At some point, though, the thing has to get to a starting point.
This starting point is (typically) done using a snapshot. It’s not a “Database Snapshot” like what you see in the Object Explorer of SQL Server Management Studio – this is just a starting point for replication. It’s a dump of all the data and metadata that make up the articles, and it’s stored on the file system. Not in a database, on the file system. A Subscriber will need this data to be initialised, ready for a stream of changes to be applied.
It’s worth noting that there is a flavour of replication which just uses snapshots, known as Snapshot Replication. Every time the subscriber gets a refresh of data, it’s the whole publication that has to flow down. This might be fine for small amounts of data, but not for larger ones.
(There are other ways to get started too, such as by restoring a backup, but you should still be familiar with the concept of snapshots for replication.)
To get in sync, a subscriber would need the data in the snapshot for initialisation, and then every change that has happened since. To reduce the effort that would be required if something went drastically wrong and a new subscription became needed, snapshots can be recreated at regular intervals. This is done by the Snapshot Agent, and like all agents, can be found as a SQL Server Agent job.
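Here is a minimal Python sketch of "snapshot, then every change since", built on a toy model. An article is a dict of rows keyed by primary key, and every change carries an increasing sequence number, roughly the role a log sequence number plays. The snapshot records the sequence number it was taken at. Initialising a subscriber then means loading the snapshot and replaying only the changes after that point. `Publisher` and `initialise_subscriber` are hypothetical names, not replication APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Change = Tuple[int, str, int, Optional[str]]  # (seq, op, key, value)


@dataclass
class Publisher:
    rows: Dict[int, str] = field(default_factory=dict)
    changes: List[Change] = field(default_factory=list)
    seq: int = 0

    def apply(self, op: str, key: int, value: Optional[str] = None) -> None:
        """Make a change and record it so it can be replicated later."""
        self.seq += 1
        if op == "delete":
            self.rows.pop(key, None)
        else:  # insert or update
            self.rows[key] = value
        self.changes.append((self.seq, op, key, value))

    def snapshot(self) -> Tuple[int, Dict[int, str]]:
        """A copy of all the data, plus the point in the change stream it represents."""
        return self.seq, dict(self.rows)


def initialise_subscriber(snap: Tuple[int, Dict[int, str]], changes: List[Change]) -> Dict[int, str]:
    """Load the snapshot, then apply every change made since it was taken."""
    taken_at, rows = snap
    rows = dict(rows)
    for seq, op, key, value in changes:
        if seq <= taken_at:
            continue  # already reflected in the snapshot
        if op == "delete":
            rows.pop(key, None)
        else:
            rows[key] = value
    return rows


pub = Publisher()
pub.apply("insert", 1, "Bike")
pub.apply("insert", 2, "Helmet")
snap = pub.snapshot()  # taken at seq 2
pub.apply("update", 2, "Helmet (red)")
pub.apply("delete", 1)
pub.apply("insert", 3, "Pump")

subscriber = initialise_subscriber(snap, pub.changes)
print(subscriber == pub.rows)  # True
```

Taking a fresh snapshot moves `taken_at` forward, so fewer changes need replaying when a new subscription has to be initialised. That is the benefit of having the Snapshot Agent recreate snapshots at regular intervals.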
The middle-man between the Publisher and the Subscribers is the Distributor. The Distributor is essentially a bunch of configuration items (and as we’ll see later, changes to articles), stored in the distribution database – a system database that is often overlooked. If you query sys.databases on a SQL instance that has been configured as a Distributor you’ll see a row for the distribution database. It won’t have a database_id less than 5, but it will have a value of 1 in the is_distributor column. The instance used as the Distributor is the one whose SQL Server Agent runs most of the replication agents, including the Snapshot Agent.
If you’re not doing Snapshot Replication, you’re going to want to get those changes through. Transactional Replication, as the name suggests, involves getting transactions that affect the published articles out to the subscribers. If the replication has been set up to push the data through, this should be quite low latency.
So that SQL Server isn’t having to check every transaction right in the middle of it, there’s a separate agent that looks through the log for transactions that are needed for the replication, copying them across to the distribution database, where they hang around as long as they’re needed. This agent is the Log Reader Agent, and also runs on the Distributor. You can imagine that there is a potential performance hit if this is running on a different machine to the Publisher, and this is one of the influencing factors that means that you’ll typically have the Distributor running on the Publisher (although there are various reasons why you might not).
Now we have a process which is making sure that initialisation is possible by getting snapshots ready, and another process which is looking for changes to the articles. The agent that gets this data out to Subscribers is the Distribution Agent. Despite its name, it can run at the Subscriber, if the Subscriber is set to pull data across (good for occasionally connected systems). This is like with my magazine – I might prefer to go to the newsagent and pick it up, if I’m not likely to be home when the postman comes around. In effect, my role as Subscriber includes doing some distribution if I want to pull the data through myself.
These three agents, Snapshot Agent, Log Reader Agent and Distribution Agent, make up the main agents used for Transactional Replication, which is probably the most common type of replication around. Snapshot Replication doesn’t use the Log Reader Agent, but still needs the other two.
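The split between the Log Reader Agent and the Distribution Agent can be sketched as a toy pipeline in Python. This is an illustration under stated assumptions, not how the agents are implemented. The "log" is a list of (sequence, table, description) records, only some of which touch published articles. The "distribution database" holds the replicated commands plus a delivery position for each subscriber, and a command is dropped once every subscriber has received it. (In SQL Server, that cleanup is a separate job with its own retention settings.) The class and method names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

LogRecord = Tuple[int, str, str]  # (seq, table, description)


class Distribution:
    """Toy stand-in for the distribution database."""

    def __init__(self, subscribers: List[str]) -> None:
        self.commands: List[LogRecord] = []
        self.delivered_up_to: Dict[str, int] = {s: 0 for s in subscribers}

    def log_reader(self, log: List[LogRecord], published: Set[str], from_seq: int) -> int:
        """Like the Log Reader Agent: copy records for published articles only."""
        last = from_seq
        for seq, table, desc in log:
            if seq > from_seq:
                if table in published:
                    self.commands.append((seq, table, desc))
                last = seq
        return last  # where the next read should resume

    def distribute(self, subscriber: str) -> List[LogRecord]:
        """Like the Distribution Agent: hand over everything this subscriber lacks."""
        pos = self.delivered_up_to[subscriber]
        batch = [c for c in self.commands if c[0] > pos]
        if batch:
            self.delivered_up_to[subscriber] = batch[-1][0]
        self.cleanup()
        return batch

    def cleanup(self) -> None:
        """Keep commands only while some subscriber still needs them."""
        oldest = min(self.delivered_up_to.values())
        self.commands = [c for c in self.commands if c[0] > oldest]


log = [(1, "Product", "insert 1"), (2, "AuditLog", "insert 99"), (3, "Product", "update 1")]
dist = Distribution(["SubA", "SubB"])
dist.log_reader(log, {"Product"}, from_seq=0)
print(len(dist.commands))       # 2 (AuditLog isn't published)
print(len(dist.distribute("SubA")))  # 2
print(len(dist.commands))       # 2 (SubB still needs them)
dist.distribute("SubB")
print(len(dist.commands))       # 0
```

Whether `distribute` runs at the Distributor (push) or at the Subscriber (pull) doesn't change this logic. It only changes which machine does the work.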
Now let’s consider the other types of replication.
Merge Replication involves having subscribers that can also change the data. It’s similar to Transactional Replication with Updateable Subscribers, which has been deprecated. These changes are sent back to the Merge Agent, which works out what changes have to be applied. This is actually more complicated than you might expect, because it’s very possible to have changes made in multiple places and for a conflict to arise. You can set defaults about who wins, and can override manually through the Replication Monitor (which is generally a useful tool for seeing if Subscribers are sufficiently in sync, testing the latency, and so on). Updateable Subscribers end up using the Queue Reader Agent instead of the Merge Agent. They’re slightly different in the way they run, but I consider them to be quite similar in function, as they both involve getting the data back into the publisher when changes have been made elsewhere.
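To see why merging needs conflict handling at all, here is a deliberately simplified Python sketch. It assumes each side reports the rows it changed since the last sync (deletes are ignored). A conflict is simply the same row changed to different values on both sides, and it is settled by a default winner, which a manual decision for a specific row can override. SQL Server's actual merge tracking and conflict resolvers are considerably more involved, and `merge`, `winner` and `overrides` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def merge(
    base: Dict[int, str],
    publisher_changes: Dict[int, str],
    subscriber_changes: Dict[int, str],
    winner: str = "publisher",
    overrides: Optional[Dict[int, str]] = None,
) -> Tuple[Dict[int, str], List[int]]:
    """Combine both sides' changes; return the merged rows and the conflicting keys."""
    overrides = overrides or {}
    merged = dict(base)
    conflicts: List[int] = []
    for key in sorted(publisher_changes.keys() | subscriber_changes.keys()):
        in_pub = key in publisher_changes
        in_sub = key in subscriber_changes
        if in_pub and in_sub and publisher_changes[key] != subscriber_changes[key]:
            conflicts.append(key)
            side = overrides.get(key, winner)  # a manual decision beats the default
            merged[key] = publisher_changes[key] if side == "publisher" else subscriber_changes[key]
        elif in_pub:
            merged[key] = publisher_changes[key]
        else:
            merged[key] = subscriber_changes[key]
    return merged, conflicts


base = {1: "Red", 2: "Blue"}
rows, conflicts = merge(base, {1: "Green"}, {1: "Black", 2: "Navy"})
print(rows, conflicts)  # {1: 'Green', 2: 'Navy'} [1]

rows, _ = merge(base, {1: "Green"}, {1: "Black"}, overrides={1: "subscriber"})
print(rows)  # {1: 'Black', 2: 'Blue'}
```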
Peer-to-Peer Replication is the final kind. This is really a special type of Transactional Replication, in which you have multiple publishers, all pushing data out at each other. It’s the option that is considered closest to a High Availability system, and is good across geographically wide environments, particularly if connections are typically routed to the closest server. Consider the example of servers in the UK, the US and Australia. Australian users can be connected to the local server, knowing the changes are going to be pushed out to the UK and US boxes. They’re set up in a topology, with each server considered a node. Each server keeps track of which updates it’s had, which means they should be able to keep in sync, regardless of when they have downtime. If Australian changes are sent to the UK but not the US, then US can be updated by the UK server if that’s easier.
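The idea that "each server keeps track of which updates it’s had" can be sketched in Python. This sketch assumes each change is identified by its originating node plus a sequence number at that node. When two nodes sync, each takes whatever the other has that it hasn't seen yet, so an Australian change can reach the US via the UK. It is a toy model of the concept only. Real peer-to-peer replication also has to worry about conflicting changes, which this ignores. `Node` and `sync` are hypothetical names.

```python
from typing import Dict, Tuple

ChangeId = Tuple[str, int]  # (originating node, sequence number at that node)


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.next_seq = 1
        self.applied: Dict[ChangeId, str] = {}  # every change this node has had

    def local_change(self, description: str) -> None:
        """Record a change made by users connected to this node."""
        self.applied[(self.name, self.next_seq)] = description
        self.next_seq += 1


def sync(a: Node, b: Node) -> None:
    """Exchange whatever changes one node has that the other hasn't seen yet."""
    for change_id, desc in list(a.applied.items()):
        b.applied.setdefault(change_id, desc)
    for change_id, desc in list(b.applied.items()):
        a.applied.setdefault(change_id, desc)


au, uk, us = Node("AU"), Node("UK"), Node("US")
au.local_change("price update from Adelaide")
sync(au, uk)  # the AU-US link is down, so only the UK gets it
sync(uk, us)  # later, the UK passes it on to the US
print(("AU", 1) in us.applied)  # True
```

Because changes are keyed by where they came from, syncing the same pair twice does no harm, and it doesn't matter which peer a change arrives from.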
Replication can feel complex. There are a lot of concepts that are quite alien to most database administrators. However, the benefits of replication can be significant, and are worth taking advantage of in many situations. Replication is an excellent way of keeping data in sync across a number of servers, without many of the server availability hassles associated with log-shipping or mirroring. It can definitely help you achieve scale-out environments, particularly if you consider Peer-to-Peer, which can help you offload your connections to other servers, knowing the key data can be kept up-to-date easily.
I haven’t tried to be completely comprehensive in this quick overview of replication, but if you’re new to the concepts, or you’re studying for one of the MCITP exams and need to be able to get enough of an understanding to get you by, then I hope this has helped demystify it somewhat.
There’s more in SQL Books Online, of course – a whole section on Replication. If what I’ve written makes sense, go exploring, and try to get it running on your own servers too. Maybe my next post will cover some of that.
Readers of my blog or followers on Twitter will know I took the MCM Lab exam a couple of days ago. I let people know I was doing the exam, rather than doing the ‘normal’ thing of doing it in secret and hoping no-one found out until a successful result had been published.
Oh, and this post has been approved by the MCM program’s boB Taylor (@sqlboBT) as not-breaking NDA. Nothing herein should be seen to imply that a particular topic is or isn’t in the exam.
So how did I go? Well... I made a bunch of mistakes, I wasted a lot of time, I even left some questions incomplete, and so I assume I’ve failed. It’s a horrible feeling, I can tell you. I went in not knowing what to expect. I knew that I’d worked with SQL Server for a lot of years, and felt like I had a good grounding in all the various aspects of SQL Server that I might have to know – but on the day, in the stress of having a laggy remote desktop session, a Lync connection which had dropped out just beforehand, being told that ‘everyone’ fails first time but that I was probably going to be one of the exceptions..., well, I didn’t feel like it was my day.
If I’d had that day at a client site, it would’ve been fine. With extra time, I have no doubt that I would’ve been able to get through things. I could’ve raised questions when I didn’t feel exactly sure about what was expected. I could’ve pointed out how much I’d done to help address a situation and found out if they wanted me to continue hunting for other potential issues. I could’ve asked for more information. I could’ve written a document describing what I had in mind to solve the particular problem to confirm that the client was happy with it. You don’t get any of that in exam situations.
I found myself doing things the ways that work in the real world, but not having somewhere to turn when it turned out that the information was clearly not quite there, turning a simple situation into a trickier one. In real life, I’d’ve turned to the face peering over my shoulder and said “I thought you said it was that thing.” Instead, I wasted time looking for what it actually was.
I found myself making changes to databases that I wouldn’t make in the real world without first running those changes past someone, or at least making sufficient disclaimers along the lines of “I would recommend making this change, but there’s always a chance that something else will see the impact of this…”
Now, I think I could go into the exam, and even faced with a different set of questions, have a better picture about the style of thing that would be asked, and be better able to identify the danger-items – those things which could lure me away from staying on track. It’s not the same with consulting, because you can always ask as soon as you consider there’s a potential problem. Just like the mechanic who says “So, there’s a rattle… do you want me to spend some time looking into this? If so, is half an hour okay?”, I can ask my clients if there’s something which doesn’t smell right. In the exam, you don’t get that luxury.
If you’re going to take the exam, I would recommend the following approach:
- Start by going through the questions, making notes about what they’re asking you to do and how much time it’ll take. Move the list of questions to the side, because switching from one window to another will simply take too long.
- Once you’ve identified ones that you know will be ‘quick’, do them. But if one stops being quick, leave it and come back. You’re not in danger of leaving it unfinished completely – there are 5.5 hours that you’ll be remoted in, and you’ll be coming back in about 30 minutes. But you don’t want to find that you spend an hour on something which you anticipated would be a ten-minute job.
- Whenever you leave a question, put it back in the list at an appropriate spot. If you think it’s still quicker than the question on XYZ, then fine, put it above that. But be ruthless about how long you spend on each task. If something doesn’t work the way you expect, do some troubleshooting, but don’t treat it as if your client depends on it. Your exam result will suffer more because you wasted time, than if you left it incomplete. You could end up finding that the XYZ question actually turned out to be simpler than you thought.
I hope I passed, but I really don’t think I have done.
I’m confident I blitzed quite a lot of them. There were plenty on the list that I moved through quickly, and others that took me longer than I expected, but I still finished. It’s just that there were a certain number that I assume I didn’t finish satisfactorily – simply because if it were my database that I had got a consultant in to fix, I wouldn’t’ve considered it complete. I mean, hopefully I hit the key things on some of those, but I can’t tell.
I know I definitely got some just plain wrong. The things that weren’t working, and I didn’t have time to get back to. I was getting hungry by the end of it, and was feeling stressed about the amount of time I’d wasted on other questions.
Assuming I need to retake it, I have to wait until 90 days after this attempt. That’s March 20th.
So what will I do between now and then? Well, I might check through the various guides about the things that stop things from working the way I expect them to. I’m not saying what didn’t work, but imagine there’s some technology that you’re familiar enough with. To use Tom’s example, peer-to-peer replication. I know how to set that up – I’ve done it before. It’s actually very straightforward. But if you do what you always do and you get some error… well, that might be harder to troubleshoot. In the real world there are plenty of ways you can troubleshoot online, but in an exam, it’s just Books Online and what’s in your head.
In that regard, Tom’s guide on what he didn’t do is useful. His “Don’t try to get every answer” is very important. But I would also point out that you should start to try them, in case they turn out to be easier than you expected. Just because it’s on P2P (continuing Tom’s example) doesn’t mean you won’t get it. Tom described shouting at the screen saying “Seriously? Is that all you’ve got?” – this is why you should try the ones that you figured would be harder. Don’t start with them, but a few hours in, don’t just focus on the time-wasters.
Tom described ‘not studying alone’. I discussed various things with people leading up to the exam, more so than studying per se. There’s another person who I know is taking the exam soon, and we’ve talked a bit about the various technologies. Like “How are you at TDE? What’s your clustering like?” – that kind of thing. He has environments set up where he's practising a bunch of tasks. I didn’t. I did skim through some of the videos, but not many, and it was really only skimming. I found myself being slightly interested in the video where PaulR demonstrates how to set up replication with a mirrored subscriber. I watched it a couple of times because something wasn’t sitting right – I figured it out though… in the exam you don’t want to be using scripts for everything (the GUI would be much better – not that I practised at all), but also, the error that he gets at the end… it’s because his script tries to remove the subscriber from the wrong server. In the GUI, he wouldn’t’ve had that problem (reminds me – I should mention this particular mistake to him, although I’m sure he knows about it and just doesn’t have an easy way to change the video). I didn’t end up watching most of the videos – even the ones on Resource Governor. I looked at the titles, and figured it would be fine to skip them. I listened to one on Clustering Troubleshooting in the car as I drove home from town, but just kept thinking “Right – I get all that… but is that enough for the exam?” I can’t say whether it was or not, but I can tell you that I didn’t do any further study on it. I also watched the Waits demo, and the slides on Log File Internals (but mainly out of curiosity about a different problem I’d had a while back).
I didn’t read the books. I’ve read a couple of chapters of Kalen’s SQL Internals before, but didn’t finish it. I haven’t even read the book I wrote recently (well, obviously the chapters I wrote – I read them lots of times).
Of Tom’s advice on what he did to pass (doing the reading, watching the videos, writing things down and teaching others) I really didn’t do any of them. But I do spend time teaching others in general. I’m an MCT – I teach from time to time (not as often as I used to), and that sharing aspect is important to me. I explain to people about how to tune their queries. I explain why indexing strategies will work for general performance gain. I explain why security is important, and the keys to avoiding SQL Injection. Even today I spent time explaining the roles of Replication Agents to someone (who later told me that they’d heard that stuff before, but it just made more sense when I explained it). Teaching comes naturally to me, and I’ve always done it. But I didn’t do any of that with this MCM exam in mind.
My advice on how to pass this exam – use SQL Server for fifteen years. It’s what I did. Studying might work for you too, and if that’s what you’ve done, then you may well get to leave the lab exam feeling a whole lot more confident about your result than me.
I actually did leave feeling very confident about my result. I’m confident I failed. But I also know that I could pass it tomorrow with no extra study – by avoiding the time sinks that are in there – and I wish I didn’t have to wait until March to retake it.
I’ll post again when my result becomes public, to let you know how I actually did.
Of course, I hope I’m wrong.
In two days I’ll’ve finished the MCM Lab exam, 88-971. If you do an internet search for 88-971, it’ll tell you the answer is –883. Obviously.
It’ll also give you a link to the actual exam page, which is useful too, once you’ve finished being distracted by the calculator instead of going to the thing you’re actually looking for. (Do people actually search the internet for the results of mathematical questions? Really?)
The list of Skills Measured for this exam is quite short, but can essentially be summed up in one word: “Anything”.
The Preparation Materials section is even better. Classroom Training – none available. Microsoft E-Learning – none available. Microsoft Press Books – none available. Practice Tests – none available. But there are links to Readiness Videos and a page which has no resources listed, but tells you a list of people who have already qualified. Three in Australia who have MCM SQL Server 2008 so far. The list doesn’t include some of the latest batch, such as Jason Strate or Tom LaRock.
I’ve used SQL Server for almost 15 years. During that time I’ve been awarded SQL Server MVP seven times, but the MVP award doesn’t actually mean all that much when considering this particular certification. I know lots of MVPs who have tried this particular exam and failed – including Jason and Tom. Right now, I have no idea whether I’ll pass or not. People tell me I’ll pass no problem, but I honestly have no idea. There’s something about that “Anything” aspect that worries me.
I keep looking at the list of things in the Readiness Videos, and think to myself “I’m comfortable with Resource Governor (or whatever) – that should be fine.” Except that then I feel like I maybe don’t know all the different things that can go wrong with Resource Governor (or whatever), and I wonder what kind of situations I’ll be faced with. And then I find myself looking through the stuff that’s explained in the videos, and wondering what kinds of things I should know that I don’t, and then I get amazingly bored and frustrated (after all, I tell people that these exams aren’t supposed to be studied for – you’ve been studying for the last 15 years, right?), and I figure “What’s the worst that can happen? A fail?”
I’m told that the exam provides a list of scenarios (maybe 14 of them?) and you have 5.5 hours to complete them. When I say “complete”, I mean complete – you don’t get to leave them unfinished, that’ll get you ‘nil points’ for that scenario. Apparently no-one gets to complete all of them.
Now, I’m a consultant. I get called on to fix the problems that people have on their SQL boxes. Sometimes this involves fixing corruption. Sometimes it’s figuring out some performance problem. Sometimes it’s as straightforward as getting past a full transaction log; sometimes it’s as tricky as recovering a database that has lost its metadata, without backups. Most situations aren’t a problem, but I also have the confidence of being able to do internet searches to verify my maths (in case I forget it’s –883). In the exam, I’ll have maybe twenty minutes per scenario (but if I need longer, I’ll have to take longer – no point in stopping halfway if it takes more than twenty minutes, unless I don’t see an end coming up), so I’ll have time constraints too. And of course, I won’t have any of my usual tools. I can’t take scripts in, I can’t take staff members. Hopefully I can use the coffee machine that will be in the room.
I figure it’s going to feel like one of those days when I’ve gone into a client site, and found that the problems are way worse than I expected, and that the site is down, with people standing over me needing me to get things right first time...
...so it should be fine, I’ve done that before. :)
If I do fail, it won’t make me any less of a consultant. It won’t make me any less able to help all of my clients (including you if you get in touch – hehe), it’ll just mean that the particular problem might’ve taken me more than the twenty minutes that the exam gave me.
PS: Apparently the done thing is to NOT advertise that you’re sitting the exam at a particular time, only that you’re expecting to take it at some point in the future. I think it’s akin to the idea of not telling people you’re pregnant for the first few months – it’s just in case the worst happens. Personally, I’m happy to tell you all that I’m going to take this exam the day after tomorrow (which is the 19th in the US, the 20th here). If I end up failing, you can all commiserate and tell me that I’m not actually as unqualified as I feel.
Tables are only metadata. They don’t store data.
I’ve written something about this before, but I want to look at this idea from the perspective of joins, especially since joins are the topic for T-SQL Tuesday this month. It’s hosted this time by Sebastian Meine (@sqlity), who has a whole series on joins this month. Good for him – it’s a great topic.
In that last post I discussed the fact that we write queries against tables, but that the engine turns it into a plan against indexes. My point wasn’t simply that a table is actually just a Clustered Index (or heap, which I consider just a special type of index), but that data access always happens against indexes – never tables – and we should be thinking about the indexes (specifically the non-clustered ones) when we write our queries.
I described the scenario of looking up phone numbers, and how it never really occurs to us that there is a master list of phone numbers, because we think in terms of the useful non-clustered indexes that the phone companies provide us, but anyway – that’s not the point of this post.
So a table is metadata. It stores information about the names of columns and their data types. Nullability, default values, constraints, triggers – these are all things that define the table, but the data isn’t stored in the table. The data that a table describes is stored in a heap or clustered index, but it goes further than this.
All the useful data is going to live in non-clustered indexes. Remember this. It’s important. Stop thinking about tables, and start thinking about indexes.
So let’s think about tables as indexes. This applies even in a world created by someone else, who doesn’t have the best indexes in mind for you.
I’m sure you don’t need me to explain the Covering Index bit – the fact that if you don’t have sufficient columns “included” in your index, your query plan will either have to do a Lookup, or else it’ll give up using your index and use one that does have everything it needs (even if that means scanning it). If you haven’t seen that before, drop me a line and I’ll run through it with you. Or go and read a post I did a long while ago about the maths involved in that decision.
So – what I’m going to tell you is that a Lookup is a join.
When I run SELECT CustomerID FROM Sales.SalesOrderHeader WHERE SalesPersonID = 285; against the AdventureWorks2012 database, I get the following plan:
I’m sure you can see the join. Don’t look in the query, it’s not there. But you should be able to see the join in the plan. It’s an Inner Join, implemented by a Nested Loop. It’s pulling data in from the Index Seek, and joining that to the results of a Key Lookup.
It clearly is – the QO wouldn’t call it that if it wasn’t really one. It behaves exactly like any other Nested Loop (Inner Join) operator, pulling rows from one side and sending a request to the other side for each one. You wouldn’t have a problem accepting it as a join if the query were slightly different, such as:
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail as sod
on sod.SalesOrderID = soh.SalesOrderID
WHERE soh.SalesPersonID = 285;
Amazingly similar, of course. This one is an explicit join; the first example was just as much a join, even though you didn’t actually ask for one.
You need to consider this when you’re thinking about your queries.
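To make the idea that a Lookup is a join a bit more concrete, here’s a minimal Python sketch of the shape of that first plan. It’s a toy model and says nothing about how SQL Server is actually implemented. The dictionary stands in for the clustered index, the sorted list stands in for a non-clustered index on SalesPersonID, and all the row values are made up.

```python
from bisect import bisect_left

# Hypothetical clustered index, keyed by SalesOrderID, holding every column.
# All values are made up for illustration.
clustered_index = {
    1: {"SalesOrderID": 1, "SalesPersonID": 285, "CustomerID": 500},
    2: {"SalesOrderID": 2, "SalesPersonID": 276, "CustomerID": 501},
    3: {"SalesOrderID": 3, "SalesPersonID": 285, "CustomerID": 502},
    4: {"SalesOrderID": 4, "SalesPersonID": 276, "CustomerID": 500},
}

# Hypothetical non-clustered index on SalesPersonID. It only holds its key
# plus the clustered key (SalesOrderID), sorted by (SalesPersonID, SalesOrderID).
ix_salesperson = sorted(
    (row["SalesPersonID"], order_id) for order_id, row in clustered_index.items()
)


def index_seek(index, value):
    """Yield the SalesOrderID of each index entry whose leading key equals value."""
    pos = bisect_left(index, (value,))
    while pos < len(index) and index[pos][0] == value:
        yield index[pos][1]
        pos += 1


def nested_loops_with_key_lookup(salesperson_id):
    """Outer input: the Index Seek. Inner input: one Key Lookup per outer row."""
    for order_id in index_seek(ix_salesperson, salesperson_id):
        row = clustered_index[order_id]  # the Key Lookup into the clustered index
        yield row["CustomerID"]


print(list(nested_loops_with_key_lookup(285)))  # [500, 502]
```

CustomerID isn’t in the non-clustered index, so every row that comes out of the seek costs another trip into the clustered index. That per-row trip is the join.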
But it gets more interesting.
Consider this query:
WHERE SalesPersonID = 276
AND CustomerID = 29522;
It doesn’t look like there’s a join here either, but look at the plan.
That’s not some Lookup in action – that’s a proper Merge Join. The Query Optimizer has worked out that it can get the data it needs by looking in two separate indexes and then doing a Merge Join on the data that it gets. Both indexes used are ordered by the column that’s indexed (one on SalesPersonID, one on CustomerID), and then by the CIX key SalesOrderID. Just like when you seek in the phone book to Farley, the Farleys you have are ordered by FirstName, these seek operations return the data ordered by the next field. This order is SalesOrderID, even though you didn’t explicitly put that column in the index definition. The result is two datasets that are ordered by SalesOrderID, making them very mergeable.
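Here’s a similar toy sketch of that Merge Join, again in Python and again only a model. The two lists stand in for the non-clustered indexes on SalesPersonID and CustomerID, each holding (key, SalesOrderID) pairs in index order. The row values are invented. Because each seek returns its SalesOrderIDs already in order, a single forward pass over both inputs is enough to find the orders that satisfy both predicates.

```python
from bisect import bisect_left

# Hypothetical index contents: (index key, SalesOrderID), kept in index order.
# Values are invented; only the ordering matters here.
ix_salesperson = [(276, 2), (276, 4), (276, 7), (285, 1)]
ix_customer = [(29522, 4), (29522, 7), (29522, 9), (30000, 2)]


def index_seek(index, value):
    """Return the SalesOrderIDs for one key value, which come out in SalesOrderID order."""
    pos = bisect_left(index, (value,))
    result = []
    while pos < len(index) and index[pos][0] == value:
        result.append(index[pos][1])
        pos += 1
    return result


def merge_join(left, right):
    """Inner join of two ascending lists of unique SalesOrderIDs in one pass."""
    i = j = 0
    matches = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            matches.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1  # left is behind, so advance it
        else:
            j += 1  # right is behind, so advance it
    return matches


left = index_seek(ix_salesperson, 276)  # [2, 4, 7]
right = index_seek(ix_customer, 29522)  # [4, 7, 9]
print(merge_join(left, right))          # [4, 7]
```

The simple one-to-one merge works because SalesOrderID is unique within each seek’s output. That holds here because it’s the clustered key.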
Another example is the simple query
WHERE SalesPersonID = 276;
This one prefers a Hash Match to a standard lookup even! This isn’t just ordinary index intersection, this is something else again! Just like before, we could imagine it better with two whole tables, but we shouldn’t try to distinguish between joining two tables and joining two indexes.
The Query Optimizer can see (using basic maths) that it’s worth doing these particular operations using these two less-than-ideal indexes (because of course, the best index would be a composite on both columns, such as (SalesPersonID, CustomerID), and it would still have the SalesOrderID column as part of it as the CIX key).
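And here’s a toy Python sketch of the Hash Match version. The post doesn’t show the plan’s details, so the shape here is an assumption made for illustration. The seek on the SalesPersonID index is the (smaller) build input, and a scan of the CustomerID index is the probe input that supplies CustomerID for each matching SalesOrderID. The index contents are invented. A scan of the CustomerID index comes back ordered by CustomerID first, so its SalesOrderIDs aren’t in order overall. A hash join doesn’t care about order, which is one plausible reason it gets picked here over a merge.

```python
# Hypothetical index contents: (index key, SalesOrderID), in index order.
ix_salesperson = [(276, 2), (276, 4), (276, 7), (285, 1)]
ix_customer = [(29522, 4), (29522, 7), (29522, 9), (30000, 2), (30001, 1)]


def hash_match(build_order_ids, probe_rows):
    """Inner join on SalesOrderID using a hash table built from one input."""
    # Build phase: hash every SalesOrderID from the build input.
    build_table = set(build_order_ids)
    # Probe phase: stream the other input and keep rows that find a match.
    return [(order_id, customer_id)
            for customer_id, order_id in probe_rows
            if order_id in build_table]


# Build input: SalesPersonID = 276 (a simple filter stands in for the seek).
build = [order_id for key, order_id in ix_salesperson if key == 276]
# Probe input: a full scan of the CustomerID index.
print(hash_match(build, ix_customer))  # [(4, 29522), (7, 29522), (2, 30000)]
```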
You need to think like this too.
Not in terms of excusing single-column indexes like the ones in AdventureWorks2012, but in terms of having a picture about how you’d like your queries to run. If you start to think about what data you need, where it’s coming from, and how it’s going to be used, then you will almost certainly write better queries.
…and yes, this would include when you’re dealing with regular joins across multiple tables, not just the joins within single-table queries.
I took the SQL 2008 MCM Knowledge exam while in Seattle for the PASS Summit ten days ago.
I wasn’t planning to do it, but I got persuaded to try. I was meaning to write this post to explain myself before the result came out, but it seems I didn’t get typing quickly enough.
Those of you who know me will know I’m a big fan of certification, to a point. I’ve been involved with Microsoft Learning to help create exams. I’ve kept my certifications current since I first took an exam back in 1998, sitting many in beta, across quite a variety of topics. I’ve probably become quite good at them – I know I’ve definitely passed some that I really should’ve failed.
I’ve also written that I don’t think exams are worth studying for.
(That’s probably not entirely true, but it depends on your motivation. If you’re learning, I would encourage you to focus on what you need to know to do your job better. That will help you pass an exam – but the two skills are very different. I can coach someone on how to pass an exam, but that’s a different kind of teaching when compared to coaching someone about how to do a job. For example, the real world includes a lot of “it depends”, where you develop a feel for what the influencing factors might be. In an exam, it’s better to know the “Don’t use this technology if XYZ is true” concepts well.)
As for the Microsoft Certified Master certification… I’m not opposed to the idea of having the MCM (or in the future, MCSM) cert. But the barrier to entry feels quite high for me. When it was first introduced, the nearest testing centres to me were in Kuala Lumpur and Manila. Now there’s one in Perth, but that’s still a big effort. I know there are options in the US – such as one about an hour’s drive away from downtown Seattle, but it all just seems too hard. Plus, these exams are more expensive, and all up – I wasn’t sure I wanted to try them, particularly with the fact that I don’t like to study.
I used to study for exams. It would drive my wife crazy. I’d have some exam scheduled for some time in the future (like the time I had two booked for two consecutive days at TechEd Australia 2005), and I’d make sure I was ready. Every waking moment would be spent poring over exam material, and it wasn’t healthy. I got shaken out of that, though, when I ended up taking four exams in those two days in 2005 and passed them all. I also worked out that if I had a Second Shot available, then failing wasn’t a bad thing at all. Even without Second Shot, I’m much more okay about failing. But even just trying an MCM exam is a big effort. I wouldn’t want to fail one of them.
Plus there’s the illusion to maintain. People have told me for a long time that I should just take the MCM exams – that I’d pass no problem. I’ve never been so sure. It was almost becoming a pride-point. Perhaps I should fail just to demonstrate that I can fail these things.
Anyway – boB Taylor (@sqlboBT) persuaded me to try the SQL 2008 MCM Knowledge exam at the PASS Summit. They set up a testing centre in one of the rooms there, so it wasn’t out of my way at all. I had to squeeze it in between other commitments, and I certainly didn’t have time to even see what was on the syllabus, let alone study. In fact, I was so exhausted from the week that I fell asleep at least once (just for a moment though) during the actual exam. Perhaps the questions need more jokes, I’m not sure.
I knew if I failed, then I might disappoint some people, but that I wouldn’t’ve spent a great deal of effort in trying to pass. On the other hand, if I did pass I’d then be under pressure to investigate the MCM Lab exam, which can be taken remotely (therefore, a much smaller amount of effort to make happen). In some ways, passing could end up just putting a bunch more pressure on me.
Oh, and I did.
I posted a few hours ago about a reflection of the Summit, but I wanted to write another one for this month’s T-SQL Tuesday, hosted by Chris Yates.
In January of this year, Adam Jorgensen and I joked around in a video that was used for the SQL Server 2012 launch. We were asked about SQLFamily, and we said how we were like brothers – how we could drive each other crazy (the look he gave me as I patted his stomach was priceless), but that we’d still look out for each other, just like in a real family.
And this is really true.
Last week at the PASS Summit, there was a lot going on. I was busy as always, as were many others. People told me their good news, their awful news, and some whinged to me about other people who were driving them crazy. But throughout this, people in the SQL Server community genuinely want the best for each other. I’m sure there are exceptions, but I don’t see many of them.
Australians aren’t big on cheering for each other. Neither are the English. I think we see it as an American thing. It could be easy for me to consider that the SQL Community that I see at the PASS Summit is mainly there because it’s a primarily American organisation. But when you speak to people like sponsors, or people involved in several types of communities, you quickly hear that it’s not just about that – that PASS has something special. It goes beyond cheering, it’s a strong desire to see each other succeed.
I see MVPs feel disappointed for those people who don’t get awarded. I see Summit speakers concerned for those who missed out on the chance to speak. I see chapter leaders excited about the opportunity to help other chapters. And throughout, I see a gentleness and love for people that you rarely see outside the church (and sadly, many churches don’t have it either).
Chris points out that the M-W dictionary defined community as “a unified body of individuals”, and I feel like this is true of the SQL Server community. It goes deeper though. It’s not just unity – and we’re most definitely different to each other – it’s more than that. We all want to see each other grow. We all want to pull ourselves up, to serve each other, and to grow PASS into something more than it is today.
In that other post of mine I wrote a bit about Paul White’s experience at his first Summit. His missus wrote to me on Facebook saying that she welled up over it. But that emotion had nothing to do with what I wrote – it was about the reaction that the SQL Community had had to Paul. Be proud of it, my SQL brothers and sisters, and never lose it.
So far, my three PASS Summit experiences have been notably different to each other.
My first, I wasn’t on the board and I gave two regular sessions and a Lightning Talk in which I told jokes.
My second, I was a board advisor, and I delivered a precon, a spotlight and a Lightning Talk in which I sang.
My third (last week), I was a full board director, and I didn’t present at all.
Let’s not talk about next year. I’m not sure there are many options left.
This year, I noticed that a lot more people recognised me and said hello. I guess that’s potentially because of the singing last year, but could also be because board elections can bring a fair bit of attention, and because of the effort I’ve put in through things like 24HOP... Yeah, ok. It’d be the singing.
My approach was very different though. I was watching things through different eyes. I looked for the things that seemed to be working and the things that didn’t. I had staff there again, and was curious to know how their things were working out. I knew a lot more about what was going on behind the scenes to make various things happen, and although very little about the Summit was actually my responsibility (based on not having that portfolio), my perspective had moved considerably.
Before the Summit started, Board Members had been given notebooks – an idea Tom (who heads up PASS’ marketing) had come up with after being inspired by seeing Bill walk around with a notebook. The plan was to take notes about feedback we got from people. It was a good thing, and the notebook forms a nice pair with the SQLBits one I got a couple of years ago when I last spoke there. I think one of the biggest impacts of this was that during the first keynote, Bill told everyone present about the notebooks. This set a tone of “we’re listening”, and a number of people were definitely keen to tell us things that would cause us to pull out our notebooks.
PASSTV was a new thing this year. Justin, the host, featured on the couch and talked to a lot of people about a lot of things, including me (he talked to me about a lot of things, I don’t think he talked to a lot of people about me). Reaching people through online methods is something which interests me a lot – it has huge potential, and I love the idea of being able to broadcast to people who are unable to attend in person. I’m keen to see how this medium can be developed over time.
People who know me will know that I’m a keen advocate of certification – I've been SQL certified since version 6.5, and have even been involved in creating exams. However, I don’t believe in studying for exams. I think training is worthwhile for learning new skills, but the goal should be on learning those skills, not on passing an exam. Exams should be for proving that the skills are there, not a goal in themselves. The PASS Summit is an excellent place to take exams though, and with an attitude of professional development throughout the event, why not?
So I did. I wasn’t expecting to take one, but I was persuaded and took the MCM Knowledge Exam. I hadn’t even looked at the syllabus, but tried it anyway. I was very tired, and even fell asleep at one point during it. I’ll find out my result at some point in the future – the Prometric site just says “Tested” at the moment. As I said, it wasn’t something I was expecting to do, but it was good to have something unexpected during the week.
Of course it was good to catch up with old friends and make new ones. I feel like every time I’m in the US I see things develop a bit more, with more and more people knowing who I am, who my staff are, and recognising the LobsterPot brand. I missed being a presenter, but I definitely enjoyed seeing many friends on the list of presenters. I won’t try to list them, because there are so many these days that people might feel sad if I don’t mention them. For those that I managed to see, I was pleased to see that the majority of them have lifted their presentation skills since I last saw them, and I happily told them as much. One person who I will mention was Paul White, who travelled from New Zealand to his first PASS Summit. He gave two sessions (a regular session and a half-day), packed large rooms of people, and had everyone buzzing with enthusiasm. I spoke to him after the event, and he told me that his expectations were blown away. Paul isn’t normally a fan of crowds, and the thought of 4000 people would have been scary. But he told me he had no idea that people would welcome him so well, be so friendly and so down to earth. He’s seen the significance of the SQL Server community, and says he’ll be back.
It’ll be good to see him there. Will you be there too?
PASS Summit is coming up, and I thought I’d post a few things.
At the PASS Summit, you will get the chance to hear presentations by the SQL Server establishment. Just about every big name in the SQL Server world is a regular at the PASS Summit, so you will get to hear and meet people like Kalen Delaney (@sqlqueen) (who just recently got awarded MVP status for the 20th year running), and from all around the world such as the UK’s Chris Webb (@technitrain) or Pinal Dave (@pinaldave) from India. Almost all the household names in SQL Server will be there, including a large contingent from Microsoft. The PASS Summit is by far the best place to meet the legends of SQL Server. And they’re not all old. Some are, but most of them are younger than you might think.
The hottest topics are often about the newest technologies (such as SQL Server 2012). But you will almost certainly learn new stuff about older versions too. But that’s not what I wanted to pick on for this point.
There are many new speakers at every PASS Summit, and content that has not been covered in other places. This year, for example, LobsterPot’s Roger Noble (@roger_noble) is giving a presentation for the first time. He’s a regular around the Australian circuit, but this is his first time presenting to a US audience. New Zealand’s Paul White (@sql_kiwi) is attending his first PASS Summit, and will be giving over four hours of incredibly deep stuff that has never been presented anywhere in the US before (I can’t say the world, because he did present similar material in Adelaide earlier in the year).
No, I’m not talking about plagiarism – the talks you’ll hear are all their own work.
But you will get a lot of stuff you’ll be able to take back and apply at work. The PASS Summit sessions are not full of sales-pitches, telling you about how great things could be if only you’d buy some third-party vendor product. It’s simply not that kind of conference, and PASS doesn’t allow that kind of talk to take place. Instead, you’ll be taught techniques, and be able to download scripts and slides to let you perform that magic back at work when you get home. You will definitely find plenty of ideas to borrow at the PASS Summit.
Yeah – and there’s karaoke (see “Blue - Jason - SQL Karaoke” on YouTube).
I got asked about calling home from the US, by someone going to the PASS Summit. I found myself thinking “there should be a blog post about this”...
The easiest way to phone home is Skype - no question. Use WiFi, and if you’re calling someone who has Skype on their phone at the other end, it’s free. Even if they don’t, it’s still pretty good price-wise. The PASS Summit conference centre has good WiFi, as do the hotels, and plenty of other places (like Starbucks).
But if you’re used to having data all the time, particularly when you’re walking from one place to another, then you’ll want a sim card. This also lets you receive calls more easily, not just solving your data problem. You’ll need to make sure your phone isn’t locked to your local network – get that sorted before you leave.
It’s no trouble to drop by a T-mobile or AT&T store and get a prepaid sim. You can’t get one from the airport, but if the PASS Summit is your first stop, there’s a T-mobile store on 6th in Seattle between Pine & Pike, so you can see it from the Sheraton hotel if that’s where you’re staying. AT&T isn’t far away either.
But – there’s an extra step that you should be aware of.
If you talk to one of these US telcos, you’ll probably (hopefully I’m wrong, but this is how it was for me recently) be told that their prepaid sims don’t work in smartphones. And they’re right – the APN gets detected and stops the data from working. But luckily, Apple (and others) have provided information about how to change the APN, which has been used by a company based in New Zealand to let you get your phone working.
Basically, you send your phone browser to http://unlockit.co.nz and follow the prompts. But do this from a WiFi place somewhere, because you won’t have data access until after you’ve sorted this out...
Oh, and if you get a prepaid sim with “unlimited data”, you will still need to get a Data Feature for it.
And just for the record – this is WAY easier if you’re going to the UK. I dropped into a T-mobile shop there, and bought a prepaid sim card for five quid, which gave me 250MB data and some (but not much) call credit. In Australia it’s even easier, because you can buy data-enabled sim cards that work in smartphones from the airport when you arrive.
I think having access to data really helps you feel at home in a different place. It means you can pull up maps, see what your friends are doing, and more. Hopefully this post helps, but feel free to post comments with extra information if you have it.
SQL Server Reporting Services plays nicely. You can have things in the catalogue that get shared. You can have Reports that have Links, Datasets that can be used across different reports, and Data Sources that can be used in a variety of ways too.
So if you find that someone has deleted a shared data source, you potentially have a bit of a horror story going on. And this works for this month’s T-SQL Tuesday theme, hosted by Nick Haslam, who wants to hear about horror stories. I don’t write about LobsterPot client horror stories, so I’m writing about a situation that a fellow MVP friend asked me about recently instead.
The best thing to do is to grab a recent backup of the ReportServer database, restore it somewhere, and figure out what’s changed. But of course, this isn’t always possible.
And it’s much nicer to help someone with this kind of thing, rather than to be trying to fix it yourself when you’ve just deleted the wrong data source. Unfortunately, it lets you delete data sources, without trying to scream that the data source is shared across over 400 reports in over 100 folders, as was the case for my friend’s colleague.
So, suddenly there’s a big problem – lots of reports are failing, and the time to turn it around is small. You probably know which data source has been deleted, but getting the shared data source back isn’t the hard part (that’s just a connection string really). The nasty bit is all the re-mapping, to get those 400 reports working again.
I know from exploring this kind of stuff in the past that the ReportServer database (using its default name) has a table called dbo.Catalog to represent the catalogue, and that Reports are stored here. However, the information about what data sources these deployed reports are configured to use is stored in a different table, dbo.DataSource. You could be forgiven for thinking that shared data sources would live in this table, but they don’t – they’re catalogue items just like the reports. Let’s have a look at the structure of these two tables (although if you’re reading this because you have a disaster, feel free to skim past).
Frustratingly, there doesn’t seem to be a Books Online page for this information, sorry about that. I’m also not going to look at all the columns, just ones that I find interesting enough to mention, and that are related to the problem at hand. These fields are consistent all the way through to SQL Server 2012 – there don’t seem to have been any changes here for quite a while.
The Primary Key is ItemID. It’s a uniqueidentifier. I’m not going to comment any more on that. A minor nice point about using GUIDs in unfamiliar databases is that you can more easily figure out what’s what. But foreign keys are for that too…
Path, Name and ParentID tell you where in the folder structure the item lives. Path isn’t actually required – you could’ve done recursive queries to get there. But as that would be quite painful, I’m more than happy for the Path column to be there. Path contains the Name as well, incidentally.
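If Path weren’t there, rebuilding it would mean walking up the ParentID chain. Here’s a minimal Python sketch of that walk, using made-up rows. The short strings stand in for uniqueidentifier ItemID values. I’m assuming the root folder has an empty Name and a NULL ParentID, so check that against your own dbo.Catalog. In T-SQL you’d express the same walk with a recursive CTE.

```python
# Made-up dbo.Catalog rows: ItemID -> (ParentID, Name).
# Short strings stand in for uniqueidentifier values.
catalog = {
    "root": (None, ""),             # assumed: root folder, empty Name, no parent
    "f-sales": ("root", "Sales"),
    "f-monthly": ("f-sales", "Monthly"),
    "r-summary": ("f-monthly", "Summary Report"),
}


def build_path(item_id):
    """Rebuild an item's Path by walking ParentID up to the root."""
    parts = []
    parent_id, name = catalog[item_id]
    while parent_id is not None:
        parts.append(name)                   # collect names from the item upwards
        parent_id, name = catalog[parent_id]
    return "/" + "/".join(reversed(parts)) if parts else ""


print(build_path("r-summary"))  # /Sales/Monthly/Summary Report
```

Note that the rebuilt Path ends with the item’s own Name, just as the stored Path column does.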
Type tells you what kind of item it is. Some examples are 1 for a folder and 2 for a report. 4 is a linked report, 5 is a data source, 6 is a report model. I forget the others for now (but feel free to put a comment giving the full list if you know it).
Content is an image field, remembering that image doesn’t necessarily store images – these days we’d rather use varbinary(max), but even in SQL Server 2012, this field is still image. It stores the actual item definition in binary form, whether it’s actually an image, a report, whatever.
LinkSourceID is used for Linked Reports, and has a self-referencing foreign key (allowing NULL, of course) back to ItemID.
Parameter is an ntext field containing XML for the parameters of the report. Not sure why this couldn’t be a separate table, but I guess that’s just the way it goes. This field gets changed when the default parameters get changed in Report Manager.
There is nothing in dbo.Catalog that describes the actual data sources that the report uses. The default data sources would be part of the Content field, as they are defined in the RDL, but when you deploy reports, you typically choose to NOT replace the data sources. Anyway, they’re not in this table. Maybe it was already considered a bit wide to throw in another ntext field, I’m not sure. They’re in dbo.DataSource instead.
The Primary key is DSID. Yes it’s a uniqueidentifier...
ItemID is a foreign key reference back to dbo.Catalog.
Fields such as ConnectionString, Prompt, UserName and Password do what they say on the tin, storing information about how to connect to the particular source in question.
Link is a uniqueidentifier, which refers back to dbo.Catalog. This is used when a data source within a report refers back to a shared data source, rather than embedding the connection information itself. You’d think this should be enforced by foreign key, but it’s not. It does allow NULLs though.
Flags is an int, and I’ll come back to this.
When a Data Source gets deleted out of dbo.Catalog, you might assume that it would be disallowed if there are references to it from dbo.DataSource. Well, you’d be wrong. And not because of the lack of a foreign key either.
Deleting anything from the catalogue is done by calling a stored procedure called dbo.DeleteObject. You can look at the definition in there – it feels very much like the kind of Delete stored procedures that many people write, the kind of thing that means they don’t need to worry about allowing cascading deletes with foreign keys – because the stored procedure does the lot.
Except that it doesn’t quite do that.
If it cascaded the delete to everything, we’d’ve lost all the data sources as configured in dbo.DataSource, and that would be bad. That’s fine when the match is on the ItemID column of dbo.DataSource – that is, when the report itself is being deleted. But if a shared data source is being deleted, you don’t want the report to lose all trace of the data source it was using.
So instead, it sets the Link to NULL, and marks the data source as broken.
We see this code in that stored procedure.
UPDATE [DataSource]
SET
    [Flags] = [Flags] & 0x7FFFFFFD, -- broken link
    [Link] = NULL
FROM
    [Catalog] AS C
    INNER JOIN [DataSource] AS DS ON C.[ItemID] = DS.[Link]
WHERE
    (C.Path = @Path OR C.Path LIKE @Prefix ESCAPE '*')
Unfortunately there’s no semi-colon on the end (but I’d rather they fix the ntext and image types first), and don’t get me started about using the table name in the UPDATE clause (it should use the alias DS). But there is a nice comment about what’s going on with the Flags field.
What I’d LIKE it to do would be to set the connection information to a report-embedded copy of the connection information that’s in the shared data source, the one that’s about to be deleted. I understand that this would cause someone to lose the benefit of having the data sources configured in a central point, but I’d say that’s probably still slightly better than LOSING THE INFORMATION COMPLETELY. Sorry, rant over. I should log a Connect item – I’ll put that on my todo list.
So it sets the Link field to NULL, and marks the Flags to tell you they’re broken. So this is your clue to fixing it.
A bitwise AND with 0x7FFFFFFD basically strips out the ‘2’ bit from a number (strictly, it clears the top bit as well, but it’s the ‘2’ bit that matters here). So numbers like 2, 3, 6, 7, 10, 11, etc., whose binary representation ends in either 11 or 10, get turned into 0, 1, 4, 5, 8, 9, etc. We can test for it using a WHERE clause that matches the SET clause we’ve just seen. I’d also recommend checking that Link is NULL and that there’s no ConnectionString. And join back to dbo.Catalog to get the path (including the name) of the broken reports – in case you get a surprise from a different data source having been broken in the past.
SELECT c.Path, ds.Name
FROM dbo.[DataSource] AS ds
JOIN dbo.[Catalog] AS c ON c.ItemID = ds.ItemID
WHERE ds.[Flags] = ds.[Flags] & 0x7FFFFFFD
AND ds.[Link] IS NULL
AND ds.[ConnectionString] IS NULL;
When I just ran this on my own machine, having deleted a data source to check my code, I noticed a Report Model in the list as well – so if you had thought it was just going to be reports that were broken, you’d be forgetting something.
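If the bit arithmetic behind that WHERE clause feels slippery, here’s a quick Python sketch of it, using Python purely as a calculator. It assumes small non-negative Flags values like the ones listed above, and the function names are made up for illustration; they aren’t anything in Reporting Services. One function mirrors the SET clause from dbo.DeleteObject, the other mirrors the WHERE test in the query.

```python
# Model of the Flags arithmetic, assuming small non-negative int values.
BROKEN_LINK_MASK = 0x7FFFFFFD  # all 31 low bits set except the '2' bit


def strip_link_bit(flags: int) -> int:
    """Like the SET clause in dbo.DeleteObject: [Flags] & 0x7FFFFFFD."""
    return flags & BROKEN_LINK_MASK


def link_bit_clear(flags: int) -> bool:
    """Like the WHERE test: ds.[Flags] = ds.[Flags] & 0x7FFFFFFD."""
    return flags == strip_link_bit(flags)


before = [2, 3, 6, 7, 10, 11]
after = [strip_link_bit(f) for f in before]

print(after)                                # [0, 1, 4, 5, 8, 9]
print([link_bit_clear(f) for f in before])  # the '2' bit is set, so all False
print([link_bit_clear(f) for f in after])   # stripped, so all True
```

A value is left unchanged by the mask exactly when its ‘2’ bit is already clear, which is why comparing Flags with Flags & 0x7FFFFFFD works as the test.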
So to fix those reports, create your new data source in the catalogue, and then find its ItemID by querying dbo.Catalog on its Path and Name.
And then use this value to fix them up. To fix the Flags field, just add 2, although I prefer a bitwise OR, which does the same thing here because the WHERE clause only picks up rows where the ‘2’ bit is clear. Use the OUTPUT clause to get a copy of the DSIDs of the ones you’re changing, just in case you need to revert something later after testing (doing it all in a transaction won’t help, because you’ll just lock out the table, stopping you from testing anything).
UPDATE ds SET [Flags] = [Flags] | 2, [Link] = '3AE31CBA-BDB4-4FD1-94F4-580B7FAB939D' /*Insert your own GUID*/
OUTPUT deleted.Name, deleted.DSID, deleted.ItemID, deleted.Flags
FROM dbo.[DataSource] AS ds
JOIN dbo.[Catalog] AS c ON c.ItemID = ds.ItemID
WHERE ds.[Flags] = ds.[Flags] & 0x7FFFFFFD
AND ds.[Link] IS NULL
AND ds.[ConnectionString] IS NULL;
But please be careful. Your mileage may vary. Still, there’s no reason why 400-odd broken reports need to be quite the nightmare they could be. Really, it should take less than five minutes.
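For anyone who wants to see the logic of that fix-up without touching a ReportServer database, here’s a rough Python model of it: the three WHERE conditions, the bitwise OR, and keeping before-images (the role the OUTPUT deleted rows play) so the change can be undone. The rows, GUID-ish strings and function names are all hypothetical; the real work is done by the UPDATE above.

```python
from copy import deepcopy

BROKEN_LINK_MASK = 0x7FFFFFFD

# Made-up rows standing in for dbo.DataSource (hypothetical values).
data_sources = [
    {"DSID": "ds-1", "Name": "Sales", "Flags": 1, "Link": None, "ConnectionString": None},
    {"DSID": "ds-2", "Name": "HR", "Flags": 3, "Link": "some-shared-ds", "ConnectionString": None},
    {"DSID": "ds-3", "Name": "Local", "Flags": 1, "Link": None,
     "ConnectionString": "Data Source=localhost"},
]


def is_broken(row: dict) -> bool:
    # The same three conditions as the WHERE clause of the UPDATE.
    return (row["Flags"] == row["Flags"] & BROKEN_LINK_MASK
            and row["Link"] is None
            and row["ConnectionString"] is None)


def relink(rows: list, new_link: str) -> list:
    """Fix broken rows in place; return before-images, like the OUTPUT deleted rows."""
    before_images = []
    for row in rows:
        if is_broken(row):
            before_images.append(deepcopy(row))
            row["Flags"] |= 2  # equals + 2 here, because the '2' bit is known to be clear
            row["Link"] = new_link
    return before_images


def revert(rows: list, before_images: list) -> None:
    """Put the saved values back, the way you'd use the OUTPUT rows to undo a change."""
    saved = {img["DSID"]: img for img in before_images}
    for row in rows:
        if row["DSID"] in saved:
            row.update(saved[row["DSID"]])


changed = relink(data_sources, "new-shared-ds")
print([img["DSID"] for img in changed])  # ['ds-1']
print(data_sources[0]["Flags"])          # 3
print(3 + 2, 3 | 2)                      # 5 3: adding would misfire if the bit were already set
revert(data_sources, changed)
print(data_sources[0]["Flags"])          # 1
```

The OR is the safer habit: if a row ever slipped through with the ‘2’ bit already set, adding 2 would carry into the next bit, whereas OR-ing leaves it alone.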
A non-SQL MVP friend of mine, who also happens to be a client, asked me for some help again last week. I was planning on writing this up even before Rob Volk (@sql_r) listed his T-SQL Tuesday topic for this month.
Earlier in the year, I (well, LobsterPot Solutions, although I’d been the person mostly involved) had helped out with a merge replication problem. The Merge Agent on the subscriber was just stopping every time, shortly after it started. There were no errors anywhere – not in the Windows Event Log, not in the SQL Agent logs, not anywhere. We’d managed to get the system working again, but didn’t have a good explanation for what had happened, and last week, the problem occurred again. I asked him about writing up the experience in a blog post, largely because of the red herrings that we encountered. It was an interesting experience for me, also because I didn’t end up touching my computer the whole time – just tapping on my phone via Twitter and Windows Live Messenger.
You see, the thing with replication is that a useful troubleshooting option is to reinitialise the thing. We’d done that last time, and it had started to work again – eventually. I say eventually, because the link being used between the sites is relatively slow, and it took a long while for the initialisation to finish. Meanwhile, we’d been doing some investigation into what the problem could be, and were suitably pleased when the problem disappeared.
So I got a message saying that a replication problem had occurred again. Reinitialising wasn’t going to be an option this time either.
In this scenario, the subscriber having the problem happened to be in a different domain to the publisher. The other subscribers (within the domain) were fine, just this one in a different domain had the problem.
Part of the problem seemed to be a log file that wasn’t being backed up properly. They’d been trying to back up to a backup device that had a corruption, and the log file was growing. Turned out, this wasn’t related to the problem, but of course, any time you’re troubleshooting and you see something untoward, you wonder.
Having got past that problem, my next thought was that perhaps there was a problem with the account being used. But the other subscribers were using the same account, without any problems.
The client pointed out that it was almost exactly six months since the last failure (later shown to be a complete red herring). It sounded like something might’ve expired. Checking through certificates and trusts showed no sign of anything, and besides, there wasn’t a problem running a command-prompt window using the account in question, from the subscriber box.
...except that when he ran the sqlcmd -E -S servername command I recommended, it failed with a Named Pipes error. I’ve seen problems with firewalls rejecting connections via Named Pipes but letting TCP/IP through, so I got him to look in SQL Server Configuration Manager to see what kind of connection was being preferred... Everything seemed fine. And strangely, he could connect via Management Studio. It turned out he had a typo in the server name in the sqlcmd command. That particular red herring must’ve been reflected in his cheeks as he told me.
During this time, I also pinged a friend of mine to find out who I should ask, and Ted Krueger’s (@onpnt) name came up. Ted (and thanks again, Ted – really) reconfirmed some of my thoughts around the idea of an account expiring, and also suggested bumping up the logging to level 4 (2 is Verbose, 4 is undocumented ridiculousness). I’d just told the client to push the logging up to level 2, but the log file wasn’t appearing. Checking permissions showed that the user did have permission on the folder, but still no file was appearing. Then we noticed that the user had been switched earlier as part of the troubleshooting, and switching it back to the real user caused the log file to appear.
Still no errors. A lot more information being pushed out, but still no errors.
Ted suggested making sure the FQDNs were okay from both ends, in case the servers were unable to talk to each other. DNS problems can lead to hassles which can stop replication from working. No luck there either – it was all working fine.
Another server started to report a problem as well. These two boxes were both SQL 2008 R2 (SP1), while the others, still working, were SQL 2005.
Around this time, the client tried an idea that I’d shown him a few years ago – using a Profiler trace to see what was being called on the servers. It turned out that the last call being made on the publisher was sp_MSenumschemachange. A quick interwebs search on that showed a problem that exists in SQL Server 2008 R2 when stored procedures have more than 4000 characters. Running that stored procedure (with the same parameters) manually on SQL 2005 listed three stored procedures, the first of which did indeed have more than 4000 characters. Still no error, though, even though the problem as described at http://support.microsoft.com/kb/2539378 should produce an error in the Event Log.
However, this problem is the type of thing that is fixed by a reinitialisation (because it doesn’t need to send the procedure change across as a transaction). And a look in the change history of the long stored procs (you all keep them, right?) showed that the problem from six months earlier could well have been down to this too.
Applying SP2 (with sufficient paranoia about backups and how to get back out again if necessary) fixed the problem. The stored proc changes went through immediately after the service pack was applied, and it’s been running happily since.
The funny thing is that I didn’t solve the problem. He had put the Profiler trace on the server, and had done the search that found a forum post pointing at this particular problem. I’d asked Ted too, and although he’d given some useful information, nothing that he’d come up with had actually been the solution either.
Sometimes, asking for help is the most useful thing you can do. Often though, you don’t end up getting the help from the person you asked – the sounding board is actually what you need.
Massive thanks to all the people that have been shouting about this event already. I’ve seen quite a number of blog posts about it, and rather than listing some and missing others, please assume I’ve noticed your blog and accept my thanks.
But in case this is all news to you – the next 24 Hours of PASS event is less than a fortnight away (Sep 20/21)! And there’s lots of info about it at http://www.sqlpass.org/24hours/fall2012/
(Don’t ask why it’s “Fall 2012”. Apparently that’s what this time of year is called in at least two countries. I would call it “Spring”, personally, but do appreciate that it’s “Autumn” in the Northern Hemisphere...)
Yes, I blogged about it on the PASS blog a few weeks ago, but haven’t got around to writing about it here yet.
As always, 24HOP is going to have some amazing content. But it’s going to be pointing at the larger event, which is now less than two months away. That’s right, this 24HOP is the Summit 2012 Preview event. Most of the precon speakers are going to be represented, as are half-day session presenters, quite a few of the Spotlight presenters and some of the Microsoft speakers too. When you look down the list of sessions at http://www.sqlpass.org/24hours/fall2012/SessionsbySchedule.aspx, you’ll find yourself wondering how you can fit them all in. Luckily, that’s not my problem. For me, it’s just about making sure that you can get to hear these people present, and get a taste for the amazing time that you’ll have if you can come to the Summit.
I see this 24HOP as the kind of thing that will just drive you crazy if you can’t get to the Summit. There will be so much great content, and every one of these presenters will be delivering even more than this at the Summit itself. If you tune into Jason Strate’s 24HOP session on the Plan Cache and are impressed – well, you can get to a longer session by him on that same topic at the Summit. And the same goes for all of them.
If you’re anything like me, you’ll find yourself looking at the Summit schedule, wishing you could get to several presentations for every time slot. So get yourself registered for 24HOP and help yourself make that decision.
And if you can’t go to the Summit, tune in anyway. You’ll still learn a lot, and you might just be able to help persuade someone to send you to the Summit after all (before the price goes up after Sep 30).
Four years ago, I was preparing to speak at TechEd Australia. I’d been asked to give a session on “T-SQL Tips and Tricks”, but I’d pushed back and we’d gone with “T-SQL Tips and Techniques” instead. I hadn’t wanted to show Tricks, because despite being a fan of ‘magicians’ (like Tommy Cooper) I feel like the trickery should disappear with the understanding of the technique used. This month, Mike Fal asks about Trick Shots, and I’m reminded of some of the things I do with T-SQL, and that session I gave nearly four years ago.
So I gave a talk, in which I covered 15 T-SQL Tips (probably more – I definitely threw a lot of stuff in there). They included things like the Incredible Shrinking Execution Plan, using the OUTPUT clause to return identity values on multiple rows, short-circuiting GROUP BY statements with unique indexes, and plenty more. There are a lot more things that I cover these days – you can’t exactly stay still and remain current – but still I like to maintain that there shouldn’t be trickery with T-SQL.
The common thread going through many of the tips, along with every class I teach about T-SQL, is the importance of the execution plan. That’s where you can see what’s actually going on, and hopefully it can explain some of the magic that you see. Of course, there’s more to it than that, but getting your head around the relationship between queries and plans can definitely help demystify situations.
Take recursive CTEs, for example. In the piece of code below (against old AdventureWorks, to which I added a covering index to avoid lookups), we see a sub-query used within a CTE (which is all about giving a name to the sub-query so it can be referenced later), and then in the second half of the UNION ALL statement, still within the sub-query, we see the CTE name used (in the FROM OrgChart AS o line), despite the fact that we haven’t even finished defining it yet. This functionality has been around for a long time, yet many people are not used to it, and see it as a trick.
WITH OrgChart AS
(
    SELECT 1 AS EmployeeLevel, *
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT o.EmployeeLevel + 1, e.*
    FROM OrgChart AS o
    JOIN HumanResources.Employee AS e
    ON e.ManagerID = o.EmployeeID
)
SELECT *
FROM OrgChart;
There isn’t a trick here, and it comes down to the principle of pulling sub-query definitions (including CTEs and non-indexed views) into the outer query. You see, the OrgChart here isn’t a database object, it’s simply a nested sub-query.
You might imagine that it looks a bit like this, where I’ve replaced the OrgChart reference with a copy of the query itself.
...but I’m not a big fan of this kind of representation, because it’s a bit strange to see that “WHERE ManagerID IS NULL” bit in there repeatedly. Are we really going to be getting that row out over and over again?
I’ve seen people try to demonstrate this with something like:
SELECT o2.EmployeeLevel + 1 AS EmployeeLevel, e.*
FROM
(
    SELECT o1.EmployeeLevel + 1 AS EmployeeLevel, e.*
    FROM
    (
        SELECT 1 AS EmployeeLevel, *
        FROM HumanResources.Employee
        WHERE ManagerID IS NULL
    ) AS o1
    JOIN HumanResources.Employee AS e
    ON e.ManagerID = o1.EmployeeID
) AS o2
JOIN HumanResources.Employee AS e
ON e.ManagerID = o2.EmployeeID;
But this isn’t right either, because this query just joins each level onto the rows of the one before (so only the deepest level comes back), whereas we really do need a UNION ALL.
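To see that difference concretely, here’s a small Python model using a made-up five-person org chart (a dict of EmployeeID to ManagerID, not the real AdventureWorks data). Nesting the join twice, as in the o1/o2 query, only gives the third level, whereas appending each level’s rows, UNION ALL style, gives everyone. The level-by-level loop is just a way of producing the same rows; it isn’t meant to show how SQL Server executes the query.

```python
# Made-up org chart: EmployeeID -> ManagerID (None for the person at the top).
employees = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}


def anchor() -> list:
    # SELECT 1 AS EmployeeLevel, ... WHERE ManagerID IS NULL
    return [(1, emp) for emp, mgr in employees.items() if mgr is None]


def next_level(rows: list) -> list:
    # JOIN Employee AS e ON e.ManagerID = o.EmployeeID, with EmployeeLevel + 1
    return [(level + 1, emp)
            for level, boss in rows
            for emp, mgr in employees.items() if mgr == boss]


# The nested-subquery version: two joins deep, so only level 3 comes back.
nested = next_level(next_level(anchor()))
print(nested)  # [(3, 4)]

# UNION ALL of every level: the rows the recursive CTE returns (order aside).
union_all = []
level_rows = anchor()
while level_rows:
    union_all.extend(level_rows)
    level_rows = next_level(level_rows)
print(union_all)  # [(1, 1), (2, 2), (2, 3), (3, 4), (4, 5)]
```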
The easiest way of showing what’s going on is to look at the execution plan.
Look at the first operator called – it’s an Index Spool (over on the left). It gets its data from a Concatenation between data that comes from an Index Seek, and a join between a Table Spool and another Index Seek. This sounds all well and good, but that Table Spool is empty. There’s nothing on the spool at all.
At least, until the Concatenation operator returns that first row to the Index Spool. When this happens, the Table Spool can serve another row. The Nested Loop happily takes the row, and requests any matching rows from the Index Seek at the bottom-right of the plan, and the Concatenation operator happily passes these rows back to the Index Spool, at which point the Table Spool has more rows it can serve up again.
The Index Spool controls all this. At some point, the system has to realise that there’s no more data that’s going to be served up. The Table Spool doesn’t just sit waiting for rows to appear, nor does the spooling behaviour cause the Table Spool to suddenly get kicked off again. This is all handled because the Concatenation operator keeps getting prodded (by the Index Spool) that there’s more data that’s been pushed onto it. The Table Spool doesn’t know (or even care) if the rows it’s handed over are going to end up back on the spool – after all, it doesn’t know if those employees are also managers, it just serves up the data that appears on it, when requested.
The recursive CTE is not magic. It doesn’t do any kind of trickery. It’s just a loop that feeds its data back into itself. And of course, to understand this properly, you should make sure you know to read plans from the left, revealing the Index Spool which really runs this query.
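As a rough illustration of that loop, here’s a simplified Python model of the plan described above: every row that comes out is also pushed onto a spool, and the recursive half keeps taking rows off the spool and finding their reports until there’s nothing left. It assumes the spool behaves like a stack (the last row pushed is the first one served back); that ordering, and the names used here, are assumptions of this model rather than SQL Server’s actual implementation.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple[int, int]  # (EmployeeLevel, EmployeeID)


def recursive_cte(anchor_rows: Iterable[Row],
                  recursive_part: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    spool = []  # the work table: every row returned is also pushed here
    # First input of the Concatenation: the anchor query.
    for row in anchor_rows:
        spool.append(row)
        yield row
    # Second input: keep taking a row off the spool and joining it to its reports.
    # Assumption of this model: the spool is a stack (last in, first out).
    while spool:
        current = spool.pop()
        for child in recursive_part(current):
            spool.append(child)  # fed back in, in case this employee is a manager too
            yield child
    # No cycle guard here; in SQL Server, MAXRECURSION (default 100) caps the depth.


employees = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}  # made-up EmployeeID -> ManagerID


def reports_of(row: Row) -> list:
    level, boss = row
    return [(level + 1, emp) for emp, mgr in employees.items() if mgr == boss]


top = [(1, emp) for emp, mgr in employees.items() if mgr is None]
result = list(recursive_cte(top, reports_of))
print(result)  # [(1, 1), (2, 2), (2, 3), (3, 4), (4, 5)]
```

Nothing in the loop knows in advance how deep the hierarchy goes; it simply stops when the spool runs dry, which is the same idea as the plan above.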
Learn to read execution plans. There are a bunch of resources out there (such as other posts of mine, stuff by Grant Fritchey, and more), but above all, just start opening them up and seeing how your queries run. You’ll find that a lot of the ‘tricks’ you think are in T-SQL aren’t really tricks at all, it’s just about understanding how your queries are being executed.
And before you ask, I won’t be at TechEd Australia this year.