Tibor Karaszi

  • Upgrading from Express Edition to Standard Edition

    Say you encounter an SQL Server instance which is Express Edition, when it really should have been some higher edition. Sounds familiar? As a consultant, I commonly find plenty of SQL Servers at a customer's site. Many of the databases in these will be moved (typically using backup and restore) to a "real" SQL Server. But in some cases, this might not be desirable. You want to convert the whole instance, from Express to a "real" SQL Server edition.

    I'm attending a great SharePoint course held by Daniel Bugday this week, and the topic of how SharePoint is installed came up. If you select "Standalone", the SharePoint setup will not only install all SharePoint parts on a single server, it will also install SQL Server Express Edition as the database engine. In short, SharePoint is one example where you might have an existing Express Edition which you want to make a "real" (higher) edition. There are other such scenarios, of course.

    As of SQL Server 2008, you can let SQL Server's setup program change from one edition to some other edition by just selecting "Maintenance", "Edition Upgrade" in setup and specifying the product key that came with the install media for the desired edition (which you have acquired first, of course). The cool thing here is that setup basically just registers the different product key and WHAM - you have the higher edition! This is what I wanted to test, specifically going from Express to a higher edition. Below is my experience:

    I have an Express Edition which came with the SharePoint installation. It is SQL Server 2008 (not R2), with no tools (SSMS etc.) installed. I want to make this a "real" Standard Edition. I have Standard Edition 2008 installation media, with a product key.

    The database engine 
    I first ran setup from the installed SQL Server (Start, All Programs, Microsoft SQL Server 2008, Configuration Tools, SQL Server Installation Center (64-bit)). I selected "Maintenance", "Edition Upgrade" in setup and specified the product key that came with my Standard Edition. This did not allow me to continue; I got an error (which is lost now).
    The next try was to run setup from the installation media (my Standard Edition install media) instead. This did allow me to continue. The change was very quick (a couple of minutes). I then verified using SQLCMD that I now have Standard Edition. Great! The database engine is done.
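
    The verification itself is just a few server properties. Below is a minimal sketch of what I mean; run it through SQLCMD against your instance (the instance name in the comment is just an example):

    --Check edition and version after the edition upgrade
    SELECT
     SERVERPROPERTY('Edition')        AS Edition
    ,SERVERPROPERTY('ProductVersion') AS ProductVersion
    ,SERVERPROPERTY('ProductLevel')   AS ProductLevel
    --For example: sqlcmd -S .\MyInstance -E -Q "SELECT SERVERPROPERTY('Edition')"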

    Tool-time
    You might want to have the tools (a.k.a. Shared Components) on the machine as well. I refer to things such as SQL Server Management Studio, Integration Services etc. So, I just ran setup from the installation media and installed the desired shared components. Nothing strange here. I now have Management Studio, Books Online and whatever more I might want to have.

    SQL Server Agent
    SQL Server Express doesn't come with SQL Server Agent, right? Not really. It comes with Agent, but the service is disabled and you cannot start it even if you try. But what about after an edition upgrade? I could still not start Agent. However, checking SQLAGENT.OUT revealed the problem. The first thing Agent does at startup is to log in to SQL Server and verify that it is sysadmin. The SharePoint setup installed the Agent service account as Network Service, which was added as a login in SQL Server, but the login isn't sysadmin. Now, this is in general a good thing (don't run your services with higher privileges than necessary), but in this case it causes Agent to not start. Some of the SharePoint services are configured to use Network Service, and apparently the server roles securityadmin and dbcreator are enough for SharePoint. So, this needs to be taken care of. I could use some other service account for Agent, and make sure that service account is sysadmin in SQL Server. For my test, though, it is OK to make the login in SQL Server for Network Service a sysadmin (see the sketch below). After that was done, Agent started just as expected. And just for fun, I created a Maintenance Plan for all databases, which also worked just fine.
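
    A minimal sketch of that last step, assuming the Network Service login already exists (it did in my case); on SQL Server 2008 the old sp_addsrvrolemember procedure does the job:

    --Make the Agent service account (Network Service) a member of sysadmin
    EXEC sp_addsrvrolemember
         @loginame = N'NT AUTHORITY\NETWORK SERVICE'
        ,@rolename = N'sysadmin'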

    What about the rest?
    Obviously, you might want to check other things, such as database options, file sizes, autogrow settings, maintenance for the databases etc. But that isn't strictly related to the edition change, it is just general best practice. In my case, for instance, I might want to upgrade the SQL Server from 2008 to 2008 R2 while I'm at it. Specifically for SharePoint, it seems its setup program created the database files in the Hive (a SharePoint term for a folder under Program Files, Common Files), and you probably want to move the SharePoint database files to a more suitable location. But that is outside the scope of this post, and hopefully no sweat for your DBA.

    In short, my test shows that it is easy to change from one edition to a different edition, even if you start from Express Edition.

  • Point-in-time restore of database backup?

    SQL Server 2005 added the STOPAT option for the RESTORE DATABASE command. This sounds great - we can stop at some point in time that falls within the period the database backup process was running! Or can we? No, we can't. Here follows some tech stuff on why not, and then what the option is really meant for:

    A database backup includes all used extents and also all log records that were produced while the backup process was running (possibly older ones as well, to handle open transactions). When you restore such a backup, SQL Server will copy the data pages from the backup media to the mdf and ndf file(s) and the log records to the ldf file(s), and then do REDO (roll forward the log records) and possibly also UNDO (roll back open transactions - this is skipped if you specify the NORECOVERY option). Why does it need the log records? Because you can do modifications in the database while the database backup is running. After the backup process has started (and before it finishes) you can modify both pages not yet copied to the backup media and pages already copied to the backup media. So, the data pages in themselves do not represent a consistent state of the database. At restore time, SQL Server uses the log records included in the database backup to "sync" the modifications that were performed while the backup process was running. So, what you end up with is what the database looked like at the end time of the backup process. I'm sure this is well documented somewhere in Books Online so I won't dive further into the tech stuff here.

    So, what is the purpose of the STOPAT option for RESTORE DATABASE?

    It is to give you an early warning, in case you try to restore a database backup which is too late for a subsequent STOPAT operation for a log restore. Here's an example:

    1. 13:00 db backup
    2. 13:10 log backup
    3. 13:20 log backup
    4. 13:30 db backup
    5. 13:40 log backup
    6. 13:50 log backup

    Say you now want to restore to 13:15. Intuitively, you say that you will restore 1, 2 and for 3 you do STOPAT 13:15. Yes, that is right.

    But say you are under stress, and perhaps even used some tool which confuses things for you? So, you end up restoring 4, 5 and for 6 you try STOPAT 13:15. Clearly, you can't STOPAT 13:15 for backup number 6 (it only covers 13:40 to 13:50). I.e., the restore of 6 gives you an error message. So, you have to re-start the restore process (1, 2 and 3), which could take a long time (if the backups are large). If you had specified STOPAT 13:15 when you restored backup 4, SQL Server would have given you an error immediately, so you wouldn't have wasted valuable time restoring unnecessary backups!

    That is the purpose of STOPAT for RESTORE DATABASE.
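
    In T-SQL, the two situations could look something like below (a sketch only; the database name and backup file names are made up for the example):

    --Correct sequence: restore 1, 2 and stop at 13:15 in 3
    RESTORE DATABASE orders FROM DISK = 'C:\backups\db_1300.bak' WITH NORECOVERY
    RESTORE LOG orders FROM DISK = 'C:\backups\log_1310.trn' WITH NORECOVERY
    RESTORE LOG orders FROM DISK = 'C:\backups\log_1320.trn'
     WITH STOPAT = '2010-06-24T13:15:00', RECOVERY

    --Early warning: if you start from backup 4 by mistake, specifying STOPAT
    --already on the database restore makes SQL Server complain immediately
    RESTORE DATABASE orders FROM DISK = 'C:\backups\db_1330.bak'
     WITH NORECOVERY, STOPAT = '2010-06-24T13:15:00'
    --Error: the point in time is earlier than what this backup covers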

  • Another bugfix for sp_indexinfo

    JackMcC correctly reported at http://sqlblog.com/blogs/tibor_karaszi/archive/2009/02/18/sp-indexinfo-updated.aspx that sp_indexinfo will count rows double (or even triple) if there are LOB or row overflow allocations. Now fixed. Thanks Jack.
  • Why we never want to trick SQL Server into rebuilding a log file

    "Never" is a strong word, so let me say that we really really want to avoid it, if at all humanly possible. In short, we will have a (potentially) broken database, both at the physical level and at the logical level. Read on.

    Just to be clear, when I refer to a "log file" here, I'm talking about a transaction log file, an .ldf file. Every so often we see in forums how log files are "lost", "disappear" or similar. Now, SQL Server relies on the ldf file at startup to bring the database to a consistent state. This is known as "recovery", or "startup recovery". This is pretty well known; I for instance teach this at every intro-level admin course. So, what if

    • The ldf file isn't there?
    • Or isn't accessible to the db engine?
    • Or is broken somehow?
    • Or is from a different point in time than the data file? (You'd be surprised at all the things one sees over the years.)

    Well, SQL Server will do the only reasonable thing: refuse us access to the database and produce an error message (in the eventlog etc).

    What we see from time to time is attempts to "trick" SQL Server into re-creating an ldf file. So, why is this so bad? I will try to explain why. Let me first say that SQL Server doesn't do these things to be mean to us, or to prove a point. If SQL Server knows that the ldf file is not necessary for recovery (the database was "cleanly shut down"), then it can and will re-create a log file for us at startup. The problem is that it isn't these cases we see in the forums. The cases we see in the forums are those where this didn't happen; SQL Server relied on the ldf file in order to bring the database to a consistent state.

    Enough beating around the bush, here is an example of why we don't want to trick SQL Server to forcefully re-create a log file:

    Say you have a transaction in which you add a row to the order header table (oh) and five rows to the order details table (od). Physically, each command is reflected in the transaction log, the page is modified (but still cached) and at commit, the log records are written to the ldf file. A lot has been written about these algorithms, but the concepts are pretty simple. For more details, read this.
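
    To make the example concrete, the transaction could look something like below (the table and column names are made up for the illustration):

    --One order header row and five order details rows in one transaction
    BEGIN TRAN
    INSERT INTO oh(order_id, customer_id) VALUES(4711, 123)
    INSERT INTO od(order_id, line_no, product_id, qty) VALUES(4711, 1, 10, 2)
    INSERT INTO od(order_id, line_no, product_id, qty) VALUES(4711, 2, 11, 1)
    INSERT INTO od(order_id, line_no, product_id, qty) VALUES(4711, 3, 12, 5)
    INSERT INTO od(order_id, line_no, product_id, qty) VALUES(4711, 4, 13, 1)
    INSERT INTO od(order_id, line_no, product_id, qty) VALUES(4711, 5, 14, 3)
    COMMIT TRAN
    --At COMMIT, the log records for all six rows are hardened to the ldf file.
    --The modified data pages are written to the mdf/ndf files later (checkpoint/lazy writer).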

    Say that we forcefully stop SQL Server, delete the ldf file, start SQL Server, see that the database isn't accessible and somehow "trick" SQL Server into creating an ldf file for us. What is the problem? Why is this so bad? The problem is that you can have no confidence in the state of your data, neither at the physical level nor at the logical level. Let me start by explaining what I mean by the logical level and the problems at this level:

    The logical level
    By the logical level, I mean the user data. The rows in the tables, quite simply. We inserted one row in the oh table and five rows in the od table. These can be inserted into anything from a total of two pages up to six pages (in reality more, since each table is likely to have indexes etc; I'll touch on this when we get to the physical level). Say that three of the order details rows have been written to disk, but not the order header row, and not the other two order details rows. This is just an example; you can pick and choose any combo you want. It can get just as bad as you can imagine! You might think that the pages are in the end written in the same sequence as we modified them. No, that is not how the cookie crumbles (read the article I posted a link to). You can probably imagine other scenarios, closer to your data. For instance, we will not do some certain modification to a table unless some other prior modification was also performed. In essence, rebuilding a log file leaves us with no confidence in the data. Are you prepared to go through your database and manually verify all the data in there? The majority of you will say "no", and it might not even be doable (data volumes, lack of information to verify against etc). So, logical inconsistencies are bad. Really bad. We don't want this. OK?

    The physical level
    The log is not only used for "user data". It is also used for system data. Tables have indexes, where each row in a table is reflected by a row in each index. We have allocation pages stating what storage is used. We have IAM pages and PFS pages. We have linked lists. And whatnot. You probably realize that these structures also require that some modification is performed in full or not at all (an extent allocation will be reflected both in the GAM or SGAM page and also in the extent data itself, etc). What do you think is used to protect this? Yes, you got it, the ldf file. Trick SQL Server into re-creating an ldf file and you have all sorts of physical inconsistencies. Actually, physical inconsistencies are a bit better than logical ones, since we do have a way to check for these. I'm of course talking about the mighty DBCC CHECKDB command, a command with lots of smarts (right Paul?) to check that the db is consistent at the physical level. And what if it isn't? CHECKDB spits out errors. Sure, it has repair options, but those generally mean (unless you are the type of person who wins at lotteries) that the repair will rip out whatever makes the db inconsistent, resulting in data loss (and the repair has no knowledge of what you might consider logically consistent data).

    So, what to do?
    I hope the answer is obvious. Don't get into this situation in the first place. Don't go deleting ldf files, for instance. Do take backups frequently enough so you don't end up in a situation like "I need to rebuild the log or I'm toast.". If something strange happens, don't start mucking about with the system unless you know what you are doing. Hire a specialist, call MS support or so. If I had a brand new Ferrari, I wouldn't disassemble the engine if I heard a weird noise from it. Heck, I barely pop the hood of my Beamer!

    And no, I won't get into what commands can be used to re-build the ldf file. I expect all readers of this to not get into a situation where it is needed. :-)

    (I understand one can encounter a machine with no backups where something happens to the ldf file, and similar scenarios. Of course I do. I feel really really bad every time I read about such a situation, because there is no graceful way to handle it. That is why I have such a harsh tone above. I don't want this to happen to anybody. One would hope that this is obvious, but nothing ever is on the Net. So I've learned. So, please leave such unconstructive comments out of the discussions!)

  • New util proc: sp_tableinfo

    Some of you might use my sp_indexinfo to easily get lots of details about the indexes in a database. I have added an sp_tableinfo with the info rolled up to table level. This allows you to quickly see the largest tables in a database and things like that. Check it out at http://www.karaszi.com/SQLServer/util_sp_tableinfo.asp.
  • Why can't we have column alias in ORDER BY?

    You probably think something like "What! Of course we can!". And so did I when I read the title for a post about this in a forum. Let us first establish that you can have a column alias in ORDER BY:

    SELECT
    p.ProductID AS pid
    ,p.ListPrice AS pr
    FROM Production.Product AS p
    ORDER BY pr

    The problem was that a table name was also in the ORDER BY clause. And this is when it becomes weird. The problem isn't the column alias, it is the table alias. Basically the desire is to write something like:

    SELECT
    p.ProductID AS pid
    ,p.ListPrice AS pr
    FROM Production.Product AS p
    ORDER BY p.pr

    The above was apparently allowed in SQL Server 2000 (I never did it myself, however). (Update: Erland Sommarskog caught my error; this was allowed in 2000, not 2005.) In order to understand why the above is backwards, we need to consider the logical query flow. The first part of a query which is executed is the FROM clause (logically; the optimizer can do as it pleases as long as we get the same result). In the above example, I renamed the table (for the rest of the query) from "Production.Product" to "p". That means that we cannot refer to "Product" or "Production.Product" anymore in the query. However, SQL has imperfections, and the topic of this blog post is one attempt from MS to rectify such earlier imperfections (some of them), i.e., improving the product.

    Continuing the logical query flow (WHERE, GROUP BY, HAVING), towards the end we find the SELECT clause. Here is where you can rename the columns (a.k.a. column alias) for the rest of the query. After the SELECT list, it is meaningless to talk about different tables (where the data originally comes from); all we have is the result of the query (so far). This is why we can (generally) refer to a column alias in the ORDER BY; ORDER BY comes logically after the SELECT list.

    However, the original problem was that one also wanted to include a table alias, which shouldn't be allowed. So, why is it allowed in the first place, considering it makes no sense from a logical standpoint? For that we need to look at something weird, which is allowed (both in T-SQL and ANSI SQL) and which can be somewhat useful:

    SELECT
    ProductID AS pid
    ,ListPrice AS pr
    FROM Production.Product
    ORDER BY ModifiedDate

    We don't need to return something to the client app in order to sort over it. Sure, that can be useful, but from a logical perspective, this is spooky. We sort over something which doesn't exist! So, the imperfections of the SQL language are the reason things start to break down. Basically, what we do is sort over something as if we had included it in the SELECT list. In the ORDER BY clause we are saying something like

    "Hey, dbms! Can you please pretend we included Product.ModifiedDate in the SELECT list so we can sort over it. Bur don't return it to the client app, I don't want to see that column."

    Where it really breaks down is when we combine table and column alias. Let me repeat the problematic query:

    SELECT
    p.ProductID AS pid
    ,p.ListPrice AS pr
    FROM Production.Product AS p
    ORDER BY p.pr

    The problem is that it is meaningless to specify a table alias here; the column already exists in the SELECT list. The only reason we can refer to anything table-prefixed at all is to let the DBMS pretend we included something additional from the base tables, so we could sort over it. Such as (allowed):

    SELECT
    ProductID AS pid
    ,ListPrice AS pr
    FROM Production.Product AS p
    ORDER BY p.ModifiedDate

    One could of course argue that below would be more logical (not allowed):

    SELECT
    ProductID AS pid
    ,ListPrice AS pr
    FROM Production.Product AS p
    ORDER BY Product.ModifiedDate

    Arguing that one of the two forms above is more logical than the other is pretty irrelevant, however. Both are weird constructs, and the language is what it is. Anyone feel like checking the ANSI SQL standard for which of the above two (first, second or both) are allowed/preferred? Not me, I'm off to grab some breakfast!

     

  • Book: SQL Server DMVs in Action

    A while ago, I was offered an early look at the book "SQL Server DMVs in Action" (by Ian W. Stirk). This morning, I received a copy of the book (in electronic form) from Manning Publications. So I jumped straight into it, and here are some thoughts from a quick first look:

    DMVs were introduced in SQL Server 2005, and I felt this was one of the most important new manageability features of that release. I guess I'm not alone in feeling that I should spend more time learning these DMVs, but other things always come in between. This is why I was interested in a book focusing on this pretty narrow topic. Basically, what I want is a cookbook of DMV queries I can use in various situations. Sure, we can surf the net for these things, but it is so ... fragmented. A book can, if done right, give us some context and elaboration. And, hopefully, one place to look - for various situations.

    I'm happy to say that this book seems to have this structure. The book has a couple of introductory chapters (valuable for those less familiar with the DMV concept) and then we have one chapter for each important area (indexes, execution plans, OS, transactions etc). Basically, we have sub-topics (each with one or more queries) named something like "Find the most important missing indexes", "Finding the queries taking the longest time to run", etc. You get the picture.

    So far, I only had a brief look, trying a couple of queries. I like the topic and how the book is structured. You can find out more about it at http://www.manning.com/stirk/.

  • Watch out for old stuff

    No, I'm not referring to me, I'm referring to options, config values and such which should really have been removed from the product a long time ago. I was reading a recent blog post from Jonathan Kehayias and realized how much old stuff is still visible one way or the other in the product. There are of course commands which have more modern replacements (managing logins and users, attaching databases, changing the SID for a user etc), but keeping the old commands around for a few versions is common sense. I'm thinking more of stuff where the functionality has been gone for ages, or even decades, so why keep the knob for it?

    Referring to Jonathan's article, the LoginMode property has four possible values in SMO and policies.

    • "Unknown" maps to registry value 3 or higher which is clearly rubbish (somebody manually changed this to something weird).
    • "Mixed" maps to value 2 in the registry, Known as "SQL Server and Windows", or "Mixed mode".
    • "Integrated" maps to value 1 in the registry. Known as "Windows only".
    • "Normal" maps to values 0 in the registry. But what is this, you think? There are only two values, right?

    So, we have something strange exposed in SMO and policies, and it is even referred to as "Normal". This is one example of heritage which should have been cleaned up a long, long time ago. We have to go all the way back to SQL Server 6.5 in order to explain this.

    • "Normal" (more commonly known as "Standard" back in the days) is more like what "Mixed" is today.
    • "Mixed" meant something slightly different back then compared to what it means today. We need to recall that back then, we didn't have Windows authentication like how it is done today. We could map Windows account to SQL Server account using a tool, and this mapping could be used to do what we call today a "Windows authentication" or "trusted connection". This setting meant that SQL Server could attempt a trusted connection based on the login name you specify, even if you in the connection string didn't ask for a trusted connection. Confusing, huh? No wonder they got rid of the distinction bewteen "Normal" and "Mixed".

    As of 7.0, the only registry values used are 1 (Integrated) and 2 (Mixed). So why does SMO still expose 0 (Normal)?
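
    If you want to see what your instance actually has in the registry, below is a quick sketch. It uses xp_instance_regread, which is undocumented but widely used, and which resolves the instance-specific part of the registry path for you:

    --Read the LoginMode value for the current instance
    DECLARE @loginmode INT
    EXEC master.dbo.xp_instance_regread
         N'HKEY_LOCAL_MACHINE'
        ,N'Software\Microsoft\MSSQLServer\MSSQLServer'
        ,N'LoginMode'
        ,@loginmode OUTPUT
    SELECT @loginmode AS LoginMode --1 = Windows only, 2 = Mixed (0 was "Normal" back in 6.5)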

    We have plenty of other such examples. Look in the registry and you can see mappings for _, # and $. These were used in the mapping tool, so you could map a Windows login, which obviously has a backslash in the name, and SQL Server would translate the backslash to some other character (_ by default) allowed in a SQL Server login name. Why do these still exist in the registry?
    Or "Default login"? There is no such thing as a default login in SQL Server anymore (there is something like a guest *user*, but that is a different thing from a login).
    Or how about setting the master database to full recovery model? That is absurd; you still can't do log backups and it still behaves as if in simple recovery. (See the sketch below.)
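
    A quick sketch of that last point (hedged; the exact error text may differ between versions): the recovery model knob can be turned for master, but a log backup is still refused and the database keeps behaving as if in simple recovery.

    ALTER DATABASE master SET RECOVERY FULL
    --The option is accepted, but...
    BACKUP LOG master TO DISK = 'C:\x\master_log.trn'
    --...this fails; log backup of master is not allowed
    ALTER DATABASE master SET RECOVERY SIMPLE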

    OK, enough of the rant. Back to meaningful work.

  • Do you remember Database Gateways?

    I replied to a question in a forum today about intercepting and changing a SQL query, before it reaches SQL Server. I started typing a reply, and realized that... wow... "Does this take me on a trip down memory lane, or what!". So, just for fun, here is what I wrote (slightly modified):

    There used to be products on the market, called "Database Gateways", something that looks like one type of DBMS but actually accepts the SQL queries and submits them to some other type of DBMS. Basically an "in between" piece of software. For instance I used to work with such a product, which looked like a SQL Server but submitted the queries to IBM's mainframe DB2.

    That market pretty much disappeared around when ODBC was born. Yes, we are talking ancient here, around 1992.

    There was a leftover from this time period in SQL Server, called "Open Data Services", which was basically an API allowing you to write something that looks like a SQL Server. There was even a sample app which allowed you to intercept the SQL queries and then re-submit them to a SQL Server.

    This API was in turn transformed over time to what became known as "extended stored procedures", which in turn was deprecated with SQL Server 2005 and the SQL CLR support.

  • Ola Hallengren's maint procedures now support exclusions

    Ola has a set of stored procedures to do maintenance operations, see http://ola.hallengren.com/. He has now updated them to support exclusion or inclusion of tables, indexes or even whole schemas from index rebuild/reorg. Check out http://ola.hallengren.com/Documentation.html#SelectingIndexes for info on how to define exclusions and http://ola.hallengren.com/Versions.html for a version overview.

  • Rebuilding system databases in 2008 R2

    UPDATE: Apparently, a workaround is to use SETUP.EXE from install media (and not the locally installed SETUP.EXE). See the connect item mentioned below.

    All my attempts so far to rebuild the system databases in 2008 R2 have failed. I first tried to run setup from the path below:
    C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Release

    But the above turns out to be the 2008 setup program, not the 2008 R2 setup, even though I have no 2008 instances installed (I only have R2 instances installed). Apparently, the 2008 setup program does a version check of the instance to be rebuilt, and since it is > 10.50.0, the rebuild fails. The Books Online for R2 section about rebuilding system databases has the above path, which obviously is incorrect.

    So, this led me to think that the path to use is in fact:
    C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2

    The above does indeed contain the R2 version of the setup program. The problem now is that it refuses to rebuild for some other reason. I opened a question in the MSDN forum, and Xiao Min Tan opened a connect entry for this: https://connect.microsoft.com/SQLServer/feedback/details/564905/rebuilding-system-databases-for-sql-server-2008-r2.

    Meanwhile, waiting for a resolution to the above, I strongly recommend that you do both file-level backups and SQL Server backups of your system databases, from a known good clean state of them. Remember to do file-level backups of the database files, in case your instance won't start.
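
    The SQL Server part of that is plain backup commands; a minimal sketch (the backup path is just an example):

    BACKUP DATABASE master TO DISK = 'C:\x\master.bak' WITH INIT
    BACKUP DATABASE model  TO DISK = 'C:\x\model.bak'  WITH INIT
    BACKUP DATABASE msdb   TO DISK = 'C:\x\msdb.bak'   WITH INIT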

    (Here's a prior blog post about rebuilding system databases, btw: http://sqlblog.com/blogs/tibor_karaszi/archive/2008/08/29/rebuild-master-in-sql-server-2008.aspx)

  • Are non-modified FILESTREAM files excluded from DIFFERENTIAL backups?

    Short answer seems to be "yes".

    I got this from a forum post today, so I thought I'd test it out. Basically, the discussion is whether we can somehow cut down backup sizes for filestream data (the assumption is that filestream data isn't modified very frequently). I've seen this come up a few times now, and suggestions often arise to somehow exclude the filestream data and do file-level backups of those files. But that would more or less leave us with the problem of the "old" solution: potential inconsistency.

    So I thought that perhaps diff backups can be helpful here? I.e., perhaps SQL Server might be smart enough to exclude from a diff backup the filestream files that were already in the most recent prior database backup? It sure seems so, according to the script below. (I'm on the road now, so the script could be neater...):

    USE master
    GO
    IF DB_ID('fstest') IS NOT NULL DROP DATABASE fstest
    GO

    CREATE DATABASE fstest ON PRIMARY
      (NAME = fstest_data,
       FILENAME = N'C:\DemoDatabases\DbFiles\a\fstest_data.mdf'),
    FILEGROUP FileStreamFileGroup CONTAINS FILESTREAM
      (NAME = FileStreamTestDBDocuments,
       FILENAME = N'C:\DemoDatabases\DbFiles\a\Documents')
    LOG ON
      (NAME = 'fstest_log',
       FILENAME = N'C:\DemoDatabases\DbFiles\a\fstest_log.ldf');
    GO

    --Baseline:
    BACKUP DATABASE fstest TO DISK = 'C:\x\fst.bak' WITH INIT
    --4.08 MB

    BACKUP DATABASE fstest TO DISK = 'C:\x\fst_d.bak' WITH INIT, DIFFERENTIAL
    --3.08 MB

    CREATE TABLE fstest..t (
     TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE
    ,Name VARCHAR(25)
    ,Document VARBINARY(MAX) FILESTREAM);
    GO

    INSERT INTO fstest..t
    VALUES(NEWID(), 'Hello there', CAST(REPLICATE(CAST('M' AS VARBINARY(MAX)), 1000000) AS VARBINARY(MAX)));
    GO 10

    BACKUP DATABASE fstest TO DISK = 'C:\x\fst.bak' WITH INIT
    --13 MB

    BACKUP DATABASE fstest TO DISK = 'C:\x\fst_d.bak' WITH INIT, DIFFERENTIAL
    --3.08 MB

    --Cleanup
    IF DB_ID('fstest') IS NOT NULL DROP DATABASE fstest

  • Mirroring: what happens if the principal loses contact with both mirror and witness?

    Imagine a database mirroring setup where you have two LANs with a WAN link in between. Let's call them site A and site B. Say that principal is currently in site A, and both mirror and witness are in site B. I.e., we are running database mirroring with a witness, and assuming we are running safety FULL (synchronous), we have auto-fail over.

    Now, what really happens at failover when it comes to mirroring? The simple answer is that the mirror will finish the recovery process (UNDO) and make the database available. So, what if the mirror and witness become isolated from the principal? In our setup, this would happen if site A is disconnected from site B. Well, the fail-over protocol is as follows:

    If the mirror cannot contact the principal, it asks the witness if it has contact with the principal. If the witness responds and says it doesn't have contact with the principal, then they consider this a quorum. They are in majority and fail over occurs. I.e., the mirror will make the database available (and essentially becomes the principal).

    But, think about the poor principal now. It is isolated and knows nothing about what happens at the other end. The only logical thing to do is to make the database inaccessible. If that didn't happen, then we would have a case where both sites had the database available, and we wouldn't want that!

    This is nothing new, and it is clearly stated in Ron Talmage's great whitepaper on mirroring: http://technet.microsoft.com/en-us/library/cc917680.aspx.

    This is easy to test. Set up mirroring (you can even do it between three instances on the same OS). Make sure you are running safety FULL and have a witness. Now check which Windows process ID the principal has:

    SELECT SERVERPROPERTY('ProcessID')

    Based on the above, you now use Task Manager to kill the *other* two SQL Servers. Make sure that SQL Server Agent isn't running for those two, or else Agent will start those SQL Servers again. Management Studio will now show the database as "In Recovery", and if you try to access it, for instance from a query window, you get the error below:

    Msg 955, Level 14, State 1, Line 1
    Database Adventureworks2008 is enabled for Database Mirroring,
    but the database lacks quorum: the database cannot be opened.
    Check the partner and witness connections if configured.
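
    If you want to look at the state from the surviving instance programmatically, the mirroring catalog view gives you the details. A minimal sketch:

    --Mirroring role, state and witness state for all mirrored databases
    SELECT d.name
          ,m.mirroring_role_desc
          ,m.mirroring_state_desc
          ,m.mirroring_witness_state_desc
    FROM sys.database_mirroring AS m
    INNER JOIN sys.databases AS d ON d.database_id = m.database_id
    WHERE m.mirroring_guid IS NOT NULL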

  • Updated article "Agent Alerts Management Pack"

    I've just updated the "Agent Alerts Management Pack" found here. I realize that some don't feel confident reading and executing T-SQL code and prefer to point & click in SSMS instead. So I added two tables with my suggestions on the severity levels and error numbers to define alerts for.
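
    For those who do prefer T-SQL, defining such an alert boils down to a couple of calls to the msdb procedures. A hedged sketch (the alert and operator names are made up; the operator must already exist):

    --Alert on severity 19 and notify an operator by email
    EXEC msdb.dbo.sp_add_alert
         @name = N'Severity 19 error'
        ,@severity = 19
        ,@include_event_description_in = 1
    EXEC msdb.dbo.sp_add_notification
         @alert_name = N'Severity 19 error'
        ,@operator_name = N'DBA Team'
        ,@notification_method = 1
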
  • How selective do we need to be for an index to be used?

    You know the answer already: It depends. But I often see some percentage value quoted and the point of this post is to show that there is no such percentage value.

    To get the most out of this blog post, you should understand the basic structure of an index, i.e. what the b+ tree looks like. You should also understand the difference between a clustered and a non-clustered index. In essence, you should be able to visualize these structures and searches through them as you read the text. If you find that difficult, draw a few versions on a piece of paper and "navigate" through them by tracing through them with a pen or your finger. After a while, you will do this in your mind. For instance, check out the sections under this.

    I'm not saying that we shouldn't consider selectivity when designing indexes - of course we should! And I'm not saying that one shouldn't have some vague "feeling" for how much data will be returned when making such decisions. What I will show is that in reality there is no set percentage that the optimizer uses. The comment we usually see is something like:

    "If we return more than 5% of the rows, then an index will not be used."

    Where did that 5% number come from? I can assure you that this is not some hard-wired number in the optimizer (except for an edge case, see below). The optimizer aims at running the query with as low a cost as possible. Considering the data access part (think WHERE clause and the condition), this is pretty much about reading as few pages as possible (few page accesses).

    Just to cut down a bit on the thread that might follow these types of blog posts ("Hey, when I do this, your observations don't match, your blog post is incorrect!"), let us first consider some special cases:

    Clustered index
    The clustered index *is* the data. If the search condition (SARG) is SEEKable, then SQL Server will obviously seek through a clustered index instead of scan it. Anything else would be stupid. There can be *other* search conditions that are more efficient, but we are considering one search condition at-a-time.

    Non-clustered index that covers the query
    This is pretty much the same argument as for above. Since all data is in the index ("covers the query"), not seeking it would be stupid. Again, there can be cheaper alternatives for any of the other search conditions, but we are considering one condition at-a-time.

    The value is not known to the optimizer
    This is what happens when you compare against a T-SQL variable, something like "colname = @v". The optimizer has no knowledge of the contents of this variable. Either it uses density (where applicable, like "="), as stored in the statistics information for the index, or, where that is not applicable (like ">", "<", "BETWEEN" etc), the optimizer actually does use some hard-wired percentage value. This value can change between versions, so give it a spin if you want to know what value you have for your version/build number. Note that a variable is not the same thing as a parameter. SQL Server sniffs parameters (parameter sniffing). Read this for elaboration: http://msdn.microsoft.com/en-us/library/ee343986.aspx.
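
    A small sketch of the difference (the table and column names are made up for the example):

    --Variable: the optimizer can't see the value, so density or a hard-wired guess is used
    DECLARE @v INT
    SET @v = 155
    SELECT * FROM t WHERE colname > @v
    --Literal (and a sniffed parameter): the statistics histogram can be used
    SELECT * FROM t WHERE colname > 155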

    The search expression is not seek-able
    I hope you know this already, but just to point it out: in most cases, having some calculation on the column side will void the ability to seek through the index. This should ideally be known to all T-SQL developers: never do calculations on the column side! So, things to avoid are constructs like "colname * 23 > 45" or "SOMEFUNCTION(colname) = 44".
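
    Such expressions can usually be rewritten so the column is left alone on one side, which makes the condition seekable again (table and column names made up):

    --Not seekable: calculation on the column side
    SELECT * FROM t WHERE colname * 23 > 45
    --Seekable: same logic, but the column is alone on one side
    SELECT * FROM t WHERE colname > 45.0 / 23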

    Hopefully by now we all understand that there are always special cases and exceptions. The more of Kalen's books you have read, the more you understand this. What we are discussing here is the typical situation. OK? Fine. So, "why is there no percentage value that the optimizer uses?", you ask. Because the value will differ. In short, SQL Server wants to read as few pages as possible. In the simplest example, the alternative to an index seek is a table scan. So we will use this as the basis for our discussion. There can be other alternatives to the table scan (using some other index for some other condition), but that doesn't change the principal "it depends" concept.

    In essence, it is all about the alternative. As I said, our example will use a table scan as the alternative. A table scan (or clustered index scan if it is a clustered table) means that SQL Server will look at every page and see which rows satisfy the search condition on each page.

    My example has two different tables, both with 100,000 rows. Both tables have an integer column with consecutively increasing unique values, which also has a non-clustered index. I will see how selective I need to be when searching on this column in order for an index seek to be done instead of a table scan. I.e., I will find this percentage cut-off value.

    The fewrows table fits only one row per data page. This means 100,000 data pages. My tests show that the cut-off for fewrows is about 31,000 rows. I.e., we need to be more selective than 31% for the index to be used.

    The manyrows table fits 384 rows per page. This means 260 pages. My tests show that the cut-off for manyrows is about 155 rows. I.e., we need to be more selective than 0.16% for the index to be used.

    You might end up with different exact numbers, depending on what you have in the statistics, the build number of your SQL Server etc. But what you will see is a similar pattern: a huge difference between the two.

    It is all about the alternative
    If I look at my indexes using sys.dm_db_index_physical_stats, I will see that the non-clustered indexes on the int column are exactly the same for the two tables (same number of pages in the index, etc). So, two indexes with the same characteristics have very different cut-off values. How can that be? It is because the alternative differs. The alternative for this example is a table scan. For the fewrows table, the alternative means reading 100,000 pages. But for the manyrows table, the alternative means reading only 260 pages. There can of course be other alternatives, like using some other index for some other search condition. This is, in the end, why we don't have a set percentage value: it is all about the alternative!

    Conclusion
    The typical cases will of course fall somewhere between my rather extreme examples. But my examples show that there is no set percentage value used by the optimizer. I showed that for my tests, the percentage value can be as low as 0.16% or as high as 31%. What matters is the alternative!

    T-SQL

    USE tempdb
    GO

    IF OBJECT_ID('manyrows') IS NOT NULL DROP TABLE manyrows
    IF OBJECT_ID('fewrows') IS NOT NULL DROP TABLE fewrows
    GO

    CREATE TABLE manyrows(c1 INT IDENTITY PRIMARY KEY, c2 INT, c3 INT)
    CREATE TABLE fewrows(c1 INT IDENTITY PRIMARY KEY, c2 INT, c3 CHAR(4500))
    GO

    INSERT INTO manyrows
    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.OBJECT_ID) AS c2, 0 AS c3
    FROM sys.columns AS a, sys.columns AS b

    INSERT INTO fewrows
    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.OBJECT_ID) AS c2, 'hi' AS c3
    FROM sys.columns AS a, sys.columns AS b

    CREATE INDEX x ON manyrows (c2)
    CREATE INDEX x ON fewrows (c2)

    --Number of pages etc:
    EXEC sp_indexinfo 'manyrows'
    -- Data: 265 pages (2.07 MB)
    -- Index x: 187 pages (1.46 MB)

    EXEC sp_indexinfo 'fewrows'
    -- Data: 100385 pages (784 MB)
    -- Index x: 187 pages (1.46 MB)

    SELECT OBJECT_NAME(object_id), *
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')

    --Run below with showplan:
    SELECT * FROM manyrows
    WHERE c2 BETWEEN 1 AND 155
    --155 = ix search, 156 = ts
    --Cut-off is 0.16%

    SELECT * FROM fewrows
    WHERE c2 BETWEEN 1 AND 31000
    --31000 = ix search, 32000 = ts
    --Cut-off is 31%



