SQLblog.com - The SQL Server blog spot on the web

Maria Zakourdaev

  • Real tortoises keep it slow and steady. How about the backups?

     

    … Four tortoises were playing in the backyard when they decided they needed hibiscus flower snacks. They pooled their money and sent the smallest tortoise out to fetch the snacks.
    Two days passed and there was no sign of the tortoise.
    "You know, she is taking a lot of time", said one of the tortoises.
    A little voice from just outside the fence said, "If you are going to talk that way about me, I won't go."


    Is it too much to ask of a quite expensive 3rd party backup tool that it be way faster than the SQL Server native backup?

    Or that it at least save a respectable amount of storage by producing really smaller backup files? By "really smaller" I mean a file at most half the size.

    After googling around to understand what other SQL people are using for database backups, I see that most are using one of the three main players in the SQL backup area:

    • LiteSpeed by Quest
    • SQL Backup by Red Gate
    • SQL Safe by Idera

    The feedback about those tools is truly enthusiastic. However, while reading the forums and blogs I wondered whether many people are simply accustomed to using these tools since the SQL Server 2000 and 2005 days.
    That would be easy to understand: a 300GB database backup, for instance, using a regular SQL 2005 backup statement would run for about 3 hours and produce a ~150GB file (depending on the content, of course).
    Then you take a 3rd party tool which performs the same backup in 30 minutes and produces a 30GB file; speechless, you run to management to persuade them to buy it, because it is definitely worth the price.
    In addition to the increased speed and disk space savings you also get backup file encryption and virtual restore, features that are still missing from SQL Server.

    But if, like me, you don't need these additional features and only want a tool that performs a full backup MUCH faster AND produces a far smaller backup file (like the gain you saw back in the SQL 2005 days), you will be quite disappointed. The SQL Server backup compression feature has totally changed the market picture.

    Medium-sized database

    Take a look at the table below to see how my SQL Server 2008 R2 compares to the other tools when backing up a 300GB database.

    It appears that, as far as backup speed goes, SQL 2008 R2 compresses and performs the backup in overall times similar to all three other tools. The 3rd party tools' maximum compression levels take twice as long.
    The backup file size gain is not that impressive, except at the highest compression levels, but the price you pay there is a very high CPU load and a much longer run time. Only SQL Safe by Idera was quite fast at its maximum compression level, but for most of the run time it used 95% CPU on the server.
    Note that I used two types of destination storage, an 11-disk SATA aggregate and a 53-disk FC aggregate, and, obviously, on the faster storage the backup was ready in half the time.

    [tables: backup benchmark results for the 300GB database]

    Looking at the above results, should we spend the money and bother with another layer of complexity and a software middle-man for medium-sized databases? I'm definitely not going to.

    Very large database

    As the next phase of this benchmark, I moved to a 6 terabyte database, which was actually my main backup target.

    [table: backup benchmark results for the 6TB database]

    Note how using multiple files enables the SQL Server backup operation to use parallel I/O and remarkably increases its speed, especially when the backup device is heavily striped. SQL Server supports a maximum of 64 backup devices for a single backup operation, but the biggest speed gain comes from using one file per CPU core; in the case above, 8 files for a server with 2 quad-core CPUs. The impact of additional files is minimal.
    However, SQLsafe doesn't show any speed improvement between 4 files and 8 files.
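
    For reference, this is the shape of the striped native backup used above (a sketch; the paths are placeholders and the file count should match the number of cores):

    BACKUP DATABASE <DBNAME> TO
        DISK = '\\<networkpath>\vol1\par1.bak',
        DISK = '\\<networkpath>\vol1\par2.bak',
        DISK = '\\<networkpath>\vol1\par3.bak',
        DISK = '\\<networkpath>\vol1\par4.bak',
        DISK = '\\<networkpath>\vol2\par5.bak',
        DISK = '\\<networkpath>\vol2\par6.bak',
        DISK = '\\<networkpath>\vol2\par7.bak',
        DISK = '\\<networkpath>\vol2\par8.bak'
    WITH FORMAT, COMPRESSION;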

    Of course, with such huge databases every half percent of compression turns into noticeable numbers. Saving almost 470GB of space may make the backup tool quite a valuable purchase. Still, backup speed and high CPU usage are variables that should be taken into consideration.

    As for us, backup speed is more critical than storage, and we cannot allow a production server to sustain 95% CPU for such a long time.
    Bottom line, 3rd party backup tool developers: we are waiting for a breakthrough release.

    There are a few unanswered questions, like how restore speed compares between the different tools and the impact of multiple backup files on the restore operation. Stay tuned for the next benchmarks.
     

    Benchmark server:

    • SQL Server 2008 R2 SP1
    • 2 quad-core CPUs
    • Database location: NetApp FC 15K aggregate, 53 disks
    • Backup destination volumes: two physical NetApp FC 15K aggregate (53 disks) volumes, 4 files on each volume.

    Backup statements:

    No matter how good a tool's UI is, we need to run the backup tasks from inside SQL Server Agent to make sure they are covered by our monitoring systems. I used the extended stored procedures (command line execution is also an option; I haven't noticed any impact on backup performance).

    SQL Server native backup:

    BACKUP DATABASE <DBNAME> TO
        DISK = '\\<networkpath>\par1.bak',
        DISK = '\\<networkpath>\par2.bak',
        DISK = '\\<networkpath>\par3.bak'
    WITH FORMAT, COMPRESSION

    LiteSpeed:

    EXECUTE master.dbo.xp_backup_database
        @database = N'<DBName>',
        @backupname = N'<DBName> full backup',
        @desc = N'Test',
        @compressionlevel = 8,
        @filename = N'\\<networkpath>\par1.bak',
        @filename = N'\\<networkpath>\par2.bak',
        @filename = N'\\<networkpath>\par3.bak',
        @init = 1

    SQL Backup (Red Gate):

    EXECUTE master.dbo.sqlbackup
    '-SQL "BACKUP DATABASE <DBNAME>
    TO DISK = ''\\<networkpath>\par1.sqb'',
       DISK = ''\\<networkpath>\par2.sqb'',
       DISK = ''\\<networkpath>\par3.sqb''
    WITH
       DISKRETRYINTERVAL = 30,
       DISKRETRYCOUNT = 10,
       COMPRESSION = 4,
       INIT"'

    SQL Safe (Idera):

    EXECUTE master.dbo.xp_ss_backup
        @database = '<DBNAME>',
        @filename = '\\<networkpath>\par1.bak',
        @backuptype = 'Full',
        @compressionlevel = 4,
        @backupfile = '\\<networkpath>\par2.bak',
        @backupfile = '\\<networkpath>\par3.bak'

    If you still insist on using a 3rd party tool with its maximum compression level for the backups in your production environment, you will definitely need to consider limiting CPU usage, which will increase the backup time even more:

    • Red Gate: use the THREADPRIORITY option (values 0 – 6)
    • LiteSpeed: use @throttle (a percentage, like 70%)
    • SQL Safe: the only thing I have found was the @Threads option.

     

    Yours,

    Maria

  • Is your TRY worth catching?

     

    The very useful TRY/CATCH error handling construct is widely used to catch all execution errors that do not close the database connection. Its biggest downside is that, when a statement raises multiple errors, the TRY/CATCH mechanism will only catch the last one.

    An example of this can be seen during a standard restore operation. In this example I attempt to perform a restore from a file that no longer exists. Two errors are fired: 3201 and 3013:

    [screenshot: restore failure raising errors 3201 and 3013]

    When the restore runs inside a TRY/CATCH construct, the ERROR_MESSAGE() function will return the last message only:
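
    A minimal sketch of that behaviour (the database name and path are hypothetical):

    BEGIN TRY
        RESTORE DATABASE TestDB FROM DISK = N'C:\Backups\file_that_no_longer_exists.bak';
    END TRY
    BEGIN CATCH
        -- Only the last error (3013, "RESTORE DATABASE is terminating abnormally") is visible here;
        -- the original 3201 ("Cannot open backup device...") is lost.
        SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
    END CATCH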

    [screenshot: ERROR_MESSAGE() returning only the last error]

    To work around this problem you can prepare a temporary table that will receive the statement output, execute the statement through the xp_cmdshell stored procedure (connecting back to the SQL Server with the command line utility sqlcmd), and redirect its output into the previously created temp table.
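
    Here is a minimal sketch of that workaround, assuming xp_cmdshell is enabled and using hypothetical database and path names:

    IF OBJECT_ID('tempdb..#CmdOutput') IS NOT NULL DROP TABLE #CmdOutput;
    CREATE TABLE #CmdOutput (OutputLine NVARCHAR(4000) NULL);

    DECLARE @cmd VARCHAR(8000);
    SET @cmd = 'sqlcmd -E -S ' + @@SERVERNAME
             + ' -Q "RESTORE DATABASE TestDB FROM DISK = N''C:\Backups\file_that_no_longer_exists.bak''"';

    -- xp_cmdshell runs sqlcmd, which connects back to the server; every output line lands in the temp table.
    INSERT INTO #CmdOutput (OutputLine)
    EXEC master.dbo.xp_cmdshell @cmd;

    -- Both 3201 and 3013 are now in the captured output and can be parsed.
    IF EXISTS (SELECT 1 FROM #CmdOutput WHERE OutputLine LIKE '%Msg%Level%State%')
        RAISERROR ('Restore failed, see the captured sqlcmd output for the full error list.', 16, 1);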

    [screenshot: temp table + xp_cmdshell/sqlcmd workaround]

    After receiving the output, you will need to parse it to understand whether the statement finished successfully or failed. It's quite easy to accomplish as long as you know which statement was executed. In the case of generic executions you can query the output table and search for a pattern like 'Msg%Level%State%', which is usually part of the error message.

    Furthermore, you don’t need TRY/CATCH in the above workaround, since the xp_cmdshell procedure always finishes successfully and you can decide whether to fire the RAISERROR statement or not.

     

    Yours,

    Maria

  • SQLCMD Mode: give it one more chance

     

    - Click on me. Choose me. - asked one forgotten feature when some bored DBA was purposelessly wandering through the Management Studio menus at the end of her long and busy working day.

    - Why would I use you? I have heard of no one who does. What are you for? - wondered the aged and wise DBA, perplexed. At least that DBA thought she was aged and wise, though each day tried to prove to her that she wasn't.

    - I know you. You are quite lazy. Why would you do additional clicks to move from window to window? From tool to tool? This is irritating, isn't it? I can run Windows system commands, SQL statements and much more from the same script, from the same query window!

    - I have all my tools that I‘m used to, I have Management Studio, Cmd, Powershell. They can do anything for me. I don’t need additional tools.

    - I promise you, you will like me. – the thing continued to whine .

    - All right, show me. - she gave up. It's always this way, she thought sadly; easier to agree than to explain why you don't want to.

    - Enable me and then think about anything that you always couldn’t do through the management studio and had to use other tools.

    - Ok. Google for me the list of greatest features of SQL SERVER 2012.

    - Well... I’m not sure... Think about something else.

    - Ok, here is something easy for you. I want to check whether a folder exists or whether a file is there. Though I can easily do this using xp_cmdshell…

    - This is easy for me. – rejoiced the feature.

    By the way, having the items of the menu talking to you usually means you should stop working and go home. Or drink coffee. Or both. Well, aged and wise dba wasn’t thinking about the weirdness of the situation at that moment.

    - After enabling me, - said the unfairly forgotten feature (it was thinking of itself in such a manner) - after enabling me you can use all command line commands in the same Management Studio query window by adding two exclamation marks (!!) at the beginning of the script line to denote that you want to run a cmd command:

    [screenshot: running a cmd command with !! in SQLCMD mode]

    - Just keep in mind that when using this feature you are actually running the commands ON YOUR computer, and not on the SQL Server that the query window is connected to. This is the main difference from xp_cmdshell, which executes commands on the SQL Server itself. Bottom line: use UNC paths instead of local paths.
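
    For example (the UNC path and the server name below are made-up examples):

    !!dir \\fileserver\sqlbackups
    !!ping dwh-staging-srv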

    - Look, there is much more than that. - The SQLCMD feature was getting excited. - You can get the IP of your servers, and create, rename and drop folders. You can see the contents of any file and even start different tools from the same query window:

    [screenshot: more !! command examples]

    The not-so-aged-and-wise DBA was getting interested: - I also want to run different scripts on different servers without changing the query window's connection.

    - Sure, sure! That's another great feature SQLCMD mode provides, giving more power to querying. Use ":" to access additional commands, like :connect, which allows you to change the connection:

    [screenshot: using :connect to switch servers]

    - Now imagine you have one script holding all your changes: creating a staging table on the DWH staging server, adding a fact table to the DWH itself and updating stored procedures on the server where the reporting database is located.
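
    A sketch of such a multi-server script (server, database and object names are all hypothetical):

    :CONNECT DwhStagingSrv
    CREATE TABLE StagingDB.dbo.DailySales_Staging (SaleDate DATE, Amount MONEY);
    GO

    :CONNECT DwhSrv
    CREATE TABLE DWH.dbo.FactDailySales (SaleDate DATE, Amount MONEY);
    GO

    :CONNECT ReportingSrv
    USE ReportsDB;
    GO
    ALTER PROCEDURE dbo.usp_DailySalesReport
    AS
    SELECT SaleDate, SUM(Amount) AS TotalAmount
    FROM dbo.DailySalesSnapshot
    GROUP BY SaleDate;
    GO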

    - Now, give me more challenges!

    - Script out a list of stored procedures into the text files.

    - You can do that easily by using the :out command, which writes the query results to the specified text file. The output can be the code of a stored procedure or any data. It is effectively the same as changing the query output destination to a file instead of the grid.
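
    For instance, to script one procedure into a text file (the path and procedure name are hypothetical) and then send output back to the results window:

    :OUT C:\Scripts\usp_DailySalesReport.sql
    EXEC sp_helptext 'dbo.usp_DailySalesReport';
    GO
    :OUT stdout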

    [screenshot: using :out to write query results to a file]

    - Now, take all of those scripts and run them, one by one, on a different server.

    - Easily.

    - Come on... I'm sure that you cannot...

    - Why not? Naturally, I can do it using the :r command, which opens a script and executes it. Look, I can also use the :setvar command to define a scripting variable in SQLCMD mode. Just note that you have to leave an empty line between :r commands, otherwise it doesn't work, although I have no idea why.
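
    A sketch of the idea (folder, file and server names are made up; note the empty line between the :r commands):

    :CONNECT AnotherServer
    :SETVAR ScriptDir "C:\Scripts"

    :r $(ScriptDir)\usp_DailySalesReport.sql

    :r $(ScriptDir)\usp_AnotherProcedure.sql
    GO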

    [screenshot: using :setvar and :r to run scripts]

    - Wow. - She was really impressed. - Ok, I'll go try all of those…

    - Wait, wait! I know how to google the SQL Server features for you! This example will open the Chrome browser with search results for "SQL Server 2012 top features" (change the path to suit your PC):
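
    One way to do it is just another !! command; a sketch (the browser path is machine-specific and made up here):

    !!start "" "C:\Program Files\Google\Chrome\Application\chrome.exe" "http://www.google.com/search?q=SQL+Server+2012+top+features"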

    [screenshot: opening a browser search from SQLCMD mode]

    “Well, this could actually be useful stuff; maybe this feature really is unfairly forgotten”, thought the DBA while walking through the dark empty parking lot to her lonely car. “As someone really wise once said: ‘It is what we think we know that keeps us from learning. Learn, unlearn and relearn.’”

  • Maintenance plans love story

    There are about 200 QA and DEV SQL Servers out there. 

    There is a maintenance plan on many of them that performs a backup of all databases and removes the backup history files.

    First of all, I must admit that I’m no big fan of maintenance plans in particular or the SSIS packages in general. 

    In this specific case, if I ever need to change anything in the way the backup is performed, such as enabling the compression feature or making some other change, I have to open each plan one by one. This is quite a pain.

    Therefore, I have decided to replace the maintenance plans with a stored procedure that will perform exactly the same thing.  Having such a procedure will allow me to open multiple server connections and just execute an ALTER PROCEDURE whenever I need to change anything in it. There is nothing like good ole T-SQL.

    The first challenge was to remove the unneeded maintenance plans. Of course, I didn't want to do it server by server. I found the procedure msdb.dbo.sp_maintplan_delete_plan, but its only parameter is the maintenance plan id; there is no parameter like plan name, which would have been much more useful. Next I needed to find the table that holds all the maintenance plans on the server. You would think it would be msdb.dbo.sysdbmaintplans but, unfortunately, regardless of the number of maintenance plans on the instance, it contains just one row.

    [screenshot: msdb.dbo.sysdbmaintplans contents]

    After a while I found another table: msdb.dbo.sysmaintplan_subplans. It contains the plan id that I was looking for, in the plan_id column, as well as the agent job id which executes the plan's package:

    [screenshot: msdb.dbo.sysmaintplan_subplans contents]

    That was all I needed and the rest turned out to be quite easy.  Here is a script that can be executed against hundreds of servers from a multi-server query window to drop the specific maintenance plans.

    DECLARE @PlanID uniqueidentifier

    SELECT @PlanID = plan_id
    FROM msdb.dbo.sysmaintplan_subplans
    WHERE name LIKE 'BackupPlan%'

    EXECUTE msdb.dbo.sp_maintplan_delete_plan @plan_id = @PlanID

     

    The second step was to create a procedure that performs all of the old maintenance plan tasks: create a folder for each database, back up all databases on the server and clean up the old files. The script is below. Enjoy.

     

    ALTER PROCEDURE BackupAllDatabases
                                      @PrintMode BIT = 1
    AS
    BEGIN

           DECLARE @BackupLocation VARCHAR(500)
           DECLARE @PurgeAfterDays INT
           DECLARE @PurgingDate VARCHAR(30)
           DECLARE @SQLCmd  VARCHAR(MAX)
           DECLARE @FileName  VARCHAR(100)

           -- Keep two weeks of backup files
           SET @PurgeAfterDays = -14
           SET @BackupLocation = '\\central_storage_servername\BACKUPS\'+@@servername

           SET @PurgingDate = CONVERT(VARCHAR(19), DATEADD (dd,@PurgeAfterDays,GETDATE()),126)

           -- sp_MSforeachdb replaces ? with each database name
           SET @FileName = '?_full_'
                         + REPLACE(CONVERT(VARCHAR(19), GETDATE(),126),':','-')
                         + '.bak';

           SET @SQLCmd = '
                  IF ''?'' <> ''tempdb'' BEGIN
                         EXECUTE master.dbo.xp_create_subdir N'''+@BackupLocation+'\?\'' ;

                         BACKUP DATABASE ? TO  DISK = N'''+@BackupLocation+'\?\'+@FileName+'''
                         WITH NOFORMAT, NOINIT, SKIP, REWIND, NOUNLOAD, COMPRESSION, STATS = 10 ;

                         EXECUTE master.dbo.xp_delete_file 0,N'''+@BackupLocation+'\?\'',N''bak'',N'''+@PurgingDate+''',1;
                  END'

           IF @PrintMode = 1 BEGIN
                  PRINT @SQLCmd
           END

           EXEC sp_MSforeachdb @SQLCmd

    END
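
    Once the procedure is deployed, a quick manual test looks like this; with @PrintMode = 1 it prints the per-database command generated for sp_MSforeachdb and then runs it:

    EXEC dbo.BackupAllDatabases @PrintMode = 1;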

  • Transactional replication: are you sure your data is totally synchronized?

     

    There are those rare times, when your replication solution is working perfectly. No consistency errors, no errors at all.
    Or so it seems.

    Then, all of a sudden, you catch one application which is writing to the read-only subscriber. Next, a developer complains that some data is missing from the subscriber.

    Similarly, after consistency errors have appeared and been resolved, you always wonder whether the data is truly synchronized. In some specific cases the publication can be dropped by mistake and re-created, with replication continuing without reporting any errors.

    Is there a native way to make sure all data is synchronized between the publisher and the subscriber?

    Validate subscriptions using Replication Monitor

    If you open the Replication Monitor application and right click on your publication, you will notice the “Validate subscriptions…” feature.

    [screenshot: Replication Monitor - Validate subscriptions menu]

    Click on it, choose the subscription that requires validation and select the way validation will be performed.

    [screenshot: subscription validation options dialog]

    These options are quite critical since they directly impact the speed of the validation process. Here you can choose whether the tables themselves will be queried or row number information will be extracted from the sysindexes system view.

    The default option is to compare the row counts from sysindexes and to compare the actual row counts only if differences are found.

    You can also choose to compare the checksums of the data. In this case, the validation process will take a long time.

    The result of the comparison will appear in the distribution agent session details:

    [screenshot: distribution agent session details]

    You will see all tables that were scanned by the validation process and the row counts.

    [screenshot: validation row counts per table]

    If a table is out of sync you will see an error like the one below:

    [screenshot: validation error for an out-of-sync table]

     

    What’s going on behind the scenes?

    Here is the query that is executed against each table in the published database if you choose to compare the actual row count and the checksum:

    SELECT count_big(*), sum(convert(numeric, binary_checksum(*)))
    FROM <tablename> WITH (TABLOCK HOLDLOCK)

     

    All the above is nice when you have only one publication to validate. What if you have hundreds of publications?

    As you might have expected, this UI is executing a stored procedure on the publisher. The stored procedure is called sp_publication_validation and its main parameters are:

    • @rowcount_only
        ◦ 1 – Perform a rowcount check only (default)
        ◦ 2 – Perform a rowcount and binary checksum
    • @full_or_fast
        ◦ 0 – Full count using COUNT(*)
        ◦ 1 – Fast count using the sysindexes view
        ◦ 2 – Compare the subscriber and publisher using the fast count and, if the results are not equal, use COUNT(*) on the table. If the rows field in sysindexes is NULL, a full count is used. (default)

     

    USE <published database>

    GO

    EXEC sp_publication_validation  @publication =  'PublicationName'

                 ,  @rowcount_only =  2

                 ,  @full_or_fast =  2

    You can execute this stored procedure for every publication on the server.

    I have put together a simple script to do this. To use it, you need to define a linked server to the Distributor so the script can get the list of publications from the MSpublications table, which is located in the distribution database.
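
    For reference, a minimal sketch of such a linked server definition, matching the DISTRIBUTOR name used in the procedure below (the data source is a placeholder):

    EXEC master.dbo.sp_addlinkedserver
         @server = N'DISTRIBUTOR',
         @srvproduct = N'',
         @provider = N'SQLNCLI',
         @datasrc = N'<DistributorServerName>';

    EXEC master.dbo.sp_addlinkedsrvlogin
         @rmtsrvname = N'DISTRIBUTOR',
         @useself = N'True';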

     

    CREATE PROCEDURE ValidateSubscriptions
                            @PrintMode  int = 0
    AS
    BEGIN

          DECLARE @SQLCmd  varchar(max);

          SET @SQLCmd = ''

          SELECT @SQLCmd = @SQLCmd + 'EXEC ['+publisher_db+'].dbo.sp_publication_validation
          @publication = '''+publication+''',
          @rowcount_only =  2,
          @full_or_fast =  2;' + CHAR(10) + CHAR(13) +
          ' WAITFOR DELAY ''00:01:00'';' + CHAR(10) + CHAR(13)
          FROM DISTRIBUTOR.DistributionDB.dbo.MSpublications p
               JOIN DISTRIBUTOR.master.sys.servers s
                 ON p.publisher_id = s.server_id
          WHERE s.name = @@servername

          IF @PrintMode = 1 BEGIN
                PRINT @SQLCmd
          END

          EXEC (@SQLCmd)

    END

    All the validation results (for all publications) will be inserted into the MSdistribution_history table which is located in the Distribution database on the Distributor server.

    You can view them using the following query:

     

    USE DistributionDB
    GO

    SELECT      time              Time,
                s.Name            PublisherServer,
                a.publication     Publication,
                s2.name           SubscriberServer,
                SUBSTRING(comments,8,CHARINDEX('''',comments,8)-8) TableName,
                comments          Comments
    FROM dbo.MSdistribution_history  h
         JOIN dbo.MSdistribution_agents   a
           ON h.agent_id = a.id
         JOIN master.sys.servers s
           ON a.publisher_id = s.server_id
         JOIN master.sys.servers s2
           ON a.subscriber_id = s2.server_id
    WHERE comments like 'Table %'
    The best part is that if any article fails validation, you will immediately see it in the SQL Server error log:

     

    EXEC master..xp_readerrorlog 0,1,'failed data validation'

    [screenshot: error log entries for failed data validation]

    This means that if you already have a routine that notifies you about unusual errors in the SQL Server error log, it will surface validation failures as well.

     

    May all your articles pass their validation successfully.

    Yours, Maria

  • Unfairly forgotten partitioned views can help us make our partitioned table design better

    Partitioning is a very important feature for scalable applications: it allows breaking up huge monolithic tables in order to ensure their scalability and manageability within a single instance. For example, data deletion and loading can be very fast if done by partition. Reorganizing, optimizing and rebuilding indexes can also be done by partition, which is much less intrusive and faster.

    Performance also benefits: joins become faster when the tables involved are partitioned. The smaller b-tree for each partition makes partition access fast, especially when queries limit rows by the partition key and touch only a subset of the data. Another performance benefit comes from partition-level lock escalation.
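
    For example, on a natively partitioned table you can let lock escalation stop at the partition level instead of locking the whole table (a sketch using a hypothetical dbo.Sales table):

    -- AUTO allows escalation to the partition (HoBT) level on a partitioned table
    ALTER TABLE dbo.Sales SET (LOCK_ESCALATION = AUTO);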

    There are two partitioning designs in SQL Server. In a partitioned view, the member tables are combined by a view using UNION ALL, so from the user's perspective the data comes from one table. A natively partitioned table, a feature introduced in SQL Server 2005, is treated as a single object by the relational engine, yet is handled as multiple objects by the storage engine.

    [diagram: partitioned view vs. natively partitioned table]

    When comparing native partitioning to a partitioned view, one of the greatest improvements is that using a partitioned table results in relatively small query plans. PVs in general had enormous query plans because each table in the view was a separate object with separate statistics. Due to the fact that natively partitioned tables are treated by the relational engine as a single object, fewer permutations are considered during the optimization phase, meaning a faster optimization time. Also, auto-parameterization doesn’t work with PVs because the rules of auto-parameterization are quite strict. 

    Adhoc SQL that uses a single partitioned table has many more opportunities to get auto-parameterized and produce a re-usable adhoc query plan.

     In PVs, tables are accessed sequentially.  When using native partitioning many operations can be done in parallel. 

    However, there are many challenges when migrating from an environment that uses partitioned views to native partitioning. The usual way of upgrading such an environment is to create a new partitioned table and then to switch the data in, partition by partition. At some point, the old view and the new table swap names. During such an upgrade, from the user's perspective, the data is unavailable or, even worse, partially available, and reports are incorrect.

    After the upgrade, maintenance tasks are also more cumbersome against a partitioned table. Adding new indexes to a huge table with billions of rows takes about 4 hours, and the table is locked during this time. The same goes for data type changes on columns. Also consider primary key changes: you need to drop the PK, which takes about 3 hours, and then add a new one, another 4 hours (a PK change is usually a bad sign anyway, since it indicates the database design is not right, but we all know it happens and sometimes we need to add an additional column to the PK). Finally, there is no scale-out for partitioned tables: all partitions must reside on the same server, in the same database.

    Here, surprisingly, our good old, unfairly forgotten partitioned views can help us make our partitioning design even better.

    Consider the following scenario. You have a partitioned view with many big tables. Instead of switching the data partition by partition into a new table, you simply add the new partitioned table under the same partitioned view. No long Sch-M locks, no partial data availability and no downtime. All new data is inserted into the new partitioned table. When the time comes, the old member tables get purged, leaving the partitioned view with only the natively partitioned table underneath. If you need new indexes or other lengthy metadata changes, you can simply add another partitioned table with the required changes under the view, again with zero downtime for your environment. And if you need a distributed architecture across servers, you can also achieve that using partitioned views.
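
    A minimal sketch of the idea, with a hypothetical dbo.SalesArchive (the old member table, holding data before 2012) and dbo.Sales2012 (the new natively partitioned table); both tables need matching schemas and CHECK constraints on the partitioning column:

    CREATE PARTITION FUNCTION pfSalesByMonth (DATE)
        AS RANGE RIGHT FOR VALUES ('2012-02-01', '2012-03-01', '2012-04-01');
    GO
    CREATE PARTITION SCHEME psSalesByMonth
        AS PARTITION pfSalesByMonth ALL TO ([PRIMARY]);
    GO
    CREATE TABLE dbo.Sales2012
    (
        SaleDate DATE  NOT NULL CHECK (SaleDate >= '2012-01-01'),
        Amount   MONEY NOT NULL
    ) ON psSalesByMonth (SaleDate);
    GO
    -- The existing view simply gets the new partitioned table as one more member:
    ALTER VIEW dbo.Sales
    AS
    SELECT SaleDate, Amount FROM dbo.SalesArchive   -- CHECK (SaleDate < '2012-01-01')
    UNION ALL
    SELECT SaleDate, Amount FROM dbo.Sales2012;
    GO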

    [diagram: partitioned view over partitioned tables]

    Native partitioning is a great feature of SQL Server, but partitioned views can make it even better, helping you scale out and perform maintenance tasks with zero downtime.

  • Write something awesome, no one sees it. Write something embarrassing, everyone sees it.

    Hi everyone,

    I am truly excited to start blogging here among this incredible #sqlfamily.

    My name is Maria Zakourdaev. I have more than 10 years of experience with SQL Server. The last five years have been spent mostly on benchmarking different SQL Server features and flows, like replication, data import, the impact of indexes on DML flows, star transformations in the RDBMS, hierarchical queries and custom OLAP-like aggregations. I was a speaker at Microsoft TechEd (Israel) on the SQL Server track. I am also an active member of the Israel SQL Server user group.

    Stay tuned

    Yours, Maria
