
Tibor Karaszi

Does BACKUP utilize pages in cache?

I'm having a conversation with Tony Rogerson (see his great blog at: http://sqlblogcasts.com/blogs/tonyrogerson/) about whether or not BACKUP will grab pages from cache (the Buffer Pool, BP) or always read them from disk.

It seems natural to grab the pages in the BP to eliminate some physical I/O. But thinking a bit more, I realized that the data in the BP might not be organized in a suitable way. As far as I know, there's no overall extent structure (map) for the BP. And BACKUP is done on an extent basis (extents don't have to be sequential on the backup media, though). So, how would SQL Server orchestrate this? It needs to lay out each extent on the backup media, but reading the BP, it will get something like page 5 from extent 78, then page 2 from extent 456, then page 3 from extent 8964, then page 2 from extent 78.
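One way to see this page (rather than extent) orientation for yourself is to look at sys.dm_os_buffer_descriptors (available as of SQL Server 2005). Each row describes one 8 KB page; deriving the extent number from the page number is something the query has to do itself, which is the point. A sketch, assuming you run it in the database of interest:

```sql
-- Each row in the buffer pool is one 8 KB page; there is no
-- extent-level map. The extent number has to be computed (8 pages per extent).
SELECT file_id, page_id, page_id / 8 AS extent_no, page_type
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()
ORDER BY file_id, page_id
```

Typically you'll see scattered page numbers from many different extents, in no extent-friendly order.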

Anyhow, seeing is believing, so I created a repro for this (see the code at the end of this blog post).

My conclusion is that BACKUP does not read pages from the BP; it reads all pages (extents) from disk. My test showed no difference in elapsed time whether or not the data was in cache, and the same goes for I/O.

The conversation is still ongoing, and anyone is free to point out flaws in the script or report different findings.

USE master
GO
IF DB_ID('testBpBackup') IS NOT NULL DROP DATABASE testBpBackup
GO
CREATE DATABASE testBpBackup
ON  PRIMARY
(NAME = N'testBpBackup_Data'
,FILENAME = N'C:\testBpBackup_Data.mdf'
,SIZE = 1GB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 50MB )
 LOG ON
(NAME = N'testBpBackup_Log'
,FILENAME = N'C:\testBpBackup_Log.ldf'
,SIZE = 100MB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 10MB)
GO
USE testBpBackup
GO
IF OBJECT_ID('t') IS NOT NULL DROP TABLE t
CREATE TABLE t (c1 int identity, c2 char(500))
INSERT INTO t
 SELECT TOP 1000000 'Hello' FROM syscolumns a, syscolumns b, syscolumns c
EXEC sp_spaceused 't' --Approx 500MB
GO


--Execute below in one batch

--support variables
DECLARE @t datetime, @io_before bigint, @crap int

--Backup with data in bp
SET @crap = (SELECT COUNT(*) FROM t)
SET @t = CURRENT_TIMESTAMP
SET @io_before = (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats( DB_ID(), 1))

BACKUP DATABASE testBpBackup TO DISK = 'nul'
SELECT
 DATEDIFF(s, @t, CURRENT_TIMESTAMP) AS "seconds with data in BP",
 (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats( DB_ID(), 1)) - @io_before AS "I/O with data in BP"


--Backup with data not in bp
DBCC DROPCLEANBUFFERS
SET @t = CURRENT_TIMESTAMP
SET @io_before = (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats( DB_ID(), 1))

BACKUP DATABASE testBpBackup TO DISK = 'nul'
SELECT
 DATEDIFF(s, @t, CURRENT_TIMESTAMP) AS "seconds with data NOT in BP",
 (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats( DB_ID(), 1)) - @io_before AS "I/O with data NOT in BP"

 

Published Friday, August 03, 2007 1:11 PM by TiborKaraszi

Comments

 

Tony Rogerson said:

Hi Tibor,

Makes sense when you think about it; like you say, the buffer pool is page oriented. Also, if the backup used the buffer pool then it would wreck the cache and affect application performance; better for it to have its own buffers and read from disk.

That's probably the reason the transaction log needs to be backed up with the .BAK as well, so restore can make all the pages that weren't written because of lazy writing (dirty pages in cache) consistent.

Tony.

August 3, 2007 10:04 AM
 

TiborKaraszi said:

Hey Tony,

Regarding log records, you might recall the old architecture (6.5 and earlier). The algorithm was "start from page 0, and write each page. If somebody wants to modify a page, the backup needs to jump to that extent and back it up before the modification can be done". This would give a set of pages that are consistent (a snapshot) as of the beginning of the backup. As of 7.0, backup ignores modifications during the backup, and the log records picked up are used by restore to re-apply those modifications to get a snapshot as of the end of the backup period. (Old news to you, Tony, I know. This is more for the benefit of somebody else reading this...)

As for backup reading pages from cache, a plausible algorithm could be: "Pick up the pages in cache from the cache. The rest of the pages you need to grab from disk, without caching them." But the page orientation of the cache will probably make this very impractical...

Unless...

Imagine a database data file. SQL Server would, for each extent (extent # x, then y, then z, ...), check if *every* page (page 0, page 1, page 2, ...) is in cache and, if so, grab the pages from cache and organize them into an extent so it can be written to backup media. I have a feeling that this would be more work than would typically be gained.
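The "is the whole extent in cache?" check above could be sketched as a grouping over sys.dm_os_buffer_descriptors (SQL Server 2005 and later). This is only an illustration of the idea, not how backup actually works, and it ignores details such as mixed extents:

```sql
-- Extents for which all 8 pages are currently in the buffer pool
-- for the current database. Only these could, hypothetically, be
-- assembled into a backup extent straight from cache.
SELECT file_id, page_id / 8 AS extent_no
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()
GROUP BY file_id, page_id / 8
HAVING COUNT(*) = 8
```

For a large database, the fraction of extents that are *fully* cached is often small, which is another hint at why the gain would be limited.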

August 3, 2007 11:39 AM
 

Santosh said:

Hi,

I think backup does read pages from the BP. If you run the DBCC MEMORYSTATUS command, the [Hashed] counter, which gives the count of data pages in memory, increases while/after the BACKUP DATABASE command is executed.

Even with the query you have posted in this article I see a difference in I/O between the two BACKUP commands. However, when the DBCC DROPCLEANBUFFERS command is commented out, the I/O result is the same.

Thanks,

Santy

August 6, 2007 10:04 AM
 

TiborKaraszi said:

Here's what I did:

DBCC MEMORYSTATUS, Buffer Counts, Hashed:

SELECT COUNT(*) to make sure "table is in cache": 66991

DBCC DROPCLEANBUFFERS: 247

BACKUP: 339

SELECT COUNT(*) to make sure "table is in cache": 67077

Hashed is raised a *little* bit by the backup, probably some meta-data retrieval, but not nearly as much as by the SELECT.
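As an alternative to reading the Hashed counter out of DBCC MEMORYSTATUS output, the same number can be checked per database with a query (assuming SQL Server 2005 or later), which makes the before/after comparison easier to script:

```sql
-- Count of buffer pool pages hashed for the current database;
-- run before and after the BACKUP to compare.
SELECT COUNT(*) AS hashed_pages
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()
```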

I also checked Perfmon, PhysicalDisk, Disk Reads/sec. I see the same spike for the backup regardless of whether the data is in cache or not when I do the backup. Just to be certain: I do get a spike when I do the SELECT and the data isn't in cache, but I see no activity at all for the SELECT when the data is in cache.

One funny thing, though, is that if I execute the script all at once, the first backup actually takes *longer* (14 seconds vs. 5 seconds). If I after this execute only the last batch, I get the same execution time for the two. This also happens if I wait a while between the first and the second part. Perhaps there's some initialization that needs to be done the first time you do a backup of a database...

August 6, 2007 12:25 PM
 

Tibor Karaszi said:

Having a few moments to spare, I decided to give this a spin. Specifying for the backup to be compressed

December 12, 2007 11:18 AM
