“My database is in the simple recovery mode. No, I don’t have any large, long-running, or replication transactions, and yes, I have enough number of VLFs to go around, about 250 of them to be precise. Why does my transaction log keep autogrowing?”
As documented in SQL Server Books Online, when a database is the simple recovery mode, its transaction log is truncated (i.e. all the virtual log files that do not contain any active transaction record are marked for reuse) whenever the checkpoint thread is run. SQL server wakes up the checkpoint process when, among other things, (1) there are enough log records that the recovery interval threshold is crossed, (2) you manually run the checkpoint command, or (3) the transaction log gets 70% full.
Even without any large, long-running, or replication transactions, it is quite possible for a database in the simple recovery mode to run out of the log space, or cause the log to autogrow if it’s enabled, when the checkpoint process can’t be woken up fast enough to truncate the log in time.
For instance, even though there are not enough changes to cross the recovery interval threshold, but if the changes after the log is 70% used are fast, SQL Server may not have enough time to run a checkpoint and complete the log truncation before the log space is fully used up. In a way, this is a race condition in that whether the log space will be used up (or autogrow) depends on which one gets there first:
- If SQL Server can truncate the log first, the log space will be freed up and reused, and there will be no autogrow or failed transaction,
- But it is also possible that the data changes are fast enough to use up the log space before the log is truncated.
Let’s illustrate this behavior with a script. First run the following setup script to create a test table and populate some data in a test database (1) that is already set to the simple recovery mode, (2) the log file size is 100MB, (3) and the log file autogrow is disabled (it's just easier to see the out-of-log-space error than check for autogrowth):
set nocount on
go
drop table test
go
create table test(i int, c char(1000))
go
declare @i int
set @i = 1
while @i < 10000
begin
insert test values(@i, 'abc')
set @i = @i + 1
end
To see the race condition, run the following script;
set nocount on
go
declare @change_size int
set @change_size = 100 -- adjust this value
declare @i int
set @i = 1
while @i < 100
begin
if @i % 2 = 0
update test set c = replicate('a', @change_size)
else
update test set c = replicate('b', @change_size)
select @i = @i + 1
end
You may need to adjust the value of variable @change_size several times. Start a small value (say 40) that enables the script to run to completion, then pick a large value (say 400) that causes the script to fail. And work towards to the middle. In some of my tests, when I set @change_size 100, I can run the above script to completion without running out of the log space. But if I change it to 120, I get an Msg 9002 error telling me that the transaction log for the database is full.
When I set @change_size to 116, sometimes the script will complete, and other times it'll fail. But this is a rough threshold, it’s not precise or stable across tests or environments. For instance, in my other tests, the threhold can as low as 50. The point is that sometimes SQL Server can truncate the log fast enough, and other times it can't.
Note that even though each of the two UPDATE statements modifies 9999 rows, the amount of the log space used by the modification is modest at about 2~3MB or 2~3% of the total log space. They are not tiny transactions, but they are not large, long-running transactions either.
What are the implications for databases that are in the simple recovery mode?
First of all, it’s good to leave ample space in the transaction log. This may give SQL Server a chance to wake up the checkpoint process to truncate the log in time. Secondly, you may have to issue BACKUP LOG WITH NO_LOG manually to help it out.
Note that in SQL Server 2008, BACKUP LOG WITH NO_LOG and BACKUP LOG WITH TRUNCATE_ONLY are removed. If you must help truncate the log, you probably would have to resort to issuing a checkpoint which does much more than just truncating the log.