Linchi Shea

Checking out SQL Server via empirical data points

How Many Data Files Should I Create for a User Database?

Whenever this question comes up, my typical response is to pass along the following Microsoft recommendations:

  • The number of data files within a single filegroup should equal the number of CPU cores. See the "Physical Database Storage Design" whitepaper.
  • Lining up the number of data files with the number of CPUs has scalability advantages for allocation-intensive workloads (a sketch of such a layout follows this list). See the "Storage Top 10 Best Practices" whitepaper.
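To make the first recommendation concrete, here is a minimal sketch of what such a layout might look like on a four-core server. The database name, file names, paths, and sizes are all made up for illustration and are not taken from the tests described below:

-- Illustrative only: one primary filegroup with four equally sized data files.
CREATE DATABASE tpcc_test
ON PRIMARY
    (NAME = tpcc_data1, FILENAME = 'D:\Data\tpcc_data1.mdf', SIZE = 40GB),
    (NAME = tpcc_data2, FILENAME = 'D:\Data\tpcc_data2.ndf', SIZE = 40GB),
    (NAME = tpcc_data3, FILENAME = 'D:\Data\tpcc_data3.ndf', SIZE = 40GB),
    (NAME = tpcc_data4, FILENAME = 'D:\Data\tpcc_data4.ndf', SIZE = 40GB)
LOG ON
    (NAME = tpcc_log, FILENAME = 'E:\Log\tpcc_log.ldf', SIZE = 20GB);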

Note that we are talking about a user database, not tempdb. How to configure the tempdb data files is clearly documented in KB article 328551 and in the whitepaper "Working with tempdb in SQL Server 2005".
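For reference, the tempdb side of this guidance (per that KB article) amounts to adding equally sized data files, typically one per core. A minimal sketch, with file names, paths, and sizes that are purely illustrative:

-- Illustrative only: spread tempdb over four equally sized data files.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 2GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 2GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 2GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'T:\tempdb4.ndf', SIZE = 2GB);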

So far so good! But what performance behavior should we expect as we change the number of data files in a user database? As usual, it's good to have some empirical test results to help out. Since I couldn't find any relevant empirical data, I had to conduct some tests to get the answer myself. The tests used two types of workload with the following characteristics:

  • The database was about 160GB for data in the primary filegroup and 20GB for log. The data resided on the same 250GB drive regardless of the number of data files configured for a given test.
  • The server had four processors (or cores).
  • The tables were populated with data generated in compliance with the TPC-C specification 5.5 for 1000 warehouses.
  • The first type of test workload, which was treated as the baseline, consisted of the five TPC-C transactions. These are your typical OLTP transactions that are relatively short and simple; they either produce small resultsets or modify/insert a small number of rows. So they can hardly be labeled as allocation-intensive.
  • The second type of test workload used the same TPC-C transactions, but with each stored procedure modified to include the following T-SQL code segment:
-- Build and run the DDL/DML so that every transaction creates, populates,
-- and drops a permanent table with a unique (GUID-based) name.
declare @tb varchar(36),
        @create_sql varchar(100),
        @drop_sql varchar(100),
        @insert_sql varchar(100)

select @tb = cast(newid() as varchar(36))
set @create_sql = 'CREATE TABLE [' + @tb + '] (c char(1000) not null)'
set @insert_sql = 'INSERT [' + @tb + '] values(''aaaaaaaaaaaaa'')'
set @drop_sql   = 'DROP TABLE [' + @tb + ']'

EXEC (@create_sql)
EXEC (@insert_sql)
EXEC (@drop_sql)

In other words, with this second type of workload, every transaction included creating and dropping a permanent table, thus incurring some allocation activity.
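If you want to confirm that a workload like this is in fact contending on allocation pages, one way (an assumption on my part about how you might check it, not part of the test harness itself) is to look for PAGELATCH waits on the allocation pages while the load is running:

-- Illustrative check: sessions currently waiting on page latches.
-- Allocation contention typically shows up as PAGELATCH_* waits whose
-- resource_description is <dbid>:<fileid>:1 (PFS), :2 (GAM), or :3 (SGAM).
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%';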

The following two charts show the performance impact of changing the number of data files in the database (with everything else being equal).

Let's first look at the chart at the top, which shows the test results of the baseline workload. Note that there was little or no allocation activity in the baseline workload. The number of data files had little to no impact on transaction throughput, even when the database was heavily loaded. This is especially true if you take out the 64-data-file configuration, which was included purely out of curiosity; intuitively, there is no reason to create 64 data files for a 160GB database on a four-core server.

The second chart tells a dramatically different story. It presents the test results of the second type of workload, which did include allocation activity. The chart shows that as the load level increased beyond 100 users, the performance behavior of the 1-data-file configuration and that of the 4-data-file configuration started to diverge. With 200 or more users, the 4-data-file configuration was able to maintain its transaction throughput, while the transaction throughput of the 1-data-file configuration started to decline sharply. For instance, with 600 users, the 4-data-file configuration could pump through as much as 50% more transactions than the 1-data-file configuration. Remember that the test server had four processor cores.
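As an aside, if you want to check that writes really are being spread across the files in a multi-file configuration (SQL Server's proportional fill should keep the files roughly evenly used), a quick illustrative query against the database in question is:

-- Illustrative only: size and space used per data file, in MB (size is in 8KB pages).
SELECT name,
       physical_name,
       size / 128 AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128 AS space_used_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';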

These test results appear to be consistent with the Microsoft recommendations mentioned at the beginning of this blog.

Published Monday, January 29, 2007 11:27 AM by Linchi Shea

Attachment(s): FileNumber.gif


Comments

 

Chuck Boyce said:

This is a great contribution to the SQL world, Linchi.  Thank you very much.  This will be very helpful in making the case for having one database file per core not only for TEMPDB, but also for user database filegroups.

Chuck

January 29, 2007 11:31 AM
 

Tony Rogerson said:

You need to be careful; you can overcook the pot. If you have 4 CPUs, that doesn't necessarily mean you have a capable IO subsystem; creating 4 files for the database means 4 active files on your physical disk, and if that disk / array isn't capable you can actually degrade performance. You can see that if you try these tests on a single drive on a dual core machine - tempdb with 2 files has contention far earlier than when it has just the 1 file. There is a balance point at which IAM contention (creating/dropping objects) becomes a problem compared to where saturating your disk becomes a problem. As ever, people should test first before committing to production; a good test tool is IOMeter.

January 31, 2007 3:06 AM
 

Linchi Shea said:

Tony;

Good point, and that goes without saying! This is just a little blog entry, not a whitepaper, and I intentionally chose not to cover all the test details. These are just some generic, non-proprietary tidbits that I think are of some interest to the general public while I do my real work. In practice, these tests are always conducted at multiple layers of the stack, from the individual system components (e.g. CPU, memory, I/O, network, etc.) all the way up to SQL Server applications (e.g. benchmarks such as TPC-C).

January 31, 2007 10:00 AM
 

Linchi Shea said:

By the way, Tony:

* What might be the reason that a 2-data-file tempdb "has contention far earlier than when it has just the 1 file" with exactly the same workload?

* Note that in these tests the number of data files was the >only< variable. And the first chart shows that between 1 file and 4 files there is little to no performance difference.

Linchi

January 31, 2007 10:12 AM
 

Denis Gobo said:

A year in review, The 21 + 1 best blog posts on SQLBlog Best posts according to me, it might have been

December 27, 2007 4:11 PM
