Whenever this question comes up, my typical response is to pass along the following Microsoft recommendations:
- The number of data files within a single filegroup should equal the number of CPU cores. See the "Physical Database Storage Design" whitepaper.
- Lining up the number of data files with the number of CPUs has scalability advantages for allocation-intensive workloads. See the "Storage Top 10 Best Practices" whitepaper.
Note that we are talking about a user database, not tempdb. How to configure the tempdb data files is clearly documented in KB article 328551 and in the whitepaper "Working with tempdb in SQL Server 2005".
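For context, a user database laid out per that recommendation on a four-core server would have four equally sized data files in its primary filegroup. The following is a minimal sketch; the database name, file paths, and sizes are purely illustrative and are not the ones used in the tests below:

-- keep the data files the same size so SQL Server's proportional fill spreads allocations evenly
CREATE DATABASE TestDB
ON PRIMARY
    (NAME = TestDB_data1, FILENAME = 'E:\sqldata\TestDB_data1.mdf', SIZE = 40GB),
    (NAME = TestDB_data2, FILENAME = 'E:\sqldata\TestDB_data2.ndf', SIZE = 40GB),
    (NAME = TestDB_data3, FILENAME = 'E:\sqldata\TestDB_data3.ndf', SIZE = 40GB),
    (NAME = TestDB_data4, FILENAME = 'E:\sqldata\TestDB_data4.ndf', SIZE = 40GB)
LOG ON
    (NAME = TestDB_log, FILENAME = 'E:\sqldata\TestDB_log.ldf', SIZE = 20GB)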
So far so good! But what performance behavior should we expect as we change the number of data files in a user database? As usual, it helps to have some empirical test results. Since I couldn't find any relevant empirical data, I conducted some tests to get to the answer myself. I used two types of workload in the tests, with the following characteristics:
- The database was about 160GB for data in the primary filegroup and 20GB for the log. The data resided on the same 250GB drive regardless of the number of data files configured for a given test.
- The server had four processors (or cores).
- The tables were populated with data generated in compliance with the TPC-C specification 5.5 for 1000 warehouses.
- The first type of test workload, which was treated as the baseline, consisted of the five TPC-C transactions. These are typical OLTP transactions that are relatively short and simple; they either produce small resultsets or modify/insert a small number of rows, so they can hardly be labeled allocation intensive.
- The second type of test workload used the same TPC-C transactions, but with each stored procedure modified to include the following T-SQL code segment:
declare @tb varchar(36)
declare @create_sql varchar(200), @insert_sql varchar(200), @drop_sql varchar(200)
select @tb = cast(newid() as varchar(36))
set @create_sql = 'CREATE TABLE [' + @tb + '] (c char(1000) not null)'
set @insert_sql = 'INSERT [' + @tb + '] values(''aaaaaaaaaaaaa'')'
set @drop_sql = 'DROP TABLE [' + @tb + ']'
-- execute the dynamic SQL so the transaction actually creates, populates, and drops the table
exec (@create_sql)
exec (@insert_sql)
exec (@drop_sql)
In other words, with this second type of workload, every transaction included creating and dropping a permanent table, thus incurring some allocation activity.
The following two charts show the performance impact of changing the number of data files in the database, with everything else being equal.
Let's first look at the chart at the top, which shows the test results of the baseline workload. Recall that the baseline workload involved little or no allocation activity. The number of data files had little to no impact on transaction throughput, even when the database was heavily loaded. This is especially true if you set aside the 64-data-file configuration, which was included purely out of curiosity; intuitively, there is no reason to create 64 data files for a 160GB database on a four-core server.
The second chart tells a dramatically different story. It presents the test results of the second type of workload, which did include allocation activity. The chart shows that as the load level increased beyond 100 users, the performance of the 1-data-file configuration and that of the 4-data-file configuration started to diverge. With 200 or more users, the 4-data-file configuration was able to maintain its transaction throughput, while the throughput of the 1-data-file configuration declined sharply. For instance, with 600 users, the 4-data-file configuration pushed through as much as 50% more transactions than the 1-data-file configuration did. Remember that the test server had four processor cores.
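This divergence is what you would expect from contention on the allocation pages (PFS, GAM, SGAM) of a single data file. Checking wait statistics was not part of these tests, but if you want to see whether a workload on your own system is hitting this kind of contention, a query along the following lines against sys.dm_os_waiting_tasks, run while the workload is active, will show the page latch waits and the db_id:file_id:page_id resources they are queued on:

select wait_type,
       resource_description,   -- formatted as db_id:file_id:page_id for page latches
       count(*) as waiting_tasks
from sys.dm_os_waiting_tasks
where wait_type like 'PAGELATCH%'
group by wait_type, resource_description
order by waiting_tasks desc

A pile-up of PAGELATCH_UP or PAGELATCH_EX waits on the PFS, GAM, or SGAM pages of one file is the classic signature of allocation contention that spreading the filegroup across more data files helps relieve.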
These test results appear to be consistent with the Microsoft recommendations mentioned at the beginning of this blog.