THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

James Luetkehoelter

Nearly any SQL topic presented at times in a slightly eclectic manner.

Filegroups Part I - Dividing Data for Performance

It never fails to surprise me how rarely filegroups are leveraged in client installations. I'm also surprised at how many SQL Server DBAs are new to the concept of filegroups whenever I teach or give a presentation that includes them. For many of you, this may seem simplistic, but I've decided there's enough need to spend some blog space explaining the usage and extolling the benefits of filegroup usage. I find that filegroups provide many benefits, both in performance and backup/recovery, or simply just organization. For this post I'm going to focus just on performance.

 Just to cover the basics and make sure we're all on the same page (hey, I might not be - happens all the time):

 1) Filegroups are essentially logical database constructs that organize physical data files together

2) Objects are created and stored on filegroups (but not necessarily one particular file within that filegroup)

3) SQL Server (all versions thusfar) have an initial filegroup called PRIMARY. That name can not be changed. PRIMARY also must hold the .MDF file, which contains metadata about the database. It can not be removed either. Since the .MDF file holds critical information about the database itself, mixing that data with production data can be dangerous. The larger a file gets (in this case the .MDF), the more likely it is to have some sort of corruption (simply by virtue of additional read/writes to a single large file). Personally I like to isolate the .MDF and the PRIMARY filegroup altogether.

4) One filegroup can have the "Default" property, which means if you don't specify where you're creating your table, index, etc. (with the clause ON [Filegroup_name] at the end of the CREATE statement), the object will be created on that "Default" filegroup.

I often find that while there may be multiple files in a SQL database, there is usually only one filegroup (PRIMARY). For smaller databases (less than 50GB) that could be alright, but for larger databases simply using PRIMARY as the only filegroup, the potential for improved database performance is significant. While this does of course depend on how the I/O subsystem is configured (splitting up objects amongst filegroups may not improve things if everything runs on a single RAID5 array).

My personal preference for configuring initial filegroups for any system is:

PRIMARY (This can't be changed) on minimal I/O path (not much I/O needed)

DATA on a separate I/O path; DATA gets the DEFAULT filegroup property

INDEX on a separate I/O path

On PRIMARY I place no additional files other than the .MDF file (which has to be in the PRIMARY filegroup), and no additional data goes into the .MDF file. It stays very small and isn't used for user data.

On DATA I place all of my clustered indexes or (if I have any) heaps. Since this is also the Default filegroup, if I accidentally forget the "ON [DATA]" clause at the end of my object creation statements, they'll go here instead of the PRIMARY filegroup (the reason for that is for a later post).

On the INDEX filegroup I place all of my non-clustered indexes.

This of course assumes that DATA and INDEX are both stored on separate I/O paths. As long as they are, I/O resources spent retrieving data from tables won't interfere with I/O searching through indexes. I/O resources spent performing updates, inserts and deletes will be split between the tables and the indexes. Will you see double the performance? Of course not. But you should see a measurable and visible increase in performance vs storing everything on the PRIMARY filegroup.

Yes, you could have just the PRIMARY filegroup and place data files on different I/O paths, but there's no way to guarantee that one object will be in one file. My approach here is a simplistic use of filegroups, but I have yet to see it slower than having a single PRIMARY filegroup. If you wanted to get even fancier, you could have a separate filegroup for seldom accessed data and put it on a separate, slower I/O path (RAID1). You could also place commonly join tables on different filegroups, each on separate I/O paths.

The bottom line is - filegroups - use them!!!!

Published Thursday, November 08, 2007 6:46 PM by James Luetkehoelter
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Denis Gobo said:

James,

Don't you love it when people create the table on FileGroup A and the Clustered Index on FileGroup B....then the question is asked "how come FileGroup A is empty?"    :-(

November 9, 2007 9:19 AM
 

Linchi Shea said:

I'm confused. There are two issues here. One is related to having additional filegroups (i.e. in addition to the PRIMARY filegroup), and the other issue is related to having a DATA filegroup and an INDEX filegroup.

The primary reason for leaving the PRIMARY filegroup alone is for recovery purposes, not performance reasons, and there is no question that the PRIMARY filegroup should be left alone.

> Will you see double the performance? Of course not. But you should see a

> measurable and visible increase in performance vs storing everything on the

> PRIMARY filegroup.

Your question was addressed to the issue of separating pages into DATA and INDEX filegorups. But then your answer was about the issue of putting everything in the PRIMARY filegoup.

Anyway, I seriously doubt separating data and index onto their own filegroups in itself has any performance benefits as a general approach. No doubt that the approach may lead to performance advantage in specific scenarios. I guess we are really talking about probabilities. Without any info on any specific apps, my personal experience is that it's better to stripe across everything for everything you have (i.e. all your data, indexes, and so on), logs excluded. So if you have two I/O paths, create one (or more) data file on each I/O path, put them all in the same filegroup, and then create your tables and indexes on that filegroup.

Again, we are talking about which approach may cover more bases in absence of any app specifics. As such, there is probably no hard empirical evidence in favor of either. Perhaps because of this, a simpler approach is a better approach.

November 9, 2007 11:46 AM
 

James Luetkehoelter said:

Hi Linchi,

I totally agree that if you lave a large number of disks, splitting DATA and INDEX will show you probably no visible and possibly no measureable gain.

For those that don't have a large number of disks, this could still be a solution to explore. By at least splitting them, even if you start with a single I/O path, you have the ability to split the I/O at a later time. I still defend the design principle. Remember, not all of the readers of these blogs deal with SAN back-end storage, and even some who have a SAN don't have a true storage administrator. What they may think is a huge number of disks could be a very small (disk-wise) RAID 5 array.

I guess the main point I was trying to make with this post is that the DBA has tools at his disposal for organizing objects to optimize I/O. Is it always necessary? No. Is it always the easiest way of optimizing I/O? No. However, it does lie in their control, whereas the configuration of storage may not be.

I do have clients with smaller databases  (<500GB) with locally attached storage where this has made an actual visible difference in performance. For some, not so much so. You're absolutely right, it depends on the specific scenario - it always does, doesn't it? What I want to try to enstill in my clients is a good habit - and yes, protecting PRIMARY is essentially for recovery purposes as you point out.

Consider scalability to be directional - there could be a scalability in need for performance and throughput that goes up, but on the other end the availability of hardware resources to produce this throughput goes down (I see it in the public sector all the time - "do more with less" or "work smarter, not harder" - all I can say to that is 'sigh.....').

Thanks for the feedback - I did make this brief and wanted it to be more blog-post-like than article, and you bring a good point. However I do disagree with Ockham's Razor in this instance - the simpler approach isn't the better approach. I would still alway advise clients to attempt to split up their objects with performance as one of the factors in mind when doing so. Maybe I'm just old fashioned...or wary of SAN environments that aren't actually administered by someone who understands them.

November 10, 2007 10:43 AM
 

James Luetkehoelter said:

Denis,

I just about wet myself laughing when I read that - if I had a quarter for everytime someone said "No I want to separate the table from the clustered index because it should improve performance...", I'd have...well, a much better TV :)

J.

November 10, 2007 10:44 AM
 

Piroc said:

Hi,

Very good article and comments.

Anyway, everything will ever be a question of balance between, fast, cheap and evolutive, and all things depending on the queies made (esecially join, number of rows, query plan, etc...)

I do have a little question... you began to speak about SAN, but you did not go so far about it.

I do have a SAN, and i will not have another one before a while. So my database server only have one letter for it in its explorer.

My question, is more aout how is working this SAN: could this single SAN drive be assimilated as many I/O paths, due to the SAN architecture and/or speed? So how does many filegroups improves performance? And finally is it still a good idea to have a filegroup per table on this single SAN drive? (i am not talking about recovery, etc... what makes the isolation of the primary filegroup obvious)

November 11, 2007 2:33 PM
 

James Luetkehoelter said:

Hi Piroc,

I would defer to Linchi Shea on the subject of the internals of SANs. As a quick answer, it is possible to configure a SAN poorly, so that the drive letter you've been given has only a handful of drives associated with it (to put things simply), or it could have a massive number of drives, and the filegroup approach is, as I said in my reply to Linchi's comments, irrelevant. It all depends on 1) what you have for a SAN (there is a wide range of devices now wearing that label) and 2) do you have a storage administrator who *really* understands the internals of that SAN.

Check out Linchi's posts - good material there...

November 12, 2007 9:07 AM
 

Piroc said:

Hi,

Thanks for the answer, i will check all the architecture and configuration of the SAN with the company which is responsible for my servers. They are very very good.

But at least, is it safe to say that however is the SAN configuration, implementing filegroups for each table will not be a lack of performance, but might only be a gain?

I'll come back with the answer of my storage provider.

Where can i find the Linchi's post?

Thanks agan guys, you look to have hanging up a lot in this crual world...! ;-)

November 13, 2007 4:49 AM
 

Tony said:

I am not a DBA.  I have been put in charge of developing a Spatial Database Engine (SDE) middle tier in a Microsoft SQL 2005 environment.

I have been trying to understand Filegroups and so far I haven't had much luck finding what I need.

Can you recommend anything that will explain filegroups?

Thanks!!

November 13, 2007 3:32 PM
 

Piroc said:

Hi Tony,

Why not looking inside the available books online of the MSDN... there some very good article about your problem in it.

Let's start with those links:

CREATE DATABASE: http://msdn2.microsoft.com/en-us/library/ms176061.aspx

i let you follow the "sub-links" as this one: http://msdn2.microsoft.com/en-us/library/ms179316.aspx

...

Cheers

November 15, 2007 4:29 AM
 

Tony said:

I appreciate it.  Thanks!

November 15, 2007 11:23 AM
 

Tony said:

Piroc;

That answered my question.

Have a Great Day!!

November 15, 2007 11:32 AM
 

WesleyB said:

Nice article James and in general I agree with your ideas.  I have to agree with Linchi about spreading over as many LUN's a possible as opposed to separating data and indexes.  At my current customer they also used this strategy with 4 LUN's with data and 4 LUN's with non-clustered indexes.  We decided to spread everything over the 8 LUN's and the results were astonishing.  But as always it all depends of course, what is good for some environments might be bad for others :-)

November 16, 2007 1:55 PM
 

Piroc said:

I can definitely agree with wesley and Linchi about the fact that in most of the scenarios it will be better to spread everything on everything.

But in one case i would like to point out, and in fact it occures often, you will need to separate the nonclustered indexes filegroup from table and clustered index filegroup. This very non-rare case is when you want to do a partitioning for a table on a column that is not member of the primary-key constraint (when you make most of you large range queries on a DateAdded column for example). in this case you need to specify a different "static" filegroup for the NONCLUSTERED pimary key constraint. Like this you are able to keep the speed-up of the partitioning solution and still have a nonclustered index for the joins query and a high-speed query for the partition-key-column range of data search. But you have 2 differents filegroups schemes...

November 16, 2007 4:35 PM
 

Denis Gobo said:

A year in review, The 21 + 1 best blog posts on SQLBlog Best posts according to me, it might have been

December 27, 2007 4:11 PM
 

James Luetkehoelter said:

In my last post on filegroups (it seems like ages ago), I argued that separating tables and their non-clustered

February 12, 2008 12:20 PM
 

James Luetkehoelter said:

In my last post on filegroups (it seems like ages ago), I argued that separating tables and their non-clustered

February 12, 2008 12:21 PM
 

James Luetkehoelter said:

In my last post on filegroups (it seems like ages ago), I argued that separating tables and their non-clustered

February 12, 2008 12:22 PM
 

Manoj said:

Hi,

I have done performance testing on Single Filegroup (on Single Disk) and Multiple fileGroups( 4 different disks)for the same DB.

I did following in Mutltple filegroups

1.Filegoup1:- I put the Log file only.

2.FileGroup2:- I moved all the Non-Clustered indexes and some small tables.

3.I spread all other tables across the Filegoup3 and Filegroup4 by balancing the volume of tables on each filegroup and based on the usages of the table in queries (I Kept the tables on different filegroups which commonly used in joins)

There are 3 tables which are being used very frequently, I moved those tables in Filegroup2,Filegroup3,Filegroup4.

During the testing I captured the SQL Profier and Performance Monitor Counters (Related to SQL Server counters).

After comparing both the sql trace file, did not find any significant improvement in multiple filegoups. And found some queries are performing well in Single fileGroup and performing very badly in multiple filegroups in terms of duration of queries.

What could be possible reason? And which performance monitor counter will help me to identify the cause of slow execution of query in multi filegroups.

Please advice

thanks

November 23, 2008 5:53 AM
 

Manoj said:

I did this performance testing through application and did exactly same activities on both the database (DB On Single fileGroup and DB on Multiple Filegroups).

November 23, 2008 6:05 AM
 

James Luetkehoelter said:

Are the filegroups actually on separate disk arrays? And, if you see from Linchi Shea's post (and I agree), if you have a major external disk device, it is best to just use all all of your disks as one source. In those cases, separate filegroups might lag a bit with separate lookups. What I was suggesting with this post is more for smaller installations.

November 24, 2008 8:40 PM
 

Manoj said:

Thanks for your response.

we have the following RAIDs. each RAID have Separate sets of disks drives

2TB RAID-1(J:)

2TB RAID-2(F:)

SCSI RAID-1(G:)

SCSI RAID-1(D:)

thanks

manoj

November 25, 2008 8:51 AM
 

Micro kid said:

This is really cool feature, which should absolutely used in bigger databases.

April 23, 2009 7:51 AM
 

stu said:

I use this strategy on my unix Sybase servers. On SQL server with SAN fibre channel, I put user db MDF and NDF  on primary as it is a vendor database, and I do not own the schema. I assume I may have to change that as I reach 1.5 terabytes, but for several years it has performed very well. Both MDF and NDF are on primary across two LUNS,with log on a seperated LUN. The system databases are on a third LUN, with tempdb striped on a fourth LUN. I do not see any advantage with filegrups as the SAN is presented as hundreds of spindles.

February 21, 2012 6:55 PM
 

DJ said:

Can you load the data on diffrent server & switch when loading in completed in order to avoid server extensive use of server resouces

September 21, 2012 2:51 PM
 

sam said:

primary filegroup-defaualt-its used to store mdf n ldf..we can seperate mdf file into secondary file like mdf.we can create 32767 ndf files. we can use file growth option to restrict where the created file shoul go. we can move mdf to ndf and ndf to mdf....if once ndf is empty means we cant move. we delete the ndf and create another ndf in same name.. we can move the mdf n ldf files by using database shrink option.

secondary file group-we can create tables...

if we insert any records in primary and secondary and cant from primary to secondary...

How it improve performance means if we select a table in the sql server databasea 1st it will check all primary file grops and then go to further filegroups...so we use a table regularly means it will create on primary and and we use table sometimes means it will create on secondary...so that we can improve performance....

May 10, 2013 4:28 AM

Leave a Comment

(required) 
(required) 
Submit

About James Luetkehoelter

I am passionate about what I do - which is DBA, development, IT and IT business consulting. If you don't know me, haven't met me or have never heard me speak, I'm a little on the eccentric side. One attendee recently described me as being "over the top". Yup, that about says it - because I only speak on topics that I'm passionate about.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement