In one of their year 2000 papers—Rules of Thumb in Data Engineering, Jim Gray and Prashant Shenoy stated that, “Over the last decade, disk pages have grown from 2KB to 8KB and poised to grow again. In ten years, the typical small transfer unit will probably be 64KB, and large transfer units will be a megabyte or more.”
Should we expect the page size of SQL Server to grow? Or is the 8KB page size still optimal for SQL Server? I remember discussing this very question with someone in the know about a year ago, and was told that the suggestion of using a larger page size in SQL Server was more BS than substance. The argument was essentially that SQL Server rarely incurs 8KB I/Os, and it always tries to ‘batch up’ I/Os into large transfers when it does such operations as read-ahead reads, checkpoints, lazy writes, and so on.
I’m not here to argue for or against using a single fixed 8KB page size in SQL Server. I’m simply not in a position to do justice to the topic. I’m sure Microsoft has been experimenting with different page sizes. At least, I sure hope so.
I just want to highlight some observations related to this topic.
First, note that SQL Server’s choice of a single fixed 8KB page is different from Oracle and DB2, which allow multiple page/block sizes. A single fixed page size can obviously lead to simplified code, but at the same time has the potential to introduce this fixed page size assumption throughout the code, making it difficult to accommodate a different page size should that becomes necessary.
Although the fact that it uses 8KB pages doesn’t necessarily mean that SQL Server would perform a lot of 8KB I/Os, it is not that rare to see SQL Server does 8KB I/Os in real world applications. This is especially true when there is fragmentation within a database.
Conceptually, it does seem that the larger the page size, the more row slots you can have on a page, and therefore fewer levels you may have for your indexes. Potentially, this can lead to fewer page splits and fewer physical I/Os. On the flip side, a larger page size may have a higher chance for running into unwanted concurrency issues, though this shouldn’t really be a problem with row-level locking. The other argument against a large page size is that if your workload typically requests a small chunk of data, getting a 32KB or even 64KB page into the buffer pool only to use a few short rows could be wasteful.
The introduction of database compression in SQL Server 2008—its dictionary compression in particular—seems to give some legitimacy to the question of whether the 8KB page starts to show signs of aging. For instance, with dictionary compression, it may work better with larger pages. Intuitively, the larger the page size, the more likely you may see repeating data patterns and the less overhead space the compression information structure may take. Oracle uses a similar block-level compression scheme and research and experience on the Oracle compression has shown that generally larger blocks tend to produce better compression ratios.
Now if you check the Oracle literature, you would find some rather heated debate on whether there is any value in using multiple block sizes in a real world app. Most agree that using a larger block size in data warehousing applications tend to result in better performance. But some have argued that the performance gain may not be worth the added hassle in managing multiple block sizes, and people may be better off just staying with the default block size, which happens to be 8KB.
So this is not a settled topic. It remains interesting to see whether Microsoft will change the page size or introduce multiple page sizes post SQL Server 2008.