THE SQL Server Blog Spot on the Web


Linchi Shea

Checking out SQL Server via empirical data points

Flash Memory and Databases

Right now, flash-memory-based solid state drives are still more expensive than traditional disk drives in terms of cost per gigabyte. But flash-based drives, or some type of hybrid that combines flash and traditional disk drives, seem to be coming.

There has been a lot of talk about the potential impact of flash memory in general, but I have always wondered what impact these flash-based drives may have on database systems in particular. Note that in all the major commercial database management systems (I deleted the word relational between commercial and database in this sentence :-), a significant chunk of code and a significant amount of investment are devoted to trading many small random I/Os for fewer large sequential I/Os. So intuitively, flash memory, with its very different I/O performance characteristics, could have a huge impact on this trade-off at the very least.
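To put a rough number on why that trade-off exists on spinning disks, here is a back-of-envelope disk-time sketch. The seek time and transfer rate below are typical 2008-era assumptions for illustration, not measurements:

```python
# Rough disk-time model: every random IO pays a seek + rotational
# delay, while a sequential scan pays it roughly once.
# Figures are illustrative assumptions, not measured values.
seek_ms, transfer_mb_s = 8.0, 80.0   # assumed seek time and bandwidth
page_kb, pages = 8, 10_000           # 10,000 random 8 KB page reads

# Many small random reads: one seek per page plus its tiny transfer.
random_ms = pages * (seek_ms + (page_kb / 1024) / transfer_mb_s * 1000)

# One large sequential read of the same data: one seek, one transfer.
sequential_ms = seek_ms + (pages * page_kb / 1024) / transfer_mb_s * 1000

# ~81 s random vs ~1 s sequential for the same 80 MB of data.
print(f"random: {random_ms/1000:.1f}s, sequential: {sequential_ms/1000:.2f}s")
```

The two orders of magnitude between those numbers is precisely what all that buffer-pool, read-ahead, and lazy-writer machinery is built to avoid; flash, with near-zero seek cost, collapses the gap.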

Recently, Goetz Graefe wrote an update on Gray and Putzolu's original five-minute rule. Graefe's paper is titled, "The five-minute rule twenty years later, and how flash memory changes the rules".
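For reference, the break-even interval behind the five-minute rule can be computed directly from Gray and Putzolu's formula. The sample prices and rates below are illustrative 1987-style figures, not quotes from either paper:

```python
# Gray/Putzolu break-even formula: a page is worth keeping in RAM if
# it is re-accessed more often than this interval.
def break_even_interval_seconds(pages_per_mb_ram: float,
                                ios_per_sec_per_drive: float,
                                price_per_drive: float,
                                price_per_mb_ram: float) -> float:
    """Seconds between re-accesses at which caching a page in RAM
    costs the same as re-reading it from disk each time."""
    return (pages_per_mb_ram / ios_per_sec_per_drive) * \
           (price_per_drive / price_per_mb_ram)

# Illustrative 1987-style inputs: 1 KB pages (1024 pages/MB),
# ~15 random IOs/sec per drive, ~$15,000 per drive, ~$5,000 per MB RAM.
interval = break_even_interval_seconds(1024, 15, 15000, 5000)
print(round(interval))  # a few hundred seconds, i.e. minutes
```

Graefe's point is that plugging flash prices and flash IO rates into the same formula moves the break-even intervals dramatically, in different directions for the RAM-to-flash and flash-to-disk boundaries.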

What most intrigued me in this paper is not the impact on the five-minute rule itself, but rather the potential impact on the database system architecture. Overall, Graefe suggests that database systems use flash memory as persistent storage instead of transient memory. In addition, he argues that using flash memory as part of persistent storage will have an impact on database checkpoint performance, selection of page sizes, buffer management, space allocation, and some of the B-tree logic.


Published Friday, January 04, 2008 9:59 PM by Linchi Shea



Adam Machanic said:

If the path to a solid state I/O system were fast enough, would checkpoints even be required anymore?  I wonder if we can look forward to systems where the idea of buffer cache no longer exists?

January 4, 2008 10:31 PM

Chuck Boyce said:

I'm not worthy.

Dude - love the stuff you turn us on to!  This is truly superb material.


January 4, 2008 11:06 PM

jchang said:

Computer system architecture has gone seriously out of balance, having had no fundamental adjustment for too many years.

It used to be that we had main memory plus a page file for virtual memory because system memory was so expensive.

Today, the fact that we still have a page file is a total anachronism; if you actually have to use the page file, your system is running in a badly crippled mode.

(We did add SRAM cache, originally off-die, now on-die for most CPUs.)

Today, systems have so much DRAM that most of it is used as disk cache, either by the OS or by the database engine for certain applications.

It is high time we asked the question: if we were designing a brand new system architecture and OS today, looking forward, what would we do?

Today we have SRAM (12-24 MB on-die is possible), DRAM, and nonvolatile flash.

On-die SRAM cache is still probably the best idea; there are serious process issues with putting DRAM on-die.

The biggest issue with DRAM main memory is the long latency and turnaround time, i.e., reading a word to get the next address to be read, and also the read-after-write and write-after-read delays. AMD does mitigate this with an integrated memory controller, but ...

My feeling is that system memory should revert to its original use: frequently used binaries and key program data structures, the PTEs and nonpaged pool for the OS, the lock structures for the SQL binary, stuff that really requires very fast turnaround.

The implementation could be SRAM, or a DRAM chip mounted on top of the CPU with through-chip connections to the CPU, for very low latency and incredible bandwidth.

Rather than cache data in system memory, we would now implement a DRAM solid-state storage device. If this sounds vaguely familiar to the old fogies, I think Cray did this; so much for my original ideas.

Now a page file for virtual memory does make sense.

I would have separate nonvolatile flash storage. Because NV cannot sustain unlimited write cycles, DRAM should be used for write-intensive data, and less expensive NV should be used for more stable and less frequently accessed data.

We would still use ordinary disk drives for archival and even less frequently accessed data; the NV storage needs to be architected to reduce the random IO load on the disk storage.

Since NV doesn't care about random versus sequential IO, at least not to the degree HDDs do, there is no super-critical need for the lazy writer; but we already have it, so anyway, checkpoints will not be noticed anymore.

January 5, 2008 12:02 AM

Linchi Shea said:

> The implementation could be SRAM or a DRAM chip mounted on top of the

> CPU, with through chip connections to the CPU, for very low latency

> and incredible bandwidth.

This sounds like Intel's proposed 3D stacked-die MCP, which is supposed to have a memory bandwidth in excess of 1 TB/sec, if successfully implemented. This architecture is said to have the shortest possible interconnect between the CPU and memory dies, and the interconnect density can scale to thousands of die-to-die connections, hence the mind-boggling bandwidth.

January 5, 2008 12:31 AM

David Markle said:


I doubt it, at least with any technology on the horizon.  In order to make what you're saying a reality, not only do you have to have a blazingly fast persistent storage mechanism, but you need to have an absolutely disgusting amount of bandwidth to it.  Linchi, correct me if I'm wrong, but it would be possible to find a page from the buffer pool as close to the CPU as the L2 cache, right?  To get the equivalent performance of a buffer-pool based storage mechanism by going directly to I/O would be a stunning advance in system architecture.  We can dream...

What I see down the road is the idea of seek times getting so low that fragmentation becomes close to meaningless.  If write performance were better on flash-based disks, I would expect everybody and their brother to use them for TempDB and transaction logs, but sadly, write performance still stinks for most flash-based disks.  Right now what you need is an application that requires a lot of random reads to justify flash-based disks in a database.  I expect this whole state of affairs to get significantly better in the next two years.  Here's my first prognostication of 2008: it will take another two to three years for flash to really find its place in the database, mostly because major vendors are going to be slow to integrate it into their servers and storage infrastructure.  It'll be another two to three years on top of that for flash to become mainstream on the data layer.
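A back-of-envelope sketch of why random-read-heavy workloads are the ones that justify flash today: on a cost-per-gigabyte basis flash loses badly, but on a cost-per-random-read basis it wins. All prices and IOPS figures below are hypothetical 2008-era round numbers:

```python
# Hypothetical device figures, purely for illustration.
hdd = {"price": 200.0,  "random_read_iops": 150}     # e.g. a 15k RPM drive
ssd = {"price": 1000.0, "random_read_iops": 10000}   # an early flash SSD

for name, dev in (("HDD", hdd), ("SSD", ssd)):
    cost_per_iops = dev["price"] / dev["random_read_iops"]
    print(f"{name}: ${cost_per_iops:.3f} per random read per second")
```

Even at five times the device price, the flash drive comes out an order of magnitude cheaper per random read, which is why read-random workloads are the natural beachhead while write performance catches up.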

January 5, 2008 8:11 AM

Brent O. said:

When you combine SSDs with a SAN that does automated storage tiering, especially at the block level, then suddenly write speeds don't matter anymore.  I'm thinking particularly of the Compellent Storage Center, which does block-level tiering.  All writes are done on the fastest tier of storage, and then they're gradually migrated down (behind the scenes, by the SAN itself) to slower tiers of storage like FC and SATA.  As a result, a single LUN could have data on any tier, but all writes are fast.

With a setup like that, SSDs make a lot of sense.  I could see our nightly ETL loads blazing because yesterday's sales data is written straight to fast SSD, the morning reports hit that same SSD, and then the data is gradually moved off to slower storage without expensive ETL processes.
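The block-level tiering policy described above can be sketched as a toy model: every write lands on the fast tier, and a background pass demotes the least recently used blocks to the slow tier. The tier names, block names, and capacity are all made up for illustration and have nothing to do with any vendor's actual implementation:

```python
from collections import OrderedDict

class TieredStore:
    """Toy block-level tiering: all writes land on the fast (SSD) tier;
    a background demotion pass moves least-recently-used blocks to the
    slow (SATA) tier once the fast tier exceeds its capacity."""

    def __init__(self, fast_capacity: int):
        self.fast = OrderedDict()   # block_id -> data, in LRU order
        self.slow = {}
        self.fast_capacity = fast_capacity

    def write(self, block_id, data):
        self.slow.pop(block_id, None)        # rewritten blocks promote
        self.fast[block_id] = data           # writes are always fast
        self.fast.move_to_end(block_id)

    def read(self, block_id):
        if block_id in self.fast:
            self.fast.move_to_end(block_id)  # refresh recency
            return self.fast[block_id]
        return self.slow[block_id]           # cold read from slow tier

    def demote(self):
        """Background migration: oldest blocks drop to the slow tier."""
        while len(self.fast) > self.fast_capacity:
            bid, data = self.fast.popitem(last=False)
            self.slow[bid] = data

store = TieredStore(fast_capacity=2)
for b in ("sales_mon", "sales_tue", "sales_wed"):
    store.write(b, f"data for {b}")
store.demote()
print(sorted(store.fast))  # ['sales_tue', 'sales_wed'] stay on flash
print(sorted(store.slow))  # ['sales_mon'] migrated to SATA
```

The point of the sketch is the write path: the application never waits on the slow tier, because demotion happens after the fact, exactly the property that makes flash write endurance and write latency less of a concern in a tiered design.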

January 5, 2008 10:37 AM

Kalen Delaney said:

Most of you who read my writings know that I don't often write about hardware. And in fact, I'm not writing

January 5, 2008 6:27 PM
