Josh Jones, all around good guy and author of an upcoming book on database design, presented a rousing talk on Columnar Databases at last night’s Colorado Springs’ SQL Server User Group. There are several commercial and open source columnar databases available; SQL Server is not one of them.
SQL Server, like most relational databases, is a row-based data store. Data is written in the pages as rows,
1, Paul, Nielsen; 2, Josh, Jones; 3, Kalen, Delaney.
A columnar database writes data to the page as columns,
1, 2, 3; Paul, Josh, Kalen; Nielsen, Jones, Delaney.
A storage unit (page, or file) is assigned to a each column. This makes writing or reading every column of a single row from the middle of 50 million rows difficult and slow, so columnar database are contraindicated (cool big word) for OLTP databases.
OTOH, reading one of two columns from every row is screamlingly fast using a columnar database, so this storage style is perfect for OLAP databases. Some columnar databases claim they are 50x faster than a row-based database at these types of queries.
The difference between row-based and columnar databases is only in the storage engine layer. The query engine layer can still use straight SQL and then pass the optimized query to the storage engine, which writes to / reads from either row-based or columnar based tables.
Maybe you can see where I’m going with this thread. Currently, Microsoft SQL Server only supports row-based storage. But, it’s very conceivable that a future storage engine could write tables as either row-based or columnar storage structures. This would grant a huge performance advantage to SQL Server in OLAP applications. Consider this my public vote for a CREATE TABLE COLUMNAR (col1…) option.
For more info on columnar databases, there's a good page on Wikipedia. Thanks again Josh for your great presentation.