Long gone are the days when you can reasonably expect to know all aspects of SQL Server at a deep level. When a new version starts to emerge, I try to put some scope around the features that I will dive more deeply into. I'm publishing my list here so that like-minded SQL Server enthusiasts may become aware of a feature or two that was lost in that one-page Powerpoint slide you see in most of the presentations on SQL Server 2008.
Note: I put together this list during the summer as an roadmap for personal testing with the CTP releases. In December 2007, a whitepaper, "An Introduction to New Data Warehouse Scalability Features in SQL Server 2008", was released that includes a short writeup on each of these features, as well as some additional improvements to SSAS, SSIS and SSRS. If you are interested in the types of features I mention below, I highly recommend you check out this whitepaper.
My focus is on Business Intelligence and performance (particularly in terms of relational data warehouse performance). With that said and in no particular order, here is my list of top 10 features in SQL Server 2008, and why. I plan to post more detail about several of these in the future:
- MERGE command - this is a new TSQL command that will allow you to combine an Insert with an Update command, sometimes referred to as an UPSERT. This is particularly useful in loading a data warehouse. You will no longer have to test for a row's existence in order to take one of two paths (INSERT or UPDATE, or even DELETE).
- Star Join - this could have a tremendous impact on queries in a relational data warehouse. Data warehouse queries are characteristically performance hogs. Since a large percentage of data is usually selected, the query optimizer often cannot take advantage of indexes like it can with more selective OLTP queries.
- Change Data Capture (CDC) - this feature can be used in SQL Server 2008 data sources to automatically track changes in data that require a row to be re-sent to a data warehouse. This makes ETL from a SQL Server 2008 data source far more efficient and straight-forward.
- INSERT INTO - no structural changes have been made to the statement, but minimal logging can be implement, much like BULK INSERT or SELECT..INTO, under the right conditions.
- Lookups in SSIS - though SSIS lookups were incredibly enhanced from DTS (where they were virtually unusable), the performance of this task has been tuned to improve performance and minimize resource utilization. In lieu of blogging about this later, I'll refer to a blog post by Jamie Thomson who explains this feature in detail and far better than I could.
- Data compression - this is potentially huge! First of all, compression is an option, so if if the CPU hit is an issue you can choose not to implement compression. Data compression means more data on fewer pages, which has a domino effect on performance (more pages in memory, better memory utilization, and improved page life expectancy).
- Partitioning enhancements - there's an issue in SQL Server 2005 partitioning parallelism that affects queries on a few number of partitions but greater than 1. If only one partition is queried, intra-partition parallelism is implemented effectively. If greater than one, a single thread is used to process each partition, which results in under-parallelism with queries on just a few partitions. This has been addressed in SQL Server 2008 (future post).
- Resource Governor - if you haven't seen the demo on this, you should. You can implement resource governing rules that affect inflight queries. One scenario I see for this is in environments where ETL occurs simultaneously with data warehouse queries, such as in a real-time environment.
- Backup compression - this will dramatically affect backup times for large data warehouses. This capability is currently available through third party vendors but some DBAs are currently unable to leverage these solutions due to company standards or budget constraints.
- Partition-aligned indexed views (IVs) - this allows for the use of IVs on fact tables. Indexed views on partitioned tables is virtually unusable in SQL Server 2005 because of the requirement to drop and recreate dependent IVs whenever a fact table partition SWITCH is made. I'll explain why I think this is such a big deal in a future post.