<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Buck Woody : Developer, Microsoft</title><link>http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/Microsoft/default.aspx</link><description>Tags: Developer, Microsoft</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Big Data - A Microsoft Tools Approach</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/02/20/big-data-a-microsoft-tools-approach.aspx</link><pubDate>Mon, 20 Feb 2012 21:16:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:41832</guid><dc:creator>BuckWoody</dc:creator><slash:comments>1</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/41832.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=41832</wfw:commentRss><description>&lt;p&gt;&lt;em&gt;&lt;span style="color:#c0504d;"&gt;(As with all of these types of posts, check the date of the latest update I&amp;rsquo;ve made here. Anything older than 6 months is probably out of date, given the speed with which we release new features into Windows and SQL Azure)&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t normally like to discuss things in terms of tools. I find that whenever you start with a given tool (or even a tool stack) it&amp;rsquo;s too easy to fit the problem to the tool(s), rather than the other way around as it should be.&lt;/p&gt;
&lt;p&gt;That being said, it&amp;rsquo;s often useful to have an example to work through to better understand a concept. But like many ideas in Computer Science, &amp;ldquo;Big Data&amp;rdquo; is too broad a term in use to show a single example that brings out the multiple processes, use-cases and patterns you can use it for.&lt;/p&gt;
&lt;p&gt;So we turn to a description of the tools you can use to analyze large data sets. &amp;ldquo;Big Data&amp;rdquo; is a term used lately to describe data sets that have the &amp;ldquo;&lt;a href="http://radar.oreilly.com/2012/01/what-is-big-data.html" target="_blank"&gt;Four V&amp;rsquo;s&lt;/a&gt;&amp;rdquo;&amp;nbsp; as a characteristic, but I have a simpler definition I like to use:&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;&lt;span style="color:#0000ff;font-size:small;"&gt;Big Data involves a data set too large to process in a reasonable period of time&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I realize that&amp;rsquo;s a bit broad, but in my mind it answers the question and is fairly future-proof. The general idea is that you want to analyze some data, and using whatever current methods, storage, compute and so on that you have at hand it doesn&amp;rsquo;t allow you to finish processing it in a time period that you are comfortable with. I&amp;rsquo;ll explain some new tools you can use for this processing.&lt;/p&gt;
&lt;p&gt;Yes, this post is Microsoft-centric. There are probably posts from other vendors and open-source that cover this process in the way they best see fit. And of course you can always &amp;ldquo;mix and match&amp;rdquo;, meaning using Microsoft for one or more parts of the process and other vendors or open-source for another. I never advise that you use any one vendor blindly - educate yourself, examine the facts, perform some tests and choose whatever mix of technologies best solves your problem.&lt;/p&gt;
&lt;p&gt;At the risk of being vendor-specific, and probably incomplete, I use the following short list of tools Microsoft has for working with &amp;ldquo;Big Data&amp;rdquo;. There is no single package that performs all phases of analysis. These tools are what I use; they should not be taken as a Microsoft authoritative testament to the toolset we&amp;rsquo;ll finalize for a given problem-space. In fact, that&amp;rsquo;s the key: find the problem and then fit the tools to that.&lt;/p&gt;
&lt;h2&gt;Process Types&lt;/h2&gt;
&lt;p&gt;I break up the analysis of the data into two process types. The first is examining and processing the data &lt;em&gt;in-line&lt;/em&gt;, meaning as the data passes through some process. The second is a &lt;em&gt;store-analyze-present&lt;/em&gt; process.&lt;/p&gt;
&lt;h2&gt;Processing Data In-Line&lt;/h2&gt;
&lt;p&gt;Processing data in-line means that the data doesn&amp;rsquo;t have a destination - it remains in the source system. But as it moves from an input or is routed to storage within the source system, various methods are available to examine the data as it passes, and either trigger some action or create some analysis.&lt;/p&gt;
&lt;p&gt;You might not think of this as &amp;ldquo;Big Data&amp;rdquo;, but in fact it can be. Organizations have huge amounts of data stored in multiple systems. Many times the data from these systems do not end up in a database for evaluation. There are options, however, to evaluate that data real-time and either act on the data or perhaps copy or stream it to another process for evaluation.&lt;/p&gt;
&lt;p&gt;The advantage of an in-stream data analysis is that you don&amp;rsquo;t necessarily have to store the data again to work with it. That&amp;rsquo;s also a disadvantage - depending on how you architect the solution, you might not retain a historical record. One method of dealing with this requirement is to trigger a rollup collection or a more detailed collection based on the event.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;StreamInsight &lt;/strong&gt;- StreamInsight is Microsoft&amp;rsquo;s &amp;ldquo;Complex Event Processing&amp;rdquo; or CEP engine. This product, hooked into SQL Server 2008R2, has multiple ways of interacting with a data flow. You can create adapters to talk with systems, and then examine the data mid-stream and create triggers to do something with it. You can read more about StreamInsight here: &lt;a title="http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx" href="http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BizTalk &lt;/strong&gt;- When there is more latency available between the initiation of the data and its processing, you can use Microsoft BizTalk. This is a message-passing and Service Bus oriented tool, and it can also be used to join system&amp;rsquo;s data together than normally does not have a direct link, for instance a Mainframe system to SQL Server. You can learn more about BizTalk here: &lt;a href="http://www.microsoft.com/biztalk/en/us/overview.aspx"&gt;http://www.microsoft.com/biztalk/en/us/overview.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;.NET and the Windows Azure Service Bus &lt;/strong&gt;- Along the same lines as BizTalk but with a more programming-oriented design are the Windows and Windows Azure Service Bus tools. The Service Bus allows you to pass messages as well, and opens up web interactions and even inter-company routing. BizTalk can do this as well, but the Service Bus tools use an API approach for designing the flow and interfaces you want. The Service Bus offerings are also intended as near real-time, not as a streaming interface. You can learn more about the Windows Azure Service Bus here: &lt;a href="http://www.windowsazure.com/en-us/home/tour/service-bus/"&gt;http://www.windowsazure.com/en-us/home/tour/service-bus/&lt;/a&gt; and more about the Event Processing side here: &lt;a href="http://msdn.microsoft.com/en-us/magazine/dd569756.aspx"&gt;http://msdn.microsoft.com/en-us/magazine/dd569756.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2&gt;Store-Analyze-Present&lt;/h2&gt;
&lt;p&gt;A more traditional approach with an organization&amp;rsquo;s data is to store the data and analyze it out-of-band. This began with simply running code over a data store, but as locking and blocking became an issue on a file system, Relational Database Management Systems (RDBMs) were created. Over time a distinction was made between data used in an online processing system, meant to be highly available for writing data (OLTP) and systems designed for analytical and reporting purposes (OLAP).&lt;/p&gt;
&lt;p&gt;Later the data grew larger than these systems were designed for, primarily due to consistency requirements. In analysis, however, consistency isn&amp;rsquo;t always a requirement, and so file-based systems for that analysis were re-introduced from the Mainframe concepts, with new technology layered in for speed and size.&lt;/p&gt;
&lt;p&gt;I normally break up the process of analyzing large data sets into four phases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Source and Transfer &lt;/em&gt;- Obtaining the data at its source and transferring or loading it into the storage; optionally transforming it along the way&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Store and Process&lt;/em&gt; - Data is stored on some sort of persistence, and in some cases an engine handles the acquisition and placement on persistent storage, as well as retrieval through an interface.&lt;/li&gt;
&lt;li&gt;&amp;nbsp;&lt;em&gt;Analysis &lt;/em&gt;- A new layer introduced with &amp;ldquo;Big Data&amp;rdquo; is a separate analysis step. This is dependent on the engine or storage methodology, is often programming language or script based, and sometimes re-introduces the analysis back into the data. Some engines and processes combine this function into the previous phase.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Presentation&lt;/em&gt; - In most cases, the data wants a graphical representation to comprehend, especially in a series or trend analysis. In other cases a simple symbolic representation, similar to the &amp;ldquo;dashboard&amp;rdquo; elements in a Business Intelligence suite. Presentation tools may also have an analysis or refinement capability to allow end-users to work with the data sets. As in the Analysis phase, some methodologies bundle in the Analysis and Presentation phases into one toolset.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Source and Transfer&lt;/h3&gt;
&lt;p&gt;You&amp;rsquo;ll notice in this area, along with those that follow, Microsoft is adopting not only its own technologies but those within open-source. This is a positive sign, and means that you will have a best-of-breed, supported set of tools to move the data from one location to another. Traditional file-copy, File Transfer Protocol and more are certainly options, but do not normally deal with moving datasets.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve already mentioned the ability of a streaming tool to push data into a store-analyze-present model, so I&amp;rsquo;ll follow up that discussion with the tools that can extract data from one source and place it in another.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Server Integration Services (SSIS)/SQL Server Bulk Copy Program (BCP)&lt;/span&gt; &lt;/strong&gt;- SSIS is a SQL Server tool used to move data from one location to another, and optionally perform transform or other processes as it does so. You are not limited to working with SQL Server data - in fact, almost any modern source of data from text to various database platforms is available to move to various systems. It is also extremely fast and has a rich development environment. You can learn more about SSIS here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms141026.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms141026.aspx&lt;/a&gt; BCP is a tool that has been used with SQL Server data since the first releases; it has multiple sources and destinations as well. It is a command-line utility,and has some limited transform capabilities. You can learn more about BCP here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms162802.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms162802.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#0000ff;"&gt;&lt;span style="color:#800000;"&gt;Sqoop&lt;/span&gt; &lt;/span&gt;&lt;/strong&gt;- Tied to Microsoft&amp;rsquo;s latest announcements with Hadoop on Windows and Windows Azure, Sqoop is a tool that is used to move data between SQL Server 2008R2 (and higher)&amp;nbsp;and Hadoop, quickly and efficiently. You can read more about that in the Readme file here: &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584"&gt;http://www.microsoft.com/download/en/details.aspx?id=27584&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Application Programming Interfaces&lt;/strong&gt;&lt;/span&gt; - API&amp;rsquo;s exist in most every major language that can connect to one data source, access data, optionally transforming it and storing it in another system. Most every dialect of&amp;nbsp; the .NET-based languages contain methods to perform this task.&lt;/p&gt;
&lt;h3&gt;Store and Process&lt;/h3&gt;
&lt;p&gt;Data at rest is normally used for historical analysis. In some cases this analysis is performed near real-time, and in others historical data is analyzed periodically. Systems that handle data at rest range from simple storage to active management engines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Server&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s flagship RDBMS can indeed store massive amounts of complex data. I am familiar with a two systems in excess of 300 Terabytes of federated data, and the &lt;a href="http://pan-starrs.ifa.hawaii.edu/public/" target="_blank"&gt;Pan-Starrs&lt;/a&gt; project is designed to handle 1+ Petabyte of data. The theoretical limit of SQL Server DataCenter edition is 540 Petabytes. SQL Server is an engine, so the data access and storage is handled in an abstract layer that also handles concurrency for ACID properties. You can learn more about SQL Server here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Azure Federations&lt;/span&gt;&lt;/strong&gt; - SQL Azure is a database service from Microsoft associated with the Windows Azure platform. Database Servers are multi-tenant, but are shared across a &amp;ldquo;fabric&amp;rdquo; that moves active databases for redundancy and performance. Copies of all databases are kept triple-redundant with a consistent commitment model. Databases are (at this writing - check &lt;a href="http://WindowsAzure.com"&gt;http://WindowsAzure.com&lt;/a&gt; for the latest) capped at a 150 GB size limit per database. However, Microsoft released a &amp;ldquo;Federation&amp;rdquo; technology, allowing you to query a head node and have the data federated out to multiple databases. This improves both size and performance. You can read more about SQL Azure Federations here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Analysis Services&lt;/span&gt;&lt;/strong&gt; - The Business Intelligence engine within SQL Server, called Analysis Services, can also handle extremely large data systems. In addition to traditional BI data store layouts (ROLAP, MOLAP and HOLAP), the latest version of SQL Server introduces the Vertipaq column-storage technology allowing more direct access to data and a different level of compression. You can read more about Analysis Services here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx&lt;/a&gt; and more about Vertipaq here: &lt;a href="http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Parallel Data Warehouse &lt;/strong&gt;&lt;/span&gt;- The Parallel Data Warehouse (PDW) offering from Microsoft is largely described by the title. Accessed in multiple ways including using Transact-SQL (the Microsoft dialect of the Structured Query Language), &lt;a href="http://sqlpdw.com/2010/07/what-mpp-means-to-sql-server-parallel-data-warehouse/" target="_blank"&gt;This is an MPP appliance&lt;/a&gt;&amp;nbsp;scaling in parallel to extremely large datasets. It is a hardware and software offering - you can learn more about it here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HPC Server&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s High-Performance Computing version of Windows Server deals not only with large data sets, but with extremely complicated computing requirements. A scale-out architecture and inter-operation with Linux systems, as well as dozens of applications pre-written to work with this server make this a capable &amp;ldquo;Big Data&amp;rdquo; system. It is a mature offering, with a long track record of success in scientific, financial and other areas of data processing. It is available both on premises and in Windows Azure, and also in a hybrid of both models, allowing you to &amp;ldquo;rent&amp;rdquo; a super-computer when needed. You can read more about it here: &lt;a href="http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx"&gt;http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Hadoop&lt;/span&gt;&lt;/strong&gt; - Pairing up with Hortonworks, Microsoft has released the Hadoop Open-Source system -&amp;nbsp; including HDFS and a Map/Reduce standardized software, Hive and Pig - on Windows and the Windows Azure platform. This is not a customized version; off-the-shelf concepts and queries work well here. You can read more about Hadoop here: &lt;a href="http://hadoop.apache.org/common/docs/current/"&gt;http://hadoop.apache.org/common/docs/current/&lt;/a&gt; and you can read more about Microsoft&amp;rsquo;s offerings here: &lt;a href="http://hortonworks.com/partners/microsoft/"&gt;http://hortonworks.com/partners/microsoft/&lt;/a&gt;&amp;nbsp;and here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Windows and Azure Storage&lt;/span&gt;&lt;/strong&gt; - Although not an engine - other than a triple-redundant, immediately consistent commit - Windows Azure can hold terabytes of information and make it available to everything from the R programming language to the Hadoop offering. Binary storage (Blobs) and Table storage (Key-Value Pair) data can be queried across a distributed environment. You can learn more about Windows Azure storage here: &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx"&gt;http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;Analysis&lt;/h3&gt;
&lt;p&gt;In a &amp;ldquo;Big Data&amp;rdquo; environment, it&amp;rsquo;s not unusual to have a specialized set of tasks for analyzing and even interpreting the data. This is a new field called &amp;ldquo;data Science&amp;rdquo;, with a requirement not only for computing, but also a heavy emphasis on math.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Transact-SQL &lt;/strong&gt;&lt;/span&gt;- T-SQL is the dialect of the Structured Query Language used by Microsoft. It includes not only robust selection, updating and manipulating of data, but also analytical and domain-level interrogation as well. It can be used on SQL Server, PDW and ODBC data sources. You can read more about T-SQL here: &lt;a href="http://msdn.microsoft.com/en-us/library/bb510741.aspx"&gt;http://msdn.microsoft.com/en-us/library/bb510741.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Multidimensional Expressions and Data Analysis Expressions&lt;/span&gt;&lt;/strong&gt; - The MDX and DAX languages allow you to query multidimensional data models that do not fit well with typical two-plane query languages. Pivots, aggregations and more are available within these constructs to query and work with data in Analysis Services. You can read more about MDX here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx&lt;/a&gt; and more about DAX here: &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=28572"&gt;http://www.microsoft.com/download/en/details.aspx?id=28572&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HPC Jobs and Tasks &lt;/span&gt;&lt;/strong&gt;- Work submitted to the Windows HPC Server has a particular job - essentially a reservation request for resources. Within a job you can submit tasks, such as parametric sweeps and more. You can learn more about Jobs and Tasks here: &lt;a href="http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx"&gt;http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HiveQL &lt;/span&gt;&lt;/strong&gt;- HiveQL is the language used to query a Hive object running on Hadoop. You can see a tutorial on that process here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Piglatin &lt;/span&gt;&lt;/strong&gt;- Piglatin is the submission language for the Pig implementation on Hadoop. An example of that process is here: &lt;a href="http://sqlblog.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx"&gt;http://blogs.msdn.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Application Programming Interfaces &lt;/span&gt;&lt;/strong&gt;- Almost all of the analysis offerings have associated API&amp;rsquo;s - of special note is Microsoft Research&amp;rsquo;s Infer.NET, a new language construct for framework for running Bayesian inference in graphical models, as well as probabilistic programming. You can read more about Infer.NET here: &lt;a href="http://research.microsoft.com/en-us/um/cambridge/projects/infernet/"&gt;http://research.microsoft.com/en-us/um/cambridge/projects/infernet/&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;Presentation&lt;/h3&gt;
&lt;p&gt;Lots of tools work in presenting the data once you have done the primary analysis. In fact, there&amp;rsquo;s a great video of a comparison of various tools here: &lt;a href="http://msbiacademy.com/Lesson.aspx?id=73"&gt;http://msbiacademy.com/Lesson.aspx?id=73&lt;/a&gt; Primarily focused on Business Intelligence. That term itself is now not as completely defined, but the tools I&amp;rsquo;ll show below can be used in multiple ways - not just traditional Business Intelligence scenarios. Application Programming Interfaces (API&amp;rsquo;s) can also be used for presentation; but I&amp;rsquo;ll focus here on &amp;ldquo;out of the box&amp;rdquo; tools.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Excel&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s Excel can be used not only for single-desk analysis of data sets, but with larger datasets as well. It has interfaces into SQL Server, Analysis Services, can be connected to the PDW, and is a first-class job submission system for the Windows HPC Server. You can watch a video about Excel and big data here: &lt;a href="http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f"&gt;http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f&lt;/a&gt;&amp;nbsp;and you can also connect Excel to Hadoop: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Reporting Services&lt;/span&gt;&lt;/strong&gt; - Reporting Services is a SQL Server tool that can query and show data from multiple sources, all at once. It can also be used with Analysis Services. You can read more about Reporting Services here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Power View&lt;/span&gt;&lt;/strong&gt; - Power View is a &amp;ldquo;Self-Service&amp;rdquo; Business Intelligence reporting tool, which can work with on-premises data in addition to SQL Azure and other data. You can read more about it and see videos of Power View in action here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SharePoint Services -&lt;/span&gt;&lt;/strong&gt; Microsoft has rolled several capable tools in SharePoint as &amp;ldquo;Services&amp;rdquo;. This has the advantage of being able to integrate into the working environment of many companies. You can read more about&amp;nbsp; lots of these reporting and analytic presentation tools here: &lt;a href="http://technet.microsoft.com/en-us/sharepoint/ee692578"&gt;http://technet.microsoft.com/en-us/sharepoint/ee692578&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;This is by no means an exhaustive list - more capabilities are added all the time to Microsoft&amp;rsquo;s products, and things will surely shift and merge as time goes on. Expect today&amp;rsquo;s &amp;ldquo;Big Data&amp;rdquo; to be tomorrow&amp;rsquo;s &amp;ldquo;Laptop Environment&amp;rdquo;.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=41832" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/default.aspx">Developer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Design/default.aspx">Design</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Azure/default.aspx">SQL Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Data/default.aspx">Data</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Data+Professional/default.aspx">Data Professional</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Cloud/default.aspx">Cloud</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Azure/default.aspx">Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Windows+Azure/default.aspx">Windows Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Cloud+Computing/default.aspx">Cloud Computing</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Storage/default.aspx">Storage</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Concepts/default.aspx">Concepts</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Business+Intelligence/default.aspx">Business Intelligence</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Windows+2008/default.aspx">Windows 2008</category></item><item><title>Big Data and the Cloud - More Hype or a Real Workload?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/10/18/big-data-and-the-cloud-more-hype-or-a-real-workload.aspx</link><pubDate>Tue, 18 Oct 2011 13:57:36 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39156</guid><dc:creator>BuckWoody</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/39156.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=39156</wfw:commentRss><description>&lt;p&gt;Last week Microsoft announced several new offerings for “Big Data” - and since I’m a stickler for definitions, I wanted to make sure I understood what that really means. What is “Big Data”? What size hard drive is that? After all, my laptop has 1TB of storage - is my laptop “Big Data”?&lt;/p&gt;  &lt;p&gt;There are actually a few definitions for this term, most notably those involving the &lt;a href="http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data" target="_blank"&gt;“Four V’s” Volume, Velocity, Variety and Variability&lt;/a&gt;. Others &lt;a href="http://nosql.mypopescu.com/post/10120087314/big-data-and-the-4-vs-volume-velocity-variety" target="_blank"&gt;disagree with this&lt;/a&gt; definition. I tend to try and get things into their simplest form, so I’m using this definition for myself:&lt;/p&gt;  &lt;p align="center"&gt;&lt;font color="#c0504d" size="3"&gt;Big data is defined as a &lt;em&gt;large set &lt;/em&gt;of &lt;em&gt;computationally expensive &lt;/em&gt;data that is &lt;em&gt;worked on simultaneously&lt;/em&gt;.&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;Let me flesh that out a&amp;#160; little. To be sure, “Big Data” has a larger size than say a few megabytes. The reason this is important is that it takes special hardware to be able to move large sets of data around, store it, process it and so on. (&lt;font color="#c0504d"&gt;large set&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;If you store a LOT of data, but only use a small portion of it at a time, that really isn’t super-hard to do. It’s mainly a storage issue at that point. But, if you do need to work with a large portion of the data at one time, then the memory, CPU and transfer components of the system have to adapt to be responsive - new ways to work with that data (game theory, knot-algorithms, map-reduce, etc.) need to be brought into play. (&lt;font color="#c0504d"&gt;computationally expensive&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;Once that data is loaded into the processing area (memory or whatever other mechanism is used) it must be worked on in parallel to come back in a reasonable time. You have two options here - you can scale the system up with more internal hardware (CPU’s, memory and so on) or you can scale it out to have multiple systems work on it at the same time using paradigms such as map/reduce and so on. Actually, when you lay this out in an architecture diagram, scale up or out doesn’t actually change the logical structure of the process - in scale out the network becomes the bus, and the nodes become more RAM and computing power. Of course, there are changes in code for how you stitch the workload back together. (&lt;font color="#c0504d"&gt;worked on simultaneously&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;So back to the original question. Is Big Data, as I have defined it here, a workload for Windows and SQL Azure? Absolutely! In fact, it’s probably one of the main workloads, and I believe it represents the latest, and perhaps also the earliest frontier of computing. Jim &lt;a href="http://research.microsoft.com/en-us/um/people/gray/" target="_blank"&gt;Gray, a former researcher here at Microsoft and a hero of mine, was working on this very topic.&lt;/a&gt; I believe as he did - all computing is simply an interface over data. &lt;/p&gt;  &lt;p&gt;Microsoft has multiple offerings on the topic of Big Data. In posts that follow from myself and my co-workers, we’ll explore when and where you use each one. Whether you are a data professional or a developer, this is the new frontier - &lt;a href="http://www.straightpathsql.com/archives/2011/10/microsoft-loves-your-big-data/" target="_blank"&gt;don’t wait to educate yourself&lt;/a&gt; on how to leverage Big Data for your organization. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Hadoop on Windows Azure and SQL Server&amp;#160; &lt;/strong&gt;- Microsoft’s &lt;a href="http://www.hortonworks.com/the-whys-behind-the-microsoft-and-hortonworks-partnership/" target="_blank"&gt;partnership to include Hadoop workloads on Windows Azure&lt;/a&gt; and &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584" target="_blank"&gt;SQL Server/Parallel Data Warehouse (PDW)&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;LINQ to HPC &lt;/strong&gt;- Microsoft’s High-Performance Computing SKU of &lt;a href="http://blogs.technet.com/b/windowshpc/archive/2011/05/20/dryad-becomes-linq-to-hpc.aspx" target="_blank"&gt;HPC is now in Azure&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Windows Azure Table Storage &lt;/strong&gt;- A &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx" target="_blank"&gt;key/value pair type storage with full partitioning&lt;/a&gt; that is immediately consistent, able to handle huge loads of data and works with any REST-compatible language&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;strong&gt;Other offerings &lt;/strong&gt;- Including the new &lt;a href="http://www.microsoft.com/en-us/sqlazurelabs/default.aspx" target="_blank"&gt;Data Explorer&lt;/a&gt;, &lt;a href="http://research.microsoft.com/en-us/news/headlines/daytona-071811.aspx" target="_blank"&gt;Project Daytona (with a Big Data Toolkit for Scientists and researchers)&lt;/a&gt;, &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/SQL-Server-2012-breakthrough-insight.aspx" target="_blank"&gt;Power View&lt;/a&gt; and more. &lt;/p&gt;  &lt;p&gt;The era of Big Data is here. And you can use Windows and SQL Azure to bring it to your organization. &lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=39156" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/default.aspx">Developer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/DBA/default.aspx">DBA</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Career/default.aspx">Career</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Azure/default.aspx">SQL Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Conferences/default.aspx">Conferences</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/PASS/default.aspx">PASS</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQLServer/default.aspx">SQLServer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Data/default.aspx">Data</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Data+Professional/default.aspx">Data Professional</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Cloud/default.aspx">Cloud</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Azure/default.aspx">Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Windows+Azure/default.aspx">Windows Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Cloud+Computing/default.aspx">Cloud Computing</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Storage/default.aspx">Storage</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Concepts/default.aspx">Concepts</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Azure+Use+Cases/default.aspx">Azure Use Cases</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Policy+Based+Management/default.aspx">Policy Based Management</category></item><item><title>SQL Server for the Oracle DBA Links</title><link>http://sqlblog.com/blogs/buck_woody/archive/2010/04/28/sql-server-for-the-oracle-dba-links.aspx</link><pubDate>Wed, 28 Apr 2010 14:29:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:24637</guid><dc:creator>BuckWoody</dc:creator><slash:comments>2</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/24637.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=24637</wfw:commentRss><description>&lt;P&gt;I do a presentation (and a class) called "SQL Server for the Oracle DBA". It's a non-marketing overview that gives you the basics of working with SQL Server if you're already familiar wtih how Oracle works. This class and these links DO NOT help you with "Why should I use Oracle/SQL Server instead of Oracle/SQL Server" - I'll assume you're already there, and if not, there are LOTS of sites to help you make that decision. &lt;/P&gt;
&lt;P&gt;Although these links might contain slight marketing slants (I don't control them) I've tried to get the best links I can. Feel free to comment here to add more/better links.&lt;/P&gt;
&lt;P&gt;As such, these aren't links that help you work with Oracle - they are links to help you work with SQL Server. Some of them contain more information than you actually need, others don't have near enough. Taken together (and with the class) you're able to get done what you need to do.&lt;/P&gt;
&lt;P&gt;"Practical SQL Server for Oracle Professionals" - A Microsoft Whitepaper, probably the best place to get started: &lt;A href="http://download.microsoft.com/download/6/9/d/69d1fea7-5b42-437a-b3ba-a4ad13e34ef6/SQLServer2008forOracle.docx"&gt;http://download.microsoft.com/download/6/9/d/69d1fea7-5b42-437a-b3ba-a4ad13e34ef6/SQLServer2008forOracle.docx&lt;/A&gt; &lt;/P&gt;
&lt;P&gt;Free Training: &lt;A href="http://technet.microsoft.com/en-us/sqlserver/dd548020.aspx"&gt;http://technet.microsoft.com/en-us/sqlserver/dd548020.aspx&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Classroom training (will cost you): &lt;A href="http://www.microsoft.com/learning/en/us/course.aspx?ID=50068A&amp;amp;locale=en-us"&gt;http://www.microsoft.com/learning/en/us/course.aspx?ID=50068A&amp;amp;locale=en-us&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Terminology Differences: &lt;A href="http://www.associatedcontent.com/article/2383466/oracle_and_sql_server_basic_terminology.html"&gt;http://www.associatedcontent.com/article/2383466/oracle_and_sql_server_basic_terminology.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Datatype mapping between Oracle and SQL Server: &lt;A href="http://msdn.microsoft.com/en-us/library/ms151817.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms151817.aspx&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;The "other" direction - can still be useful for the Oracle professional to see the other side: &lt;A href="http://blog.benday.com/archive/2008/10/23/23195.aspx"&gt;http://blog.benday.com/archive/2008/10/23/23195.aspx&lt;/A&gt; &lt;/P&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=24637" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/default.aspx">Developer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/DBA/default.aspx">DBA</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Link+Lists/default.aspx">Link Lists</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Conferences/default.aspx">Conferences</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Links/default.aspx">Links</category></item><item><title>Tools and Processes for “Fitting it all in”</title><link>http://sqlblog.com/blogs/buck_woody/archive/2010/01/18/tools-and-processes-for-fitting-it-all-in.aspx</link><pubDate>Mon, 18 Jan 2010 14:42:05 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:21147</guid><dc:creator>BuckWoody</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/21147.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=21147</wfw:commentRss><description>&lt;p&gt;Most data professionals I’ve met work in two modes: we plan for our day, and we react to the situations around us. I’m staring at my list of things that I need to do today right now, which is my planned work. Of course, I have no idea how much of that will really get done – it’s optimistic to be sure. On the other hand I have several systems I manage, and at any moment one of them or the people that interface with them may “change state” such that I need to give them some attention.&lt;/p&gt;  &lt;p&gt;So how do I meld the two? Sometimes it can be quite difficult. I’m constantly working through my list in my mind, re-arranging what I’m focusing on based on what I perceive as the highest need. There are, however, some tools that I use each day to help me manage the workflow.&lt;/p&gt;  &lt;p&gt;I use Outlook for tracking everything, since it has a task list (my primary tracking), a calendar, mail and so on. Also I can share the information, it’s on-line so I can see it anywhere, and I can even take it offline onto the plane this week when I fly out of town. &lt;/p&gt;  &lt;p&gt;For the “ad-hoc” work, I rely on a script library, which I keep as SQL Server Management Studio projects. I keep those scripts and projects backed using Microsoft Live Mesh, which synchronizes those files (along with a few other critical files and my IE Favorites) across not only my laptop and primary systems, but even with my Virtual Machines. &lt;/p&gt;  &lt;p&gt;Also for my SQL Server systems I use the Standard Reports I’ve blogged about here. I also use Greg Larsen’s Database Dashboard, and a series of PowerShell scripts that work across my systems, alerting me to any problems. Of course I’m using SQL Server Agent Jobs quite a bit, and I also use Alerts and some Perfmon automation for my monthly baselining.&lt;/p&gt;  &lt;p&gt;So – is this your experience as well? Do you get driven by both planned and unplanned work? What tools and processes do you use to keep it all straight with your SQL Server Instances?&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=21147" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Development/default.aspx">Development</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SSMS/default.aspx">SSMS</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/default.aspx">Developer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Tips/default.aspx">Tips</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/DBA/default.aspx">DBA</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Design/default.aspx">Design</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server+Management+Studio/default.aspx">SQL Server Management Studio</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Scripts/default.aspx">Scripts</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Web/default.aspx">Web</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Planning/default.aspx">Planning</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/PowerShell/default.aspx">PowerShell</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Process/default.aspx">Process</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Agent/default.aspx">Agent</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Management/default.aspx">Management</category></item><item><title>Know Your Product Specifications</title><link>http://sqlblog.com/blogs/buck_woody/archive/2010/01/13/know-your-product-specifications.aspx</link><pubDate>Wed, 13 Jan 2010 14:57:01 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:21010</guid><dc:creator>BuckWoody</dc:creator><slash:comments>1</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/21010.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=21010</wfw:commentRss><description>&lt;p&gt;As the Data Professional in your organization, the rest of the org looks to you to ensure that the system can handle what the business requires. To do that, you need to know two things: what the business requires, and what SQL Server can do.&lt;/p&gt;  &lt;p&gt;But of course there’s a bit more to it than that. Knowing the business side of the requirements – well, I teach an entire course on that. But knowing what SQL Server can do is something you can find out on your own.&lt;/p&gt;  &lt;p&gt;SQL Server comes in &lt;em&gt;versions&lt;/em&gt;, which are released based on date, and &lt;em&gt;editions&lt;/em&gt;, which are based on features and capabilities. It’s that last part that I want to focus on today.&lt;/p&gt;  &lt;p&gt;As Microsoft SQL Server matures, you’re going to see even more separation between what each edition of SQL Server can do and where it should be used. In the past, most folks have only focused on three editions – Express (the “free” one), Standard, and Enterprise. The rule of thumb was that if Standard was good enough at the moment, put it in. And it is true (and a good thing) that you can upgrade from one edition to another fairly easily.&lt;/p&gt;  &lt;p&gt;But as time goes on, we should spend a little more time understanding what each edition does, what it’s features and capabilities are, and where and when we should put them in. As I study this information, I’ll throw in my 2 cents and you can as well based on what you see. One thing I’ve found so far is that once I have the business requirements, there’s a mix of what I can write in code and what might already be included in a different edition. It’s important to look long and hard at that choice – writing a feature on my own is certainly cheaper in the short term than moving to a “higher” edition, but in some cases it makes sense to let Microsoft handle that lifting.&lt;/p&gt;  &lt;p&gt;These links are ones that you should bookmark and take a peek at periodically. They are the “header” links for more information on those features and capabilities:&lt;/p&gt;  &lt;p&gt;SQL Server 2008: &lt;a href="http://msdn.microsoft.com/en-us/library/ms143287.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms143287.aspx&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;SQL Server 2008 R2: &lt;a href="http://msdn.microsoft.com/en-us/library/ms143287(SQL.105).aspx"&gt;http://msdn.microsoft.com/en-us/library/ms143287(SQL.105).aspx&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;In addition, you might start learning a little more about SQL Azure. I’ll talk more about that later.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=21010" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Development/default.aspx">Development</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/default.aspx">Developer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/DBA/default.aspx">DBA</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Design/default.aspx">Design</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Administration/default.aspx">Administration</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server+2008/default.aspx">SQL Server 2008</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Career/default.aspx">Career</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Planning/default.aspx">Planning</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Azure/default.aspx">SQL Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft+Update/default.aspx">Microsoft Update</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Link+Lists/default.aspx">Link Lists</category></item><item><title>Aren’t DBA’s Just System Admins for Databases?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2009/11/30/aren-t-dba-s-just-system-admins-for-databases.aspx</link><pubDate>Mon, 30 Nov 2009 16:37:32 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:19335</guid><dc:creator>BuckWoody</dc:creator><slash:comments>8</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/19335.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=19335</wfw:commentRss><description>&lt;p&gt;Last week I ran into an argument I’ve had since I left the mainframe space decades ago. A developer told me “DBA’s don’t design databases.” The inference was that DBA’s (i.e., Database Administrators) only worry about hardware, security, OS, database backups, things like that. He seemed amazed that a DBA would ever do “data” work.&lt;/p&gt;  &lt;p&gt;It may be the name. Perhaps the “admin” part confuses developers. Also, it is true that in some shops, a systems admin does double duty with Windows, SQL Server, and perhaps even mail and web admins. &lt;/p&gt;  &lt;p&gt;But there are a LOT of DBA’s, or as the term I like to use, “Data Professionals”, that actually DO get down in the trenches and design databases, write Transact-SQL code and stored procedures, and do almost everything in the database other than write middle-tier or User Interface (UI) code. Some I know even do that.&lt;/p&gt;  &lt;p&gt;So what if there is a miscommunication on this? Well, the ramifications can be huge. For one thing, there’s a lack of respect. That’s not called for ever, no matter what anyone’s role is. Also, one of the most impactful areas in a database is the design. When a DBA is asked to export data, tune a process, or troubleshoot an issue, it invariably involves the design. So when a DBA doesn’t get to do the design, they have to live with the results. And anything you’re responsible for when you don’t have the authority over is a recipe for stress.&lt;/p&gt;  &lt;p&gt;Another issue is that DBA’s “inherit” all kinds of data structures form around the company. From Microsoft Access to Excel, to amateur Business Analysts creating databases in the Express Editions, they deal with bad design day after day. The newer “modeling” languages that are coming into vogue will make this problem much worse. These languages do not take scale, extensibility, security or performance into account – they just make sure that the data ends up in the right place for that particular design, which is a recipe for data disaster when the “small application” the developer writes becomes a “mission critical” system the DBA has to troubleshoot at 2:00 A.M.&lt;/p&gt;  &lt;p&gt;So in case you’re a developer, and in case you think DBA’s “just do admin” – think again. DBA’s spend their whole day in this world, and can be a valuable asset to your development efforts. Bring them in early, bring them in often, and whatever you do – don’t design alone. Business Analysts, Developers and Data Professionals are needed to make a good, sustainable, secure, well-performing database.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=19335" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Development/default.aspx">Development</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/default.aspx">Developer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/DBA/default.aspx">DBA</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Design/default.aspx">Design</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Rant/default.aspx">Rant</category></item><item><title>Changing the Primary Key After You Have Data</title><link>http://sqlblog.com/blogs/buck_woody/archive/2009/11/24/changing-the-primary-key-after-you-have-data.aspx</link><pubDate>Tue, 24 Nov 2009 13:25:22 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:19147</guid><dc:creator>BuckWoody</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/19147.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=19147</wfw:commentRss><description>&lt;p&gt;&lt;a href="http://blogs.msdn.com/buckwoody/archive/2009/11/23/changing-the-primary-key-before-you-have-data.aspx" target="_blank"&gt;Yesterday I blogged about changing a Primary Key (PK)&lt;/a&gt; during the design phase, and before you have data in the database. Even then, it’s not trivial to change the data type or column(s) that make(s) up the PK. When you have data in that Primary Key and/or you have Foreign Keys (FK) that point to a PK field, this becomes a much more involved process.&lt;/p&gt;  &lt;p&gt;First, you MUST take a complete backup of the system, and you MUST do this work on a development system. You’re going to be manipulating base data, and in the worst kind of way. Why do I say that?&lt;/p&gt;  &lt;p&gt;There are three kinds of data problems. One is physical data corruption, and once you restore it (I don’t trust repair with data loss) then you know where the “good” data stops and starts. You know that the FKs are pointing to the right PKs, and that the data is readable. &lt;/p&gt;  &lt;p&gt;The second kind of data problem is “incorrect” entry, meaning that you put in “Buck Woodie” when you meant “Buck Woody”. This is also often correctable, as your Declarative Referential Integrity (DRI) will still keep that record together. It’s a simple matter to locate and correct.&lt;/p&gt;  &lt;p&gt;But the third kind of data problem is the worst. In this kind, the FKs point to the &lt;em&gt;wrong&lt;/em&gt; PK, or &lt;em&gt;not at all&lt;/em&gt;. Or perhaps some of the FK data is missing. The reason this is the worst is that the data cannot be trusted – did Buck Woody really pay for that purchase or was that payment from a different record? If the links are incorrect, everything will look fine, but it won’t be correct.&lt;/p&gt;  &lt;p&gt;So in the situation where you want to change the PKs when you already have data, one of the steps is to re-point the FKs to the proper PKs. Get this wrong and you’ll have “dirty data”. The only recourse there is to restore the backup and start over.&lt;/p&gt;  &lt;p&gt;I’m assuming that you have a REALLY GOOD reason to make this change, and that you’ve taken that backup and you’re on a test system. As Hannibal Lector would say: “Okeedokee – here we go.”&lt;/p&gt;  &lt;p&gt;I’ll describe two possible choices. In the first, understand and document the PK and FK relationships for each table. Then create queries that tie out the INSERTS for that data in the correct order, and include all fields for both tables, including the PKs in the parent table and the FKs in the child tables.&lt;/p&gt;  &lt;p&gt;With those INSERT statements made, drop the PK and FK constraints on all of the tables using the script I mentioned yesterday. Make your change to the data type (or field) in the tables. Then edit the INSERT statements to have the new value types, ensuring that the “sets” or records are kept together by keeping the FK’s pointing to the right PKs. Then re-apply the PK and FK constraints, watching for errors. With backups and scripts, you can make corrections along the way.&lt;/p&gt;  &lt;p&gt;The second method works much as the first, with the exception of using the INSERT statements. You can use SSIS or a third-party transfer process to actually move the data and change the data type or values “in flight”, although this requires more thought and planning in my experience.&lt;/p&gt;  &lt;p&gt;In any case, you should test the results thoroughly and ensure that you’re getting the data you expect. Communicate what you’re doing, and be ready to fall back to your backups at the first sign of trouble.&lt;/p&gt;  &lt;p&gt;Oh – since SQL Server 2005, you’ve been able to “relax” constraints to insert data. That won’t help you here, since you’re changing the entire key for whatever reason. That’s only meant to allow you to get data in when the key doesn’t change.&lt;/p&gt;  &lt;p&gt;OK – feel free to comment on this post, since there are other ways I didn’t cover. But if you’re reading my material or any of the comments, make sure you test it yourself – I might have missed something, no one is perfect, your situation may vary. Once again, you can see why I value a good design and ERD so highly – that way you never have to get here to begin with!&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=19147" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Development/default.aspx">Development</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SSMS/default.aspx">SSMS</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/default.aspx">Developer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Tips/default.aspx">Tips</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/DBA/default.aspx">DBA</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Design/default.aspx">Design</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server/default.aspx">SQL Server</category></item></channel></rss>