<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tag 'Data'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=Data&amp;orTags=0</link><description>Search results matching tag 'Data'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>How Does the Cloud Change a  Developer's Job?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2013/02/12/how-does-the-cloud-change-a-developer-s-job.aspx</link><pubDate>Tue, 12 Feb 2013 16:26:51 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47670</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;I've recently &lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/01/22/how-does-the-cloud-change-a-systems-architect-s-job.aspx" target="_blank"&gt;posted a blog on how cloud computing would change the Systems Architect&amp;rsquo;s role in an organization&lt;/a&gt;, another on &lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/01/29/how-does-the-cloud-change-a-database-administrator-s-job.aspx" target="_blank"&gt;how the cloud changes a Database Administrator's job&lt;/a&gt;, and the &lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/02/05/how-does-the-cloud-change-a-systems-administrator-s-job.aspx" target="_blank"&gt;last post dealt with the &lt;/a&gt;&lt;a&gt;Systems Administrator&lt;/a&gt;. In this post I'll cover the changes facing the Software Developer when using the cloud. &lt;/p&gt;
&lt;p&gt;The software developer role was the earliest adopter of cloud computing. This makes perfect sense, because the software developer has always used computing "as a service" - they (most often) don't buy and configure servers, platforms and the like, they write code that runs on those platforms. And there's probably not a simpler definition of a software developer to be found, but as with all simple statements, you lose fidelity and detail.&amp;nbsp; I'll offer a more complete list in a moment.&lt;/p&gt;
&lt;p&gt;Because the software developer's process involves designing, testing and writing code locally and then migrating it to a production environment, all of the paradigms in cloud computing - &lt;a href="http://sqlblog.com/b/buckwoody/archive/2012/06/13/windows-azure-write-run-or-use-software.aspx" target="_blank"&gt;from IaaS to PaaS to SaaS&lt;/a&gt; - come naturally. &lt;/p&gt;
&lt;h1&gt;The Software Developer's Role&lt;/h1&gt;
&lt;p&gt;The software developer has evolved since the earliest days of programming.The software developer not only "writes code"&amp;nbsp; - there are far more tasks involved in modern systems development:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Assisting the Business Role(s) in developing software specifications&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning software system components and modules&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Designing system components&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Working in teams writing classes, modules, interfaces and software endpoints&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Designing data layouts, architectures, access and other data controls&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Designing and implementing security, either programmatic, declarative, or referential&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Mixing and matching various languages, scripting and other constructs within the system&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Designing and implementing user and account security rights and restrictions&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Designing various software code tests - unit, functional, fuzz, integration, regression, performance and others&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Deploying systems &lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Managing and maintaining code updates and changes&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Like most of the previous roles, those tasks also unpacks into a larger set of tasks, and no single developer has exactly that same list. And like the DBA, the role is often more, or less of that list based on where the developer works. Smaller companies may include the development platform in the duties so that a developer is also a systems administrator. In larger organizations I've seen developers that specialized on User Interfaces, Engine Components, Data Controls or other specific areas.&lt;/p&gt;
&lt;h1&gt;How the Cloud Changes Things&lt;/h1&gt;
&lt;p&gt;The software developer role obviously has the same concerns and impacts of "the cloud" as the Systems Architect. They need to educate themselves on the options within this new option (&lt;span style="color:#0000ff;"&gt;Knowledge&lt;/span&gt;), try a few test solutions out (&lt;span style="color:#0000ff;"&gt;Experience&lt;/span&gt;) and of course work with others on various parts of the implementation (&lt;span style="color:#0000ff;"&gt;Coordination&lt;/span&gt;).&lt;/p&gt;
&lt;p&gt;The big changes for a developer include three major areas: Hybrid Software Design, Security, and Distributed Computing.&lt;/p&gt;
&lt;h2&gt;Hybrid Software Design&lt;/h2&gt;
&lt;p&gt;After the PC revolution, software developers designed systems that ran primarily on a single computer. From there the industry moved to "client/server", where most of the code still lived on the user's workstation, and various levels of state (such as the data layer) moved to a server over fast connected lines. After than followed the Internet phase, which had less to do with HTML coding than it did with state-less architectures. While no architecture is truly stateless, there are ways of allowing the client to be in a different state than the server of the application at any one time - this is the way the Web works.&lt;/p&gt;
&lt;p&gt;Even so, the developer often simply moved one the primary layers (such as Model, View or Controller) to the server, using the User Interface merely as the View or Presentation layer. While technically stateless, this doesn't require a great deal of architecture change - there are various software modules that run on a server, and perhaps that connects to a remote data server. In the end, it's still a single paradigm.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;We now have the ability to run IaaS (hardware abstraction), PaaS (hardware, operating system and runtime abstraction) and SaaS (everything abstracted, API calls only) in a single environment such as Windows Azure. A single application might have a Web-based Interface Server with federated processes&amp;nbsp; (using a PaaS set of roles), a database service (using a SaaS provider such as Windows Azure SQL Database), a specialized process in Linux (using an IaaS role in Windows Azure) and a translator API (from the Windows Azure Marketplace). This example involves only one vendor - Microsoft. I've seen applications that use multiple vendors in this same way.&lt;/p&gt;
&lt;p&gt;Thinking this way opens up a great deal of flexibility - and complexity. Complexity isn't evil; it's how complicated things get done many times. The modern developer&amp;nbsp; needs to understand how to build hybrid software architectures. &lt;/p&gt;
&lt;p style="color:#993300;"&gt;&lt;span style="color:#993300;"&gt;&lt;em&gt;&lt;span style="color:#0000ff;"&gt;Resources&lt;/span&gt;:&lt;/em&gt;&lt;/span&gt; Hybrid Architectures with step-by-step instructions and examples:&amp;nbsp;&lt;a href="http://msdn.microsoft.com/en-us/library/hh871440.aspx" target="_blank"&gt;http://msdn.microsoft.com/en-us/library/hh871440.aspx &lt;/a&gt; and &lt;span style="color:#993300;"&gt;Windows Azure Hybrid Systems&lt;/span&gt;:&amp;nbsp;&lt;a href="http://msdn.microsoft.com/en-us/library/hh871440.aspx?AnnouncementFeed&amp;amp;nbsp;" target="_blank"&gt;http://msdn.microsoft.com/en-us/library/hh871440.aspx?AnnouncementFeed&amp;nbsp;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Security&lt;/h2&gt;
&lt;p&gt;Having a single security boundary, such as "everyone who works in my company", is a relatively simple problem to solve. Normally the System Administrators configure and control a security provider, such as Active Directory, and developers can access that security layer programmatically.&amp;nbsp; That allows for good separation of duties and role-based control.&lt;/p&gt;
&lt;p&gt;In modern applications, clients, managers, and users both internal and external need various levels of access to the same objects, code and data. A client should be able to enter an order, a store should be able to accept the order, the credit-card company should be able to check the order and authorize payment, and the managers should be able to report on the order or change it if needed. Using role-based security across multiple domains would be impossible to maintain.&lt;/p&gt;
&lt;p&gt;Enter "claims-based" authentication. In this paradigm, the user logs in with whatever security they use - corporate or other Active Directory, Facebook, Google, whatever. The application (using Windows Identity Foundation or WIF) can accept a "claim" from that provider, and the developer can match whatever parts of that claim they wish to the objects, code and data. And example might be useful.&lt;/p&gt;
&lt;p&gt;Buck logs in to his corporate Active Directory (AD), and attempts to use a program based in Windows Azure. Windows Azure rejects the login silently, and is configured to check with Buck's AD. Buck's AD says "yes, I know Buck, and he has been granted the following claims: "partner", "manager", "approver". The developer does not need to know about Buck's AD, Buck, his login, or anything else. She simply codes the proper data access to allow "approver" to approve a sale.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;This allows a lot of control, at a very fine level, without having to get into the details of each security provider. .&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;&lt;em&gt;&lt;span style="color:#0000ff;"&gt;Resources&lt;/span&gt;:&lt;/em&gt;&lt;/span&gt; &lt;span style="color:#993300;"&gt;Overview of using claims-based Azure Security&lt;/span&gt;: &lt;a href="http://adnanboz.wordpress.com/2011/02/06/claims-based-access-and-windows-azure/" target="_blank"&gt;http://adnanboz.wordpress.com/2011/02/06/claims-based-access-and-windows-azure/ &lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Distributed Computing&lt;/h2&gt;
&lt;p&gt;Is there a difference between stateless computing, or even the hybrid programming I mentioned earlier, and "Distributed Computing"? Yes - the primary difference is latency. Even stateless code can have too small a tolerance for latency.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Dealing with slow connectivity, or breaks in connections has many impacts. One method of dealing with this is to locate data and computing of that data as closely as possible, even if this means relaxing consistency or duplicating data. Another method is to go back to a great paradigm from the past that is possible underused today is a Service Oriented Architecture. The Windows Azure Service Bus is possibly one of the fastest and easiest way to adopt cloud computing without completely rearchitecting your application. &lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;References&lt;/em&gt;&lt;/span&gt;: &lt;span style="color:#993300;"&gt;Great breakdown of the thought process around a distributed architecture:&lt;/span&gt; &lt;a href="http://msdn.microsoft.com/en-us/magazine/jj553517.aspx" target="_blank"&gt;http://msdn.microsoft.com/en-us/magazine/jj553517.aspx &lt;/a&gt;and &lt;span style="color:#993300;"&gt;using a Windows Azure Relay Service&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/develop/net/how-to-guides/service-bus-relay/" target="_blank"&gt;http://www.windowsazure.com/en-us/develop/net/how-to-guides/service-bus-relay/&lt;/a&gt;&amp;nbsp;&lt;/p&gt;</description></item><item><title>How Does the Cloud Change a Database Administrator’s Job?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2013/01/29/how-does-the-cloud-change-a-database-administrator-s-job.aspx</link><pubDate>Tue, 29 Jan 2013 15:08:32 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47385</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;I recently&lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/01/22/how-does-the-cloud-change-a-systems-architect-s-job.aspx" target="_blank"&gt; posted a blog entry on how cloud computing would change the Systems Architect&amp;rsquo;s role in an organization&lt;/a&gt;. In a way, the Systems Architect has the easiest transition to a new way of using computing technologies. In fact, that&amp;rsquo;s actually part of the job description.&amp;nbsp;I mentioned that a Systems Architect has three primary vectors to think about for cloud computing, as it applies to what they should do:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Knowledge - Which options are available to solve problems, and what are their strengths and weaknesses.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Experience - What has the System Architect seen and worked with in the past.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Coordination - A system design is based on multiple factors, and one person can't make all the choices. There will need to be others involved at every level of the solution, and the Systems Architect will need to know who those people are and how to work with them.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;The Database Administrator Role&lt;/h1&gt;
&lt;p&gt;But a Database Administrator (DBA) is probably one of the harder roles to think about when it comes to cloud computing. First, let&amp;rsquo;s define what a Database Administrator usually thinks about as part of their job:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, Installing and Configuring a Database Platform&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, designing and creating databases&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, designing and implementing High Availability and Disaster Recovery for each database (HADR) based on requirements for its workload&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Maintaining and monitoring the database platform&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Implementing performance tuning on the databases based on monitoring&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Re-balancing workloads across database servers based on monitoring&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Securing databases platforms and individual databases based on requirements and implementation&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&amp;rsquo;s just a short list, and each of those unpacks into a larger set of tasks.&lt;/p&gt;
&lt;p&gt;The issue is that&lt;em&gt; I&amp;rsquo;ve never actually met a DBA that does all of those things&lt;/em&gt;, or &lt;strong&gt;just&lt;/strong&gt; all of those things. Many times they do much more, sometimes the systems are so large they specialize on just a few of them.&lt;/p&gt;
&lt;p&gt;And as you can see from the list, some of these areas are shared with other roles. For instance, in some shops, the DBA plans, purchases, sets up and configures the hardware for database servers. In others that&amp;rsquo;s done&lt;br /&gt;by the Infrastructure Team. In some shops the DBA designs databases from software requirements, and in others the developers do that &amp;ndash; or perhaps it&amp;rsquo;s done as a joint effort. The same holds true for database code &amp;ndash; sometimes the&lt;br /&gt;DBA does it, other times the developer, and still others it&amp;rsquo;s a shared task.&lt;/p&gt;
&lt;p&gt;In fact, you could argue that there are few other roles in IT where the roles are so intermixed. Also, the DBA works with software the company develops, and software the company buys. They work with hardware, networking, security and software. There are certain aspects of design and tuning that are outside the purview of some of those things, and inside the others.&lt;/p&gt;
&lt;p&gt;With all of these variables, simply telling a DBA that they should &amp;ldquo;use the cloud&amp;rdquo; is not the proper approach.&lt;/p&gt;
&lt;h1&gt;How the Cloud Changes Things&lt;/h1&gt;
&lt;p&gt;To be sure, the DBA has the same vectors as the Systems Architect. They need to educate themselves on the options within this new option (&lt;span style="color:#0000ff;"&gt;Knowledge&lt;/span&gt;), try a few test solutions out (&lt;span style="color:#0000ff;"&gt;Experience&lt;/span&gt;) and of course work with others on various parts of the implementation (&lt;span style="color:#0000ff;"&gt;Coordination&lt;/span&gt;). But it goes beyond that.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.windowsazure.com/en-us/manage/windows/fundamentals/intro-to-windows-azure/#components" target="_blank"&gt;There are three big buckets of cloud computing&lt;/a&gt;, dealing with simply using a Virtual Machine (IaaS) to writing code without worrying about the virtualization or even the operating system (PaaS) and using software that&amp;rsquo;s already written and being delivered via an Application Programming Interface (API). Each of these has so many options and configurations that it&amp;rsquo;s often better to think about the problem you&amp;rsquo;re trying to solve rather than all of the technology within a given area - although some of that is certainly necessary anyway.&amp;nbsp;&lt;/p&gt;
&lt;h2&gt;Database Platform Architecture&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ll start with when the DBA should even consider cloud computing for a solution. Once again, it&amp;rsquo;s not an &amp;ldquo;all or nothing&amp;rdquo; paradigm, where you either run something on premises or in the cloud &amp;ndash; it&amp;rsquo;s often a matter of selecting the right components to solve a problem.&amp;nbsp; In my design sessions with DBA&amp;rsquo;s I break these down into three big areas where they might want to consider the cloud &amp;ndash;and then we talk about how to implement each one:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Audiences&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;HADR&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Data Services&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Audiences&lt;/h3&gt;
&lt;p&gt;If the users of your database systems all sit in the same facility, you own the servers and networking, and the application servers are separate from the database server, it doesn&amp;rsquo;t usually make sense to take that database workload and place it on Windows Azure &amp;ndash; or any other cloud provider. The latency alone prevents a satisfactory performance profile, and in some cases won&amp;rsquo;t work at all. It doesn&amp;rsquo;t matter if the cloud solution is cheaper or easier &amp;ndash; if you&amp;rsquo;re moving a lot of data every second between an on-premises system and the cloud it won&amp;rsquo;t work well.&lt;/p&gt;
&lt;p&gt;However &amp;ndash; if your users are in multiple locations, especially globally, or you have a mix of company and external customer users, it might make sense to evaluate a shared data location. You still need to consider the implications of how much data the application server pushes back and forth, but you may be able to locate both the application server and SQL Server in an IaaS role. Assuming the data sent to the final client will work across public Internet channels, there may be a fit. There are security implications, but unless you have point-to-point connections for your current solution you&amp;rsquo;re faced with the same security questions on both options.&lt;/p&gt;
&lt;p&gt;Your audience might also be developers looking for a way to quickly spin up a server and then turn it down when they are done, paying for the time and not the hardware or licenses. This is also a prime case for evaluating IaaS. And there are others that you'll find in your own organization as you work through the requirements you have.&amp;nbsp;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Resources: Windows Azure Virtual Machines: &lt;a href="http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/"&gt;http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;span style="color:#993300;"&gt;Windows Azure SQL Server Virtual Machines&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/windows/common-tasks/install-sql-server/"&gt;http://www.windowsazure.com/en-us/manage/windows/common-tasks/install-sql-server/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;HADR&lt;/h3&gt;
&lt;p&gt;The next possible place to consider using cloud computing with SQL Server is as a part of your High Availability and Disaster Recovery plans. In fact, this is the most common use I see for cloud computing and the Database Administrator. The key is the Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Based on each application&amp;rsquo;s requirements, you may find that using Windows Azure or even supplementing your current plan is&lt;br /&gt;the right place to evaluate options. I&amp;rsquo;ve covered this use-case in more detail in another article.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References: SQL Server High Availability and Disaster Recovery options with Windows Azure&lt;/span&gt;: &lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx"&gt;http://blogs.msdn.com/b/buckwoody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Data Services&lt;/h3&gt;
&lt;p&gt;Windows Azure, along with other cloud providers, offers another way to design, create and consume data. In this use-case, however, the tasks DBA&amp;rsquo;s normally perform for sizing, ordering and configuring a system don&amp;rsquo;t apply.&lt;/p&gt;
&lt;p&gt;With Windows Azure SQL Databases (the artist formerly known as SQL Azure), you can simply create a database and begin using it. There are places where this fits and others where it doesn&amp;rsquo;t, and there are differences, limitations and enhancements, so it isn&amp;rsquo;t meant as replacement for what you could do with &amp;ldquo;Full-up&amp;rdquo; SQL Server on a Windows Azure Virtual Machine or an on-premises Instance. If a developer needs an Relational Database Management&lt;br /&gt;(RDBMS) data store for a web-based application, then this might be a perfect fit.&lt;/p&gt;
&lt;p&gt;But there is more to data services than Windows Azure SQL Databases. Windows Azure also offers MySQL as a service, RIAK and MongoDB (among others) and even Hadoop for larger distributed data sets. In addition you can use Windows Azure Reporting Services, and also tap into datasets and data functions in the Windows Azure Marketplace.&lt;/p&gt;
&lt;p&gt;The key for the DBA with this option is that you &lt;em&gt;will&lt;/em&gt; have to do a little investigation this time, and potentially without a specific workload in mind this time. I think that&amp;rsquo;s acceptable thing to ask &amp;ndash; DBA&amp;rsquo;s constantly keep up with data processing trends, and most will consider different ways to solve a problem.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure SQL Databases&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/home/features/data-management/" target="_blank"&gt;http://www.windowsazure.com/en-us/home/features/data-management/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure Reporting Services&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/services/other/sql-reporting/" target="_blank"&gt;http://www.windowsazure.com/en-us/manage/services/other/sql-reporting/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;HDInsight Service (Hadoop on Azure): &lt;/span&gt;&lt;a href="https://www.hadooponazure.com/" target="_blank"&gt;https://www.hadooponazure.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;MongoDB Offerings on Windows Azure&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/" target="_blank"&gt;http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure Marketplace&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/store/overview/" target="_blank"&gt;http://www.windowsazure.com/en-us/store/overview/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description></item><item><title>How Does the Cloud Change a  Systems Architect’s Job?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2013/01/22/how-does-the-cloud-change-a-systems-architect-s-job.aspx</link><pubDate>Tue, 22 Jan 2013 15:43:59 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47243</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;I know - I said I didn't like the "cloud" term, but my better-phrased "Distributed Systems" moniker just never took off like I had hoped. So I'll stick with the "c" word for now, at least until the search engines catch up with my more accurate term.&lt;/p&gt;
&lt;p&gt;I thought I might spend a little time on how the cloud affects the way we work - from Systems Architects to Database Administrators and Developers, and Systems Administrators - a group often referred to as "IT Pro's". But each role within these groups have different aspects when using cloud computing. In this post we'll take a look at the role of the Systems Architect, and in the posts that follow I'll talk more about the other roles in the IT Pro area.&lt;/p&gt;
&lt;h1&gt;The Systems Architect Role&lt;/h1&gt;
&lt;p&gt;What does a "Systems Architect" do? Like most IT roles, it depends on the company or organization where they work. &lt;a href="http://en.wikipedia.org/wiki/Systems_architect" target="_blank"&gt;In fact, the term isn't even specific to technology&lt;/a&gt;, but I'll use it in that context here. In general, a Systems Architect takes the requirements for a given system, and assembles the relevant technology areas that best fulfill those requirements. That's a single-sentence explanation, and needs further unpacking.&lt;/p&gt;
&lt;p&gt;As an example, a Systems Architect at a medical firm&amp;nbsp;is presented with a set of requirements for tracking a patient through the entire care cycle. The Systems Architect first looks at all of the requirements for the data that needs to be collected based on business, financial, regulations, and other requirements, and then how that data needs to flow from one system to another. They check the security requirements, performance, location and other aspects of the system. They then check to see which options are available for processing that data, and which parts they should "build or buy".&lt;/p&gt;
&lt;p&gt;For instance, the requirements might be so specific that only custom code is the proper solution - but even there, choices still exist, such as which language(s) to use, what type of data persistence (a Relational Database Management System or or other data storage and processing) will be used, what talent within the company is available for the system and a myriad of other decision.&lt;/p&gt;
&lt;p&gt;All of this boils down to three primary vectors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;&lt;strong&gt;Knowledge&lt;/strong&gt; - Which options are available to solve problems, and what are their strengths and weaknesses.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;&lt;strong&gt;Experience&lt;/strong&gt; - What has the System Architect seen and worked with in the past.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;&lt;strong&gt;Coordination&lt;/strong&gt; - A system design is based on multiple factors, and one person can't make all the choices. There will need to be others involved at every level of the solution, and the Systems Architect will need to know who those people are and how to work with them.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;How the Cloud Changes Things&lt;/h1&gt;
&lt;p&gt;From the outset, it doesn't seem that using a distributed system would change anything in the Systems Architect role. Isn't the cloud simply another option that the Systems Architect needs to learn and apply? Yes, that is true - but it goes a bit deeper. Let's return to those vectors a moment to see what a Systems Architect needs to take into account.&lt;/p&gt;
&lt;h2&gt;Knowledge&lt;/h2&gt;
&lt;p&gt;The first and probably most obvious impact is learning about cloud technologies. But the important part of that knowledge is to learn &lt;em&gt;when&lt;/em&gt; and &lt;em&gt;where&lt;/em&gt; to use each service. It's a common misconception that the cloud should be an "all or nothing" approach. That's just not true - every Windows Azure project I work on has some element of on-premises interaction, and in some cases only one small part of a solution is placed on the Windows Azure architecture. Since Windows Azure contains IaaS (VM's) PaaS (you write code, we run it)&amp;nbsp; and even SaaS (Such as Hadoop or Media Services), a given architecture can use multiple components even within just one provider. And I've worked on several projects where the customer used not only Windows Azure and On-Premises environments, but also components from other providers. That's not only acceptable, but often the best way to solve a given problem.&lt;/p&gt;
&lt;p&gt;As part of the learning experience, it's vital to keep in mind what you need to pick as key decision points. In your organization, cost could be ranked higher than performance, or perhaps security is the highest decision point.&lt;/p&gt;
&lt;p&gt;To stay educated, there are various journals, websites and conferences that Systems Architects use to keep current. Almost all of those are talking about "cloud" - but there is no substitute for learning from the vendor about their solution. I'm speaking here of the technical information, not the marketing information. The marketing information is also useful, at least from a familiarity standpoint, but the technical information is what you need.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;Resource: For Windows Azure, the Systems Architect can start here:&lt;/span&gt; &lt;a href="http://sqlblog.com/b/buckwoody/archive/2012/06/13/windows-azure-write-run-or-use-software.aspx" target="_blank"&gt;http://blogs.msdn.com/b/buckwoody/archive/2012/06/13/windows-azure-write-run-or-use-software.aspx&lt;/a&gt;&amp;nbsp; &lt;/p&gt;
&lt;h2&gt;Experience&lt;/h2&gt;
&lt;p&gt;Cloud computing is relatively new - it's only been out a few years, and the main competitors are only now settling in to their respective areas. It might not be common for a Systems Architect to have a lot of hands-on experience with cloud projects.&lt;/p&gt;
&lt;p&gt;Even so, there are ways to leverage the experience of others, such as direct contact or even attending conferences where customers present findings from their experiences.&lt;/p&gt;
&lt;p&gt;You can also gain hands-on experience by setting up pilots and proof-of-concept projects yourself. Most all vendors - Microsoft included - have free time available on their systems. The key to an experiment like this is choosing some problem you are familiar with that exercises as many features in the platform as possible. There is no substitute for working with a platform when you want to design a solution. &lt;/p&gt;
&lt;h2&gt;Coordination&lt;/h2&gt;
&lt;p&gt;Probably one of the largest changes in the Systems Architect role that the cloud brings is in the area of coordination. When a Systems Architect deals with the business and other technical professionals, there is a 20+ year history of technology that we are all familiar with. When you mention "the cloud", those audiences may not have spent the time you have in understanding what that means - and often they think it means the "all or nothing" approach I mentioned earlier.&lt;/p&gt;
&lt;p&gt;I've found that a series of "lunch and learns" for the technical staff is useful to explain to each role-group how the cloud is used in their area is useful. In the posts that follow this one, I'll give you some material for those. For managers and business professionals, you'll want to go a different route. I've found that an "Executive Briefing" e-mail, consisting of about a page, with headings that are applicable to your audience.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;Resource: Writing Executive Summaries:&lt;/span&gt; &lt;a href="http://writing.colostate.edu/guides/guide.cfm?guideid=76" target="_blank"&gt;http://writing.colostate.edu/guides/guide.cfm?guideid=76&lt;/a&gt; &lt;/p&gt;</description></item><item><title>Using Resources the Right Way</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/09/04/using-resources-the-right-way.aspx</link><pubDate>Tue, 04 Sep 2012 13:11:45 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:44995</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;It&amp;rsquo;s an interesting time in computing technology. At one point there was a dearth of information available for solving a given problem, or educating ourselves on broader topics so that we can solve problems in the future. With dozens, perhaps hundreds or thousands of web sites and content available (for free, in many cases) from vendors, peers, even colleges and universities, it seems like there is actually too much information. Who has the time to absorb all this information and training? Even if you had the inclination, where to start?&lt;/p&gt;
&lt;p&gt;&lt;a href="http://officeimg.vo.msecnd.net/en-us/images/MH900432552.jpg"&gt;&lt;img width="103" height="111" style="border:0px currentColor;vertical-align:middle;float:left;max-width:550px;" src="http://officeimg.vo.msecnd.net/en-us/images/MH900432552.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In fact, it seems so overwhelming that I often hear people saying that they can&amp;rsquo;t find the training they need, or that vendor X or Y &amp;ldquo;doesn&amp;rsquo;t help their users&amp;rdquo;. On questioning these folks, however, I often find that they &amp;ndash; and sometimes I - haven&amp;rsquo;t put in the effort to learn what resources we have.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s where blogs, like this one, can help. If you follow a blog, either by checking it often or perhaps subscribing to the Really Simple Syndication (RSS) feed, you&amp;rsquo;ll be able to spread out the search or create a mental filter for the information you need.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://officeimg.vo.msecnd.net/en-us/images/MH900441322.jpg"&gt;&lt;img width="108" height="92" style="float:right;max-width:550px;" alt="" src="http://officeimg.vo.msecnd.net/en-us/images/MH900441322.jpg" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But it&amp;rsquo;s not enough just read a blog or a web page. The creators need real feedback &amp;ndash; what doesn&amp;rsquo;t work, and what does. Yes, you&amp;rsquo;re allowed to tell a vendor or writer &amp;ldquo;This helped me because&amp;hellip;&amp;rdquo; so that you reinforce the positives. To be sure, bring up what doesn&amp;rsquo;t work as well &amp;ndash;&amp;nbsp; that&amp;rsquo;s fine. But be specific, and be constructive. You&amp;rsquo;d be surprised at how much it matters. I know for a fact at Microsoft we listen &amp;ndash; there is a real live person that reads your comments. I&amp;rsquo;m sure this is true of other vendors, and I also know that most blog authors &amp;ndash; yours truly most especially &amp;ndash; wants to know what you think.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://officeimg.vo.msecnd.net/en-us/images/MH900441322.jpg"&gt;&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;In this blog entry I&amp;rsquo;d to call your attention to three resources you have at your disposal, and how you can use them to help. I&amp;rsquo;ll try to bring up things like this from time to time that I find useful, and cover in them in more depth like this. Think of this as a synopsis of a longer set of resources that you can use to filter whether you want to research further, bookmark, or forward on to a circle of friends where you think it might help them.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h1&gt;Data Driven Design Concepts&lt;/h1&gt;
&lt;p&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/jj156154" target="_blank"&gt;http://msdn.microsoft.com/en-us/library/windowsazure/jj156154&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll start with a great site that walks you through the process of designing a solution from a data-first perspective. As you know, I believe all computing is merely re-arranging data. If you follow that logic as well, you&amp;rsquo;ll realize&lt;/p&gt;
&lt;p&gt;&lt;a href="http://officeimg.vo.msecnd.net/en-us/images/MH900432681.jpg"&gt;&lt;img width="98" height="103" style="float:right;max-width:550px;" alt="" src="http://officeimg.vo.msecnd.net/en-us/images/MH900432681.jpg" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;that whenever you create a solution, you should start at the data-end of the application. This resource helps you do that. Even if you don&amp;rsquo;t use the specific technologies the instructions use, the concepts hold for almost any other technology that deals with data. This should be a definite bookmark for a developer, DBA, or Data Architect.&lt;/p&gt;
&lt;p&gt;When I mentioned my admiration for this resource here at Microsoft, the team that created it contacted me and asked if I&amp;rsquo;d share an e-mail address to my readers so that you can comment on it. You&amp;rsquo;re guaranteed to be heard &amp;ndash; you can suggest changes, talk about how useful &amp;ndash; or not &amp;ndash; it is, and so on. Here&amp;rsquo;s that address:&amp;nbsp; &lt;a href="mailto:azuremigration@microsoft.com"&gt;azuremigration@microsoft.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://officeimg.vo.msecnd.net/en-us/images/MH900432681.jpg"&gt;&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h1&gt;End-to-End Example of a complete Hybrid Application &amp;ndash; with Live Demo&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://azurestocktrader.cloudapp.net/Default.aspx" target="_blank"&gt;https://azurestocktrader.cloudapp.net/Default.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I learn by example. I also like having ready-made, live, functional demos that show the completed solution at work. If you&amp;rsquo;ve ever wanted to learn how a complex, complete, hybrid application that bridges on-premises systems with cloud-based databases, code, functions and more, this is it. It&amp;rsquo;s a stock-trading simulator, and you can get everything from the design to the code itself, or you can just play with the application. It&amp;rsquo;s running on Windows Azure, the actual production servers we use for everything else.&lt;/p&gt;
&lt;h1&gt;Using a Cloud-Based Service&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://azureconfigweb.cloudapp.net/Default.aspx" target="_blank"&gt;https://azureconfigweb.cloudapp.net/Default.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Along with that stock-trading application, you have a full demonstration and usable code sample of a web-based service available. If you&amp;rsquo;re a developer, this is a style of code you need to understand for everything from iPhone development to a full Service-Oriented Architecture (SOA) environment.&lt;/p&gt;
&lt;p&gt;So check out these resources. I&amp;rsquo;ll post more from time to time as I run across them. Hopefully they&amp;rsquo;ll be as useful to you as they are to me. Oh, and if you have a comment on any of the resources, let them know. And if you have any comments about these or any of my entries, feel free to post away.&lt;/p&gt;
&lt;p&gt;To quote a famous TV Show: &amp;ldquo;Hello Seattle &amp;ndash; I&amp;rsquo;m listening&amp;hellip;&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://officeimg.vo.msecnd.net/en-us/images/MH900187459.jpg"&gt;&lt;img style="max-width:550px;" alt="" src="http://officeimg.vo.msecnd.net/en-us/images/MH900187459.jpg" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Creating a Corporate Data Hub</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/06/26/creating-a-corporate-data-hub.aspx</link><pubDate>Tue, 26 Jun 2012 14:36:41 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:44090</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;The Windows Azure Marketplace has a rich assortment of data and software offerings for you to use – a type of Software as a Service (SaaS) for IT workers, not necessarily for end-users. Among those offerings is the “Data Hub” – a&amp;#160; codename for a project that ironically actually does what the codename says. &lt;/p&gt;  &lt;p&gt;In many of our organizations, we have multiple data quality issues. Finding data is one problem, but finding it just once is often a bigger problem. Lots of departments and even individuals have stored the same data more than once, and in some cases, made changes to one of the copies. It’s difficult to know which location or version of the data is authoritative.&lt;/p&gt;  &lt;p&gt;Then there’s the problem of accessing the data. It’s fairly straightforward to publish a database, share or other location internally to store the data. But then you have to figure out who owns it, how it is controlled, and pass out the various connection strings to those who want to use it. And then you need to figure out how to let folks access the internal data externally – bringing up all kinds of security issues. Finally, in many cases our user community wants us to combine data from the internally sources with external data, bringing up the security, strings, and exploration features up all over again.&lt;/p&gt;  &lt;p&gt;Enter the Data Hub. This is an online offering, where you assign an administrator and data stewards. You import the data into the service, and it’s available to you - and only you and your organization if you wish. &lt;/p&gt;  &lt;p&gt;&lt;img src="http://www.microsoft.com/global/en-us/sqlazurelabs/PublishingImages/datahub-image3-large.jpg" width="447" height="376" /&gt;&lt;/p&gt;  &lt;p&gt;The basic steps for this service are to set up the portal for your company, assign administrators and permissions, and then you assign data areas and import data into them. From there you make them discoverable, and then you have multiple options that you or your users can access that data. You’re then able, if you wish, to combine that data with other data in one location. &lt;/p&gt;  &lt;p&gt;So how does all that work? What about security? Is it really that easy? And can you really move the data definition off to the Subject Matter Experts (SME’s) that know the particular data stack better than the IT team does?&lt;/p&gt;  &lt;p&gt;Well, nothing good is easy – but using the Data Hub is actually pretty simple. I’ll give you a link in a moment where you can sign up and try this yourself. Once you sign up, you assign an administrator. From there you’ll create data areas, and then use a simple interface to bring the data in. All of this is done in a portal interface – nothing to install, configure, update or manage. &lt;/p&gt;  &lt;p&gt;After the data is entered in, and you’ve assigned meta-data to describe it, your users have multiple options to access it. They can simply use the portal – which actually has powerful visualizations you can use on any platform, even mobile phones or tablets. &lt;/p&gt;  &lt;p&gt;&lt;img src="http://i.msdn.microsoft.com/dynimg/IC498608.gif" width="459" height="213" /&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: &lt;a title="http://msdn.microsoft.com/en-us/library/hh921924" href="http://msdn.microsoft.com/en-us/library/hh921924"&gt;http://msdn.microsoft.com/en-us/library/hh921924&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;You can make HTTP calls instead of code, and the data can even be exposed as an OData Feed. As you can see, there are a lot of options. &lt;/p&gt;  &lt;p&gt;You can check out the offering here: &lt;a title="http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx" href="http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx"&gt;http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx&lt;/a&gt; and you can read the documentation here: &lt;a title="http://msdn.microsoft.com/en-us/library/hh921938" href="http://msdn.microsoft.com/en-us/library/hh921938"&gt;http://msdn.microsoft.com/en-us/library/hh921938&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Big Data - A Microsoft Tools Approach</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/02/20/big-data-a-microsoft-tools-approach.aspx</link><pubDate>Mon, 20 Feb 2012 21:16:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:41832</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;&lt;em&gt;&lt;span style="color:#c0504d;"&gt;(As with all of these types of posts, check the date of the latest update I&amp;rsquo;ve made here. Anything older than 6 months is probably out of date, given the speed with which we release new features into Windows and SQL Azure)&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t normally like to discuss things in terms of tools. I find that whenever you start with a given tool (or even a tool stack) it&amp;rsquo;s too easy to fit the problem to the tool(s), rather than the other way around as it should be.&lt;/p&gt;
&lt;p&gt;That being said, it&amp;rsquo;s often useful to have an example to work through to better understand a concept. But like many ideas in Computer Science, &amp;ldquo;Big Data&amp;rdquo; is too broad a term in use to show a single example that brings out the multiple processes, use-cases and patterns you can use it for.&lt;/p&gt;
&lt;p&gt;So we turn to a description of the tools you can use to analyze large data sets. &amp;ldquo;Big Data&amp;rdquo; is a term used lately to describe data sets that have the &amp;ldquo;&lt;a href="http://radar.oreilly.com/2012/01/what-is-big-data.html" target="_blank"&gt;Four V&amp;rsquo;s&lt;/a&gt;&amp;rdquo;&amp;nbsp; as a characteristic, but I have a simpler definition I like to use:&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;&lt;span style="color:#0000ff;font-size:small;"&gt;Big Data involves a data set too large to process in a reasonable period of time&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I realize that&amp;rsquo;s a bit broad, but in my mind it answers the question and is fairly future-proof. The general idea is that you want to analyze some data, and using whatever current methods, storage, compute and so on that you have at hand it doesn&amp;rsquo;t allow you to finish processing it in a time period that you are comfortable with. I&amp;rsquo;ll explain some new tools you can use for this processing.&lt;/p&gt;
&lt;p&gt;Yes, this post is Microsoft-centric. There are probably posts from other vendors and open-source that cover this process in the way they best see fit. And of course you can always &amp;ldquo;mix and match&amp;rdquo;, meaning using Microsoft for one or more parts of the process and other vendors or open-source for another. I never advise that you use any one vendor blindly - educate yourself, examine the facts, perform some tests and choose whatever mix of technologies best solves your problem.&lt;/p&gt;
&lt;p&gt;At the risk of being vendor-specific, and probably incomplete, I use the following short list of tools Microsoft has for working with &amp;ldquo;Big Data&amp;rdquo;. There is no single package that performs all phases of analysis. These tools are what I use; they should not be taken as a Microsoft authoritative testament to the toolset we&amp;rsquo;ll finalize for a given problem-space. In fact, that&amp;rsquo;s the key: find the problem and then fit the tools to that.&lt;/p&gt;
&lt;h2&gt;Process Types&lt;/h2&gt;
&lt;p&gt;I break up the analysis of the data into two process types. The first is examining and processing the data &lt;em&gt;in-line&lt;/em&gt;, meaning as the data passes through some process. The second is a &lt;em&gt;store-analyze-present&lt;/em&gt; process.&lt;/p&gt;
&lt;h2&gt;Processing Data In-Line&lt;/h2&gt;
&lt;p&gt;Processing data in-line means that the data doesn&amp;rsquo;t have a destination - it remains in the source system. But as it moves from an input or is routed to storage within the source system, various methods are available to examine the data as it passes, and either trigger some action or create some analysis.&lt;/p&gt;
&lt;p&gt;You might not think of this as &amp;ldquo;Big Data&amp;rdquo;, but in fact it can be. Organizations have huge amounts of data stored in multiple systems. Many times the data from these systems do not end up in a database for evaluation. There are options, however, to evaluate that data real-time and either act on the data or perhaps copy or stream it to another process for evaluation.&lt;/p&gt;
&lt;p&gt;The advantage of an in-stream data analysis is that you don&amp;rsquo;t necessarily have to store the data again to work with it. That&amp;rsquo;s also a disadvantage - depending on how you architect the solution, you might not retain a historical record. One method of dealing with this requirement is to trigger a rollup collection or a more detailed collection based on the event.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;StreamInsight &lt;/strong&gt;- StreamInsight is Microsoft&amp;rsquo;s &amp;ldquo;Complex Event Processing&amp;rdquo; or CEP engine. This product, hooked into SQL Server 2008R2, has multiple ways of interacting with a data flow. You can create adapters to talk with systems, and then examine the data mid-stream and create triggers to do something with it. You can read more about StreamInsight here: &lt;a title="http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx" href="http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BizTalk &lt;/strong&gt;- When there is more latency available between the initiation of the data and its processing, you can use Microsoft BizTalk. This is a message-passing and Service Bus oriented tool, and it can also be used to join system&amp;rsquo;s data together than normally does not have a direct link, for instance a Mainframe system to SQL Server. You can learn more about BizTalk here: &lt;a href="http://www.microsoft.com/biztalk/en/us/overview.aspx"&gt;http://www.microsoft.com/biztalk/en/us/overview.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;.NET and the Windows Azure Service Bus &lt;/strong&gt;- Along the same lines as BizTalk but with a more programming-oriented design are the Windows and Windows Azure Service Bus tools. The Service Bus allows you to pass messages as well, and opens up web interactions and even inter-company routing. BizTalk can do this as well, but the Service Bus tools use an API approach for designing the flow and interfaces you want. The Service Bus offerings are also intended as near real-time, not as a streaming interface. You can learn more about the Windows Azure Service Bus here: &lt;a href="http://www.windowsazure.com/en-us/home/tour/service-bus/"&gt;http://www.windowsazure.com/en-us/home/tour/service-bus/&lt;/a&gt; and more about the Event Processing side here: &lt;a href="http://msdn.microsoft.com/en-us/magazine/dd569756.aspx"&gt;http://msdn.microsoft.com/en-us/magazine/dd569756.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2&gt;Store-Analyze-Present&lt;/h2&gt;
&lt;p&gt;A more traditional approach with an organization&amp;rsquo;s data is to store the data and analyze it out-of-band. This began with simply running code over a data store, but as locking and blocking became an issue on a file system, Relational Database Management Systems (RDBMs) were created. Over time a distinction was made between data used in an online processing system, meant to be highly available for writing data (OLTP) and systems designed for analytical and reporting purposes (OLAP).&lt;/p&gt;
&lt;p&gt;Later the data grew larger than these systems were designed for, primarily due to consistency requirements. In analysis, however, consistency isn&amp;rsquo;t always a requirement, and so file-based systems for that analysis were re-introduced from the Mainframe concepts, with new technology layered in for speed and size.&lt;/p&gt;
&lt;p&gt;I normally break up the process of analyzing large data sets into four phases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Source and Transfer &lt;/em&gt;- Obtaining the data at its source and transferring or loading it into the storage; optionally transforming it along the way&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Store and Process&lt;/em&gt; - Data is stored on some sort of persistence, and in some cases an engine handles the acquisition and placement on persistent storage, as well as retrieval through an interface.&lt;/li&gt;
&lt;li&gt;&amp;nbsp;&lt;em&gt;Analysis &lt;/em&gt;- A new layer introduced with &amp;ldquo;Big Data&amp;rdquo; is a separate analysis step. This is dependent on the engine or storage methodology, is often programming language or script based, and sometimes re-introduces the analysis back into the data. Some engines and processes combine this function into the previous phase.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Presentation&lt;/em&gt; - In most cases, the data wants a graphical representation to comprehend, especially in a series or trend analysis. In other cases a simple symbolic representation, similar to the &amp;ldquo;dashboard&amp;rdquo; elements in a Business Intelligence suite. Presentation tools may also have an analysis or refinement capability to allow end-users to work with the data sets. As in the Analysis phase, some methodologies bundle in the Analysis and Presentation phases into one toolset.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Source and Transfer&lt;/h3&gt;
&lt;p&gt;You&amp;rsquo;ll notice in this area, along with those that follow, Microsoft is adopting not only its own technologies but those within open-source. This is a positive sign, and means that you will have a best-of-breed, supported set of tools to move the data from one location to another. Traditional file-copy, File Transfer Protocol and more are certainly options, but do not normally deal with moving datasets.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve already mentioned the ability of a streaming tool to push data into a store-analyze-present model, so I&amp;rsquo;ll follow up that discussion with the tools that can extract data from one source and place it in another.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Server Integration Services (SSIS)/SQL Server Bulk Copy Program (BCP)&lt;/span&gt; &lt;/strong&gt;- SSIS is a SQL Server tool used to move data from one location to another, and optionally perform transform or other processes as it does so. You are not limited to working with SQL Server data - in fact, almost any modern source of data from text to various database platforms is available to move to various systems. It is also extremely fast and has a rich development environment. You can learn more about SSIS here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms141026.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms141026.aspx&lt;/a&gt; BCP is a tool that has been used with SQL Server data since the first releases; it has multiple sources and destinations as well. It is a command-line utility,and has some limited transform capabilities. You can learn more about BCP here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms162802.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms162802.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#0000ff;"&gt;&lt;span style="color:#800000;"&gt;Sqoop&lt;/span&gt; &lt;/span&gt;&lt;/strong&gt;- Tied to Microsoft&amp;rsquo;s latest announcements with Hadoop on Windows and Windows Azure, Sqoop is a tool that is used to move data between SQL Server 2008R2 (and higher)&amp;nbsp;and Hadoop, quickly and efficiently. You can read more about that in the Readme file here: &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584"&gt;http://www.microsoft.com/download/en/details.aspx?id=27584&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Application Programming Interfaces&lt;/strong&gt;&lt;/span&gt; - API&amp;rsquo;s exist in most every major language that can connect to one data source, access data, optionally transforming it and storing it in another system. Most every dialect of&amp;nbsp; the .NET-based languages contain methods to perform this task.&lt;/p&gt;
&lt;h3&gt;Store and Process&lt;/h3&gt;
&lt;p&gt;Data at rest is normally used for historical analysis. In some cases this analysis is performed near real-time, and in others historical data is analyzed periodically. Systems that handle data at rest range from simple storage to active management engines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Server&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s flagship RDBMS can indeed store massive amounts of complex data. I am familiar with a two systems in excess of 300 Terabytes of federated data, and the &lt;a href="http://pan-starrs.ifa.hawaii.edu/public/" target="_blank"&gt;Pan-Starrs&lt;/a&gt; project is designed to handle 1+ Petabyte of data. The theoretical limit of SQL Server DataCenter edition is 540 Petabytes. SQL Server is an engine, so the data access and storage is handled in an abstract layer that also handles concurrency for ACID properties. You can learn more about SQL Server here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Azure Federations&lt;/span&gt;&lt;/strong&gt; - SQL Azure is a database service from Microsoft associated with the Windows Azure platform. Database Servers are multi-tenant, but are shared across a &amp;ldquo;fabric&amp;rdquo; that moves active databases for redundancy and performance. Copies of all databases are kept triple-redundant with a consistent commitment model. Databases are (at this writing - check &lt;a href="http://WindowsAzure.com"&gt;http://WindowsAzure.com&lt;/a&gt; for the latest) capped at a 150 GB size limit per database. However, Microsoft released a &amp;ldquo;Federation&amp;rdquo; technology, allowing you to query a head node and have the data federated out to multiple databases. This improves both size and performance. You can read more about SQL Azure Federations here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Analysis Services&lt;/span&gt;&lt;/strong&gt; - The Business Intelligence engine within SQL Server, called Analysis Services, can also handle extremely large data systems. In addition to traditional BI data store layouts (ROLAP, MOLAP and HOLAP), the latest version of SQL Server introduces the Vertipaq column-storage technology allowing more direct access to data and a different level of compression. You can read more about Analysis Services here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx&lt;/a&gt; and more about Vertipaq here: &lt;a href="http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Parallel Data Warehouse &lt;/strong&gt;&lt;/span&gt;- The Parallel Data Warehouse (PDW) offering from Microsoft is largely described by the title. Accessed in multiple ways including using Transact-SQL (the Microsoft dialect of the Structured Query Language), &lt;a href="http://sqlpdw.com/2010/07/what-mpp-means-to-sql-server-parallel-data-warehouse/" target="_blank"&gt;This is an MPP appliance&lt;/a&gt;&amp;nbsp;scaling in parallel to extremely large datasets. It is a hardware and software offering - you can learn more about it here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HPC Server&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s High-Performance Computing version of Windows Server deals not only with large data sets, but with extremely complicated computing requirements. A scale-out architecture and inter-operation with Linux systems, as well as dozens of applications pre-written to work with this server make this a capable &amp;ldquo;Big Data&amp;rdquo; system. It is a mature offering, with a long track record of success in scientific, financial and other areas of data processing. It is available both on premises and in Windows Azure, and also in a hybrid of both models, allowing you to &amp;ldquo;rent&amp;rdquo; a super-computer when needed. You can read more about it here: &lt;a href="http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx"&gt;http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Hadoop&lt;/span&gt;&lt;/strong&gt; - Pairing up with Hortonworks, Microsoft has released the Hadoop Open-Source system -&amp;nbsp; including HDFS and a Map/Reduce standardized software, Hive and Pig - on Windows and the Windows Azure platform. This is not a customized version; off-the-shelf concepts and queries work well here. You can read more about Hadoop here: &lt;a href="http://hadoop.apache.org/common/docs/current/"&gt;http://hadoop.apache.org/common/docs/current/&lt;/a&gt; and you can read more about Microsoft&amp;rsquo;s offerings here: &lt;a href="http://hortonworks.com/partners/microsoft/"&gt;http://hortonworks.com/partners/microsoft/&lt;/a&gt;&amp;nbsp;and here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Windows and Azure Storage&lt;/span&gt;&lt;/strong&gt; - Although not an engine - other than a triple-redundant, immediately consistent commit - Windows Azure can hold terabytes of information and make it available to everything from the R programming language to the Hadoop offering. Binary storage (Blobs) and Table storage (Key-Value Pair) data can be queried across a distributed environment. You can learn more about Windows Azure storage here: &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx"&gt;http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;Analysis&lt;/h3&gt;
&lt;p&gt;In a &amp;ldquo;Big Data&amp;rdquo; environment, it&amp;rsquo;s not unusual to have a specialized set of tasks for analyzing and even interpreting the data. This is a new field called &amp;ldquo;data Science&amp;rdquo;, with a requirement not only for computing, but also a heavy emphasis on math.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Transact-SQL &lt;/strong&gt;&lt;/span&gt;- T-SQL is the dialect of the Structured Query Language used by Microsoft. It includes not only robust selection, updating and manipulating of data, but also analytical and domain-level interrogation as well. It can be used on SQL Server, PDW and ODBC data sources. You can read more about T-SQL here: &lt;a href="http://msdn.microsoft.com/en-us/library/bb510741.aspx"&gt;http://msdn.microsoft.com/en-us/library/bb510741.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Multidimensional Expressions and Data Analysis Expressions&lt;/span&gt;&lt;/strong&gt; - The MDX and DAX languages allow you to query multidimensional data models that do not fit well with typical two-plane query languages. Pivots, aggregations and more are available within these constructs to query and work with data in Analysis Services. You can read more about MDX here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx&lt;/a&gt; and more about DAX here: &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=28572"&gt;http://www.microsoft.com/download/en/details.aspx?id=28572&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HPC Jobs and Tasks &lt;/span&gt;&lt;/strong&gt;- Work submitted to the Windows HPC Server has a particular job - essentially a reservation request for resources. Within a job you can submit tasks, such as parametric sweeps and more. You can learn more about Jobs and Tasks here: &lt;a href="http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx"&gt;http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HiveQL &lt;/span&gt;&lt;/strong&gt;- HiveQL is the language used to query a Hive object running on Hadoop. You can see a tutorial on that process here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Piglatin &lt;/span&gt;&lt;/strong&gt;- Piglatin is the submission language for the Pig implementation on Hadoop. An example of that process is here: &lt;a href="http://sqlblog.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx"&gt;http://blogs.msdn.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Application Programming Interfaces &lt;/span&gt;&lt;/strong&gt;- Almost all of the analysis offerings have associated API&amp;rsquo;s - of special note is Microsoft Research&amp;rsquo;s Infer.NET, a new language construct for framework for running Bayesian inference in graphical models, as well as probabilistic programming. You can read more about Infer.NET here: &lt;a href="http://research.microsoft.com/en-us/um/cambridge/projects/infernet/"&gt;http://research.microsoft.com/en-us/um/cambridge/projects/infernet/&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;Presentation&lt;/h3&gt;
&lt;p&gt;Lots of tools work in presenting the data once you have done the primary analysis. In fact, there&amp;rsquo;s a great video of a comparison of various tools here: &lt;a href="http://msbiacademy.com/Lesson.aspx?id=73"&gt;http://msbiacademy.com/Lesson.aspx?id=73&lt;/a&gt; Primarily focused on Business Intelligence. That term itself is now not as completely defined, but the tools I&amp;rsquo;ll show below can be used in multiple ways - not just traditional Business Intelligence scenarios. Application Programming Interfaces (API&amp;rsquo;s) can also be used for presentation; but I&amp;rsquo;ll focus here on &amp;ldquo;out of the box&amp;rdquo; tools.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Excel&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s Excel can be used not only for single-desk analysis of data sets, but with larger datasets as well. It has interfaces into SQL Server, Analysis Services, can be connected to the PDW, and is a first-class job submission system for the Windows HPC Server. You can watch a video about Excel and big data here: &lt;a href="http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f"&gt;http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f&lt;/a&gt;&amp;nbsp;and you can also connect Excel to Hadoop: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Reporting Services&lt;/span&gt;&lt;/strong&gt; - Reporting Services is a SQL Server tool that can query and show data from multiple sources, all at once. It can also be used with Analysis Services. You can read more about Reporting Services here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Power View&lt;/span&gt;&lt;/strong&gt; - Power View is a &amp;ldquo;Self-Service&amp;rdquo; Business Intelligence reporting tool, which can work with on-premises data in addition to SQL Azure and other data. You can read more about it and see videos of Power View in action here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SharePoint Services -&lt;/span&gt;&lt;/strong&gt; Microsoft has rolled several capable tools in SharePoint as &amp;ldquo;Services&amp;rdquo;. This has the advantage of being able to integrate into the working environment of many companies. You can read more about&amp;nbsp; lots of these reporting and analytic presentation tools here: &lt;a href="http://technet.microsoft.com/en-us/sharepoint/ee692578"&gt;http://technet.microsoft.com/en-us/sharepoint/ee692578&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;This is by no means an exhaustive list - more capabilities are added all the time to Microsoft&amp;rsquo;s products, and things will surely shift and merge as time goes on. Expect today&amp;rsquo;s &amp;ldquo;Big Data&amp;rdquo; to be tomorrow&amp;rsquo;s &amp;ldquo;Laptop Environment&amp;rdquo;.&lt;/p&gt;</description></item><item><title>Cloud Computing Patterns: Using Data Transaction Commitment Models for Design</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/02/14/cloud-computing-patterns-using-data-transaction-commitment-models-for-design.aspx</link><pubDate>Tue, 14 Feb 2012 20:45:47 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:41744</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;There are multiple ways to store data in a cloud provider, specifically around Windows and SQL Azure. As part of a &amp;ldquo;Data First&amp;rdquo; architecture design, one decision vector &amp;ndash; assuming you&amp;rsquo;ve already done a data classification of the elements you want to store &amp;ndash; is to decide the transaction level you need for that datum.&amp;nbsp; Once you&amp;rsquo;ve decided on what level of transactional commitment you need, you can make intelligent decisions about the storage engine, method of access and storage, speed and other requirements.&lt;/p&gt;
&lt;p&gt;Although the list below is neither original nor exhaustive, these are the general considerations I use for a given data set. It&amp;rsquo;s important to note that in many on premises systems the engine choice at hand overrides these concerns. If you have a large Relational Database Management System (RDBMS) for instance, you might simply place all data there without further consideration. In a Platform as a Service (PaaS) like Windows and SQL Azure, however, selection of the proper engine for a particular dataset has implications ranging from cost to performance, and selecting the right engine is critical when you want to leverage the data across &amp;ldquo;Bid Data&amp;rdquo; analysis like Hadoop or other constructs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Monolithic Consistent Transactional&lt;/strong&gt;&lt;br /&gt;The first selection is analogous to a local RDBMS system. The dataset is retrieved in a functionally single, monolithic transaction, i.e. kept together with ACID properties in mind. This is the most reliable type of data design for datasets that require a high degree of safety in the read/write pattern. As an example, a bank ATM transaction should be modeled in a monolithic way. If I make a transfer of funds from one account to another, I want the money to be subtracted from one account if and only if it is successfully added to the other. The bank, on the other hand, wants the money added to the second account if and only if it is subtracted from the first. This is a prime example of a monolithic (single atomic transaction), Consistent (if and only if) and Transactional (as a unit, with provision for roll-back and reporting if unsuccessful) data requirement.&lt;/p&gt;
&lt;p&gt;The primary engine used for this type of data is often SQL Azure &amp;ndash; an RDMBS in the same datacenters as Windows Azure. Placing both the calling application, whether that is a Data Access Layer-based code widget or a direct call from a Web or Worker Role, means that data is retrieved quickly and in a monolithic way. The costs for this method is based on overall database size.&amp;nbsp; A consideration is how much data you can store this way. Database sizes have limits, although there are ways of overcoming size issues using technologies such as Sharding or SQL Azure Federations. There is also the consideration of performance. In an RDBMs that conforms to ACID properties, locking and other overhead for safety is at conflict with the highest possible read performance.&amp;nbsp; But in some cases the ACID properties are worth the cost, as in the banking example.&lt;/p&gt;
&lt;p&gt;You are not limited to SQL Azure in this model. Windows Azure Table storage, while similar to NoSQL offerings is different in that it is immediately consistent across all three replicated copies of data, offering a higher degree of safety. And while Table storage does not offer built-in support for transactions, there are ways to achieve certain transaction levels.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Monolithic Realtime&lt;/strong&gt;&lt;br /&gt;If consistency can be relaxed &amp;ndash; meaning that a guaranteed read/write patter is not essential &amp;ndash; then more options arise in Windows and SQL Azure. You can still use SQL Azure for this type of storage, with either automatic or programmatic hints allowing for &amp;ldquo;dirty reads&amp;rdquo;. Windows Azure Table storage is still consistent, but the selection of the method for querying the data such as separate copies of read and write data can be employed. Because of the relaxed transaction nature, higher speeds are possible by querying cached or separate datasets.&lt;/p&gt;
&lt;p&gt;An example here is that same transaction from the bank, but a statement inquiry. Just after the money is deposited, the user wishes to query the current balance. The current balance &amp;ndash; minus the transaction that just occurred &amp;ndash; is retrieved and shown to the user, perhaps even combining the amount with the latest transaction, perhaps saved as a local cached object, with a caveat to the user.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Distributed Realtime&lt;/strong&gt;&lt;br /&gt;At some point, the data becomes too large to fit inside a single processing session, and parallelism is used. In this case, either separate databases in SQL Azure or Windows Azure Tables, local data storage on the Web or Worker Role, or a combination of all with Caches is the right approach for the data design.&lt;/p&gt;
&lt;p&gt;The biggest implication in this type of system is speed &amp;ndash; a higher degree of data separation is essential, and so the dataset selection must fit the pattern. It is unacceptable to force an ACID-properties type workload into this environment. Typical examples here are the actual data asset payload for streaming video or music, read-only documents and so on. This pattern is often separated from the meta-data, which is kept in more of a transactional model.&lt;/p&gt;
&lt;p&gt;As an example, assume you log on to a website to watch a movie or listen to music. The provider needs to verify your identity and account balance, which are transactional data loads. After that process is complete, the workload shifts to a copy &amp;ndash; perhaps one of several &amp;ndash; of the asset to stream to your location.&lt;/p&gt;
&lt;p&gt;In this case, Windows Azure Blob storage, along with the Content Delivery Network (CDN &amp;ndash; a series of servers closer to the user) is employed along with the transactional realtime requirements for the metadata.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Distributed Eventual&lt;/strong&gt;&lt;br /&gt;At the furthest end of the data scale are large datasets that need deeper analysis, but not necessarily in realtime. Examples here are terrabytes of data requiring a Business Intelligence view, but with a tolerance of a few seconds to minutes or hours. In this case, Storage, Processing and Query methods, such as the Hadoop offering in Windows Azure, or perhaps the High Performance Computing (HPC) Windows Server in Windows Azure fit well.&amp;nbsp; Here, the design of the data is often dictated by the source, and more emphasis is placed on the algorithms around processing and re-assembling the data.&lt;/p&gt;
&lt;p&gt;There are, of course, other patterns. In many cases a single dataset may have needs in one or more of these categories &amp;ndash; in fact, sitting at 30,000 feet typing this entry, I&amp;rsquo;m having that very design discussion with a gentleman sitting next to me. The key is to design data-first, and fit the technology to the requirement for each datum. Allow each function and engine to handle the data in the most efficient, effective way for cost, performance and utility.&lt;/p&gt;</description></item><item><title>How Microsoft helps you NOT break your Windows Azure Application: Storage Services Versioning</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/12/06/how-microsoft-helps-you-not-break-your-windows-azure-application-storage-services-versioning.aspx</link><pubDate>Tue, 06 Dec 2011 14:42:57 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:40161</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;&lt;font size="2"&gt;One of the advantages of using Windows Azure to run your code is that you don’t have to constantly manage upgrades on your platform. While that’s a big advantage indeed, it immediately brings up the question - how do the upgrades happen? Microsoft upgrades the Azure platform in periodic increments, and the components that are affected are documented. &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;This brings up another question - upgrades mean change, and change can sometimes alter the way you might implement a feature. What if you have taken a dependency on some feature in your code that has been altered by an upgrade? Windows Azure does have an Application Lifecycle Management (ALM) Process, which I’ll reference at the end of this post. But beyond that, there are some features we’ve put into place that will help you manage many of these changes. One of those is being able to set the version of storage features you would like your code to use. &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;Windows Azure is made up of three main component areas: Computing, Storage and a group of features called the Application Fabric. You can use these components together or separately, depending on what you would like your application to do. In this post I’ll deal with the version control in the storage subsystem - in other posts I’ll explain how to track and in some cases control the versions of the other components you work with.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;When you send a request to a Windows Azure resource, you’re actually using a &lt;a href="http://en.wikipedia.org/wiki/REST" target="_blank"&gt;REST&lt;/a&gt; call. That’s a three-part call to the system that has a request (called a URI), a header, and a body of code you want to send. So a typical call, such as to a table, might look like this example, which changes the properties of a Blob: &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;strong&gt;URI&lt;/strong&gt;:       &lt;br /&gt;&lt;font color="#0000ff"&gt;PUT http://myaccount.table.core.windows.net/?restype=service&amp;amp;comp=properties HTTP/1.1&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;strong&gt;Header&lt;/strong&gt;:       &lt;br /&gt;&lt;font color="#0000ff"&gt;&lt;font style="background-color:#ffff00;"&gt;x-ms-version: 2011-08-18&lt;/font&gt;        &lt;br /&gt;x-ms-date: Tue, 30 Aug 2011 04:28:19 GMT        &lt;br /&gt;Authorization: SharedKey        &lt;br /&gt;myaccount:Z1lTLDwtq5o1UYQluucdsXk6/iB7YxEu0m6VofAEkUE=        &lt;br /&gt;Host: myaccount.table.core.windows.net&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;strong&gt;Body&lt;/strong&gt;:      &lt;br /&gt;&lt;font color="#0000ff"&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;utf-8&amp;quot;?&amp;gt;       &lt;br /&gt;&amp;lt;StorageServiceProperties&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Logging&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Version&amp;gt;1.0&amp;lt;/Version&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Delete&amp;gt;true&amp;lt;/Delete&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Read&amp;gt;false&amp;lt;/Read&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Write&amp;gt;true&amp;lt;/Write&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;RetentionPolicy&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Enabled&amp;gt;true&amp;lt;/Enabled&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Days&amp;gt;7&amp;lt;/Days&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;/RetentionPolicy&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &amp;lt;/Logging&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Metrics&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Version&amp;gt;1.0&amp;lt;/Version&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Enabled&amp;gt;true&amp;lt;/Enabled&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;IncludeAPIs&amp;gt;false&amp;lt;/IncludeAPIs&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;RetentionPolicy&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Enabled&amp;gt;true&amp;lt;/Enabled&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;Days&amp;gt;7&amp;lt;/Days&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; &amp;lt;/RetentionPolicy&amp;gt;        &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; &amp;lt;/Metrics&amp;gt;        &lt;br /&gt;&amp;lt;/StorageServiceProperties&amp;gt;        &lt;br /&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;em&gt;(&lt;/em&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh452240.aspx" target="_blank"&gt;&lt;em&gt;Source&lt;/em&gt;&lt;/a&gt;&lt;em&gt; of this code)&lt;/em&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;You can see that I’ve highlighted a portion of the header block - that’s where you set the version of the Storage Services you would like to use. You can find a list of the &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/dd894041.aspx" target="_blank"&gt;features introduced in each version here&lt;/a&gt;. &lt;/font&gt;&lt;font size="2"&gt;It’s not a requirement of adding that element to the header, but it’s best practices to do so. &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;You don’t have to use REST calls directly, however. It’s more common to use the API in the Software Development Kit to just change the property in your IDE environment - the setting you’re looking for there is the &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh343266.aspx"&gt;Set Storage Service Properties&lt;/a&gt; call. &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;Interestingly, rather than a breaking change you might run into an unexpected behavior if you are not aware of these parameters. In some code I recently reviewed a newer feature from the storage system failed when it was called. On inspection I found that the developer had used an older codeblock from a previous version of the storage system - he was not aware you can set the version of storage in the call. We changed the header to the latest version, and everything worked as expected. &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;The Storage Services Versioning and the changes for each version: &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;span style="font-family:'Calibri','sans-serif';font-size:11pt;mso-fareast-font-family:'Times New Roman';mso-bidi-font-family:'Times New Roman';mso-ansi-language:en-us;mso-fareast-language:en-us;mso-bidi-language:ar-sa;"&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/dd894041.aspx"&gt;&lt;u&gt;&lt;font color="#4f81bd" size="2" face="Arial"&gt;http://msdn.microsoft.com/en-us/library/windowsazure/dd894041.aspx&lt;/font&gt;&lt;/u&gt;&lt;/a&gt;&lt;font color="#000000" face="Arial"&gt; &lt;/font&gt;&lt;/span&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family:'Calibri','sans-serif';font-size:11pt;mso-fareast-font-family:'Times New Roman';mso-bidi-font-family:'Times New Roman';mso-ansi-language:en-us;mso-fareast-language:en-us;mso-bidi-language:ar-sa;"&gt;&lt;font color="#000000" size="2" face="Arial"&gt;Windows Azure Application Lifecycle Management: &lt;/font&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family:'Calibri','sans-serif';font-size:11pt;mso-fareast-font-family:'Times New Roman';mso-bidi-font-family:'Times New Roman';mso-ansi-language:en-us;mso-fareast-language:en-us;mso-bidi-language:ar-sa;"&gt;&lt;font color="#000000" size="2" face="Arial"&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/ff803362.aspx"&gt;http://msdn.microsoft.com/en-us/library/ff803362.aspx&lt;/a&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family:'Calibri','sans-serif';font-size:11pt;mso-fareast-font-family:'Times New Roman';mso-bidi-font-family:'Times New Roman';mso-ansi-language:en-us;mso-fareast-language:en-us;mso-bidi-language:ar-sa;"&gt;&lt;font color="#000000" size="2" face="Arial"&gt;&lt;a href="http://channel9.msdn.com/posts/Windows-Azure-Jump-Start-03-Windows-Azure-Lifecycle-Part-1"&gt;http://channel9.msdn.com/posts/Windows-Azure-Jump-Start-03-Windows-Azure-Lifecycle-Part-1&lt;/a&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family:'Calibri','sans-serif';font-size:11pt;mso-fareast-font-family:'Times New Roman';mso-bidi-font-family:'Times New Roman';mso-ansi-language:en-us;mso-fareast-language:en-us;mso-bidi-language:ar-sa;"&gt;&lt;a href="http://channel9.msdn.com/Events/TechEd/Australia/Tech-Ed-Australia-2011/COS201"&gt;http://channel9.msdn.com/Events/TechEd/Australia/Tech-Ed-Australia-2011/COS201&lt;/a&gt;&amp;#160;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family:'Calibri','sans-serif';font-size:11pt;mso-fareast-font-family:'Times New Roman';mso-bidi-font-family:'Times New Roman';mso-ansi-language:en-us;mso-fareast-language:en-us;mso-bidi-language:ar-sa;"&gt;&amp;#160;&lt;/span&gt;&lt;/p&gt;</description></item><item><title>The Data Scientist</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/11/15/the-data-scientist.aspx</link><pubDate>Tue, 15 Nov 2011 15:00:18 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39814</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I &lt;em&gt;think&lt;/em&gt; it means, and why I’m excited about it.&lt;/p&gt;  &lt;p&gt;In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. &lt;/p&gt;  &lt;p&gt;The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. &lt;/p&gt;  &lt;p&gt;I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Persistence&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? &lt;/p&gt;  &lt;p&gt;This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization.&lt;/p&gt;  &lt;p&gt;Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Processing&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. &lt;/p&gt;  &lt;p&gt;A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Presentation&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. &lt;/p&gt;  &lt;p&gt;This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. &lt;/p&gt;  &lt;p&gt;And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Why I’m excited&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. &lt;/p&gt;  &lt;p&gt;Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. &lt;/p&gt;  &lt;p&gt;So&amp;#160; watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way. &lt;/p&gt;</description></item><item><title>Big Data and the Cloud - More Hype or a Real Workload?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/10/18/big-data-and-the-cloud-more-hype-or-a-real-workload.aspx</link><pubDate>Tue, 18 Oct 2011 13:57:36 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39156</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;Last week Microsoft announced several new offerings for “Big Data” - and since I’m a stickler for definitions, I wanted to make sure I understood what that really means. What is “Big Data”? What size hard drive is that? After all, my laptop has 1TB of storage - is my laptop “Big Data”?&lt;/p&gt;  &lt;p&gt;There are actually a few definitions for this term, most notably those involving the &lt;a href="http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data" target="_blank"&gt;“Four V’s” Volume, Velocity, Variety and Variability&lt;/a&gt;. Others &lt;a href="http://nosql.mypopescu.com/post/10120087314/big-data-and-the-4-vs-volume-velocity-variety" target="_blank"&gt;disagree with this&lt;/a&gt; definition. I tend to try and get things into their simplest form, so I’m using this definition for myself:&lt;/p&gt;  &lt;p align="center"&gt;&lt;font color="#c0504d" size="3"&gt;Big data is defined as a &lt;em&gt;large set &lt;/em&gt;of &lt;em&gt;computationally expensive &lt;/em&gt;data that is &lt;em&gt;worked on simultaneously&lt;/em&gt;.&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;Let me flesh that out a&amp;#160; little. To be sure, “Big Data” has a larger size than say a few megabytes. The reason this is important is that it takes special hardware to be able to move large sets of data around, store it, process it and so on. (&lt;font color="#c0504d"&gt;large set&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;If you store a LOT of data, but only use a small portion of it at a time, that really isn’t super-hard to do. It’s mainly a storage issue at that point. But, if you do need to work with a large portion of the data at one time, then the memory, CPU and transfer components of the system have to adapt to be responsive - new ways to work with that data (game theory, knot-algorithms, map-reduce, etc.) need to be brought into play. (&lt;font color="#c0504d"&gt;computationally expensive&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;Once that data is loaded into the processing area (memory or whatever other mechanism is used) it must be worked on in parallel to come back in a reasonable time. You have two options here - you can scale the system up with more internal hardware (CPU’s, memory and so on) or you can scale it out to have multiple systems work on it at the same time using paradigms such as map/reduce and so on. Actually, when you lay this out in an architecture diagram, scale up or out doesn’t actually change the logical structure of the process - in scale out the network becomes the bus, and the nodes become more RAM and computing power. Of course, there are changes in code for how you stitch the workload back together. (&lt;font color="#c0504d"&gt;worked on simultaneously&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;So back to the original question. Is Big Data, as I have defined it here, a workload for Windows and SQL Azure? Absolutely! In fact, it’s probably one of the main workloads, and I believe it represents the latest, and perhaps also the earliest frontier of computing. Jim &lt;a href="http://research.microsoft.com/en-us/um/people/gray/" target="_blank"&gt;Gray, a former researcher here at Microsoft and a hero of mine, was working on this very topic.&lt;/a&gt; I believe as he did - all computing is simply an interface over data. &lt;/p&gt;  &lt;p&gt;Microsoft has multiple offerings on the topic of Big Data. In posts that follow from myself and my co-workers, we’ll explore when and where you use each one. Whether you are a data professional or a developer, this is the new frontier - &lt;a href="http://www.straightpathsql.com/archives/2011/10/microsoft-loves-your-big-data/" target="_blank"&gt;don’t wait to educate yourself&lt;/a&gt; on how to leverage Big Data for your organization. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Hadoop on Windows Azure and SQL Server&amp;#160; &lt;/strong&gt;- Microsoft’s &lt;a href="http://www.hortonworks.com/the-whys-behind-the-microsoft-and-hortonworks-partnership/" target="_blank"&gt;partnership to include Hadoop workloads on Windows Azure&lt;/a&gt; and &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584" target="_blank"&gt;SQL Server/Parallel Data Warehouse (PDW)&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;LINQ to HPC &lt;/strong&gt;- Microsoft’s High-Performance Computing SKU of &lt;a href="http://blogs.technet.com/b/windowshpc/archive/2011/05/20/dryad-becomes-linq-to-hpc.aspx" target="_blank"&gt;HPC is now in Azure&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Windows Azure Table Storage &lt;/strong&gt;- A &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx" target="_blank"&gt;key/value pair type storage with full partitioning&lt;/a&gt; that is immediately consistent, able to handle huge loads of data and works with any REST-compatible language&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;strong&gt;Other offerings &lt;/strong&gt;- Including the new &lt;a href="http://www.microsoft.com/en-us/sqlazurelabs/default.aspx" target="_blank"&gt;Data Explorer&lt;/a&gt;, &lt;a href="http://research.microsoft.com/en-us/news/headlines/daytona-071811.aspx" target="_blank"&gt;Project Daytona (with a Big Data Toolkit for Scientists and researchers)&lt;/a&gt;, &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/SQL-Server-2012-breakthrough-insight.aspx" target="_blank"&gt;Power View&lt;/a&gt; and more. &lt;/p&gt;  &lt;p&gt;The era of Big Data is here. And you can use Windows and SQL Azure to bring it to your organization. &lt;/p&gt;</description></item></channel></rss>