<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'Concepts' and 'SQL Azure'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=Concepts,SQL+Azure&amp;orTags=0</link><description>Search results matching tags 'Concepts' and 'SQL Azure'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>How Does the Cloud Change a Database Administrator’s Job?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2013/01/29/how-does-the-cloud-change-a-database-administrator-s-job.aspx</link><pubDate>Tue, 29 Jan 2013 15:08:32 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47385</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;I recently&lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/01/22/how-does-the-cloud-change-a-systems-architect-s-job.aspx" target="_blank"&gt; posted a blog entry on how cloud computing would change the Systems Architect&amp;rsquo;s role in an organization&lt;/a&gt;. In a way, the Systems Architect has the easiest transition to a new way of using computing technologies. In fact, that&amp;rsquo;s actually part of the job description.&amp;nbsp;I mentioned that a Systems Architect has three primary vectors to think about for cloud computing, as it applies to what they should do:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Knowledge - Which options are available to solve problems, and what are their strengths and weaknesses.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Experience - What has the System Architect seen and worked with in the past.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Coordination - A system design is based on multiple factors, and one person can't make all the choices. There will need to be others involved at every level of the solution, and the Systems Architect will need to know who those people are and how to work with them.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;The Database Administrator Role&lt;/h1&gt;
&lt;p&gt;But a Database Administrator (DBA) is probably one of the harder roles to think about when it comes to cloud computing. First, let&amp;rsquo;s define what a Database Administrator usually thinks about as part of their job:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, Installing and Configuring a Database Platform&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, designing and creating databases&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, designing and implementing High Availability and Disaster Recovery for each database (HADR) based on requirements for its workload&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Maintaining and monitoring the database platform&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Implementing performance tuning on the databases based on monitoring&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Re-balancing workloads across database servers based on monitoring&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Securing databases platforms and individual databases based on requirements and implementation&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&amp;rsquo;s just a short list, and each of those unpacks into a larger set of tasks.&lt;/p&gt;
&lt;p&gt;The issue is that&lt;em&gt; I&amp;rsquo;ve never actually met a DBA that does all of those things&lt;/em&gt;, or &lt;strong&gt;just&lt;/strong&gt; all of those things. Many times they do much more, sometimes the systems are so large they specialize on just a few of them.&lt;/p&gt;
&lt;p&gt;And as you can see from the list, some of these areas are shared with other roles. For instance, in some shops, the DBA plans, purchases, sets up and configures the hardware for database servers. In others that&amp;rsquo;s done&lt;br /&gt;by the Infrastructure Team. In some shops the DBA designs databases from software requirements, and in others the developers do that &amp;ndash; or perhaps it&amp;rsquo;s done as a joint effort. The same holds true for database code &amp;ndash; sometimes the&lt;br /&gt;DBA does it, other times the developer, and still others it&amp;rsquo;s a shared task.&lt;/p&gt;
&lt;p&gt;In fact, you could argue that there are few other roles in IT where the roles are so intermixed. Also, the DBA works with software the company develops, and software the company buys. They work with hardware, networking, security and software. There are certain aspects of design and tuning that are outside the purview of some of those things, and inside the others.&lt;/p&gt;
&lt;p&gt;With all of these variables, simply telling a DBA that they should &amp;ldquo;use the cloud&amp;rdquo; is not the proper approach.&lt;/p&gt;
&lt;h1&gt;How the Cloud Changes Things&lt;/h1&gt;
&lt;p&gt;To be sure, the DBA has the same vectors as the Systems Architect. They need to educate themselves on the options within this new option (&lt;span style="color:#0000ff;"&gt;Knowledge&lt;/span&gt;), try a few test solutions out (&lt;span style="color:#0000ff;"&gt;Experience&lt;/span&gt;) and of course work with others on various parts of the implementation (&lt;span style="color:#0000ff;"&gt;Coordination&lt;/span&gt;). But it goes beyond that.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.windowsazure.com/en-us/manage/windows/fundamentals/intro-to-windows-azure/#components" target="_blank"&gt;There are three big buckets of cloud computing&lt;/a&gt;, dealing with simply using a Virtual Machine (IaaS) to writing code without worrying about the virtualization or even the operating system (PaaS) and using software that&amp;rsquo;s already written and being delivered via an Application Programming Interface (API). Each of these has so many options and configurations that it&amp;rsquo;s often better to think about the problem you&amp;rsquo;re trying to solve rather than all of the technology within a given area - although some of that is certainly necessary anyway.&amp;nbsp;&lt;/p&gt;
&lt;h2&gt;Database Platform Architecture&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ll start with when the DBA should even consider cloud computing for a solution. Once again, it&amp;rsquo;s not an &amp;ldquo;all or nothing&amp;rdquo; paradigm, where you either run something on premises or in the cloud &amp;ndash; it&amp;rsquo;s often a matter of selecting the right components to solve a problem.&amp;nbsp; In my design sessions with DBA&amp;rsquo;s I break these down into three big areas where they might want to consider the cloud &amp;ndash;and then we talk about how to implement each one:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Audiences&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;HADR&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Data Services&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Audiences&lt;/h3&gt;
&lt;p&gt;If the users of your database systems all sit in the same facility, you own the servers and networking, and the application servers are separate from the database server, it doesn&amp;rsquo;t usually make sense to take that database workload and place it on Windows Azure &amp;ndash; or any other cloud provider. The latency alone prevents a satisfactory performance profile, and in some cases won&amp;rsquo;t work at all. It doesn&amp;rsquo;t matter if the cloud solution is cheaper or easier &amp;ndash; if you&amp;rsquo;re moving a lot of data every second between an on-premises system and the cloud it won&amp;rsquo;t work well.&lt;/p&gt;
&lt;p&gt;However &amp;ndash; if your users are in multiple locations, especially globally, or you have a mix of company and external customer users, it might make sense to evaluate a shared data location. You still need to consider the implications of how much data the application server pushes back and forth, but you may be able to locate both the application server and SQL Server in an IaaS role. Assuming the data sent to the final client will work across public Internet channels, there may be a fit. There are security implications, but unless you have point-to-point connections for your current solution you&amp;rsquo;re faced with the same security questions on both options.&lt;/p&gt;
&lt;p&gt;Your audience might also be developers looking for a way to quickly spin up a server and then turn it down when they are done, paying for the time and not the hardware or licenses. This is also a prime case for evaluating IaaS. And there are others that you'll find in your own organization as you work through the requirements you have.&amp;nbsp;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Resources: Windows Azure Virtual Machines: &lt;a href="http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/"&gt;http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;span style="color:#993300;"&gt;Windows Azure SQL Server Virtual Machines&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/windows/common-tasks/install-sql-server/"&gt;http://www.windowsazure.com/en-us/manage/windows/common-tasks/install-sql-server/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;HADR&lt;/h3&gt;
&lt;p&gt;The next possible place to consider using cloud computing with SQL Server is as a part of your High Availability and Disaster Recovery plans. In fact, this is the most common use I see for cloud computing and the Database Administrator. The key is the Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Based on each application&amp;rsquo;s requirements, you may find that using Windows Azure or even supplementing your current plan is&lt;br /&gt;the right place to evaluate options. I&amp;rsquo;ve covered this use-case in more detail in another article.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References: SQL Server High Availability and Disaster Recovery options with Windows Azure&lt;/span&gt;: &lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx"&gt;http://blogs.msdn.com/b/buckwoody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Data Services&lt;/h3&gt;
&lt;p&gt;Windows Azure, along with other cloud providers, offers another way to design, create and consume data. In this use-case, however, the tasks DBA&amp;rsquo;s normally perform for sizing, ordering and configuring a system don&amp;rsquo;t apply.&lt;/p&gt;
&lt;p&gt;With Windows Azure SQL Databases (the artist formerly known as SQL Azure), you can simply create a database and begin using it. There are places where this fits and others where it doesn&amp;rsquo;t, and there are differences, limitations and enhancements, so it isn&amp;rsquo;t meant as replacement for what you could do with &amp;ldquo;Full-up&amp;rdquo; SQL Server on a Windows Azure Virtual Machine or an on-premises Instance. If a developer needs an Relational Database Management&lt;br /&gt;(RDBMS) data store for a web-based application, then this might be a perfect fit.&lt;/p&gt;
&lt;p&gt;But there is more to data services than Windows Azure SQL Databases. Windows Azure also offers MySQL as a service, RIAK and MongoDB (among others) and even Hadoop for larger distributed data sets. In addition you can use Windows Azure Reporting Services, and also tap into datasets and data functions in the Windows Azure Marketplace.&lt;/p&gt;
&lt;p&gt;The key for the DBA with this option is that you &lt;em&gt;will&lt;/em&gt; have to do a little investigation this time, and potentially without a specific workload in mind this time. I think that&amp;rsquo;s acceptable thing to ask &amp;ndash; DBA&amp;rsquo;s constantly keep up with data processing trends, and most will consider different ways to solve a problem.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure SQL Databases&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/home/features/data-management/" target="_blank"&gt;http://www.windowsazure.com/en-us/home/features/data-management/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure Reporting Services&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/services/other/sql-reporting/" target="_blank"&gt;http://www.windowsazure.com/en-us/manage/services/other/sql-reporting/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;HDInsight Service (Hadoop on Azure): &lt;/span&gt;&lt;a href="https://www.hadooponazure.com/" target="_blank"&gt;https://www.hadooponazure.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;MongoDB Offerings on Windows Azure&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/" target="_blank"&gt;http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure Marketplace&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/store/overview/" target="_blank"&gt;http://www.windowsazure.com/en-us/store/overview/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description></item><item><title>The Windows Azure Software Development Kit (SDK) and the Windows Azure Training Kit (WATK)</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/09/12/the-windows-azure-software-development-kit-sdk-and-the-windows-azure-training-kit-watk.aspx</link><pubDate>Wed, 12 Sep 2012 13:40:40 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:45165</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;Windows Azure is a platform that allows you to write software, run software, or use software that we've already written. We provide lots of resources to help you do that - many can be found right here in this blog series. There are two primary resources you can use, and it's important to understand what they are and what they do.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://officeimg.vo.msecnd.net/en-us/images/MH900441285.jpg"&gt;&lt;img width="121" height="107" style="float:left;max-width:550px;" alt="" src="http://officeimg.vo.msecnd.net/en-us/images/MH900441285.jpg" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;The Windows Azure Software Development Kit (SDK)&lt;/h1&gt;
&lt;p&gt;Actually, this isn't one resource. We have SDK's for multiple development environments, such as Visual Studio and also Eclipse, along with SDK's for iOS, Android and other environments. Windows Azure is a "back end", so almost any technology or front end system can use it to solve a problem.&lt;/p&gt;
&lt;p&gt;The SDK's are primarily for development. In the case of Visual Studio, you'll get a runtime environment for Windows Azure which allows you to develop, test and even run code all locally - you do not have to be connected to Windows Azure at all, until you're ready to deploy.&lt;/p&gt;
&lt;p&gt;You'll also get a few samples and codeblocks, along with all of the libraries you need to code with Windows Azure in .NET, PHP, Ruby, Java and more.&lt;/p&gt;
&lt;p&gt;The SDK is updated frequently, so check this location to find the latest for your environment and language - just click the bar that corresponds to what you want:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.windowsazure.com/en-us/develop/downloads/" target="_blank"&gt;http://www.windowsazure.com/en-us/develop/downloads/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://officeimg.vo.msecnd.net/en-us/images/MH900438678.jpg"&gt;&lt;img width="151" height="163" style="margin:2px 5px;border:0px currentColor;float:left;max-width:550px;" src="http://officeimg.vo.msecnd.net/en-us/images/MH900438678.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;The Windows Azure Training Kit (WATK)&lt;/h1&gt;
&lt;p&gt;Whether you're writing code, using Windows Azure Virtual Machines (VM's) or working with Hadoop, you can use the WATK to get examples, code, PowerShell scripts, PowerPoint decks, training videos and much more. This should be your second download after the SDK. This is all of the training you need to get started, and even beyond. The WATK is updated frequently - and you can find the latest one here:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.windowsazure.com/en-us/develop/net/other-resources/training-kit/" target="_blank"&gt;http://www.windowsazure.com/en-us/develop/net/other-resources/training-kit/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;There are many other resources - again, check the &lt;a href="http://windowsazure.com"&gt;http://windowsazure.com&lt;/a&gt; site, the &lt;a href="http://www.windowsazure.com/en-us/community/newsletter/2012/june/" target="_blank"&gt;community newsletter (which introduces the latest features)&lt;/a&gt;, and &lt;a href="http://sqlblog.com/b/buckwoody/rss.aspx" target="_blank"&gt;my blog for more&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>In the Cloud, Everything Costs Money</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/07/10/in-the-cloud-everything-costs-money.aspx</link><pubDate>Tue, 10 Jul 2012 12:55:50 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:44239</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;I’ve been teaching my daughter about budgeting. I’ve explained that most of the time the money coming in is from only one or two sources – and you can only change that from time to time. The money going out, however, is to many locations, and it changes all the time. She’s made a simple debits and credits spreadsheet, and I’m having her research each part of the budget. Her eyes grow wide when she finds out everything has a cost – the house, gas for the lawnmower, dishes, water for showers, food, electricity to run the fridge, a new fridge when that one breaks, everything has a cost. She asked me “how do you pay for all this?” It’s a sentiment many adults have looking at their own budgets – and one reason that some folks don’t even make a budget. It’s hard to face up to the realities of how much it costs to do what we want to do. &lt;/p&gt;  &lt;p&gt;When we design a computing solution, it’s interesting to set up a similar budget, because we don’t always consider all of the costs associated with it. I’ve seen design sessions where the new software or servers are considered, but the “sunk” costs of personnel, networking, maintenance, increased storage, new sizes for backups and offsite storage and so on are not added in. They are already on premises, so they are assumed to be paid for already.&lt;/p&gt;  &lt;p&gt;When you move to a distributed architecture, you'll see more costs directly reflected. Store something, pay for that storage. If the system is deployed and no one is using it, you’re still paying for it. As you watch those costs rise, you might be tempted to think that a distributed architecture costs more than an on-premises one. &lt;/p&gt;  &lt;p&gt;And you might be right – for some solutions. I’ve worked with a few clients where moving to a distributed architecture doesn’t make financial sense – so we didn’t implement it. I still designed the system in a distributed fashion, however, so that when it does make sense there isn’t much re-architecting to do. &lt;/p&gt;  &lt;p&gt;In other cases, however, if you consider all of the on-premises costs and compare those accurately to operating a system in the cloud, the distributed system is much cheaper. Again, I never recommend that you take a “here-or-there-only” mentality – I think a hybrid distributed system is usually best – but each solution is different. There simply is no “one size fits all” to architecting a solution.&lt;/p&gt;  &lt;p&gt;As you design your solution, cost out each element. You might find that using a hybrid approach saves you money in one design and not in another. It’s a brave new world indeed. &lt;/p&gt;  &lt;p&gt;So yes, in the cloud, everything costs money. But an on-premises solution also costs money – it’s just that “dad” (the company) is paying for it and we don’t always see it. When we go out on our own in the cloud, we need to ensure that we consider all of the costs. &lt;/p&gt;</description></item><item><title>Windows Azure End to End Examples</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/05/29/windows-azure-end-to-end-examples.aspx</link><pubDate>Tue, 29 May 2012 13:45:59 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:43642</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;I’m fascinated by the way people learn. I’m told there are several methods people use to understand new information, from reading to watching, from experiencing to exploring. &lt;/p&gt;  &lt;p&gt;Personally, I use multiple methods of learning when I encounter a new topic, usually starting with reading a bit about the concepts. I quickly want to put those into practice, however, especially in the technical realm. I immediately look for examples where I can start trying out the concepts. But I often want a “real” example – not just something that represents the concept, but something that is real-world, showing some feature I could actually use. &lt;/p&gt;  &lt;p&gt;And it’s no different with the Windows Azure platform – I like finding things I can do now, and actually use. So when I started learning Windows Azure, &lt;a href="http://www.microsoft.com/en-us/download/details.aspx?id=8396" target="_blank"&gt;I of course began with the Windows Azure Training Kit&lt;/a&gt; – which has lots of examples and labs, presentations and so on. But from there, I wanted more examples I could learn from, and eventually teach others with. I was asked if I would write a few of those up, so here are the ones I use. &lt;/p&gt;  &lt;h2&gt;CodePlex&lt;/h2&gt;  &lt;p&gt;&lt;a href="http://www.codeplex.com/" target="_blank"&gt;CodePlex is Microsoft’s version of an “Open Source” repository&lt;/a&gt;. Anyone can start a project, add code, documentation and more to it and make it available to the world, free of charge, using various licenses as they wish. Microsoft also uses this location for most of the examples we publish, and sample databases for SQL Server. &lt;/p&gt;  &lt;p&gt;If you search in CodePlex for “Azure”, you’ll come back with a list of projects that folks have posted, including those of us at Microsoft. The source code and documentation are there, so you can learn using actual examples of code that will do what you need. There’s everything from a simple table query to &lt;a href="http://blobshare.codeplex.com/" target="_blank"&gt;a full project that is sort of a “Corporate Dropbox” that uses Windows Azure Storage&lt;/a&gt;. &lt;/p&gt;  &lt;p&gt;The advantage is that this code is immediately usable. It’s searchable, and you can often find a complete solution to meet your needs. The disadvantage is that the code is pretty specific – it may not cover a huge project like you’re looking for. Also, depending on the author(s), you might not find the documentation level you want. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Link: &lt;a href="http://azureexamples.codeplex.com/site/search?query=Azure&amp;amp;ac=8"&gt;http://azureexamples.codeplex.com/site/search?query=Azure&amp;amp;ac=8&lt;/a&gt;&amp;#160;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;h2&gt;Tailspin&lt;/h2&gt;  &lt;p&gt;&lt;a href="http://msdn.microsoft.com/en-us/practices/default" target="_blank"&gt;Microsoft Patterns and Practices&lt;/a&gt; is a group here that does an amazing job at sharing standard ways of doing IT – from operations to coding. If you’re not familiar with this resource, make sure you read up on it. Long before I joined Microsoft I used their work in my daily job – saved a ton of time. It has resources not only for Windows Azure but other Microsoft software as well. &lt;/p&gt;  &lt;p&gt;The Patterns and Practices group also publishes full books – you can buy these, but many are also online for free. There’s an end-to-end example for Windows Azure using a company called “Tailspin”, and the work covers not only the code but the design of the full solution. If you really want to understand the thought that goes into a Platform-as-a-Service solution, this is an excellent resource. &lt;/p&gt;  &lt;p&gt;The advantages are that this is a book, it’s complete, and it includes a discussion of design decisions. The disadvantage is that it’s a little over a year old – and in “Cloud” years that’s a lot. So many things have changed, improved, and have been added that you need to treat this as a resource, but not the only one. Still, highly recommended. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Link: &lt;a href="http://msdn.microsoft.com/en-us/library/ff728592.aspx"&gt;http://msdn.microsoft.com/en-us/library/ff728592.aspx&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;h2&gt;Azure Stock Trader&lt;/h2&gt;  &lt;p&gt;Sometimes you need a mix of a CodePlex-style application, and a little more detail on how it was put together. And it would be great if you could actually play with the completed application, to see how it really functions on the actual platform.&lt;/p&gt;  &lt;p&gt;That’s the Azure Stock Trader application. There’s a place where you can read about the application, and then it’s been published to Windows Azure – the production platform – and you can use it, explore, and see how it performs. &lt;/p&gt;  &lt;p&gt;I use this application all the time to demonstrate Windows Azure, or a particular part of Windows Azure.&lt;/p&gt;  &lt;p&gt;The advantage is that this is an end-to-end application, and online as well. The disadvantage is that it takes a bit of self-learning to work through.&amp;#160; &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Links: Learn it: &lt;a href="http://msdn.microsoft.com/en-us/netframework/bb499684"&gt;http://msdn.microsoft.com/en-us/netframework/bb499684&lt;/a&gt; Use it: &lt;a href="https://azurestocktrader.cloudapp.net/"&gt;https://azurestocktrader.cloudapp.net/&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>Big Data - A Microsoft Tools Approach</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/02/20/big-data-a-microsoft-tools-approach.aspx</link><pubDate>Mon, 20 Feb 2012 21:16:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:41832</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;&lt;em&gt;&lt;span style="color:#c0504d;"&gt;(As with all of these types of posts, check the date of the latest update I&amp;rsquo;ve made here. Anything older than 6 months is probably out of date, given the speed with which we release new features into Windows and SQL Azure)&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t normally like to discuss things in terms of tools. I find that whenever you start with a given tool (or even a tool stack) it&amp;rsquo;s too easy to fit the problem to the tool(s), rather than the other way around as it should be.&lt;/p&gt;
&lt;p&gt;That being said, it&amp;rsquo;s often useful to have an example to work through to better understand a concept. But like many ideas in Computer Science, &amp;ldquo;Big Data&amp;rdquo; is too broad a term in use to show a single example that brings out the multiple processes, use-cases and patterns you can use it for.&lt;/p&gt;
&lt;p&gt;So we turn to a description of the tools you can use to analyze large data sets. &amp;ldquo;Big Data&amp;rdquo; is a term used lately to describe data sets that have the &amp;ldquo;&lt;a href="http://radar.oreilly.com/2012/01/what-is-big-data.html" target="_blank"&gt;Four V&amp;rsquo;s&lt;/a&gt;&amp;rdquo;&amp;nbsp; as a characteristic, but I have a simpler definition I like to use:&lt;/p&gt;
&lt;p align="center"&gt;&lt;em&gt;&lt;span style="color:#0000ff;font-size:small;"&gt;Big Data involves a data set too large to process in a reasonable period of time&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I realize that&amp;rsquo;s a bit broad, but in my mind it answers the question and is fairly future-proof. The general idea is that you want to analyze some data, and using whatever current methods, storage, compute and so on that you have at hand it doesn&amp;rsquo;t allow you to finish processing it in a time period that you are comfortable with. I&amp;rsquo;ll explain some new tools you can use for this processing.&lt;/p&gt;
&lt;p&gt;Yes, this post is Microsoft-centric. There are probably posts from other vendors and open-source that cover this process in the way they best see fit. And of course you can always &amp;ldquo;mix and match&amp;rdquo;, meaning using Microsoft for one or more parts of the process and other vendors or open-source for another. I never advise that you use any one vendor blindly - educate yourself, examine the facts, perform some tests and choose whatever mix of technologies best solves your problem.&lt;/p&gt;
&lt;p&gt;At the risk of being vendor-specific, and probably incomplete, I use the following short list of tools Microsoft has for working with &amp;ldquo;Big Data&amp;rdquo;. There is no single package that performs all phases of analysis. These tools are what I use; they should not be taken as a Microsoft authoritative testament to the toolset we&amp;rsquo;ll finalize for a given problem-space. In fact, that&amp;rsquo;s the key: find the problem and then fit the tools to that.&lt;/p&gt;
&lt;h2&gt;Process Types&lt;/h2&gt;
&lt;p&gt;I break up the analysis of the data into two process types. The first is examining and processing the data &lt;em&gt;in-line&lt;/em&gt;, meaning as the data passes through some process. The second is a &lt;em&gt;store-analyze-present&lt;/em&gt; process.&lt;/p&gt;
&lt;h2&gt;Processing Data In-Line&lt;/h2&gt;
&lt;p&gt;Processing data in-line means that the data doesn&amp;rsquo;t have a destination - it remains in the source system. But as it moves from an input or is routed to storage within the source system, various methods are available to examine the data as it passes, and either trigger some action or create some analysis.&lt;/p&gt;
&lt;p&gt;You might not think of this as &amp;ldquo;Big Data&amp;rdquo;, but in fact it can be. Organizations have huge amounts of data stored in multiple systems. Many times the data from these systems do not end up in a database for evaluation. There are options, however, to evaluate that data real-time and either act on the data or perhaps copy or stream it to another process for evaluation.&lt;/p&gt;
&lt;p&gt;The advantage of an in-stream data analysis is that you don&amp;rsquo;t necessarily have to store the data again to work with it. That&amp;rsquo;s also a disadvantage - depending on how you architect the solution, you might not retain a historical record. One method of dealing with this requirement is to trigger a rollup collection or a more detailed collection based on the event.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;StreamInsight &lt;/strong&gt;- StreamInsight is Microsoft&amp;rsquo;s &amp;ldquo;Complex Event Processing&amp;rdquo; or CEP engine. This product, hooked into SQL Server 2008R2, has multiple ways of interacting with a data flow. You can create adapters to talk with systems, and then examine the data mid-stream and create triggers to do something with it. You can read more about StreamInsight here: &lt;a title="http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx" href="http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/ee391416(v=sql.110).aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BizTalk &lt;/strong&gt;- When there is more latency available between the initiation of the data and its processing, you can use Microsoft BizTalk. This is a message-passing and Service Bus oriented tool, and it can also be used to join system&amp;rsquo;s data together than normally does not have a direct link, for instance a Mainframe system to SQL Server. You can learn more about BizTalk here: &lt;a href="http://www.microsoft.com/biztalk/en/us/overview.aspx"&gt;http://www.microsoft.com/biztalk/en/us/overview.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;.NET and the Windows Azure Service Bus &lt;/strong&gt;- Along the same lines as BizTalk but with a more programming-oriented design are the Windows and Windows Azure Service Bus tools. The Service Bus allows you to pass messages as well, and opens up web interactions and even inter-company routing. BizTalk can do this as well, but the Service Bus tools use an API approach for designing the flow and interfaces you want. The Service Bus offerings are also intended as near real-time, not as a streaming interface. You can learn more about the Windows Azure Service Bus here: &lt;a href="http://www.windowsazure.com/en-us/home/tour/service-bus/"&gt;http://www.windowsazure.com/en-us/home/tour/service-bus/&lt;/a&gt; and more about the Event Processing side here: &lt;a href="http://msdn.microsoft.com/en-us/magazine/dd569756.aspx"&gt;http://msdn.microsoft.com/en-us/magazine/dd569756.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2&gt;Store-Analyze-Present&lt;/h2&gt;
&lt;p&gt;A more traditional approach with an organization&amp;rsquo;s data is to store the data and analyze it out-of-band. This began with simply running code over a data store, but as locking and blocking became an issue on a file system, Relational Database Management Systems (RDBMs) were created. Over time a distinction was made between data used in an online processing system, meant to be highly available for writing data (OLTP) and systems designed for analytical and reporting purposes (OLAP).&lt;/p&gt;
&lt;p&gt;Later the data grew larger than these systems were designed for, primarily due to consistency requirements. In analysis, however, consistency isn&amp;rsquo;t always a requirement, and so file-based systems for that analysis were re-introduced from the Mainframe concepts, with new technology layered in for speed and size.&lt;/p&gt;
&lt;p&gt;I normally break up the process of analyzing large data sets into four phases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Source and Transfer &lt;/em&gt;- Obtaining the data at its source and transferring or loading it into the storage; optionally transforming it along the way&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Store and Process&lt;/em&gt; - Data is stored on some sort of persistence, and in some cases an engine handles the acquisition and placement on persistent storage, as well as retrieval through an interface.&lt;/li&gt;
&lt;li&gt;&amp;nbsp;&lt;em&gt;Analysis &lt;/em&gt;- A new layer introduced with &amp;ldquo;Big Data&amp;rdquo; is a separate analysis step. This is dependent on the engine or storage methodology, is often programming language or script based, and sometimes re-introduces the analysis back into the data. Some engines and processes combine this function into the previous phase.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Presentation&lt;/em&gt; - In most cases, the data wants a graphical representation to comprehend, especially in a series or trend analysis. In other cases a simple symbolic representation, similar to the &amp;ldquo;dashboard&amp;rdquo; elements in a Business Intelligence suite. Presentation tools may also have an analysis or refinement capability to allow end-users to work with the data sets. As in the Analysis phase, some methodologies bundle in the Analysis and Presentation phases into one toolset.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Source and Transfer&lt;/h3&gt;
&lt;p&gt;You&amp;rsquo;ll notice in this area, along with those that follow, Microsoft is adopting not only its own technologies but those within open-source. This is a positive sign, and means that you will have a best-of-breed, supported set of tools to move the data from one location to another. Traditional file-copy, File Transfer Protocol and more are certainly options, but do not normally deal with moving datasets.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve already mentioned the ability of a streaming tool to push data into a store-analyze-present model, so I&amp;rsquo;ll follow up that discussion with the tools that can extract data from one source and place it in another.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Server Integration Services (SSIS)/SQL Server Bulk Copy Program (BCP)&lt;/span&gt; &lt;/strong&gt;- SSIS is a SQL Server tool used to move data from one location to another, and optionally perform transform or other processes as it does so. You are not limited to working with SQL Server data - in fact, almost any modern source of data from text to various database platforms is available to move to various systems. It is also extremely fast and has a rich development environment. You can learn more about SSIS here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms141026.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms141026.aspx&lt;/a&gt; BCP is a tool that has been used with SQL Server data since the first releases; it has multiple sources and destinations as well. It is a command-line utility,and has some limited transform capabilities. You can learn more about BCP here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms162802.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms162802.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#0000ff;"&gt;&lt;span style="color:#800000;"&gt;Sqoop&lt;/span&gt; &lt;/span&gt;&lt;/strong&gt;- Tied to Microsoft&amp;rsquo;s latest announcements with Hadoop on Windows and Windows Azure, Sqoop is a tool that is used to move data between SQL Server 2008R2 (and higher)&amp;nbsp;and Hadoop, quickly and efficiently. You can read more about that in the Readme file here: &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584"&gt;http://www.microsoft.com/download/en/details.aspx?id=27584&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Application Programming Interfaces&lt;/strong&gt;&lt;/span&gt; - API&amp;rsquo;s exist in most every major language that can connect to one data source, access data, optionally transforming it and storing it in another system. Most every dialect of&amp;nbsp; the .NET-based languages contain methods to perform this task.&lt;/p&gt;
&lt;h3&gt;Store and Process&lt;/h3&gt;
&lt;p&gt;Data at rest is normally used for historical analysis. In some cases this analysis is performed near real-time, and in others historical data is analyzed periodically. Systems that handle data at rest range from simple storage to active management engines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Server&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s flagship RDBMS can indeed store massive amounts of complex data. I am familiar with a two systems in excess of 300 Terabytes of federated data, and the &lt;a href="http://pan-starrs.ifa.hawaii.edu/public/" target="_blank"&gt;Pan-Starrs&lt;/a&gt; project is designed to handle 1+ Petabyte of data. The theoretical limit of SQL Server DataCenter edition is 540 Petabytes. SQL Server is an engine, so the data access and storage is handled in an abstract layer that also handles concurrency for ACID properties. You can learn more about SQL Server here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/product-info/compare.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SQL Azure Federations&lt;/span&gt;&lt;/strong&gt; - SQL Azure is a database service from Microsoft associated with the Windows Azure platform. Database Servers are multi-tenant, but are shared across a &amp;ldquo;fabric&amp;rdquo; that moves active databases for redundancy and performance. Copies of all databases are kept triple-redundant with a consistent commitment model. Databases are (at this writing - check &lt;a href="http://WindowsAzure.com"&gt;http://WindowsAzure.com&lt;/a&gt; for the latest) capped at a 150 GB size limit per database. However, Microsoft released a &amp;ldquo;Federation&amp;rdquo; technology, allowing you to query a head node and have the data federated out to multiple databases. This improves both size and performance. You can read more about SQL Azure Federations here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/2281.federations-building-scalable-elastic-and-multi-tenant-database-solutions-with-sql-azure.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Analysis Services&lt;/span&gt;&lt;/strong&gt; - The Business Intelligence engine within SQL Server, called Analysis Services, can also handle extremely large data systems. In addition to traditional BI data store layouts (ROLAP, MOLAP and HOLAP), the latest version of SQL Server introduces the Vertipaq column-storage technology allowing more direct access to data and a different level of compression. You can read more about Analysis Services here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/analysis-services.aspx&lt;/a&gt; and more about Vertipaq here: &lt;a href="http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/hh212945(v=SQL.110).aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Parallel Data Warehouse &lt;/strong&gt;&lt;/span&gt;- The Parallel Data Warehouse (PDW) offering from Microsoft is largely described by the title. Accessed in multiple ways including using Transact-SQL (the Microsoft dialect of the Structured Query Language), &lt;a href="http://sqlpdw.com/2010/07/what-mpp-means-to-sql-server-parallel-data-warehouse/" target="_blank"&gt;This is an MPP appliance&lt;/a&gt;&amp;nbsp;scaling in parallel to extremely large datasets. It is a hardware and software offering - you can learn more about it here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HPC Server&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s High-Performance Computing version of Windows Server deals not only with large data sets, but with extremely complicated computing requirements. A scale-out architecture and inter-operation with Linux systems, as well as dozens of applications pre-written to work with this server make this a capable &amp;ldquo;Big Data&amp;rdquo; system. It is a mature offering, with a long track record of success in scientific, financial and other areas of data processing. It is available both on premises and in Windows Azure, and also in a hybrid of both models, allowing you to &amp;ldquo;rent&amp;rdquo; a super-computer when needed. You can read more about it here: &lt;a href="http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx"&gt;http://www.microsoft.com/hpc/en/us/product/cluster-computing.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Hadoop&lt;/span&gt;&lt;/strong&gt; - Pairing up with Hortonworks, Microsoft has released the Hadoop Open-Source system -&amp;nbsp; including HDFS and a Map/Reduce standardized software, Hive and Pig - on Windows and the Windows Azure platform. This is not a customized version; off-the-shelf concepts and queries work well here. You can read more about Hadoop here: &lt;a href="http://hadoop.apache.org/common/docs/current/"&gt;http://hadoop.apache.org/common/docs/current/&lt;/a&gt; and you can read more about Microsoft&amp;rsquo;s offerings here: &lt;a href="http://hortonworks.com/partners/microsoft/"&gt;http://hortonworks.com/partners/microsoft/&lt;/a&gt;&amp;nbsp;and here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Windows and Azure Storage&lt;/span&gt;&lt;/strong&gt; - Although not an engine - other than a triple-redundant, immediately consistent commit - Windows Azure can hold terabytes of information and make it available to everything from the R programming language to the Hadoop offering. Binary storage (Blobs) and Table storage (Key-Value Pair) data can be queried across a distributed environment. You can learn more about Windows Azure storage here: &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx"&gt;http://msdn.microsoft.com/en-us/library/windowsazure/gg433040.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;Analysis&lt;/h3&gt;
&lt;p&gt;In a &amp;ldquo;Big Data&amp;rdquo; environment, it&amp;rsquo;s not unusual to have a specialized set of tasks for analyzing and even interpreting the data. This is a new field called &amp;ldquo;data Science&amp;rdquo;, with a requirement not only for computing, but also a heavy emphasis on math.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#800000;"&gt;&lt;strong&gt;Transact-SQL &lt;/strong&gt;&lt;/span&gt;- T-SQL is the dialect of the Structured Query Language used by Microsoft. It includes not only robust selection, updating and manipulating of data, but also analytical and domain-level interrogation as well. It can be used on SQL Server, PDW and ODBC data sources. You can read more about T-SQL here: &lt;a href="http://msdn.microsoft.com/en-us/library/bb510741.aspx"&gt;http://msdn.microsoft.com/en-us/library/bb510741.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Multidimensional Expressions and Data Analysis Expressions&lt;/span&gt;&lt;/strong&gt; - The MDX and DAX languages allow you to query multidimensional data models that do not fit well with typical two-plane query languages. Pivots, aggregations and more are available within these constructs to query and work with data in Analysis Services. You can read more about MDX here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx"&gt;http://msdn.microsoft.com/en-us/library/ms145506(v=sql.110).aspx&lt;/a&gt; and more about DAX here: &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=28572"&gt;http://www.microsoft.com/download/en/details.aspx?id=28572&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HPC Jobs and Tasks &lt;/span&gt;&lt;/strong&gt;- Work submitted to the Windows HPC Server has a particular job - essentially a reservation request for resources. Within a job you can submit tasks, such as parametric sweeps and more. You can learn more about Jobs and Tasks here: &lt;a href="http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx"&gt;http://technet.microsoft.com/en-us/library/cc719020(v=ws.10).aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;HiveQL &lt;/span&gt;&lt;/strong&gt;- HiveQL is the language used to query a Hive object running on Hadoop. You can see a tutorial on that process here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/6628.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Piglatin &lt;/span&gt;&lt;/strong&gt;- Piglatin is the submission language for the Pig implementation on Hadoop. An example of that process is here: &lt;a href="http://sqlblog.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx"&gt;http://blogs.msdn.com/b/avkashchauhan/archive/2012/01/10/running-apache-pig-pig-latin-at-apache-hadoop-on-windows-azure.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Application Programming Interfaces &lt;/span&gt;&lt;/strong&gt;- Almost all of the analysis offerings have associated API&amp;rsquo;s - of special note is Microsoft Research&amp;rsquo;s Infer.NET, a new language construct for framework for running Bayesian inference in graphical models, as well as probabilistic programming. You can read more about Infer.NET here: &lt;a href="http://research.microsoft.com/en-us/um/cambridge/projects/infernet/"&gt;http://research.microsoft.com/en-us/um/cambridge/projects/infernet/&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3&gt;Presentation&lt;/h3&gt;
&lt;p&gt;Lots of tools work in presenting the data once you have done the primary analysis. In fact, there&amp;rsquo;s a great video of a comparison of various tools here: &lt;a href="http://msbiacademy.com/Lesson.aspx?id=73"&gt;http://msbiacademy.com/Lesson.aspx?id=73&lt;/a&gt; Primarily focused on Business Intelligence. That term itself is now not as completely defined, but the tools I&amp;rsquo;ll show below can be used in multiple ways - not just traditional Business Intelligence scenarios. Application Programming Interfaces (API&amp;rsquo;s) can also be used for presentation; but I&amp;rsquo;ll focus here on &amp;ldquo;out of the box&amp;rdquo; tools.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Excel&lt;/span&gt;&lt;/strong&gt; - Microsoft&amp;rsquo;s Excel can be used not only for single-desk analysis of data sets, but with larger datasets as well. It has interfaces into SQL Server, Analysis Services, can be connected to the PDW, and is a first-class job submission system for the Windows HPC Server. You can watch a video about Excel and big data here: &lt;a href="http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f"&gt;http://www.microsoft.com/en-us/showcase/details.aspx?uuid=e20b7482-11c9-4965-b8f0-7fb6ac7a769f&lt;/a&gt;&amp;nbsp;and you can also connect Excel to Hadoop: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Reporting Services&lt;/span&gt;&lt;/strong&gt; - Reporting Services is a SQL Server tool that can query and show data from multiple sources, all at once. It can also be used with Analysis Services. You can read more about Reporting Services here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/reporting-services.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;Power View&lt;/span&gt;&lt;/strong&gt; - Power View is a &amp;ldquo;Self-Service&amp;rdquo; Business Intelligence reporting tool, which can work with on-premises data in addition to SQL Azure and other data. You can read more about it and see videos of Power View in action here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/future-editions/business-intelligence/SQL-Server-2012-reporting-services.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style="color:#800000;"&gt;SharePoint Services -&lt;/span&gt;&lt;/strong&gt; Microsoft has rolled several capable tools in SharePoint as &amp;ldquo;Services&amp;rdquo;. This has the advantage of being able to integrate into the working environment of many companies. You can read more about&amp;nbsp; lots of these reporting and analytic presentation tools here: &lt;a href="http://technet.microsoft.com/en-us/sharepoint/ee692578"&gt;http://technet.microsoft.com/en-us/sharepoint/ee692578&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;This is by no means an exhaustive list - more capabilities are added all the time to Microsoft&amp;rsquo;s products, and things will surely shift and merge as time goes on. Expect today&amp;rsquo;s &amp;ldquo;Big Data&amp;rdquo; to be tomorrow&amp;rsquo;s &amp;ldquo;Laptop Environment&amp;rdquo;.&lt;/p&gt;</description></item><item><title>The Data Scientist</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/11/15/the-data-scientist.aspx</link><pubDate>Tue, 15 Nov 2011 15:00:18 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39814</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I &lt;em&gt;think&lt;/em&gt; it means, and why I’m excited about it.&lt;/p&gt;  &lt;p&gt;In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. &lt;/p&gt;  &lt;p&gt;The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. &lt;/p&gt;  &lt;p&gt;I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Persistence&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? &lt;/p&gt;  &lt;p&gt;This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization.&lt;/p&gt;  &lt;p&gt;Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Processing&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. &lt;/p&gt;  &lt;p&gt;A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Presentation&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. &lt;/p&gt;  &lt;p&gt;This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. &lt;/p&gt;  &lt;p&gt;And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Why I’m excited&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. &lt;/p&gt;  &lt;p&gt;Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. &lt;/p&gt;  &lt;p&gt;So&amp;#160; watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way. &lt;/p&gt;</description></item><item><title>Big Data and the Cloud - More Hype or a Real Workload?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/10/18/big-data-and-the-cloud-more-hype-or-a-real-workload.aspx</link><pubDate>Tue, 18 Oct 2011 13:57:36 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39156</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;Last week Microsoft announced several new offerings for “Big Data” - and since I’m a stickler for definitions, I wanted to make sure I understood what that really means. What is “Big Data”? What size hard drive is that? After all, my laptop has 1TB of storage - is my laptop “Big Data”?&lt;/p&gt;  &lt;p&gt;There are actually a few definitions for this term, most notably those involving the &lt;a href="http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data" target="_blank"&gt;“Four V’s” Volume, Velocity, Variety and Variability&lt;/a&gt;. Others &lt;a href="http://nosql.mypopescu.com/post/10120087314/big-data-and-the-4-vs-volume-velocity-variety" target="_blank"&gt;disagree with this&lt;/a&gt; definition. I tend to try and get things into their simplest form, so I’m using this definition for myself:&lt;/p&gt;  &lt;p align="center"&gt;&lt;font color="#c0504d" size="3"&gt;Big data is defined as a &lt;em&gt;large set &lt;/em&gt;of &lt;em&gt;computationally expensive &lt;/em&gt;data that is &lt;em&gt;worked on simultaneously&lt;/em&gt;.&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;Let me flesh that out a&amp;#160; little. To be sure, “Big Data” has a larger size than say a few megabytes. The reason this is important is that it takes special hardware to be able to move large sets of data around, store it, process it and so on. (&lt;font color="#c0504d"&gt;large set&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;If you store a LOT of data, but only use a small portion of it at a time, that really isn’t super-hard to do. It’s mainly a storage issue at that point. But, if you do need to work with a large portion of the data at one time, then the memory, CPU and transfer components of the system have to adapt to be responsive - new ways to work with that data (game theory, knot-algorithms, map-reduce, etc.) need to be brought into play. (&lt;font color="#c0504d"&gt;computationally expensive&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;Once that data is loaded into the processing area (memory or whatever other mechanism is used) it must be worked on in parallel to come back in a reasonable time. You have two options here - you can scale the system up with more internal hardware (CPU’s, memory and so on) or you can scale it out to have multiple systems work on it at the same time using paradigms such as map/reduce and so on. Actually, when you lay this out in an architecture diagram, scale up or out doesn’t actually change the logical structure of the process - in scale out the network becomes the bus, and the nodes become more RAM and computing power. Of course, there are changes in code for how you stitch the workload back together. (&lt;font color="#c0504d"&gt;worked on simultaneously&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;So back to the original question. Is Big Data, as I have defined it here, a workload for Windows and SQL Azure? Absolutely! In fact, it’s probably one of the main workloads, and I believe it represents the latest, and perhaps also the earliest frontier of computing. Jim &lt;a href="http://research.microsoft.com/en-us/um/people/gray/" target="_blank"&gt;Gray, a former researcher here at Microsoft and a hero of mine, was working on this very topic.&lt;/a&gt; I believe as he did - all computing is simply an interface over data. &lt;/p&gt;  &lt;p&gt;Microsoft has multiple offerings on the topic of Big Data. In posts that follow from myself and my co-workers, we’ll explore when and where you use each one. Whether you are a data professional or a developer, this is the new frontier - &lt;a href="http://www.straightpathsql.com/archives/2011/10/microsoft-loves-your-big-data/" target="_blank"&gt;don’t wait to educate yourself&lt;/a&gt; on how to leverage Big Data for your organization. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Hadoop on Windows Azure and SQL Server&amp;#160; &lt;/strong&gt;- Microsoft’s &lt;a href="http://www.hortonworks.com/the-whys-behind-the-microsoft-and-hortonworks-partnership/" target="_blank"&gt;partnership to include Hadoop workloads on Windows Azure&lt;/a&gt; and &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584" target="_blank"&gt;SQL Server/Parallel Data Warehouse (PDW)&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;LINQ to HPC &lt;/strong&gt;- Microsoft’s High-Performance Computing SKU of &lt;a href="http://blogs.technet.com/b/windowshpc/archive/2011/05/20/dryad-becomes-linq-to-hpc.aspx" target="_blank"&gt;HPC is now in Azure&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Windows Azure Table Storage &lt;/strong&gt;- A &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx" target="_blank"&gt;key/value pair type storage with full partitioning&lt;/a&gt; that is immediately consistent, able to handle huge loads of data and works with any REST-compatible language&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;strong&gt;Other offerings &lt;/strong&gt;- Including the new &lt;a href="http://www.microsoft.com/en-us/sqlazurelabs/default.aspx" target="_blank"&gt;Data Explorer&lt;/a&gt;, &lt;a href="http://research.microsoft.com/en-us/news/headlines/daytona-071811.aspx" target="_blank"&gt;Project Daytona (with a Big Data Toolkit for Scientists and researchers)&lt;/a&gt;, &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/SQL-Server-2012-breakthrough-insight.aspx" target="_blank"&gt;Power View&lt;/a&gt; and more. &lt;/p&gt;  &lt;p&gt;The era of Big Data is here. And you can use Windows and SQL Azure to bring it to your organization. &lt;/p&gt;</description></item><item><title>Windows Azure Security Review</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/08/02/windows-azure-security-review.aspx</link><pubDate>Tue, 02 Aug 2011 13:24:50 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:37432</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;&lt;em&gt;&lt;font color="#d19049"&gt;Current as of 08/01/2011 - Check the Resources listed below for more up-to-date information on this topic&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Background:&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Security for any computing platform involves three primary areas:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;&lt;font color="#ff0000"&gt;Principals&lt;/font&gt; (users or programmatic access to an asset or other program) &lt;/li&gt;    &lt;li&gt;&lt;font color="#ff0000"&gt;Securables&lt;/font&gt; (objects, data or programs that can be accessed) &lt;/li&gt;    &lt;li&gt;&lt;font color="#ff0000"&gt;Channels&lt;/font&gt; (methods of access by Principals to Securables) &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;On-premise systems normally use a central system to control security. In a Windows operating system-based environment, this is &lt;a href="http://technet.microsoft.com/en-us/library/cc758436(WS.10).aspx" target="_blank"&gt;often accomplished with Active Directory&lt;/a&gt; or other systems that&amp;#160; provide sign-on and user identity information. While other networking security paradigms have different terminology, all involve the three areas defined above. &lt;/p&gt;  &lt;p&gt;In addition to the names and passwords for a user, Active Directory (like other security mechanisms) store other information about Principals - called &lt;em&gt;&lt;a href="http://claimsid.codeplex.com/" target="_blank"&gt;Claims&lt;/a&gt;&lt;/em&gt;. These claims can include any custom fields the provider allows. In many networks, these fields are not used heavily, because applications that eventually need to secure the assets they control are not always deployed on the same platforms everywhere. &lt;/p&gt;  &lt;p&gt;In a single environment, security is often quite simple. A Principal is created such as a user or group, and then the Principal is granted access to a Securable such as a a folder, database or other asset. Permissions or Rights (or both) combine to allow a particular Principal to read, write, delete or edit data, or to access or run a particular program.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/3324.Figure1_5F00_2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure1" border="0" alt="Figure1" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/5140.Figure1_5F00_thumb.png" width="549" height="398" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#008000"&gt;Figure 1 - On-premise security environment example&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;The simplicity of this arrangement is due to a single, homogenous boundary. Even if more than one location is used, the Principals and Securables are grouped into a single logical boundary that is managed from one location. &lt;/p&gt;  &lt;p&gt;This background serves as the starting point for the Federating Security topic below.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Windows Azure Security Boundaries&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Windows Azure is a series of resources - servers, data and service buses, in addition to other features. Developers write code, and the deploy that to the Azure environment. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/1665.Figure2a_5F00_2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure2a" border="0" alt="Figure2a" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/3480.Figure2a_5F00_thumb.png" width="702" height="471" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#008000"&gt;Figure 2 - Azure Components&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;The code or data can be deployed to use one or more of the services. In other words, the &lt;a href="http://www.31a2ba2a-b718-11dc-8314-0800200c9a66.com/2010/12/how-to-combine-worker-and-web-role-in.html" target="_blank"&gt;Web Role in Windows Azure might host a simple website&lt;/a&gt;, and no other component need be used. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/4073.Figure2_5F00_2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure2" border="0" alt="Figure2" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/1258.Figure2_5F00_thumb.png" width="737" height="252" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#008000"&gt;Figure 3 - Simple Azure Web Role Application - only one feature used&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Or, &lt;a href="http://blogs.msdn.com/b/buckwoody/archive/2011/02/22/windows-azure-use-case-hybrid-applications.aspx" target="_blank"&gt;a complex mix of Web, Worker and Data Services, along with a Service Bus, RDBS and even on-site systems&lt;/a&gt; can be grouped into a much larger program. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/6136.Figure4_5F00_2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure4" border="0" alt="Figure4" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/4863.Figure4_5F00_thumb.png" width="735" height="456" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#008000"&gt;Figure 4 - Complex Windows and SQL Azure Application With Multiple Interactions&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;For a more basic introduction to Windows and SQL Azure, see this link: &lt;a href="http://channel9.msdn.com/Events/TechEd/Europe/2010/COS322"&gt;http://channel9.msdn.com/Events/TechEd/Europe/2010/COS322&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Windows Azure, like any web-based property, has three general layers of security:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;&lt;font color="#ff0000"&gt;Physical Access&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;&lt;font color="#ff0000"&gt;Operating Environment (Including the Operating System itself)&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;&lt;font color="#ff0000"&gt;Data and Programmatic Security&lt;/font&gt; &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;Each of these layers have additional layers within themselves, and this forms the basis of a secure experience for the end user or program. Some of these layers are the responsibility of Microsoft; others are the responsibility of the architect and developer; others are a joint or shared responsibility of both Microsoft and the client.&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;Layer One: Physical Access&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;The first layer of security within a web property such as Windows or SQL Azure is a secure facility. the following data points are important to understand for the worldwide facilities that host Windows and SQL Azure:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Microsoft Global Foundation Services (GFS) is responsible for the physical security of the datacenters located worldwide for Windows and SQL Azure. Information on Microsoft datacenters can be found here:&amp;#160; &lt;a href="http://www.globalfoundationservices.com/"&gt;http://www.globalfoundationservices.com/&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;The address and exact locations facilities are not commonly documented for security reasons. &lt;/li&gt;    &lt;li&gt;Microsoft runs it’s own data centers and does not contract this function out. &lt;/li&gt;    &lt;li&gt;The GFS controlled facilities hold an ISO/IEC 27001:2005 certification, and are audited to SAS level II. &lt;/li&gt;    &lt;li&gt;Standard secure operations protocols are in place, including least-privilege access. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;Layer Two: Operating Environment&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Windows Azure and SQL Azure do not currently hold certifications. Microsoft does not comment on the security certifications being pursued for Windows or SQL Azure. That being said, the Windows Azure environment is based on a modified Windows 2008 R2 Enterprise environment, developed using the Trustworthy Computing Initiative (TCI). &lt;/p&gt;  &lt;p&gt;The system controlling the host machines and their guest environments that ultimately hold the Web and Worker Roles within Windows Azure is called the Fabric - not to be confused with the Application Fabric feature. The Fabric is not accessible by client code - it controls the inner workings of Windows Azure, including Load-balancing, system restarts, maintenance and monitoring. &lt;/p&gt;  &lt;p&gt;Within the host machines that house the Web and Worker Roles, special networking constructs broker all conversations between Virtual Machines. Virtual Machines - even ones configured to communicate with each other - move through this network. Direct-machine to machine communication is not allowed, protecting one application from another or one data construct from another.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/8015.Figure5_5F00_2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure5" border="0" alt="Figure5" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/8182.Figure5_5F00_thumb.png" width="720" height="351" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#008000"&gt;Figure 5 - Windows Azure Fabric&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Windows and SQL Azure support only TCP-based communications. Ports commonly used are:&amp;#160; &lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;80 - Default public port used for Web Roles - can be enabled/disabled per configuration &lt;/li&gt;    &lt;li&gt;443 - Default secure port used for Web roles - &lt;a href="http://msdn.microsoft.com/en-us/gg271302" target="_blank"&gt;can be enabled/disabled per configuration&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;9350-9353 - These ports are used by the Windows Azure AppFabric service bus bindings. Refer to &lt;a href="http://msdn.microsoft.com/en-us/library/ee732535.aspx"&gt;http://msdn.microsoft.com/en-us/library/ee732535.aspx&lt;/a&gt; for more details &lt;/li&gt;    &lt;li&gt;1433 - SQL Azure &lt;/li&gt;    &lt;li&gt;3389 - This port is used for RDP access to VM-based roles, only if enabled &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;Layer Three: Data and Programmatic Security&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;All internal access through use of keys only. Without the proper key, code or data will not transfer. Storage Accounts have individual keys, so in this manner different security layers may be applied not only programmatically but at the account layer. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/6840.Figure6_5F00_2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure6" border="0" alt="Figure6" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/4370.Figure6_5F00_thumb.png" width="703" height="290" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#008000"&gt;Figure 6 - Windows Azure communications between components&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Calls to Windows Azure are made using standard SOAP, XML or REST-based protocols. The communications channel can be encrypted between the client and Windows Azure or allow it to remain unencrypted based on security needs. &lt;/p&gt;  &lt;p&gt;SQL Azure uses the standard SQL Server Tabular Data Stream (TDS) protocol, but only allows encrypted communications.&lt;/p&gt;  &lt;p&gt;Data is unencrypted within Windows Azure Blob or Table Storage - but is only accessible via the key for a storage account. &lt;a href="http://blogs.msdn.com/b/plankytronixx/archive/2010/10/23/crypto-primer-understanding-encryption-public-private-key-signatures-and-certificates.aspx" target="_blank"&gt;Data can be encrypted client-side and stored in Windows Azure in an encrypted fashion&lt;/a&gt;. Microsoft does not inspect internal data for validity or encryption enforcement.&amp;#160; The key is that the data is client-side encrypted and decrypted.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/8203.Figure7_5F00_2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure7" border="0" alt="Figure7" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/4466.Figure7_5F00_thumb.png" width="702" height="307" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#008000"&gt;Figure 7 - Example data at rest encryption scenario &lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Alternatively, a hybrid solution can store sensitive data locally and non-sensitive data in Azure Storage. The data can be coalesced at the client level such that the data is never transferred over any channel not owned or controlled by the organization.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Federating Security:&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;In the case of a single security boundary for Windows Azure, multiple security options are available. Users can be anonymously authorized, such as in the case of a public website for advertisement or informational purposes. &lt;/p&gt;  &lt;p&gt;Another option is to create an Internet Information Services (IIS) Internal Security Store. This is not a best-practice (although still possible) approach since the Fabric services within Windows Azure may recycle an instance and the session may sever between a given role and a client. Architecting stateless applications is a preferred approach.&lt;/p&gt;  &lt;p&gt;Using Claims-Based Authentication is a better solution. In this approach, the Principal is authenticated through a trusted party, such as Active Directory, OpenID, OpenAuthentication, or LiveID. Many web-properties use these methods, such as Microsoft, Google, Yahoo and Facebook to name a few. After authenticating with one of these services, the client is issued Claims using the WS-Federation (WS-Fed) or Security Assertion Markup Language (SAML)&amp;#160; that are passed to Windows Azure. At no time does Windows Azure store, transfer or interrogate the Principal’s security token. Claims can be anything from a group or role membership to location or any other settable attribute. Assets are then secured allowing only the Claim, without regard to the user’s location or access method. In this fashion a single security paradigm covers the Securables, with the Principals being controlled in any number of other mechanisms. This allows single-sign-on and/or federated security access from multiple providers. &lt;/p&gt;  &lt;p&gt;The simplest mechanism for building this environment is the Access Control Services (ACS) feature found in the Windows Azure Application Fabric component. It is a federated authorization management service that simplifies user access authorization across organizations and ID providers and performs claims transformation to map identities with access levels.&lt;/p&gt;  &lt;p&gt;ACS can:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Create and manage scopes such as URLs &lt;/li&gt;    &lt;li&gt;Create and manage claim types &lt;/li&gt;    &lt;li&gt;Create and manage signing and encryption keys &lt;/li&gt;    &lt;li&gt;Create and manage rules within an application scope &lt;/li&gt;    &lt;li&gt;Chain claims rules &lt;/li&gt;    &lt;li&gt;Manage permissions on scopes or perform delegation &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/2728.Figure8_5F00_2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="Figure8" border="0" alt="Figure8" src="http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79-metablogapi/5852.Figure8_5F00_thumb.png" width="693" height="410" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font color="#008000"&gt;Figure 8 - Federated Security Example &lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Full information on the Access Control Service is available at this link:&amp;#160; &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/windows-identity-foundation-wif-and-azure-appfabric-access-control-service-acs-survival-guide.aspx?wa=wsignin1.0"&gt;&lt;u&gt;&lt;font color="#0066cc"&gt;http://social.technet.microsoft.com/wiki/contents/articles/windows-identity-foundation-wif-and-azure-appfabric-access-control-service-acs-survival-guide.aspx?wa=wsignin1.0&lt;/font&gt;&lt;/u&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Since the Web and Worker Roles within Windows Azure are designed to be stateless, Microsoft created a Certification Store within the Management area to hold Certificates that can be called from within code. An example of using the Certification Store is here: &lt;a href="http://blogs.msdn.com/b/jnak/archive/2010/01/29/installing-certificates-in-windows-azure-vms.aspx"&gt;http://blogs.msdn.com/b/jnak/archive/2010/01/29/installing-certificates-in-windows-azure-vms.aspx&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Additional Resources:&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="color:#1f497d;font-size:10pt;"&gt;&lt;font face="Calibri"&gt;Official, authoritative security resource list: &lt;a href="http://msdn.microsoft.com/en-us/library/ff934690.aspx"&gt;&lt;font face="Arial"&gt;&lt;/font&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/ff934690.aspxTechnical"&gt;http://msdn.microsoft.com/en-us/library/ff934690.aspx&lt;/a&gt;&lt;/a&gt;         &lt;br /&gt;&lt;/a&gt;&lt;/font&gt;&lt;span style="color:#1f497d;font-size:10pt;"&gt;&lt;font face="Calibri"&gt;Technical&lt;/font&gt; Overview of the Security Features in the Windows Azure Platform: &lt;/span&gt;&lt;a href="http://www.microsoft.com/online/legal/?langid=en-us&amp;amp;docid=11"&gt;&lt;u&gt;&lt;font color="#0000ff" face="Calibri"&gt;http://www.microsoft.com/online/legal/?langid=en-us&amp;amp;docid=11&lt;/font&gt;&lt;/u&gt;&lt;/a&gt;&lt;font face="Calibri"&gt;.        &lt;br /&gt;&lt;/font&gt;&lt;/span&gt;&lt;span style="color:#1f497d;font-size:10pt;"&gt;&lt;font face="Calibri"&gt;Windows Azure Security Overview: &lt;/font&gt;&lt;a href="http://www.globalfoundationservices.com/security/documents/WindowsAzureSecurityOverview1_0Aug2010.pdf"&gt;&lt;u&gt;&lt;font color="#0000ff" face="Calibri"&gt;http://www.globalfoundationservices.com/security/documents/WindowsAzureSecurityOverview1_0Aug2010.pdf&lt;/font&gt;&lt;/u&gt;&lt;/a&gt;       &lt;br /&gt;&lt;/span&gt;&lt;span style="color:#1f497d;font-size:10pt;"&gt;&lt;font face="Calibri"&gt;Windows Azure Privacy: &lt;/font&gt;&lt;a href="http://www.microsoft.com/online/legal/?langid=en-us&amp;amp;docid=11"&gt;&lt;u&gt;&lt;font color="#0000ff" face="Calibri"&gt;http://www.microsoft.com/online/legal/?langid=en-us&amp;amp;docid=11&lt;/font&gt;&lt;/u&gt;&lt;/a&gt;       &lt;br /&gt;&lt;/span&gt;&lt;span style="color:#1f497d;font-size:10pt;"&gt;&lt;font face="Calibri"&gt;Securing Microsoft Cloud Infrastructure: &lt;/font&gt;&lt;a href="http://www.globalfoundationservices.com/security/documents/SecuringtheMSCloudMay09.pdf"&gt;&lt;u&gt;&lt;font color="#0000ff" face="Calibri"&gt;http://www.globalfoundationservices.com/security/documents/SecuringtheMSCloudMay09.pdf&lt;/font&gt;&lt;/u&gt;&lt;/a&gt;&lt;font face="Calibri"&gt;.        &lt;br /&gt;&lt;/font&gt;&lt;/span&gt;A list of other security resources is here: &lt;a href="http://blogs.msdn.com/b/buckwoody/archive/2010/12/07/windows-azure-learning-plan-security.aspx"&gt;http://blogs.msdn.com/b/buckwoody/archive/2010/12/07/windows-azure-learning-plan-security.aspx&lt;/a&gt;&amp;#160;&lt;/p&gt;    &lt;p&gt;&lt;font color="#0000ff" size="1"&gt;&lt;em&gt;Image Attribution: David Pallmann: &lt;/em&gt;&lt;/font&gt;&lt;a href="http://davidpallmann.blogspot.com/2011/07/windows-azure-design-patterns-part-1.html"&gt;&lt;font color="#0000ff" size="1"&gt;&lt;em&gt;http://davidpallmann.blogspot.com/2011/07/windows-azure-design-patterns-part-1.html&lt;/em&gt;&lt;/font&gt;&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Computer books are dead. Well, some of them, anyway.</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/05/10/computer-books-are-dead-well-some-of-them-anyway.aspx</link><pubDate>Tue, 10 May 2011 13:58:23 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:35551</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;I read a lot. I mean a LOT. It seems that computer professionals have much in common with medical professionals – we have to read in order to stay on top of our game. For me, this used to mean web sites, magazines, and other print medium, and of course lots of books. I’ve even &lt;a href="http://buckwoody.com/BResume.html#Publications_and_Communications" target="_blank"&gt;written several computer books myself and had them published&lt;/a&gt;. &lt;/p&gt;  &lt;p&gt;Whenever I teach a class, do a presentation, or hold an architectural design session on a new (or new to that person) technology, they usually follow up with “what’s a good book for learning X technology?” This happens so often that I have a list I keep of the titles I like for a particular subject – &lt;a href="http://www.facebook.com/apps/application.php?id=2397701323&amp;amp;ref=appd" target="_blank"&gt;you probably have similar book lists&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Windows, SQL Server, and other Microsoft products change on an average of around three or four year cycles. That’s enough time to play with a beta product, wait until it releases, and write a solid book about it, and have that in a decent market for sales, and allow people to read and recommend it. &lt;/p&gt;  &lt;p&gt;&lt;font color="#0000ff" size="3"&gt;Enter “the Cloud” – Distributed Computing.&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;Windows Azure and SQL Azure don’t release every three years. Changes – some of them dramatic – release &lt;em&gt;every three or four months&lt;/em&gt;. You can’t even write a book that fast, much less update it that quickly and re-sell it. So what is a technical professional to do?&lt;/p&gt;  &lt;p&gt;Well, although I really like a couple of books I’ve read so far (especially this one, &lt;a href="http://oreilly.com/catalog/0790145308795/" target="_blank"&gt;print and e-book version here&lt;/a&gt;), they are out of date almost by the time they publish. Instead, I rely on blogs, the web, documentation from the vendor and how-to articles published online. Many of these, ironically, are stored, hosted or delivered using – wait for it – Windows Azure. That’s interesting because it’s a medium that describes itself – “reflection”, anyone? &lt;/p&gt;  &lt;p&gt;This brings up an interesting conundrum. Books have a version, are arranged, thought-out and categorized. Since I’m now getting information off of the web, it’s difficult to figure out whether that material is correct at the time, what level it’s aimed at – and forget about any coherent structure. It’s topic-by-topic. &lt;/p&gt;  &lt;p&gt;So, like most of you, I use links and favorites to arrange things. And I found myself making “virtual books” by essentially creating my own Table-Of-Contents. I’ve shared some of those, &lt;a href="http://blogs.msdn.com/b/buckwoody/archive/2010/11/16/windows-azure-learning-plan.aspx" target="_blank"&gt;such as my Windows and SQL Azure Learning Plan&lt;/a&gt;. The key is that I have to update that to ensure that the latest information is there – otherwise it becomes an organized list that is not authoritative.&lt;/p&gt;  &lt;p&gt;Don’t get me wrong – I still have tons of&amp;#160; (e-book format) books, especially on “conceptual” topics like development paradigms and so on. But when it comes to specifics and how-to’s – electronic medium is best for me. It’s more current, adaptable, searchable, interactive and immersive than books. But how long will I retain regular print-type books? We’ll see. Times, they are a changing – fast.&lt;/p&gt;</description></item><item><title>SQL Azure Use Case: Shared Storage Application</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/04/26/sql-azure-use-case-shared-storage-application.aspx</link><pubDate>Tue, 26 Apr 2011 13:33:50 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:35207</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;&lt;span style="font-size:x-small;"&gt;&lt;em&gt;&lt;span style="font-size:small;"&gt;This is one in a series of posts on when and where to use a distributed architecture design in your organization's computing needs. You can find the main post here: &lt;/span&gt;&lt;a href="http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx"&gt;&lt;span style="font-size:small;"&gt;&lt;u&gt;&lt;font color="#800080"&gt;http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx&lt;/font&gt;&lt;/u&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size:small;"&gt; &lt;/span&gt;&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Description:&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-size:small;"&gt;On-premise data will be a part of computing for quite some time – perhaps permanently. Bandwidth requirements, security, or even financial considerations for large data sets often dictate that relational (on non-relational) systems will be maintained locally in many organizations, especially in enterprise computing. &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-size:small;"&gt;But distributed data systems are useful in many situations. Organizations may wish to store a portion of data off-site, either for sharing the data with other applications (including web-based applications) or as a supplement to a High-Availability and Disaster Recovery (HADR) strategy.&lt;/span&gt;&lt;/p&gt; &lt;span style="font-size:small;"&gt;   &lt;p&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Implementation:&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;SQL Azure can be used to add an additional option to an HADR strategy by copying off portions (or all) of an on-premise database system.&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/3386.sql_2D00_aHADR_5F00_2.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="sql-aHADR" border="0" alt="sql-aHADR" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/4265.sql_2D00_aHADR_5F00_thumb.png" width="298" height="181" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;In this arrangement, on-premise systems remain as they are. Data is replicated using many technologies, such as SQL Server Integration Services (SSIS), scripts, or Microsoft’s Sync Framework to a SQL Azure database. This data can be kept “cold”, meaning that a manual process is required to bring the data back, or as a “warm” standby using connection string management in the application.&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;Recently we architected a solution where a company kept a rolling two-week window of data replicated to SQL Azure using the &lt;a href="http://msdn.microsoft.com/en-us/sync/default.aspx" target="_blank"&gt;Sync Framework&lt;/a&gt;. The application, a compiled EXE running on user’s systems, had a “switch connections” button, that allowed the users to take a laptop to another location, select that option, and continue working from anywhere they had Internet connectivity. This required forethought and planning, and did not replace their primary HADR systems, but it did allow them to continue operations in the case of a severe outage at multiple sites. Since they are an emergency services provider, this gave them the highest redundancy.&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;Another option is to amalgamate data from disparate sources. &lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/6320.sql_2D00_aHyb_5F00_2.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="sql-aHyb" border="0" alt="sql-aHyb" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/2625.sql_2D00_aHyb_5F00_thumb.png" width="342" height="134" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;In this arrangement, two or more data services (one of which is SQL Azure) are accessed by a single program. The program queries each system independently, and using LINQ a single query can work across all of the data, assuming there is some sort of natural or artificial “key” that can join the data sets together. The user programs simply view this single data set as a single data source, unaware of the underlying data sets. This allows great flexibility and agility in the downstream program. The upstream data sources can change as long as the elements are kept consistent.&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;There are performance and security implications to amalgamated data systems, but if architected carefully they provide multiple benefits. A few of of these are that other systems can access the individual data sources, reporting is simplified and standardized, and multiple copies of data are eliminated.&lt;/span&gt;&lt;/p&gt;   &lt;span style="font-size:small;"&gt;     &lt;p&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Resources:&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;      &lt;p&gt;&lt;span style="font-size:small;"&gt;You can read more about the Sync Framework and SQL Azure here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/sync-framework-sql-server-to-sql-azure-synchronization.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/sync-framework-sql-server-to-sql-azure-synchronization.aspx&lt;/a&gt;&amp;#160;&lt;/span&gt;&lt;/p&gt;      &lt;p&gt;&lt;span style="font-size:small;"&gt;If you are new to LINQ, you can find more resources on it here: &lt;a href="http://msdn.microsoft.com/en-us/library/bb308959.aspx"&gt;http://msdn.microsoft.com/en-us/library/bb308959.aspx&lt;/a&gt;&amp;#160;&lt;/span&gt;&lt;/p&gt;   &lt;/span&gt;&lt;/span&gt;</description></item></channel></rss>