<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'SQL Server' and 'Windows Azure'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=SQL+Server,Windows+Azure&amp;orTags=0</link><description>Search results matching tags 'SQL Server' and 'Windows Azure'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>SQL Server in Windows Azure Infrastructure Services – Updated Documentation and Best Practices for GA, Upcoming Blogs</title><link>http://sqlblog.com/blogs/sqlos_team/archive/2013/04/24/sql-server-in-windows-azure-infrastructure-services-updated-documentation-and-best-practices-for-ga-upcoming-blogs.aspx</link><pubDate>Wed, 24 Apr 2013 22:04:16 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48858</guid><dc:creator>SQLOS Team</dc:creator><description>&lt;p&gt;It&amp;rsquo;s been just over a week since Windows Azure announced the GA of &lt;a href="http://www.windowsazure.com/en-us/home/scenarios/infrastructure-services/"&gt;Infrastructure Services&lt;/a&gt;, marking the beginning of a fully supported Infrastructure as a Service in Windows Azure, with SQL Server as a major component.&lt;/p&gt;
&lt;p&gt;Pre-installed SQL Server VMs are available for pay-per-hour usage in the Windows Azure gallery. Currently Enterprise, Standard and Web edition VMs running on Windows Server 2008 R2 SP1 are available, with more SQL Server editions coming soon. SQL Server editions running on Windows Server 2012 images are also on the way. For more details on the scenarios and benefits of running SQL Server workloads on Windows Azure Virtual Machines, please visit the SQL Server blog post &lt;a href="http://blogs.technet.com/b/dataplatforminsider/archive/2013/04/16/develop-and-test-new-sql-server-apps-scale-existing-apps-and-unlock-hybrid-scenarios-with-windows-azure-infrastructure-services.aspx" target="_blank"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We are very happy to announce that the updated technical documentation for deploying and running SQL Server in Windows Azure Infrastructure Services is now available online. When deploying SQL Server in Windows Azure Virtual Machines, we recommend that you follow the detailed guidance given in the new &lt;a href="http://go.microsoft.com/fwlink/?linkid=294719&amp;amp;clcid=0x409" target="_blank"&gt;SQL Server in Windows Azure Virtual Machines&lt;/a&gt; documentation in the library. This documentation includes a series of articles and tutorials that provide detailed guidance on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294720&amp;amp;clcid=0x409" target="_blank"&gt;Getting Started with SQL Server in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294721&amp;amp;clcid=0x409" target="_blank"&gt;Getting Ready to Migrate to SQL Server in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294722&amp;amp;clcid=0x409" target="_blank"&gt;SQL Server Deployment in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294723&amp;amp;clcid=0x409" target="_blank"&gt;Connectivity Considerations for SQL Server in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294724&amp;amp;clcid=0x409" target="_blank"&gt;Performance Considerations for SQL Server in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294725&amp;amp;clcid=0x409" target="_blank"&gt;Security Considerations for SQL Server in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294726&amp;amp;clcid=0x409" target="_blank"&gt;Troubleshooting and Monitoring for SQL Server in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294727&amp;amp;clcid=0x409" target="_blank"&gt;High Availability and Disaster Recovery for SQL Server in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://go.microsoft.com/fwlink/?linkid=294728&amp;amp;clcid=0x409" target="_blank"&gt;Backup and Restore for SQL Server in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/jj992719.aspx" target="_blank"&gt;SQL Server Business Intelligence in Windows Azure Virtual Machines&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Over the next few weeks we are planning a series of blog posts to provide more detailed information on specific SQL Server topics. Subjects in the pipeline include: high availability, disaster recovery, performance, application migration and security. Let us know what topics you would like to see covered in this series by adding comments to this post.&lt;/p&gt;
&lt;p&gt;SQL Server Team&amp;nbsp;&lt;/p&gt;
Originally posted at http://blogs.msdn.com/b/sqlosteam/</description></item><item><title>How Does the Cloud Change a Database Administrator’s Job?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2013/01/29/how-does-the-cloud-change-a-database-administrator-s-job.aspx</link><pubDate>Tue, 29 Jan 2013 15:08:32 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47385</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;I recently &lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/01/22/how-does-the-cloud-change-a-systems-architect-s-job.aspx" target="_blank"&gt;posted a blog entry on how cloud computing would change the Systems Architect&amp;rsquo;s role in an organization&lt;/a&gt;. In a way, the Systems Architect has the easiest transition to a new way of using computing technologies. In fact, that&amp;rsquo;s actually part of the job description.&amp;nbsp;I mentioned that a Systems Architect has three primary vectors to think about for cloud computing, as it applies to what they should do:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Knowledge - Which options are available to solve problems, and what are their strengths and weaknesses.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Experience - What has the System Architect seen and worked with in the past.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Coordination - A system design is based on multiple factors, and one person can't make all the choices. There will need to be others involved at every level of the solution, and the Systems Architect will need to know who those people are and how to work with them.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;The Database Administrator Role&lt;/h1&gt;
&lt;p&gt;But a Database Administrator (DBA) is probably one of the harder roles to think about when it comes to cloud computing. First, let&amp;rsquo;s define what a Database Administrator usually thinks about as part of their job:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, Installing and Configuring a Database Platform&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, designing and creating databases&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Planning, designing and implementing High Availability and Disaster Recovery for each database (HADR) based on requirements for its workload&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Maintaining and monitoring the database platform&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Implementing performance tuning on the databases based on monitoring&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Re-balancing workloads across database servers based on monitoring&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#993300;"&gt;Securing databases platforms and individual databases based on requirements and implementation&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&amp;rsquo;s just a short list, and each of those unpacks into a larger set of tasks.&lt;/p&gt;
&lt;p&gt;The issue is that &lt;em&gt;I&amp;rsquo;ve never actually met a DBA who does all of those things&lt;/em&gt;, or &lt;strong&gt;just&lt;/strong&gt; all of those things. Many times they do much more; sometimes the systems are so large that they specialize in just a few of them.&lt;/p&gt;
&lt;p&gt;And as you can see from the list, some of these areas are shared with other roles. For instance, in some shops, the DBA plans, purchases, sets up and configures the hardware for database servers. In others that&amp;rsquo;s done by the Infrastructure Team. In some shops the DBA designs databases from software requirements, and in others the developers do that &amp;ndash; or perhaps it&amp;rsquo;s done as a joint effort. The same holds true for database code &amp;ndash; sometimes the DBA does it, other times the developer, and still others it&amp;rsquo;s a shared task.&lt;/p&gt;
&lt;p&gt;In fact, you could argue that there are few other roles in IT where the responsibilities are so intermixed. Also, the DBA works with software the company develops, and software the company buys. They work with hardware, networking, security and software. There are certain aspects of design and tuning that are outside the purview of some of those things, and inside the others.&lt;/p&gt;
&lt;p&gt;With all of these variables, simply telling a DBA that they should &amp;ldquo;use the cloud&amp;rdquo; is not the proper approach.&lt;/p&gt;
&lt;h1&gt;How the Cloud Changes Things&lt;/h1&gt;
&lt;p&gt;To be sure, the DBA has the same vectors as the Systems Architect. They need to educate themselves on the choices within this new option (&lt;span style="color:#0000ff;"&gt;Knowledge&lt;/span&gt;), try a few test solutions out (&lt;span style="color:#0000ff;"&gt;Experience&lt;/span&gt;) and of course work with others on various parts of the implementation (&lt;span style="color:#0000ff;"&gt;Coordination&lt;/span&gt;). But it goes beyond that.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.windowsazure.com/en-us/manage/windows/fundamentals/intro-to-windows-azure/#components" target="_blank"&gt;There are three big buckets of cloud computing&lt;/a&gt;, ranging from simply using a Virtual Machine (IaaS), to writing code without worrying about the virtualization or even the operating system (PaaS), to using software that&amp;rsquo;s already written and delivered via an Application Programming Interface (API). Each of these has so many options and configurations that it&amp;rsquo;s often better to think about the problem you&amp;rsquo;re trying to solve rather than all of the technology within a given area - although some of that is certainly necessary anyway.&amp;nbsp;&lt;/p&gt;
&lt;h2&gt;Database Platform Architecture&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ll start with when the DBA should even consider cloud computing for a solution. Once again, it&amp;rsquo;s not an &amp;ldquo;all or nothing&amp;rdquo; paradigm, where you either run something on premises or in the cloud &amp;ndash; it&amp;rsquo;s often a matter of selecting the right components to solve a problem.&amp;nbsp; In my design sessions with DBAs I break these down into three big areas where they might want to consider the cloud &amp;ndash; and then we talk about how to implement each one:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Audiences&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;HADR&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style="color:#0000ff;"&gt;Data Services&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Audiences&lt;/h3&gt;
&lt;p&gt;If the users of your database systems all sit in the same facility, you own the servers and networking, and the application servers are separate from the database server, it doesn&amp;rsquo;t usually make sense to take that database workload and place it on Windows Azure &amp;ndash; or any other cloud provider. The latency alone can prevent a satisfactory performance profile, and in some cases the application won&amp;rsquo;t work at all. It doesn&amp;rsquo;t matter if the cloud solution is cheaper or easier &amp;ndash; if you&amp;rsquo;re moving a lot of data every second between an on-premises system and the cloud it won&amp;rsquo;t work well.&lt;/p&gt;
&lt;p&gt;However &amp;ndash; if your users are in multiple locations, especially globally, or you have a mix of company and external customer users, it might make sense to evaluate a shared data location. You still need to consider the implications of how much data the application server pushes back and forth, but you may be able to locate both the application server and SQL Server in an IaaS role. Assuming the data sent to the final client will work across public Internet channels, there may be a fit. There are security implications, but unless you have point-to-point connections for your current solution you&amp;rsquo;re faced with the same security questions on both options.&lt;/p&gt;
&lt;p&gt;Your audience might also be developers looking for a way to quickly spin up a server and then turn it down when they are done, paying for the time and not the hardware or licenses. This is also a prime case for evaluating IaaS. And there are others that you'll find in your own organization as you work through the requirements you have.&amp;nbsp;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Resources: Windows Azure Virtual Machines: &lt;a href="http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/"&gt;http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;span style="color:#993300;"&gt;Windows Azure SQL Server Virtual Machines&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/windows/common-tasks/install-sql-server/"&gt;http://www.windowsazure.com/en-us/manage/windows/common-tasks/install-sql-server/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;HADR&lt;/h3&gt;
&lt;p&gt;The next possible place to consider using cloud computing with SQL Server is as a part of your High Availability and Disaster Recovery plans. In fact, this is the most common use I see for cloud computing and the Database Administrator. The key is the Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Based on each application&amp;rsquo;s requirements, you may find that using Windows Azure, or even just supplementing your current plan with it, is the right place to start evaluating options. I&amp;rsquo;ve covered this use-case in more detail in another article.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References: SQL Server High Availability and Disaster Recovery options with Windows Azure&lt;/span&gt;: &lt;a href="http://sqlblog.com/b/buckwoody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx"&gt;http://blogs.msdn.com/b/buckwoody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Data Services&lt;/h3&gt;
&lt;p&gt;Windows Azure, along with other cloud providers, offers another way to design, create and consume data. In this use-case, however, the tasks DBAs normally perform for sizing, ordering and configuring a system don&amp;rsquo;t apply.&lt;/p&gt;
&lt;p&gt;With Windows Azure SQL Databases (the artist formerly known as SQL Azure), you can simply create a database and begin using it. There are places where this fits and others where it doesn&amp;rsquo;t, and there are differences, limitations and enhancements, so it isn&amp;rsquo;t meant as a replacement for what you could do with &amp;ldquo;Full-up&amp;rdquo; SQL Server on a Windows Azure Virtual Machine or an on-premises Instance. If a developer needs a Relational Database Management System (RDBMS) data store for a web-based application, then this might be a perfect fit.&lt;/p&gt;
&lt;p&gt;But there is more to data services than Windows Azure SQL Databases. Windows Azure also offers MySQL as a service, RIAK and MongoDB (among others) and even Hadoop for larger distributed data sets. In addition you can use Windows Azure Reporting Services, and also tap into datasets and data functions in the Windows Azure Marketplace.&lt;/p&gt;
&lt;p&gt;The key for the DBA with this option is that you &lt;em&gt;will&lt;/em&gt; have to do a little investigation this time, potentially without a specific workload in mind. I think that&amp;rsquo;s an acceptable thing to ask &amp;ndash; DBAs constantly keep up with data processing trends, and most will consider different ways to solve a problem.&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure SQL Databases&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/home/features/data-management/" target="_blank"&gt;http://www.windowsazure.com/en-us/home/features/data-management/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure Reporting Services&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/services/other/sql-reporting/" target="_blank"&gt;http://www.windowsazure.com/en-us/manage/services/other/sql-reporting/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;HDInsight Service (Hadoop on Azure): &lt;/span&gt;&lt;a href="https://www.hadooponazure.com/" target="_blank"&gt;https://www.hadooponazure.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;MongoDB Offerings on Windows Azure&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/" target="_blank"&gt;http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;Windows Azure Marketplace&lt;/span&gt;: &lt;a href="http://www.windowsazure.com/en-us/store/overview/" target="_blank"&gt;http://www.windowsazure.com/en-us/store/overview/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description></item><item><title>Microsoft Windows Azure Disaster Recovery Options for On-Premises SQL Server</title><link>http://sqlblog.com/blogs/buck_woody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx</link><pubDate>Tue, 08 Jan 2013 14:40:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47070</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;One of the use-cases for a cloud solution is to serve as a Disaster Recovery option for your on-premises servers. I&amp;rsquo;ll explain one particular use-case in this entry, specifically using Windows Azure &amp;ldquo;IaaS&amp;rdquo; or Virtual Machines as a Recovery Solution for SQL Server (more detail here: &lt;a href="http://www.windowsazure.com/en-us/home/features/virtual-machines/" target="_blank"&gt;http://www.windowsazure.com/en-us/home/features/virtual-machines/&lt;/a&gt;). In future installments I&amp;rsquo;ll explain options for other workloads such as Linux and Windows Servers, SharePoint and other solutions. Some architectures also allow for using Windows Azure SQL Database (Formerly SQL Azure) in recovery scenarios; I&amp;rsquo;ll cover that separately.&lt;/p&gt;
&lt;p&gt;Using Azure as a Disaster Recovery site gives you a range of options, uses world-wide datacenters that you can pick from, and does not require traditional licensing and maintenance paths. You can also integrate the offsite data into other uses, such as reporting (in some cases) or to leverage within other applications.&amp;nbsp; However, the cost-model is different, so make sure you do your homework to ensure that it makes sense to use a cloud provider for safety. You may find that it is cheaper, more expensive, or that you require a mix of technologies and options to get the best solution.&lt;/p&gt;
&lt;p style="padding-left:30px;"&gt;&lt;span style="color:#339966;"&gt;&lt;em&gt;NOTE: The Microsoft Windows Azure platform evolves constantly. That means new features and capabilities, as well as security, optimizations and more improve on a frequent basis. As with any cloud provider, ensure that you check the date of this post to ensure you are within six months or so. If the date is longer than that, then check each of the &amp;ldquo;Details&amp;rdquo; links to ensure you are working with the latest information. &lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The options you have range from simple off-site storage for database backups to systems that your users can access when your primary options are offline.&amp;nbsp; To select which options to use, evaluate the databases you want to protect, and then create your Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) for each workload. Those two vectors will provide the starting point for each choice you make.&lt;/p&gt;
&lt;p style="padding-left:30px;"&gt;&lt;em&gt;NOTE: If you&amp;rsquo;re not familiar with RPO and RTO on a database system, learn those terms carefully before designing a recovery solution &amp;ndash; on any platform. RPO and RTO are business/technology terms, and are not vendor or platform-specific. &lt;a href="http://wikibon.org/wiki/v/Recovery_point_objective_-_recovery_time_objective_strategy" target="_blank"&gt;http://wikibon.org/wiki/v/Recovery_point_objective_-_recovery_time_objective_strategy&lt;/a&gt;&amp;nbsp; &lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The range of protection you have is very similar to the on-premises options for SQL Server (on-premises details here: &lt;a href="http://msdn.microsoft.com/en-us/library/ms190202.aspx" target="_blank"&gt;http://msdn.microsoft.com/en-us/library/ms190202.aspx&lt;/a&gt;), with the primary limitation being bandwidth. While Microsoft has the largest connections we can get into our datacenters, depending on where your systems are and their connection to the Internet, you will need to consider how much data you transfer, and how often.&amp;nbsp; For backup files, a single, larger transfer is acceptable; when using Log Shipping or Database Mirroring, smaller, more frequent transfers are preferable.&lt;/p&gt;
&lt;p&gt;Another limitation is controlling the hardware on the Windows Azure Virtual Machine. That means hardware-based clustering isn&amp;rsquo;t possible, as of this writing. You&amp;rsquo;re also limited to the size of the Virtual Machines that Windows Azure (or any other cloud provider) offers. It&amp;rsquo;s important to keep in mind that you&amp;rsquo;re building a Disaster Recovery solution, not necessarily a full Highly-Available system. The difference is that in this case DR provides a means to recover and operate in a more limited fashion than a full on-premises HA solution (with matching hardware and licenses) would provide. Storage, however, isn&amp;rsquo;t as affected. You can mount large amounts of storage on a Windows Azure Virtual Machine, so it&amp;rsquo;s more memory and CPU that you need to consider for your solution.&lt;/p&gt;
&lt;p&gt;The final consideration is security. There are two aspects of security that you need to consider: data security, and authentication and access. For the first consideration, the Windows Azure system does hold multiple certifications and attestations that you can find here:&amp;nbsp; . In some cases those certifications are agreements on which part of security each party will hold liability for, so it&amp;rsquo;s important to carefully read and understand what the agreement states. There are also methods of encrypting data (such as the backups) using your own certificates or hardware devices and then storing them externally. This means no one can easily decrypt your data.&lt;/p&gt;
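&lt;p&gt;To make the bandwidth consideration above concrete, here is a rough sketch in Python that estimates how long a transfer takes for a given payload size and uplink speed. The function name, the 50 GB and 0.1 GB payloads, the 20 Mbps link and the 0.8 efficiency factor are all illustrative assumptions, not measurements; real throughput depends on your own connection.&lt;/p&gt;
&lt;pre&gt;
```python
def transfer_seconds(size_gb, link_mbps, efficiency=0.8):
    """Estimate the seconds needed to move size_gb gigabytes over a link.

    link_mbps is the nominal uplink speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved.
    """
    size_megabits = size_gb * 1000 * 8  # decimal gigabytes to megabits
    return size_megabits / (link_mbps * efficiency)


# Hypothetical workloads: one large full backup versus a small log backup.
full_backup = transfer_seconds(size_gb=50, link_mbps=20)
log_backup = transfer_seconds(size_gb=0.1, link_mbps=20)

print(f"Full backup: about {full_backup / 3600:.1f} hours")
print(f"Log backup: about {log_backup:.0f} seconds")
```
&lt;/pre&gt;
&lt;p&gt;On these assumed numbers the full backup needs roughly seven hours while the small log backup moves in under a minute, which is why the frequency of transfer matters as much as the total volume.&lt;/p&gt;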
&lt;p&gt;For the authentication portion, you can create a secure &amp;ldquo;tunnel&amp;rdquo;&amp;nbsp; between your network and Windows Azure. This involves a certificate that is installed on your hardware firewall at your facility, and an agent that is enabled with the same certificate on Windows Azure. This gives you a &amp;ldquo;point to point&amp;rdquo; connection, encrypted but over a public connection. From there you can use Active Directory to connect the authentication for the systems involved in the DR solution.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Backups&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;The first and simplest DR solution using Windows Azure is to store your backup files (&lt;em&gt;*.bak&lt;/em&gt;) in Windows Azure storage. Windows Azure Storage is triple-redundant across multiple fault-domains within a single datacenter, and then all three copies are replicated to a geographically separate (although data-sovereignty same) location. That translates to six copies of data stored remotely. In case of a disaster, you connect to storage, download the images, and restore them to a new server. The server can have the same name or a different one, and unless you&amp;rsquo;re using contained databases, you&amp;rsquo;ll need to re-create and re-authorize the security accounts needed for the database.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/6740.HADR1.png"&gt;&lt;img src="http://sqlblog.com/resized-image.ashx/__size/550x0/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/6740.HADR1.png" alt="" width="353" height="89" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Note that you also have the option of using an &amp;ldquo;appliance&amp;rdquo;, which is a piece of hardware you install at your facility which will act as a backup device or share location (or both). The device handles the encryption, de-duplication and compression for the files, and then stores those files on Windows Azure. More information on that option is here: &lt;a href="http://www.storsimple.com/" target="_blank"&gt;http://www.storsimple.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RPO: As of last backup&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RTO: (Time of transfer from Windows Azure + Time of Restore to New System + Bringing System Online with User Accounts) - Time of Backup&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;More detail on storing files on Windows Azure: &lt;a href="http://sqlblog.com/blogs/sqlos_team/archive/2013/01/24/backup-and-restore-to-cloud-simplified-in-sql-server-2012-sp1-cu2.aspx" target="_blank"&gt;http://sqlblog.com/blogs/sqlos_team/archive/2013/01/24/backup-and-restore-to-cloud-simplified-in-sql-server-2012-sp1-cu2.aspx&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Free Client: &lt;a href="http://azurestorageexplorer.codeplex.com/" target="_blank"&gt;http://azurestorageexplorer.codeplex.com/&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Database Mirroring&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Database Mirroring is a deprecated feature in SQL Server, which means it will be removed in a future release. It is, however, still supported in SQL Server 2012, and it can be used between on-premises SQL Server Instances and Windows Azure VMs.&amp;nbsp; Using connection strings and .NET languages, clients can actually point to the partner server automatically.&lt;/p&gt;
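&lt;p&gt;The client redirection mentioned above relies on the Failover Partner keyword in a SqlClient-style connection string. The Python sketch below only assembles such a string; the helper name, both server names and the database name are hypothetical, and Integrated Security is an assumption that fits the Active Directory option rather than the certificate option.&lt;/p&gt;
&lt;pre&gt;
```python
def mirrored_connection_string(principal, mirror, database):
    """Compose a SqlClient-style connection string for a mirrored database.

    principal and mirror are server names; the Failover Partner keyword
    tells the client which server to try if the principal is unavailable.
    """
    parts = {
        "Server": principal,
        "Failover Partner": mirror,
        "Database": database,
        "Integrated Security": "True",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Hypothetical names: an on-premises principal and a Windows Azure VM mirror.
print(mirrored_connection_string("onprem-sql01", "azurevm-sql01.cloudapp.net", "SalesDB"))
```
&lt;/pre&gt;
&lt;p&gt;A .NET client given this string connects to the principal first and falls back to the partner when the principal cannot be reached.&lt;/p&gt;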
&lt;p&gt;The granularity of this solution is at the individual database level.&amp;nbsp; Machines can retain their individual identities. You can use certificates to connect the systems, or you can use the point-to-point solution and Active Directory.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/6874.HADR3.png"&gt;&lt;img src="http://sqlblog.com/resized-image.ashx/__size/550x0/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/6874.HADR3.png" alt="" width="354" height="133" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are limitations, however. You won&amp;rsquo;t use a Listener in this configuration, and you&amp;rsquo;ll be using Asynchronous mode. If you are not running in the same Active Directory, you&amp;rsquo;ll also need to factor in the time to re-create and tie out those accounts when calculating the RTO value.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;a href="http://sqlblog.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/4212.HADR2.png"&gt;&lt;img src="http://sqlblog.com/resized-image.ashx/__size/550x0/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/4212.HADR2.png" alt="" width="332" height="130" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RPO: As of last good synchronization&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RTO: (Time of failure + Time of client redirect to New System ) - Time of last good synchronization&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;A complete tutorial on setting up this configuration is here: &lt;a href="http://msdn.microsoft.com/en-us/library/jj870964.aspx" target="_blank"&gt;http://msdn.microsoft.com/en-us/library/jj870964.aspx&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Log Shipping&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Another feature available for DR in a Hybrid fashion is using Log Shipping, which also protects your system at a database level. Log shipping involves an automated log backup of your database, and the log is copied and then applied at the secondary server. Because the log file is copied to a Windows share, this solution requires both networking access and an Active Directory integration.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/3146.HADR4.png"&gt;&lt;img src="http://sqlblog.com/resized-image.ashx/__size/550x0/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/3146.HADR4.png" alt="" width="429" height="180" border="0" /&gt;&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RPO: As of last good log backup application to the secondary system&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RTO: (Time of failure + Time of manual client redirect to New System + Time of Manual Failover ) - Time of last good log backup&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Log Shipping information is here: &lt;a href="http://technet.microsoft.com/en-us/library/ms187103.aspx" target="_blank"&gt;http://technet.microsoft.com/en-us/library/ms187103.aspx&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;AlwaysOn Availability Groups&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;SQL Server 2012 introduces a new set of features called &amp;ldquo;AlwaysOn&amp;rdquo; that encompass many of the HA/DR features in previous releases. One feature within that set is called &amp;ldquo;Availability Groups&amp;rdquo;, and with certain caveats that feature is available for a Hybrid on-premises to Windows Azure VM solution.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/7183.HADR5.png"&gt;&lt;img src="http://sqlblog.com/resized-image.ashx/__size/550x0/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/7183.HADR5.png" alt="" width="390" height="136" border="0" /&gt;&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;AlwaysOn Availability Groups require a Windows Server Failover Cluster (WSFC), which is where the caveats come into play. You&amp;rsquo;re able to set up a multi-subnet WSFC cluster, but you won&amp;rsquo;t have access to the Availability Group Listener function, so you need to plan how clients will reconnect after a failover.&lt;/p&gt;
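&lt;p&gt;Without a Listener, one way to handle reconnection is for the client to hold an ordered list of replica names and try each in turn. The Python sketch below shows only that idea. The server names and the connect callable are hypothetical placeholders, not part of any SQL Server client library; a real application would plug in its own driver call and error types.&lt;/p&gt;
&lt;pre&gt;
```python
def connect_to_first_available(servers, connect):
    """Try each server in order and return (server, connection).

    servers: ordered list of host names, preferred primary first (hypothetical).
    connect: caller-supplied function that opens a connection or raises
             ConnectionError; it stands in for a real database driver call.
    """
    errors = []
    for server in servers:
        try:
            return server, connect(server)
        except ConnectionError as exc:
            # Remember why this replica was skipped, then try the next one.
            errors.append(f"{server}: {exc}")
    raise ConnectionError("no replica reachable: " + "; ".join(errors))


def fake_connect(server):
    """Stand-in driver for the demo: pretend the on-premises primary is down."""
    if server == "onprem-sql01":
        raise ConnectionError("primary unavailable")
    return f"connection to {server}"


chosen, conn = connect_to_first_available(
    ["onprem-sql01", "azure-sql02"], fake_connect
)
print(chosen, conn)
```
&lt;/pre&gt;
&lt;p&gt;This sketch only checks reachability. It does not verify which replica currently holds the primary role, so it suits a manual failover where the old primary is really offline.&lt;/p&gt;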
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RPO: As of last good synchronization&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RTO: (Time of failure + Time of manual client redirect to New System + Time of Manual Failover ) - Time of last good synchronization&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References: &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;A complete tutorial on setting up this configuration is here: &lt;a href="http://msdn.microsoft.com/en-us/library/jj870959.aspx" target="_blank"&gt;http://msdn.microsoft.com/en-us/library/jj870959.aspx&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Other Solution Options&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Taking an overview approach, you can use other data transfer mechanisms. While these involve more manual coding and architecture, you do have more control. For instance, you could copy the data to multiple locations and platforms, and allow reading and manipulation of the data at the destination. You can use code options, Windows Azure Data Sync (&lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh456371.aspx" target="_blank"&gt;http://msdn.microsoft.com/en-us/library/windowsazure/hh456371.aspx&lt;/a&gt;), or even SQL Server Replication (a blog on this process is here: &lt;a href="http://tk.azurewebsites.net/2012/07/17/how-to-setup-peer-to-peer-replication-in-azure-iaas-sql-server-2012/" target="_blank"&gt;http://tk.azurewebsites.net/2012/07/17/how-to-setup-peer-to-peer-replication-in-azure-iaas-sql-server-2012/&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RPO: Varies&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#0000ff;"&gt;&lt;em&gt;RTO: Varies&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="color:#993300;"&gt;References:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;A whitepaper on the information I've discussed throughout this article and other options is available here: &lt;a href="http://msdn.microsoft.com/en-us/library/jj870962.aspx" target="_blank"&gt;http://msdn.microsoft.com/en-us/library/jj870962.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;SQL AlwaysOn&amp;rdquo; Team Blog (where you may find more current information) is here: &lt;a href="http://blogs.msdn.com/b/sqlalwayson/" target="_blank"&gt;http://blogs.msdn.com/b/sqlalwayson/&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Using Hadoop (HDInsight) with Microsoft - Two (OK, Three) Options</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/12/04/using-hadooop-hdinsight-with-microsoft-two-ok-three-options.aspx</link><pubDate>Tue, 04 Dec 2012 15:28:23 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:46509</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;Microsoft has many tools for &amp;ldquo;Big Data&amp;rdquo;. In fact, you need many tools &amp;ndash; there&amp;rsquo;s no product called &amp;ldquo;Big Data Solution&amp;rdquo; in a shrink-wrapped box &amp;ndash; if you find one, you probably shouldn&amp;rsquo;t buy it. It&amp;rsquo;s tempting to want a single tool that handles everything in a problem domain, but with large, complex data, that isn&amp;rsquo;t a reality. You&amp;rsquo;ll mix and match several systems, open and closed source, to solve a given problem.&lt;/p&gt;
&lt;p&gt;But there are tools that help with handling data at large, complex scales. Normally the best way to do this is to break up the data into parts, and then put the calculation engines for that chunk of data right on the node where the data is stored. These systems are in a family called &amp;ldquo;Distributed File and Compute&amp;rdquo;. Microsoft has a couple of these, including the &lt;a href="http://www.microsoft.com/hpc/en/us/default.aspx"&gt;High Performance Computing edition of Windows Server&lt;/a&gt;. Recently we partnered with &lt;a href="http://hortonworks.com/"&gt;Hortonworks&lt;/a&gt; to bring the &lt;a href="http://hadoop.apache.org/"&gt;Apache Foundation&amp;rsquo;s release of Hadoop&lt;/a&gt; to Windows. And as it turns out, there are actually two (technically three) ways you can use it.&lt;/p&gt;
&lt;p style="padding-left:30px;"&gt;&lt;span style="color:#993300;"&gt;&lt;em&gt;(A more detailed set of information is here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx"&gt;&lt;span style="color:#993300;"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx&lt;/span&gt;&lt;/a&gt;; I&amp;rsquo;ll cover the options at a general level below.)&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;
&lt;h1&gt;First Option: Windows Azure HDInsight Service&lt;/h1&gt;
&lt;p&gt;Your first option is to log on to a Hadoop control node and run Pig or Hive statements against data that you have stored in Windows Azure. There&amp;rsquo;s nothing to set up (although you can configure things where needed): you send the commands, get the output of the job(s), and stop using the service when you are done &amp;ndash; and repeat the process later if you wish.&lt;/p&gt;
&lt;p&gt;(There are also connectors to run jobs from Microsoft Excel, but that&amp;rsquo;s another post)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;a href="http://sqlblog.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/0572.option_2D00_1.png"&gt;&lt;img src="http://sqlblog.com/resized-image.ashx/__size/550x0/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/0572.option_2D00_1.png" alt="" width="367" height="212" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This option is useful when you have a periodic burst of work for a Hadoop workload, or when the data is already being collected in Windows Azure storage anyway. That data might come from a web application, the logs from a web application, &lt;a href="http://en.wikipedia.org/wiki/Telemetry"&gt;telemetry&lt;/a&gt; (remote sensor input), and other modes of constant collection.&lt;/p&gt;
&lt;p&gt;You can read more about this option here: &lt;a href="http://blogs.msdn.com/b/windowsazure/archive/2012/10/24/getting-started-with-windows-azure-hdinsight-service.aspx"&gt;http://blogs.msdn.com/b/windowsazure/archive/2012/10/24/getting-started-with-windows-azure-hdinsight-service.aspx&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Second Option: Microsoft HDInsight Server&lt;/h1&gt;
&lt;p&gt;Your second option is to use the Hadoop distribution for on-premises Windows, called Microsoft HDInsight Server. You set up the Name Node(s), Job Tracker(s), and Data Node(s), among other components, and you have control over the entire ecosystem.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/7041.option_2D00_2.png"&gt;&lt;img src="http://sqlblog.com/resized-image.ashx/__size/550x0/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/7041.option_2D00_2.png" alt="" width="152" height="179" border="0" /&gt;&lt;/a&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;This option is useful if you want to have complete control over the system, want to leave it running all the time, or have a huge quantity of data that you have to bulk-load constantly &amp;ndash; something that isn&amp;rsquo;t going to be practical with a network transfer or disk-mailing scheme.&lt;/p&gt;
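&lt;p&gt;A rough back-of-the-envelope calculation shows why network transfer stops being practical at some point. The Python sketch below uses hypothetical figures (10 TB of new data and a sustained 100 Mbit/s uplink, in decimal units) and ignores protocol overhead and compression; substitute your own numbers.&lt;/p&gt;
&lt;pre&gt;
```python
def transfer_hours(data_terabytes, link_megabits_per_second):
    """Hours needed to push the data over the link at full sustained speed."""
    bits = data_terabytes * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (link_megabits_per_second * 1e6)
    return seconds / 3600


hours = transfer_hours(10, 100)               # hypothetical workload and link
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```
&lt;/pre&gt;
&lt;p&gt;With those assumed figures the upload takes a little over nine days, so a data set that grows by that much every week could never catch up over the wire.&lt;/p&gt;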
&lt;p&gt;You can read more about this option here: &lt;a href="http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx"&gt;http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Third Option (unsupported): Installation on Windows Azure Virtual Machines&lt;/h1&gt;
&lt;p&gt;Although unsupported, you could simply use a Windows Azure Virtual Machine (we support both Windows and Linux servers) and install Hadoop yourself &amp;ndash; it&amp;rsquo;s open-source, so there&amp;rsquo;s nothing preventing you from doing that.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;a href="http://sqlblog.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/0121.option_2D00_3.png"&gt;&lt;img src="http://sqlblog.com/resized-image.ashx/__size/550x0/__key/communityserver-blogs-components-weblogfiles/00-00-00-79-79/0121.option_2D00_3.png" alt="" width="326" height="188" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Aside from being unsupported, there are other issues you&amp;rsquo;ll run into with this approach &amp;ndash; primarily involving performance and the amount of configuration you&amp;rsquo;ll need to do to access the data nodes properly. But for a single-node installation (where all components run on one system) such as learning, demos, training and the like, this isn&amp;rsquo;t a bad option.&lt;/p&gt;
&lt;p&gt;Did I mention that&amp;rsquo;s unsupported? :) &lt;/p&gt;
&lt;p&gt;You can learn more about Windows Azure Virtual Machines here: &lt;a href="http://www.windowsazure.com/en-us/home/scenarios/virtual-machines/"&gt;http://www.windowsazure.com/en-us/home/scenarios/virtual-machines/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And more about Hadoop and the installation/configuration (on Linux) here: &lt;a href="http://en.wikipedia.org/wiki/Apache_Hadoop"&gt;http://en.wikipedia.org/wiki/Apache_Hadoop&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And more about the HDInsight installation here: &lt;a href="http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW"&gt;http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Choosing the right option&lt;/h1&gt;
&lt;p&gt;Since you have two or three routes you can go, the best thing to do is evaluate the need you have, and place the workload where it makes the most sense.&amp;nbsp; My suggestion is to install the HDInsight Server locally on a test system, and play around with it. Read up on the best ways to use Hadoop for a given workload, understand the parts, write a little Pig and Hive, and get your feet wet. Then sign up for a test account on HDInsight Service, and see how that leverages what you know. If you're a true tinkerer, go ahead and try the VM route as well. &lt;/p&gt;
&lt;p&gt;Oh - there&amp;rsquo;s another great reference on Windows Azure HDInsight that just came out, here: &lt;a href="http://blogs.msdn.com/b/brunoterkaly/archive/2012/11/16/hadoop-on-azure-introduction.aspx"&gt;http://blogs.msdn.com/b/brunoterkaly/archive/2012/11/16/hadoop-on-azure-introduction.aspx&lt;/a&gt;&lt;/p&gt;</description></item><item><title>The Data Scientist</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/11/15/the-data-scientist.aspx</link><pubDate>Tue, 15 Nov 2011 15:00:18 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39814</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I &lt;em&gt;think&lt;/em&gt; it means, and why I’m excited about it.&lt;/p&gt;  &lt;p&gt;In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. &lt;/p&gt;  &lt;p&gt;The term Data Scientist (as far as I can make out this early in its use) describes someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. 
&lt;/p&gt;  &lt;p&gt;I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Persistence&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? &lt;/p&gt;  &lt;p&gt;This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization.&lt;/p&gt;  &lt;p&gt;Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. 
&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Processing&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. &lt;/p&gt;  &lt;p&gt;A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Presentation&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. &lt;/p&gt;  &lt;p&gt;This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. 
Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understand why the statistical formula works the way it does. &lt;/p&gt;  &lt;p&gt;And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Why I’m excited&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. &lt;/p&gt;  &lt;p&gt;Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. &lt;/p&gt;  &lt;p&gt;So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop, that allow me to explore. And I’m happy to share what I learn along the way. 
&lt;/p&gt;</description></item><item><title>Big Data and the Cloud - More Hype or a Real Workload?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/10/18/big-data-and-the-cloud-more-hype-or-a-real-workload.aspx</link><pubDate>Tue, 18 Oct 2011 13:57:36 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39156</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;Last week Microsoft announced several new offerings for “Big Data” - and since I’m a stickler for definitions, I wanted to make sure I understood what that really means. What is “Big Data”? What size hard drive is that? After all, my laptop has 1TB of storage - is my laptop “Big Data”?&lt;/p&gt;  &lt;p&gt;There are actually a few definitions for this term, most notably those involving the &lt;a href="http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data" target="_blank"&gt;“Four V’s” Volume, Velocity, Variety and Variability&lt;/a&gt;. Others &lt;a href="http://nosql.mypopescu.com/post/10120087314/big-data-and-the-4-vs-volume-velocity-variety" target="_blank"&gt;disagree with this&lt;/a&gt; definition. I tend to try and get things into their simplest form, so I’m using this definition for myself:&lt;/p&gt;  &lt;p align="center"&gt;&lt;font color="#c0504d" size="3"&gt;Big data is defined as a &lt;em&gt;large set &lt;/em&gt;of &lt;em&gt;computationally expensive &lt;/em&gt;data that is &lt;em&gt;worked on simultaneously&lt;/em&gt;.&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;Let me flesh that out a&amp;#160; little. To be sure, “Big Data” has a larger size than say a few megabytes. The reason this is important is that it takes special hardware to be able to move large sets of data around, store it, process it and so on. (&lt;font color="#c0504d"&gt;large set&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;If you store a LOT of data, but only use a small portion of it at a time, that really isn’t super-hard to do. It’s mainly a storage issue at that point. 
But, if you do need to work with a large portion of the data at one time, then the memory, CPU and transfer components of the system have to adapt to be responsive - new ways to work with that data (game theory, knot-algorithms, map-reduce, etc.) need to be brought into play. (&lt;font color="#c0504d"&gt;computationally expensive&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;Once that data is loaded into the processing area (memory or whatever other mechanism is used) it must be worked on in parallel to come back in a reasonable time. You have two options here - you can scale the system up with more internal hardware (CPU’s, memory and so on) or you can scale it out to have multiple systems work on it at the same time using paradigms such as map/reduce and so on. Actually, when you lay this out in an architecture diagram, scale up or out doesn’t actually change the logical structure of the process - in scale out the network becomes the bus, and the nodes become more RAM and computing power. Of course, there are changes in code for how you stitch the workload back together. (&lt;font color="#c0504d"&gt;worked on simultaneously&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;So back to the original question. Is Big Data, as I have defined it here, a workload for Windows and SQL Azure? Absolutely! In fact, it’s probably one of the main workloads, and I believe it represents the latest, and perhaps also the earliest frontier of computing. Jim &lt;a href="http://research.microsoft.com/en-us/um/people/gray/" target="_blank"&gt;Gray, a former researcher here at Microsoft and a hero of mine, was working on this very topic.&lt;/a&gt; I believe as he did - all computing is simply an interface over data. &lt;/p&gt;  &lt;p&gt;Microsoft has multiple offerings on the topic of Big Data. In posts that follow from myself and my co-workers, we’ll explore when and where you use each one. 
Whether you are a data professional or a developer, this is the new frontier - &lt;a href="http://www.straightpathsql.com/archives/2011/10/microsoft-loves-your-big-data/" target="_blank"&gt;don’t wait to educate yourself&lt;/a&gt; on how to leverage Big Data for your organization. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Hadoop on Windows Azure and SQL Server&amp;#160; &lt;/strong&gt;- Microsoft’s &lt;a href="http://www.hortonworks.com/the-whys-behind-the-microsoft-and-hortonworks-partnership/" target="_blank"&gt;partnership to include Hadoop workloads on Windows Azure&lt;/a&gt; and &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584" target="_blank"&gt;SQL Server/Parallel Data Warehouse (PDW)&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;LINQ to HPC &lt;/strong&gt;- Microsoft’s High-Performance Computing SKU of &lt;a href="http://blogs.technet.com/b/windowshpc/archive/2011/05/20/dryad-becomes-linq-to-hpc.aspx" target="_blank"&gt;HPC is now in Azure&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Windows Azure Table Storage &lt;/strong&gt;- A &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx" target="_blank"&gt;key/value pair type storage with full partitioning&lt;/a&gt; that is immediately consistent, able to handle huge loads of data and works with any REST-compatible language&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;strong&gt;Other offerings &lt;/strong&gt;- Including the new &lt;a href="http://www.microsoft.com/en-us/sqlazurelabs/default.aspx" target="_blank"&gt;Data Explorer&lt;/a&gt;, &lt;a href="http://research.microsoft.com/en-us/news/headlines/daytona-071811.aspx" target="_blank"&gt;Project Daytona (with a Big Data Toolkit for Scientists and researchers)&lt;/a&gt;, &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/SQL-Server-2012-breakthrough-insight.aspx" target="_blank"&gt;Power View&lt;/a&gt; and more. &lt;/p&gt;  &lt;p&gt;The era of Big Data is here. 
And you can use Windows and SQL Azure to bring it to your organization. &lt;/p&gt;</description></item><item><title>SQL Azure Use Case: Shared Storage Application</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/04/26/sql-azure-use-case-shared-storage-application.aspx</link><pubDate>Tue, 26 Apr 2011 13:33:50 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:35207</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;&lt;span style="font-size:x-small;"&gt;&lt;em&gt;&lt;span style="font-size:small;"&gt;This is one in a series of posts on when and where to use a distributed architecture design in your organization's computing needs. You can find the main post here: &lt;/span&gt;&lt;a href="http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx"&gt;&lt;span style="font-size:small;"&gt;&lt;u&gt;&lt;font color="#800080"&gt;http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx&lt;/font&gt;&lt;/u&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size:small;"&gt; &lt;/span&gt;&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Description:&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-size:small;"&gt;On-premises data will be a part of computing for quite some time – perhaps permanently. Bandwidth requirements, security, or even financial considerations for large data sets often dictate that relational (or non-relational) systems will be maintained locally in many organizations, especially in enterprise computing. &lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-size:small;"&gt;But distributed data systems are useful in many situations. 
Organizations may wish to store a portion of data off-site, either for sharing the data with other applications (including web-based applications) or as a supplement to a High-Availability and Disaster Recovery (HADR) strategy.&lt;/span&gt;&lt;/p&gt; &lt;span style="font-size:small;"&gt;   &lt;p&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Implementation:&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;SQL Azure can be used to add an additional option to an HADR strategy by copying off portions (or all) of an on-premise database system.&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/3386.sql_2D00_aHADR_5F00_2.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="sql-aHADR" border="0" alt="sql-aHADR" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/4265.sql_2D00_aHADR_5F00_thumb.png" width="298" height="181" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;In this arrangement, on-premise systems remain as they are. Data is replicated using many technologies, such as SQL Server Integration Services (SSIS), scripts, or Microsoft’s Sync Framework to a SQL Azure database. 
This data can be kept “cold”, meaning that a manual process is required to bring the data back, or as a “warm” standby using connection string management in the application.&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;Recently we architected a solution where a company kept a rolling two-week window of data replicated to SQL Azure using the &lt;a href="http://msdn.microsoft.com/en-us/sync/default.aspx" target="_blank"&gt;Sync Framework&lt;/a&gt;. The application, a compiled EXE running on user’s systems, had a “switch connections” button, that allowed the users to take a laptop to another location, select that option, and continue working from anywhere they had Internet connectivity. This required forethought and planning, and did not replace their primary HADR systems, but it did allow them to continue operations in the case of a severe outage at multiple sites. Since they are an emergency services provider, this gave them the highest redundancy.&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;Another option is to amalgamate data from disparate sources. &lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/6320.sql_2D00_aHyb_5F00_2.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="sql-aHyb" border="0" alt="sql-aHyb" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/2625.sql_2D00_aHyb_5F00_thumb.png" width="342" height="134" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;In this arrangement, two or more data services (one of which is SQL Azure) are accessed by a single program. 
The program queries each system independently, and using LINQ a single query can work across all of the data, assuming there is some sort of natural or artificial “key” that can join the data sets together. The user programs simply see the combined result as a single data source, unaware of the underlying data sets. This allows great flexibility and agility in the downstream program. The upstream data sources can change as long as the elements are kept consistent.&lt;/span&gt;&lt;/p&gt;    &lt;p&gt;&lt;span style="font-size:small;"&gt;There are performance and security implications to amalgamated data systems, but if architected carefully they provide multiple benefits. A few of these are that other systems can access the individual data sources, reporting is simplified and standardized, and multiple copies of data are eliminated.&lt;/span&gt;&lt;/p&gt;   &lt;span style="font-size:small;"&gt;     &lt;p&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Resources:&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;      &lt;p&gt;&lt;span style="font-size:small;"&gt;You can read more about the Sync Framework and SQL Azure here: &lt;a href="http://social.technet.microsoft.com/wiki/contents/articles/sync-framework-sql-server-to-sql-azure-synchronization.aspx"&gt;http://social.technet.microsoft.com/wiki/contents/articles/sync-framework-sql-server-to-sql-azure-synchronization.aspx&lt;/a&gt;&amp;#160;&lt;/span&gt;&lt;/p&gt;      &lt;p&gt;&lt;span style="font-size:small;"&gt;If you are new to LINQ, you can find more resources on it here: &lt;a href="http://msdn.microsoft.com/en-us/library/bb308959.aspx"&gt;http://msdn.microsoft.com/en-us/library/bb308959.aspx&lt;/a&gt;&amp;#160;&lt;/span&gt;&lt;/p&gt;   &lt;/span&gt;&lt;/span&gt;</description></item></channel></rss>