<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Buck Woody : Data Professional, Azure Use Cases</title><link>http://sqlblog.com/blogs/buck_woody/archive/tags/Data+Professional/Azure+Use+Cases/default.aspx</link><description>Tags: Data Professional, Azure Use Cases</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Big Data and the Cloud - More Hype or a Real Workload?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/10/18/big-data-and-the-cloud-more-hype-or-a-real-workload.aspx</link><pubDate>Tue, 18 Oct 2011 13:57:36 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39156</guid><dc:creator>BuckWoody</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/39156.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=39156</wfw:commentRss><description>&lt;p&gt;Last week Microsoft announced several new offerings for “Big Data” - and since I’m a stickler for definitions, I wanted to make sure I understood what that really means. What is “Big Data”? What size hard drive is that? After all, my laptop has 1TB of storage - is my laptop “Big Data”?&lt;/p&gt;  &lt;p&gt;There are actually a few definitions for this term, most notably those involving the &lt;a href="http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data" target="_blank"&gt;“Four V’s” Volume, Velocity, Variety and Variability&lt;/a&gt;. Others &lt;a href="http://nosql.mypopescu.com/post/10120087314/big-data-and-the-4-vs-volume-velocity-variety" target="_blank"&gt;disagree with this&lt;/a&gt; definition. I tend to try and get things into their simplest form, so I’m using this definition for myself:&lt;/p&gt;  &lt;p align="center"&gt;&lt;font color="#c0504d" size="3"&gt;Big data is defined as a &lt;em&gt;large set &lt;/em&gt;of &lt;em&gt;computationally expensive &lt;/em&gt;data that is &lt;em&gt;worked on simultaneously&lt;/em&gt;.&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;Let me flesh that out a&amp;#160; little. To be sure, “Big Data” has a larger size than say a few megabytes. The reason this is important is that it takes special hardware to be able to move large sets of data around, store it, process it and so on. (&lt;font color="#c0504d"&gt;large set&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;If you store a LOT of data, but only use a small portion of it at a time, that really isn’t super-hard to do. It’s mainly a storage issue at that point. But, if you do need to work with a large portion of the data at one time, then the memory, CPU and transfer components of the system have to adapt to be responsive - new ways to work with that data (game theory, knot-algorithms, map-reduce, etc.) need to be brought into play. (&lt;font color="#c0504d"&gt;computationally expensive&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;Once that data is loaded into the processing area (memory or whatever other mechanism is used) it must be worked on in parallel to come back in a reasonable time. You have two options here - you can scale the system up with more internal hardware (CPU’s, memory and so on) or you can scale it out to have multiple systems work on it at the same time using paradigms such as map/reduce and so on. Actually, when you lay this out in an architecture diagram, scale up or out doesn’t actually change the logical structure of the process - in scale out the network becomes the bus, and the nodes become more RAM and computing power. Of course, there are changes in code for how you stitch the workload back together. (&lt;font color="#c0504d"&gt;worked on simultaneously&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;So back to the original question. Is Big Data, as I have defined it here, a workload for Windows and SQL Azure? Absolutely! In fact, it’s probably one of the main workloads, and I believe it represents the latest, and perhaps also the earliest frontier of computing. Jim &lt;a href="http://research.microsoft.com/en-us/um/people/gray/" target="_blank"&gt;Gray, a former researcher here at Microsoft and a hero of mine, was working on this very topic.&lt;/a&gt; I believe as he did - all computing is simply an interface over data. &lt;/p&gt;  &lt;p&gt;Microsoft has multiple offerings on the topic of Big Data. In posts that follow from myself and my co-workers, we’ll explore when and where you use each one. Whether you are a data professional or a developer, this is the new frontier - &lt;a href="http://www.straightpathsql.com/archives/2011/10/microsoft-loves-your-big-data/" target="_blank"&gt;don’t wait to educate yourself&lt;/a&gt; on how to leverage Big Data for your organization. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Hadoop on Windows Azure and SQL Server&amp;#160; &lt;/strong&gt;- Microsoft’s &lt;a href="http://www.hortonworks.com/the-whys-behind-the-microsoft-and-hortonworks-partnership/" target="_blank"&gt;partnership to include Hadoop workloads on Windows Azure&lt;/a&gt; and &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584" target="_blank"&gt;SQL Server/Parallel Data Warehouse (PDW)&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;LINQ to HPC &lt;/strong&gt;- Microsoft’s High-Performance Computing SKU of &lt;a href="http://blogs.technet.com/b/windowshpc/archive/2011/05/20/dryad-becomes-linq-to-hpc.aspx" target="_blank"&gt;HPC is now in Azure&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Windows Azure Table Storage &lt;/strong&gt;- A &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx" target="_blank"&gt;key/value pair type storage with full partitioning&lt;/a&gt; that is immediately consistent, able to handle huge loads of data and works with any REST-compatible language&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;strong&gt;Other offerings &lt;/strong&gt;- Including the new &lt;a href="http://www.microsoft.com/en-us/sqlazurelabs/default.aspx" target="_blank"&gt;Data Explorer&lt;/a&gt;, &lt;a href="http://research.microsoft.com/en-us/news/headlines/daytona-071811.aspx" target="_blank"&gt;Project Daytona (with a Big Data Toolkit for Scientists and researchers)&lt;/a&gt;, &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/SQL-Server-2012-breakthrough-insight.aspx" target="_blank"&gt;Power View&lt;/a&gt; and more. &lt;/p&gt;  &lt;p&gt;The era of Big Data is here. And you can use Windows and SQL Azure to bring it to your organization. &lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=39156" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Azure/default.aspx">Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Azure+Use+Cases/default.aspx">Azure Use Cases</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Career/default.aspx">Career</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Cloud/default.aspx">Cloud</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Cloud+Computing/default.aspx">Cloud Computing</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Concepts/default.aspx">Concepts</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Conferences/default.aspx">Conferences</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Data/default.aspx">Data</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Data+Professional/default.aspx">Data Professional</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/DBA/default.aspx">DBA</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Developer/default.aspx">Developer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Microsoft/default.aspx">Microsoft</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/PASS/default.aspx">PASS</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Policy+Based+Management/default.aspx">Policy Based Management</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Azure/default.aspx">SQL Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQLServer/default.aspx">SQLServer</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Storage/default.aspx">Storage</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Windows+Azure/default.aspx">Windows Azure</category></item><item><title>SQL Azure Use Case: Shared Data Hub</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/04/05/sql-azure-use-case-shared-data-hub.aspx</link><pubDate>Tue, 05 Apr 2011 14:10:50 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:34672</guid><dc:creator>BuckWoody</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/buck_woody/comments/34672.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/buck_woody/commentrss.aspx?PostID=34672</wfw:commentRss><description>&lt;p&gt;&lt;span style="font-size:x-small;"&gt;&lt;em&gt;&lt;span style="font-size:small;"&gt;This is one in a series of posts on when and where to use a distributed architecture design in your organization's computing needs. You can find the main post here: &lt;/span&gt;&lt;a href="http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx"&gt;&lt;span style="font-size:small;"&gt;&lt;u&gt;&lt;font color="#800080"&gt;http://blogs.msdn.com/b/buckwoody/archive/2011/01/18/windows-azure-and-sql-azure-use-cases.aspx&lt;/font&gt;&lt;/u&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size:small;"&gt; &lt;/span&gt;&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Description:&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;Organizations often need to share all or part of a data set, which is consumed by other systems. These systems can be on-premise or at another location, or even at a different organization. &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;Many times these systems use a well-defined data interchange system, such as EDI or other standards. In the case of a trusted system, simply using a direct connection into another database is the process used to transfer data. This process might be one-way or bi-directional.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;But there are systems that transfer data back and forth in stages using intermediate systems. A typical data flow in this case looks similar to the following:&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/7823.SADH_2D00_1_5F00_2.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="SADH-1" border="0" alt="SADH-1" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/7206.SADH_2D00_1_5F00_thumb.png" width="550" height="227" /&gt;&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;In this example, the owning system contains data set A. This is set to a staging system or server, where the receiving system collects it. The receiving system contains data set B and works with data set A to create a new data set, C. This new data is consumed by the original system to complete the cycle. A concrete example is an inventory control system. Data set A is the original inventory list, shipped to a manufacturer. The manufacturer consumes the inventory available, orders and components, and returns the ordering bid with any changes to the staging server as data set C. The data is consumed by the originating system and components are noted in the overall flow of data set A.&lt;/font&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font size="2"&gt;&lt;em&gt;Note: Normally this is solved with a full EDI implementation, but this process is still a common practice.&lt;/em&gt;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;font size="2"&gt;There are other examples, but the general concept is one where the need is for two, possibly untrusted systems to share a common source of data.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Implementation:&lt;/span&gt;&lt;/strong&gt; &lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;One possible solution is to segregate the data that is being transferred into an agree-upon set of entities that can be added or edited real-time, where both systems (or many) feed from the same data set instead of shipping the data. This removes latency, improves data quality, and shares the cost of the data. Also, security is increased because there are no shared logins - each firm gets its own.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/6232.SADH_2D00_2_5F00_2.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="SADH-2" border="0" alt="SADH-2" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/0044.SADH_2D00_2_5F00_thumb.png" width="412" height="283" /&gt;&lt;/a&gt;&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;One consideration with this layout is that the source systems must be altered to use a shared data set. If this is not possible, this is still a possibility as the system can be used as it was before - a data transfer - but the data can be cleansed real-time by both systems. It’s also a more secure and shared-cost system even if used in the original manner.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font size="2"&gt;&lt;strong&gt;&lt;span style="font-size:small;"&gt;Resources:&lt;/span&gt;&lt;/strong&gt;&lt;/font&gt;    &lt;p&gt;&lt;font size="2"&gt;Security is a concern in this arrangement, so it’s best to understand exactly how the security works in SQL Azure: &lt;a href="http://msdn.microsoft.com/en-us/library/ff394108.aspx"&gt;http://msdn.microsoft.com/en-us/library/ff394108.aspx&lt;/a&gt;&amp;#160;&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;Another possibility to solve this pattern is to use Data Sync, in many different arrangements that involve SQL Azure. You can learn more about it here: &lt;a href="http://blogs.msdn.com/b/sync/archive/2010/10/07/windows-azure-sync-service-demo-available-for-download.aspx"&gt;http://blogs.msdn.com/b/sync/archive/2010/10/07/windows-azure-sync-service-demo-available-for-download.aspx&lt;/a&gt;&lt;/p&gt;&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=34672" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Azure/default.aspx">Azure</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Azure+Use+Cases/default.aspx">Azure Use Cases</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Cloud/default.aspx">Cloud</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Cloud+Computing/default.aspx">Cloud Computing</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Concepts/default.aspx">Concepts</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Data/default.aspx">Data</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/Data+Professional/default.aspx">Data Professional</category><category domain="http://sqlblog.com/blogs/buck_woody/archive/tags/SQL+Azure/default.aspx">SQL Azure</category></item></channel></rss>