<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'Data' and 'Development'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=Data,Development&amp;orTags=0</link><description>Search results matching tags 'Data' and 'Development'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Cloud Computing Patterns: Using Data Transaction Commitment Models for Design</title><link>http://sqlblog.com/blogs/buck_woody/archive/2012/02/14/cloud-computing-patterns-using-data-transaction-commitment-models-for-design.aspx</link><pubDate>Tue, 14 Feb 2012 20:45:47 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:41744</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;There are multiple ways to store data in a cloud provider, specifically around Windows and SQL Azure. As part of a &amp;ldquo;Data First&amp;rdquo; architecture design, one decision vector &amp;ndash; assuming you&amp;rsquo;ve already done a data classification of the elements you want to store &amp;ndash; is to decide the transaction level you need for that datum.&amp;nbsp; Once you&amp;rsquo;ve decided on what level of transactional commitment you need, you can make intelligent decisions about the storage engine, method of access and storage, speed and other requirements.&lt;/p&gt;
&lt;p&gt;Although the list below is neither original nor exhaustive, these are the general considerations I use for a given data set. It&amp;rsquo;s important to note that in many on premises systems the engine choice at hand overrides these concerns. If you have a large Relational Database Management System (RDBMS) for instance, you might simply place all data there without further consideration. In a Platform as a Service (PaaS) like Windows and SQL Azure, however, selection of the proper engine for a particular dataset has implications ranging from cost to performance, and selecting the right engine is critical when you want to leverage the data across &amp;ldquo;Bid Data&amp;rdquo; analysis like Hadoop or other constructs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Monolithic Consistent Transactional&lt;/strong&gt;&lt;br /&gt;The first selection is analogous to a local RDBMS system. The dataset is retrieved in a functionally single, monolithic transaction, i.e. kept together with ACID properties in mind. This is the most reliable type of data design for datasets that require a high degree of safety in the read/write pattern. As an example, a bank ATM transaction should be modeled in a monolithic way. If I make a transfer of funds from one account to another, I want the money to be subtracted from one account if and only if it is successfully added to the other. The bank, on the other hand, wants the money added to the second account if and only if it is subtracted from the first. This is a prime example of a monolithic (single atomic transaction), Consistent (if and only if) and Transactional (as a unit, with provision for roll-back and reporting if unsuccessful) data requirement.&lt;/p&gt;
&lt;p&gt;The primary engine used for this type of data is often SQL Azure &amp;ndash; an RDMBS in the same datacenters as Windows Azure. Placing both the calling application, whether that is a Data Access Layer-based code widget or a direct call from a Web or Worker Role, means that data is retrieved quickly and in a monolithic way. The costs for this method is based on overall database size.&amp;nbsp; A consideration is how much data you can store this way. Database sizes have limits, although there are ways of overcoming size issues using technologies such as Sharding or SQL Azure Federations. There is also the consideration of performance. In an RDBMs that conforms to ACID properties, locking and other overhead for safety is at conflict with the highest possible read performance.&amp;nbsp; But in some cases the ACID properties are worth the cost, as in the banking example.&lt;/p&gt;
&lt;p&gt;You are not limited to SQL Azure in this model. Windows Azure Table storage, while similar to NoSQL offerings is different in that it is immediately consistent across all three replicated copies of data, offering a higher degree of safety. And while Table storage does not offer built-in support for transactions, there are ways to achieve certain transaction levels.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Monolithic Realtime&lt;/strong&gt;&lt;br /&gt;If consistency can be relaxed &amp;ndash; meaning that a guaranteed read/write patter is not essential &amp;ndash; then more options arise in Windows and SQL Azure. You can still use SQL Azure for this type of storage, with either automatic or programmatic hints allowing for &amp;ldquo;dirty reads&amp;rdquo;. Windows Azure Table storage is still consistent, but the selection of the method for querying the data such as separate copies of read and write data can be employed. Because of the relaxed transaction nature, higher speeds are possible by querying cached or separate datasets.&lt;/p&gt;
&lt;p&gt;An example here is that same transaction from the bank, but a statement inquiry. Just after the money is deposited, the user wishes to query the current balance. The current balance &amp;ndash; minus the transaction that just occurred &amp;ndash; is retrieved and shown to the user, perhaps even combining the amount with the latest transaction, perhaps saved as a local cached object, with a caveat to the user.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Distributed Realtime&lt;/strong&gt;&lt;br /&gt;At some point, the data becomes too large to fit inside a single processing session, and parallelism is used. In this case, either separate databases in SQL Azure or Windows Azure Tables, local data storage on the Web or Worker Role, or a combination of all with Caches is the right approach for the data design.&lt;/p&gt;
&lt;p&gt;The biggest implication in this type of system is speed &amp;ndash; a higher degree of data separation is essential, and so the dataset selection must fit the pattern. It is unacceptable to force an ACID-properties type workload into this environment. Typical examples here are the actual data asset payload for streaming video or music, read-only documents and so on. This pattern is often separated from the meta-data, which is kept in more of a transactional model.&lt;/p&gt;
&lt;p&gt;As an example, assume you log on to a website to watch a movie or listen to music. The provider needs to verify your identity and account balance, which are transactional data loads. After that process is complete, the workload shifts to a copy &amp;ndash; perhaps one of several &amp;ndash; of the asset to stream to your location.&lt;/p&gt;
&lt;p&gt;In this case, Windows Azure Blob storage, along with the Content Delivery Network (CDN &amp;ndash; a series of servers closer to the user) is employed along with the transactional realtime requirements for the metadata.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Distributed Eventual&lt;/strong&gt;&lt;br /&gt;At the furthest end of the data scale are large datasets that need deeper analysis, but not necessarily in realtime. Examples here are terrabytes of data requiring a Business Intelligence view, but with a tolerance of a few seconds to minutes or hours. In this case, Storage, Processing and Query methods, such as the Hadoop offering in Windows Azure, or perhaps the High Performance Computing (HPC) Windows Server in Windows Azure fit well.&amp;nbsp; Here, the design of the data is often dictated by the source, and more emphasis is placed on the algorithms around processing and re-assembling the data.&lt;/p&gt;
&lt;p&gt;There are, of course, other patterns. In many cases a single dataset may have needs in one or more of these categories &amp;ndash; in fact, sitting at 30,000 feet typing this entry, I&amp;rsquo;m having that very design discussion with a gentleman sitting next to me. The key is to design data-first, and fit the technology to the requirement for each datum. Allow each function and engine to handle the data in the most efficient, effective way for cost, performance and utility.&lt;/p&gt;</description></item><item><title>Rip and Replace or Extend and Embrace?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/09/13/rip-and-replace-or-extend-and-embrace.aspx</link><pubDate>Tue, 13 Sep 2011 11:20:05 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:38437</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;As most of you know, I don&amp;rsquo;t like the term &amp;ldquo;cloud&amp;rdquo; very&lt;br /&gt;much. It isn&amp;rsquo;t defined, which means it can be anything. I prefer &amp;ldquo;distributed&lt;br /&gt;computing&amp;rdquo;, which is more technically accurate and describes what you&amp;rsquo;re doing&lt;br /&gt;in more concrete terms.&lt;/p&gt;
&lt;p&gt;So when you think about Windows and SQL Azure, you don&amp;rsquo;t&lt;br /&gt;have to think about an entire product &amp;ndash; you can use parts of the system&lt;br /&gt;together or independently to accomplish what you need to do. You can use the&lt;br /&gt;computing functions, storage, and more and more I see folks leverage the&lt;br /&gt;Service Bus to enable current applications to expose things to the web.&lt;/p&gt;
&lt;p&gt;And that brings up the point of this post. Once you decide&lt;br /&gt;that a distributed architecture works to solve a problem, you&amp;rsquo;re faced with a&lt;br /&gt;decision: should you completely re-write your architecture to take advantage of&lt;br /&gt;the current systems or should you just fold in new code that makes the data or&lt;br /&gt;function available to the web?&lt;/p&gt;
&lt;p&gt;Of course, the answer is always &amp;ldquo;it depends&amp;rdquo; on the situation&lt;br /&gt;&amp;ndash; and it does. But unless you&amp;rsquo;re fixing a problem with current code, I usually&lt;br /&gt;advocate a migration approach. That means at the very least retaining the&lt;br /&gt;business logic (again, unless it&amp;rsquo;s not currently working) and as much of the&lt;br /&gt;code as you can. In fact, if you follow this paradigm, you&amp;rsquo;re on your way to&lt;br /&gt;making a Service Bus out of the functions you currently have. You can expose&lt;br /&gt;the results of a system rather than opening the system up. Let&amp;rsquo;s take an&lt;br /&gt;example.&lt;/p&gt;
&lt;p&gt;Assume for a moment that you have an order-taking system&lt;br /&gt;on-premise. That system performs many functions, one of which might creating a&lt;br /&gt;Purchase Order. Your system might be enclosed, meaning that it has an&lt;br /&gt;application that talks to a middle-tier, and then from there to a database&lt;br /&gt;system. A query is generated from a screen, and passed along to eventually&lt;br /&gt;compute, store and return a Purchase Order Number, along with other&lt;br /&gt;information. Imagine now that you wire up the code not only to return the PO&lt;br /&gt;number to the client, but to make that number available on an endpoint &amp;ndash;&lt;br /&gt;actually really not that hard to do.&lt;/p&gt;
&lt;p&gt;Now you can make that PO number available to the web using&lt;br /&gt;Azure. You could restrict who can make that call to the system, or open it up&lt;br /&gt;to a broader audience. Or instead of the PO Number, you could make a product&lt;br /&gt;list available. And you can go further than that &amp;ndash; EBay, for instance, uses the&lt;br /&gt;OData protocol (which is very cool in and of itself) which you can query from&lt;br /&gt;the web. You could compare your company&amp;rsquo;s product catalog to what is on EBay,&lt;br /&gt;and list the items you have there if there are no competitors in that space.&lt;br /&gt;And on and on it goes.&lt;/p&gt;
&lt;p&gt;So the point is this &amp;ndash; where you can, retain what works.&lt;br /&gt;Fold in systems like Azure where they make sense. Extend and Embrace.&lt;/p&gt;</description></item><item><title>Cloud Computing and the Importance of Code Diagrams</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/05/03/cloud-computing-and-the-importance-of-code-diagrams.aspx</link><pubDate>Tue, 03 May 2011 13:59:20 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:35407</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;Most mature development shops use various code diagrams to give a symbolic representation of high-level and database code structures. Standards such as &lt;a href="http://www.bpmb.de/images/BPMN2_0_Poster_EN.pdf" target="_blank"&gt;Business Process Model Notation&lt;/a&gt; (BPMN), &lt;a href="http://www.informit.com/guides/content.aspx?g=sqlserver&amp;amp;seqNum=62" target="_blank"&gt;Entity Relationship Diagrams&lt;/a&gt; (ERD) and the &lt;a href="http://uml.org/" target="_blank"&gt;Unified Modeling Language&lt;/a&gt; (UML) are a few I use all the time. &lt;/p&gt;  &lt;p&gt;In the Distributed Computing (Cloud Computing) paradigm, these three diagrams (or their equivalent) become essential. In the past, I’ve been able to rely on a single architecture where my code will run. I understand the servers, the networking and the path the code takes between the client and the components within that architecture.&lt;/p&gt;  &lt;p&gt;With Distributed Computing (DC), the architecture changes. In fact, the reason I use the term “Distributed Computing” instead of “Cloud Computing” most often (except in the title of this post, as you can see) is that I feel it’s more technically accurate about how we write code. I don’t view DC coding as an “all or nothing” exercise – I view it as just another option to solve a computing problem. A “hybrid” approach, where I mix in the strengths of a cloud provider is often a great way to leverage the best cost, performance and other advantages of each part of your solution. It can also help keep data secure, provide options for High Availability and Disaster Recovery, and more.&lt;/p&gt;  &lt;p&gt;To gain these advantages, we have to think more about the components of the application rather than a monolithic stack of components in a single architecture. And that brings us to the title of this post…&lt;/p&gt;  &lt;p&gt;For us to correctly identify code components, database objects, security paths and other elements, we have to be able to conceptualize them. And that’s where those diagrams come into play. Starting with some sort of business or organizational need, we can use BPMN or UML Actor diagrams to explain what the program needs to do. That helps segregate the security and location requirements. For instance, if&amp;#160; the BPMN shows a data access to Private Information, we can evaluate the need for an on-premise system that is federated to a DC provider. If the business users need global access, we can decide whether to set up a VPN to allow access to an on-premise system or whether a login component can be used on the web.&lt;/p&gt;  &lt;p&gt;After determining the flow of the program, move on to the data the system will store. In the case of Windows and SQL Azure, there are several options for storing data. In the past, I’ve often selected a single storage type, such as an RDBMS, and stored program data there. Now we can store in multiple formats, in multiple locations and more. The ERD is pivotal, because it defines data types, which can help decisions around where things go. Another important aspect to the data decision which is not covered in an ERD (but perhaps should be) is the estimated size and growth of a datum, since that can also drive the decision on where to put a data component.&lt;/p&gt;  &lt;p&gt;From there, the UML document helps me understand where each computing element can live. There are strengths for each type of computing, and using the UML diagram I can place each code component in the best environment for speed, security and other considerations.&lt;/p&gt;  &lt;p&gt;So in the new Distributed Computing world, these graphical documents do much more than just help design the application – they can help define the architecture as well.&lt;/p&gt;</description></item></channel></rss>