<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'developer', 'DBA', and 'SQL Azure'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=developer,DBA,SQL+Azure&amp;orTags=0</link><description>Search results matching tags 'developer', 'DBA', and 'SQL Azure'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>The Data Scientist</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/11/15/the-data-scientist.aspx</link><pubDate>Tue, 15 Nov 2011 15:00:18 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39814</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I &lt;em&gt;think&lt;/em&gt; it means, and why I’m excited about it.&lt;/p&gt;  &lt;p&gt;In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. &lt;/p&gt;  &lt;p&gt;The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. &lt;/p&gt;  &lt;p&gt;I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Persistence&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? &lt;/p&gt;  &lt;p&gt;This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization.&lt;/p&gt;  &lt;p&gt;Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Processing&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. &lt;/p&gt;  &lt;p&gt;A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Presentation&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. &lt;/p&gt;  &lt;p&gt;This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. &lt;/p&gt;  &lt;p&gt;And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Why I’m excited&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. &lt;/p&gt;  &lt;p&gt;Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. &lt;/p&gt;  &lt;p&gt;So&amp;#160; watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way. &lt;/p&gt;</description></item><item><title>Big Data and the Cloud - More Hype or a Real Workload?</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/10/18/big-data-and-the-cloud-more-hype-or-a-real-workload.aspx</link><pubDate>Tue, 18 Oct 2011 13:57:36 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:39156</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;Last week Microsoft announced several new offerings for “Big Data” - and since I’m a stickler for definitions, I wanted to make sure I understood what that really means. What is “Big Data”? What size hard drive is that? After all, my laptop has 1TB of storage - is my laptop “Big Data”?&lt;/p&gt;  &lt;p&gt;There are actually a few definitions for this term, most notably those involving the &lt;a href="http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data" target="_blank"&gt;“Four V’s” Volume, Velocity, Variety and Variability&lt;/a&gt;. Others &lt;a href="http://nosql.mypopescu.com/post/10120087314/big-data-and-the-4-vs-volume-velocity-variety" target="_blank"&gt;disagree with this&lt;/a&gt; definition. I tend to try and get things into their simplest form, so I’m using this definition for myself:&lt;/p&gt;  &lt;p align="center"&gt;&lt;font color="#c0504d" size="3"&gt;Big data is defined as a &lt;em&gt;large set &lt;/em&gt;of &lt;em&gt;computationally expensive &lt;/em&gt;data that is &lt;em&gt;worked on simultaneously&lt;/em&gt;.&lt;/font&gt; &lt;/p&gt;  &lt;p&gt;Let me flesh that out a&amp;#160; little. To be sure, “Big Data” has a larger size than say a few megabytes. The reason this is important is that it takes special hardware to be able to move large sets of data around, store it, process it and so on. (&lt;font color="#c0504d"&gt;large set&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;If you store a LOT of data, but only use a small portion of it at a time, that really isn’t super-hard to do. It’s mainly a storage issue at that point. But, if you do need to work with a large portion of the data at one time, then the memory, CPU and transfer components of the system have to adapt to be responsive - new ways to work with that data (game theory, knot-algorithms, map-reduce, etc.) need to be brought into play. (&lt;font color="#c0504d"&gt;computationally expensive&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;Once that data is loaded into the processing area (memory or whatever other mechanism is used) it must be worked on in parallel to come back in a reasonable time. You have two options here - you can scale the system up with more internal hardware (CPU’s, memory and so on) or you can scale it out to have multiple systems work on it at the same time using paradigms such as map/reduce and so on. Actually, when you lay this out in an architecture diagram, scale up or out doesn’t actually change the logical structure of the process - in scale out the network becomes the bus, and the nodes become more RAM and computing power. Of course, there are changes in code for how you stitch the workload back together. (&lt;font color="#c0504d"&gt;worked on simultaneously&lt;/font&gt;)&lt;/p&gt;  &lt;p&gt;So back to the original question. Is Big Data, as I have defined it here, a workload for Windows and SQL Azure? Absolutely! In fact, it’s probably one of the main workloads, and I believe it represents the latest, and perhaps also the earliest frontier of computing. Jim &lt;a href="http://research.microsoft.com/en-us/um/people/gray/" target="_blank"&gt;Gray, a former researcher here at Microsoft and a hero of mine, was working on this very topic.&lt;/a&gt; I believe as he did - all computing is simply an interface over data. &lt;/p&gt;  &lt;p&gt;Microsoft has multiple offerings on the topic of Big Data. In posts that follow from myself and my co-workers, we’ll explore when and where you use each one. Whether you are a data professional or a developer, this is the new frontier - &lt;a href="http://www.straightpathsql.com/archives/2011/10/microsoft-loves-your-big-data/" target="_blank"&gt;don’t wait to educate yourself&lt;/a&gt; on how to leverage Big Data for your organization. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Hadoop on Windows Azure and SQL Server&amp;#160; &lt;/strong&gt;- Microsoft’s &lt;a href="http://www.hortonworks.com/the-whys-behind-the-microsoft-and-hortonworks-partnership/" target="_blank"&gt;partnership to include Hadoop workloads on Windows Azure&lt;/a&gt; and &lt;a href="http://www.microsoft.com/download/en/details.aspx?id=27584" target="_blank"&gt;SQL Server/Parallel Data Warehouse (PDW)&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;LINQ to HPC &lt;/strong&gt;- Microsoft’s High-Performance Computing SKU of &lt;a href="http://blogs.technet.com/b/windowshpc/archive/2011/05/20/dryad-becomes-linq-to-hpc.aspx" target="_blank"&gt;HPC is now in Azure&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Windows Azure Table Storage &lt;/strong&gt;- A &lt;a href="http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx" target="_blank"&gt;key/value pair type storage with full partitioning&lt;/a&gt; that is immediately consistent, able to handle huge loads of data and works with any REST-compatible language&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;strong&gt;Other offerings &lt;/strong&gt;- Including the new &lt;a href="http://www.microsoft.com/en-us/sqlazurelabs/default.aspx" target="_blank"&gt;Data Explorer&lt;/a&gt;, &lt;a href="http://research.microsoft.com/en-us/news/headlines/daytona-071811.aspx" target="_blank"&gt;Project Daytona (with a Big Data Toolkit for Scientists and researchers)&lt;/a&gt;, &lt;a href="http://www.microsoft.com/sqlserver/en/us/future-editions/SQL-Server-2012-breakthrough-insight.aspx" target="_blank"&gt;Power View&lt;/a&gt; and more. &lt;/p&gt;  &lt;p&gt;The era of Big Data is here. And you can use Windows and SQL Azure to bring it to your organization. &lt;/p&gt;</description></item><item><title>Where is the SQL Azure Development Environment</title><link>http://sqlblog.com/blogs/buck_woody/archive/2011/02/03/where-is-the-sql-azure-development-environment.aspx</link><pubDate>Thu, 03 Feb 2011 14:31:55 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:33163</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;Recently &lt;a href="http://blogs.msdn.com/b/buckwoody/archive/2011/02/01/windows-azure-emulators-on-your-desktop.aspx" target="_blank"&gt;I posted an entry explaining that you can develop in Windows Azure without having to connect to the main service on the Internet, using the Software Development Kit (SDK)&lt;/a&gt; which installs two emulators - one for compute and the other for storage. That brought up the question of the same kind of thing for SQL Azure.&lt;/p&gt;  &lt;p&gt;The short answer is that there isn’t one. While we’ll make the development experience for all versions of SQL Server, including SQL Azure more easy to write against, you can simply treat it as another edition of SQL Server. For instance, many of us use the SQL Server Developer Edition - which in versions up to 2008 is actually the Enterprise Edition - to develop our code. We might write that code against all kinds of environments, from SQL Express through Enterprise Edition. We know which features work on a certain edition, what T-SQL it supports and so on, and develop accordingly. We then test on the actual platform to ensure the code runs as expected. You can simply fold SQL Azure into that same development process.&lt;/p&gt;  &lt;p&gt;When you’re ready to deploy, if you’re using SQL Server Management Studio 2008 R2 or higher, you can script out the database when you’re done as a SQL Azure script (with change notifications where needed) by selecting the right “Engine Type” on the scripting panel:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/6622.sqla_5F00_2.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="sqla" border="0" alt="sqla" src="http://blogs.msdn.com/cfs-file.ashx/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-79-79-metablogapi/3414.sqla_5F00_thumb.png" width="335" height="190" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;em&gt;(Thanks to David Robinson for pointing this out and my co-worker Rick Shahid for the screen-shot - saved me firing up a VM this morning!)&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Will all this change? Will SSMS, “Data Dude” and other tools change to include SQL Azure? Well, I don’t have a specific roadmap for those tools, but we’re making big investments on Windows Azure and SQL Azure, so I can say that as time goes on, it will get easier. For now, make sure you know what features are and are not included in SQL Azure, and what T-SQL is supported. Here are a couple of references to help:&lt;/p&gt;  &lt;p&gt;General Guidelines and Limitations: &lt;a href="http://msdn.microsoft.com/en-us/library/ee336245.aspx"&gt;&lt;u&gt;&lt;font color="#0066cc"&gt;http://msdn.microsoft.com/en-us/library/ee336245.aspx&lt;/font&gt;&lt;/u&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Transact-SQL Supported by SQL Azure: &lt;a href="http://msdn.microsoft.com/en-us/library/ee336250.aspx"&gt;&lt;u&gt;&lt;font color="#0066cc"&gt;http://msdn.microsoft.com/en-us/library/ee336250.aspx&lt;/font&gt;&lt;/u&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;SQL Azure Learning Plan: &lt;a href="http://blogs.msdn.com/b/buckwoody/archive/2010/12/13/windows-azure-learning-plan-sql-azure.aspx"&gt;http://blogs.msdn.com/b/buckwoody/archive/2010/12/13/windows-azure-learning-plan-sql-azure.aspx&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Know Your Product Specifications</title><link>http://sqlblog.com/blogs/buck_woody/archive/2010/01/13/know-your-product-specifications.aspx</link><pubDate>Wed, 13 Jan 2010 14:57:01 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:21010</guid><dc:creator>BuckWoody</dc:creator><description>&lt;p&gt;As the Data Professional in your organization, the rest of the org looks to you to ensure that the system can handle what the business requires. To do that, you need to know two things: what the business requires, and what SQL Server can do.&lt;/p&gt;  &lt;p&gt;But of course there’s a bit more to it than that. Knowing the business side of the requirements – well, I teach an entire course on that. But knowing what SQL Server can do is something you can find out on your own.&lt;/p&gt;  &lt;p&gt;SQL Server comes in &lt;em&gt;versions&lt;/em&gt;, which are released based on date, and &lt;em&gt;editions&lt;/em&gt;, which are based on features and capabilities. It’s that last part that I want to focus on today.&lt;/p&gt;  &lt;p&gt;As Microsoft SQL Server matures, you’re going to see even more separation between what each edition of SQL Server can do and where it should be used. In the past, most folks have only focused on three editions – Express (the “free” one), Standard, and Enterprise. The rule of thumb was that if Standard was good enough at the moment, put it in. And it is true (and a good thing) that you can upgrade from one edition to another fairly easily.&lt;/p&gt;  &lt;p&gt;But as time goes on, we should spend a little more time understanding what each edition does, what it’s features and capabilities are, and where and when we should put them in. As I study this information, I’ll throw in my 2 cents and you can as well based on what you see. One thing I’ve found so far is that once I have the business requirements, there’s a mix of what I can write in code and what might already be included in a different edition. It’s important to look long and hard at that choice – writing a feature on my own is certainly cheaper in the short term than moving to a “higher” edition, but in some cases it makes sense to let Microsoft handle that lifting.&lt;/p&gt;  &lt;p&gt;These links are ones that you should bookmark and take a peek at periodically. They are the “header” links for more information on those features and capabilities:&lt;/p&gt;  &lt;p&gt;SQL Server 2008: &lt;a href="http://msdn.microsoft.com/en-us/library/ms143287.aspx"&gt;http://msdn.microsoft.com/en-us/library/ms143287.aspx&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;SQL Server 2008 R2: &lt;a href="http://msdn.microsoft.com/en-us/library/ms143287(SQL.105).aspx"&gt;http://msdn.microsoft.com/en-us/library/ms143287(SQL.105).aspx&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;In addition, you might start learning a little more about SQL Azure. I’ll talk more about that later.&lt;/p&gt;</description></item></channel></rss>