I've recently posted a blog entry on how cloud computing changes the Systems Architect's role in an organization, another on how the cloud changes a Database Administrator's job, and a third on the Systems Administrator. In this post I'll cover the changes facing the Software Developer when using the cloud.
The software developer role was the earliest adopter of cloud computing. This makes perfect sense, because the software developer has always used computing "as a service" - they (most often) don't buy and configure servers, platforms and the like, they write code that runs on those platforms. And there's probably not a simpler definition of a software developer to be found, but as with all simple statements, you lose fidelity and detail. I'll offer a more complete list in a moment.
Because the software developer's process involves designing, testing and writing code locally and then migrating it to a production environment, all of the paradigms in cloud computing - from IaaS to PaaS to SaaS - come naturally.
The Software Developer's Role
The software developer role has evolved since the earliest days of programming. The developer doesn't just "write code" - there are far more tasks involved in modern systems development:
- Assisting the Business Role(s) in developing software specifications
- Planning software system components and modules
- Designing system components
- Working in teams writing classes, modules, interfaces and software endpoints
- Designing data layouts, architectures, access and other data controls
- Designing and implementing security, either programmatic, declarative, or referential
- Mixing and matching various languages, scripting and other constructs within the system
- Designing and implementing user and account security rights and restrictions
- Designing various software code tests - unit, functional, fuzz, integration, regression, performance and others
- Deploying systems
- Managing and maintaining code updates and changes
Like the lists for the previous roles, those tasks unpack into a larger set of tasks, and no single developer has exactly that list. And like the DBA, the role often includes more, or less, of that list based on where the developer works. Smaller companies may fold the development platform into the duties, so that a developer is also a systems administrator. In larger organizations I've seen developers who specialized in User Interfaces, Engine Components, Data Controls or other specific areas.
How the Cloud Changes Things
The software developer role obviously has the same concerns and impacts from "the cloud" as the Systems Architect. They need to educate themselves on the options within this new environment (Knowledge), try a few test solutions out (Experience) and of course work with others on various parts of the implementation (Coordination).
The big changes for a developer include three major areas: Hybrid Software Design, Security, and Distributed Computing.
Hybrid Software Design
After the PC revolution, software developers designed systems that ran primarily on a single computer. From there the industry moved to "client/server", where most of the code still lived on the user's workstation, and various levels of state (such as the data layer) moved to a server over fast network connections. After that came the Internet phase, which had less to do with HTML coding than with stateless architectures. While no architecture is truly stateless, there are ways of allowing the client and the server of an application to be in different states at any one time - this is the way the Web works.
Even so, the developer often simply moved one of the primary layers (such as the Model, View or Controller) to the server, using the User Interface merely as the View or Presentation layer. While technically stateless, this doesn't require a great deal of architectural change - various software modules run on a server, which perhaps connects to a remote data server. In the end, it's still a single paradigm.
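The stateless idea above can be sketched in a few lines. This is an illustrative example only - the function and payload names are invented, and real web frameworks handle the plumbing - but the point stands: the server keeps nothing between calls.

```python
# Minimal sketch of a stateless request handler: the server holds no
# session state between calls; everything it needs arrives with the request.

def handle_request(request):
    """Compute an order total purely from the request payload.

    Because all state (the cart) travels with the request, any server
    instance can answer any call - the basis of web-style scaling.
    """
    cart = request.get("cart", [])
    total = sum(item["price"] * item["qty"] for item in cart)
    return {"user": request["user"], "total": total}

# The client, not the server, carries the state from call to call:
response = handle_request({
    "user": "buck",
    "cart": [{"price": 10.0, "qty": 2}, {"price": 5.0, "qty": 1}],
})
print(response["total"])  # 25.0
```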
We now have the ability to run IaaS (hardware abstraction), PaaS (hardware, operating system and runtime abstraction) and SaaS (everything abstracted, API calls only) in a single environment such as Windows Azure. A single application might have a Web-based Interface Server with federated processes (using a PaaS set of roles), a database service (using a SaaS provider such as Windows Azure SQL Database), a specialized process in Linux (using an IaaS role in Windows Azure) and a translator API (from the Windows Azure Marketplace). This example involves only one vendor - Microsoft. I've seen applications that use multiple vendors in this same way.
Thinking this way opens up a great deal of flexibility - and complexity. Complexity isn't evil; it's how complicated things get done many times. The modern developer needs to understand how to build hybrid software architectures.
Resources: Hybrid Architectures with step-by-step instructions and examples: http://msdn.microsoft.com/en-us/library/hh871440.aspx and Windows Azure Hybrid Systems: http://msdn.microsoft.com/en-us/library/hh871440.aspx?AnnouncementFeed
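To make the hybrid idea concrete, here's a minimal sketch - purely illustrative, with hypothetical component names - describing the example application above as a map from components to the cloud paradigm that hosts each one:

```python
# A hybrid application described as data: each component mapped to the
# paradigm (IaaS, PaaS, SaaS) that hosts it. Component names are invented.

SYSTEM = {
    "web_front_end":   {"paradigm": "PaaS", "provider": "Windows Azure web role"},
    "database":        {"paradigm": "SaaS", "provider": "Windows Azure SQL Database"},
    "batch_processor": {"paradigm": "IaaS", "provider": "Linux VM on Windows Azure"},
    "translator":      {"paradigm": "SaaS", "provider": "Windows Azure Marketplace API"},
}

def components_using(paradigm):
    """List the components hosted under a given cloud paradigm."""
    return sorted(name for name, c in SYSTEM.items() if c["paradigm"] == paradigm)

print(components_using("SaaS"))  # ['database', 'translator']
```

Writing the topology down this way, even informally, makes the mixed paradigms - and the seams between them - explicit before any code is written.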
Security

Having a single security boundary, such as "everyone who works in my company", is a relatively simple problem to solve. Normally the Systems Administrators configure and control a security provider, such as Active Directory, and developers can access that security layer programmatically. That allows for good separation of duties and role-based control.
In modern applications, clients, managers, and users both internal and external need various levels of access to the same objects, code and data. A client should be able to enter an order, a store should be able to accept the order, the credit-card company should be able to check the order and authorize payment, and the managers should be able to report on the order or change it if needed. Maintaining role-based security across multiple domains like that quickly becomes impractical.
Enter "claims-based" authentication. In this paradigm, the user logs in with whatever security they use - a corporate or other Active Directory, Facebook, Google, whatever. The application (using Windows Identity Foundation, or WIF) can accept a "claim" from that provider, and the developer can match whatever parts of that claim they wish to the objects, code and data. An example might be useful.
Buck logs in to his corporate Active Directory (AD) and attempts to use a program based in Windows Azure. The application doesn't recognize Buck's login directly, but it is configured to check with Buck's AD. Buck's AD says "yes, I know Buck, and he has been granted the following claims: 'partner', 'manager', 'approver'." The developer does not need to know about Buck's AD, Buck, his login, or anything else. She simply codes the proper data access to allow an "approver" to approve a sale.
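Stripped of any particular framework, the developer's side of that example reduces to matching claims to actions. The sketch below is a hedged illustration - not the WIF API - and the claim and action names simply follow the example above:

```python
# A simplified, framework-neutral sketch of claims-based authorization.
# A real system would use a library such as Windows Identity Foundation;
# the mapping of actions to required claims is the developer's part.

REQUIRED_CLAIMS = {
    "approve_sale": "approver",
    "enter_order":  "partner",
    "change_order": "manager",
}

def is_authorized(claims, action):
    """The app never inspects the user's directory - only the claims
    that the trusted identity provider asserted for them."""
    required = REQUIRED_CLAIMS.get(action)
    return required is not None and required in claims

buck_claims = {"partner", "manager", "approver"}
print(is_authorized(buck_claims, "approve_sale"))  # True
print(is_authorized({"partner"}, "approve_sale"))  # False
```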
This allows a lot of control, at a very fine level, without having to get into the details of each security provider.
Resources: Overview of using claims-based Azure Security: http://adnanboz.wordpress.com/2011/02/06/claims-based-access-and-windows-azure/
Distributed Computing

Is there a difference between stateless computing, or even the hybrid programming I mentioned earlier, and "Distributed Computing"? Yes - the primary difference is latency. Even stateless code can have too small a tolerance for latency.
Dealing with slow or broken connections has many impacts. One method of dealing with this is to locate data and the computation on that data as close together as possible, even if this means relaxing consistency or duplicating data. Another is to return to a great paradigm from the past that is possibly underused today: the Service Oriented Architecture. The Windows Azure Service Bus is possibly one of the fastest and easiest ways to adopt cloud computing without completely rearchitecting your application.
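A retry-with-backoff wrapper is one of the simplest ways a developer can tolerate slow or broken connections in a distributed design. This is a generic sketch under my own naming, not the Service Bus API itself:

```python
import random
import time

# Generic retry with exponential backoff for transient connection failures.
# The function and parameter names here are illustrative.

def call_with_retries(operation, attempts=5, base_delay=0.5):
    """Call `operation`, backing off (0.5s, 1s, 2s, ...) between failures."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts - surface the failure
            # Jitter keeps many clients from retrying in lockstep.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
```

The same pattern applies whether the remote endpoint is a relay service, a database, or a marketplace API; only the exception types and delays change.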
References: Great breakdown of the thought process around a distributed architecture: http://msdn.microsoft.com/en-us/magazine/jj553517.aspx and using a Windows Azure Relay Service: http://www.windowsazure.com/en-us/develop/net/how-to-guides/service-bus-relay/
I recently posted a blog entry on how cloud computing would change the Systems Architect’s role in an organization, and another on how the cloud changes a database administrator's job. This time I'll cover a few of the changes the cloud brings for the Systems Administrator.
The systems administrator shares some similarity with the database administrator, in that it's rare to find a single job description that fits all people in that role. There are some basic similarities among various organizations, so I'll use those as a starting point.
The Systems Administrator Role
The systems administrator role is perhaps one of the earliest in technology, at least as far as the implementation of a system goes. In the earliest days of computing, electronic technical professionals built prototype computers, and newly minted "programmers" wrote logical instructions for these systems. In time, the systems administration role owned the installation, configuration, operation and tuning of these systems once they went into production and use on a larger scale. A few of the tasks associated with the role are:
- Planning, installing and configuring systems
- Planning, designing and creating storage, networking and other system components
- Planning, designing and implementing High Availability and Disaster Recovery for each system
- Maintaining and monitoring systems
- Implementing performance tuning on systems based on monitoring
- Re-balancing workloads across servers based on monitoring
- Securing systems, networks and individual computers based on requirements and implementation
- Planning, implementing and controlling user and account security rights and restrictions
Like the DBA, that’s just a short list, and each of those tasks also unpacks into a larger set of tasks. And like the DBA, the role often includes more, or less, of that list based on where the systems administrator works. In smaller companies I've been a "systems administrator" who also ran the database and mail servers, web systems, and front-line end-user support - and made the coffee. In larger organizations I was only able to spend the day on one or two parts of that list, since there were so many systems and they interacted with so many other systems.
Systems administrators often deal with multiple operating systems. In one company where I was a system administrator, I worked with no less than six operating systems from mainframes to PC servers, two of them highly specialized to the hardware.
How the Cloud Changes Things
The systems administrator has the same concerns and impacts from "the cloud" as the DBA and the Systems Architect. They need to educate themselves on the options within this new environment (Knowledge), try a few test solutions out (Experience) and of course work with others on various parts of the implementation (Coordination).
I've mentioned the three big buckets of cloud computing: using Virtual Machines (IaaS), writing code (PaaS), and using software that's already written and delivered via an Application Programming Interface (SaaS). In my experience, the systems administrator role normally tackles the first "bucket" most often - IaaS, which has at its base the technology of virtualization.
One of the first areas the systems administrator is involved with "the cloud" is in the area of virtualization. This technology isn't new - in fact, I worked on Virtual Machines (VM's) way back in my mainframe days. It's the process of using software to emulate hardware - which has implications far beyond that simple sentence.
Virtualization is normally a standard on-premises process. When you take Virtual Machines and host them in another location, that's called Co-Location, or CoLo. Personally, I don't define either of these activities as "cloud" computing - it's simply virtualization. Infrastructure as a Service (IaaS) normally involves several more components, at the very least the ability to set up (provision) systems and deploy them in a standard, automated way. It also involves (at a minimum) the ability to monitor, move and alter the systems using a prescribed methodology. There are other parts of IaaS to be sure, but this level above simply scripting installations or virtualizing a machine is where the systems administrator becomes involved in this new "cloud computing" paradigm.
There are multiple VM technologies available, from the hypervisor that is built-in to the Windows operating system (Hyper-V) to third-party alternatives such as VMWare. The choice of cloud provider often dictates the selection of hypervisor. Windows Azure uses Hyper-V, and allows you to move systems from the cloud to the desktop and back again. Other providers use VMWare, or a proprietary format. Some allow you to push or pull images from the cloud service, others do not. The systems administrator must educate themselves on the business need and then select the cloud provider that best fits the requirements for a workload. It's also common to use several cloud providers within a single company.
Resources: Windows Azure Virtual Machines: http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/ and System Center: http://blogs.technet.com/b/server-cloud/archive/2011/12/01/managing-and-monitoring-windows-azure-applications-with-system-center-2012.aspx
Cloud Computing Architecture - Private, Public and Hybrid
It's important to note that IaaS can be on-premises, at another facility, or both. The first is called "private cloud", the second "public cloud", and the third "hybrid cloud". Yes, these are marketing terms, but they are useful in describing where the decisions are for deploying a system. If data security is paramount, then private cloud may be the right choice for a given workload. If agility or cost is an issue, public cloud may be the right answer for another workload. And in many cases - perhaps most - using both architectures is the right way to split the workload.
The key is to understand the workload well. In the past the systems administrator needed to know the component requirements, such as how much memory, CPU, network and storage a workload needed. In cloud computing, these are still concerns, but you need to add in the questions of cost, business use, location of users, security and other vectors. These concerns bring the systems administrator closer to the business and its goals.
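As a thought exercise, that workload evaluation can be written down as a checklist. The rules and inputs below are hypothetical - every organization will weigh cost, security and scale differently - but sketching it this way forces the questions into the open:

```python
# A deliberately simple placement checklist: given a few workload
# properties, suggest private, public, or hybrid cloud. The decision
# rules are illustrative, not a recommendation.

def place_workload(data_sensitivity, needs_burst_scale, budget_constrained):
    """Return a suggested deployment architecture for one workload."""
    if data_sensitivity == "high" and not needs_burst_scale:
        return "private cloud"
    if data_sensitivity == "high" and needs_burst_scale:
        return "hybrid cloud"   # keep data on-premises, burst compute out
    if budget_constrained or needs_burst_scale:
        return "public cloud"
    return "private cloud"

print(place_workload("high", True, False))  # hybrid cloud
print(place_workload("low", False, True))   # public cloud
```

A real evaluation would add user location, latency tolerance, licensing and regulatory vectors, but the exercise of making the criteria explicit is the valuable part.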
Resources: Windows Azure Hybrid Systems: http://msdn.microsoft.com/en-us/library/hh871440.aspx?AnnouncementFeed
DevOps

One new term introduced with cloud computing is "DevOps" - short for Development and Operations. Not everyone agrees that this is even a real "thing" - some say it's a made-up term from cloud vendors. Regardless, there is a new set of tasks that the cloud brings that may sit within the purview of the systems administrator.
Basically it involves the administration needed at the PaaS or SaaS level. The IaaS function of cloud computing holds most of the same characteristics as an on-premises system, defined in the first list I mentioned above. But when the organization uses Platform as a Service, the operating system and much of the security, scale and other components of infrastructure are abstracted into the platform, and are often even controlled by the developer.
But once the application "goes live", there are a host of billing, control, scaling, security and other operational questions that developers aren't equipped to handle. Who takes care of those? As companies are finding out, they need to appoint someone to cover these overlapping areas between developers and administrators.
References: How DevOps brings order: http://searchcloudcomputing.techtarget.com/feature/How-DevOps-brings-order-to-a-cloud-oriented-world and Managing Windows Azure: http://www.windowsazure.com/en-us/manage/overview/
I recently posted a blog entry on how cloud computing would change the Systems Architect’s role in an organization. In a way, the Systems Architect has the easiest transition to a new way of using computing technologies. In fact, that’s actually part of the job description. I mentioned that a Systems Architect has three primary vectors to think about for cloud computing, as it applies to what they should do:
- Knowledge - Which options are available to solve problems, and what are their strengths and weaknesses.
- Experience - What has the Systems Architect seen and worked with in the past.
- Coordination - A system design is based on multiple factors, and one person can't make all the choices. There will need to be others involved at every level of the solution, and the Systems Architect will need to know who those people are and how to work with them.
The Database Administrator Role
But a Database Administrator (DBA) is probably one of the harder roles to think about when it comes to cloud computing. First, let’s define what a Database Administrator usually thinks about as part of their job:
- Planning, Installing and Configuring a Database Platform
- Planning, designing and creating databases
- Planning, designing and implementing High Availability and Disaster Recovery for each database (HADR) based on requirements for its workload
- Maintaining and monitoring the database platform
- Implementing performance tuning on the databases based on monitoring
- Re-balancing workloads across database servers based on monitoring
- Securing databases platforms and individual databases based on requirements and implementation
That’s just a short list, and each of those unpacks into a larger set of tasks.
The issue is that I’ve never actually met a DBA who does all of those things, or only those things. Many times they do much more; sometimes the systems are so large that they specialize in just a few of them.
And as you can see from the list, some of these areas are shared with other roles. For instance, in some shops, the DBA plans, purchases, sets up and configures the hardware for database servers. In others that’s done by the Infrastructure Team. In some shops the DBA designs databases from software requirements, and in others the developers do that – or perhaps it’s done as a joint effort. The same holds true for database code – sometimes the DBA does it, other times the developer, and still others it’s a shared task.
In fact, you could argue that there are few other roles in IT that are so intermixed. The DBA works with software the company develops and software the company buys. They work with hardware, networking, security and software. Certain aspects of design and tuning fall outside the purview of some of those areas and inside others.
With all of these variables, simply telling a DBA that they should “use the cloud” is not the proper approach.
How the Cloud Changes Things
To be sure, the DBA has the same vectors as the Systems Architect. They need to educate themselves on the options within this new environment (Knowledge), try a few test solutions out (Experience) and of course work with others on various parts of the implementation (Coordination). But it goes beyond that.
There are three big buckets of cloud computing, ranging from simply using a Virtual Machine (IaaS), to writing code without worrying about the virtualization or even the operating system (PaaS), to using software that’s already written and delivered via an Application Programming Interface (SaaS). Each of these has so many options and configurations that it’s often better to think about the problem you’re trying to solve rather than all of the technology within a given area - although some of that is certainly necessary anyway.
Database Platform Architecture
I’ll start with when the DBA should even consider cloud computing for a solution. Once again, it’s not an “all or nothing” paradigm, where you either run something on premises or in the cloud – it’s often a matter of selecting the right components to solve a problem. In my design sessions with DBA’s I break these down into three big areas where they might want to consider the cloud – and then we talk about how to implement each one:
Data Services
If the users of your database systems all sit in the same facility, you own the servers and networking, and the application servers are separate from the database server, it doesn’t usually make sense to take that database workload and place it on Windows Azure – or any other cloud provider. The latency alone prevents a satisfactory performance profile, and in some cases won’t work at all. It doesn’t matter if the cloud solution is cheaper or easier – if you’re moving a lot of data every second between an on-premises system and the cloud it won’t work well.
However – if your users are in multiple locations, especially globally, or you have a mix of company and external customer users, it might make sense to evaluate a shared data location. You still need to consider the implications of how much data the application server pushes back and forth, but you may be able to locate both the application server and SQL Server in an IaaS role. Assuming the data sent to the final client will work across public Internet channels, there may be a fit. There are security implications, but unless you have point-to-point connections for your current solution you’re faced with the same security questions on both options.
Your audience might also be developers looking for a way to quickly spin up a server and then turn it down when they are done, paying for the time and not the hardware or licenses. This is also a prime case for evaluating IaaS. And there are others that you'll find in your own organization as you work through the requirements you have.
Resources: Windows Azure Virtual Machines: http://www.windowsazure.com/en-us/manage/windows/tutorials/virtual-machine-from-gallery/ and Windows Azure SQL Server Virtual Machines: http://www.windowsazure.com/en-us/manage/windows/common-tasks/install-sql-server/
High Availability and Disaster Recovery

The next place to consider using cloud computing with SQL Server is as part of your High Availability and Disaster Recovery plans. In fact, this is the most common use I see for cloud computing and the Database Administrator. The key is the Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Based on each application’s requirements, you may find that using Windows Azure, or even just supplementing your current plan with it, is the right option to evaluate. I’ve covered this use-case in more detail in another article.
References: SQL Server High Availability and Disaster Recovery options with Windows Azure: http://blogs.msdn.com/b/buckwoody/archive/2013/01/08/microsoft-windows-azure-disaster-recovery-options-for-on-premises-sql-server.aspx
Data as a Service

Windows Azure, along with other cloud providers, offers another way to design, create and consume data. In this use-case, however, the tasks DBA’s normally perform for sizing, ordering and configuring a system don’t apply.
With Windows Azure SQL Database (the artist formerly known as SQL Azure), you can simply create a database and begin using it. There are places where this fits and others where it doesn’t, and there are differences, limitations and enhancements, so it isn’t meant as a replacement for what you could do with “full-up” SQL Server on a Windows Azure Virtual Machine or an on-premises Instance. If a developer needs a Relational Database Management System (RDBMS) data store for a web-based application, then this might be a perfect fit.
But there is more to data services than Windows Azure SQL Database. Windows Azure also offers MySQL as a service, Riak and MongoDB (among others), and even Hadoop for larger distributed data sets. In addition, you can use Windows Azure Reporting Services, and tap into datasets and data functions in the Windows Azure Marketplace.
The key for the DBA with this option is that you will have to do a little investigation, potentially without a specific workload in mind. I think that’s an acceptable thing to ask – DBA’s constantly keep up with data processing trends, and most will consider different ways to solve a problem.
Windows Azure SQL Databases: http://www.windowsazure.com/en-us/home/features/data-management/
Windows Azure Reporting Services: http://www.windowsazure.com/en-us/manage/services/other/sql-reporting/
HDInsight Service (Hadoop on Azure): https://www.hadooponazure.com/
MongoDB Offerings on Windows Azure: http://www.windowsazure.com/en-us/manage/linux/common-tasks/mongodb-on-a-linux-vm/
Windows Azure Marketplace: http://www.windowsazure.com/en-us/store/overview/
I know - I said I didn't like the "cloud" term, but my better-phrased "Distributed Systems" moniker just never took off like I had hoped. So I'll stick with the "c" word for now, at least until the search engines catch up with my more accurate term.
I thought I might spend a little time on how the cloud affects the way we work - from Systems Architects to Database Administrators, Developers, and Systems Administrators - a group often referred to as "IT Pros". But each role within these groups has different aspects when using cloud computing. In this post we'll take a look at the role of the Systems Architect, and in the posts that follow I'll talk more about the other roles in the IT Pro area.
The Systems Architect Role
What does a "Systems Architect" do? Like most IT roles, it depends on the company or organization where they work. In fact, the term isn't even specific to technology, but I'll use it in that context here. In general, a Systems Architect takes the requirements for a given system, and assembles the relevant technology areas that best fulfill those requirements. That's a single-sentence explanation, and needs further unpacking.
As an example, a Systems Architect at a medical firm is presented with a set of requirements for tracking a patient through the entire care cycle. The Systems Architect first looks at the data that needs to be collected based on business, financial, regulatory, and other requirements, and then at how that data needs to flow from one system to another. They check the security requirements, performance, location and other aspects of the system. They then check to see which options are available for processing that data, and which parts they should "build or buy".
For instance, the requirements might be so specific that only custom code is the proper solution - but even there, choices still exist, such as which language(s) to use, what type of data persistence (a Relational Database Management System or other data storage and processing) will be used, what talent within the company is available for the system, and a myriad of other decisions.
All of this boils down to three primary vectors:
- Knowledge - Which options are available to solve problems, and what are their strengths and weaknesses.
- Experience - What has the Systems Architect seen and worked with in the past.
- Coordination - A system design is based on multiple factors, and one person can't make all the choices. There will need to be others involved at every level of the solution, and the Systems Architect will need to know who those people are and how to work with them.
How the Cloud Changes Things
From the outset, it doesn't seem that using a distributed system would change anything in the Systems Architect role. Isn't the cloud simply another option that the Systems Architect needs to learn and apply? Yes, that is true - but it goes a bit deeper. Let's return to those vectors for a moment to see what a Systems Architect needs to take into account.
The first and probably most obvious impact is learning about cloud technologies. But the important part of that knowledge is to learn when and where to use each service. It's a common misconception that the cloud must be an "all or nothing" approach. That's just not true - every Windows Azure project I work on has some element of on-premises interaction, and in some cases only one small part of a solution is placed on the Windows Azure architecture. Since Windows Azure contains IaaS (VM's), PaaS (you write code, we run it) and even SaaS (such as Hadoop or Media Services), a given architecture can use multiple components even within just one provider. And I've worked on several projects where the customer used not only Windows Azure and on-premises environments, but also components from other providers. That's not only acceptable, but often the best way to solve a given problem.
As part of the learning experience, it's vital to identify your key decision points. In your organization, cost could be ranked higher than performance, or perhaps security is the highest decision point.
To stay educated, there are various journals, websites and conferences that Systems Architects use to keep current. Almost all of those are talking about "cloud" - but there is no substitute for learning from the vendor about their solution. I'm speaking here of the technical information, not the marketing information. The marketing information is also useful, at least from a familiarity standpoint, but the technical information is what you need.
Resource: For Windows Azure, the Systems Architect can start here: http://blogs.msdn.com/b/buckwoody/archive/2012/06/13/windows-azure-write-run-or-use-software.aspx
Cloud computing is relatively new - it's only been out a few years, and the main competitors are only now settling in to their respective areas. It might not be common for a Systems Architect to have a lot of hands-on experience with cloud projects.
Even so, there are ways to leverage the experience of others, such as direct contact or even attending conferences where customers present findings from their experiences.
You can also gain hands-on experience by setting up pilots and proof-of-concept projects yourself. Almost all vendors - Microsoft included - make free time available on their systems. The key to an experiment like this is choosing a problem you are familiar with that exercises as many features in the platform as possible. There is no substitute for working with a platform when you want to design a solution.
Probably one of the largest changes in the Systems Architect role that the cloud brings is in the area of coordination. When a Systems Architect deals with the business and other technical professionals, there is a 20+ year history of technology that we are all familiar with. When you mention "the cloud", those audiences may not have spent the time you have in understanding what that means - and often they think it means the "all or nothing" approach I mentioned earlier.
I've found that a series of "lunch and learns" for the technical staff is useful for explaining to each role-group how the cloud is used in their area. In the posts that follow this one, I'll give you some material for those. For managers and business professionals, you'll want to go a different route. I've found that an "Executive Briefing" e-mail of about a page, with headings that are applicable to your audience, works well.
Resource: Writing Executive Summaries: http://writing.colostate.edu/guides/guide.cfm?guideid=76
One of the use-cases for a cloud solution is to serve as a Disaster Recovery option for your on-premises servers. I’ll explain one particular use-case in this entry, specifically using Windows Azure “IaaS” or Virtual Machines as a Recovery Solution for SQL Server (more detail here: http://www.windowsazure.com/en-us/home/features/virtual-machines/). In future installments I’ll explain options for other workloads such as Linux and Windows Servers, SharePoint and other solutions. Some architectures also allow for using Windows Azure SQL Database (Formerly SQL Azure) in recovery scenarios; I’ll cover that separately.
Using Azure as a Disaster Recovery site gives you a range of options, uses world-wide datacenters that you can pick from, and does not require traditional licensing and maintenance paths. You can also integrate the offsite data into other uses, such as reporting (in some cases) or to leverage within other applications. However, the cost-model is different, so make sure you do your homework to ensure that it makes sense to use a cloud provider for safety. You may find that it is cheaper, more expensive, or that you require a mix of technologies and options to get the best solution.
NOTE: The Microsoft Windows Azure platform evolves constantly. That means new features and capabilities, as well as security, optimizations and more improve on a frequent basis. As with any cloud provider, check the date of this post to make sure you are within six months or so. If it's older than that, check each of the “Details” links to ensure you are working with the latest information.
The options you have range from simple off-site storage for database backups to systems that your users can access when your primary options are offline. To select which options to use, evaluate the databases you want to protect, and then create your Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) for each workload. Those two vectors will provide the starting point for each choice you make.
NOTE: If you’re not familiar with RPO and RTO on a database system, learn those terms carefully before designing a recovery solution – on any platform. RPO and RTO are business/technology terms, and are not vendor or platform-specific. http://wikibon.org/wiki/v/Recovery_point_objective_-_recovery_time_objective_strategy
The range of protection you have is very similar to the on-premises options for SQL Server (on-premises details here: http://msdn.microsoft.com/en-us/library/ms190202.aspx), with the primary limitation being bandwidth. While Microsoft has the largest connections we can get into our datacenters, depending on where your systems are and their connection to the Internet, you will need to consider how much data you transfer, and how often. For backup files, a single, larger transfer is acceptable; for Log Shipping or Database Mirroring, smaller, more frequent transfers are preferable.
Another limitation is controlling the hardware on the Windows Azure Virtual Machine. That means hardware-based clustering isn’t possible, as of this writing. You’re also limited to the size of the Virtual Machines that Windows Azure (or any other cloud provider) offers. It’s important to keep in mind that you’re building a Disaster Recovery solution, not necessarily a full Highly-Available system. The difference is that DR provides a means to recover and operate in a more limited fashion than a full on-premises HA solution (with matching hardware and licenses) provides. Storage, however, isn’t as affected. You can mount large amounts of storage on a Windows Azure Virtual Machine, so it’s more the memory and CPU that you need to consider for your solution.
The final consideration is security. There are two aspects of security that you need to consider: data security, and authentication and access. For the first, the Windows Azure system holds multiple certifications and attestations that you can find here: . In some cases those certifications are agreements on which parts of security each party holds liability for, so it’s important to carefully read and understand what the agreement states. There are also methods of encrypting data (such as the backups) using your own certificates or hardware devices and then storing them externally. This means no one can easily decrypt your data.
For the authentication portion, you can create a secure “tunnel” between your network and Windows Azure. This involves a certificate that is installed on your hardware firewall at your facility, and an agent that is enabled with the same certificate on Windows Azure. This gives you a “point to point” connection, encrypted but over a public connection. From there you can use Active Directory to connect the authentication for the systems involved in the DR solution.
The first and simplest DR solution using Windows Azure is to store your backup files (*.bak) in Windows Azure storage. Windows Azure Storage is triple-redundant across multiple fault-domains within a single datacenter, and then all three copies are replicated to a geographically separate location (within the same data-sovereignty boundary). That translates to six copies of the data stored remotely. In case of a disaster, you connect to storage, download the files, and restore them to a new server. The server can have the same name or a different one, and unless you’re using contained databases, you’ll need to re-create and re-authorize the security accounts needed for the database.
Note that you also have the option of using an “appliance”, which is a piece of hardware you install at your facility which will act as a backup device or share location (or both). The device handles the encryption, de-duplication and compression for the files, and then stores those files on Windows Azure. More information on that option is here: http://www.storsimple.com/
RPO: As of last backup
RTO: (Time of transfer from Windows Azure + Time of Restore to New System + Bringing System Online with User Accounts) - Time of Backup
More detail on storing files on Windows Azure: http://sqlblog.com/blogs/sqlos_team/archive/2013/01/24/backup-and-restore-to-cloud-simplified-in-sql-server-2012-sp1-cu2.aspx
Free Client: http://azurestorageexplorer.codeplex.com/
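The RPO/RTO arithmetic above is worth making concrete. Here is a minimal Python sketch of the calculation for the backup-to-storage option; all of the durations are hypothetical placeholders that you should replace with values measured from your own restore tests:

```python
from datetime import timedelta

# Hypothetical durations for the backup-to-Azure-storage option.
# Measure these yourself - a DR plan based on guesses is no plan at all.
time_of_transfer = timedelta(hours=2)         # download *.bak from Azure storage
time_of_restore = timedelta(hours=1)          # RESTORE DATABASE on the new server
time_to_bring_online = timedelta(minutes=30)  # re-create logins, redirect users
backup_age = timedelta(hours=8)               # how long ago the last backup ran

# RTO: total elapsed time before users are back on the system
rto = time_of_transfer + time_of_restore + time_to_bring_online
# RPO: you lose everything written since the last backup
rpo = backup_age

print(f"RTO: {rto}, RPO: {rpo}")
```

Running this kind of calculation per workload gives you the starting numbers to compare each of the options that follow.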
Database Mirroring is a deprecated feature in SQL Server, which means it will be removed in a future release. It is, however, still supported in SQL Server 2012, and it can be used between on-premises SQL Server Instances and Windows Azure VMs. Using connection strings and .NET languages, clients can automatically redirect to the partner server.
The granularity of this solution is at the individual database level. Machines can retain their individual identities. You can use certificates to connect the systems, or you can use the point-to-point solution and Active Directory.
There are limitations, however. You won’t use a Listener in this configuration, and you’ll be using Asynchronous mode. If you are not running in the same Active Directory, you’ll also need to factor in the time to re-create and tie out those accounts when calculating the RTO value.
RPO: As of last good synchronization
RTO: (Time of failure + Time of client redirect to New System ) - Time of last good synchronization
A complete tutorial on setting up this configuration is here: http://msdn.microsoft.com/en-us/library/jj870964.aspx
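The automatic client redirection mentioned above comes from the "Failover Partner" keyword in the client connection string. As an illustration (the server and database names here are hypothetical), this sketch builds such a string - the same keyword works whether the string is handed to an ADO.NET SqlConnection or another SQL Server client that honors it:

```python
# Hypothetical server names: an on-premises principal and a Windows Azure
# VM acting as the mirror. The "Failover Partner" keyword is what lets
# clients redirect automatically when the principal is unavailable.
principal = "onprem-sql01"
mirror = "azurevm-sql01.cloudapp.net"

conn_str = (
    f"Server={principal};"
    f"Failover Partner={mirror};"
    "Database=SalesDB;"
    "Integrated Security=True;"
)
print(conn_str)
```

When calculating RTO for mirroring, remember that clients only redirect on their next connection attempt, so the redirect time belongs in your formula.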
Another feature available for DR in a Hybrid fashion is using Log Shipping, which also protects your system at a database level. Log shipping involves an automated log backup of your database, and the log is copied and then applied at the secondary server. Because the log file is copied to a Windows share, this solution requires both networking access and an Active Directory integration.
RPO: As of last good log backup application to the secondary system
RTO: (Time of failure + Time of manual client redirect to New System + Time of Manual Failover ) - Time of last good log backup
Log Shipping information is here: http://technet.microsoft.com/en-us/library/ms187103.aspx
AlwaysOn Availability Groups
SQL Server 2012 introduces a new set of features called “AlwaysOn” that encompass many of the HA/DR features in previous releases. One feature within that set is called “Availability Groups”, and with certain caveats that feature is available for a Hybrid on-premises to Windows Azure VM solution.
AlwaysOn requires a Windows Server Failover Cluster (WSFC), which is where the caveats come into play. You’re able to set up a multi-subnet WSFC cluster, but you won’t have access to the Availability Group Listener function, so you need to plan for client reconnection.
RPO: As of last good synchronization
RTO: (Time of failure + Time of manual client redirect to New System + Time of Manual Failover ) - Time of last good synchronization
A complete tutorial on setting up this configuration is here: http://msdn.microsoft.com/en-us/library/jj870959.aspx
Real-world notes and testing here: http://blogs.msdn.com/b/igorpag/archive/2013/09/02/sql-server-2012-alwayson-availability-group-and-listener-in-azure-vms-notes-details-and-recommendations.aspx
Other Solution Options
Taking an overview approach, you can use other data transfer mechanisms. While these involve more manual coding and architecture, you do have more control. For instance, you could copy the data to multiple locations, platforms and more, and allow reading and manipulations of the data at the destination. You can use code options or even SQL Server Replication (blog on this process is here: http://tk.azurewebsites.net/2012/07/17/how-to-setup-peer-to-peer-replication-in-azure-iaas-sql-server-2012/)
A whitepaper on the information I've discussed throughout this article and other options is available here: http://msdn.microsoft.com/en-us/library/jj870962.aspx
The “SQL AlwaysOn” Team Blog (where you may find more current information) is here: http://blogs.msdn.com/b/sqlalwayson/
There are several forms of corporate communication. From immediate, rich communications like phones and IM messaging to historical transactions like e-mail, there are a lot of ways to get information to one or more people. From time to time, it's even useful to have a meeting.
(This is where a witty picture of a guy sleeping in a meeting goes. I won't bother actually putting one here; you're already envisioning it in your mind)
Most meetings are pointless, and a complete waste of time. This is the fault, completely and solely, of the organizer, because he or she hasn't thought things through enough to consider alternate forms of information passing. Here's the criterion for a good meeting - whether in-person or over the web:
100% of the content of a meeting should require the participation of 100% of the attendees for 100% of the time
It doesn't get any simpler than that. If it doesn't meet that criterion, then don't invite that person to that meeting. If you're just conveying information and no one has the need for immediate interaction with that information (like telling you something that modifies the message), then send an e-mail. If you're a manager, and you need to get status from lots of people, pick up the phone. If you need a quick answer, use IM.
I once had a high-level manager who called frequent meetings. His real need was status updates on various processes, so 50 of us would sit in a room while he asked each one of us questions. He believed this larger meeting helped us "cross pollinate ideas". In fact, it was a complete waste of time for most everyone, except in the one or two moments that they interacted with him. So I wrote some code for a Palm Pilot (which was a kind of SmartPhone, but with no phone and no real graphics - this was in the days when we had just discovered fire and the wheel, although the order of those things is still in debate) that took an average of the salaries of the people in the room (I guessed at it) and ran a timer that multiplied the number of people by those salaries.
I left that running in plain sight for him, and when he asked about it, I explained how much the meetings were really costing the company. We had far fewer meetings after.
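If you want to recreate that Palm Pilot gadget, the math is trivial. Here is a modern sketch in Python; the salary figure is a made-up placeholder, as mine was:

```python
# A modern take on the Palm Pilot meeting-cost counter described above.
# The salary figure is a guess, just as the original was - the point is
# making the cost visible, not getting it exact.
def meeting_cost(attendees, avg_annual_salary, minutes):
    # Rough hourly rate, assuming ~2080 working hours per year
    hourly_rate = avg_annual_salary / 2080
    return attendees * hourly_rate * (minutes / 60)

# 50 people at an average of $80,000/year, for a one-hour status meeting:
cost = meeting_cost(attendees=50, avg_annual_salary=80_000, minutes=60)
print(f"${cost:,.2f}")
```

Run it against your next recurring status meeting and decide whether an e-mail would have been cheaper.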
Meetings are now web-enabled. I believe that's largely a good thing, since it saves on travel time and allows more people to participate, but I think the rule above still holds. And in fact, there are some other rules that you should follow to have a great meeting - and fewer of them.
Be Clear About the Goal
This is important in any meeting, but all of us have probably gotten an invite with a web link and an ambiguous title. Then you get to the meeting, and it's a 500-level deep-dive on something everyone expects you to know.
This is unfair to the "expert" and to the participants. I always tell people that invite me to a meeting that I will be as detailed as I can - but the more detail they can tell me about the questions, the more detailed I can be in my responses. Granted, there are times when you don't know what you don't know, but the more you can say about the topic the better.
There's another point here - and it's that you should have a clearly defined "win" for the meeting. When the meeting is over, and everyone goes back to work, what were you expecting them to do with the information? Have that clearly defined in your head, and in the meeting invite.
Understand the Technology
There are several web-meeting clients out there. I use them all, since I meet with clients all over the world. They all work differently - so I take a few moments and read up on the different clients and find out how I can use the tools properly. I do this with the technology I use for everything else, and it's important to understand it if the meeting is to be a success. If you're running the meeting, know the tools. I don't care if you like the tools or not, learn them anyway. Don't waste everyone else's time just because you're too bitter/snarky/lazy to spend a few minutes reading.
Check your phone or mic. Check your video size. Install (and learn to use) ZoomIT (http://technet.microsoft.com/en-us/sysinternals/bb897434.aspx). Format your slides or screen or output correctly. Learn to use the voting features of the meeting software, and especially its whiteboard features. Figure out how multiple monitors work. Try a quick meeting with someone to test all this. Do this *before* you invite lots of other people to your meeting.
Use a WebCam
I'm not a pretty man. I have a face fit for radio. But after attending a meeting with clients where one Microsoft person used a webcam and another did not, I'm convinced that people pay more attention when a face is involved. There are tons of studies around this, or you can take my word for it, but toss a shirt on over those pajamas and turn the webcam on.
Set Up Early
Whether you're attending or leading the meeting, don't wait until the meeting starts to sign on. I can almost guarantee that a 10:00 meeting will actually start at 10:10, because the participants or the leader are just now installing the web client at 10:00. Sign on early, go on mute, and then wait for everyone to arrive.
Mute When Not Talking
No one wants to hear your screaming offspring / yappy dog / other cubicle conversations / car wind noise (are you driving in a desert storm or something?) while the person leading the meeting is trying to talk. I use the Lync software from Microsoft for my meetings, and I mute everyone by default, and then tell them to un-mute to talk to the group.
Send the Materials Ahead
If you have a PowerPoint deck, mail it out in case you have a tech failure. If you have a document, share it as an attachment to the meeting. Don't make people ask you for the information - that's why you're there to begin with. Even better, send it out early. "But", you say, "then no one will come to the meeting if they have the deck first!" Uhm, then don't have a meeting. Send out the deck and a quick e-mail and let everyone get on with their productive day.
Set Actions At the Meeting
A meeting should have some sort of outcome (see point one). That means there are actions to take, a follow up, or some deliverable. Otherwise, it's an e-mail. At the meeting, decide who will do what, when things are needed, and so on. And avoid, if at all possible, setting up another meeting, unless absolutely necessary.
So there you have it. Whether it's on-premises or on the web, meetings are a necessary evil, and should be treated that way. Like politicians, you should have as few of them as are necessary to keep the roads paved and public libraries open.
There are multiple ways to learn, and one of the most effective is with examples. You have multiple options with Windows Azure, including the Software Development Kit, the Windows Azure Training Kit and now another one: the Microsoft All-In-One Code Framework, a free, centralized code sample library driven by developers' real-world pains and needs. The goal is to provide customer-driven code samples for all Microsoft development technologies, and Windows Azure is included.
Once you hit the site, you download an EXE that will create a web-app based installer for a Code Browser.
Once inside, you can configure where the samples store data and other settings, and then search for what you want.
You can also request a sample – if enough people ask, we create it. The OneCode team also partnered with the gallery and Visual Studio teams to develop the Sample Browser Visual Studio extension. It’s an easy way for developers to find and download samples from within Visual Studio.
Microsoft has many tools for “Big Data”. In fact, you need many tools – there’s no product called “Big Data Solution” in a shrink-wrapped box – if you find one, you probably shouldn’t buy it. It’s tempting to want a single tool that handles everything in a problem domain, but with large, complex data, that isn’t a reality. You’ll mix and match several systems, open and closed source, to solve a given problem.
But there are tools that help with handling data at large, complex scales. Normally the best way to do this is to break up the data into parts, and then put the calculation engines for that chunk of data right on the node where the data is stored. These systems are in a family called “Distributed File and Compute”. Microsoft has a couple of these, including the High Performance Computing edition of Windows Server. Recently we partnered with Hortonworks to bring the Apache Foundation’s release of Hadoop to Windows. And as it turns out, there are actually two (technically three) ways you can use it.
(There’s a more detailed set of information here: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx, I’ll cover the options at a general level below)
First Option: Windows Azure HDInsight Service
Your first option is that you can simply log on to a Hadoop control node and begin to run Pig or Hive statements against data that you have stored in Windows Azure. There’s nothing to set up (although you can configure things where needed), and you can send the commands, get the output of the job(s), and stop using the service when you are done – and repeat the process later if you wish.
(There are also connectors to run jobs from Microsoft Excel, but that’s another post)
This option is useful when you have a periodic burst of work for a Hadoop workload, or the data collection has been happening into Windows Azure storage anyway. That might be from a web application, the logs from a web application, telemetrics (remote sensor input), and other modes of constant collection.
You can read more about this option here: http://blogs.msdn.com/b/windowsazure/archive/2012/10/24/getting-started-with-windows-azure-hdinsight-service.aspx
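Pig and Hive statements ultimately compile down to map/reduce jobs, and the underlying pattern is easy to sketch. The following is a minimal, purely local word-count illustration of the map/reduce idea - it is not HDInsight-specific code, just the shape of the computation the service runs for you at scale:

```python
from collections import Counter

def map_words(lines):
    # Map step: emit (word, 1) pairs for each word in the input
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_counts(pairs):
    # Reduce step: sum the counts per word
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# A tiny local run; on a real cluster, Hadoop partitions the input across
# data nodes, runs the map step where the data lives, and groups the
# pairs by key before the reduce step.
sample = ["the quick brown fox", "the lazy dog"]
print(reduce_counts(map_words(sample)))
```

The value of the service is that the same logical job runs unchanged whether the input is two lines or two petabytes spread across hundreds of nodes.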
Second Option: Microsoft HDInsight Server
Your second option is to use the Hadoop Distribution for on-premises Windows called Microsoft HDInsight Server. You set up the Name Node(s), Job Tracker(s), and Data Node(s), among other components, and you have control over the entire ecosystem.
This option is useful if you want to have complete control over the system, leave it running all the time, or you have a huge quantity of data that you have to bulk-load constantly – something that isn’t going to be practical with a network transfer or disk-mailing scheme.
You can read more about this option here: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx
Third Option (unsupported): Installation on Windows Azure Virtual Machines
Although unsupported, you could simply use a Windows Azure Virtual Machine (we support both Windows and Linux servers) and install Hadoop yourself – it’s open-source, so there’s nothing preventing you from doing that.
Aside from being unsupported, there are other issues you’ll run into with this approach – primarily involving performance and the amount of configuration you’ll need to do to access the data nodes properly. But for a single-node installation (where all components run on one system) such as learning, demos, training and the like, this isn’t a bad option.
Did I mention that’s unsupported? :)
You can learn more about Windows Azure Virtual Machines here: http://www.windowsazure.com/en-us/home/scenarios/virtual-machines/
And more about Hadoop and the installation/configuration (on Linux) here: http://en.wikipedia.org/wiki/Apache_Hadoop
And more about the HDInsight installation here: http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW
Choosing the right option
Since you have two or three routes you can go, the best thing to do is evaluate the need you have, and place the workload where it makes the most sense. My suggestion is to install the HDInsight Server locally on a test system, and play around with it. Read up on the best ways to use Hadoop for a given workload, understand the parts, write a little Pig and Hive, and get your feet wet. Then sign up for a test account on HDInsight Service, and see how that leverages what you know. If you're a true tinkerer, go ahead and try the VM route as well.
Oh - there’s another great reference on the Windows Azure HDInsight that just came out, here: http://blogs.msdn.com/b/brunoterkaly/archive/2012/11/16/hadoop-on-azure-introduction.aspx
To create a Windows Azure Infrastructure-as-a-Service Virtual Machine you have several options. You can simply select an image from a “Gallery” which includes Windows or Linux operating systems, or even a Windows Server with pre-installed software like SQL Server.
One of the advantages of Windows Azure Virtual Machines is that they are stored in a standard Hyper-V format – with the base hard-disk as a VHD. That means you can move a Virtual Machine from on-premises to Windows Azure, and then move it back again. You can even use a simple series of PowerShell scripts to do the move, or automate it with other methods.
And this leads to another very interesting option for deploying systems: you can create a server VHD, configure it with the software you want, and then run the “SYSPREP” process on it. SYSPREP is a Windows utility that essentially strips the identity from a system, and when you re-start that system it asks for a few details, like what you want to call it and so on. By doing this, you can essentially create your own gallery of systems for testing, development servers, demo systems and more. You can learn more about how to do that here: http://msdn.microsoft.com/en-us/library/windowsazure/gg465407.aspx
But there is a small issue you can run into that I wanted to make you aware of. Whenever you deploy a system to Windows Azure Virtual Machines, you must meet certain password complexity requirements. However, when you build the machine locally and SYSPREP it, you might not choose a strong password for the account you use to Remote Desktop to the machine. In that case, you might not be able to reach the system after you deploy it.
Once again, the key here is reading through the instructions before you start. Check out the link I showed above, and this link: http://technet.microsoft.com/en-us/library/cc264456.aspx to make sure you understand what you want to deploy.
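As a rough guide to what "password complexity" usually means on Windows, the common rule is a minimum length plus at least three of the four character classes (uppercase, lowercase, digits, symbols). This sketch approximates that check; it is an illustration only, so verify the exact policy in the documentation linked above before relying on it:

```python
import re

def meets_complexity(password, min_length=8):
    # Rough approximation of the Windows "three of four character classes"
    # rule. This is NOT the authoritative policy - check the official docs
    # for the exact requirements on your target platform.
    if len(password) < min_length:
        return False
    classes = [
        re.search(r"[a-z]", password),         # lowercase letters
        re.search(r"[A-Z]", password),         # uppercase letters
        re.search(r"[0-9]", password),         # digits
        re.search(r"[^a-zA-Z0-9]", password),  # symbols
    ]
    return sum(1 for c in classes if c) >= 3

print(meets_complexity("password"))      # only one class - too weak
print(meets_complexity("P@ssw0rd123!"))  # all four classes
```

Running the local account password you plan to SYSPREP through a check like this before you build the image can save you a locked-out deployment later.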
One of the more popular topics here on my technical blog doesn't have to do with technology, per se - it's about the choice I made to go to a stand-up desk work environment. If you're interested in the history of those, check here:
Stand-Up Desk Part One
Stand-Up Desk Part Two
I have made some changes and I was asked to post those here. Yes, I'm still standing - I think the experiment has worked well, so I'm continuing to work this way. I've become so used to it that I notice when I sit for a long time. If I'm flying, or driving a long way, or have long meetings, I take breaks to stand up and move around.
That being said, I don't stand as much as I did. I started out by standing the entire day - which did not end well. As you can read in my second post, I found that sitting down for a few minutes each hour worked out much better. And over time I would say that I now stand about 70-80% of the day, depending on the day. Some days I don't even notice I'm standing, so I don't sit as often. Other days I find that I really tire quickly - so I sit more often. But in both cases, I stand more than I sit.
In the first post you can read about how I used a simple coffee-table from Ikea to elevate my desktop to the right height. I then adjusted my standing height using a small plastic square and some carpet. Over time I found this did not work as well as I'd like. The primary reason is that the fronts of the desk and table were at the same depth - so my knees would hit them when I sat down. The desk was also at a fixed height, so I had to adjust to it rather than the other way around. And I like a lot of surface area on top of a desk - almost more of a table. Routing cables and wiring was a pain, and of course moving it was out of the question.
So I've changed what I use. I found a perfect solution for what I was looking for - industrial wire shelving:
I bought one, built only half of it (for the right height I wanted) and arranged the shelves the way I wanted. I then got a 5'x4' piece of wood from Lowes, and mounted it so the top was balanced, but had an over-hang I could get my knees under easily. My wife sewed a piece of fake-leather for the top.
This arrangement provides the following benefits:
- Very strong
- Rolls easily, wheels can lock to prevent rolling
- Long, wide shelves
- Wire-frame allows me to route any kind of wiring and other things all over the desk
I plugged in my UPS and ran its longer power-cable to the wall outlet. I then ran the router's LAN connection along that wire, and covered both with a large insulation sleeve. I then plugged everything into the UPS, and routed all the wiring. I can now roll the desk almost anywhere in the room so that I can record, look out the window, get closer to or farther away from the door and more. I put a few boxes on the shelves as "drawers" and tidied that part up. Even my printer fits on a shelf.
Laser-dog not included - some assembly required
In the second post you can read about the bar-stool I purchased from Target for the desk.
I cheaped-out on this one, and it proved to be a bad choice. Because I had to raise it so high, and was constantly sitting on it and then standing up, the gas-cylinder in it just gave out. So it became a very short stool that I ended up getting rid of. In the end, this one from Ikea proved to be a better choice:
And so this arrangement is working out perfectly. I'm finding myself VERY productive this way.
I hope these posts help you if you decide to try working at a stand-up desk. Although I was skeptical at first, I've found it to be a very healthy, easy way to code, design and especially present over a web-cam. It's natural to stand to speak when you're presenting, and it feels more energetic than sitting down to talk to others.
Last week I blogged about developing a High-Availability plan. The specifics of a given plan aren't as simple as "Step 1, then Step 2" because in a hybrid environment (which most of us have) the situation changes the requirements. There are those that look for simple "template" solutions, but unless you settle on a single vendor and a single way of doing things, that's not really viable.
The same holds true for support. As I've mentioned before, I'm not fond of the term "cloud", and would rather use the term "Distributed Computing". That being said, more people understand the former, so I'll just use that for now. What I mean by Distributed Computing is leveraging another system or setup to perform all or some of a computing function. If this definition holds true, then you're essentially creating a partnership with a vendor to run some of your IT - whether that be IaaS, PaaS or SaaS, or more often, a mix. In your on-premises systems, you're the first and sometimes only line of support. That changes when you bring in a Cloud vendor.
For Windows Azure, we have plans for support that you can pay for if you like. http://www.windowsazure.com/en-us/support/plans/ You're not off the hook entirely, however. You still need to create a plan to support your users in their applications, especially for the parts you control. The last thing they want to hear is "That's vendor X's problem - you'll have to call them." I find that this is often the last thing the architects think about in a solution.
It's fine to put off the support question prior to deployment, but I would hold off on calling it "production" until you have that plan in place. There are lots of examples, like this one: http://www.va-interactive.com/inbusiness/editorial/sales/ibt/customer.html some of which are technology-specific.
Once again, this is an "it depends" kind of approach. While it would be nice if there was just something in a box we could buy, it just doesn't work that way in a hybrid system. You have to know your options and apply them appropriately.
Outages, natural disasters and unforeseen events have proved that even in a distributed architecture, you need to plan for High Availability (HA). In this entry I'll explain a few considerations for HA within Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). In a separate post I'll talk more about Disaster Recovery (DR), since each paradigm has a different way to handle that.
Planning for HA in IaaS
IaaS involves Virtual Machines - so in effect, an HA strategy here takes on many of the same characteristics as it would on-premises. The primary difference is that the vendor controls the hardware, so you need to verify what they do for things like local redundancy and so on from the hardware perspective.
As far as what you can control and plan for, the primary factors fall into three areas: multiple instances, geographical dispersion and task-switching.
In almost every cloud platform I've studied, to ensure your application will be protected by any level of HA, you need to have at least two of the Instances (VMs) running. This makes sense, but you might assume that the vendor just takes care of that for you - they don't. If a single VM goes down (for whatever reason) then the access to it is lost. Depending on multiple factors, you might be able to recover the data, but you should assume that you can't. You should keep a sync to another location (perhaps the vendor's storage system in another geographic datacenter, or a local location) to ensure you can continue to serve your clients.
You'll also need to host the same VM's in another geographical location. Everything from a vendor outage to a network path problem could prevent your users from reaching the system, so you need to have multiple locations to handle this.
This means that you'll have to figure out how to manage state between the geo's. If the system goes down in the middle of a transaction, you need to figure out what part of the process the system was in, and then re-create or transfer that state to the second set of systems. If you didn't write the software yourself, this is non-trivial.
You'll also need a manual or automatic process to detect the failure and re-route the traffic to your secondary location. You could flip a DNS entry (if your application can tolerate that) or invoke another process to alias the first system to the second, such as load-balancing and so on. There are many options, but all of them involve coding the state into the application layer. If you've simply moved a state-ful application to VM's, you may not be able to easily implement an HA solution.
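The detect-and-reroute process can be as simple as a health probe plus an endpoint selection rule. Here is a minimal sketch; the endpoint URLs are hypothetical, and a real solution would lean on your monitoring system or a managed traffic-routing service rather than hand-rolled polling:

```python
import urllib.request

# Hypothetical endpoints - substitute your own primary and secondary geos.
PRIMARY = "https://primary.example.com/health"
SECONDARY = "https://secondary.example.com/health"

def is_healthy(url, timeout=5):
    # A simple HTTP health probe: healthy means a 200 response in time.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def pick_endpoint(probe=is_healthy):
    # Route to the secondary only when the primary fails its probe.
    # In production you would require several consecutive failures
    # before flipping, to avoid bouncing on a transient blip.
    return PRIMARY if probe(PRIMARY) else SECONDARY
```

The hard part, as the paragraph above notes, isn't this routing logic - it's making sure the application behind the secondary endpoint can pick up the in-flight state.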
Planning for HA in PaaS
Implementing HA in PaaS is a bit simpler, since it's built on the concept of stateless application deployment. Once again, you need at least two copies of each element in the solution (web roles, worker roles, etc.) to remain available in a single datacenter. You also need to deploy the application again in a separate geo, but the advantage here is that you could work out a "shared storage" model such that state is auto-balanced across the world. In fact, you don't have to maintain a "DR" site; the alternate location can be live and serving clients, and only take on extra load if the other site is not available. In Windows Azure, you can use the Traffic Manager service to route the requests as a type of auto balancer.
Even with these benefits, I recommend a second backup of storage in another geographic location. Storage is inexpensive, and that second copy can be used not only for HA but also for DR.
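The "stateless" idea above is worth making concrete. The sketch below (store and handler names are mine, purely illustrative) shows the pattern: the role instance keeps nothing in memory between requests - every request loads state from shared storage and writes it back, so any instance in any datacenter can serve the next request:

```python
class SharedStateStore:
    """Stand-in for a replicated store (in Windows Azure, something like
    Table or Blob storage). A dict makes the pattern runnable here; in
    production these calls would go over the wire."""
    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, state):
        self._data[session_id] = dict(state)

def handle_request(store, session_id, item):
    """A stateless handler: load state, mutate it, save it back.
    Nothing is kept locally, so the next request can land anywhere."""
    state = store.get(session_id)
    cart = state.get("cart", [])
    cart.append(item)
    store.put(session_id, {"cart": cart})
    return cart
```

Because the handler owns no state, killing one instance (or one geo) loses nothing - exactly the property that makes PaaS HA simpler than the IaaS case.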
Planning for HA in SaaS
In Software-as-a-Service (such as Office 365, or Hadoop in Windows Azure) you have far less control over the HA solution, although you still maintain the responsibility to ensure you have it. Since each SaaS offering is different, check with the vendor on the solution for HA - and make sure you understand what they do and what you are responsible for. They may have no HA for that solution, or pin it to a particular geo, or they may have massive HA built in with automatic load balancing (which is often the case).
All of these options (with the exception of SaaS) involve higher design costs. Do not sacrifice reliability for cost - that will always cost you more in the end. Build in the redundancy and HA at the very outset of the project; if you try to tack it on later in the process, the business will push back and potentially not implement HA at all.
References: http://www.bing.com/search?q=windows+azure+High+Availability (each type of implementation is different, so I'm routing you to a search on the topic - look for the "Patterns and Practices" results for the area in Azure you're interested in)
OK - you've been hearing about "cloud" (I really dislike that term, but whatever) for over two years. You've equated it with just throwing some VMs in some vendor's datacenter - which is certainly part of it, but not the whole story. There's a whole world of - wait for it - *coding* out there that you should be working on. If you're a developer, this is just a set of servers with operating systems and the runtime layer (like .NET, Java, PHP, etc.) that you can deploy code to and have it run. It can expand horizontally, allowing massive scale - and I really, honestly mean massive, not just marketing-talk massive. We see this every day.
If you're not a developer, well, now's the time to learn. Explore a little. Try it.
We'll help you. There's a free conference you can attend in November, and you can sign up for it now. It's all on-line, and the tools you need to code are free.
Put down Facebook and Twitter for a minute - go sign up. Learn. Do. :)
See you there. http://www.windowsazureconf.net/
I deal with computing architectures by first laying out requirements, and then laying in any constraints for its success. Only then do I bring in computing elements to apply to the system. As an example, a requirement might be "world-wide availability" and a constraint might be "with less than 80ms response time and full HA" or something similar. Then I can choose the best fit from technologies that range from full-up on-premises computing to IaaS, PaaS or SaaS.
I also deal in abstraction layers - on-premises systems are fully under your control, in IaaS the hardware is abstracted (but not the OS, scale, runtimes and so on), in PaaS the hardware and the OS is abstracted and you focus on code and data only, and in SaaS everything is abstracted - you merely purchase the function you want (like an e-mail server or some such) and simply use it.
When you think about solutions this way, the architecture moves to the primary factor in your decision. It's problem-first architecting: lay out the problem, then lay in whatever technology or vendor best solves it.
To that end, most architects design a solution using a graphical tool (I use Visio) and then create documents that let the rest of the team (and the business) know what is required. It's the template, or recipe, for the solution. This is extremely easy to do for SaaS - you merely point out what the needs are, research the vendor and present the findings (and bill) to the business. IT might not even be involved there. In PaaS it's not much more complicated - you use the same Application Lifecycle Management and design tools you always have for code, such as Visual Studio or some other process and toolset, and you can "stamp out" the application in multiple locations, update it and so on.
IaaS is another story. Here you have multiple machines, operating systems, patches, virus scanning, run-times, scale-patterns and tools and much more that you have to deal with, since essentially it's just an in-house system being hosted by someone else. You can certainly automate builds of servers - we do this as technical professionals every day. From Windows to Linux, it's simple enough to create a "build script" that makes a system just like the one we made yesterday. What is more problematic is being able to tie those systems together in a coherent way (as a solution) and then stamp that out repeatedly, especially when you might want to deploy that solution on-premises, or in one cloud vendor or another.
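That "build script" idea can be sketched simply. The step names below are placeholders - in practice each would be a real package-install or configuration command (apt-get, PowerShell DSC, and so on) - but the shape is the point: an ordered list of steps, run the same way every time, stopping at the first failure so today's server really does match yesterday's:

```python
import subprocess

# Hypothetical build steps - stand-ins for your OS's real
# install/patch/configure commands
BUILD_STEPS = [
    ["echo", "install runtime"],
    ["echo", "apply patches"],
    ["echo", "configure virus scanning"],
]

def build_server(steps, runner=subprocess.run):
    """Run each step in order, raising on the first failure so a
    half-built server is never silently put into service."""
    results = []
    for step in steps:
        proc = runner(step, capture_output=True, text=True)
        if proc.returncode != 0:
            raise RuntimeError(f"step failed: {step}")
        results.append(proc.stdout.strip())
    return results
```

The harder problem the paragraph above describes - tying several such servers together into one solution and stamping that out across providers - is exactly what the script alone doesn't solve.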
Lately I've been working with a company called RightScale that does exactly this. I'll point you to their site for more info, but the general idea is that you document out your intent for a set of servers, and it will deploy them to on-premises private clouds, Windows Azure, and other public cloud providers all from the same script. In other words, it doesn't contain the images or anything like that - it contains the scripts to build them on-premises in private clouds or on a public cloud vendor like Microsoft.
Using a tool like this, you combine the steps of designing a system (all the way down to passwords and accounts if you wish) and then the document drives the distribution and implementation of that intent. As time goes on and more and more companies implement solutions on various providers (perhaps for HA and DR) then this becomes a compelling investigation.
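To illustrate the intent-document idea (this is my own toy sketch, not RightScale's actual format or API), the core pattern is one declarative spec plus a dispatch table of providers that can each realize it:

```python
# One declarative "intent" document drives every target
SERVER_SPEC = {
    "name": "web-tier",
    "count": 2,
    "os": "Windows Server",
    "runtime": ".NET",
}

# Illustrative stand-ins for real provisioning back-ends
def deploy_on_premises(spec):
    return [f"on-prem:{spec['name']}-{i}" for i in range(spec["count"])]

def deploy_windows_azure(spec):
    return [f"azure:{spec['name']}-{i}" for i in range(spec["count"])]

PROVIDERS = {
    "on-premises": deploy_on_premises,
    "windows-azure": deploy_windows_azure,
}

def deploy(spec, provider):
    """Same spec, different target - the intent doesn't change,
    only the provider that realizes it."""
    return PROVIDERS[provider](spec)
```

The value is that the spec, not the provider, is the source of truth - which is what makes deploying the same solution on-premises and in a public cloud (say, for HA and DR) tractable.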
The RightScale information is here, if you want to investigate it further. Yes, there are other methods I've found, but most are tied to a single kind of cloud, and I'm not into vendor lock-in.
- Poppa Bear Level - Hands-on: Windows Azure Virtual Machines 3-tier Deployment - http://support.rightscale.com/09-Clouds/Azure/Tutorials/3_Tier_Deployment_with_Windows_Azure
- Momma Bear Level - Just the Right Level... ;0)
- Baby Bear Level - Marketing
I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields.
Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist?
This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science?” (April 2005/Vol. 48, No. 4, COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”. I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there.
Definition of Science
To ask the question “is data science ‘science’” we need to start with a definition of terms. Various references put the definition into the same basic areas:
- Study of the physical world
- Systematic and/or disciplined study of a subject area
- ...and then they include the things studied, the bodies of knowledge and so on.
The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm.
Definition of Data Science
And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present).
Researching the term's general use, I created an amalgam of the definitions this way:
“Studying and applying mathematical and other techniques to derive information from complex data sets.”
Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques (and data itself) is part of science, not science itself.
I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but it does not seek practical applications.
As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, testing results against the hypothesis, and documenting for repeatability. These are not steps we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way Data Science is more akin to statistics (and in fact makes heavy use of it) than to starting with an assumption and following on from it.
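That "start with data, look for a relationship" workflow is easy to show. Here's a minimal Pearson correlation computed from scratch - the sample numbers (ad spend vs. sales) are entirely made up for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient: covariance of the two series
    divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up sample data, purely illustrative
ad_spend = [10, 20, 30, 40, 50]
sales    = [12, 24, 33, 45, 52]
```

Notice there's no hypothesis, control group, or experiment here - just a question ("do these move together?") and a measurement, which is the distinction the paragraph above is drawing.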
So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term at least from the intent that we use data to study problems across a wide spectrum, rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” who have issues with those who borrow the term without asking.
What do you think? Science, or not? Does it matter?