THE SQL Server Blog Spot on the Web


Jamie Thomson

This is the blog of Jamie Thomson, a data mangler in London working for Dunnhumby

What would a cloud-based ETL tool look like?

Given that “moving data to the cloud” is, rightly or wrongly, currently in vogue in our industry, I have to think that pretty soon there will be a glaring need for tools that help us move data between these heterogeneous sources – a cloud-based ETL tool for cloud-based data, if you will. Perhaps such a thing already exists: I’ve talked about Kapow in the past, which may well be considered a form of cloud ETL tool given that it fetches data from the web. If you know of anything that might fit this very loose description, feel free to let me know in the comments.

I started to ponder what capabilities a cloud ETL tool should have and here’s a quick brainstormed list:

Any thoughts here? As I said, this is a brainstormed list, so I don’t mind being told that I am approaching this from the wrong angle or even that I’m completely wrong. Should I be concentrating on scenarios rather than technologies? I’m only too aware that, given my ETL heritage, my brain is already wired to consider how traditional ETL tools could be transplanted into the cloud (my mention of a job scheduler sold me out there) – perhaps that is completely wrong too, and my heritage is actually a disadvantage here.

I’m interested to know what people think, and I hope this triggers a conversation. I’m especially keen to hear about scenarios you might have where you need to move and transform data that lives “in the cloud”.


*where applicable

UPDATE. Within seconds of publishing this post I’d already been alerted to & AWS Data Pipeline. Checking those out now!

I’ve been recommended to check out the following articles:

Published Tuesday, February 12, 2013 3:15 PM by jamiet




Joe Harris said:

Funny you should mention it, but I was just reading up on this yesterday.

Here are the 2 best articles I've seen:


TL;DR - It needs to be an API-focused version of Yahoo Pipes.


TL;DR - ETL needs to become AP2 (Acquiring, Processing and Publishing)

Personally, I want something totally different. I want an API-focused tool that provides a nice syntax so I can script these actions without the tedious "drag > drop > click > fill_form > repeat" of current tools.


February 12, 2013 10:03 AM

jamiet said:

Hiya Joe,

Sounds like you want something like Datasift. It's got what seems like a very nice little DSL for doing ETL on APIs.

I like AP2. Nothing is real until it's got a TLA :)


February 12, 2013 10:14 AM

Chris Nelson said:

I took a look at some of the examples and they seem to be more news/text aggregators than true data-driven APIs. A quick look at MyFCC doesn't appear to highlight some of the core open data available from the FCC ULS database. (It may, but it's not obvious at a glance.) There's a huge amount of open data out there, but much of it requires someone familiar with the domain to make it useful.

So instead of dealing with thousands of different file formats, we can deal with thousands of new APIs?

February 12, 2013 12:23 PM

jamiet said:

"So instead of dealing with thousands of different file formats, we can deal with thousands of new APIs?"

Of course, for every problem you solve with an extra layer of abstraction you simply create another problem :)

February 12, 2013 12:29 PM

Phil said:

Dell Boomi?

February 14, 2013 1:42 PM

SSIS Junkie said:

Three days ago I posted What would a cloud-based ETL tool look like? where I wondered out loud about

February 15, 2013 10:45 AM

Kent B said:

March 15, 2013 5:00 PM

SSIS Junkie said:

Recently Microsoft announced that they’re releasing a new XBox which was apparently big news and was

June 5, 2013 7:53 PM

said:

Just launched a cloud ETL tool.  Not as complex as you describe but plenty of useful functions to be consumed by other (cloud) systems.  Currently about 45 functions.  Simply pass name-value pairs in, receive XML back.

January 24, 2014 10:18 PM

Jamie Thomson said:

I have maintained a watching brief on what I refer to as “cloud ETL”, that is the ability to build ETL routines

April 24, 2014 7:42 AM

Jamie Thomson said:

I wouldn’t normally put up a blog post simply to draw attention to someone else’s blog post but in this

September 17, 2014 3:18 PM

The Danish said:

Any updates on this interesting subject? Now that cloud has more traction, did any ETL framework options for cloud or services emerge? An ETL framework that enables enterprise features like data lineage, REST and SOAP APIs, various authentication methods, a web-based GUI on HTML5, etc.

October 12, 2015 5:56 PM
