Update: The challenge set below has been answered by Scott Currie, who also happens to be the inventor of BIML. He has made his work available at https://github.com/bimlscript/BETDPI
In February 2004 I joined a company called Conchango and had the pleasure of working with some very smart people (Mick Horne, Dan Perrin, Pete Spencer, Paul Mcmillan, Steve Wright, amongst others) who schooled me in the ways and means of building data integration solutions. One piece of IP that Conchango owned was a means of transporting a collection of data files in a well-known format. We called it the "Generic Feed Interface" and it broke down like this:
- Data was kept in CSV files. There was no limit to how many files were in a collection.
- Each collection had two extra files, an inventory file and a metadata file. Both of these were also CSV files.
- The inventory file listed all of the data files.
- The metadata file defined the data type of each column in each data file.
Here’s a very simple example of what this looked like, a collection consisting of a single data file:
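By way of illustration, a one-file collection might look something like this (the file names, columns, and the exact layout of the inventory and metadata files are hypothetical, reconstructed from the description above rather than taken from the original):

```
inventory.csv
-------------
customers.csv

metadata.csv
------------
customers.csv,CustomerId,int
customers.csv,CustomerName,varchar(50)

customers.csv
-------------
1,Acme Ltd
2,Contoso
```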
As well as this format for defining data, our tangible IP consisted of two DTS packages that could:
- be pointed at a database and produce such a collection
- be given a collection and import it into a specified database, creating target tables if required
This was used to extract data from systems of record which could then later be imported elsewhere for additional processing (e.g. as a source for a data warehouse). We liked this approach because it loosely coupled our data warehouse from the systems of record, which brought benefits such as the ability to extract data from source at a different time from which it was loaded into the target. It was a great piece of IP and it arrived at a time that heralded a very successful period for us in the Microsoft BI space. Doubtless many of you reading this have invented and/or used similar mechanisms for moving data.
It was with interest, then, that earlier this week I was alerted (by Matt Smith) to an effort to achieve the same thing, called Tabular Data Package.
Tabular Data Packages use a single JSON file for the same purpose that we used the inventory and metadata files, but it's the same basic concept: a well-known format for transporting data in human-readable files. Tabular Data Packages are advocated by the Open Data Institute; see Jeni Tennison's blog post on the subject, 2014: The Year of CSV.
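To make the comparison concrete, here is a minimal sketch of the descriptor file (conventionally named `datapackage.json`) for the same hypothetical one-file collection; the package and field names are illustrative, but the `resources`/`schema`/`fields` shape follows the Tabular Data Package format:

```json
{
  "name": "example-package",
  "resources": [
    {
      "path": "customers.csv",
      "schema": {
        "fields": [
          {"name": "CustomerId", "type": "integer"},
          {"name": "CustomerName", "type": "string"}
        ]
      }
    }
  ]
}
```

Note how the list of files (our inventory file) and the per-column types (our metadata file) are folded into one descriptor.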
It occurs to me that it would be really useful to have a couple of SSIS packages that work with a Tabular Data Package in the same manner that our DTS packages worked with our data file collections all those years ago. Then again, using the SSIS object model to dynamically generate packages based on some known metadata (which is what would be required here) is notoriously difficult; better would be if there existed a code-generation tool for SSIS packages. Luckily such a thing exists: it's called BIML, it is free, and it is becoming a very popular means of developing SSIS solutions.
Therefore I’m issuing a friendly challenge. Can anyone out there build a BIML script that can, given a database connection string, generate a Tabular Data Package containing all the data in that database? Furthermore, can someone also build a BIML script that can consume a Tabular Data Package and push all the data therein into a SQL Server database?
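To give a feel for the extract half of the challenge, here is a rough Python sketch of the idea (this is not the BIML/SSIS solution being asked for; it uses SQLite from the standard library as a stand-in for SQL Server, the function name is made up, and every column is given the placeholder type `"any"` rather than a properly inferred type):

```python
import csv
import json
import sqlite3
from pathlib import Path

def export_tabular_data_package(conn, out_dir):
    """Dump every table in a database as a Tabular Data Package:
    one CSV per table plus a datapackage.json describing each file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    resources = []
    for table in tables:
        cur = conn.execute(f'SELECT * FROM "{table}"')
        cols = [d[0] for d in cur.description]
        csv_path = out / f"{table}.csv"
        with csv_path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(cols)       # header row
            writer.writerows(cur)       # remaining rows straight from the cursor
        resources.append({
            "path": csv_path.name,
            # "any" is a placeholder; a real package would map DB types
            # to Table Schema types (integer, string, date, ...)
            "schema": {"fields": [{"name": c, "type": "any"} for c in cols]},
        })
    (out / "datapackage.json").write_text(
        json.dumps({"name": "exported-db", "resources": resources}, indent=2))
```

The consume half of the challenge is the mirror image: read `datapackage.json`, create each target table from its `schema`, and bulk-load the CSVs.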
The challenge is set. Is anyone game?