THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Davide Mauri

A place for my thoughts and experiences the Microsoft Data Platform

  • Speaking at SQL Saturday #613 in Redmond, WA

    On April 15, 2017 I’ll be speaking at SQL Saturday in Redmond, WA. I’ve settled down after the recent changes, so it’s time to start to be an active part of the Microsoft Data Platform (former SQL Server) community again.

    I’ll be talking about monitoring SQL Server Integration Services and the amazing support for JSON in SQL Server 2016:

    SSIS Monitoring Deep Dive

    In this session we’ll see everything interesting is hidden in the SSISDB database, where you can gain a lot of insight on the outcome, the performance and the status of your SSIS Packages. I’ll share everything I’ve learned building the SSIS Dashboard we’re actually using in production and that you can test here We’ll see the internals of SSISDB database, how we can add custom logging information and how we can use all these data in order to know exactly what happened on a specific point in time.

    JSON & SQL Server: using it effectively

    JSON is a de-facto standard for machine-to-machine communication but also to store “liquid” data. Once a killer feature of NoSQL database, with SQL Server 2016 is now available also in a (post) relational environment.
    But how it fits in the relational database space? How the usage of JSON can simplify DBAs and DEVs everyday work? What about performances? And what about flexibility?
    In this session we’ll see how powerful JSON can be, when used correctly and not abused, providing solution to common problems
    (like in the well known “dynamic schema” case), simplifying development (for example to pass to Stored Procedure complex objects or variable numbers of parameters) with also very good performances, even if they maybe hidden in not-so-obvious places.

    The event is going to be an amazing one, as you can see from the agenda:

    SQLSaturday #613 - Redmond 2017

    So, if you live near there, don’t miss it.

    See there in a bit more than a month!

  • Apache Zeppelin 0.7 for SQL Server

    During the weekend I’ve updated my SQL Server interpreter for the amazing Apache Zeppelin to be compatible with the latest version — 0.7— that has been released just couple of days ago:

    Apache Zeppelin Release 0.7.0

    This new release has a lot of nice features (customizable charts, better multi-user support, improved tables and many others) and in order to start to use the new features also with SQL Server and SQL Azure, the first step was to sync the existing interpreter with version 0.7. Well, that’s done.


    You can download the source code and build it yourself from here:

    GitHub - yorek/zeppelin: Apache Zeppelin with support for SQL Server

    or get the Docker image with everything already built, for maximum comfort:

    If you’re new to Apache Zeppelin, you can take a look at the articles I’ve already written to help people to get started:

    Breaking Changes

    In order to take advantage of automatic interpreter discovery and native syntax highlight introduced in version 0.7, I had to rename the interpreter group and the interpreter name to sqlserver from tsql. This means that any binding to the tsql interpreter will be invalid which, in turn, means that all notebook using it won’t work. Luckily fixing this situation is easy: just create a new interpreter binding, using sqlserver as interpreter group and give it the same name you used before. Make sure the notebooks affected use this new binding (as the default one if in your notebooks you didn’t use the %interpreter notation) and you’re done.

    Updating Docker Container to version 0.7

    If you were already running a docker container with Apache Zeppelin for SQL Server, you may have notebook and interpreter configurations that you want to keep.

    The process to update everything without losing any existing work is the following (just be sure also to read the “Breaking Changes” section above!):

    Get the updated image

    docker pull yorek/zeppelin-sqlserver:v0.7

    Once it’s downloaded, check if you have any running container

    docker ps

    If you see an Apache Zeppelin container is running (it is named zeppelin if you followed my tutorials), stop it

    docker stop zeppelin

    Now create a new container pointing to the volumes of the previous version. The new container will automatically use the updated image

    docker run -name zeppelin2 -p 8080:8080 -volumes-from zeppelin -d yorek/zeppelin-sqlserver:v0.7

    List again all the existing containers (running and not running)

    docker ps -a

    Note that the old zeppelin container the is still there and if not needed can be removed

    docker rm zeppelin

    The new container can now be renamed to the usual name

    docker rename zeppelin2 zeppelin

    The old image can also be removed to free disk space

    docker rmi yorek/zeppelin-sqlserver:v0.6.2

    Now check that only one zeppelin container, with the version 0.7, is available:

    docker ps

    Upgrade done! Now you can start Apache Zeppelin for SQL Server with the usual command:

    docker start zeppelin

    and you’ll have access to all your existing notebooks and interpreter bindings.

  • Azure Functions, Dapper and SQL Server JSON to store form data

    A common problem

    Recently we had to setup a couple of contact form pages to allow users to register to the App 2.0 Beta program (now RTM) or to tell us which of the products we launched at CES 2017 they are interested into.

    Such kind of request are quite frequent and usually, from a developer perspective, they are the worst thing someone can ask. They are boring to architect, boring to write, boring to test. Still, business and marketing needs and love them.

    So how to satisfy this requirement in the fastest and easiest way, but still delivering high-quality and easy to maintain code, while also keeping an eye on integrating the data with the existing environment?

    Evaluating Azure DocumentDB

    Given that we are on Azure, one option was Azure DocumentDB. No need to define a schema upfront and it is usable just using REST so the UX developer could do anything on its own, just using JQuery. Azure DocumentDB can also be queried using a SQL language, so extracting data from it wouldn’t have been a problem.

    But at the end, I decided to go another way. All our data, after swimming in a Data Lake are saved in a Azure SQL database where we heavily rely on its column-store features. So having an additional database to integrate was something that would have made the solution a bit more more complex than the simplest possible one. The famous quote

    everything should be made as simple as possible, but no simpler

    is what drives all my architectural decisions, so I wasn’t really happy about going with Azure DocumentDB.

    With the additional fact that there are no really good tooling around Azure DocumentDB yet, I started to look for alternatives. The obvious alternative, aside from saving data into a blob, which was not on option since that would have been too simple, because it doesn’t offer any querying capability, was to use Azure SQL.

    Moving to Azure SQL?

    With Azure SQL you have great tooling (SQL Server Management Studio and now also the online query editor), we already have all knowledge in house, but surely the fact that it doesn’t allow to use just REST to read and write data was, again, something that wasn’t making me really happy.

    Beside that, Azure SQL seemed to be the perfect option. JSON is now natively supported, so there is no problem to store data without a strictly enforced schema.

    Since we’re already using SQL Azure, we wound’t even have to spend any additional cent for it. The only problem to solve was that you can’t use Azure SQL directly via JQuery.

    Serverless is the key

    The missing link — the REST interface — can easily be created using Azure Functions and a microORM like Dapper. Thanks to the serverless nature of Azure Functions all it’s need are the few lines of code to get the HTTP Request Body that contains the contact form “jsonifyied” data and store into the SQL Azure database.

    The created Azure Function gets called each time an HTTP Request is done, using an HTTP Trigger. Here the function.json file that defines the function bindings:

    "bindings": [
    "authLevel": "function",
    "name": "req",
    "type": "httpTrigger",
    "direction": "in"
    "name": "res",
    "type": "http",
    "direction": "out"
    "disabled": false

    and the function full code is here:

    Such additional layer is also a welcome addition since it allows to inject some additional validation checks and business if needed.

    I’ve used a Stored Procedure here for better decoupling, and it does just a simple insert, with some additional checks.

    It also turns some JSON data into columns, to make querying even easier.

    Dapper helps to make interacting with SQL Azure a breeze (two, yes two, lines of code to open the connection and write to SQL azure), so all simplicity requirements are finally met. No more impedance mismatch. No more.

    Given the existing external factors (resources, knowledge, integration) the simplest solution but not simpler has been achieved.

    Without any existing factors I would probably have chosen Azure DocumentDB. But as an afterthought, I still have the gut feeling that SQL Azure would have been better in the long term (for example I can connect Marketing Manager’s Excel directly to the SQL Azure, something that I cannot do yet with DocumentDB)…so maybe SQL Azure would be my choice onve again. After all software is made to serve people, and this should drive at least 50% (if not more) of the architectural choices, IMHO.

    In conclusion

    Since I joined Sensoria I’ve moved to work on the cloud 100% of my time. And the more I work on the cloud, the more it is clear to me that serverless and simplicity (which means, implementation-wise: microservices) is the way to go. Efficiency is increased so much in this way that it’s really worth the effort, and embracing PaaS clearly becomes an obvious choice.

  • Creating your first Dashboard with Apache Zeppelin and SQL Server/Azure SQL

    Now that Apache Zeppelin has been downloaded and configured, it’s time to create a dashboard. It will be very easy: all you have to do is figure out which data you want to show, write the corresponding T-SQL query and then add some charts and information to make it perfect.

    To create the first Apache Zeppelin dashboard, let’s use the new Wide World Imports sample database from Microsoft:


    Take the “Standard” version so that you can use it also on a non-premium Azure SQL instance, if you want to try it on Azure.

    Once you have restored the WideWorldImportersStandard database, run Apache Zeppelin 0.6.2 for SQL Server docker container:

    docker run --name zeppelin -p 8080:8080 -v /zeppelin-sqlserver/notebook -v /zeppelin-sqlserver/conf -d yorek/zeppelin-sqlserver:v0.6.2

    make sure you have configured an interpreter (call it “Azure” for example) with the connection information to point to WideWorldImportersStandard and than create a new note:

    And now it’s just a matter of writing the queries and turning the result into charts. As usual I’ve recorded a quick video (it’s just six-something minutes) to easily show how you can do that. (And I’ve also started from configuring the interpreter so that you can also review that part).



    In the video I also show how the markdown interpreter can be used to add information to the dashboard.

    The sample dashboard, that also includes the queries, can be downloaded here:

    My First Dashboard.json

    If you prefer to download only the queries and then DIY, here’s a SQL file with all the used queries:

    My First Dashboard.sql

    I really recommend you to start using Apache Zeppelin if you haven’t done it yet. It’s incredibly useful even for DBAs just to monitor SQL Server status. I’ll talk about this in a forthcoming post. Stay tuned!

  • Apache Zeppelin 0.6.2 for SQL Server Docker Image updated

    In order to be able to keep created notebooks and interpreters configurations when upgrading the docker image to a newer version, I changed the dockerfile to use docker volumes, as described here (if you’re interested into learning more):

    Manage data in containers

    to run the container, now, the command is

    docker run --name zeppelin -p 8080:8080 -v /zeppelin-sqlserver/notebook -v /zeppelin-sqlserver/conf -d yorek/zeppelin-sqlserver:v0.6.2

    The -v does the trick and will be very useful the first time a new image will be released, so that you’ll be able to keep all your notebooks without having to export them before and, in addition, also interpreter configurations will be preserved.

    The solution used until now (sharing a volume with the host) works nice, but unfortunately works only for notebooks. If you have a lot of different interpreter configured (like me) re-configuring them every time the image is updated is really time consuming and boring.

    To be sure that your container is using volumes, you can check it using the inspect command:

    docker inspect zeppelin

    The output is a JSON with detailed information on the container. Look for the Mounts node:

    If you are using a previous version of the image, my recommendation is to download this updated one so that you’ll be ready for future updates.

    If you’re not using Apache Zeppelin yet, you should really start. No matter if you are a Data Scientists or a DBA, Zeppelin is *really* useful to create nice visualization and dashboard just using T-SQL:


  • Apache Zeppelin 0.6.2 for SQL Server now with correct Syntax Highlighting

    After a little bit of work I’ve managed to have syntax highlight for T-SQL in Apache Zeppelin 0.6.2 working correctly.

    The tricky part is that the Ace Editor already supports T-SQL specific syntax highlighting from v 1.2.0, but Apache Zeppelin is still using version v 1.1.9.

    Moving the v 1.2.0 doesn’t work since it creates some compatibility issues so the only way to go was to manually patch and tweak Apache Zeppelin to use the highlighting file for T-SQL available in version 1.2.0.

    Said and done, now T-SQL is beautifully highlighted:

    SQL Server 2016 and vNext aren’t supported yet but I’ll work on this in future for sure.

    Both the GitHub repository and the Docker Hub are already updated. To update your docker image, if you already have downloaded it before, just do the usual pull:

    docker pull yorek/zeppelin-sqlserver:v0.6.2

    Remember to export your notebooks to be sure not to loose any work done so far (this behavior will be fixed soon). Then go for

    docker stop zeppelin

    docker rm zeppelin

    docker run -p 8080:8080 --name zeppelin -d yorek/zeppelin-sqlserver:0.6.2

  • Configure and use Apache Zeppelin with SQL Server: the Basics

    In the previous post I’ve shown how to download and run Apache Zeppelin 0.6.2 for SQL Server, using Docker to simplify the installation procedure. It’s now time to start to use the fantastic Apache Zeppelin to connect to SQL Server and run some queries.

    This version specifically created for SQL Server, already comes with a predefined and almost configured interpreter so that you just have to specify the SQL Server (or Azure SQL or Azure DW) server you want to connect to, the database, the username and the password and you’re good to go.

    Apache Zeppelin Interpreter

    An Apache Zeppelin interpreter is a library that allows Apache Zeppelin to connect to and use a specific database, data provider or even language.

    Notebooks and Paragraphs

    A notebook is made of one or more paragraph, which are the containers where you actually write and run code. To specify which interpreter a paragraph should use, you just have to write the interpreter name prefixed by the percentage (%) character:

    Interpreter Binding

    Each notebook can use one or more interpreter. The process of defining which interpreter a notebook can use is called binding. The interpreters shipped with Apache Zeppelin 0.6.2 for SQL Server are:

    T-SQL (%tsql)
    Markdown (%md)
    Python (%python)
    Shell (%sh)
    Angular (%angular)

    To specify which interpreter is available in the active notebook, you can just click on the the little gear icon on the top right:

    Apache Zeppelin 0.6.2 for SQL Server comes with the following interpreters already bound to each notebook by default:

    The default interpreter will be used if you don’t specify, for a paragraph, which interpreter you want to use. As you can see you can mix in a notebook different interpreters so that you can do almost everything you want. The Markdown is very useful since with that you can create self-explanatory notebooks.

    Interpreter Configuration

    Before using an interpreter you have to configure it. This can be done using the “Interpreter” menu accessible by clicking on the drop-down arrow visible on the right of the top search bar.

    Click on Interpreter menu item to enter configuration page

    Each interpreter has its own set of configuration options. For T-SQL you have to specify the following options:


    Beside the usual username, password, initial database you have couple of more options:


    The JDBC driver to use. Just leave the default, unless you want to use, for any reason, another JDBC driver to connect to SQL Server, which is very unlikely. I’ll probably hide this option in future now that Microsoft has officially released and open sourced JDBC driver for SQL Server and which is the JDBC driver to use.


    Is the “address” of the SQL Server instance you want to connect to, using the JDBC format:


    more details on the available properties can be found here: Building the Connection URL.


    Big Data means that you cannot work taking a look at all your data. It is just too big (by definition!). So you’ll normally want to avoid to return all of your rows if, by mistake, you forget to specify a where clause. Here you specify the maximum number of rows you want to be returned ever. 1000 is the default number and should be more than enough. If you want to change it, keep in mind that all downloaded data will live in your browser memory space, so you don’t really want to push that number too high.


    this is a very tricky and interesting property: it allows you to decide if all paragraph in the notebook will share the same connection or each paragraph will have a dedicated connection on it’s own.

    With a notebook connection you will be able to share #temp tables between paragraphs, for example, but you won’t to able to execute two or more paragraph in parallel. Each paragraph will, in fact, be executed using a FIFO logic.

    By using a paragraph connection style, on the opposite, each paragraph will be totally independent and isolated from the others (which means it will have it’s own spid), just like it happens when you use more than one document with SQL Server Management Studio. This connection style allows each paragraph to be executed in parallel with the others.

    Ready, set, go!

    Once the settings are saved, you can start to query your SQL Server instance:

    T-SQL query executed in Apache Zeppelin 0.6.2

    See it in action

    As I did the last time, I’ve also created a quick video to show, in less than 5 minutes, how you can configure Apache Zeppelin 0.6.2 for SQL Server and run your first queries:

  • Run Apache Zeppelin 0.6.2 for SQL Server

    As promised in my last post here is the first post that aim to show how to use Apache Zeppelin with SQL Server. First thing first: installation. Or maybe not.

    The first thing to do, usually, is installing the product. Since we’re talking Java here, things may get a little bit long and complex if, like me, you’re more a .NET guy. Even worse if your not a Java nor .NET guy. You’re just a DBA or a Data Scientist. Well Docker is here to help.

    Download and install Docker. It’s very easy an takes a few minutes only.

    Once it is running, open a console where you can send docker commands (any terminal if on Linux or macOS, PowerShell if running on Windows, Docker Quickstart Terminal if running using the Docker Machine toolbox) and go for the following commands:

    docker pull yorek/zeppelin-sqlserver:0.6.2

    docker run -p 8080:8080 --name zeppelin -d yorek/zeppelin-sqlserver:0.6.2

    The first download the docker image (depending on your connection speed it may take a while) and the second run the docker container with Apache Zeppelin inside. It also expose the port 8080 so that it can be used to reach the contained Apache Zeppelin process.

    That’s it. Now you can connect to your local machine and start using Apache Zeppelin:


    If you’re still using the “old” Docker Machine (maybe because, like me, you also need to use VMWare and cannot then install Hyper-V), you have to connect to your Docker Machine instead of localhost. To get the IP Address of your Docker Machine simply do

    docker-machine ip

    From the Docker Quickstart Terminal.

    To view the entire process in just a minute, here’s a short video: 

    Next stop: Configure Apache Zeppelin and run your first query against SQL Server.

  • SQL Server Interpreter for Apache Zeppelin 0.6.2

    I’ve updated the code-base to Apache Zeppelin 0.6.2 and I’ve also finished a first simple-but-working support to autocomplete (you can activate it using CRTL + .). Right now the autocomplete is based on the keywords specified here:

    Reserved Keywords (Transact-SQL)

    is not much, I know, but is something, at least. Next steps will be to read schemas, tables and columns from SQL Server catalogs table. And maybe extract the list of keywords from…somewhere else, to have a more complete coverage.

    I’ve also removed additional Interpreter that may not be useful if you just plan to use it against T-SQL/TDS compatible engines (SQL Server, Azure SQL and Azure DW), and configured the defaults so that it is ready to use SQL Server right from the beginning.

    The code — along with compilation/install/basic usage instructions — is available on GitHub:

    Apache Zeppelin 0.6.2 for SQL Server

    Right now I’ve tested it only on Ubuntu Linux 16.04 LTS 64bits. It should also work on native Windows, but since I haven’t tried it yet on that platform, I don’t know the challenged you may face in order to have the full stack (Java, Maven, Node, ecc. ecc.) working in order to be able to compile and run it.

    At the beginning of next week I’ll release a small tutorial to show how you can use Apache Zeppelin for SQL Server also on your Windows machine using Docker. I plan to do a few tutorials on the subject, since I find Apache Zeppelin very useful and I’m sure it will be loved also by many other SQL Server guys once one start to play with it.

    At some point I’ll will also release only the bin package so that one doesn’t have to compile it itself (but hey, do we love Linux right now, don’t we?) and so that it can just run on Windows, but for now I find the Docker container approach so much better than anything else (it “just runs” and I can do anything via GitHub and Docker Hub), that I’ll stay with this for a while.

  • Azure Functions to Schedule SQL Azure operations

    One of the things that I miss a lot when working on SQL Azure is the ability to schedule jobs, something that one normally does via SQL Server Agent when running on premises.

    To execute scheduled task, on Azure, Microsoft recommends to use Azure Automation. While this is surely one way of solving the problem, I find it a little bit too complex for my needs. First of all I’m not a PowerShell fan, and Azure Automation is all about PowerShell. Secondly, I just need to schedule some SQL statements to be executed and I don’t really need all the other nice features that comes with Azure Automation. With Azure Automation you can automate pretty much *all* the resources available on Azure but my interest, for now, is only on SQL Azure. I need something simple. As much as simple as possible.

    Azure Functions + Dapper are the answer. Azure Functions can be triggered via CRON settings, which means that a job scheduler can be easily built. 
    Here’s an example of a CRON trigger (in function.json)

    "bindings": [
    "name": "myTimer",
    "type": "timerTrigger",
    "direction": "in",
    "schedule": "0 30 4 * * *"
    "disabled": false

    CRON format is detailed here: Azure Function Timer Trigger. As a simple guideline, the format is:

    {second} {minute} {hour} {day} {month} {day of the week}

    In the sample above, it tells to Azure Function to be executed every day at 04.30. To turn such expression in something that can be more easily read, tools like

    are available online. If you use such tools, just keep in mind that many doesn’t support seconds, and then you have the remove them before using the tool.

    Dapper is useful because make executing a query really a breeze:

    using (var conn = new SqlConnection(_connectionString))
    conn.Execute("<your query here>");

    To use Dapper in Azure Function, a reference to its NuGet package has to be put in the project.json file

    "frameworks": {
    "net46": {
    "dependencies": {
    "Dapper": "1.50.2"

    It’s also worth mentioning that Azure Functions can be called via HTTP or Web Hook and thus also via Azure Logic Apps or Slack. This means that complex workflows that automatically responds to certain events can be put in place very quickly.

  • Temporal Tables

    I have delivered a talk about “SQL Server 2016 Temporal Tables” for the Pacific Northwest SQL Server User Group at the beginning of October . Slides are available on SlideShare here:

    and the demo source code is — of course — available on GitHub:

    The ability of automatically keep previous version of data is really a killer feature for a database since it lift the burden of doing such really-not-so-simple task from developers and bakes it directly into the engine, in a way it won’t even affect existing applications, if one needs to use it even in legacy solutions.

    The feature is useful even for really simple use cases, and it allows to open up a nice set of analytics options. For example I’ve just switched the feature on for a table where I need to store that status of an object that needs to pass through several steps to be processed fully. Instead of going through the complexity of managing the validity interval of each row, I’ve just asked the developer to update the row with the new status and that’s it. Now querying the history table I can understand which is the status that takes more time, on average, to be processed.

    That’s great: with less time spent doing technical stuff, more time can be spend doing other more interesting activities (like optimizing the code to improve performance where analysis shows they are not as good as expected). 

  • Azure SQL Database DTU Calculator

    One of the most common questions when you start to use SQL Azure is related to the choice of the level of service needed for your needs. On the cloud every wasted resource is a tangible additional cost, so it is good to chose the best service level the fits your needs, no more and no less. You can always scale it up later if needed.

    The "problem" is that the level is measured in DTU - Database Transaction Units - which a value that represents a mix of CPU, memory and I / O. The problem is that it is very difficult, if not impossible, to calculate this value for an existing on-premises server, so that you can have a compare it with the performance of your well-known on-premises server.

    Well, it *was* impossible. Now you can, thanks to this tool:

    Azure SQL Database DTU Calculator

    developed by Justin Henriksen, a Solution Architect specializing on Azure, that simplifies a lot the estimation effort. After running a PowerShell script to detect some metrics on the On-Premises Server, you have to upload the collected values n that site to get an idea of ​​what level of DTU is optimal in case you want to move that database or server to the cloud.

    Of course the more your workload is representative of a real-world scenario, the better estimates you will have: keep this in mind before taking any decision. In addition to this website, there are also two links very useful to better understand what level of service is best suited to your situation:


  • Operator progress changes in LQS

    This has maybe gone unnoticed since August is usually a “slow” month, but with the August release there has been a major change in SQL Server Management Studio and how it show the Live Query Statistics data.

    The operator level percentages shown in the Live Query Statistics is now the ratio between actual and estimated rows, which means that the value can get way higher than 100%. The purpose of this approach is to make easier to spot places where cardinality estimation got it wrong for some reason, so that you can go and try to understand the problem and fix the query in order to improve performance or reduce resource usage.

    A detailed post on this topic by Pedro Lopes of the SQL Tiger team is here:

    Now that also the Management Studio is following monthly release schedule, the post done by the SQL Server Release Services team about SSMS really needs to be read carefully, just to be sure not to miss this small-but-huge-impact changes:

  • Apache Zeppelin for SQL Server/Azure via Docker

    For those of you that are interested in Big Data, you may be interested in knowing that I've just release the first version or a working docker image that simplifies *a lot* the usage of Apache Zeppelin in a Windows environment.

    As you may know I'm working  on a SQL Server / SQL Azure interpreter for Apache Zeppelin in order to have a good mainstream tool for interactive data exploration and visualization also on the SQL Server platform

    I've just finished a new version of the SQL Server interpreter, rebuilt from scratch, now much cleaner then the first alpha version I release moths ago, and I also decided to use docker to avoid the "linux-pains" :) to everyone who just what to use Zeppelin and are not interested in *building* it.

    Here's a screenshot of the working container:


    If you want try it (and/or help with development, documentation, and so on) you can use the docker image here:

    Supporting docker is especially important since it make *really really* easy to deploy to container to Azure and connect it to SQL Azure/ Azure DW or SQL Server in a AzureVM. No manual build needed anymore.


  • Changing the BI architecture. From Batch to Real-time, from Bulk Load to Message Processing

    In a world of Microservices, CQRS and Event Sourcing is more and more common to have the requirement that the BI/BA solution you’re developing is able to deal with incoming information (more precisely, messages and events) in almost real time.

    It’s actually a good exercise to try to understand how you can turn your “classic” batch-based solution into a message even if you’re still following the batch approach because this new approach will force you to figure out how to deal with incremental and concurrent update. Problems that can help you to renew and refactor your exiting ETL solution to make it ready for the future. I really believe in the idea of continuous improvement which means that every “x” months you have to totally review an entire process of the existing solution, in order to see how it can be improved (this can can mean: make it faster, or cheaper, or easier to maintain and so on).
    It’s my personal opinion that if everything could be managed using event and messages, even the ETL process would be *much* more simpler and straightforward that what typically is today, and to start to go on that road, we need to stop to think in batches.

    This approach is even more important in the cloud since it allows a greater efficiency (and favor usage of PaaS instead of IaaS) and helps to have a cheaper solution. In the workshop I’m going to deliver at SQL Nexus I’ll show that this, today, is something that can be easily done on Azure.

    All of  this also perfectly fits in the Lambda Architecture, a generic architecture for building real-time business intelligence and business analytic solution.

    If you’re intrigued by these ideas, or you’re simply facing the problem to move the existing BI solution in the cloud and/or making it less batch and more real-time, the “Reference Big Data Lambda Architecture in Azure” at SQL Nexus at the beginning of May is what you’re looking for.

    Here’s the complete agenda. Near 7 hours of theory and a lot demos to show how well everything blend together and with practical information that allows you to start to use what you’ve learned right from the day after:

    • Introduction to Lambda Architecture
    • Speed Layer:
      • Event & IoT Hubs
      • Azure Stream Analytics
      • Azure Machine Learning
    • Batch Layer:
      • Azure Data Lake
      • Azure Data Factory
      • Azure Machine Learning
    • Serving Layer:
      • Azure Data Warehouse / or Azure SQL
      • Power BI

    See you in Copenhagen!


    In case you’re wondering, everything is also possible on-prem, obviously with different technologies. Way less cool, but who cares, right? We’re here to do our job with the best solution for the customer, and even if it’s not the coolest one, it may well do it’s job anyway. Yeah, I’m talking of SSIS, pretty old right now, but still capable of impressive things. Especially if you use it along with Service Broker or RabbitMQ, in order to create a real-time ETL solution.

This Blog


Privacy Statement