
Davide Mauri

A place for my thoughts and experiences on SQL Server, Business Intelligence and .NET

  • Apache Zeppelin for SQL Server/Azure via Docker

    For those of you interested in Big Data: I've just released the first version of a working Docker image that simplifies *a lot* the usage of Apache Zeppelin in a Windows environment.

    As you may know, I'm working on a SQL Server / SQL Azure interpreter for Apache Zeppelin, in order to have a good mainstream tool for interactive data exploration and visualization on the SQL Server platform too.

    I've just finished a new version of the SQL Server interpreter, rebuilt from scratch: it's now much cleaner than the first alpha version I released months ago. I also decided to use Docker to spare the "Linux pains" :) to everyone who just wants to *use* Zeppelin and is not interested in *building* it.

    Here's a screenshot of the working container:

    [Screenshot: the Zeppelin container up and running]

    If you want to try it (and/or help with development, documentation, and so on), you can use the Docker image here:

    https://hub.docker.com/r/yorek/zeppelin-sqlserver/

    Supporting Docker is especially important since it makes it *really, really* easy to deploy the container to Azure and connect it to SQL Azure / Azure DW, or to SQL Server in an Azure VM. No manual build is needed anymore.

    https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-docker-machine/

    Enjoy!

  • Changing the BI architecture. From Batch to Real-time, from Bulk Load to Message Processing

    In a world of Microservices, CQRS and Event Sourcing, it is more and more common to have the requirement that the BI/BA solution you're developing be able to deal with incoming information (more precisely, messages and events) in almost real time.

    It's actually a good exercise to try to understand how you can turn your "classic" batch-based solution into a message-based one, even if you're still following the batch approach, because doing so forces you to figure out how to deal with incremental and concurrent updates: problems whose solutions can help you renew and refactor your existing ETL solution to make it ready for the future. I really believe in the idea of continuous improvement, which means that every "x" months you should completely review one process of the existing solution to see how it can be improved (this can mean: making it faster, or cheaper, or easier to maintain, and so on).
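
    To make the problem concrete, here's a minimal T-SQL sketch of what "dealing with incremental updates" means; it's an illustration only, and the table and column names are hypothetical:

        -- Hypothetical incremental upsert: apply only new or changed rows,
        -- instead of truncating and bulk-reloading the whole table.
        MERGE dbo.DimCustomer AS tgt
        USING staging.Customer AS src
            ON tgt.CustomerCode = src.CustomerCode
        WHEN MATCHED AND tgt.FullName <> src.FullName THEN
            UPDATE SET tgt.FullName = src.FullName
        WHEN NOT MATCHED THEN
            INSERT (CustomerCode, FullName)
            VALUES (src.CustomerCode, src.FullName);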
     
    It's my personal opinion that if everything could be managed using events and messages, even the ETL process would be *much* simpler and more straightforward than it typically is today; and to start down that road, we need to stop thinking in batches.

    This approach is even more important in the cloud, since it allows greater efficiency (and favors the usage of PaaS instead of IaaS) and helps to keep the solution cheaper. In the workshop I'm going to deliver at SQL Nexus I'll show that this, today, is something that can easily be done on Azure.

    All of this also fits perfectly with the Lambda Architecture, a generic architecture for building real-time business intelligence and business analytics solutions.

    If you're intrigued by these ideas, or you're simply facing the problem of moving an existing BI solution to the cloud and/or making it less batch-oriented and more real-time, the "Reference Big Data Lambda Architecture in Azure" workshop at SQL Nexus at the beginning of May is what you're looking for.

    Here's the complete agenda: nearly 7 hours of theory and a lot of demos showing how well everything blends together, with practical information that allows you to start using what you've learned right from the day after:

    • Introduction to Lambda Architecture
    • Speed Layer:
      • Event & IoT Hubs
      • Azure Stream Analytics
      • Azure Machine Learning
    • Batch Layer:
      • Azure Data Lake
      • Azure Data Factory
      • Azure Machine Learning
    • Serving Layer:
      • Azure SQL Data Warehouse or Azure SQL Database
      • Power BI

    See you in Copenhagen!

    PS

    In case you're wondering: everything is also possible on-premises, obviously with different technologies. Way less cool, but who cares, right? We're here to do our job with the best solution for the customer, and even if it's not the coolest one, it may well do its job anyway. Yes, I'm talking about SSIS: pretty old by now, but still capable of impressive things, especially if you use it along with Service Broker or RabbitMQ to create a real-time ETL solution.
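
    For instance, here's a minimal sketch of the receiving side of such a solution with Service Broker; all the object names are hypothetical, and the [DEFAULT] contract is used to keep things short:

        -- A queue plus a service that receive "data changed" messages from the OLTP side
        CREATE QUEUE dbo.EtlEventQueue;
        CREATE SERVICE EtlEventService ON QUEUE dbo.EtlEventQueue ([DEFAULT]);

        -- The ETL consumer drains messages as they arrive,
        -- instead of waiting for the nightly batch window.
        WAITFOR (
            RECEIVE TOP (100)
                conversation_handle,
                message_type_name,
                CAST(message_body AS XML) AS payload
            FROM dbo.EtlEventQueue
        ), TIMEOUT 5000;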

  • Slides and demos of my DevWeek sessions are online

    I've put the slide decks and the demos used in my sessions at DevWeek 2016 on SlideShare and GitHub.

    If you were there or you’re simply interested in the topics, here’s the info you need:

    Azure ML: from basic to integration with custom applications

    In this session, Davide will explore Azure ML from the inside out. After a gentle introduction to Machine Learning, we'll see the Microsoft offering in this field and all the features it offers, creating a simple yet 100% complete Machine Learning solution. We'll start from something simple and then move to some more complex topics, such as the integration with R and Python and IPython Notebooks, up to Web Service publishing and usage, so that we can integrate the created ML solution with batch processes or even use it in real time with LOB applications. Does all of this sound cool to you? Well, it is, since with ML you can really give your customers or employees that "something more" that will help you make the difference. Guaranteed at 98.75%!

    Dashboarding with Microsoft: Datazen & Power BI

    Power BI and Datazen are two tools that Microsoft offers to enable Mobile BI and dashboarding in your BI solution. Guaranteed to generate the WOW effect and to make you new friends among the C-level managers, both tools fit into the Microsoft BI vision and offer some unique features that will surely help end users make more informed decisions. In this session, Davide will show how we can work with them and how they can be configured and used; we'll also build some nice dashboards to start getting confident with the products, and we'll publish them to make them available on any mobile platform on the planet.

    Event Hub & Azure Stream Analytics

    Being able to analyse data in real time will surely be a very hot topic in the near future, not only for IoT-related tasks but as a general approach to user-to-machine or machine-to-machine interaction. From product recommendations to fraud-detection alarms, a lot of stuff would be perfect if it could happen in real time. Now, with Azure Event Hubs and Stream Analytics, it's possible. In this session, Davide will demonstrate how to use Event Hubs to quickly ingest new real-time data and Stream Analytics to query data on the fly, in order to do a real-time analysis of what's happening right now.
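
    To give an idea of the kind of queries we'll write: a Stream Analytics job reads from an input, aggregates over time windows and writes to an output, all with a SQL-like syntax. Here's a minimal sketch (the input/output aliases and the EventTime column are hypothetical):

        -- Count events per device over 30-second tumbling windows
        SELECT
            DeviceId,
            COUNT(*) AS EventCount,
            System.Timestamp AS WindowEnd
        INTO [output]
        FROM [input] TIMESTAMP BY EventTime
        GROUP BY DeviceId, TumblingWindow(second, 30)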

    SQL Server 2016 JSON

    You want JSON? You finally have JSON support within SQL Server! The much-asked-for, long-awaited feature is finally here! In this session, Davide will show how JSON support works within SQL Server, its pros and cons, its capabilities and limitations, and will also take a look at the performance of JSON vs. an equivalent relational(ish) solution for the common "unknown-schema-upfront" and "I-wanna-be-flexible" problems.
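
    As a taste of the feature, here's a minimal sketch of both directions using SQL Server 2016's FOR JSON and OPENJSON; the table and column names are hypothetical:

        -- Relational to JSON: serialize query results as a JSON array
        SELECT CustomerId, FullName
        FROM dbo.Customer
        FOR JSON PATH;

        -- JSON to relational: shred a JSON document into rows and columns
        DECLARE @doc NVARCHAR(MAX) =
            N'[{"id":1,"name":"Davide"},{"id":2,"name":"Allan"}]';
        SELECT *
        FROM OPENJSON(@doc)
        WITH (id INT '$.id', name NVARCHAR(100) '$.name');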

  • SQL Nexus 2016 Agenda Online

    It’s here and it’s fantastic:

    http://www.sqlnexus.com/agenda.html

    Here are my picks:

    • SQL Server Integration Services (SSIS) in SQL Server 2016 – Matt Masson
    • Beautiful Queries – Itzik Ben Gan
    • From SQL to R and beyond - Thomas Huetter
    • Fun with Legal Information in SQL Server: Data Retrieval - Matija Lah
    • Big Data in Production - Brian Vinter
    • Integrate Azure Data Lake Analytics - Oliver Engels
    • DBA Vs. Hacker: Protecting SQL Server - Luan Moreno Maciel
    • Identity Mapping and De-Duplicating - Dejan Sarka
    • SQL Server 2016 and R Engine-powerful duo - Tomaž Kaštrun
    • Dynamic Search Conditions - Erland Sommarskog
    • Normalization Beyond Third Normal Form - Hugo Kornelis
    • Responding to Extended Events in near real-time - Gianluca Sartori

    See you there!

  • SQL Nexus 2016 in Copenhagen

    From the 2nd to the 4th of May, the SQL Nexus conference will take place in Copenhagen, and it looks like it's going to be one of those events that, if you live in Europe, you really cannot miss.

    Just visit the website to see how awesome the speaker roster is and, even if the agenda is not there yet, you can already feel that it's going to be *really* interesting:

    http://www.sqlnexus.com

    Now, besides the following pre-conference workshop:

    Reference Big Data Lambda Architecture in Azure
    The Lambda Architecture is a new generic, scalable and fault-tolerant data processing architecture that is becoming more and more popular now that big data and real-time analytics are frequently requested by end users, enabling them to make informed decisions more precisely and quickly. During this full-day workshop we'll see how the Azure Data Platform can perfectly support such an architecture and how to use each technology to build it. From Azure IoT Hub and Azure Stream Analytics to Azure Data Lake and Power BI, we'll build a small Lambda-Architecture solution, so that you'll be able to become confident with it and its implementation using Azure technologies.

    a workshop that I'll deliver with my friend Allan Mitchell and that I've already mentioned before, I'm happy to announce that I'll also have a regular session on Machine Learning, a topic I really love:

    Azure ML: from basic to integration with custom applications
    In this session, Davide will explore Azure ML from the inside out. After a gentle introduction to Machine Learning, we'll see the Microsoft offering in this field and all the features it offers, creating a simple yet 100% complete Machine Learning solution.
    We'll start from something simple and then move to some more complex topics, such as the integration with R and Python and IPython Notebooks, up to Web Service publishing and usage, so that we can integrate the created ML solution with batch processes or even use it in real time with LOB applications.
    Does all of this sound cool to you? Well, it is, since with ML you can really give your customers or employees that "something more" that will help you make the difference. Guaranteed at 98.75%!

    See you there!

  • Using Apache Zeppelin on SQL Server

    At the beginning of February I started an exploratory project to check if Apache Zeppelin could be easily extended in order to interact with SQL Server and SQL Azure. In the last week I've been able to get everything up and running. Given that I hadn't used Java, JDBC or Linux since the nineties, when I was at university, I'm quite pleased with what I achieved (in just a dozen hours of no sleep). Here's Zeppelin running a notebook connected to SQL Azure:

    [Screenshot: Zeppelin running a notebook connected to SQL Azure]

    If you want to test it too, you just have to get the source code from the fork I've created here on GitHub and follow the documentation in order to build it. I've just run through the tutorial I've put up, and within 15 minutes (max) of logging in to your Ubuntu 15.10 installation, you should have a running instance of Zeppelin with the SQL Server interpreter.

    Here’s the document that describes everything you need to do:

    https://github.com/yorek/incubator-zeppelin/blob/master/README.md

    Now, you may be wondering why you should be interested in Zeppelin at all. Well, if you're into Data Science you already know how important the ability to interactively explore data is. And with SQL Server 2016 able to run R code natively, the ability to do some interactive exploratory work is even more important, for yourself and for the business users you will work with. With Zeppelin (just like with Jupyter), creating an interactive query is as simple as this:

    [Screenshot: an interactive query in a Zeppelin notebook]
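
    In text form, a paragraph is just the interpreter binding followed by the query you want to run. Assuming the interpreter is bound to the %sqlserver prefix (and using a hypothetical table), it looks like this:

        %sqlserver
        -- Runs against the configured SQL Server / SQL Azure connection
        SELECT TOP 10 SalesOrderId, OrderDate, TotalDue
        FROM Sales.SalesOrderHeader
        ORDER BY OrderDate DESC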

    But even if you aren't into Data Science, Apache Zeppelin is really useful, because I really think the lack of a nice online environment to query SQL Azure is quite annoying. I love SQL Server Management Studio, but sometimes I just need to write a quick-and-dirty query to see if everything is going the right way or, even better, I'd like to create a (maybe not so) simple dashboard with data stored in SQL Azure or SQL Data Warehouse. And maybe I don't have my laptop with me, and all I have is a browser.

    Well, Apache Zeppelin is just perfect for all these needs, and it is actually much more than that. Its future looks very promising, so having it on the Microsoft Data Platform too will make our beloved SQL Server / SQL Azure / SQL Data Warehouse / Azure Data Lake even more enjoyable.

    Right now this is a sort of alpha version and it works only on SQL Server and SQL Azure (I haven't tested it on Azure SQL Data Warehouse yet, but it should work). It "just works", since, as said at the beginning, this was more an experiment than anything else. Now that I know it is feasible, I'll rewrite the SQL Server support for Zeppelin (called an "interpreter") from scratch: for this attempt I started from the PostgreSQL interpreter and, as a result, the code is not so good (it's more a patchwork of "let's try if this works" things)... even if it does the job. So if you download the source and take a look at the code... just keep this in mind, please :-).

    Enjoy it and, as usual, feedback is more than welcome. (And help, of course!)

    PS:

    Support for Azure Data Lake is not there yet. It will come ASAP, but I don't know when. :-)

  • DevWeek 2016

    I'm really happy to announce that I'll be back in London, at the DevWeek 2016 conference, in April, talking about Azure ML, dashboarding with Datazen & Power BI, Event Hubs & Azure Stream Analytics, and the new JSON support in SQL Server 2016.

    Though the conference name may imply that it's dedicated to developers, in reality there are *a lot* of interesting sessions on databases, Big Data and, more generally, the Data Management and Data Science areas.

    Here's the agenda:

    http://devweek.com/agenda

    I'll be there along with another well-known name on this blog, Dejan Sarka, just to make sure that BI/Big Data/Data Science and the like are well represented among all those developers. :)

    See you there!

  • (Initial) Conference Plan for 2016

    2016 has not started yet and it already looks exciting to me! I already have plans for several conferences, and I'd like to share them with you all, in case you're interested in some of the topics.

    I'll be presenting at Technical Cloud Day, a local Italian event, on January 26th, where I'll be speaking about:

    • Azure Machine Learning
    • Azure Stream Analytics

    If you’re interested (and speak Italian) here’s the website:

    http://www.technicalcloudday.it/

    I'll also be present at some international events, like:

    SQL Konferenz

    Here I'll deliver my "classic" Agile Data Warehousing workshop during the pre-con days:

    • Why a Data Warehouse?
    • The Agile Approach
    • Modeling the Data Warehouse
      • Kimball, Inmon & Data Vault
      • Dimensional Modeling
      • Dimension, Fact, Measures
      • Star & Snowflake Schema
      • Transactional, Snapshot and Temporal Fact Tables
      • Slowly Changing Dimensions
    • Engineering the Solution
      • Building the Data Warehouse
        • Solution Architecture
        • Naming conventions, mandatory columns and other rules
        • Views and Stored Procedure usage
      • Loading the Data Warehouse
        • ETL Patterns
        • Best Practices
      • Automating Extraction and Loading
        • Making the solution automatable
        • BIML
    • Unit Testing Data
    • The Complete Picture
      • Where does Big Data come into play?
    • After the Data Warehouse
      • Optimized Hardware & Software
    • Conclusion

    You can find more here:

    http://sqlkonferenz.de/agenda.aspx

    I'll also have a regular session dedicated to SSISDB and its internals: SSIS Monitoring Deep Dive. I'll show what's inside it and how you can use that knowledge to build (and improve) something like my SSIS Dashboard: http://ssis-dashboard.azurewebsites.net/

    SQL Nexus

    This is a new Nordic conference where, along with Allan Mitchell, I'll be presenting a new, super-cool (IMHO) workshop. We'll discuss the Lambda Architecture, a new generic reference architecture for building real-time analytics solutions, and how it can be built using the features that Azure offers. We'll show how to use Azure Event Hubs, Stream Analytics, Azure Data Lake, Power BI and many other cool Azure technologies.

    You can find more details here:

    Reference Big Data Lambda Architecture in Azure
    The Lambda Architecture is a new generic, scalable and fault-tolerant data processing architecture that is becoming more and more popular now that big data and real-time analytics are frequently requested by end users, enabling them to make informed decisions more precisely and quickly. During this full-day workshop we'll see how the Azure Data Platform can perfectly support such an architecture and how to use each technology to build it. From Azure IoT Hub and Azure Stream Analytics to Azure Data Lake and Power BI, we'll build a small Lambda-Architecture solution, so that you'll be able to become confident with it and its implementation using Azure technologies.

    http://www.sqlnexus.com/pre--and-main-conference.html

    Well, if you're interested in one or more of these topics, you know where to go now. Bye!

  • Custom Data Provider in Datazen

    Playing with Datazen in the last few days, I had to solve a quite interesting problem that took me some time, but that also allowed me to dig deeper into the Datazen architecture in order to find a way past its (apparent) limits.

    Here's the story, as I'm sure it will be useful to someone else too.

    One of our current customers has a quite complex Analysis Services dynamic security setup. Besides applying security based on who is accessing the data, they also want to apply security based on how that data is accessed. In order to satisfy this requirement, a specific extension to Excel (their chosen client) has been developed, and it uses the CustomData() MDX function.

    So, here's the problem: how can I specify a value for the CustomData property in the SSAS connection string in Datazen, given that no such property is exposed by default by the native SSAS data provider?

    Luckily, Datazen supports custom data providers, so it's quite easy to create a new one that exposes the properties you need:

    http://www.datazen.com/docs/?article=server/managing_data_provider_schemas

    I tried to go down the "Overriding built-in data providers" road, but I wasn't able to make it work: I added the "CustomData" property to a file that overrides the default SSAS data provider settings, but in the end "CustomData" was the only property I was able to see in the overridden native provider. So I created a brand new SSAS data provider instead, and that's it, everything works perfectly:

    <dataproviderschema>
        <id>MSSSAS</id>
        <enabled>true</enabled>
        <name>SSAS.EPSON</name>
        <type>ssas</type>
        <properties>
            <property>
                <name>Provider</name>
                <value>MSOLAP</value>
            </property>
            <property>
                <name>Data Source</name>           
            </property>
            <property>
                <name>Initial Catalog</name>
            </property>
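            <!-- The extra property that the built-in SSAS provider does not expose -->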
            <property>
                <name>CustomData</name>
                <value>{00000000-0000-0000-0000-000000000000}</value>
            </property>
        </properties>
    </dataproviderschema>

    Be aware that Datazen does *a lot* of caching, so you'll have to stop the Core service BEFORE you edit/create the XML file, otherwise you may find it overwritten with cached data; also be sure to IISRESET your web server, otherwise you can easily go mad trying to understand why what you've just done is not showing up in the UI.

    Besides the caching madness, everything works great.

    Hope this helps!

  • Configuring Pass-Through Windows Authentication in Datazen

    I've been working with Datazen lately (I'm working with a customer that literally fell in love with it) and one of the last things we tried, as part of a POC before going into real development, was the integration with Windows Authentication.

    It's really easy to do: you just follow the instructions here (in the section "Authentication Mode")

    http://www.datazen.com/docs/?article=server/installing_server

    and it just works. As the documentation suggests, you just have to specify the domain name and that's it.

    Of course, after that, you may also want to enable pass-through authentication, so that when a user tries to access a dashboard via the HTML interface, Datazen will use the logon credentials, without going through an additional logon screen.

    Here things can get tricky if you just follow the documentation here:

    http://www.datazen.com/docs/?article=server/configuring_integrated_windows_authentication

    which is correct, but only to a certain degree. Everything it says is right; it just fails to mention a *very* important thing you need to know to make it work as expected: you have to provide ALL FOUR SETTINGS (Server, UserName, Domain, Password).

    If you forgot to do it during installation, no problem: you can do it later by setting the

    • ad_server
    • ad_username
    • ad_domain
    • ad_password

    configuration values as explained here:

    http://www.datazen.com/docs/?article=server/server_core_settings

    After that, the magic happens and everything works perfectly.

    PS

    Of course, you need to have Kerberos authentication and delegation configured correctly, but that's another story.

  • SQL Server 2016 CTP 2.3: Management Studio and Data Tools Updates

    After the release of SQL Server 2016 CTP 2.3, Management Studio and SQL Server Data Tools have been updated too. Having three different teams working on three different products means three different places one has to check to become aware of the updates, so, to make it easier for everyone who's asking, here's the complete set of links for a full SQL Server 2016 CTP 2.3 installation (SQL Server platform + all the clients):

    SQL Server 2016 CTP 2.3

    SQL Server 2016 Management Studio Preview August 2015 Update

    SQL Server 2016 Data Tools (with support for both Database & Business Intelligence) Preview August 2015 Update

    Enjoy!

  • SQL Saturday in Italy…English version!

    2015 is a special year for Italy, because the country hosts Expo 2015, the current Universal Exposition. For this reason, the Italian PASS chapter is promoting a special edition of SQL Saturday, a free training event for SQL Server professionals. SQL Saturday #454, in Turin on October 10, 2015, has the following characteristics:

    • More than 20 sessions, on SQL Server, Business Intelligence and Azure Data Platform.
    • All the sessions will be in English.
    • The venue is in the center of Turin, close to the train station:
      • You can be at the Expo in 40 minutes
      • You can travel to Milan in less than 1 hour
    • Turin is usually less expensive than Milan, and you might stay for at least one night, dedicating the Sunday after SQL Saturday to visiting the Expo or Milan.

      We want to provide the best experience to the attendees, and we also want to help those of you traveling with family and/or friends who might not be interested in the technical content. For this reason, we are planning a web page containing information about side and/or alternative activities during the SQL Saturday. You will get more information about that starting in July.

      However, we first need a good estimate of the number of attendees, in order to correctly size the venue and to verify the interest in side activities, so that we can decide how much space to dedicate to such a section of the upcoming web site. These operations have to be completed months ahead of the event.

      For this reason, we ask you to fill in the survey at http://www.sqlsatexpo.com/, providing us with important information about your intention to visit Expo 2015 and about the number of people who will travel with you.

      If you are a speaker, please submit your sessions, considering that the agenda will prioritize three topics: SQL Server 2016, Power BI, and Azure Data Platform.

      See you in Turin!

    • SQL Konferenz 2015 Slide & Demo

      Last week I spoke at the SQL Konferenz in Darmstadt, near Frankfurt. The conference was great and I met a lot of good SQL friends over there. For anyone interested, here you can find the slides & demos of the sessions I delivered:

      (Near) Real-Time Data Integration with SQL Server, On-Premises & Cloud
      http://www.slideshare.net/davidemauri/real-time-data-integration

      Schema-Less Table & Dynamic Schema
      http://www.slideshare.net/davidemauri/schema-less-table-dynamic-schema-44295422

      You'll find a link to evaluate the session on SpeakerScore, and one to download the slides, in the last slide of each deck.

      Enjoy!

    • Iris Multi-Class Classifier with Azure ML

      Like many of us, I'm passionate about informatics *and* mathematics, which, of course, leads me to be passionate about the outcome of their marriage: databases and Machine Learning.

      Now that Machine Learning is becoming kind of a "commodity" thanks to Azure ML, I can finally start to use it in any project, even the not-so-big ones.

      Azure ML, for those who don't know it yet, is Microsoft's Machine Learning offering for the cloud. You can start using it for free, just by activating your subscription here:

      https://studio.azureml.net/Home

      Once activated, you'll find a lot of ready-to-be-used stuff: "experiments" (a kind of "program"), datasets, components and models (algorithms).

      One thing I noticed is missing is the full Iris dataset, one of the most famous datasets used to start learning machine learning. In Azure ML you can find a subset of it, usable for binary classification, but the original one is much more interesting, since it can be used to do a multi-class classification.

      In order to fill this little gap, and to create an easy tutorial to help everyone start getting confident with Azure ML and machine learning in general, I've created a 10-step (well... an Italian way of 10 steps ;-)) tutorial that can be found here:

      http://www.slideshare.net/davidemauri/iris-multiclass-classifier-with-azure-ml 

      or here

      https://speakerdeck.com/yorek/iris-multi-class-classifier-with-azure-ml

      choose the website you prefer :) and start to play!

      As usual, comments and feedback are more than welcome!

    • SSISDB Monitoring Queries on GitHub

      I've moved my SSISDB scripts from Gist to GitHub, where I can maintain them more comfortably. So far, I've published six scripts:

      • ssis-execution-status: Latest executed packages
      • ssis-execution-breakdown: Execution breakdown for a specific execution
      • ssis-execution-dataflow-info: Data Flow information for a specific execution
      • ssis-execution-log: Information/Warning/Error messages found in the log for a specific execution
      • ssis-execution-lookup-cache-usage: Lookup usage for a specific package/execution
      • ssis-execution-package-history: Execution historical data

      I use them almost every day, when I need a quick glance at what's going on in Integration Services or when I need to do some deep analysis of errors and problems.
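
      To give an idea of what the scripts build on, here's a minimal query in the same spirit, written directly against the SSISDB catalog views (just a sketch, not one of the published scripts):

          -- Latest executions and their outcome
          -- (status: 1 = created, 2 = running, 3 = canceled, 4 = failed, 7 = succeeded)
          SELECT TOP (10)
              e.execution_id,
              e.folder_name,
              e.project_name,
              e.package_name,
              e.status,
              e.start_time,
              e.end_time
          FROM SSISDB.catalog.executions AS e
          ORDER BY e.execution_id DESC;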

      You can find them here:

      https://github.com/yorek/ssis-queries

      If you're also wondering what happened to the SSIS Dashboard project:

      https://github.com/yorek/ssis-dashboard

      …don't fear, it's not dead. I'm still working on it, but since I can work on it only in my free time, updates are taking much more time than expected.

      PS

      Funnily enough, Andy Leonard published a script to analyze lookups just a couple of hours before me. You may also want to take a look at his post: http://sqlblog.com/blogs/andy_leonard/archive/2015/01/16/advanced-ssis-parsing-ssis-catalog-messages-for-lookups.aspx
