
Jorg Klein

Microsoft Data Platform MVP from the Netherlands

  • Azure Data Lake Store encryption using Azure Key Vault for key management

    Case
    You want to create an encrypted Azure Data Lake Store (ADLS)
    with a master encryption key that is stored and managed in your own existing Azure Key Vault.

    Using this setup, which is shown in the diagram below, all data in your Data Lake Store will be encrypted before it is stored on disk. To decrypt the data, a master encryption key is required.
    In this scenario a “customer managed” key is used, which means the key is created and managed in your own Azure Key Vault, as an alternative to a key that is managed and owned by the Data Lake Store service (the default). Managing keys in the Key Vault gives additional possibilities, like revoking access to the key for the ADLS service identity or even permanently deleting the key from the Key Vault.

    [Diagram: encrypted Azure Data Lake Store using a customer managed master encryption key stored in Azure Key Vault]
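    As an illustration of that last point, revoking the Data Lake Store service identity’s access to the key is a single access policy change. The sketch below is not part of the original scripts; the vault name and object ID are placeholders you would look up in your own environment.

    # Sketch: revoke the ADLS service identity's access to the master encryption key
    $keyVaultName = "akd-keyvault"
    $adlsServicePrincipalObjectId = "00000000-0000-0000-0000-000000000000" # object ID of the ADLS service identity

    # Removing the access policy prevents the Data Lake Store service from using the key
    Remove-AzureRmKeyVaultAccessPolicy -VaultName $keyVaultName -ObjectId $adlsServicePrincipalObjectId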

     

    Solution
    In this blog post I’ll guide you through the three steps below, all in an automated way using PowerShell scripting and an Azure Resource Manager (ARM) template to create your encrypted ADLS. I plan to blog later about the possibilities that Visual Studio Team Services offers to perform these deployment tasks.

    1.       Create new “customer managed” key in existing Azure Key Vault

    2.       Create a new ADLS with data encryption enabled

    3.       Grant ADLS service principal access to Azure Key Vault and enable Key Vault managed encryption using your “customer managed” key

     

    Prerequisites

    ·       Create Azure Resource Group. I have created one named “adls-keyvault-demo” (akd)

    ·       Create Azure Key Vault if you do not already have one. I have created one named “akd-keyvault”

    ·       AzureRM 4.1.0 (or later) module from the PowerShell Gallery. Required since we will use the new Enable-AzureRmDataLakeStoreKeyVault PowerShell cmdlet
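    If the module is missing or outdated, installing it from the PowerShell Gallery might look like the sketch below (assuming PowerShellGet / PowerShell 5 or later is available):

    # Install or update AzureRM from the PowerShell Gallery
    Install-Module -Name AzureRM -MinimumVersion 4.1.0 -Scope CurrentUser

    # Verify the installed version
    Get-Module -ListAvailable -Name AzureRM | Select-Object Name, Version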

     

    PowerShell script

    Executing the PowerShell script below creates the new key in your existing Azure Key Vault, then creates a new ADLS using an ARM template (see below) and finally enables Key Vault managed encryption for your new ADLS. The comments in the script give further explanation, and messages during execution are written to the Windows PowerShell console to inform you of what’s happening. Make sure you have at least AzureRM 4.1.0 installed and that the account you use has sufficient permissions.

    The following variables are used:

    ·       subscriptionId - Azure Subscription ID

    ·       rg - Azure Resource Group name

    ·       keyVaultUri - Key Vault DNS Name. Check your Key Vault Properties in Azure Portal.

    ·       keyName - Name of Key Vault key that will be used for the ADLS

    ·       armTemplateFileAdls - Path of your ADLS ARM template JSON file. You can find the definition below the PowerShell script, copy/paste it into a JSON file and store it on disk

    ·       adlsName – Name of your ADLS


    # Variables; modify 
    $subscriptionId = "00000000-0000-0000-0000-000000000000"

    $rg = "adls-keyvault-demo"

    $keyVaultUri = "https://akd-keyvault.vault.azure.net/"

    $keyName = "akd-adls-key"

    $armTemplateFileAdls = "C:\CreateEncryptedADLS.JSON"

    $adlsName = "akdadls"

     

    #Authenticate to Azure and set the subscription context

    Login-AzureRmAccount

    Set-AzureRMContext -SubscriptionId $subscriptionId

     

    Write-Host "Get Key Vault Name from URI $keyVaultUri"

    $keyVaultHost = ([System.Uri]$keyVaultUri).Host

    $keyVaultName = $keyVaultHost.Substring(0, $keyVaultHost.IndexOf('.'))

     

    Write-Host "Creating software-protected key $keyName in Key Vault $keyVaultName"

    $adlsKey = Add-AzureKeyVaultKey -Destination Software -Name $keyName -VaultName $keyVaultName

     

    #Get the current version identifier of the key, which will be used for the creation of the ADLS via the encryptionKeyVersion parameter

    $adlsKeyId = $adlsKey.Version.ToString()

     

    Write-Host "Create new encrypted ADLS by deploying ARM script $armTemplateFileAdls in resource group $rg"

    New-AzureRmResourceGroupDeployment -ResourceGroupName $rg -TemplateFile $armTemplateFileAdls `

    -DataLakeStoreName $adlsName -KeyVaultName $keyVaultName -DataLakeStoreKeyVaultKeyName $keyName -DataLakeStoreKeyVaultKeyVersion $adlsKeyId

     

    #Get the ADLS account and its Service Principal Id

    $adlsAccount = Get-AzureRmDataLakeStoreAccount -Name $adlsName

    $adlsAccountSPId = $adlsAccount.Identity.PrincipalId

     

    Write-Host "Grant ADLS account Service Principal $adlsAccountSPName required permissions on the Key Vault"

    #Grant ADLS account access to perform encrypt, decrypt and get operations with the key vault

    Set-AzureRmKeyVaultAccessPolicy -VaultName $keyVaultName -ObjectId $adlsAccountSPId -PermissionsToKeys encrypt,decrypt,get -BypassObjectIdValidation

     

    Write-Host "Enable ADLS Key Vault managed encryption"

    Enable-AzureRmDataLakeStoreKeyVault -Account $adlsAccount.Name

     

    Write-Host "ADLS $adlsName is now encrypted using key $keyName in Key Vault $keyVaultName"


     

    ARM Template ADLS


    {

      "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",

      "contentVersion": "1.0.0.0",

      "parameters": {

        "DataLakeStoreName": {

          "type": "string",

          "minLength": 1

        },

        "KeyVaultName": {

          "type": "string",

          "minLength": 1

        },

        "DataLakeStoreKeyVaultKeyName": {

          "type": "string",

          "minLength": 1

        },

        "DataLakeStoreKeyVaultKeyVersion": {

          "type": "string",

          "minLength": 1

        }

     

      },

      "resources": [

        {

          "type": "Microsoft.DataLakeStore/accounts",

          "name": "[parameters('DataLakeStoreName')]",

          "apiVersion": "2016-11-01",

          "location": "North Europe",

          "tags": {

            "displayName": "Datalake Store"

          },

          "identity": {

            "type": "SystemAssigned"

          },

          "properties": {

            "encryptionState": "Enabled",

            "encryptionConfig": {

              "type": "UserManaged",

              "keyVaultMetaInfo": {

                "keyVaultResourceId": "[resourceId('Microsoft.KeyVault/vaults', parameters('KeyVaultName'))]",

                "encryptionKeyName": "[parameters('DataLakeStoreKeyVaultKeyName')]",

                "encryptionKeyVersion": "[parameters('DataLakeStoreKeyVaultKeyVersion')]"

              }

            }

          }

        }

      ]

    }


     

    After you successfully execute the PowerShell script, navigate to the Azure portal to check if everything is OK.

    Data Lake Store → Settings → Encryption


    The account is successfully encrypted using the Key Vault key. The ADLS account has a generated Service Principal named “RN_akdadls”, to which we granted permissions on the Key Vault in the PowerShell script.
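    As an alternative to checking in the portal, something like the sketch below could be used from PowerShell; the EncryptionState and EncryptionConfig property names are assumptions based on the ARM template properties used above.

    # Sketch: inspect the encryption settings of the new ADLS account
    $adls = Get-AzureRmDataLakeStoreAccount -Name "akdadls"

    # Assumed to mirror the encryptionState/encryptionConfig properties from the ARM template
    $adls.EncryptionState
    $adls.EncryptionConfig.KeyVaultMetaInfo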

     

    Key Vault → Settings → Keys


    The key has been created and is enabled.

     

    Key Vault → Settings → Access policies


    The ADLS Service Principal has an access policy that we set with the PowerShell script.

    Opening it shows the key permissions we granted (get, decrypt and encrypt).


    Special thanks to my Macaw colleague Simon Zeinstra for working together on this solution!

  • Hybrid-BI data platform network architecture

    This blog post gives an overview of a typical hybrid Azure/on-premises BI data platform network architecture. The most important settings, configurations, shortcomings and other things you should take into account are described. For more details and in-depth documentation links to Microsoft pages are provided at the bottom of this post.

    The following common scenarios are covered:

    ·       Write data from on-premises data sources to an Azure SQL Database or Azure SQL Data Warehouse using SQL Server Integration Services.

    ·       Use the On-Premises Data Gateway (or ADF gateway) to make on-premises sources available for various Azure services like PowerBI.com and Azure Data Factory, or to be able to process an Azure Analysis Services cube with on-premises data.

    ·       Connect to an Azure SQL Database using client tools or PowerBI.com.

    ·       Connect to an Azure Analysis Services tabular model using client tools or PowerBI.com.

    The architecture diagram below (click to open) shows how the different on-premises resources connect to Azure services. For every connection the ports that should be open in your firewall, network protocols, encryption methods and authentication types are shown.

    [Architecture diagram: hybrid Azure/on-premises BI data platform network]

    1.       For the SSIS ADO.NET driver, use the following connection string parameters to use an encrypted connection: Encrypt=True and TrustServerCertificate=False.

     

    2.       All connections to Azure SQL Database require encryption (SSL/TLS) at all times while data is "in transit" to and from the database.

    3.       Set the “Allow access to Azure services” property for your Azure SQL Database server to ON. Be aware that this means not only the Azure services in your Azure subscription can reach your Azure SQL Database server, but all Azure services worldwide, including those of other customers (a PowerShell sketch of this and the other firewall settings follows after this list).

     

    4.       Only SQL Server Authentication is currently supported when connecting to an Azure SQL Database from SSIS, Power BI Desktop or Excel. Power BI Desktop models deployed to PowerBI.com will therefore also connect to an Azure SQL Database using SQL Server Authentication. The latest version of SQL Server Management Studio does support Azure Active Directory Integrated Authentication.

     

    5.       To access Azure SQL Database from your local computer, ensure the firewall on your network and local computer allows outgoing communication on TCP port 1433. Outgoing traffic can be filtered to allow only traffic to Azure datacenter IP addresses for your region. This is sometimes a requirement before organizations want to allow outgoing traffic through port 1433. Inbound connections for port 1433 can be blocked.

     

    6.       By default in Power BI Desktop, the “Encrypt connections” option is checked for your data source. If the data source doesn't support encryption, Power BI Desktop will prompt to ask if an unencrypted connection should be used.

     

    7.       Port 443 is used for default communication to PowerBI.com. Ports 5671 and 5672 can be used for Advanced Message Queuing Protocol (AMQP). Ports 9350 through 9354 can be used for listeners on Service Bus Relay over TCP. If you use the ADF gateway to write data to Azure SQL Database or Azure SQL DW, outgoing traffic through port 1433 needs to be allowed.
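    For points 1, 3 and 5, the server-side configuration can also be scripted. The sketch below uses placeholder resource group, server and rule names and an example on-premises IP range; it is an illustration, not part of a tested deployment.

    # Sketch: Azure SQL Database server firewall configuration
    $rg = "your-resource-group"
    $sqlServer = "yoursqlserver"

    # "Allow access to Azure services" corresponds to the special 0.0.0.0 - 0.0.0.0 firewall rule
    New-AzureRmSqlServerFirewallRule -ResourceGroupName $rg -ServerName $sqlServer `
        -FirewallRuleName "AllowAllWindowsAzureIps" -StartIpAddress "0.0.0.0" -EndIpAddress "0.0.0.0"

    # Allow a specific on-premises outbound IP range for client tools / SSIS (TCP port 1433)
    New-AzureRmSqlServerFirewallRule -ResourceGroupName $rg -ServerName $sqlServer `
        -FirewallRuleName "OnPremisesRange" -StartIpAddress "203.0.113.0" -EndIpAddress "203.0.113.255"

    # Example encrypted ADO.NET connection string for SSIS (point 1)
    $connectionString = "Server=tcp:$sqlServer.database.windows.net,1433;Database=yourdb;Encrypt=True;TrustServerCertificate=False;"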

    Links:

    Load data from SQL Server into Azure SQL Data Warehouse (SSIS)

    On-premises data gateway in-depth

    Data Factory Data Management Gateway

    Securing your SQL Database

    Overview of Azure SQL Database firewall rules

    Use Azure Active Directory Authentication for authentication with SQL Database or SQL Data Warehouse

    What is Azure Analysis Services?

  • Process Azure Analysis Services databases from Azure Automation

    In my last blog post I showed how to trigger Azure Data Factory (ADF) pipelines from Azure Automation. I also mentioned the option to process an Azure Analysis Services cube from Azure Automation, for example right after your ADF data processing finishes, which will probably be a common use case. In this blog post I show you how you can use the Analysis Services PowerShell provider, also known as SQLASCMDLETS, from Azure Automation.

    Create custom SQLASCMDLETS module
    The SQLASCMDLETS are not (yet) available in the PowerShell Gallery, so unfortunately it’s not possible to import the cmdlets straight into Automation like I did with the ADF cmdlets in my previous blog post. Instead we have to create our own module which will contain the SQLASCMDLETS and its dependencies.

    UPDATE: You can now use the SqlServer PowerShell Gallery module. Creating a custom module as described further below is not required anymore. Instead, follow the 3 steps below and then continue reading after the custom-module instructions.

    1. Navigate to https://www.powershellgallery.com/packages/SqlServer/

    2. Click the “Deploy to Azure Automation” button

    3. Import the SqlServer module to your Azure Automation account
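    If you prefer scripting over the “Deploy to Azure Automation” button, importing the gallery module into your Automation account might look like the sketch below. The resource group and account names are placeholders, and parameter names can differ slightly between AzureRM.Automation versions.

    # Sketch: import the SqlServer gallery module into an Automation account
    New-AzureRmAutomationModule -ResourceGroupName "your-rg" -AutomationAccountName "your-automation-account" `
        -Name "SqlServer" -ContentLink "https://www.powershellgallery.com/api/v2/package/SqlServer"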


    The required files come with SQL Server Management Studio (SSMS), which you can download and install for free. It’s important to note that you need the latest version (140) of the SQLASCMDLETS, which ships with the latest Release Candidate of SSMS. Download and install it.

    If you try to use the previous version of the SQLASCMDLETS (130) you will get an error in Automation, because it tries to authenticate with a claims token while only Windows authentication is supported by the 130 version of SQLASCMDLETS: “The value 'ClaimsToken' is not supported for the connection string property”.

    After installing SSMS you should now be able to see the following directory: C:\Program Files (x86)\Microsoft SQL Server\140\Tools\PowerShell\Modules\SQLASCMDLETS

    Copy the SQLASCMDLETS folder to a temporary location, for example C:\SQLASCMDLETS.
    You will need the following files:

    • Microsoft.AnalysisServices.PowerShell.Cmdlets.dll
    • SQLASCMDLETS.PSD1

    We also need the libraries SQLASCMDLETS depends on. Search your computer for the following files and copy/paste them to the C:\SQLASCMDLETS folder. Make sure you copy them from a folder that has “140” in the path, so you are sure you have the correct version.

    • Microsoft.AnalysisServices.Core.dll
    • Microsoft.AnalysisServices.Tabular.dll
    • Microsoft.AnalysisServices.dll

    Now zip the entire folder and make sure the name is “SQLASCMDLETS.zip”.
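    Zipping the folder can also be done from PowerShell, assuming PowerShell 5 or later for Compress-Archive:

    # Zip the module folder; the archive name must match the module name
    Compress-Archive -Path "C:\SQLASCMDLETS" -DestinationPath "C:\SQLASCMDLETS.zip"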


    Import custom SQLASCMDLETS module to Azure Automation
    Navigate to your Azure Automation account in the Azure portal.

    Click Assets:       

    Click Modules:       

    Click Add a module to import a custom module:       

    Now upload the SQLASCMDLETS.zip file:       

    The zip file will be extracted:       

    Wait until the extraction finished and the status changes to Available. Click on the module name:       

    You now see the available activities including the one we will use to process the Azure Analysis Services Database.


        


    Create Azure Automation Credential
    Now we need to create a Credential to be able to automatically login and run our PowerShell script unattended from Azure Automation.

    Navigate to Assets again and then click Credentials:

    Click “Add a credential” and enter an organization account that has permissions to process your Azure Analysis Services database. Make sure you enter the User Principal Name (UPN) and not a Windows AD account. It is often the email address and may look like jorgk@yourorganizationalaccountname.com. Give the new Credential a name, I chose “adpo-auto-cred”. It will be referenced in the PowerShell script below.
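    Creating the credential can also be scripted; a minimal sketch with placeholder resource group, Automation account and user names:

    # Sketch: create the "adpo-auto-cred" credential from PowerShell instead of the portal
    $user = "jorgk@yourorganizationalaccountname.com"
    $password = Read-Host -Prompt "Password" -AsSecureString
    $psCred = New-Object System.Management.Automation.PSCredential($user, $password)

    New-AzureRmAutomationCredential -ResourceGroupName "your-rg" -AutomationAccountName "your-automation-account" `
        -Name "adpo-auto-cred" -Value $psCred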


    Create Automation Runbook
    You can use the simple PowerShell script below to process your Azure Analysis Services database from Azure Automation. It will use the “adpo-auto-cred” credential to authenticate and will process your database using the Invoke-ProcessASDatabase SQLASCMDLETS function.
    Replace “dbname” with your database name and “server” with your server, e.g. asazure://westeurope.asazure.windows.net/yourserver and you are good to go.

    Copy/paste the script below to a Windows PowerShell Script (.ps1) file and name it “ProcessASDatabase.ps1”.


    $AzureCred = Get-AutomationPSCredential -Name "adpo-auto-cred"

     

    Add-AzureRmAccount -Credential $AzureCred | Out-Null

     

    Invoke-ProcessASDatabase -databasename "dbname" -server "server" -RefreshType "Full" -Credential $AzureCred 


    Navigate to your Azure Automation account in the Azure Portal and click “Runbooks”:

    Click “Add a runbook”:

    Click “Import an existing runbook” and select the ProcessASDatabase.ps1 file to import the PowerShell script as a Runbook:

    Runbook ProcessASDatabase is created. Click it to open it:

    A Runbook must be published before you are able to start or schedule it. Click Edit:

    Before publishing, test the Runbook first. Click on the “Test pane” button and then click Start:

    The script executed successfully:

    Connect to your Azure AS server with SSMS and check the database properties to be sure processing succeeded:

    Now publish the Runbook:

    That’s it, you now have an Azure Automation Runbook that you can schedule, monitor and integrate with your other data platform related tasks!
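    Scheduling the runbook can be scripted as well. The sketch below creates a daily schedule and links it to the runbook; the resource group, Automation account, schedule name and start time are placeholder values.

    # Sketch: create a daily schedule and link it to the ProcessASDatabase runbook
    $rg = "your-rg"
    $aa = "your-automation-account"

    New-AzureRmAutomationSchedule -ResourceGroupName $rg -AutomationAccountName $aa `
        -Name "DailyProcessing" -StartTime (Get-Date).Date.AddDays(1).AddHours(6) -DayInterval 1

    Register-AzureRmAutomationScheduledRunbook -ResourceGroupName $rg -AutomationAccountName $aa `
        -RunbookName "ProcessASDatabase" -ScheduleName "DailyProcessing"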

  • Microsoft Data Platform MVP


    What a great start of 2017! I have received my first Microsoft Data Platform MVP award:


    Thank you to everyone who felt I deserved this! I am honored to be part of such a great group of experts now, and I look forward to collaborating with everybody in 2017.

    Special thanks to my employer Macaw for always encouraging me to share my knowledge with the community for the past 10 years and giving me the space, capabilities and innovative projects to make this possible!

    Have a wonderful 2017!

    Jorg.

  • Orchestrate Azure Data Factory pipelines and other Azure Data Platform management tasks using Azure Automation

    Azure Data Factory (ADF) is a great SaaS solution to compose and orchestrate your Azure data services. It works fine to create, schedule and manage your data pipelines but it has limitations that can make it hard to use in some scenarios. The two main ones are:

    1.       Some tasks that you regularly want to perform can’t be accomplished with ADF.
    A few examples are:

    ·         Start/Pause an Azure SQL Data Warehouse

    ·         Start/Pause an Azure Analysis Services instance

    ·         Process an Azure Analysis Services cube

    ·         Manage on-premises resources

     

    2.       Triggering (running) ADF pipelines on demand.
    For most people it is hard to switch from the well-known SQL Agent jobs - in which it’s really easy to trigger SSIS processes and execute other tasks - to scheduling ADF pipelines. You now have to create activity windows and define data slices that are dependent on the availability of data sets. Most would like to be able to trigger an ADF pipeline either on demand or when some other task finishes successfully.

    The tasks listed at point 1 - and almost everything else you can think of - can be done using Azure PowerShell with Azure Resource Manager. Microsoft does an awesome job here: every new SaaS solution has great ARM support from the start. To make use of PowerShell in Azure and, more importantly, to automate your scripts and processes, Azure Automation comes into play. It makes it possible to run all your PowerShell scripts in Azure as a SaaS solution. It is an orchestrator in which you can execute all kinds of Data Platform related operations, both in Azure and on-premises via hybrid workers, enabling hybrid Data Platform orchestration scenarios.

    Triggering ADF pipelines on demand is a bit trickier. An “Execute now” command in PowerShell does not exist, which is understandable if you know how the scheduling and execution mechanism of ADF has been built. I don’t think it will be easy to change this unless some major changes to ADF are made. Fortunately, there is a workaround! If you have ever deployed a new ADF pipeline you might have noticed that after deployment, pipelines start immediately when they have an activity window with a date/time in the past. Because it is possible to update the activity window date/time of a pipeline using PowerShell, it’s also possible to trigger a pipeline’s execution by changing its activity window date/time to some value in the past.

    In this blog post I will show how you can trigger an ADF pipeline on a daily basis and monitor its execution using PowerShell code that runs in an Azure Automation runbook. Because executing other tasks (point 1) using PowerShell is easy, it becomes possible to orchestrate advanced Azure Data Platform tasks, with the execution of an ADF pipeline as just a part of the total process.


    Azure Setup

    In this example I’ve used the Azure setup shown below. The ADF pipeline contains one simple copy activity that copies a file from one blob storage container to another.

    ·         Resource Group

    o   Blob Storage

    §  Container: input

    §  Container: output

    o   Azure Data Factory

    o   Azure Automation

     

    Azure Data Factory Setup

    I created the Azure Data Factory pipeline with the Copy Data wizard:


     


    I configured the pipeline to “Run regularly on schedule” with a recurring pattern of “Daily”, “every 1 day” (see the blue rectangle in the screenshot below).

    Choosing “Run once now” would set the pipeline property “PipelineMode” to “OneTime” and would disable the ADF scheduler. Initially this might seem to be the option you want, as we want to trigger the pipeline ourselves on demand, but unfortunately that configuration has some limitations:

    ·         The ADF diagram view does not show one-time pipelines. This makes it impossible to monitor the pipeline using the Monitor & Manage dashboard.

    ·         One-time pipelines can’t be updated. This would block us from updating the activity window properties to trigger the pipeline.

    The “Start date time” is automatically set to yesterday with the current time. This triggers the pipeline to start running immediately after deployment; as explained, ADF automatically triggers pipelines with a start date/time in the past. In this case we want to trigger the execution ourselves, so set the date to some date in the future (see the red rectangle in the screenshot below).

    [Screenshot: pipeline schedule settings in the Copy Data wizard]

    For this pipeline I chose to create a simple example that copies a file from one blob storage container to another. Of course, you can design your pipeline any way you prefer, e.g. using the Copy Data wizard or Visual Studio. Just make sure you schedule it with a daily recurrence.


    Azure Automation Setup

    The next step is to create a new Azure Automation account. Within that account we need to import the Data Factory PowerShell cmdlets, because the standard set of PowerShell cmdlets in Automation does not contain any ADF related functions.

    Navigate to Assets:



    Click Modules:


     

    Browse the gallery:

     

    Search for “Data Factory”, select AzureRM.DataFactories and click Import:

    AzureRM.DataFactories should now appear in your list of available modules:

     

    Create Credential

    Now we need to create a Credential to be able to automatically login and run our PowerShell script unattended from Azure Automation.

    Navigate to Assets again and then click Credentials

     

    Click “Add a credential” and supply a user account that has the required permissions to access your Azure Data Factory. You can use the organizational account you use to login to the Azure Portal. It might look like jorgk@yourorganizationalaccountname.com or something like that.

    Automation Runbook
    We are now ready to create the Automation Runbook which will trigger the Azure Data Factory pipeline, by updating its Active Period to a date/time in the past.

    The script performs the following steps:

    1.       Authenticate with the Automation Credential

    2.       Connect to the Azure Data Factory

    3.       Update the pipeline active period to yesterday

    4.       Unpause the pipeline; execution will begin

    5.       Monitor the pipeline execution

    6.       Pause the pipeline

    Copy/paste the script below to a Windows PowerShell Script (.ps1) file and name it “TriggerAdfPipeline.ps1”.


     

    # Variables; modify 

    $rgn = "AzureDataPlatformOrchestration" #Resource Group Name

    $acn = "adpo-auto-cred" #Automation Credential Name

    $dfn = "adpo-adf" #Data Factory Name

    $pln = "CopyPipeline-cu6" #PipeLine Name

    $dsn = "OutputDataset-hgv" #DataSet Name (output dataset of pipeline that needs to be produced)

     

    # To test from PowerShell client, uncomment the 2 rows below and provide subscription ID

    #Login-AzureRmAccount

    #Set-AzureRMContext -SubscriptionId "00000000-0000-0000-0000-000000000000"

     

    # Authenticate

    # To test from PowerShell client, comment out the 2 rows below

    $AzureCred = Get-AutomationPSCredential -Name $acn

    Add-AzureRmAccount -Credential $AzureCred | Out-Null

     

    # Get data factory object

    $df=Get-AzureRmDataFactory -ResourceGroupName $rgn -Name $dfn

    If($df) {

           Write-Output "Connected to data factory $dfn in resource group $rgn."

    }

     

    # Create start/end DateTime (yesterday)

    $sdt = [System.DateTime]::Today.AddDays(-1) #Yesterday 12:00:00 AM

    $edt = [System.DateTime]::Today.AddSeconds(-1) #Yesterday 11:59:59 PM

     

    # Update active period to yesterday

    $apr=Set-AzureRmDataFactoryPipelineActivePeriod -DataFactory $df -PipelineName $pln -StartDateTime $sdt -EndDateTime $edt

    If($apr) {

           Write-Output "Pipeline $pln of data factory $dfn updated with StartDateTime $sdt and EndDateTime $edt."

    }

     

    # Unpause pipeline

    $rpl = Resume-AzureRmDataFactoryPipeline -DataFactory $df -Name $pln

    If($rpl) {

           Write-Output "Pipeline $pln resumed."

    }

     

    # Create arrays that hold all possible data factory slice states

    $failedStates = "Failed Validation", "Timed Out", "Skip", "Failed"

    $pendingStates = "Retry Validation", "PendingValidation", "Retry", "InProgress", "PendingExecution"

    $finishedStates = "Ready"

     

    # Wait while data factory slice is in pending state

    While (Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Where -Property State -In $pendingStates) {

           Write-Output "Slice status is:"

           Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Select -Property State

           Write-Output "Wait 15 seconds"

           Start-Sleep 15

    }

     

    # Since data factory slice is not pending (anymore), it is either failed or finished

    If(Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Where -Property State -In $failedStates){

           Write-Output "Slice failed."

           Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Select -Property State

    } ElseIf (Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Where -Property State -In $finishedStates) {

           Write-Output "Slice finished."

    } Else {

           Write-Output "No State found?"

           Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Select -Property State

    }

     

    # Pause pipeline

    $spr = Suspend-AzureRmDataFactoryPipeline -DataFactory $df -Name $pln

    If($spr){

           Write-Output "Pipeline $pln paused."

    }

     


     

    Navigate to your Azure Automation account in the Azure Portal and click “Runbooks”:


     

    Click “Add a runbook”:


     

    Now select the TriggerAdfPipeline.ps1 file to import the PowerShell script as Runbook:


     

    Runbook TriggerAdfPipeline is created. Click it to open it:


     

    A Runbook must be published before you are able to start or schedule it. Click Edit:


     

    Before publishing, test the Runbook first. Click on the Test pane button and then click Start:


     

    The pipeline slice ran successfully:


     

    Now publish the Runbook:


     

    You can now create webhooks, schedule and monitor your Runbook jobs:

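    For example, a webhook that lets external systems trigger the runbook could be created with the sketch below; the names and expiry are placeholders, and the webhook URI is only returned at creation time, so capture it.

    # Sketch: create a webhook for the TriggerAdfPipeline runbook
    $webhook = New-AzureRmAutomationWebhook -ResourceGroupName "your-rg" -AutomationAccountName "your-automation-account" `
        -RunbookName "TriggerAdfPipeline" -Name "TriggerAdfPipelineWebhook" -IsEnabled $true `
        -ExpiryTime (Get-Date).AddYears(1) -Force

    # The webhook URI cannot be retrieved later; store it somewhere safe
    $webhook.WebhookURI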

     

     

    Extending the script with additional tasks

    It’s easy to perform other Azure Data Platform related tasks using PowerShell and Automation Runbooks. Because the script I created waits for the ADF pipeline to end, you can easily execute other tasks before or after the ADF’s execution. The following examples will most likely be useful:

    ·         Pause/Start Azure SQL Data warehouse

    o   Documentation

    o   PowerShell Gallery

     

    ·         Pause/Start Azure Analysis Services

    o   PowerShell Gallery

     

    ·         Process Azure Analysis Services

    o   Under investigation, coming soon…

     

  • Azure Data Lake Analytics U-SQL decryption possibilities

    The following information applies to the situation as of October 2016.

    One of the things that make U-SQL so powerful is C# integration. It gives you the possibility to create your own C# classes and methods and then use them in your U-SQL scripts. Recently we executed a couple of big data projects at Macaw using the Azure Data Lake Store (ADLS) and Azure Data Lake Analytics (ADLA). During one of these projects we had to decrypt some custom Rijndael (AES) encrypted JSON content in the ADLS and store the result as a CSV. These CSV files would then be encrypted by default by enabling encryption at rest for the entire ADLS.

    We were able to support two decryption scenarios: decrypt the entire file or decrypt a part of a file (particular sensitive string values). I will go into detail about each scenario.

    Scenario 1: Decrypt entire files
    Because we were dealing with JSON files and U-SQL does not offer a native JSON extractor yet, we used the JsonExtractor C# sample that is shared on GitHub by Michal Rys.
    We modified the Extract method of the JsonExtractor class to include a CustomDecryptor that decrypts the input.BaseStream (= entire file). In addition, we set AtomicFileProcessing to true.


    using System.IO;

    using System.Linq;

    using System.Collections.Generic;

    using Microsoft.Analytics.Interfaces;

    using Newtonsoft.Json.Linq;

    using Custom.Framework.Utilities.Encoding;

     

    namespace CustomADLALibrary

    {

     

        [SqlUserDefinedExtractor(AtomicFileProcessing = true)]

        public class CustomExtractor : IExtractor

        {

            /// <summary/>

            private string rowpath;

     

            /// <summary/>

            public CustomExtractor(string rowpath = null)

            {

                this.rowpath = rowpath;

            }

     

            /// <summary/>

            public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)

            {

     

                CustomDecryptor Decryptor = new CustomDecryptor();

                using (var reader = new StreamReader(input.BaseStream))

                {

                    var jsonAsString = reader.ReadToEnd();

                    var decryptedString = Decryptor.Decrypt(jsonAsString);

                    var root = JToken.Parse(decryptedString);

     

                    foreach (JObject o in SelectChildren(root, this.rowpath))

                    {

                        this.JObjectToRow(o, output);

     

                        yield return output.AsReadOnly();

                    }

                }

            }

     

            /// <summary/>

            private static IEnumerable<JObject> SelectChildren(JToken root, string path)

            {

                if (!string.IsNullOrEmpty(path))

                {

                    return root.SelectTokens(path).OfType<JObject>();

                }

                var o = root as JObject;

                if (o != null)

                {

                    return new[] { o };

                }

                return root.Children().OfType<JObject>();

            }

     

            /// <summary/>

            protected virtual void JObjectToRow(JObject o, IUpdatableRow row)

            {

                foreach (var c in row.Schema)

                {

                    JToken token = null;

                    object value = c.DefaultValue;

     

                    if (o.TryGetValue(c.Name, out token) && token != null)

                    {

                        value = JsonFunctions.ConvertToken(token, c.Type) ?? c.DefaultValue;

                    }

                    row.Set<object>(c.Name, value);

                }

            }

        }

    }


    The modified Extract method is using a CustomDecryptor object in which the actual decryption takes place:


    using System;

    using System.IO;

    using System.Security.Cryptography;

     

    public class CustomDecryptor

    {

     

        private readonly byte[] _key = { -- }; //YourKey

        private readonly byte[] _iv = { -- }; //YourIv

     

        private readonly ICryptoTransform _encryptor;

        private readonly ICryptoTransform _decryptor;

     

        public CustomDecryptor()

        {

            var myRijndael = new RijndaelManaged { Key = this._key, IV = this._iv, Padding = PaddingMode.PKCS7 };

     

            this._encryptor = myRijndael.CreateEncryptor(myRijndael.Key, myRijndael.IV);

            this._decryptor = myRijndael.CreateDecryptor(myRijndael.Key, myRijndael.IV);

        }

     

        public string Decrypt(string input)

        {

            // Create the streams used for decryption.

            using (MemoryStream msDecrypt = new MemoryStream(Convert.FromBase64String(input)))

            {

                using (CryptoStream csDecrypt = new CryptoStream(msDecrypt, _decryptor, CryptoStreamMode.Read))

                {

                    using (StreamReader srDecrypt = new StreamReader(csDecrypt))

                    {

                        return srDecrypt.ReadToEnd();

                    }

                }

            }

        }

    }


    From U-SQL we can now easily decrypt entire files. Make sure you publish the Newtonsoft.Json assembly and your custom assembly (CustomADLALibrary) that contains the modified JsonExtractor before you try to execute the U-SQL job. See my previous blog post for instructions on how to programmatically register U-SQL assemblies using PowerShell.


    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [CustomADLALibrary];

     

    DECLARE @INPUT_FILE string = @"somePath/encryptedFile.json";

    DECLARE @OUTPUT_FILE string = @"someOtherPath/decryptedFile.csv";

     

    @decryptedExtract =

        EXTRACT column1 string,

                column2 string

        FROM @INPUT_FILE

        USING new CustomADLALibrary.CustomExtractor();

    @result =

        SELECT *

        FROM @decryptedExtract;

    OUTPUT @result

    TO @OUTPUT_FILE

    USING Outputters.Csv(quoting : false);


    Scenario 2: Decrypt string values
    In this case the files were not completely encrypted; only particular sensitive string values in the files were. Therefore the custom extractor was not needed and the decryption could take place directly in the U-SQL SELECT statement, as shown in the example below.


    REFERENCE ASSEMBLY [CustomADLALibrary];

     

    DECLARE @INPUT_FILE string = @"somePath/stringEncryptedFile.csv";

    DECLARE @OUTPUT_FILE string = @"someOtherPath/decryptedFile.csv";

     

    @extract =

        EXTRACT column1 string,

                column2 string

        FROM @INPUT_FILE

        USING Extractors.Csv();

     

    @result =

        SELECT column1 AS unencryptedColumn,

               new WeakObjectEncryptor().Decrypt(column2) AS decryptedColumn

        FROM @extract;

     

    OUTPUT @result

    TO @OUTPUT_FILE

    USING Outputters.Csv(quoting : false);


    These decryption examples show just a few of the many possibilities of the powerful new U-SQL language. It’s quite easy to use and understandable for both people with a SQL background and C# programmers. In this case I worked together with my colleague Luuk Gortzak, who helped with the C# scripts. Credits to Luuk for helping me out!

  • Script Azure Data Lake Analytics assembly deployments to U-SQL Catalog

    Registering your custom assemblies using Visual Studio Data Lake Tools is easy, just right click your U-SQL C# Class Library project, click “Register Assembly”, fill in the details and your assembly gets published and created in your U-SQL Catalog.

    But what if you want to script this process to automate your deployments? That is not documented yet, and as I follow the principle that we should be able to rebuild the whole environment with the press of a button, I found out how to do this.

    Step 0 – Prerequisites
    Install latest version of Azure PowerShell. The PowerShell script below won’t work with an older version.
    http://aka.ms/webpi-azps

    Step 1 – Build your dll from Visual Studio.
    We need the dll of your custom assembly; create it by right-clicking your U-SQL Class Library project and choosing Build. The output window shows where the dll has been created on disk.

    Step 2 – Create a U-SQL job file that will create the assembly.
    Modify the U-SQL script below and store it in a file somewhere locally on disk. Name it createAssembly.usql


    USE DATABASE yourUsqlCatalog;

     

    CREATE ASSEMBLY IF NOT EXISTS yourCSharp

    FROM @"/Assemblies/yourCSharp.dll";


     

    Step 3 – Upload the assembly to your ADLS and execute the U-SQL job using PowerShell.
    Enter values for the variables in the PowerShell script below and execute it.


    #Variables; modify 

    $dataLakeStoreName = "yourAdlsAccount"

    $dataLakeAnalyticsName = "yourAdlaAccount"

    $assemblyLocalPath = "c:\yourCSharp.dll" #step 1

    $usqlScriptLocalPath = "c:\createAssembly.usql" #step 2

    $assemblyAdlsPath = "/Assemblies/yourCSharp.dll" #Assemblies folder will be created if not exists

    $azureSubscriptionId = "00000000-0000-0000-0000-000000000000"

     

    #Login (login pop up appears)

    Login-AzureRmAccount

     

    #Connect to the Azure Subscription in which your ADLA Catalog exists 

    Set-AzureRMContext -SubscriptionId $azureSubscriptionId 

     

    #Import dll to ADLS

    Import-AzureRmDataLakeStoreItem -AccountName $dataLakeStoreName -Path $assemblyLocalPath -Destination $assemblyAdlsPath

     

    #Submit new job to ADLA (createAssembly.usql)

    $job = Submit-AzureRmDataLakeAnalyticsJob -Name "Create Assembly" -AccountName $dataLakeAnalyticsName –ScriptPath $usqlScriptLocalPath -DegreeOfParallelism 1

     

     While (($t = Get-AzureRmDataLakeAnalyticsJob -AccountName $dataLakeAnalyticsName -JobId $job.JobId).State -ne "Ended"){

         Write-Host "Job status: "$t.State"..."

         Start-Sleep -seconds 10

     }

     

     Get-AzureRmDataLakeAnalyticsJob -AccountName $dataLakeAnalyticsName -JobId $job.JobId


     

    Step 4 – Validate.
    Validate that your dll was uploaded to a folder in your ADLS called “Assemblies”, and next, that your assembly was created in your U-SQL Catalog, using Visual Studio Server Explorer (Azure).
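    If you prefer to validate from PowerShell as well, a sketch could look like the lines below; it reuses the variables from step 3, and the exact parameter and item type names may differ slightly between AzureRM module versions.

    # Sketch: check the uploaded dll and the registered assembly
    Get-AzureRmDataLakeStoreChildItem -AccountName $dataLakeStoreName -Path "/Assemblies"

    # List the assemblies registered in the U-SQL catalog database
    Get-AzureRmDataLakeAnalyticsCatalogItem -AccountName $dataLakeAnalyticsName -ItemType Assembly -Path "yourUsqlCatalog"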

    Step 5 – Reference your new assembly in your U-SQL scripts.
    You can now start using your assembly by referencing it in the first lines of code in your U-SQL script.


    USE DATABASE yourUsqlCatalog;

     

    REFERENCE ASSEMBLY yourCSharp;

    ...


     

  • Use AdlCopy to generate U-SQL jobs that copy data between Azure Blob Storage and Azure Data Lake Store

    AdlCopy is a command-line tool (it runs on the user’s machine) that allows you to copy data from Azure Storage Containers or Blobs into Azure Data Lake Store.

    You can use the AdlCopy tool in two ways:

    • Standalone, where the tool uses Data Lake Store resources to perform the task. This can be a cheap and more ad-hoc/manual option to move data from blob storage to ADLS compared to using Data Factory or ADLA U-SQL jobs. I assume only ADLS transaction costs are incurred if your blob storage resides in the same region as your ADLS.
    • Using a Data Lake Analytics account, where the units assigned to your Data Lake Analytics account are used to perform the copy operation. You would typically use this option when the data to be moved is in the range of gigabytes and terabytes, and you want better and predictable performance throughput.

    Another advantage is that on execution the AdlCopy tool generates an ADLA job that contains U-SQL code for each source file from your blob storage that will be copied to the ADLS. You can reuse the code of this job to schedule execution of the copy process on a regular basis. A requirement for this scenario is that your blob storage source files have static paths and file names, as the U-SQL code of the generated job contains hardcoded references to the source file locations.

    Download AdlCopy: http://aka.ms/downloadadlcopy

    Instructions: https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-copy-data-azure-storage-blob/

    Copy all data from blob storage container to ADLS folder using a generated U-SQL job that can be reused:

    1. Start Command Prompt

    2. Navigate to the directory where AdlCopy is installed:

    %HOMEPATH%\Documents\adlcopy

    3. Run the following command (standalone, copy is performed by ADLS):

    AdlCopy.exe /Source https://yourblob.blob.core.windows.net/yourcontainer/ /Dest swebhdfs://yourdatalakestore.azuredatalakestore.net/yourfolder/ /SourceKey -

    4. Data is copied:


     

    5. Run the following command to execute the process using the Data Lake Analytics service:

    AdlCopy.exe /Source https://yourblob.blob.core.windows.net/yourcontainer/ /Dest swebhdfs://yourdatalakestore.azuredatalakestore.net/yourfolder/ /SourceKey - /Account yourDataLakeAnalyticsAccount /Units 2

    6. Because the ADLA service has been used, a U-SQL job was generated and executed, which can be seen in the Azure portal:


     

    7. The U-SQL code shows that for each file in blob storage, an EXTRACT (from blob storage) and an OUTPUT (to ADLS) statement were created:


     

    8. You can now choose “Duplicate Script”, delete the DROP ASSEMBLY IF EXISTS and CREATE ASSEMBLY statements (which you don’t need again), and save your new job. You can now execute/schedule the copy process again without needing the AdlCopy tool.


  • Setup Azure Data Lake Analytics federated U-SQL queries to Azure SQL Database

    One of the major value propositions of U-SQL is that it allows you to query data where it lives. For external systems, such as Microsoft Azure SQL Database, this is achieved with federated queries against data sources.


     

     

    In order to query these external data sources, a data source object has to be created and referenced that abstracts the connection information, as well as information about its capabilities to execute query expressions passed and translated from U-SQL to its local query engine.

     

     

    Prerequisites

    ·         An Azure Subscription

    ·         Azure Data Lake Store (ADLS)

    ·         Azure Data Lake Analytics (ADLA)

    ·         Azure SQL Database (ASQLDB) or Azure SQL Data warehouse (ASQLDWH) with SQL login/password

    ·         Visual Studio 2015 (optional, to create and execute U-SQL queries; this can also be done in the Azure portal)

    https://www.visualstudio.com/products/free-developer-offers-vs

    ·         Azure Data Lake Tools for Visual Studio 2015

    https://www.microsoft.com/en-us/download/details.aspx?id=49504

    ·         Azure PowerShell

    http://aka.ms/webpi-azps

    ·         RX Permissions on your ADLS (data level) to be able to create the Catalog Secret

    o   Navigate to ADLS in Azure Portal

    o   Click Data Explorer

    o   Click Access

    o   Validate you have Read/Execute permissions.

    ·         Allow IP range in the ASQLDB server firewall for the ADLA services that fire the U-SQL queries

    o   Navigate to ASQLDB server in Azure Portal

    o   Click Settings

    o   Click Firewall

    o   Create new rule with range 25.66.0.0 to 25.66.255.255

     

     

    Steps:

    1.       Create ADLA Database using a U-SQL query:

    DROP DATABASE IF EXISTS YourADLADatabase;

    CREATE DATABASE YourADLADatabase;

     

    2.       Create Catalog Secret in ADLA Database which contains the password for the SQL login and connection string for the ASQLDB database using the Azure PowerShell script below:

    #Login (login pop up appears)

    Login-AzureRmAccount

    #Show your available Azure Subscriptions

    Get-AzureRmSubscription

    #Connect to the Azure Subscription in which your ADLA Database exists 

    Set-AzureRMContext -SubscriptionId 00000000-0000-0000-0000-000000000000

     

    #$passwd: password for ASQLDB / ASQLDWH which you want to federate

    #-Account: ADLA account name

    #-DatabaseName: Data Lake Analytics Database name

    #-Host: Host of ASQLDB / ASQLDWH 

    $passwd = ConvertTo-SecureString "YourPassword" -AsPlainText -Force

    $mysecret = New-Object System.Management.Automation.PSCredential("YourASQLDB_Secret", $passwd)

    New-AzureRmDataLakeAnalyticsCatalogSecret -Account "youradlaaccount" -DatabaseName "YourADLADatabaseName" -Secret $mysecret -Host "yourasqldb.database.windows.net" -Port 1433

     

    3.       Create CREDENTIAL with IDENTITY that matches the AzureRmDataLakeAnalyticsCatalogSecret name as used in the PowerShell script (YourASQLDB_Secret) in ADLA Database using the U-SQL query below:

    //Connect to ADLA Database

    USE DATABASE YourADLADatabaseName;

     

    //Create CREDENTIAL

    //USER_NAME: ASQLDB Username

    //IDENTITY: ADLA Catalog Secret, must match name chosen in PowerShell script

    CREATE CREDENTIAL IF NOT EXISTS YourASQLDB_Secret WITH USER_NAME = "YourASQLDB_Username", IDENTITY = "YourASQLDB_Secret";

     

    4.       Create Data Source in ADLA Database with a reference to the ASQLDB using the U-SQL query below:

    // Create External Data source on AZURESQLDB

    CREATE DATA SOURCE IF NOT EXISTS ASQL_YOURDB FROM AZURESQLDB WITH

           ( PROVIDER_STRING = "Database=YourASQLDB;Trusted_Connection=False;Encrypt=True"

           , CREDENTIAL = YourASQLDB_Secret

           , REMOTABLE_TYPES = (bool, byte, sbyte, short, ushort, int, uint, long, ulong, decimal, float, double, string, DateTime)

           );

     

    5.       Create an External Table in ADLA Database based on the Data Source using the U-SQL query below:

    // CREATE EXTERNAL TABLE

    // External tables are optional. You can refer to data source tables in queries directly (Lazy Metadata loading):

    // FROM EXTERNAL ASQL_PATIENTS LOCATION "dbo.sometable"

    CREATE EXTERNAL TABLE someExternalTable (

           [someColumn] string

    ) FROM ASQL_YOURDB LOCATION "dbo.someTable";

     

    6.       Query the federated external ASQLDB table and output result to file using the U-SQL query below:

    @query =

        SELECT someColumn

        FROM someExternalTable;

     

     OUTPUT @query TO "/Output/file.csv"

     USING Outputters.Csv();

  • See you at the PASS Summit 2015 in Seattle!

    My employer Macaw gave me the opportunity to attend the SQL PASS Summit this year, so I am traveling to Seattle tomorrow! I’m excited to be there this year because there are so many new technologies being pushed in our direction like the Cortana Analytics suite and SQL Server 2016. At the moment I am already involved in some PoC/pilot projects for some of our customers regarding these new technologies, so I can apply my new knowledge in the field the moment I come back to Holland.

    Regarding the community, I hope to meet some people I have been in contact with on the internet for quite some time, like some of the SQL Server MVPs, but of course it’s also a pleasure to meet new people, so come say “Hi” or send me a message.

    I created a Twitter account today which I will use to share some knowledge during the PASS Summit, so follow me right now. See you there!

  • SharePoint Server 2016 IT Preview from a BI point of view

    SharePoint Online and Power BI have had a lot of attention and focus lately, but what about the on-premises version of SharePoint that’s used in almost all serious BI solutions today? Well, SharePoint Server 2016 IT Preview has just been released by Microsoft, stated by many to be the last on-premises version of SharePoint ever. In this blog post I will focus on the BI features of SharePoint Server 2016.

    So what kind of BI features are improved and what’s new? It’s better to start with the question: which BI features will remain available in SharePoint 2016? Because a lot of BI features will be deprecated or removed.

    An overview of the main SharePoint BI capabilities and their availability in SharePoint 2016:

    · Excel Services

    Excel Services will no longer be available; it completely moves to Excel Online in Office Online.

    · PowerPivot for SharePoint and Power View add-ins

    These BI features are not available in the current release of the SharePoint 2016 IT Preview. Microsoft states they will implement them later this year, but when and in what form is uncertain.

    · Reporting Services Integrated Mode

    There is no news about Reporting Services Integrated Mode, but as I wrote in my last blog post, Reporting Services gets a big update in SQL Server 2016. One of the new possibilities will be the integration of Reporting Services and Power BI. So why still integrate with SharePoint?

    With the release of the new Power BI Desktop (formerly Power BI Designer) a lot of Power Pivot functionality moves to Power BI. For example, you can now import Excel Power BI artifacts (Data Model, Queries, Power View) into a Power BI Desktop file. This new functionality, together with the Reporting Services/Power BI integration that’s coming and the news that Excel Services will no longer be available in SharePoint 2016, raises the question whether BI solutions will continue to need SharePoint in the future. All functionality seems to be moving to Power BI!

    On the other hand, Microsoft recently announced a collaboration with Pyramid Analytics, which will offer the possibility to deploy a Power BI Desktop file to an on-premises Pyramid Analytics server, bringing Power BI on-premises. Pyramid Analytics offered SharePoint integration in the past, so maybe they will integrate again with SharePoint 2016, which would result in a Power BI and SharePoint 2016 integration, making SharePoint more important than ever for on-premises solutions.

    It’s clear there is a lot of uncertainty about the future of SharePoint as part of the Microsoft BI platform! To be continued…

  • BI on your terms with SQL Server 2016

    In the last few years Microsoft's strategy was all about cloud first (or cloud only?), releasing new BI products and updates to existing products to the cloud at a high pace, with almost no investments in on-premises BI. In 2015 Microsoft seems to have changed course: it now aims more at enabling hybrid scenarios, investing a lot in both the cloud (Power BI/Azure) and on-premises with SQL Server 2016.
    Microsoft’s message regarding BI for 2015/2016 is: “BI on your terms”.

    BI on your terms means leveraging up-to-date possibilities for one or a combination (hybrid) of the following architectures:

    • Cloud with Azure and Power BI

    • On-Premises with SQL Server 2016

    • Server driven or Self-Service

    To be able to offer quality hybrid architectures, Microsoft is investing a lot in the on-premises BI suite with SQL Server 2016 and has announced it will keep investing in it in the coming years. So not only cloud first like we have seen in previous years, but more focus on hybrid possibilities and, if you desire, on-premises only.

    For the first time in many years an exciting version of SQL Server is coming in terms of BI. The main topics are:

    • Hybrid BI (Cloud/On-Premises)

    • Modern Reports

    • Enhanced Analysis

    • Mobile BI


    Below is an overview of the new BI related features per SQL Server 2016 service or product. As the length of this list shows, SQL Server 2016 will be a massive BI version!!

    Analysis Services Tabular

    • Enhanced modeling capabilities in the semantic layer

      • Many-to-many relationships

      • Bi-directional cross filtering. This means you can not only filter on the 1 side of a 1-to-many relationship in your tabular model, but also on the many side. For example, two connected tables, Sales → Product:

        • Product: product, product category

        • Sales: sales date, connection to product table

          Now select products sold filtering on sales date (many side) while also filtering on product category (1 side). This is not possible in today’s version of SSAS Tabular.

    • Time intelligence

      • Date/time columns are automatically converted to rich date/time tables, starting from the column’s MIN date up to the MAX date found

    • New DAX functions

      • A lot of new functions for calculations that currently require quite complex formulas, like present time, date difference, percentile, product, geomean, median, etc.

    • Performance improvements

      • For end users

        • Query engine optimized

      • For developers

        • Metadata operations; modeling related operations are much faster

      • For data processing

        • Parallel partition processing

    • Expose on-premises tabular models in the cloud (hybrid) → a Power BI feature, already possible today with SQL Server 2012.


    Analysis Services Dimensional

    • Netezza as a Data Source (Netezza Data Warehouse | IBM - NDM Technologies)

    • Performance improvements

      • Unnatural hierarchies

      • Distinct counts

      • Other performance improvements in areas where multidimensional is not performant at the moment

    • DBCC (Database Consistency Checker) support. Checks the logical and physical integrity of objects in the specified database.

    • Expose on-premises multidimensional cubes in the cloud with Power BI (hybrid)


    SQL Server Database Engine

    • Integration of R analytical engine, predictive analytic capabilities via T-SQL queries

    • PolyBase available without the need of PDW, makes it possible to query both structured relational SQL, and unstructured Hadoop data through T-SQL statements

    • Data encryption for stored data and data in motion

    • Row-level security

    • Updates to the in-memory OLTP engine, for example updateable in-memory nonclustered columnstore indexes

    • Parsing and storing native JSON data

    • XEvents-based monitoring in Management Studio


    Reporting Services

    • New look and feel and possibility to apply themes and branding using CSS

    • New visualizations, chart types like tree maps and sun bursts

    • Improved flexible parameter panel with support for:

      • Autocomplete

      • Search

      • Hierarchical tree display

    • Runs in all modern browsers on both desktops and tablets (any device)

    • Integration of R analytical engine

    • Power Query as a data source

    • Pin on-premises SSRS reports to Power BI Dashboards (hybrid)


    Integration Services

    • High Availability support

    • Power Query integration

    • Azure Data Factory integration (hybrid)

      • Execute on-premises SSIS packages from Azure Data Factory

      • Azure Data Factory data flow task

      • Azure storage connector

      • Azure commandlets

    • OData 4.0 support

    • Hadoop File System (HDFS) support

    • JSON support

    • New Oracle/Teradata connector (4.0)

    • Incremental deployment options

    • Custom logging levels

    • SSIS package templates to reuse ETL code


    Mobile BI

    • In the cloud with Power BI

      • Power BI App for Windows Phone (coming soon) and iOS

    • On-premises with Datazen Server

      • Now available for free for SQL Enterprise Edition customers (2008 or later)

      • All major platforms: Windows Phone, Android, iOS

      • Beautiful interface and data visualizations

      • Optimizable for Phone, Tablet and Laptop


    SharePoint vNext integration

    • Edit Mode of PowerPivot Excel workbooks in browser

    • Support for Excel vNext (Office 2016) DAX functions


    Master Data Services

    • Improved performance for large models

    • Row-level compression per entity

    • Improved user interface

    • Configurable retention settings

    • Enhanced security possibilities for read, write, delete and create operations and support for multiple system administrators with specific permissions

    • Excel Add-in is 15 times faster and is updated to support bulk entity-based staging operations


    Visual Studio

    • Database and BI project types merged into one Visual Studio environment

    • New scripting language for tabular models. Currently tabular models are wrapped in multidimensional constructs, and when you deploy, the deployment is reverse engineered back to the tabular model. The new native language for tabular will be easier to understand, modify and deploy.

    • SSIS designer supports previous versions of SQL Server


    Of course there is also a lot of exciting news coming from the cloud side of Microsoft BI; for example, the Azure Data Lake has been announced, following the principles of my blog post about the Relational Data Lake. You can expect a post about the Azure Data Lake on this blog soon!


    P.S. Don’t forget to suggest and vote for feature requests for SQL Server yourself at:  http://aka.ms/SqlBiUserVoice

     

  • Power BI Analysis Services Connector Security

    The Power BI Analysis Services Connector can be used to connect from the Power BI service to your on-premises tabular Analysis Services models.

    In this blog post I will look into the security and authentication possibilities that the SSAS Connector offers. Dynamic row-level security against an on-premises SSAS Tabular model, based on the Power BI user name, is possible; I will show how you can set this up and how it works in the background.

    If you want to know how to install and configure the SSAS Connector follow this excellent support guide.

    The SSAS Connector has some similarities with the BI Semantic Model Connection that’s available for SharePoint, which can connect either with Kerberos or by using the EffectiveUserName property. The latter matches the SSAS Connector:
    - A connection to a tabular model is made with stored credentials that require administrator permissions on the SSAS server.
    - The EffectiveUserName parameter is sent to SSAS to impersonate the current user. Only SSAS administrators have permission to connect using EffectiveUserName.

    To investigate exactly how the connector works I have created a tabular model with a very simple underlying database model. The Sales table will be secured based on dynamic row-level security on the content of the EffectiveUserName column in the Department table.

    DataModelSQL
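
    In case the image above doesn’t render, here is a rough T-SQL sketch of what such a simple underlying model could look like; the exact table and column names are assumptions based on the description above:

    -- Assumed shape of the demo model: a Department table that holds the
    -- EffectiveUserName used for row-level security, and a Sales table related to it.
    CREATE TABLE dbo.Department
    (
        DepartmentId      int            NOT NULL PRIMARY KEY,
        DepartmentName    nvarchar(50)   NOT NULL,
        EffectiveUserName nvarchar(100)  NOT NULL  -- e.g. MACAW\jorgk or jorgk[at]macaw.nl
    );

    CREATE TABLE dbo.Sales
    (
        SalesId      int           NOT NULL PRIMARY KEY,
        SalesDate    date          NOT NULL,
        DepartmentId int           NOT NULL REFERENCES dbo.Department (DepartmentId),
        Amount       decimal(18,2) NOT NULL
    );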


    The tabular model contains the dataset shown in the Excel pivot below. In an on-premises situation my EffectiveUserName would be MACAW\jorgk, but because I log in to Power BI with jorgk[at]macaw.nl I have created entries for both. Let’s see which one will be used later on.

    DataModel 


    I have created an SSAS Security Role with read permissions for my Active Directory account:

    image

    Role1

    Row-level security has been set on the EffectiveUserName column of the Department table:
    Role2


    The SSAS Connector was installed on the server by my colleague Dave, who has administrator permissions on the SSAS server (inherited from the local Administrators group). For the purpose of the demo we made sure my domain account was not in the local admin group and not in the SSAS server admin group. Dave’s credentials will be the stored credentials used to connect to the tabular model from the SSAS Connector, passing the EffectiveUserName property just like a BISM connection is able to do, as I explained before.

    Now I logged in to Power BI and created a report, and the security is working: my report is filtered and shows only my data:

    14_securedPowerBi


    During the creation of my report Dave ran a Profiler trace to see what’s happening in the background. As we expected my connection came in under the account of Dave (blue) but with my account as EffectiveUserName (red):

    Profiler

    It’s interesting to see that my EffectiveUserName is JorgK[at]macaw.nl, yet the Power BI report I’ve created shows the data of MACAW\jorgk! Why didn’t it show the data of the row that exactly equals my EffectiveUserName JorgK[at]macaw.nl? Well, that’s because SSAS does not authorize the user based on the textual value of the EffectiveUserName property. Instead it calls Active Directory to check whether the Power BI user name (based on the e-mail address) exists in AD; if it does, the AD account is returned and the user is authorized with it.

    To wrap up this entire process I’ve created the following diagram:

    SSAS Connector Overview 

    1. From the Power BI tenant I connect to the SSAS Connector using my Power BI Account.
    2. The SSAS Connector connects to the tabular model using the stored credentials, in this case of my colleague Dave who has administrator permissions on the model and therefore the permissions to pass on my Power BI account as EffectiveUserName.
    3. The EffectiveUserName is checked in Active Directory. This will only be possible if DirSync has been set up, otherwise my Power BI account that comes from Windows Azure Active Directory will not be mapped to the on premises Active Directory.
    4. After Active Directory authorizes me, my on premises AD Account is sent to the tabular model.
    5. Row-level security is applied as configured in the SSAS Role.

     

    The next thing to do was, of course, to share the Power BI report in a dashboard with another colleague. As a test I shared it with my colleague Martijn (blue), who did not have any permissions on the tabular model at all. Unfortunately it seems the credentials of the user who shares the dashboard are stored in that dashboard/report and are used to make the connection to the underlying SSAS data source. Martijn’s report was identical to mine, and SQL Profiler showed us that the connection was indeed again made with the admin account of Dave, with my account passed in as the EffectiveUserName (red).

    MartijnAsJorgAfterSharing


    Conclusion

    It’s great to see we can finally connect to on-premises models from Power BI using row-level security, which was not possible with the Data Management Gateway. This makes a hybrid cloud/on-premises architecture a feasible option to implement. Unfortunately we are not there yet, because it isn’t possible to impersonate a user after sharing a dashboard, which is clearly not the desired functionality. Let’s hope Microsoft will fix this in the coming months! Until then, be sure to advise your customers about this sharing security issue.

  • Relational Data Lake

    What is a Data Lake?
    Pentaho CTO James Dixon is credited with coining the term "Data Lake". As he describes it in his blog entry, "If you think of a Data Mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the Data Lake is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."

    These days, demands for BI data stores are changing. BI data consumers not only require cleansed and nicely modeled data, updated on a daily basis, but also raw, uncleansed and unmodeled data which is available near real-time. With new and much more powerful tooling like Power BI, users can shape and cleanse data in a way that fits their personal needs without the help of the IT department. This calls for a different approach when it comes to offering data to these users.

    BI data consumers also demand a very short time-to-market of new data; they don’t want to wait for a few months until data is made available by a BI team, they want it today. The raw, uncleansed form of data in a Data Lake can be loaded very quickly because it’s suitable for generated data loading technologies and replication, which makes this short time-to-market possible. Once users have discovered the data and have acquired enough insights that they want to share with the entire organization in a conformed way, the data can be brought to traditional Data Warehouses and cubes in a predictable manner.

    Furthermore, there is a rise in the presence of unstructured and/or semi-structured data and in the need to have “big data” available for ad-hoc analyses. To store and analyze these forms of data, new technologies and data structures are required.

    When a Data Lake is in place, a lot of data streams from sources into the “lake” without knowing up front whether it is eligible for answering business questions. The data can’t be modeled yet, because it’s not clear how it will be used later on. Data consumers get the possibility to discover data and find answers before the questions are even defined. This differs fundamentally from the concept of a Data Warehouse, in which the data is delivered through predefined data structures, based on relevant business cases and questions.

    Technology
    From a technology point of view, a Data Lake is a repository which offers storage for large quantities and varieties of unstructured, semi-structured and structured data derived from all possible sources. It can be formed by multiple underlying databases which store these differently structured forms of data in both SQL and NoSQL technologies.
    20141217_JK_Technologies

    For the semi-structured/unstructured side of data which is used for big data analytics, Data Lakes based on Hadoop and other NoSQL technologies are common. For the semi-structured/structured data, SQL technologies are the way to go.

    In this blog post I will describe the semi-structured/structured, relational appearance of the Data Lake in the form of a SQL Server database: The Relational Data Lake.
    RelationalDataLake2

    Extract Load (Transform)
    Data in a Data Lake is in raw form. Transformations are not performed during loading, and relationships and constraints between tables are not created (which is the default behavior for transactional replication); this keeps the loading process as lean and fast as possible. Because of the lack of transformations, movement of the data follows the Extract-Load-(Transform) (EL(T)) pattern instead of the traditional E-T-L. This pattern makes loading data into the Data Lake easier, faster and much more suitable for replication technologies or generated SSIS processes, for example with BIML. This creates a very attractive time-to-market for data that is added to the Data Lake. Latency of data should be as low as possible; preferably data is loaded in near real-time: data should stream into the lake continuously.
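
    As a small illustration of this EL(T) idea, a raw landing table in a Relational Data Lake could look like the sketch below: a table without foreign keys or transformations, filled with a straight insert (here via a linked server; in practice transactional replication or generated SSIS packages would do the same). All names are illustrative:

    -- Raw landing table: no constraints towards other tables, no transformations,
    -- only a load timestamp added for housekeeping.
    CREATE TABLE lake.SalesOrderLine
    (
        SalesOrderID int,
        DepartmentID int,
        ProductID    int,
        OrderQty     smallint,
        LineTotal    money,
        LoadDateTime datetime2 NOT NULL DEFAULT (SYSUTCDATETIME())
    );

    -- Straight Extract-Load: copy the source rows as-is.
    INSERT INTO lake.SalesOrderLine (SalesOrderID, DepartmentID, ProductID, OrderQty, LineTotal)
    SELECT SalesOrderID, DepartmentID, ProductID, OrderQty, LineTotal
    FROM   SourceServer.SourceDb.Sales.SalesOrderLine;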

    Transformations take place after the data is loaded into the Data Lake, where applicable. Cosmetic transformations, like translating technical object and column names into meaningful descriptions that end users understand, or other lightweight transformations, can be performed in new structures (like SQL views) that are created inside the Data Lake.

    Unlike Data Marts and Data Warehouses, which are optimized for data analysis by storing only the required attributes and sometimes dropping data below the required level of aggregation, a Data Lake always retains all attributes and (if possible) all records. This way it will be future-proof for solutions that will require this data at a later moment in time or for users who will discover the data.

    Accessing data
    Data is made accessible through structures which can either be accessed directly or indirectly through exposure as OData feeds. These structures are secured and are the only objects end users or other processes have access to. The feeds can be accessed with whatever tool or technology is best suited to the task at any moment in time, for example Power BI tooling like Excel PowerPivot/Power Query.

    We normally create SQL views in which the security rules and required transformations are applied, as sketched below.
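
    A minimal sketch of such a view, assuming the raw lake.SalesOrderLine table from the EL(T) example above and an access.UserDepartment table that maps logins to the departments they may see (both names are illustrative):

    -- Access layer view: friendly column names plus a lightweight security rule.
    CREATE VIEW access.SalesOrderLines
    AS
    SELECT  sol.SalesOrderID AS [Order Number],
            sol.ProductID    AS [Product],
            sol.OrderQty     AS [Quantity],
            sol.LineTotal    AS [Line Amount]
    FROM    lake.SalesOrderLine   AS sol
    JOIN    access.UserDepartment AS ud
            ON  ud.DepartmentID = sol.DepartmentID
            AND ud.LoginName    = SUSER_SNAME();  -- only rows of departments this login may see
    GO

    -- End users and OData feeds only get access to the view, never to the raw lake tables.
    GRANT SELECT ON access.SalesOrderLines TO [DataLakeReaders];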

    The Data Lake also acts as a hub for other repositories and solutions like Data Warehouses and Operational Cubes.

    Master Data
    The success of the Data Lake depends on good master data. When end users discover new raw data from the Data Lake, they need to be able to combine it with high-quality master data to get proper insights. Therefore a master data hub is a must-have when a Data Lake is created. This hub can simply be a database with master data structures in it; master data management on this data is preferable but not required. The master data hub should be a standalone solution, independent from the other BI solutions, as master data isn’t part of these solutions but is only used as a data source. It should be sourced independently too, preferably using master data tooling or tools like SSIS. Just like data from the Data Lake, master data should only be accessed through structures which can also be exposed as OData feeds.

    Next to being combined with data from the Data Lake, the master data can be used as a source for other BI solutions like Data Warehouses, where the master data structures are often used as Data Warehouse dimensions. To prevent unnecessarily loading master data into the Data Warehouse that already exists in the master data hub, it can be a good choice to leave the master data out of the Data Warehouse dimensions. Only the business keys are stored, and they can be used to retrieve the data from the master data hub when required. This way the Data Warehouse remains slim and fast to load, and master data is stored in a single centralized data store. A sketch of this idea follows below.
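
    A minimal sketch of this pattern, assuming a master data hub schema mdh and otherwise illustrative names: the slim Data Warehouse dimension stores only the keys, while the descriptive attributes are picked up from the hub when required:

    -- Slim dimension: only a surrogate key and the business key are stored.
    CREATE TABLE dw.DimCustomer
    (
        CustomerKey         int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
        CustomerBusinessKey nvarchar(20)        NOT NULL  -- key into the master data hub
    );
    GO

    -- Descriptive attributes come from the master data hub at query time.
    CREATE VIEW dw.vDimCustomer
    AS
    SELECT  d.CustomerKey,
            d.CustomerBusinessKey,
            m.CustomerName,
            m.CustomerSegment
    FROM    dw.DimCustomer AS d
    JOIN    mdh.Customer   AS m
            ON m.CustomerBusinessKey = d.CustomerBusinessKey;
    GO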

    Architecture
    The entire Data Lake architecture with all the described components fits in the model below. From bottom to top the highlights are:

    • Extract/Load data from the sources to the Data Lake, preferably in near real-time.
    • The Data Lake can consist of multiple SQL (and NoSQL) databases.
    • Transformations and authorizations are handled in views.
    • The Data Lake acts as hub for other BI solutions like Data Warehouses and Cubes.
    • The master data hub is in the center of the model and in the center of the entire architecture. It’s loaded as a standalone solution and isn’t part of any of the other BI solutions.
    • Traditional BI will continue to exist and continue to be just as important as it has always been. It will be sourced from the Data Warehouses and cubes (and master data hub).
    • The Discovery Platform with its new Power BI tooling is the place where “various users of the lake can come to examine, dive in, or take samples.” These samples can be combined with the data from the master data hub.

    20141211JK_Data Lake BI Architecture

    Data Lake Challenges
    Setting up a Data Lake comes with many challenges, especially regarding data governance. For example, it’s easy to create any view in the Data Lake and lose control over who gets access to what data. From a business perspective it can be very difficult to deliver the master data structures that are so important for the success of the Data Lake. And from a user perspective, wrong conclusions can be drawn by users who get insights from the raw data; therefore the Data Warehouse should still be offered as a clean, trusted data structure for decision makers and as a data source for conformed reports and dashboards.

    Summary
    The Data Lake can be a very valuable data store that complements the traditional Data Warehouses and cubes, which will stay as important as they are now for many years to come. But considering the increasing amount and variety of data, the more powerful self-service ETL and data modeling tooling that is appearing, and the shortened time-to-market required for near real-time data from source to user, the Data Lake offers a future-proof data store and hub that enables answering yet undefined questions and gives users personal data discovery and shaping possibilities.

    Thanks go to my Macaw colleague Martijn Muilwijk for brainstorming on this subject and reviewing this blog post.

  • Implement SSAS MD cell security using dimension security with blazing performance

    SQL Server Analysis Services (SSAS) Multidimensional (MD) is a great product, and in my opinion it’s still the only real option to go for when building complex enterprise BI solutions. It’s still very fast when implemented correctly and it’s mature and therefore very stable.

     

    The only real downside is cell security, which, in my opinion, is useless. It makes performance drop dramatically because it evaluates security cell by cell. I have seen reports that run in a few seconds without cell security take 20 minutes with cell security implemented! Try to explain that to your customer… it’s obvious that you can’t.

     

    Quite a few workarounds have existed for a while:

     

    • Make your measures invisible and create MDX calculations that either show or hide the measure value based on a dummy dimension. Drawbacks are that the measure is only hidden and not really secured, and that you need to create and maintain dummy dimensions/attributes. http://www.sqljason.com/2010/04/implementing-measure-security-in-ssas.html
    • SSAS offers us the “Measures Dimension”, which gives you the possibility to secure measures like dimension members. Great, this is just what we need, but implementing it and creating MDX calculations based on the secured measures will give you errors for SSAS roles that do not have access to these measures. This is caused by the fact that the MDX script is executed after the security has been applied. So if a user that doesn’t have access to a measure (set by dimension security) tries to connect to the cube while the MDX script contains a reference to this secured measure, an error is raised.
    • Create a hidden dimension on which you apply dimension security as described here: http://cwebbbi.wordpress.com/2011/12/22/replacing-cell-security-with-dimension-security/ Unfortunately this doesn’t work for measures.

     

    For a few years I’ve always implemented a workaround that uses the Measures dimension. To prevent errors in the MDX script, as described by the second workaround above, I added IsError() checks around all parts of my calculations that could raise an error. For example, a simple calculation like Quantity * Price, where the Price measure could be secured, looks like this:

     

    CREATE MEMBER CURRENTCUBE.[Measures].[Amount] AS
      IIf (
        IsError ( [Measures].[Price] ),
        NULL,
        [Measures].[Quantity] * [Measures].[Price]
      ),
      NON_EMPTY_BEHAVIOR =
      {
        IIf (
          IsError ( [Measures].[Price] ),
          NULL,
          [Measures].[Price]
        )
      } ;

     

    This calculation returns NULL instead of raising an error if a user doesn’t have permission on the Price measure. Quite straightforward; the only trick here is the IsError() check in the NON_EMPTY_BEHAVIOR. Of course you would have to do this in the script view of your SSAS calculations tab. In the form view this will look a bit strange, but it doesn’t cause any errors:

     

    clip_image001 

     

    Using this approach you are able to implement measure security using dimension security while still being able to create MDX calculations with NON_EMPTY_BEHAVIOR based on these secured measures. This made the report I talked about before run in 5 seconds instead of 20 minutes. I’ve used this approach for quite some years now, and it has always been sufficient.

     

    At the moment I’m working on a very big enterprise cube that contains almost 100 dimensions, 25 measure groups and millions of records. I implemented measure security like this, and after a while I noticed that some really heavy queries (reports) took quite some time, for example 30 seconds or 1 minute. Reason enough to rethink this approach. When I was at the SQL Server Days in Belgium last year, I discussed this approach with Chris Webb. Of course he could understand what I was doing here and we agreed to email about this approach later on. Chris emailed me about a blog post of his from some time ago: http://cwebbbi.wordpress.com/2008/01/22/dimension-security-tips-tricks-and-problems/. In this post Chris explains how to create secured calculations while using dimension security by using named sets and scope statements:

     

    CREATE MEMBER CURRENTCUBE.Measures.Test AS
      NULL ;

    CREATE SET myset1 AS
      IIf (
        IsError ( StrToMember ( "Measures.[Internet Sales Amount]" ) ),
        { },
        { Measures.Test }
      ) ;

    SCOPE ( myset1 ) ;
      This = Measures.[Internet Sales Amount] ;
    END SCOPE ;

    CREATE SET myset2 AS
      IIf (
        IsError ( StrToMember ( "Measures.[Internet Sales Amount]" ) ),
        { Measures.[Internet Tax Amount] },
        { Measures.[Internet Sales Amount], Measures.[Internet Tax Amount] }
      ) ;

    SCOPE ( myset2 ) ;
      This = Measures.CurrentMember * 2 ;
    END SCOPE ;

     

    I did not know about this approach, so I tried it out. To be able to get good test results I used a very heavy query that hopefully nobody would ever try: a calculation over all sales data (millions of records) against all customers, all products and all time.  Unfortunately, the results were not very pleasing yet:

     

    1. Calculation with IIF/IsError checks in both the calculation part and the non empty part: ran in 50 seconds.

     

    2. Calculation with named set/scope approach: ran longer than 5 minutes, after that I stopped the query.

     

    So Chris and I emailed again, and I was wondering if the lack of NON_EMPTY_BEHAVIOR (NEB) could be the cause of the slow performance of the named set/scope approach. Chris said that since SSAS 2008 the usage of NEB is not necessary anymore, something I had heard before. I always had the idea that adding NEB did make a difference in some cases, so I never stopped using it. So I more or less merged Chris’s approach and mine, adding the NEB using an IsError() function, and I replaced the StrToMember check with a direct reference to the measure in the named sets, because in my opinion that part was not necessary:

    IsError ( StrToMember ( "Measures.[Internet Sales Amount]" ) ) became: IsError ( Measures.[Internet Sales Amount] ).

     

    The result was almost unbelievable: the query now took not 5 minutes, not 50 seconds, but only 2 seconds! So adding NEB can still make a huge difference sometimes. After finding out about this, I contacted Chris again and he was also very surprised by the result.

     

    I’ve created an MDX calculation template with some explanation added as comments. The big performance gain is achieved because the IsError() check doesn’t need to be executed every time the calculation is evaluated: it’s already executed once, when the static named set is created, and afterwards that value is simply reused. I think the same applies to the IsError() check in the NEB; it seems it’s only executed once and then reused. The result is that your MDX calculations are just as fast with security applied as without, something that was not possible for a long time for me and, I guess, everybody else too. Just use the template below as a standard for creating your calculations and benefit from this solution:

     

    --Calculation Template:
    ----------------------------------------

    ----------1: CHECKED CALCULATION: The secured "end product" used in other calculations----------
    CREATE MEMBER CURRENTCUBE.[Measures].[Quantity_CHECKED] AS
      NULL
      , VISIBLE = 0 ;

    ----------2: CHECK NAMED SET: Actual permission check performed here----------
    CREATE SET [Quantity_CHECK] AS
      IIf (
        IsError ( [Measures].[Quantity] )
        , { }
        , { [Measures].[Quantity_CHECKED] }
      ) ;

    ----------3: SCOPE: Assignment of either the measure or "nothing" to the CHECKED calculation (1)----------
    SCOPE ( [Quantity_CHECK] ) ;
      This = [Measures].[Quantity] ;
    END SCOPE ;

    ----------4: Second secured calculation created here----------
    CREATE MEMBER CURRENTCUBE.[Measures].[Price_CHECKED] AS
      NULL
      , VISIBLE = 0 ;

    CREATE SET [Price_CHECK] AS
      IIf (
        IsError ( [Measures].[Price] )
        , { }
        , { [Measures].[Price_CHECKED] }
      ) ;

    SCOPE ( [Price_CHECK] ) ;
      This = [Measures].[Price] ;
    END SCOPE ;

    ----------5: Calculation based on the secured calculations.
    ----------Non_Empty_Behavior set with IIf/IsError on the measure. Using calculations inside the NEB is not possible----------
    CREATE MEMBER CURRENTCUBE.[Measures].[Sales Amount] AS
      [Measures].[Quantity_CHECKED] * [Measures].[Price_CHECKED]
      , NON_EMPTY_BEHAVIOR =
      {
        IIf (
          IsError ( [Measures].[Price] )
          , NULL
          , { [Measures].[Price] }
        )
      }
      , VISIBLE = 1 ;
    -------------------------------------------

     

    Your calculations tab in SSAS will look like:
    clip_image003

     

     

    While investigating the inner workings of the first part of the script I added some comments to make it easy for myself to remember. I guess they are also useful for anyone who wants to know what’s really happening here:

    CREATE MEMBER CURRENTCUBE.[Measures].[Quantity_CHECKED] --> Calculation is initially NULL and will be filled based on user rights later on.
    AS NULL,
    VISIBLE = 0 ; --> Make it invisible, as this is a purely technical calculation that should not be used by end users. End users can use the measure this calculation is based on, if they have permission (set by dimension security on the Measures dimension).

    CREATE SET [Quantity_CHECK] AS --> Named set that performs the actual check for user rights. As this is a static named set, the check is executed only once, at initialization, and the result is reused afterwards.
    IIf ( IsError ( [Measures].[Quantity] ) --> IsError on the measure that needs to be checked raises an error for a user who has no rights on it, because for that user the measure simply does not exist.
    , { } --> If an error was raised by the IsError function, set the value of this set to nothing: {}.
    , { [Measures].[Quantity_CHECKED] } ) ; --> If no error was raised, the user has rights to access the measure; in that case set the value of the set to the CHECKED calculation created in the previous step.

    SCOPE ( [Quantity_CHECK] ) ; --> Scope over the CHECK set, which is either {} (nothing) or the CHECKED calculation, depending on the user's rights.
    This = [Measures].[Quantity] ; --> Assign the measure to This. If the user has rights, the scope covers the CHECKED calculation and the measure is passed through to it; otherwise the scope is {} and the assignment goes to nothing instead of to the CHECKED calculation.
    END SCOPE ; --> So this SCOPE either passes the measure to the CHECKED calculation or to nothing (the empty set {}).
