
Jorg Klein

Microsoft Data & Analytics consultant and Microsoft Data Platform MVP from the Netherlands

Orchestrate Azure Data Factory pipelines and other Azure Data Platform management tasks using Azure Automation

Azure Data Factory (ADF) is a great SaaS solution to compose and orchestrate your Azure data services. It works fine for creating, scheduling and managing your data pipelines, but it has limitations that can make it hard to use in some scenarios. The two main ones are:

1. Some tasks that you regularly want to perform can’t be accomplished with ADF. A few examples are:

·  Start/Pause an Azure SQL Data Warehouse

·  Start/Pause an Azure Analysis Services instance

·  Process an Azure Analysis Services cube

·  Manage on-premises resources

2. Triggering (running) ADF pipelines on demand. For most people it is hard to switch from the well-known SQL Agent jobs - in which it’s really easy to trigger SSIS processes and execute other tasks - to scheduling ADF pipelines. You now have to create activity windows and define data slices that depend on the availability of datasets. Most people would like to be able to trigger an ADF pipeline on demand, or when some other task has finished successfully.

The tasks listed at point 1 - and almost everything else you can think of - can be done using Azure PowerShell with Azure Resource Manager. Microsoft does an awesome job here: every new SaaS solution has great ARM support from the start. To make use of PowerShell in Azure, and more importantly to automate your scripts and processes, Azure Automation comes into play. It makes it possible to run all your PowerShell scripts in Azure as a SaaS solution. It is an orchestrator in which you can execute all kinds of Data Platform related operations, both in Azure and on-premises via hybrid workers, enabling hybrid Data Platform orchestration scenarios.
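To give a flavour of point 1, below is a minimal sketch of pausing and resuming an Azure SQL Data Warehouse with the AzureRM.Sql cmdlets. The resource group, server and database names are placeholders for illustration only, not part of the setup used in this post.

# Minimal sketch: pause/resume an Azure SQL Data Warehouse (names below are placeholders)
$rg  = "AzureDataPlatformOrchestration" # Resource group name
$srv = "adpo-sqlserver"                 # Logical SQL Server name (placeholder)
$dwh = "adpo-dwh"                       # SQL Data Warehouse database name (placeholder)

# Pause: compute is released, storage and data are kept
Suspend-AzureRmSqlDatabase -ResourceGroupName $rg -ServerName $srv -DatabaseName $dwh

# Resume: bring the compute back online before loading or querying
Resume-AzureRmSqlDatabase -ResourceGroupName $rg -ServerName $srv -DatabaseName $dwh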

Triggering ADF pipelines on demand is a bit trickier. An “Execute now” command does not exist in PowerShell, which is understandable if you know how the scheduling and execution mechanism of ADF has been built. I don’t think this will change easily, unless some major changes are made to ADF. Fortunately, there is a workaround! If you have ever deployed a new ADF pipeline you might have noticed that after deployment, pipelines start immediately when they have an activity window with a date/time in the past. Because it is possible to update a pipeline’s activity window date/time using PowerShell, it is also possible to trigger a pipeline’s execution by changing its activity window date/time to some value in the past.

In this blog post I will show how you can trigger an ADF pipeline on a daily basis and monitor its execution using PowerShell code that runs in an Azure Automation runbook. Because executing other tasks (point 1) using PowerShell is easy, it becomes possible to orchestrate advanced Azure Data Platform tasks, with the execution of an ADF pipeline as just a part of the total process.


Azure Setup

In this example I’ve used the Azure setup shown below. The ADF pipeline contains one simple copy activity that copies a file from one blob storage container to another.

·  Resource Group

    o  Blob Storage

        §  Container: input

        §  Container: output

    o  Azure Data Factory

    o  Azure Automation

 

Azure Data Factory Setup

I created the Azure Data Factory pipeline with the Copy Data wizard:

[screenshot]

 


I configured the pipeline to “Run regularly on schedule” with a recurring pattern of “Daily”, “every 1 day” (see the blue rectangle in the screenshot below).

Choosing “Run once now” would set the pipeline property “PipelineMode” to “OneTime” and would disable the ADF scheduler. Initially this might seem to be the option you want, as we want to trigger the pipeline ourselves on demand, but unfortunately that configuration has some limitations:

·         The ADF diagram view does not show one-time pipelines. This makes it impossible to monitor the pipeline using the Monitor & Manage dashboard.

·         One-time pipelines can’t be updated. This would block us from updating the activity window properties to trigger the pipeline.

The “Start date time” is automatically set to yesterday with the current time. This triggers the pipeline to start running immediately after deployment; as explained, ADF automatically triggers pipelines with a start date/time in the past. In this case we want to trigger the execution ourselves, so set the start date to some date in the future (see the red rectangle in the screenshot below).

[screenshot]

For this pipeline I chose to create a simple example that copies a file from one blob storage container to another. Of course, you can design your pipeline any way you prefer, e.g. using the Copy Data wizard or Visual Studio. Just make sure you schedule it with a daily recurrence.
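If a pipeline was deployed with a start date/time in the past by accident, you can also push its active period into the future from a regular PowerShell session before wiring up Automation. A minimal sketch, using the same cmdlets the runbook below relies on (the resource group, data factory and pipeline names are the example names used in this post):

# Sketch: move a deployed pipeline's active period into the future so it does not run yet
$df = Get-AzureRmDataFactory -ResourceGroupName "AzureDataPlatformOrchestration" -Name "adpo-adf"
Set-AzureRmDataFactoryPipelineActivePeriod -DataFactory $df -PipelineName "CopyPipeline-cu6" -StartDateTime ([System.DateTime]::Today.AddYears(1)) -EndDateTime ([System.DateTime]::Today.AddYears(1).AddDays(1))
# Optionally pause it as well until you are ready to trigger it
Suspend-AzureRmDataFactoryPipeline -DataFactory $df -Name "CopyPipeline-cu6"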


Azure Automation Setup

The next step is to create a new Azure Automation account. Within that account we need to import the Azure Data Factory PowerShell cmdlets, because the standard set of PowerShell modules in Automation does not contain any ADF-related functions.

Navigate to Assets:

[screenshot]


Click Modules:
[screenshot]


 

Browse the gallery:
[screenshot]

 

Search for “Data Factory”, select AzureRM.DataFactories and click Import:
[screenshot]

AzureRM.DataFactories should now appear in your list of available modules:
[screenshot]
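If you prefer scripting over the portal blades, the module import can also be done with the AzureRM.Automation cmdlets. A sketch, assuming an Automation account named "adpo-automation"; the PowerShell Gallery package URI pattern is an assumption you should verify for your environment:

# Sketch: import AzureRM.DataFactories into the Automation account from PowerShell
# (the account name and the gallery package URI below are assumptions)
New-AzureRmAutomationModule -ResourceGroupName "AzureDataPlatformOrchestration" -AutomationAccountName "adpo-automation" -Name "AzureRM.DataFactories" -ContentLink "https://www.powershellgallery.com/api/v2/package/AzureRM.DataFactories"

# Check that the module appears and finishes provisioning
Get-AzureRmAutomationModule -ResourceGroupName "AzureDataPlatformOrchestration" -AutomationAccountName "adpo-automation" -Name "AzureRM.DataFactories"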

 

Create Credential

Now we need to create a Credential to be able to log in automatically and run our PowerShell script unattended from Azure Automation.

Navigate to Assets again and then click Credentials
[screenshot]

 

Click “Add a credential” and supply a user account that has the required permissions to access your Azure Data Factory. You can use the organizational account you use to log in to the Azure Portal; it might look like jorgk@yourorganizationalaccountname.com.
[screenshot]
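The credential asset can also be created from PowerShell instead of the portal. A minimal sketch; the Automation account name is a placeholder, and the credential name matches the one used in the runbook below:

# Sketch: create the Automation credential asset from PowerShell (account name is a placeholder)
$user = "jorgk@yourorganizationalaccountname.com"
$cred = Get-Credential -UserName $user -Message "Password for the Automation credential"
New-AzureRmAutomationCredential -ResourceGroupName "AzureDataPlatformOrchestration" -AutomationAccountName "adpo-automation" -Name "adpo-auto-cred" -Value $cred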

Automation Runbook
We are now ready to create the Automation Runbook that will trigger the Azure Data Factory pipeline by updating its active period to a date/time in the past.

The script performs the following steps:

1. Authenticate with the Automation Credential

2. Connect to the Azure Data Factory

3. Update the pipeline active period to yesterday

4. Unpause the pipeline; execution will begin

5. Monitor the pipeline execution

6. Pause the pipeline

Copy/paste the script below to a Windows PowerShell Script (.ps1) file and name it “TriggerAdfPipeline.ps1”.


 

# Variables; modify
$rgn = "AzureDataPlatformOrchestration" #Resource Group Name
$acn = "adpo-auto-cred"                 #Automation Credential Name
$dfn = "adpo-adf"                       #Data Factory Name
$pln = "CopyPipeline-cu6"               #PipeLine Name
$dsn = "OutputDataset-hgv"              #DataSet Name (output dataset of pipeline that needs to be produced)

# To test from PowerShell client, uncomment the 2 rows below and provide subscription ID
#Login-AzureRmAccount
#Set-AzureRMContext -SubscriptionId "00000000-0000-0000-0000-000000000000"

# Authenticate
# To test from PowerShell client, comment out the 2 rows below
$AzureCred = Get-AutomationPSCredential -Name $acn
Add-AzureRmAccount -Credential $AzureCred | Out-Null

# Get data factory object
$df = Get-AzureRmDataFactory -ResourceGroupName $rgn -Name $dfn
If ($df) {
    Write-Output "Connected to data factory $dfn in resource group $rgn."
}

# Create start/end DateTime (yesterday)
$sdt = [System.DateTime]::Today.AddDays(-1)    #Yesterday 12:00:00 AM
$edt = [System.DateTime]::Today.AddSeconds(-1) #Yesterday 11:59:59 PM

# Update active period to yesterday
$apr = Set-AzureRmDataFactoryPipelineActivePeriod -DataFactory $df -PipelineName $pln -StartDateTime $sdt -EndDateTime $edt
If ($apr) {
    Write-Output "Pipeline $pln of data factory $dfn updated with StartDateTime $sdt and EndDateTime $edt."
}

# Unpause pipeline
$rpl = Resume-AzureRmDataFactoryPipeline -DataFactory $df -Name $pln
If ($rpl) {
    Write-Output "Pipeline $pln resumed."
}

# Create arrays that hold all possible data factory slice states
$failedStates   = "Failed Validation", "Timed Out", "Skip", "Failed"
$pendingStates  = "Retry Validation", "PendingValidation", "Retry", "InProgress", "PendingExecution"
$finishedStates = "Ready"

# Wait while data factory slice is in pending state
While (Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Where -Property State -In $pendingStates) {
    Write-Output "Slice status is:"
    Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Select -Property State
    Write-Output "Wait 15 seconds"
    Start-Sleep 15
}

# Since data factory slice is not pending (anymore), it is either failed or finished
If (Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Where -Property State -In $failedStates) {
    Write-Output "Slice failed."
    Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Select -Property State
} ElseIf (Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Where -Property State -In $finishedStates) {
    Write-Output "Slice finished."
} Else {
    Write-Output "No State found?"
    Get-AzureRmDataFactorySlice -DataFactory $df -DatasetName $dsn -StartDateTime $sdt -EndDateTime $edt | Select -Property State
}

# Pause pipeline
$spr = Suspend-AzureRmDataFactoryPipeline -DataFactory $df -Name $pln
If ($spr) {
    Write-Output "Pipeline $pln paused."
}

 


 

Navigate to your Azure Automation account in the Azure Portal and click “Runbooks”:

[screenshot]

 

Click “Add a runbook”:

[screenshot]

 

Now select the TriggerAdfPipeline.ps1 file to import the PowerShell script as a Runbook:

[screenshot]
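Importing the script can be automated too. A sketch with the AzureRM.Automation cmdlets; the Automation account name is a placeholder, and the runbook name is taken from the file name:

# Sketch: import TriggerAdfPipeline.ps1 as a PowerShell runbook (account name is a placeholder)
Import-AzureRmAutomationRunbook -ResourceGroupName "AzureDataPlatformOrchestration" -AutomationAccountName "adpo-automation" -Path ".\TriggerAdfPipeline.ps1" -Type PowerShell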

 

Runbook TriggerAdfPipeline is created. Click it to open it:

[screenshot]

 

A Runbook must be published before you are able to start or schedule it. Click Edit:

[screenshot]

 

Before publishing, test the Runbook. Click the Test pane button and then click Start:

[screenshot]

 

The pipeline slice ran successfully:

[screenshot]

 

Now publish the Runbook:

[screenshot]

 

You can now create webhooks, schedule and monitor your Runbook jobs:

[screenshot]
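As a sketch of what that looks like in PowerShell (the Automation account, schedule and webhook names are placeholders), the runbook can be started on demand, attached to a daily schedule, or exposed through a webhook:

# Sketch: start the runbook on demand, attach a daily schedule and create a webhook
# (account, resource group, schedule and webhook names are placeholders)
$rg = "AzureDataPlatformOrchestration"
$aa = "adpo-automation"

# Run the runbook right now
Start-AzureRmAutomationRunbook -ResourceGroupName $rg -AutomationAccountName $aa -Name "TriggerAdfPipeline"

# Run it every day at 06:00, starting tomorrow
New-AzureRmAutomationSchedule -ResourceGroupName $rg -AutomationAccountName $aa -Name "Daily-0600" -StartTime (Get-Date "06:00").AddDays(1) -DayInterval 1
Register-AzureRmAutomationScheduledRunbook -ResourceGroupName $rg -AutomationAccountName $aa -RunbookName "TriggerAdfPipeline" -ScheduleName "Daily-0600"

# Or expose it through a webhook that external tools can call
$webhook = New-AzureRmAutomationWebhook -ResourceGroupName $rg -AutomationAccountName $aa -RunbookName "TriggerAdfPipeline" -Name "TriggerAdfPipelineWebhook" -IsEnabled $true -ExpiryTime (Get-Date).AddYears(1) -Force
$webhook.WebhookURI # Save this value; it is only returned at creation time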

 

 

Extending the script with additional tasks

It’s easy to perform other Azure Data Platform related tasks using PowerShell and Automation Runbooks. Because the script I created waits for the ADF pipeline to finish, you can easily execute other tasks before or after the pipeline’s execution. The following examples will most likely be useful; a short sketch of the pause/resume pattern follows the list:

·  Pause/Start Azure SQL Data Warehouse

    o  Documentation

    o  PowerShell Gallery

·  Pause/Start Azure Analysis Services

    o  PowerShell Gallery

·  Process Azure Analysis Services

    o  Under investigation, coming soon…
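The pause/resume pattern is the same for all of these. As a minimal sketch for Azure Analysis Services (the server name is a placeholder, and the AzureRM.AnalysisServices module needs to be imported into the Automation account first):

# Sketch: resume an Azure Analysis Services server, run the pipeline, then pause it again
# (resource group and server name are placeholders)
$rg  = "AzureDataPlatformOrchestration"
$aas = "adpoaas"

# Start (resume) the server before processing
Resume-AzureRmAnalysisServicesServer -ResourceGroupName $rg -Name $aas

# ... trigger and monitor the ADF pipeline here, as in TriggerAdfPipeline.ps1 ...

# Pause the server again when everything has finished
Suspend-AzureRmAnalysisServicesServer -ResourceGroupName $rg -Name $aas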

 

Published Monday, November 14, 2016 12:18 PM by jorg


Comments

 

AndrewvR said:

That is really helpful, thank you.  I am aware of the Automation piece, but had not looked at it and so had not brought it into my daily processes/workflows.  But now I will!

November 18, 2016 2:31 AM
 

Valter said:

Hi ..

thanks for this great post..

Unfortunately I think this method only works when calling the pipeline for the first time; the "second time", setting the active period to "yesterday" does not force the pipeline to run again. The period has to be set to a date before the date of the last successful pipeline run (if the last forced run was yesterday, to re-run it again I had to set the active period to 2 days ago).

Setting the active date period in the future at the end of the PowerShell script doesn't solve the problem either.

Regards

January 26, 2017 8:55 AM
 

Man said:

There is a workaround to make it work for the "second time", but the data factory can then only be triggered once every 15 minutes:

1. The ADF pipeline schedule should be "every 15 minutes" instead of daily

2. Update “TriggerAdfPipeline.ps1” to make it work for the "second time", changing:

$sdt = [System.DateTime]::Today.AddDays(-1) #Yesterday 12:00:00 AM
$edt = [System.DateTime]::Today.AddSeconds(-1) #Yesterday 11:59:59 PM

to

$sdt = Get-Date
$sdt = $sdt.AddMinutes(-2)
$edt = Get-Date
$edt = $edt.AddMinutes(-1)

February 2, 2017 12:35 AM
 

Sri said:

Super.

February 24, 2017 10:02 PM


About jorg

Jorg Klein, Microsoft Data & Analytics consultant from the Netherlands.