THE SQL Server Blog Spot on the Web

Jamie Thomson

This is the blog of Jamie Thomson, a data mangler in London working for Dunnhumby

Iterate over a collection in Cronacle

In my new life as a Hadoop data monkey I have been using a tool called Redwood Cronacle as a workflow/scheduler engine. One thing that has shocked me after years of working in the Microsoft ecosystem is the utter dearth of useful community content around Cronacle. There simply isn’t anything out there about it: nobody blogs about Cronacle (the top link when googling for “cronacle blog” is http://www.chroniclebooks.com/blog/), there are no forums, and there are precisely zero questions (at the time of writing) on stackoverflow tagged Cronacle… there’s just nothing. It’s almost as if nobody else out there is using it, and that’s infuriating when you’re trying to learn it.

In a small effort to change this situation I’ve already posted one Cronacle-related blog post, “Implementing a build and deploy pipeline for Cronacle”, and in this one I’m going to cover a technique that I think is intrinsic to any workflow engine: iterating over a collection and carrying out some operation on each iterated value (you might call it a cursor). There’s a wealth of blog posts on how to do this using SSIS’s ForEach Loop container because it’s a very common requirement (here is one I wrote 10 years ago) but I couldn’t find one pertaining to Cronacle. Here we go…


We have identified a need to be able to iterate over a dataset within Cronacle and carry out some operation (e.g. execute a job) for each iterated value. This article explains one technique to do it.

My method for doing this has two distinct steps:

  1. Build a dataset and return that dataset to Cronacle
  2. Iterate over the dataset

There are many ways to build a dataset (in the example herein I execute a query on Hadoop using Impala), hence the second of these two steps is the real meat of this article. That second step is, however, pointless without the first, hence both steps will be explained in detail.

Here's my Cronacle Job Chain that I built to demo this:

image 

Step 1 - Build the collection

To emphasize a point made above, I could have used one of many techniques to build a collection to be iterated over; in this case I issued an Impala query:

image

N.B. The beeline argument --showheader has no effect when used with the -e option (it only has an effect when a file is specified using the -f option). This is an important point because the column header ends up in the collection, as you will see below.

When this JobDefinition gets executed we can observe that the collection is assigned to the outParam parameter:

image

outParam is of type String, not a String array. The values therein are delimited by a newline character ("\n").

Step 2 - Iterate over the collection

The output parameter from the first job in the Job Chain is fed into an input parameter of the second job in the Job Chain:

image

From there we write Redwood Script (basically Java code) to split the string into an array and then execute an arbitrary Job Definition, "JD_EchoInParameterValue_jamie_test", for each iterated value.

image

Thus, the code shown above is the important part here. It takes a collection that has been crowbarred into a single string, splits it by \n into a string array, then passes each element of that array to another job as a parameter. I’ve made the code available in a gist: https://gist.github.com/jamiekt/06a905ec9f8119416b4f
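For readers who cannot see the screenshot, the shape of that logic can be sketched in plain Java. This is not the code from the gist: the class name, the method name and the submitJob callback are all hypothetical, and the callback merely stands in for the Redwood Script calls that would submit the child Job Definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class IterateOverCollection {
    // Splits the newline-delimited value and hands each element to submitJob.
    // submitJob is a hypothetical stand-in for whatever submits the child job;
    // the real Redwood Script API calls are deliberately not shown here.
    static int forEachValue(String inParam, Consumer<String> submitJob) {
        String[] values = inParam.split("\n");
        for (String value : values) {
            submitJob.accept(value);
        }
        return values.length; // number of child jobs that would be submitted
    }

    public static void main(String[] args) {
        // Collect the values instead of submitting real jobs
        List<String> submitted = new ArrayList<>();
        int count = forEachValue("col\n1\n2", submitted::add);
        System.out.println(count + " jobs submitted with parameters " + submitted);
    }
}
```

Passing the submission step in as a callback keeps the split-and-loop logic testable outside Cronacle; inside a real job the callback body would be replaced by the scheduler's own job submission code.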

When executed, observe that "JD_EchoInParameterValue_jamie_test" gets called three times, once for each value in the array ("col", "1", "2").

image

image
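Note that the header value ("col") is passed to the child job as if it were data, because --showheader has no effect with the -e option. If that is not what you want, one option is to drop the first element before iterating. A minimal sketch in plain Java, assuming the first line is always the column header (the class and method names are hypothetical and are not part of the gist):

```java
import java.util.Arrays;

class SkipHeaderRow {
    // Returns the data rows only, assuming the first line is always the column header.
    static String[] dataRows(String outParam) {
        String[] lines = outParam.split("\n");
        if (lines.length <= 1) {
            return new String[0]; // header only (or empty input): nothing to iterate over
        }
        return Arrays.copyOfRange(lines, 1, lines.length);
    }

    public static void main(String[] args) {
        // Same shape as the value held in outParam in this article
        System.out.println(Arrays.toString(dataRows("col\n1\n2")));
    }
}
```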

Summary

I’m still a Cronacle beginner so it’s quite likely that there is an easier way to do this. The method I’ve described here feels like a bit of a hack; however, that’s probably more down to my extensive experience with SSIS, which has built-in support for doing this (i.e. the For Each Loop container).

Comments are welcome.

You can read all of my blog posts relating to Cronacle at http://sqlblog.com/blogs/jamie_thomson/archive/tags/Cronacle/default.aspx.

@Jamiet 

* SSIS is the tool that I used to use to do this sort of stuff in the Microsoft ecosystem

Published Friday, May 15, 2015 12:51 PM by jamiet

Comments

 

curtis said:

it works! thanks. i totally agree with your first paragraph. it is impossible to find anything useful on cronacle - i must have used just the right keyword to get this.

Thanks a bunch. This post was perfect. Exactly what i was trying to solve.

September 14, 2015 8:53 PM
 

Gerben said:

Hi Jamie,

Very interesting. Maybe you can benefit from the usage of Array parameters here as well to pass the dataset around. That will strip out the need for splitting the data and you can just iterate over the array in the next job.

The most active area on Cronacle can be found in the SAP community: http://scn.sap.com/community/bpa-by-redwood

Regards Gerben

November 6, 2015 3:34 AM
 

jamiet said:

Hi Gerben,

Array parameters. They sound interesting, not least because I'm currently working on something that requires a list of things to be passed to a Job Chain.

Thanks for the link. How come the best place for Cronacle content is a SAP site? Is SAP BPA simply a re-badged Cronacle?

Thanks

jamie

November 6, 2015 3:55 AM
 

Gerben said:

Hi Jamie,

Yes indeed, SAP BPA is the same code as Cronacle, but sold by SAP.

Since parameter values are unlimited now you can pass anything along. That makes the usage of Array parameters so useful as you can use native Arrays in your code without having to think about your item separators and decoding them

Regards Gerben

January 18, 2016 8:23 AM
