THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Data Explorer walkthrough – Parsing a Twitter list

Yesterday the public availability of Data Explorer, a new data mashup tool from the SQL Server team, was announced at Announcing the Labs release of Microsoft Codename “Data Explorer”. Upon seeing the first public demos of Data Explorer at SQL Pass 2011 I published a blog post Thoughts on Data Explorer which must have caught someone’s attention because soon after I was lucky enough to be invited onto an early preview of Data Explorer, hence I have spent the past few weeks familiarising myself with it. In this blog post I am going to demonstrate how one can use Data Explorer to consume and parse a Twitter list.

I have set up a list specifically for this demo and suitably it is a list of tweeters that tweet about Data Explorer – you can view the list at http://twitter.com/#!/list/jamiet/data-explorer. Note that some of the screenshots in this blog post were taken prior to the public release and many of them have been altered slightly since then; with that in mind, here we go.

First, browse to https://dataexplorer.sqlazurelabs.com/ and log in

When logged in select New to create a new mashup

image

Give your mashup a suitable name

image

You will be shown some options for consuming a source of data. Click on Formula

image

We’re going to be good web citizens and use JSON rather than XML to return data from our list. The URI for our Twitter API call is https://api.twitter.com/1/lists/statuses.json?slug=data-explorer&owner_screen_name=jamiet, note how I have specified the list owner (me) and the name of the list (what they call the slug) “data-explorer” as query parameters. If you go to that URL in your browser then you will be prompted to save a file containing the returned JSON document which, if all you want to do is see the document, isn’t very useful. In debugging my mashups I have found a service called JSON Formatter to be invaluable because it allows us to see the contents of a JSON document by supplying the URI of that document as a parameter like so: http://jsonformatter.curiousconcept.com/#https://api.twitter.com/1/lists/statuses.json?slug=data-explorer&owner_screen_name=jamiet. It might be useful to keep that site open in a separate window as you attempt to build the mashup below.

I’ve digressed a little, let’s get back to our mashup. We’re going to use a function called Web.Contents() to consume the contents of the Twitter API call and pass the results into another function, Json.Document(), which parses the JSON document for us. The full formula is:

= Json.Document(Web.Contents(“https://api.twitter.com/1/lists/statuses.json?slug=data-explorer&owner_screen_name=jamiet”))

image

When you type in that formula and simply hit enter you’re probably going to be faced with this screen:

image

Its asking you how you want to authenticate with the Twitter API. Calls to the https://api.twitter.com/1/lists/statuses.json resource don’t require authentication so anonymous access is fine, just hit continue. When you do you will see something like this:

image

The icon

image

essentially indicates a dataset, so each record of these results is in itself another dataset. We’ll come onto how we further parse all of this later on but before we do we should clean up our existing formula so that we’re not hardcoding the values “data-explorer” and “jamiet”.

The Web.Contents() function possesses the ability to specify named parameters rather than including them in the full URL. Change the formula to:

= Json.Document(Web.Contents("https://api.twitter.com/1/lists/statuses.json", [Query = [slug="data-explorer", owner_screen_name="jamiet"] ])) :

image

That will return the same result as before but now we’ve broken out the query parameters {slug, owner_screen_name} into parameters of Web.Contents(). That’s kinda nice but they’re still hardcoded; instead what we want to do is turn the whole formula into a callable function, we do that by specifying a function signature and including the parameters of the signature in the formula like so:

= (slug,owner_screen_name) => Json.Document(Web.Contents("https://api.twitter.com/1/lists/statuses.json", [Query = [slug=slug, owner_screen_name=owner_screen_name] ]))

image

Let’s give our new function a more meaningful name by right-clicking on the resource name which is currently set as “Custom1” and renaming it as “GetTwitterList”:

image

image

We have now defined a new function within our mashup called GetTwitterList(slug, owner_screen_name) that we can call as if it were a built-in function.

image

Let’s create a new resource as a formula that uses our new custom function and pass it some parameter values:

= GetTwitterList("data-explorer", "jamiet")

image

We still have the same results but now via a nice neat function that abstracts away the complexity of Json.Document( Web.Contents() ).

As stated earlier each of the records is in itself a dataset each of which, in this case, represents lots of information about a single tweet. We can go a long way to parsing out the information using a function called IntoTable() that takes a dataset and converts it into a table of values:

image

Here is the result of applying IntoTable() to the results of GetTwitterlist():

image

This is much more useful, we can now see lots of information about each tweet however notice that information about the user who wrote the tweet is wrapped up in yet another nested dataset called “user”.

All the time note how whatever data we are seeing and whatever we do to that data via the graphical UIs is always reflected in the formula bar; in the screenshot immediately above notice that we are selecting the “user” and “text” columns (the checkbox for “user” is off the screen but is checked).

We can now parse out the user’s screen_name using a different function – AddColumn(). AddColumn() taken an input and allows us to define a new column (in this case called “user-screen_name”) and specify an expression for that column based on the input. A picture speaks a thousand words so:

= Table.AddColumn(intoTable, "user_screen_name", each [user][screen_name])

image

There we have our new column, user_screen_name, containing the name of the tweeter that tweeted the tweet. At this point let’s take a look at the raw JSON to see where this got parsed out from:

image

Notice that the screen_name, UserEd_, is embedded 3 levels deep within the hierarchical JSON document.

We’re almost there now. The final step is to use the function SelectColumns() to select the subset of columns that we are interested in::

= Table.SelectColumns(InsertedCustom,{"text", "user_screen_name"})

image

Which gives us our final result:

image

At this point hit the Save button:

image

OK, so we have a mashup that pulls some data out of twitter, parses it and then….well…nothing! It doesn’t actually do anything with that data.  We have to publish the mashup so that it can be consumed and we do that by heading back to the home page (which is referred to as “My Workspace”) by clicking the My Workspace button near the top of the page:

image

Back in My Workspace you can select your newly created mashup (by clicking on it) and options Preview, Snapshot & Publish appear:

image

We’ll ignore Preview and Snapshot for now, hit the Publish button instead at which point we are prompted for a name that we will publish the mashup as:

image

Hitting Publish will do the necessary and make our data feed available at a public URI:

image

Head to that URL (https://ws41451459.dataexplorer.sqlazurelabs.com/Published/TwitterListDemo) and here’s what you see:

image

You can download the mashup output as a CSV file or an Excel workbook. You can also download the whole mashup so you can edit it as you see fit and, most importantly, you can access the output of the mashup via an OData feed at https://ws41451459.dataexplorer.sqlazurelabs.com/Published/TwitterListDemo/Feed/jamiet-dataexplorer-text_user_screen_name

We have used Data Explorer’s JSON parsing and dataset navigation abilities to pull out the data that we are interested in and present it in a neat rectangular data structure that we are familiar with. Moreover we have done it without installing any software and we have made that data accessible via an open protocol; that’s pretty powerful and, in my wholly worthless opinion, very cool indeed.

Have fun playing with Data Explorer. Feel free to download my Twitter List Demo mashup and mess about with it to your heart’s content.

@jamiet

Published Wednesday, December 07, 2011 2:40 PM by jamiet
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Chris Webb said:

I see you've been busy too...

December 7, 2011 9:27 AM
 

SSIS Junkie said:

Earlier today I posted Data Explorer walkthrough – Parsing a Twitter list in which I explained that I

December 7, 2011 3:56 PM
 

Tim said:

Thanks Jamie.

To me it looks like we have a few pieces if excel extracted with the twist of some new data source and function call along with a place to host the feed?  Am I missing something?

Also, you say most importantly OData. Are you being scarcastic or have change your mind about the technology?

December 10, 2011 10:16 AM
 

jamiet said:

Hi Tim,

When I said:

"most importantly, you can access the output of the mashup via an OData feed"

I was implying that the important thing was that the data was accessible, not that it was accessible via OData per se. That being said, I have nothing against OData and hence I'm wondering what you meant by:

"Are you being scarcastic or have change your mind about the technology?"

What did you presume that my opinion was in regard to OData?

cheers

JT

December 11, 2011 7:04 PM
 

SSIS Junkie said:

A short recap At the PASS Summit 2011 a project that existed as part of the now-defunct SQL Azure Labs

February 27, 2013 12:47 PM
 

Joel Chaudy said:

Hi,

If you want I can use the service www.twexlist.com to export Twitter list (private or public), followers or followings. Cheers.

April 15, 2013 11:56 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement