THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

POST rows to HBase REST API on HDInsight using Powershell

I’ve been very quiet on the blogging front of late. There are a few reasons for that but one of the main ones is that I’ve spent the past three months on a new gig immersing myself in Hadoop, primarily in Microsoft’s Hadoop offering called HDInsight. I’ve got a tonne of learnings that I want to share at some point but in this blog post I’ll start with a handy little script that I put together yesterday.

I’m using a tool in the Hadoop ecosystem called HBase which was made available on HDInsight in preview form about a month ago. HBase is a NoSQL solution intended to provide very very fast access to data and my colleagues and I think it might be well suited for a problem we’re currently architecting a solution for. In order to evaluate HBase we wanted to shove lots of meaningless data into it and in the world of HDInsight the means of communicating with your HDInsight cluster is Powershell. Hence I’ve written a Powershell script that will use HBase’s REST API to create a table and insert random data into it. Likely if you’ve googled this post then you’re already familiar with Hadoop, HDInsight, REST, Powershell, HBase, column families, cells, rowkeys and other associated jargon so I won’t cover any of those, what is important is the format of the XML payload that has to get POSTed/PUTted up to the REST API. That payload looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<CellSet>
  <Row key="myrowkey">
    <Cell column="columnfamily:column1">somevalue</Cell>
    <Cell column="columnfamily:column2">anothervalue</Cell>
  </Row>
</CellSet>

The payload can contain as many cells as you like. When the payload gets POSTed/PUTted the values therein need to be base64 encoded but don’t worry, the script I’m sharing herein takes care of all that for you. The script will also create the table for you. The data that gets inserted is totally meaningless and is also identical for each row, modifying the script to insert something meaningful is an exercise for the reader.

Another nicety of this script is that it uses Invoke-RestMethod which is built into Powershell 4. You don’t need to install other Powershell modules, nothing Azure specific. If you have Powershell 4 you’re good to go!

Embedding code on this blog site is ugly so I’ve made it available on my OneDrive: CreateHBaseTableAndPopulateWithData.ps1 Screenshot below gives you an idea of what’s going on here.

Hope this helps!

@Jamiet

UPDATE. I’ve posted a newer script CreateHBaseTableAndPopulateWithDataQuickly.ps1 which loads data in much quicker. This one sends multiple rows in each POST and hence I was able to insert 13.97m rows in 3 hours and 37 minutes which, given latency to the datacentre and that this was over a RESTful API, isn’t too bad. The previous version of the script did singleton inserts and hence would have taken weeks to insert that much data.

The number of POSTs and the number of rows in each POST are configurable.

SNAGHTML14e9116b

Published Friday, July 04, 2014 9:58 AM by jamiet
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Shanmuga said:

Thanks for the post!

July 28, 2014 10:21 PM
 

Rahul@RahulSharma.in said:

I was wondering if MS has emulator available for HBase yet? I saw your question on MSDN but it looks there is nothing happening as of now.

It is very costly to run the HBase cluster on azure for development purpose. How do you get HBase on windows, as i have no experience with Linux so looks like a fought ride here...cygian or whatever they call that thing is same. Do you know if there is pre-built hyper-v for HBase that I can just use.

Not sure how replies work on this blog but if you can, my email is rahul@rahulsharma.in

September 8, 2014 8:28 PM
 

jamiet said:

Hi Rahul,

I was told some time ago that the emulator would soon include HBase but the most recent release of the emulator (28th August 2014) does *not* include HBase which I myself am disappointed about. In the meantime the recommendation from Microsoft is to use the Hortonworks Sandbox (http://hortonworks.com/products/hortonworks-sandbox/?utm_source=bing&utm_medium=ppc&utm_campaign=Hortonworks_Sandbox_Desktops&utm_content=textad&utm_keyword=hortonworks%20sandbox). I have successfully installed it in the past and its not a huge undertaking.

Regards

Jamie

September 9, 2014 2:57 AM
 

Rahul@RahulSharma.in said:

Thanks Jamie for getting back. I hope MS includes HBase in the emulator soon but it looks Hortonworks will work for now.

September 9, 2014 9:57 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement