THE SQL Server Blog Spot on the Web


Kevin Kline

Getting Ahead of the Curve – Big Data

I have to confess that I'm incredibly excited about BigData. I haven't been this excited about new innovations in IT since relational databases first appeared on the scene early in my career.

But what is BigData? I can still feel the echoes of adrenaline from those early days, when I was hired to work on a NASA project that would involve over 100 MB of data. ONE HUNDRED MEGABYTES! Good grief, that was fantastically huge to us on the team. (That database was over 130 MB when I finally moved on to another project.) And remember - PC software was installed using 640 KB floppy disks at the time. In fact, my Oracle v5 instance required shuffling through about a dozen floppy disks to get the thing installed on a 286 IBM PC.

BigData today takes on an entirely new meaning as database sizes scale into the petabytes. But the emphasis is still the same today as it was back in the 1980s - turning data into actionable information. However, with BigData, we can achieve amazing new insight from this data and mine for tidbits that would never have seen the light of day with smaller data sets.

The two major themes to remember about big data are 1) the more data you have on a given domain, the more power you have, and 2) the better the analysis you can perform on the data, the more power you have. In fact, theme 2 might be the most important thing to consider because lots of data is meaningless unless you can extract knowledge from it. And that's where better analytical techniques come into play.

Here are some articles about Big Data that you might enjoy:

Let me know what you think.

Best regards,

-Kev
Follow me on Twitter at kekline
Published Thursday, July 14, 2011 10:41 AM by KKline


Comments

 

Geoff said:

I've been using Vertica for a couple of years now. I highly recommend it. I can't speak to the other products.

You use standard SQL, but it's column-oriented and runs in an MPP cluster. It was founded by database legend Michael Stonebraker and recently bought by HP.

I think the future of databases is this: specialized products. For data warehousing you should use Netezza, Vertica, Teradata or something like that. For instances that require a lot of transactions, there will be products for that (like Stonebraker's VoltDB startup). For everything in between, MySQL, Oracle, PostgreSQL, and SQL Server will be what's used.

I like SQL Server (hence, I'm reading this blog), but when I hear about huge SQL Server installations I wonder if they could be using a more specialized database. Just more tools in the tool belt. I would also add "when all you have is a hammer all the world is a nail."

What I would really like to know is whether SSIS supports these Big Data products. I've used Pentaho just fine, so that would be interesting to know.

July 14, 2011 4:33 PM
 

BI Paul said:

Hi Geoff,

Your statement "For data warehousing you should use Netezza, Vertica, Teradata or something like that"....

...not sure if you have heard, but Microsoft also has technology in the MPP space: SQL Server 2008 R2 Parallel Data Warehouse edition. Please see the following link for more information.

http://www.microsoft.com/sqlserver/2008/en/us/parallel-data-warehouse.aspx

July 15, 2011 1:35 AM
 

Geoff said:

That's probably a big improvement. MPP helps address a lot of scalability issues.

At least in the data warehousing space, and I'll use Vertica since that is what I have experience in, I would be skeptical (not cynical) that that would be better than a pure column-based solution. However, if you already have SQL Server experience, the performance difference might not matter. I guess it depends. As I alluded to, one size doesn't fit all.

I'm very curious as to what the performance differences would be on similar hardware configurations. Someone should do a study.

I'm also wondering whether, for a SQL Server shop, Sybase IQ would be a good option as a column-based database, since Sybase and SQL Server are cousins and should both use similar SQL syntax.

July 15, 2011 10:51 AM
 

KKline said:

Great comments, guys. I think the key idea, "one size does not fit all," will be our byword of the future.

I hadn't thought about Sybase IQ, Geoff.  That's something else I'll need to investigate.  =^)

-Kev

July 18, 2011 9:52 AM
 

Geoff said:

I should mention that Vertica uses PostgreSQL syntax and functions. So if you are familiar with that syntax, the learning curve might be easier. But SQL differences shouldn't be that big of a deal.

The question I have for SQL Server would be: how do MPP and the new columnstore perform for data warehousing?

July 18, 2011 10:03 AM


About KKline

Kevin Kline is a well-known database industry expert, author, and speaker. Kevin is a long-time Microsoft MVP and was one of the founders of PASS, www.sqlpass.org.
