THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul Nielsen

www.SQLServerBible.com

15 shopping days left at ALFKI

I'm kicking off the next round of the SQL Server Bible series for Wiley. For the past two versions I used my own custom sample databases and skipped the MSFT sample databases (even though I feel that ALFKI's a friend and I'd love to shop there someday.)

I'm also wrapping up a contract where I helped design and develop very heavy transaction, large database. It drove home for me again the truth that testing on your notebook, or even with a 9Gb test database on a dev server, isn't the same as production with 19K very busy connections and a database that's growing Gbs per hour.

So for this next book, I want a more realistic database, one that won't possibly all fit into memory. I'm playing with data downloaded from the US Census Bureau and have pushed it to about 27Gb so far, but thought, why not ask the SQLBlog gang, what would be a great sample database? What's the criteria?

Here's what I have so far as the Great Sample Database goals:

  • Just a handful of tables, easy to understand and query
  • Small sample set of data for demos and unit testing
  • Data generator to push the database to several Gb with a realistic distribution of data.
  • Full stored procedure API data abstraction layer, so the database physical design can be refactored and then retested. E.g. to test different patterns for optional data, heaps vs. clustered, surrogate vs. natural keys.
  • Includes lat and long for spatial data
  • Includes mix of data types
  • Ability to demo various types of joins and SQL techniques.
  • Easy to download and install

So I ask you, what else might you want in a great sample database?

Published Monday, December 10, 2007 12:52 PM by Paul Nielsen
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Denis Gobo said:

Why not 3 DBs?

SmallSample DB

MediumSample DB

LargeSample DB

or is that overkill?

December 10, 2007 3:25 PM
 

Adam Machanic said:

Which US Census download are you referring to?  TIGER?

December 10, 2007 3:28 PM
 

Paul Nielsen said:

Hi Denis, do you mean small, medium, large in the number of table and complexity or the amount of data?

-Paul

December 10, 2007 3:28 PM
 

Paul Nielsen said:

Hi Adam,

I found the 1990 Census data easy to download and use, so that's what I'm playing with, but if you have a better idea, I'm open.

-Paul

December 10, 2007 3:30 PM
 

Adam Machanic said:

Paul,

I'm not familiar with the Census data itself, and didn't even know you could download it.  There is a product called TIGER that you can download which contains line segments that describe roads, large buildings, bodies of water, and other formations.  I've worked with this data and it is a tremendous pain so I was surprised that you might be considering it :-)

December 10, 2007 3:39 PM
 

Denis Gobo said:

Paul,  I would say size, you can always use a handful of tables. You also don't need to populate all the tables with the same number of rows.....remember not everyone has a TB on their laptop

December 10, 2007 3:40 PM
 

Paul Nielsen said:

Denis,

I agree, that's why the sample database has to have a fixed loadable set of test data for unit testing and demo code that always generates the same answer to a query.

And a Populate script can pump data into the API so the reader/developer can push the database to whatever size they want for volumn testing, or run multiple copies of the script to test transaction locking and blocking.

-Paul

December 10, 2007 3:56 PM

Leave a Comment

(required) 
(required) 
Submit

About Paul Nielsen

Paul Nielsen believes SQL is the romance language of data. As such he’s a hands-on database developer, Microsoft SQL Server MVP, trainer, and author of SQL Server Bible series (Wiley). As a data architect, he developed the concepts of Smart Database Design and Nordic – an open source O/R dbms for SQL Server. He lives in Colorado Springs.

This Blog

Syndication

News

news item test
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement