I'm kicking off the next round of the SQL Server Bible series for Wiley. For the past two versions I used my own custom sample databases and skipped the MSFT sample databases (even though I feel ALFKI's a friend and I'd love to shop there someday).
I'm also wrapping up a contract where I helped design and develop a large, very transaction-heavy database. It drove home for me again the truth that testing on your notebook, or even with a 9Gb test database on a dev server, isn't the same as production with 19K very busy connections and a database growing by Gbs per hour.
So for this next book I want a more realistic database, one that can't possibly fit entirely in memory. I'm playing with data downloaded from the US Census Bureau and have pushed it to about 27Gb so far, but thought: why not ask the SQLBlog gang what would make a great sample database? What are the criteria?
Here's what I have so far as the Great Sample Database goals:
- Just a handful of tables, easy to understand and query
- A small sample set of data for demos and unit testing
- A data generator to push the database to several Gb with a realistic distribution of data
- A full stored-procedure API as a data abstraction layer, so the database's physical design can be refactored and then retested, e.g. to test different patterns for optional data, heaps vs. clustered indexes, and surrogate vs. natural keys
- Latitude and longitude columns for spatial data
- A mix of data types
- The ability to demo various types of joins and SQL techniques
- Easy to download and install
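On the data-generator point, the key word is "realistic": production data is rarely uniform — a few hot customers, products, or zip codes account for most of the rows, which is exactly what exposes indexing and statistics problems that uniform test data hides. Here's a minimal sketch of the idea in Python (table shape, column names, and distribution parameters are all hypothetical, just to show the skew technique):

```python
import random
import datetime

def generate_orders(n_orders, n_customers, seed=42):
    """Generate synthetic order rows with production-like skew:
    a few 'hot' customers place most of the orders (Zipf-style
    1/rank weights), and order amounts follow a long-tailed
    lognormal distribution rather than a uniform one."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    # Zipf-style weights: customer k gets weight 1/k, so the top few
    # customers dominate, as they tend to in real transactional data.
    weights = [1.0 / k for k in range(1, n_customers + 1)]
    start = datetime.date(2008, 1, 1)
    rows = []
    for order_id in range(1, n_orders + 1):
        customer_id = rng.choices(range(1, n_customers + 1), weights=weights)[0]
        amount = round(rng.lognormvariate(3.5, 1.0), 2)  # long right tail
        order_date = start + datetime.timedelta(days=rng.randrange(365))
        rows.append((order_id, customer_id, order_date, amount))
    return rows

rows = generate_orders(n_orders=1000, n_customers=100)
```

The rows could then be bulk-loaded (BCP, BULK INSERT, or generated INSERT scripts) and the generator dialed up to push the database to several Gb. The same approach works for geographic clustering of the lat/long data.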
So I ask you, what else might you want in a great sample database?