|
|
|
|
-
I skipped last year making blog resolutions, but this year I need to get myself back on the straight and narrow and encourage myself to do a few things. 1. Finish my book quickly and efficiently – Well, duh, I supposed, but the quickly and efficiently is the biggest important part. I have no idea when SQL Server 2012 will be released, but with RC0 having been publicly released, it isn’t going to be SQL Server 2013 now is it? 2. Blog regularly – Note I didn’t exactly say more, but I do want to be regular. Perhaps a substantive blog (more than telling people about some event) once a month here on SQL Blog while working on a book, and 2 or 3 times otherwise. On simple-talk my goal is to average 2 What Counts For A DBA blog entries. 2.1 Blog about my other (computer) love occasionally – I love gadgets (I now carry 6 music/video players, including 1 Windows Phone 7, 4 Zunes, my new Android player, and a Nintendo 3DS) and software (user interfaces are my favorite pet peeve), and I occasionally I feel like talking about them. 3. Do 1 or 2 new presentations, and reduce doing the ones on database design. I love talking about database design, and I will still put my main db design session in for conferences, but I have 3 new ideas for sessions that are about 50% code and 50% design (for example, a session on triggers, which I want to show not only how they can be used, but why they should be used (or honestly not be used, which is a far larger list indeed). 4. Develop and present a DB design a variable length (1-3 day?) seminar. I am pretty happy how the book is organized, based on how I forced myself to work on it this time. I had a bunch of ideas how I wanted to change things us and shorten the book, but during the process, I realized the natural progression of how it ought to be done. I feel like it translates well to a class, even a bit more than the pre-con I have written that I did for the Orlando SQL Saturday (and which I will be happy to do for any SQL Saturday for any size group (I don’t care so much about making money, as long as my expenses are covered after the room is paid for…though I wouldn’t mind making a buck :) 5. Get more involved with our local group. I attend, I help out, I did some work with trying to SQL Rally in Nashville, but I think I could do more to help out there. It helps that I work with Christine Leo, Kevin Kline, and Joe Webb, all super dynamic folks who kick butt. In my personal life, I have to make a number of changes that are probably apparent to anyone who has met me or seen a picture of me… That will probably be the hardest of them all, though possibly the most important. Wow, looking at the list I realize why I stopped making resolutions… Maybe I should just resolve to go to Disney World more? Well, that would be way too easy!
|
-
So it is Tuesday night, just a few days until my presentation entitled What Counts For A DBA and I am still not completely sure exactly what is going to go on. In fact, I don’t exactly plan to know what is going on until the presentation is over. On paper it seems like a simple idea. I am going to use 9 of the topics I have posted on my simple-talk blog about What Counts For a DBA (http://www.simple-talk.com/community/blogs/drsql/), and write them on my spinning wheel that looks like this:  The extra three spaces will be prize slots, which will (the first time they are landed on, earn the spinner a book: I will bring a copy of MVP Deep Dives 2, a copy of my relational design book, and a copy of Tom Larock’s DBA Survivor book too. In these thee spots I will have a Joker that lets the spinner choose their own topic, and two envelope spots that will let them choose an envelope that contains one of my queued up topics that I will ask the spinner to comment on, as well as anyone else in attendance. Any repeat topics will merit an envelope open as well. Will it work? I don’t have a clue. It will take me really stepping up my looseness when I speak and an audience that isn’t prepping their minds for the Women In Technology lunch. Either I dropped 120 bucks on a spinning wheel that takes up garage space and is never used or that is used a few times a year. Either way, hope to see you in Louisville this weekend!
|
-
 Well, we are finally here at what is the secular version of the holiday season for Microsoft SQL Server nerd types, the week of the SQLPASS Summit. This year, I am speaking 3 times and will also be doing the Quiz Bowl at the Welcome Reception, so I am going to be busy. If you are here and are interested in database design, please do stop by and check out my sessions. Monday and Tuesday I will be in side sessions that are NDA for much of the day, and that is probably all that I can say. Tuesday night will be the Quiz Bowl where Tim Ford and I will attempt to one up ourselves and attempt to (along with some unwitting (but hopefully full of wit) experts) entertain you with our yearly Jeopardy-esque wanna be game show. Wednesday, from 1-1:30, a bunch of the writers of the SQL Server MVP Deep Dives 2 book will be signing copies for you (there is a second signing at 7:15 - 8 AM on Friday, if you can stomach the earliness). If you want to see what is in the book, check http://www.manning.com/delaney/. The book will be on sale starting Tuesday afternoon for you if you want is signed or not. Note that all author proceeds of the book are going to charity (this edition’s proceeds going to http://www.operationsmile.org/), so you get a good book with lots of different subjects, and make a donation to a worthy cause. I am almost certain that you get an ebook edition of the book for free with the paper version that you can read on your portable device as well. At 4:45 - 6 PM, I will be doing my "Characteristics of a Great Relational Database " session that was picked up as an alternate . This needs to be a very interactive session, and my hope is to learn a bit from your ideas as well as the slides I have prepared. The session is fairly light and a bit humorous, so if you are feeling particularly serious and sour, well, my Thursday session is far less fun: Thursday afternoon, from 3-4:15 PM, I will be presenting "Database Design Fundamentals" which presents a more deep dive on the concepts that go into designing a relational database. My goal here is to present the basics of the process of creating a database from conception until you are ready to start typing CREATE TABLE statements. Friday, I will be in a panel discussion called "Are you a Linchpin? Career management lessons to help you become indispensible.", representing the corporate developers in the world who like that they have one set of problems to solve that while the basis never varies, can never really be solved because the demand outstrips the realities of the day. Other members of the panel include Jeremiah Peschka, Stacia Misner, Kevin Kline, Brent Ozar, Thomas LaRock, Andy Warren, Andy Leonard Saturday I go home...
|
-
So I have been a bit remiss on my blogging the book duties. The fact is, the first 8 chapters were fairly heavy rewrites and reworkings, and even a good amount of new material. But when doing a new version of a book that has already existed, you do need to reuse a good deal of the material from previous version. Chapters 9, 10, and 11 are these chapters for this edition of the book. The chapters are: - Chapter 9 - Database Security and Security Patterns – The biggest change to this chapter was the differences that Contained Database bring to the picture.
- Chapter 10 - Table Structures and Indexing – Minimal changes to this chapter, mostly concerning some of the changes to compression. Since columnstore indexes aren’t really pertinent to OLTP databases, I didn’t do anything with them. An example of a columnstore index will appear in Chapter 12
- Chapter 11 - Coding for Concurrency – Again, not a tremendous difference for Denali, so just a bit of touch up.
Then comes Chapter 12. Back in the original version of the book, I had a reporting chapter, and I wasn’t amazingly pleased with it. So in 2005 I cut it, and decided to leave it out. But it always felt like a bit of a hole in the book, having to say to “don’t denormalize and do reporting in your OLTP database, build a data warehouse, which I won’t talk about”. Since those early years, I have learned a lot about data warehousing, attended a Kimball class on on dimensional modeling, and have designed our corporate data warehouse (with the requisite original “failure” before training, naturally.) But I had to face facts, I was not the right person to write even a chapter on reporting/dimensional modeling. I had made a decision to not have cowriters for this version, a bit for space reasons (I loved having Kevin Kline as a cowriter for the past two versions, but I just didn’t have space last time, and we made it a download), a bit for coordination reasons (I had a devil of a time with a spatial section for the last book), and a bit for selfish reasons (you can figure that out for your own self.) But a few months ago, as the outline gelled, I decided I just needed something about dimensional modeling… So I reached out to a particular writer named Jessica Moss (@jessicammoss) that I have known for a while, and had recently worked with on my data warehouse project as a mentor to help our team grow quite a bit in our ETL skills and to help mold our design. So I asked her if she wanted to write a chapter on dimensional modeling, and she accepted. I have seen her early version of the chapter, and I am very excited to add it as chapter 12. Only one more chapter to write, number 13, which I will blog about in the next day or two once I get my ideas down on “paper” (and if assuming I don’t decide to split the chapter, will annoy my editor, so probably not!)
|
-
You probably have noticed that I haven’t blogged all that much these days. Part of the reason has been taking on way too many projects/speaking engagements/writing projects etc. I am generally proud of all of these things, but the most project that I am probably the most proud of is the MVP Deep Dives 2 book. The project was helmed by the one and only Kalen Delaney, whom I have always admired for many reasons. Working with her on the project has been awesome, even when she had to crack the whip on us pokey editors. The other editors are a group of names that, let’s face it, if you have heard of me, you have heard of them and probably long before me (well, unless you are a family member of mine!) Greg Low who rules www.sqldownunder.com), Brad McGehee the smiling face of many Red Gate ads and blogger at http://www.bradmcgehee.com/, Paul Nielsen, writer of the SQL Server Bible series until 2008 and a good friend of mine who loves database design like I do and Paul Randal and Kimberly Tripp of SQLSkills fame. It is an honor to be even mentioned along with these names. You see the table of contents and purchase the book here: http://www.manning.com/delaney/. Just like last time, the book starts out with a section on Architecture, edited by me. It is the smallest section, which is probably to be expected. The 6 chapters in Architecture focus on the softer topics, including constraints/uniqueness, storage, generalization of designs, and general characteristics of designs. I hope you like it, but one thing that is excellent about a book like this is that there are 60 chapters with something for everyone. If you are going to be at SQLPASS, there is scheduled to be a good number of books for purchase there, and we will have a mass signing again. I always feel funny about signing books, but the book signing was a lot of fun last time. The charity for this book is Operation Smile (http://www.operationsmile.org/). All author and editor proceeds will go to them. Last time we raised well into a 5 digit sum for War Child, so it isn’t chicken feed at the least.  One last note, I should note that over the project I lost an author due to his time commitments and some difficultly with the size of the chapter. I feel bad that we couldn’t work it out and I hope that his material (that was really good in its original form, though far too large for this format, and when we cut it down it just couldn’t be saved. The terrible part of being an editor is occasionally having to make a hard decision. In any case, this person’s writing, reasoning, etc were all good and if I see it posted later I will link to it for you.
|
-
I have seen a lot of other people giving advice about what to do on your first trip to the SQL PASS Conference and I want to give you my two cents worth as well. Many people will be pushing the social aspects of the conference and that is excellent advice which I too will emphasize, but in my mind there is one main thing you need to do: Make it worth it. Someone has shelled out a pretty large sum of money to get you there, and they want to see some return on investment in order for you or your coworkers to do it again. There are far too many choices out there for training, and PASS is a great bet to really learn a breadth of information in a short amount of time. Add to that a pre-con or two and you can get some deep insight to in the short period of time. One thing that I love about it is that there are so many sessions that even after 10 years I can get some extremely deep information from super geniuses Conor Cunningham and Bob Ward (and not of the Wile E Coyote variety!) as well as some very deep information from a host of others, all on topics that I am already quite good at and still learn a very valuable thing or two (and sometimes even more). Then I can pick up get beginner and intermediate topics on stuff that I am just interested in. If you have problems you need solved, write them down and bring them with you. Bring your laptop with demonstrations of your problem. I know I love to help out people with design problems if they have enough information to make it easy to see what they are trying to do. The SQLCat team (http://sqlcat.com/) usually has a great presence and will talk to you about problems, and there are labs to try out features that you might not usually have access to. Add to that the lounge with a bunch of current and future MVPs hanging out willing to give you some time talking about SQL Server related topics. Just don’t come to most technical sessions and expect to ask a question that takes 10 minutes of explaining and get your solution while everyone else waits… So take it somewhat seriously and learn something to take back to your company and show that the investment was worth it. And try not to quit and change jobs the week after the Summit, if you can. Nothing kills a training budget like people getting the feeling that they are paying their employees to go to a job fair for a week. Now, as long as you can make the investment pay off for whomever has paid for you to come to the Summit (even if it is you!), now have fun. There are tons of opportunities to have fun at the Summit. On the opening night we have a Quiz Bowl game where we quiz some of the smartest (goofiest) people in the SQL community on various insane topics. There is a PASS Party one night, a dinner you can sign up to attend on Monday and if you keep your ears open, plenty of other happenings around the Summit. Right around the conference center there is a theater, numerous restaurants, an excellent arcade, so there are plenty of places around to hang out with your new PASS friends you might make. And if you take one of the shuttles to your hotel with other people, you will probably meet a few people heading to the conference right after you get off of the plane (if you don’t bump into someone on the plane!) One of the best things about attending a large conference like this is that you can meet a lot of people you probably read/watch on the Internet and find out that they are just people (albeit people who spend a good amount of free time punishing various keyboard devices a little extra). I said I would mention it, and social networking is a very useful tool, especially at conferences. My suggestion is to (at least a few weeks prior to the Summit,) sign up for twitter, get a twitter client and follow @sqlpass at a minimum (feel free to follow @drsql too!) Also use your twitter client (or if you refuse, a browser) to periodically watch a search of sqlpass: http://twitter.com/#!/search/sqlpass. All of the twitter types will be telling everything that is going on, so if you go to a session and don’t like it, you can find out another that is good. If you want to find a group of people out one night to hang with, there is always something going on. A handy tool I have started using on my Windows Phone 7 is an app called Spout (there is an iPhone and Android version too) that lets you watch a twitter stream, twitter search, facebook account, google reader, and several others in cool looking rotating display. I used it at Devlink last week and it was cool watching what everyone was saying about stuff based on a twitter search of “devlink”. And the best part of using twitter? The friends you make at the conference go home with you and become close friends over time, sometimes even those you never even physically meet. In the end, you can either go to the conference, attend some sessions and go home, or…end up with a head full of knowledge, some real new friends, a host of virtual friends, and a community that you can lean on when you have needs (of course, they will lean back too.) And if you really like this conference stuff, there are lots of user groups and one day little PASS conferences all over the country world these days called SQL Saturday that you can go to and see some of the same people and lots of new faces. Who knows, you might even find yourself compelled to speak at next year’s event!
|
-
When I was editing my chapter on implementing a database, I noticed a really nice improvement in the error message I had from the previous edition of the book. Instead of just telling me that there was a value in my modification statement that duplicated an existing value (or multiple values affected by the statement), it told me the duplicated value. To demo, I created the following quickie table in tempdb. USE tempdb GO --drop the object if it initially existed if object_id('test.testErrorMessage') IS NOT null DROP TABLE test.testErrorMessage IF schema_id('test') IS NOT NULL DROP SCHEMA test go CREATE SCHEMA test GO CREATE TABLE test.testErrorMessage ( testErrorMessageId INT NOT NULL CONSTRAINT PKtestErrorMessage PRIMARY KEY, otherColumn varchar(10) NOT NULL CONSTRAINT AKtestErrorMessage UNIQUE (otherColumn), ) GO INSERT INTO test.testErrorMessage (testErrorMessageId, otherColumn) VALUES (1,'First') GO Then, inserting a duplicate row for the primary key value: INSERT INTO test.testErrorMessage (testErrorMessageId, otherColumn) VALUES (1,'First') GO
And on 2008 R2, I get: Msg 2627, Level 14, State 1, Line 1 Violation of PRIMARY KEY constraint 'PKtestErrorMessage'. Cannot insert duplicate key in object 'test.testErrorMessage'. Now on Denali CTP3, you get a little bit more: Msg 2627, Level 14, State 1, Line 1 Violation of PRIMARY KEY constraint 'PKtestErrorMessage'. Cannot insert duplicate key in object 'test.testErrorMessage'. The duplicate key value is (1). Then, to show the same thing for the UNIQUE CONSTRAINT: INSERT INTO test.testErrorMessage (testErrorMessageId,otherColumn) VALUES (2,'First') On 2008 R2, you get the following Msg 2627, Level 14, State 1, Line 4 Violation of UNIQUE KEY constraint 'AKtestErrorMessage'. Cannot insert duplicate key in object 'test.testErrorMessage'. And again on Denali CTP3: Msg 2627, Level 14, State 1, Line 4 Violation of UNIQUE KEY constraint 'AKtestErrorMessage'. Cannot insert duplicate key in object 'test.testErrorMessage'. The duplicate key value is (First). You can see if you duplicate > 1 value, it gives you one of the items. It might be better if the message didn’t imply that it was the only duplicate value, but hey, it is a great improvement. If you think it ought to be tweaked to say “A duplicated key value is (…) or something, click here) INSERT INTO test.testErrorMessage (testErrorMessageId,otherColumn) VALUES (5,'Third'),(6,'Third'),(3,'Second'),(4,'Second') Msg 2627, Level 14, State 1, Line 1 Violation of UNIQUE KEY constraint 'AKtestErrorMessage'. Cannot insert duplicate key in object 'test.testErrorMessage'. The duplicate key value is (Third). It works with indexes also: ALTER TABLE test.testErrorMessage DROP CONSTRAINT AKtestErrorMessage CREATE UNIQUE INDEX UXtestErrorMessage ON test.testErrorMessage(otherColumn) INSERT INTO test.testErrorMessage (testErrorMessageId,otherColumn) VALUES (5,'Third'),(6,'Third'),(3,'Second'),(4,'Second') You get the following: Msg 2601, Level 14, State 1, Line 1 Cannot insert duplicate key row in object 'test.testErrorMessage' with unique index 'UXtestErrorMessage'. The duplicate key value is (Third).
Much nicer!
|
-
In this last kind of “creative” chapter, I will look at some of the ways you implement common problems in your relational database, and some of the ways you probably shouldn’t. The “should” sections will deal with: - Uniqueness – Beyond the simple uniqueness we have covered in the first chapters of the book, looking at some very realistic patterns of solutions that cannot be implemented with a simple uniqueness constraint.
- Data Driven Design – The goal of data driven design is that you never hard code values in your code that don’t have a fixed meaning. You break down your programming needs into situations that can be based on sets of data values that can be modified without affecting code.
- Hierarchies – A very common need is to implement hierarchies in your data. The most common example is the manager-employee relationship. In this section I will demonstrate the two simplest cases, and summarize other methods that you can explore
- Images, Documents, and Other Files – There are quite often a need to store documents in the database, like for a web users’ avatar picture, or a security photo to identify an employee, or even documents of many types. We will look at some of the methods available to you in SQL Server.
- Storing User-Specified Data – You can’t always design a database to cover every known future need. In this section I will cover some of the possibilities for letting the user extend their database themselves in a manner that can be somewhat controlled by the administrators.
- Generalization – In this section we will look at some ways that you will need to be careful with how specific you make your tables so that you fit the solution to the needs of the user.
This marks an increase of a 4 sections from the last book, when I added this Patterns chapter. I did take away a few bits about sequence and calendar tables, but I do plan to move this to a later chapter on development, where I will discuss the sorts of objects that I find nice to have in each database and why. For the anti-patterns, I am adding one more in this edition, on undecipherable data. - Undecipherable data – Too often you find the value 1 in a column with no idea what 1 means without looking into copious amounts of code.
- One-size-fits-all domain – One domain table to implement all domains rather than individual tables that are smaller and more precise
- Generic key references – Having one column where the data in the column might be the key from any number of tables, requiring you to decode the value rather than know what it is.
- Overusing unstructured data – The bain of existances of the dba, the blob of text column that the user swears they put well structured data in for you to parse out. Can’t eliminate a column for notes here and there, but overuse of such constructs lead to lots of dba pain.
This is one of my favorite chapters because it really gets to the core of design. Up until now we have stuck mostly to basics and fundamentals, building very basic structures and working with the object types available to us. Here we build practical solutions to solve common problems. From here, the next chapter we will move along to security, structures, and then putting the finishing touches on things. Hopefully soon I will have some exciting new about a final chapter that will tie it all together once I get final approval and acceptance from the guest writer on this chapter.
|
-
As the book progresses, I find myself veering from the original stated outline quite a bit, because as I teach about this more (and I am teaching a daylong db design class in August at http://www.sqlsolstice.com/… shameless plug, but it is on topic :) I start to find that a given order works better. Originally I had slated myself to talk more about modeling here for three chapters, then get back to the more implementation topics to finish out the book, but now I am going to keep plugging through the implementation tasks, then finish up with modeling task (which I hope I might end up getting some help with…emails are going out once I talk it over with my editor). In the last edition, the chapter on data protection was more inclusive, including programmatic data protection, including client code and stored procedures. But, keeping with the basic, implementation type chapters (and trying my best to shorten chapters to more realistic chunks (the free chapter shouldn’t be 1/2 of the book, or so I am told), I will put that off to probably the final chapter. This chapter was broken up into two main sections, Check Constraints and Triggers. I will demonstrate the following scenarios, and if you see anything missing, please do make suggestions Check Constraints - Simple value checks – Like when you want to make sure there are no blank string values CHECK (len(value) > 0)
- Value reasonableness checks – Like if a value should be a non-negative integer, CHECK (value >= 0)
- Checks using different tables – Like setting up a data driven format check
Triggers – Broken down by AFTER and INSTEAD OF Triggers - AFTER
- Range checks on multiple rows – Like when you want to make sure that the sum of rows related to (and including) the newly inserted rows is > 0
- Maintaining summary values (only as necessary) – Denormalization, pure and simple, but if you are going to do it, triggers are the way to go (you really shouldn’t)
- Cascading inserts – Like creating child rows to ensure a 1 to at least 1 row relationship is met, or creating a parent
- Child-to-parent cascades – Like deleting parent rows when the last child row is deleted
- Maintaining an audit trail – Also something that will come up in security, but implementing an audit trail of actions on a table. Less needed these days with auditing, but
- Relationships that span databases and servers – sometimes you just have to implement RI between databases, so it is back to 6.0 style RI
- INSTEAD OF
- Automatically maintaining values – For example, if you want to implement a bulletproof rowLastUpdatedTime column to know when the row last changed, but don’t trust the client (who does?)
- Formatting user input – Like formatting words in all caps, or proper case. Another thing that might be better done outside of SQL Server, but it is very straightforward to implement
- Redirecting invalid data to an exception table – For example, eliminating data that is outside of the norm. Possibly done better outside of SQL Server code, but if you really want to build something that takes previous data into consideration, this might be a reasonable way.
- Forcing no action to be performed on a table, even by someone who technically has proper rights – Simple do nothing trigger that works because in an instead of trigger you have to replicate the action, so you don’t.
It might seem weird to consider formatting data or redirecting invalid data to another table as data protection, but the point of data protection is to make sure that they data ends up in a reasonable state, and triggers can do some “magical” seeming stuff. Admittedly, triggers are not a fan favorite with many programmers because they do those magical stuff that they cannot directly control, but in many ways that is the point. If the dev forgets to update the last update date, the ETL may not see the row, and oops your data is out of sync. In any case, I do my best to make it clear that you don’t in fact have to do any of this, but here are the tools in the tool bag.
|
-
So I was notified a few days ago that one of my sessions was selected, and one is an alternate. Luckily, it was the one that I have the most experience with, and the alternate is my latest session that I am really quite happy with after doing it virtually and now at the SQL Saturday in Columbus. The selected session is: Database Design Fundamentals In this session I will give an overview of how to design a database, including the common normal forms and why they should matter to you if you are creating or modifying SQL Server databases. Data should be easy to work with in SQL Server if the database has been organized as close as possible to the standards of normalization that have been proven for many years. Many common T-SQL programming "difficulties" are the result of struggling against the way data should be structured and can be avoided by applying the basic normalization techniques and are obvious things that you find yourself struggling with time and again (i.e. using the SUBSTRING function in a WHERE clause meaning you can't use an index efficiently). | And I really look forward to seeing how it goes this year. Since last year’s version at PASS, where I finished out a year in less than great form, I have worked on this presentation quite a lot. Some of the examples were hard to read, and the data models just didn’t seem to bring it across, so I added tabular examples that really seemed to give it an extra push of interest when I did it for the Hampton Roads User Group earlier in the year. I am also hoping that they put me in the smallest room possible so the 40-60 people that I average every year with this session are all crammed on top of each other (why? Check answer 7 here: http://sqlpeople.net/admin/2011/05/09/louis-davidson/) The irony is, I am really glad that the other sessions didn’t get picked up because frankly I am swamped and the other ones would have been totally new sessions that I would have spent at least 100 hours on. I figured the Database Structures session wouldn’t make it because it would never compete with Kalen, Kim, Paul, and many many others… When I put it in, I was planning on doing Devlink in Chattanooga in August where it was picked up. However, I am now speaking at SQL Solstice in Raleigh the very same week, and there I am doing the full day Database Design Workshop that I did last year in Orlando for SQL Saturday. I will blog more about that later, but suffice it to say, there are 5 day long sessions by some awesome names: Ed Wilson (PowerShell), Andy Leonard (SSIS), Jessica Moss (BI), Andrew Kelly (Performance). I feel honored just be there with them, much less speaking at the same time. So if you are looking for some learning in mid August in the South/Southeastern US, SQL Solstice is shaping up to be pretty awesome, and for a quality/value ratio that is almost unbelievable, you can’t beat Devlink for a developer oriented conference with a solid DB track (including Joe Webb, Brad McGehee, Robert Cain and others). And duh, the PASS Summit is always awesome, though clearly not in the South/Southeastern US (at least not until 2013!) So I am a bit hesitant to post this because the past few times I have posted about something I have submitted they have failed to occur. But hey, I might as well. If I don’t I might get all 5 sessions I submitted and have to do them.. and that would be super tiring. So I put in 5, including a 1/2 day session and a pre-con and if I could choose only one, it would be the pre-con. Before you start thinking I am stating the obvious, I have done one pre-con by myself at the Orlando SQL Saturday last year and it was great to have enough time to actually do the topic of relational database design, including time to actually get the people in the session to create designs on their own. The 1/2 day session is intriguing. I am thinking that a 1/2 day would still give us time to do some database designs, though not likely on the whiteboard in teams. In either case, if I get to do either of these, I figure I would bring my Wacom Cintiq 12x monitor to interact onscreen… So, the sessions I submitted are, starting with the regular sessions: Database Design Fundamentals – A session I have done quite a few times over the years, including quite a few SQL Saturdays recently with major changes to the normalization bits based on newer, friendlier example code. Characteristics of a Great Relational Database – A new session that I have only done once for the AppDev virtual chapter. It is a basically a list of seven characteristics that make a database great, rather than just functional. Database Structures for Programmers – A session that I am writing for Devlink this year (and probably the Nashville SQL Server User Group in July) that is probably encroaching on a few of my idol’s sessions who talk about structures. My goal in suggesting it was to branch out a bit and start talking a bit more about non-relational stuff. A great lacking I think is sessions about the physical structures, particularly at the level I tend to give talks at (which is to talk about complex subjects in less than complex manners.) A lot of developers don’t want to get so deep it gets confusing, but they do need to know about how the physical implementation of SQL Server can be manipulated for your benefit. Database Design Patterns In Practice (1/2 day) – In this session, my goal will be to cover constraints, triggers, etc and how you can put them together to make interesting objects in your design. It will be highly interactive, more or less consisting of an overview of data protection/implementation objects, then a set of scenarios that we will implement together. Database Design Workshop (pre-con) – This session will cover database design from the basics to actually putting it into practice. It is a bit like drinking from a firehose, but it very practical in nature as we will be looking at stuff that any person who has to do database design will do regularly. So there you have it. If I get even one of them I will be happy to do it. Speaking at PASS may be the most stressful of all of the speaking I do since the competition is strong, there are lots of tracks,and I have regularly been in the final slot of the conference where I probably would have been too tired to attend a session myself. If you want to get a solid idea of why this gets to me, watch for my SQL People entry comes up sometime this month.
|
-
Assuming all goes as planned, I will be in Columbus, OH this Friday night and Saturday for SQL Saturday 75. I really love SQL Saturday events the best of all of the events because they are very intimate in nature. As a fairly antisocial person, I sometimes get overwhelmed by the size of other events, even the SQL Rally was just barely in my comfort range. Here the number of people and size of rooms just feels like home, like you are shooting the breeze with a group of friends. My session will be at 9:00 AM (full schedule), so don’t be late! Characteristics of a Great Relational Database When queried, most database professionals would mention normalized as one of the most important characteristics that tell the difference between a good and bad database design. I won't disagree in the least, but there is so much more to be considered. Even if you did a great job of normalization, poor naming, poorly implemented keys, too many or too few indexes, and so on can derail your design. In this session I will present seven primary characteristics of a design that differentiates between an ugly design that will have your colleagues nitpicking you to death and one that will have them singing your praises. Characteristics such as comprehendible, documented, secure, well performing, and more (including normalized, naturally) will be discussed.
It is the second time I will do this presentation, and the first time where I can see the faces of the recipients, so it will be nice to gauge how people like it. It is a lot of fun actually, though no matter what I am talking about I want to talk more about normalization, which I believe is the key to improving the databases of the future. So many of the session that are given at these things are geared towards systems that are already screwed up and limping along, and I really want to evangelize the merits of doing it right. That is where this presentation fits in, the time period between design and performance tuning, where you determine the future work that is done with the system. Is it well performing, understandable, easy to maintain and use? Or does it take a crew of ten thousand DBAs doing nothing but putting their fingers in the leaks to keep the thing running? And don’t forget all of those BI sessions too… The better you do with the relational database, the easier the dimensional designer/implementers have it too. Anyhow, I hope to see you all there (well, not all of you, just the SQL nerds who are reading this. The history buffs who are still wondering why we are going to be inside the founder of our continent this weekend, well, you I feel sorry for you.
|
-
A while back, I was working on a short article about Normalization for a book that never got published (admittedly I wasn’t getting paid for the article, and it wasn’t for charity, so I wasn’t that broken up over it.) The task at hand was to, in 2 pages or less, describe the process of normalization and help you to know when you have finished. In my upcoming book Pro SQL Server 2000 + N (where N > 10) Relational Database Design and Implementation, it takes about 45 pages. So it wasn’t really a realistic task, especially considering I have spent about a full paragraph letting you know how hard the task is going to be. The most important thing that is missing from this short introduction is examples, which I include in the book in truck loads. There are two distinct ways that Normalization is approached. In a very formal manner, there are a progressive set of “rules” that specify “forms” that you are working to achieve. There is nothing wrong with that definition, but progressing through the forms in a stepwise manner is certainly not how any seasoned data architect is likely to approach the problem of designing data storage. Instead, you design with the principles of normalization in mind, and use the normal forms as a test to your design. The problem with getting a great database design is compounded with how natural the process seems. The first database that the past uneducated version of me built had 10+ tables, all of obvious ones like customer, orders, etc. set up so the user interface could be produced to satisfy the client. However, tables for address and even order items were left as part of the main tables, making it a beast to work with for queries, and as my employer wanted more and more out of the system, the design became more and more taxed. The basics were there, but the internals were all wrong and the design could have used about 50 or so tables to flesh out the correct solution. Soon after (at my next company, sorry Terry), I gained a real education in the basics of database design, and the little 1000 watt halogen light bulb went off… That light bulb was there because what had looked like a more complicated in the college database class that no normal person would have created (bet you can’t guess what my grade was in that class!) was really there to help my design fit in with the tools that I was using. Turns out that the people who create relational database engines use the same concepts of normalization to help guide how the engine is created that I needed to for a database to work well. So if the relational engine vendors are using a set of concepts to guide how they create the engine, it turns out to be actually quite helpful if you follow along. First, lets look at the “formal” rules. The normalization rules are stated in terms of “forms”, starting at First Normal Form, and including several others some of which are numbered, some are named for the creators of the rule. (Note that in the strictest terms, to be in a greater form, you ought to also conform to the lesser form. So you can’t be in third normal form and not give in to the definition of the First). To be honest, it is rare that a data architect will actually refer to the normal forms in conversation specifically unless they are having a nerd argument with a developer that is trying to design an entire customer relationship management system in a single table, but understanding the basics of normalization is essential to understanding why it is needed. What follows is a very quick restatement of the normal forms: - First Normal form/Definition of a Table – Attribute and row “shape”
- All columns must be atomic—one value per column
- All rows of a table must contain the same number of values – no arrays
- Each row should be different from all other rows in the table – unique rows
- Boyce-Codd Normal Form – Every candidate key is identified, and all attributes are fully dependent on a key, and all columns must identify a fact about a key and nothing but a key.
- Encompasses:
- Second Normal Form - All attributes must be a fact about the entire primary key and not a subset of the primary key
- Third Normal Form - All attributes must be a fact about the primary key and nothing but the primary key
- Fourth Normal Form - There must not be more than one multivalued dependency represented in the entity. That is to say that every attribute relates to the key with a cardinality of one. Not a common rule to violate, but it definitely does occur.
- Fifth Normal Form - All relationships are broken down to binary relationships when the decomposition is lossless. Very rarely violated in typical designs.
There are other, more theoretical forms that I won’t mention, but they are rare to even encounter the definition. In the reality of the development cycle of life, the stated rules are not hard and fast rules, but merely guiding principles that can be useful to help you avoid certain pitfalls. In practice, we end up with denormalization, (meaning purposely violating a normalization principle for a stated, understood purpose, not ignoring the rules to get done faster) mostly to satisfy some programming or performance need from the consumer of the data (programmers/queriers/etc) Once you deeply “get” the concepts of normalization, you really will find that you build a database like a well thought out Lego creation, desiring how each piece will fit in to the creation before putting pieces together, because disassembling 1000 Lego bricks to make a small change makes Legos more like work than fun. Some rebuilding based on keeping agile can be needed, but the more you plan ahead, the less data you will have to reshuffle. In actual practice, the formal definition of the rules aren’t thought of at all, but instead the guiding principles that they encompass are. In my mind, I use the following four concepts in the back of my mind to guide the database I am building, falling back to the more specific rules for the really annoying/complex problem I am trying to solve: - Columns - One column, one value
- Table/row uniqueness – Tables have independent meaning, rows are distinct from one another.
- Proper relationships between columns – Columns either are a key or describe something about the row identified by the key.
- Scrutinize dependencies - Make sure relationships between three values or tables are correct. Reduce all relationships to binary relationships if possible.
The question in the title still has yet to be conquered. “How to know when you are done?” What I left out of the description of Normalization was the granularity you go with. The word “atomic” is a common way to describe a table or column that is normalized enough. Atomic would tend to indicate something that is broken down to its absolute lowest form. But unless you are not a nerd (and would you really be reading this if you weren’t?) we know that there are lots of particles smaller than an atom. When you try to mess with particles smaller than the atom, you get a mushroom cloud that even Timothy Leary would not have approved of. It is the same way with databases. Tables and columns split to their atomic level have one and only one meaning. Deal with them at a higher level, and you will suffer with lots of substrings, switching attributes that you use to find out what a table means in a situation. But break things down too far, and you will suffer even more. My best example of this is a column that holds a large quantity of text. If you never need to us part of the data using SQL, a single column is perfect (a set of notes that the user uses on a screen is a good example.) You wouldn’t want a paragraph, sentence, and character table to store this information. On the other hand, that same character column is abused when the users start putting coded information (because users WILL find a way to work if your software fails them). Then and you have to search for, you will need to begin working with the less comfortable string manipulation functions in SQL… And just try to index a part of a large text column. Possible? Sometimes. Best way to go? Never. The key to knowing what is normalization and what is an academic exercise for a nerd is to understand the needs of the users (commonly referred to as requirements, as in “Why don’t we ever have good requirements before we code!?!”). If it is clear that the user is planning on maintaining a list of values and will need to update them programmatically, then it is your job to make each value a row in a table. But if there is no requirement to ever search on a value in that list or programmatically access part of the value, then it might be overkill to do anything other than leave the value alone. It is often best to err on the side of caution, but the ideal relational storage for a document would be minimally at the word/punctuation level. If you are read this far and are convinced that would be the proper solution, then you need to get a complete book or take a class on the subject before you start creating a relational database. The reasonable answer to when you are done normalization is when users have exactly the right number of places to store the data they need and you can query/use the data without parsing it… Easy enough until the user changes their mind, huh?
|
-
So I am a bit hesitant to post this because the past few times I have posted about something I have submitted they have failed to occur. But hey, I might as well. If I don’t I might get all 5 sessions I submitted and have to do them.. and that would be super tiring. So I put in 5, including a 1/2 day session and a pre-con and if I could choose only one, it would be the pre-con. Before you start thinking I am stating the obvious, I have done one pre-con by myself at the Orlando SQL Saturday last year and it was great to have enough time to actually do the topic of relational database design, including time to actually get the people in the session to create designs on their own. The 1/2 day session is intriguing. I am thinking that a 1/2 day would still give us time to do some database designs, though not likely on the whiteboard in teams. In either case, if I get to do either of these, I figure I would bring my Wacom Cintiq 12x monitor to interact onscreen… So, the sessions I submitted are, starting with the regular sessions: Database Design Fundamentals – A session I have done quite a few times over the years, including quite a few SQL Saturdays recently with major changes to the normalization bits based on newer, friendlier example code. Characteristics of a Great Relational Database – A new session that I have only done once for the AppDev virtual chapter. It is a basically a list of seven characteristics that make a database great, rather than just functional. Database Structures for Programmers – A session that I am writing for Devlink this year (and probably the Nashville SQL Server User Group in July) that is probably encroaching on a few of my idol’s sessions who talk about structures. My goal in suggesting it was to branch out a bit and start talking a bit more about non-relational stuff. A great lacking I think is sessions about the physical structures, particularly at the level I tend to give talks at (which is to talk about complex subjects in less than complex manners.) A lot of developers don’t want to get so deep it gets confusing, but they do need to know about how the physical implementation of SQL Server can be manipulated for your benefit. Database Design Patterns In Practice (1/2 day) – In this session, my goal will be to cover constraints, triggers, etc and how you can put them together to make interesting objects in your design. It will be highly interactive, more or less consisting of an overview of data protection/implementation objects, then a set of scenarios that we will implement together. Database Design Workshop (pre-con) – This session will cover database design from the basics to actually putting it into practice. It is a bit like drinking from a firehose, but it very practical in nature as we will be looking at stuff that any person who has to do database design will do regularly. So there you have it. If I get even one of them I will be happy to do it. Speaking at PASS may be the most stressful of all of the speaking I do since the competition is strong, there are lots of tracks,and I have regularly been in the final slot of the conference where I probably would have been too tired to attend a session myself. If you want to get a solid idea of why this gets to me, watch for my SQL People entry comes up sometime this month.
|
-
This week I signed up for Plamen Ratchev’s PASS SQLRally pre-con, “Maximize Your SQL Server 2008 Coding Skills,” and here’s why you ought to join me. This is a bit of an odd experience because, well, if you readers had voted a few more times for me, I would be writing examples for my pre-con instead of attending this one :) Once I decided to attend SQLRally, May 11-13 in Orlando, I decided that I wanted to attend a pre-con for a couple of reasons. First off, I like to learn about how other people teach, and it is rare that you get a chance to attend a full day of learning for only $199. As a data architect, and an author, getting a feel for how other people design for and use language features is essential; otherwise you tend to miss some of the more interesting features that you don’t use frequently. As an example, I did a lot of research and even wrote a chapter in “Pro SQL Server 2005” on new T-SQL features in SQL Server 2005, and there was one interesting feature that I didn’t hear about for two years after the book came out…not until I attended a presentation where a presenter used the feature. (No one who reviewed the book, on the editorial staff or publicly, noticed either.) For SQL Server 2008, I didn’t spend a lot of time with some of the language features unless they specifically went hand-in-hand with design, such as the new date and time data types and FILESTREAM. In addition, due to some legacy vendor systems that held us back to SQL Server 2005, the production systems I have worked on have not used many of the new 2008 features until the last three months or so. With Denali bearing down on me, I felt that attending Plamen’s pre-con would be valuable in quickly getting me up to speed. (I also considered Kevin Kline’s “Leadership and Team Management Skills for the Database Professional” pre-con, but I was afraid I might end up with more hair on the sides of my head, possibly forming a point.) What I hope to get out of this pre-con is a pretty deep dive into the new 2008 features in T-SQL, particularly: 1. Some realistic usage of the MERGE statement, especially with an understanding of the caveats of using MERGE with triggers that I have heard about 2. A fairly complete working example of using the FILESTREAM data type 3. Some interesting uses for grouping sets 4. Complete coverage of table-valued parameters, including hopefully how to explain them to application programmers 5. Practical examples of how to use all of the new features, including ideas of how to design a database application to make use of them (again, ideas that can be incorporated into writing later) All in all, since some of these features are not utilized in the typical mainstream application (FILESTREAM grouping sets, and to a lesser extent, MERGE), getting some practice using them will certainly not be a bad thing. So come on out and join me for a day of talking about T-SQL—you can find all the registration information here. What else could you be doing, working?
|
-
Or “Finally getting to use SQL Server like the dang name of the book implies” I am smack dab in the middle of chapter 6 now, and I have finally figured out the concept and how it is going to be. Originally, chapter 6 was going to be the start of a series of design pattern chapters, but a funny thing happened when I reached the end of Chapter 5 on Normalization. The book felt incomplete. My feeling had been to assume that the reader had used SQL, knew how to execute a create table statement, an insert statement, etc. I am still going to do that, and I am trimming down this chapter from 96 pages in the previous version to 60 or 70 pages, but it still had to be done. I am taking a small database that I designed (a conference messaging database, twitterish, but not exactly. An alterative idea that I got from a RDW2 was a menu/ingredients, but to include a time element seemed too difficult) that has 5 tables to start with and 6 when it is all said and done. Keeping it small means all of the DDL can be printed in the book. The structure of the chapter is broken into two sections. The first section is preparing the model for implementation. The model is a logical model that is generally implementation non-specific, fully normalized, and could be built with any sort of key structure. The job is to transform it into a fully implementable SQL Server model, using whatever your standard primary key structure might be (I use surrogates with a required unique constraint for my tables). The reality of how we all really work is that most of these topics are done as we go along, but not always do we get to do the modeling from start to finish. When I end up implementing someone else’s design, I will usually normalize their tables if needed, then apply the following steps to standardize the model to whatever standards are in place. This main section would be named something like "Adapting the Logical Model” - Choosing names: I’ll mention naming concerns for tables and columns. The biggest thing here is making sure to have a standard and to follow it.
- Choosing key implementation: Throughout the earlier bits of the book, you’ve made several types of key choices. In this section, you will go ahead and add all the keys to the model.
- Determining domain implementation: I’ll cover the method to decide datatypes, nullability, and, also important, choosing between using a domain table or a column with a constraint for types of values where you want to have solid constraints on column values.
- Setting up schemas: Beginning in SQL Server 2005, you could set up groups of tables as schemas that provide groupings of tables for usage, as well as security.
- Adding Implementation Columns: There are columns that are common to almost every database that people implement that are not part of the logical design. For example the last time the row was updated, and by whom it was updated.
The next major section will be “Using DDL to Create the Database” In this section, I will go through the common DDL that is needed to build most every database you will encounter. Basically, using the model I have created, I will go through the process of creating a very basic database. For non-trivial utilizations, we will demonstrate them as patterns that can be followed to do certain tasks. If it feels goofy to cover that way, I just won’t include it… The second section of the chapter will cover the following sections: - Creating the basic table structures: Building the base objects with columns.
- Adding uniqueness constraints: Using primary and unique constraints to enforce uniqueness between rows in the table.
- Using default constraints: Assisting users in choosing proper values when it isn’t obvious.
- Adding relationships: Defining how tables relate to one another (foreign keys), including cascading options
- Documenting the database: Including documentation directly in the SQL Server object using extended properties.
- Validating the dependency information: Using the catalog views and dynamic management views, you can validate that the objects you expect to depend on the existence of one another do in fact exist, keeping your database cleaner to manage.
In some ways if feels like 2 chapters, and it is going to be pretty long, but I like that the chapter takes a model with logical terminology/domains and ends up a database in SQL Server. Which is what people are going to want out of the book anyhow. After this chapter I won’t do complete databases anymore (first off because they are a pain in the hiney to keep straight over 70 pages, because any mistake or forgotten concept means having to go back to when the snowball was really small and make sure everything is in there already. So we will stick to independent model snippets for the rest of the book.
|
|
|
|
|
|