<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Paul Nielsen : Normalization</title><link>http://sqlblog.com/blogs/paul_nielsen/archive/tags/Normalization/default.aspx</link><description>Tags: Normalization</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Denormalize for Performance? </title><link>http://sqlblog.com/blogs/paul_nielsen/archive/2008/10/03/denormalize-for-performance.aspx</link><pubDate>Fri, 03 Oct 2008 12:00:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:9209</guid><dc:creator>Paul Nielsen</dc:creator><slash:comments>18</slash:comments><comments>http://sqlblog.com/blogs/paul_nielsen/comments/9209.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/paul_nielsen/commentrss.aspx?PostID=9209</wfw:commentRss><description>&lt;DIV align=left&gt;Conventional wisdom says to “normalize to 3rd normal form then denormalize for performance.” Poppycock! I say for an OLTP operational database (not a reporting or BI database), a well normalized design will outperform a denormalized design for three good reasons:&lt;/DIV&gt;
&lt;DIV align=left&gt;&lt;BR&gt;1)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; If the denormalization duplicates data, then the DML operations have to write to multiple tables, this means extra code and extra work. &lt;/DIV&gt;
&lt;DIV align=left&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV align=left&gt;2)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; If the denormalization duplicates data, these will typically be keys or at least candidate keys, so they’ll be indexed which contributes to the index bloat problem. &lt;BR&gt;&lt;/DIV&gt;
&lt;DIV align=left&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV align=left&gt;3)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; If the denormalization repeats columns (item1, item2, item3, etc), then the data has to massaged before being inserted or updated, and when reading the data it typically needs to be unpivoted for set-based code, and these operations mean extra work.&lt;BR&gt;&lt;/DIV&gt;
&lt;DIV align=left&gt;Those who promote denormalization would say argue that denormalization is trading write performance for read performance, and this is true for reporting database, but in issue number 3 above, even read performance suffers. &lt;/DIV&gt;
&lt;DIV align=left&gt;&lt;BR&gt;Since denormalization tends to go with SQL code in the app layer (no database abstraction in the form of stored procedures), the denormalization also impacts extensibility, since it’s even more expensive now to modify the database schema. &lt;/DIV&gt;
&lt;DIV align=left&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV align=left&gt;In production databases I’ve built, I’ve tested a pure normalized design vs. the best&amp;nbsp;denormalized design I could come up with. Having a clearly defined database abstraction layer meant that I could run one script to modify the table structure, correct the data, and alter the stored procedure, and the app could continue to make the same stored procedure calls without breaking. In every case, the normalized design outperformed the denormalized design by about 15% due the normalized design’s more efficient code.&lt;/DIV&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=9209" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/paul_nielsen/archive/tags/abstraction/default.aspx">abstraction</category><category domain="http://sqlblog.com/blogs/paul_nielsen/archive/tags/data+modeling/default.aspx">data modeling</category><category domain="http://sqlblog.com/blogs/paul_nielsen/archive/tags/denormalization/default.aspx">denormalization</category><category domain="http://sqlblog.com/blogs/paul_nielsen/archive/tags/Normalization/default.aspx">Normalization</category></item><item><title>Another Reason to Normalize</title><link>http://sqlblog.com/blogs/paul_nielsen/archive/2007/10/16/another-reason-to-normalize.aspx</link><pubDate>Tue, 16 Oct 2007 18:48:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:3015</guid><dc:creator>Paul Nielsen</dc:creator><slash:comments>1</slash:comments><comments>http://sqlblog.com/blogs/paul_nielsen/comments/3015.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/paul_nielsen/commentrss.aspx?PostID=3015</wfw:commentRss><description>&lt;P class=MsoNormal style="MARGIN:0in 0in 10pt;"&gt;&lt;FONT face=Calibri size=3&gt;Database can be measured by six properties – four determined by the design; Usability, Integrity, Performance/Scalability, and Extensibility; and two being more a function of implementation than design; Availability and Security. &lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN:0in 0in 10pt;"&gt;&lt;FONT face=Calibri size=3&gt;Of these six factors, I’d argue that in the long run, extensibility is the one that is more expensive to repair. All six are necessary, but a database that has become brittle (touch one thing, everything else breaks) is either impossible or extremely expensive to correct. Database extensibility is bought with three techniques; a strongly enforced data abstraction layer, data driven design, and normalization. Normalization is typically credited with the integrity of the database, and this is true. But normalization also influences the extensibility of the database. Here’s why, &lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN:0in 0in 10pt;"&gt;&lt;FONT face=Calibri size=3&gt;A poorly designed, anti-normalized database is typically made to work using multiple layers of convoluted code. It takes a lot of extra code to keep anti-normalized data consistent. All this code is more expensive to maintain, and makes it very difficult to implement changes and new features. &lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN:0in 0in 10pt;"&gt;&lt;FONT face=Calibri size=3&gt;It’s not even a factor of pay-me-now or pay-me-later. An anti-normalized database doesn’t just cost a bit more to fix later; it costs so much to fix later that companies, or careers, fail due to the poor design. The bottom line is still that a man-day spent on database design saves, at least, a man-month of extra work later. &lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=3015" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/paul_nielsen/archive/tags/Extensibility/default.aspx">Extensibility</category><category domain="http://sqlblog.com/blogs/paul_nielsen/archive/tags/Normalization/default.aspx">Normalization</category></item></channel></rss>