THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul Nielsen

www.SQLServerBible.com

Data Architecture

from SQL Server 2008 Bible Appendix A.

Data Architecture is the overarching design of the database(s) and how they interact with other software. Data architecture is defined by the Information Architecture Principle and the six attributes by which every database can be measured.

Information Architecture Principle

For any complex endeavor, there is value in beginning with a common principle to drive designs, procedures, and decisions. A credible principle is understandable, robust, complete, consistent, and stable. When an overarching principle is agreed upon, conflicting opinions can be objectively measured, and standards can be decided upon that support the principle.

The Information Architecture Principle encompasses the three main areas of information management: database design and development, enterprise data center management, and business intelligence analysis.

Information Architecture Principle: Information is an organizational asset, and, according to its value and scope, must be organized, inventoried, secured, and made readily available in a usable format for daily operations and analysis by individuals, groups, and processes, both today and in the future.

Unpacking this principle reveals several practical implications. First, there should be a known inventory of information, including its location, source, sensitivity, present and future value, and current owner. While most organizational information is stored in IT databases, un-inventoried critical data is often found in desktop databases and spreadsheets scattered throughout the organization.

Just as the value of physical assets varies from asset to asset and over time, the value of information is also variable and so must be assessed. The value of the information may be high for an individual or department, but less valuable to the organization as a whole; information that is critical today might be meaningless in a month; or information that may seem insignificant individually might become critical for organizational planning once aggregated.

If the data is to be made easily available in the future, current designs must be decoupled to avoid locking the data in a rigid, but brittle, database.

Database Attributes

Based on the Information Architecture Principle, every data store can be designed or evaluated by six interdependent database objectives. Four of these objectives are a function of design: usability, data integrity, performance, and extensibility. Availability and security are more a function of implementation than design.

With sufficient design effort and a clear goal of meeting all six objectives, it is fully possible to design and develop a database that meets all six database objectives. The idea that one attribute is gained only at the expense of another attribute is a myth.

Usability

The usability of a data store involves the completeness of meeting the organization's requirements, the suitability of the design for its intended purpose, the effectiveness of the format of data available to applications, and the ease of extracting information. The most common reason a database is less than usable is an overly complex or inappropriate design.

Usability is enabled in the design by:

  • A thorough understanding of the organizational requirements and lifecycle planning of their implementation.
  • Normalization and correct handling of optional data
  • Simplicity of design.
  • A well-defined abstraction layer with stored procedures and views.

Integrity

The ability to ensure that persisted data can be retrieved without error is central to the Information Architecture Principle, and the first major problem tackled by the database world. Without data integrity, a query's answer cannot be guaranteed to be correct, consequently, there's not much point in availability or performance.

As data is essentially entities and attributes, data integrity consists of entity integrity and domain integrity, which includes referential integrity and user-defined integrity. Transactional integrity, which deals with how data is written and retrieved, is defined by the ACID principles (atomicity, consistency, isolation, and durability), discussed in a later section, transactional faults, and isolation levels.

Integrity is enabled in the design by:

  • Normalization and correct handling of optional data.
  • A well-defined abstraction layer with stored procedures and views.
  • Data quality unit testing using a well defined and understood set of test data.

Performance / Scalability

Presenting readily usable information is a key aspect of the Information Architecture Principle. Although the database industry has achieved a high degree of performance, the ability to scale that performance to very large databases with more connections is still an area of competition between database engine vendors.

Performance is enabled in the design by:

  • A well-designed schema with normalization and generalization, and correct handling of optional data.
  • Set-based queries implemented within a well-defined abstraction layer with stored procedures and views.
  • Correct clustered and non-clustered indexes that eliminate bookmark lookups for the most common queries.
  • Tight, fast transactions that reduce locking and blocking.

Extensibility

The information architecture principle states that the information must be readily available today and in the future, which requires that the data store is extensible, able to be easily adapted to meet new requirements. As an industry, data integrity, performance, and availability are all mature and well understood, so the next major hurdle for the industry to conquer is extensibility.

Extensibility is enabled in the design by:

  • Normalization and correct handling of optional data.
  • Generalization of entities when designing the schema.
  • Data-driven design designs that model not only the obvious data (e.g. orders, customers), but also enable the organization to store the behavioral patterns, or process flow.
  • A well-defined abstraction layer with stored procedures and views that de-couple the database from all client access including client apps, middle-tiers, ETL, and reports.
  • Extensibility is also closely related to simplicity. Complexity breeds complexity, and inhibits adaptation.

Availability

The availability of information refers to the information's accessibility when required regarding uptime time, locations, and the availability of the data for future analysis. Disaster recovery, redundancy, archiving, and network delivery all affect availability.

Security

The sixth database objective based on the Information Architecture Principle is security. For any organizational asset, the level of security must be secured depending on its value and sensitivity. For software, the security begins with the physical security of the data center and the operating system's security. Information security includes three additional components: restricting access to specific data using the database engine's security, identifying the owner of the information, and confirming the veracity of the data by identifying the source, including updates.

Published Sunday, November 25, 2007 11:16 AM by Paul Nielsen

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

I don't agree that performance or availability have anything to do with data or information architecture.  I think that these areas of architecture should be purely logical in focus, and not concerned with physical implementation details of indexes, file systems, networks, and other areas that are of interest when we worry about performance and HA.  Security might arguably also be considered an iffy area to deal with at the logical level, except perhaps if you're designing logical views that users with various levels of access might have over the database...

November 25, 2007 4:16 PM
 

David Portas said:

I would agree with Adam. It's extremely important always to distinguish between logical and physical issues.

November 26, 2007 6:03 PM
 

Shendil said:

Paul is talking about Database attribute not the logical model or architecture.  So it makes sense to mention security, performance and availability as there is clear separation of concern (between database, service and presentation) in terms of implementation.

May 15, 2008 6:09 PM
 

Jag said:

Hello,

A very helpful post for me, but could you help me with the following:

In four out of six attributes you mention 'correct handling of optional data', so I assume that is rather important, could you expand on this please, especially how it is relevant to the four attributes it is mentioned under?

thx

July 26, 2010 4:55 AM

Leave a Comment

(required) 
(required) 
Submit

About Paul Nielsen

Paul Nielsen believes SQL is the romance language of data. As such he’s a hands-on database developer, Microsoft SQL Server MVP, trainer, and author of SQL Server Bible series (Wiley). As a data architect, he developed the concepts of Smart Database Design and Nordic – an open source O/R dbms for SQL Server. He lives in Colorado Springs.

This Blog

Syndication

News

news item test
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement