THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Aaron Bertrand

Aaron is a senior consultant for SQL Sentry, Inc., makers of performance monitoring and event management software for SQL Server, Analysis Services, and Windows. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at user group meetings and SQL Saturday events.

The case against INFORMATION_SCHEMA views

In SQL Server 2000, INFORMATION_SCHEMA was the way I derived all of my metadata information - table names, procedure names, column names and data types, relationships... the list goes on and on. I used the system tables like sysindexes from time to time, but I tried to stay away from them when I could.

In SQL Server 2005, this all changed with the introduction of catalog views. For one thing, they're a lot easier to type. sys.tables vs. INFORMATION_SCHEMA.TABLES? Come on; no contest there - even on a case insensitive collation where you don't have to make those ugly upper-case words consistent. But further to that, I found them generally easier to use, in addition to including feature and implementation details that are not covered by the INFORMATION_SCHEMA views. A few examples:


IDENTITY columns

Here are the columns available in the INFORMATION_SCHEMA.COLUMNS view:

TABLE_CATALOG
TABLE_SCHEMA
TABLE_NAME
COLUMN_NAME
ORDINAL_POSITION
COLUMN_DEFAULT
IS_NULLABLE
DATA_TYPE
CHARACTER_MAXIMUM_LENGTH
CHARACTER_OCTET_LENGTH
NUMERIC_PRECISION
NUMERIC_PRECISION_RADIX
NUMERIC_SCALE
DATETIME_PRECISION
CHARACTER_SET_CATALOG
CHARACTER_SET_SCHEMA
CHARACTER_SET_NAME
COLLATION_CATALOG
COLLATION_SCHEMA
COLLATION_NAME
DOMAIN_CATALOG
DOMAIN_SCHEMA
DOMAIN_NAME

Nothing in there about IDENTITY properties. Whereas in sys.columns, we have the is_identity column, as well as the catalog view sys.identity_columns, with all kinds of additional information. Now, it should be no surprise that this data is missing from the INFORMATION_SCHEMA views, since IDENTITY columns are proprietary to SQL Server and aren't part of the standard. But if you need to retrieve metadata information, this will often include the need for information about IDENTITY columns, and if you're trying to avoid using the catalog views (or proprietary SQL Server functions like COLUMNPROPERTY), you'll be out of luck.


Indexes, partitions, statistics...

Since these are physical implementation details, there is no coverage anywhere in INFORMATION_SCHEMA. Go ahead, create an index (that is not also participating in a constraint), or partition a table, and try to find any evidence of either of these structures in any INFORMATION_SCHEMA view (including filtered indexes, XML indexes, spatial indexes, fulltext indexes and include columns). Also try to find out if a table has a clustered index, if a primary key is clustered or non-clustered, or if there any statistics available. This is all important data, and the person trying to find the information may not have any idea about the line between physical and logical, or why the "recommended" approach to obtaining metadata should treat them differently or ignore physical details altogether.


Non-standard Foreign Key constraints

SQL Server allows something that the standard doesn't allow: foreign key constraints referencing unique indexes. Here is an example:

CREATE TABLE dbo.FKExample
(
    
ColA INT,
    
ColB INT
);
CREATE UNIQUE INDEX UQ ON dbo.FKExample(ColA, ColB);

CREATE TABLE dbo.FKExampleRef
(
    
ColA INT,
    
ColB INT,
    
CONSTRAINT FK_UQ FOREIGN KEY(ColA, ColB)
        
REFERENCES dbo.FKExample(ColA, ColB)
);
Now to determine these relationships using INFORMATION_SCHEMA views, we'd normally perform a join between INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS, INFORMATION_SCHEMA.TABLE_CONSTRAINTS and INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE. But since the unique index is not a constraint, there is no data in INFORMATION_SCHEMA.TABLE_CONSTRAINTS, so no way to dig deeper and determine the columns used in the "constraint" without going out to catalog views.
SELECT CONSTRAINT_NAME, UNIQUE_CONSTRAINT_NAME 
  
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS;
CONSTRAINT_NAME UNIQUE_CONSTRAINT_NAME
--------------- ----------------------------
FK_UQ       UQ
SELECT CONSTRAINT_NAME
  
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
  
WHERE CONSTRAINT_NAME = 'UQ'
UNION ALL
SELECT CONSTRAINT_NAME
  
FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
  
WHERE CONSTRAINT_NAME = 'UQ';
-----------------
0 row(s) affected

Because this "constraint" was implemented only as an index, the INFORMATION_SCHEMA views do not contain any further information about it. But we can still get the information using sys.foreign_keys, sys.foreign_key_columns, and sys.columns, just like we would for a foreign key that referenced a unique constraint or a primary key constraint:

SELECT 
    
FK = fk.name,
    
FKTable = QUOTENAME(OBJECT_SCHEMA_NAME(fkcol.[object_id]))
        +
'.' + QUOTENAME(OBJECT_NAME(fkcol.[object_id])),
    
FKCol = fkcol.name,
    
' references => ',
    
PKTable = QUOTENAME(OBJECT_SCHEMA_NAME(pkcol.[object_id]))
        +
'.' + QUOTENAME(OBJECT_NAME(pkcol.[object_id])),
    
PKCol = pkcol.name
FROM sys.foreign_keys AS fk
INNER JOIN sys.foreign_key_columns AS fkc
ON fk.[object_id] = fkc.constraint_object_id
INNER JOIN sys.columns AS fkcol
ON fkc.parent_object_id = fkcol.[object_id]
AND fkc.parent_column_id = fkcol.column_id
INNER JOIN sys.columns AS pkcol
ON fkc.referenced_object_id = pkcol.[object_id]
AND fkc.referenced_column_id = pkcol.column_id
ORDER BY fkc.constraint_column_id;

Results:

 

If you were relying on a similar join within the INFORMATION_SCHEMA views, you would not realize that this relationship existed.


Computed columns

Computed columns are another area where the INFORMATION_SCHEMA views turn a blind eye. Consider the following simple example:

CREATE TABLE dbo.ComputedExample
(
    
a INT,
    
b AS (CONVERT(INT, a + 10))
);

Reviewing the list of INFORMATION_SCHEMA.COLUMNS columns listed above, or just running a query, you can see no place for the definition of the second column, or any discernible difference between the two columns - they're both integers and, apart from COLUMN_NAME and ORDINAL_POSITION, all the column values are identical. In sys.columns, we have a flag called is_computed, and we can retrieve the definition of the column from sys.computed_columns using the following query:

SELECT c.name, cc.[definition]
  
FROM sys.columns AS c
 
LEFT OUTER JOIN sys.computed_columns AS cc
  
ON c.[object_id] = cc.[object_id]
  
AND c.is_computed = 1
  
WHERE c.[object_id] = OBJECT_ID('dbo.ComputedExample')
  
ORDER BY c.column_id;

Results:

 

As far as the INFORMATION_SCHEMA views are concerned, b is just an ordinary column.


Sparse columns

Much like computed columns, there is no spot in the INFORMATION_SCHEMA.COLUMNS view to indicate whether a column is sparse or a column_set. In sys.columns you will find the columns is_sparse and is_column_set to quickly identify which columns in which tables have been defined as sparse and how the columns are used.


sys.sequences vs. INFORMATION_SCHEMA.SEQUENCES

SQL Server 2012 added INFORMATION_SCHEMA.SEQUENCES for the Sequence functionality they added. However, only the information covered by the standard is included in the view. If you want to determine if the Sequence is exhausted, for example, or the current value of the Sequence object, you won't find that from INFORMATION_SCHEMA.SEQUENCES; you'll have to go to sys.sequences anyway. Example:

CREATE SEQUENCE dbo.SequenceExample
    
AS INT
   
MINVALUE 1
   
MAXVALUE 65535
   
START WITH 1
   
CACHE 1000
   
CYCLE;
GO
SELECT NEXT VALUE FOR dbo.SequenceExample;
GO 3
SELECT current_value, is_exhausted
    
FROM sys.sequences
    
WHERE name = 'SequenceExample';

Results:

 


object_id is missing

As we know, much of SQL Server's internal mechanisms identify metadata by concepts such as object_id, which don't exist in the INFORMATION_SCHEMA views - they are entirely name-based. This makes joins much more cumbersome - instead of:

SELECT /* cols */
    
FROM sys.tables AS t
    
INNER JOIN sys.columns AS c
    
ON t.[object_id] = c.[object_id];

You have to say:

SELECT /* cols */
    
FROM INFORMATION_SCHEMA.TABLES AS t
    
INNER JOIN INFORMATION_SCHEMA.COLUMNS AS c
    
ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
    
AND t.TABLE_NAME = c.TABLE_NAME;

It doesn't look like that much more code, but combined with all the other extra typing you're doing, it will add up if you are writing a lot of metadata queries.


Alias type inconsistency

If you use alias types (in spite of my warnings about them), you'll notice some inconsistency. In INFORMATION_SCHEMA.COLUMNS and INFORMATION_SCHEMA.ROUTINE_COLUMNS, for example, the alias name is in a column called DOMAIN_NAME. In INFORMATION_SCHEMA.PARAMETERS, the column name is USER_DEFINED_TYPE_NAME. And in INFORMATION_SCHEMA.SEQUENCES the column is called DECLARED_DATA_TYPE. (With the catalog views, everything maps back to sys.types columns user_type_id and/or system_type_id.) Now, the view INFORMATION_SCHEMA.COLUMN_DOMAIN_USAGE has a row for every usage within tables, views and table-valued functions - however your joins or unions to pull this information are going to be quite convoluted if you have to look up how the column is referenced differently in every single view.


Other ramblings

Further to these things that I have observed, Microsoft seems to be going out of its way to steer you away from the INFORMATION_SCHEMA views as well. For example, http://msdn.microsoft.com/en-us/library/ms186224.aspx says:

"Do not use INFORMATION_SCHEMA views to determine the schema of an object. The only reliable way to find the schema of a object is to query the sys.objects catalog view."

The warning really should state that the INFORMATION_SCHEMA views are incomplete; the wording above implies that they might be incorrect, which in itself is incorrect. The INFORMATION_SCHEMA views in this case use the underlying information from sys.objects, so if one were to be incorrect, they'd both be incorrect. I complained about this here:

http://connect.microsoft.com/SQLServer/feedback/details/686118/doc-information-schema-tables-gives-improper-warning-about-schema

Also, http://msdn.microsoft.com/en-us/library/ms186778.aspx says:

"Some changes have been made to the information schema views that break backward compatibility. These changes are described in the topics for the specific views."  

...and...

“The information schema views included in SQL Server comply with the ISO standard definition for the INFORMATION_SCHEMA.”

It is important to note that this statement of compliance was true for ANSI/ISO SQL-92, but not for revisions that have been made to the standard since then. There are changes present in SQL:1999, for example, that have not been reflected in the SQL Server implementation. And as I've highlighted above, there are many features in SQL Server that fall outside of the SQL standard that also do not appear within the INFORMATION_SCHEMA views. I suspect that at some point Books Online will be updated to provide a general warning on all INFORMATION_SCHEMA topics that they are incomplete and that they should only be used for the simplest of metadata queries, if at all. I've asked for this warning as well:
http://connect.microsoft.com/SQLServer/feedback/details/686121/doc-information-schema-topics-should-warn-that-the-views-are-incomplete

Until then, I guess you just have to know what's missing from the INFORMATION_SCHEMA views, and be aware of when you can use them, and when you have to fall back to the catalog views to get a complete picture.

Or, do what I do, and avoid the INFORMATION_SCHEMA views altogether. This will make your programming more consistent; after all, why use the catalog views only when you HAVE to?

I know that for some of you this is counter-intuitive because you deal with other RDBMS platforms where you also use INFORMATION_SCHEMA. I don't really have a good answer for folks in that boat, except to be cognizant of the inconsistencies and omissions on each platform, and perhaps to resign yourself to learning the metadata approach relevant to each platform rather than try a one-size-fits-all approach.
 

Published Thursday, November 03, 2011 12:49 PM by AaronBertrand

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

bender said:

Amen.  They've thrown me for loops on numerous occasions.

November 3, 2011 5:52 PM
 

Something for the Weekend – SQL Server Links 04/11/11 said:

November 4, 2011 6:37 AM
 

Jānis said:

Is there way to find ordinal possition in other way than querying INFORMATION_SCHEMA.COLUMNS?

Select ORDINAL_POSITION

From INFORMATION_SCHEMA.COLUMNS

November 4, 2011 7:18 AM
 

AaronBertrand said:

Janis, you can get the same information from sys.columns.column_id. This column doubles as the column identifier and the ordinal position of the column.

November 4, 2011 8:34 AM
 

Jānis said:

Thanks! Didn't saw that before!

seems that column_id always grows, but can have gaps, if drop some column in middle of table.

November 4, 2011 9:51 AM
 

AaronBertrand said:

Yes, that is true. Like IDENTITY columns, the primary goal is not to represent a sequence without gaps, but rather just ensure uniqueness. If they went out of their way to prevent gaps, it would mean that every time you change a table they'd have to renumber the column_ids *and* go out and fix all the column_id references in all the other tables and views across the schema (think of all the column_id references used in indexes, foreign keys and other constraints, even dependency metadata).

If you want to present a gapless ordinal position list to users without revealing that the table has changed, provided that they weren't going to use that to do programmatic lookups in the metadata themselves, you could simply say ORDINAL_POSITION = ROW_NUMBER() OVER (ORDER BY column_id) ...

November 4, 2011 10:03 AM
 

Robert L Davis said:

I'm in the same boat. I used to use them extensively and now I almost never use them. For the same reasons you outlined. I believe that they are maintained for ANSI SQl compliance only and this brings up another question. Is there any real value in being ANSI SQL compliant (to a limited degree)? Should SQL Server just drop the ANSI compliance altogether?

November 4, 2011 2:46 PM
 

AaronBertrand said:

I like the ANSI compliance bit where it works and makes sense. For example, I'm glad that OFFSET was implemented in a standard way, unlike how MySQL did it. Microsoft is guilty of the same kind of thing over time (TIMESTAMP still bites us today, for example). I'm also glad that they're filling out the missing spots in the standard regarding window functions.

That said, I'm not a big fan of letting ANSI compliance hold us back. If there are features that the standard doesn't cover, Microsoft can find itself in a pickle - especially when the standard finally adds those things. Should it implement those changes now, and risk non-compliance later, or should it stifle functionality growth to ensure compliance? If the latter, how long do we wait for the standard to catch up?

I think there is a case for adhering to the standard where possible, and I would never advocate contradicting the standard. But I don't believe it will ever be practical to have 100% compliance, no matter how much resources you put behind the thing. You are always going to have areas that are not practical to implement, and you're always going to be adding or enhancing features beyond the standard.

November 4, 2011 3:03 PM
 

Aaron Bertrand said:

This month's T-SQL Tuesday is being hosted by Brad Schulz ( blog ) and the topic is one that should attract

November 8, 2011 2:19 AM
 

Chris Anderson said:

Excellent coverage.  A good guideline can be if a query can be, or 'should be' scoped to only the logical design of the database, only the information_schema views should be used.  You then limit yourself to objects that affect the logical structure, and should have some reasonable level of expectation of being able to share that query across database platfroms.   If the query involves the physical design as well, then go for it using the catalog views, and don't have any expectation of being to share the exact code across platforms, only the concept.  

A salient point is speaking about 'indexes':

"Go ahead, create an index (that is not also participating in a constraint), or partition a table, and try to find any evidence of either of these structures in any INFORMATION_SCHEMA view".  Exactly!  the index doesn't affect the logical design.  In practice, a few blurry edges - that whole 'unique index' throws me for a loop as it is a physical design consideration that has logical impacts.  Would it be good practice to have a corresponding unique constraint to any unique indexes?

January 11, 2012 4:21 PM
 

AaronBertrand said:

Thanks Chris, just one counter-point, the logical argument rarely comes into play in my experience. In customer environments if I'm looking at schema it involves both. Even if there is an odd case where I'm only concerned about logical constructs, why should I change my programming methodology for just those cases, when the catalog views *also* completely cover the logical design?

January 11, 2012 4:27 PM
 

Mike said:

Thanks for this Aaron.

I've always been confused by which to use where.

The templates that come by default in SSMS use INFORMATION_SCHEMA when checking the existence of an object (see Drop/Create Stored Procedure templates), but then go and use the statement you outlined above about not using INFORMATION_SCHEMA to determine the schema of an object.

I never got that one.

February 20, 2012 2:42 PM
 

Why I am switching from Google to Bing | the musings of a bertrand… said:

February 29, 2012 2:55 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About AaronBertrand

...about me...

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement