|
|
|
|
Aaron is a senior consultant for SQL Sentry, Inc., makers of performance monitoring and event management software for SQL Server, Analysis Services, and Windows. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at user group meetings and SQL Saturday events.
Today on the twitter hash tag #sqlhelp, @leenux_tux asked:
|
How can I figure out the percentage of fields that don't have data ? |
After further clarification, it turns out he is after what proportion of columns are NULL.
Some folks suggested using a data profiling task in SSIS. There may be some validity to that, but I'm still a fan of sticking to T-SQL when I can, so here is how I would approach it:
- Create a #temp table or @table variable to store the results.
- Create a cursor that loops through all of the columns for all tables, including total row count for the table, and whether or not the column is in fact nullable.
- Within the cursor, for columns that are nullable, get a count where columns are NULL, and insert the data for each column into the #temp table or @table variable.
- I think it's ok here to use WITH (NOLOCK) to obtain the count, since you are likely only looking for estimates
- Run queries against the results, depending on exactly what you are after - percentage for certain columns, percentage across the database, what have you.
Some sample code, though depending on your tables and data distribution I am not going to vouch for how long this will take - may be a good idea to restore a backup somewhere and run this kind of thing there, as opposed to your production system:
DECLARE @all_columns TABLE ( [table] NVARCHAR(512), [column] SYSNAME, nullable INT, row_count BIGINT, null_count BIGINT );
DECLARE @sql NVARCHAR(MAX), @table NVARCHAR(512), @column SYSNAME, @nullable BIT, @row_count INT, @null_count INT;
DECLARE c CURSOR LOCAL STATIC FORWARD_ONLY READ_ONLY FOR SELECT QUOTENAME(OBJECT_SCHEMA_NAME(c.[object_id])) + '.' + QUOTENAME(OBJECT_NAME(c.[object_id])), c.name, c.is_nullable, p.row_count FROM sys.columns AS c INNER JOIN ( SELECT [object_id], row_count = SUM(row_count) FROM sys.dm_db_partition_stats WHERE index_id IN (0,1) GROUP BY [object_id] ) AS p ON c.[object_id] = p.[object_id] WHERE OBJECTPROPERTY(c.[object_id], 'IsMsShipped') = 0;
OPEN c;
FETCH NEXT FROM c INTO @table, @column, @nullable, @row_count;
WHILE @@FETCH_STATUS = 0 BEGIN SET @null_count = 0;
IF @nullable = 1 AND @row_count > 0 BEGIN SET @sql = N'SELECT @null_count = COUNT(*) FROM ' + @table + ' WITH (NOLOCK) WHERE ' + QUOTENAME(@column) + ' IS NULL;';
EXEC sp_executesql @query = @sql, @params = N'@null_count INT OUTPUT', @null_count = @null_count OUTPUT; END
INSERT @all_columns([table], [column], nullable, row_count, null_count) SELECT @table, @column, @nullable, @row_count, @null_count;
FETCH NEXT FROM c INTO @table, @column, @nullable, @row_count; END
CLOSE c; DEALLOCATE c;
SELECT [table], [column], nullable, null_count, row_count, null_perc = CONVERT(DECIMAL(5,2), 100.0*null_count/NULLIF(row_count,0)) FROM @all_columns ORDER BY [table], [column];
|
Hope that helps. Even though it uses NOLOCK, dynamic SQL, and a cursor. I'm such a naughty database guy.
Comment Notification
If you would like to receive an email when updates are made to this post, please register here
Subscribe to this post's comments using
About AaronBertrand
...about me...
|
|
|
|
|