Partitioning for Fun and Profit
Over the past year or so I’ve been playing with a toy SQL monitoring application, just to sharpen my dev skills a bit. Part of the app is a SQL database to store historical performance data, in order to do trending. Fooling with the design of the performance database allowed me to look at EAV (Entity Attribute Value) versus other patterns and do some real performance measurements.
I went into this design with some preconceptions about EAV:
- I figured EAV could be a good fit for this problem, given that one would not know which of the hundreds of available performance counters the app would track until run time. At design time, all I know is that the database has to store some unknown, user-defined collection of performance counters. It seems like the classic EAV problem of run time, user-defined attributes.
- I predicted that EAV would be more convenient to implement than a dynamic / multiple table design because the schema would not change at run time. The question is, how much more convenient? Conversely, how much effort and risk is there in extending the schema at run time?
- I thought there would be tradeoffs, where EAV is bigger and slower than having tables built out, especially when it comes to reporting queries. Generally it's possible to boost database performance when you can partition sub-sets of rows into physically separate structures, rather than having them interleaved together in one huge table. The questions here are, how much bigger, and specifically which kinds of queries are slower? Are there concrete rules that can indicate when EAV will work well or not?
To set the stage for this, I am planning to store the performance data in a way that aligns with how Windows performance counters are structured: there’s a tree of counters labeled performance objects / counters / instances in perfmon or counter categories / counters / instance names in C#. It’s pretty easy to dump out a tabular list of available counters with a simple C# or PowerShell snippet:
using System;
using System.Text;
using System.Diagnostics;
namespace ListPerfCounters
{
class Program
{
static void Main(string[] args)
{
using (System.IO.StreamWriter outfile
= new System.IO.StreamWriter("c:\\CounterList.txt"))
{
PerformanceCounterCategory[] cats
= PerformanceCounterCategory.GetCategories();
foreach (PerformanceCounterCategory cat in cats)
{
// Exclude category "Thread" to eliminate an insane number of counters:
if (cat.CategoryName != "Thread")
{
// Single instance categories get labled as instance = "Default":
if (cat.CategoryType == PerformanceCounterCategoryType.SingleInstance)
{
PerformanceCounter[] counters = cat.GetCounters();
foreach (PerformanceCounter counter in counters)
{
outfile.WriteLine(cat.CategoryName + "\t"
+ counter.CounterName
+ "\tDEFAULT");
}
}
// Multi-instance categories are listed out per instance:
else if (cat.CategoryType == PerformanceCounterCategoryType.MultiInstance)
{
String[] instances = cat.GetInstanceNames();
foreach (String instance in instances)
{
PerformanceCounter[] counts = cat.GetCounters(instance);
foreach (PerformanceCounter counter in counts)
{
outfile.WriteLine(cat.CategoryName + "\t"
+ counter.CounterName + "\t"
+ instance);
}
}
}
//Ignore CategoryType = unknown
else
{
// do nothing
}
}
}
}
Console.WriteLine("Press the 'Any' key...");
Console.ReadKey();
}
}
}
The output is a tab-delimited list of most available counters:
|
Processor |
% Processor Time |
_Total |
|
Processor |
% User Time |
_Total |
|
Processor |
% Privileged Time |
_Total |
|
Processor |
Interrupts/sec |
_Total |
|
Processor |
% DPC Time |
_Total |
|
Processor |
% Interrupt Time |
_Total |
|
Processor |
DPCs Queued/sec |
_Total |
|
Processor |
DPC Rate |
_Total |
|
Processor |
% Idle Time |
_Total |
|
Processor |
% C1 Time |
_Total |
|
Processor |
% C2 Time |
_Total |
|
Processor |
% C3 Time |
_Total |
|
Processor |
C1 Transitions/sec |
_Total |
|
Processor |
C2 Transitions/sec |
_Total |
|
Processor |
C3 Transitions/sec |
_Total |
|
Processor |
% Processor Time |
0 |
|
Processor |
% User Time |
0 |
|
Processor |
% Privileged Time |
0 |
|
Processor |
Interrupts/sec |
0 |
|
Processor |
% DPC Time |
0 |
|
Processor |
% Interrupt Time |
0 |
|
Processor |
DPCs Queued/sec |
0 |
|
Processor |
DPC Rate |
0 |
|
Processor |
% Idle Time |
0 |
|
Processor |
% C1 Time |
0 |
|
Processor |
% C2 Time |
0 |
|
… etc … |
|
|
First the EAV schema
The EAV pattern essentially says that since we don’t know what attributes (what specific performance counters, in this case) need to be stored, we store the data in labeled rows, instead of in columns – rows being really easy to change at run time, while columns are not. The simplest implementation of this is with two tables, one small table of monitored servers, and one huge table of labeled performance counter values. If a “sample” is defined as a set of specific counter values at one point in time, then this big table will have a set of rows for each server, for each time point, where each row is labeled per performance counter:
CREATE DATABASE [CountersEAV]
GO
ALTER DATABASE [CountersEAV] SET RECOVERY SIMPLE
GO
USE [CountersEAV]
GO
CREATE TABLE dbo.[Servers](
ServerID int IDENTITY(1,1) NOT NULL,
ServerName varchar(128) NOT NULL,
[Description] varchar(255) NOT NULL,
CONSTRAINT PK_Servers PRIMARY KEY CLUSTERED ( ServerID ASC )
);
GO
CREATE UNIQUE NONCLUSTERED INDEX UniqueServerNames ON dbo.[Servers]
(
ServerName ASC
);
GO
CREATE TABLE dbo.CounterSamples(
SampleTime datetime NOT NULL,
ServerID int NOT NULL,
PerfObject nvarchar(30) NOT NULL,
Counter nvarchar(50) NOT NULL,
Instance nvarchar(30) NOT NULL,
Value sql_variant NOT NULL,
CONSTRAINT PK_CounterSamples PRIMARY KEY CLUSTERED
(
SampleTime ASC,
PerfObject ASC,
[Counter] ASC,
Instance ASC,
ServerID ASC
)
);
GO
ALTER TABLE dbo.CounterSamples
WITH CHECK ADD CONSTRAINT FK_CounterSamples_Servers FOREIGN KEY( ServerID )
REFERENCES dbo.[Servers] ( ServerID );
GO
ALTER TABLE dbo.CounterSamples CHECK CONSTRAINT FK_CounterSamples_Servers;
GO
Inserting the counter data into this table is trivial:
CREATE PROCEDURE saveCounterSample (
@SampleTime datetime,
@ServerID int,
@PerfObject nvarchar(30),
@Counter nvarchar(50),
@Instance nvarchar(30),
@Value float
) AS
BEGIN
SET NOCOUNT ON;
INSERT INTO dbo.CounterSamples (
SampleTime,
ServerID,
PerfObject,
[Counter],
Instance,
Value
) VALUES (
@SampleTime,
@ServerID,
@PerfObject,
@Counter,
@Instance,
@Value
);
END
GO
Next, EAV Sample Data
In order to see how this will perform at scale, we need to populate it with a lot of sample data, certainly more than will fit in memory. I’ve got a script that can do this by just creating random values for a reasonable set of different counters, in bulk:
INSERT INTO dbo.[Servers] ( ServerName, [Description] )
VALUES ( 'MyServer', 'Test Server' );
INSERT INTO dbo.[Servers] ( ServerName, [Description] )
VALUES ( 'BigServer','Big One' )
INSERT INTO dbo.[Servers] ( ServerName ,[Description] )
VALUES ( 'ProdServer' ,'Some Server' )
INSERT INTO dbo.[Servers] ( ServerName ,[Description] )
VALUES ( 'SQLProd01' ,'Yet Another Server' )
INSERT INTO dbo.[Servers] ( ServerName ,[Description] )
VALUES ( 'SQLProd02' ,'Still another' )
GO
Sample counter values (note: this makes a decent quantity of data, four weeks of one minute sample intervals for 194 counters on five imaginary servers 4 * 24 * 60 * 5 * 194 ~= 39,000,000 rows)
-- Populate with test data
-- Create rows for many counters representing a 24 hour period
DECLARE @now datetime;
SET @now = '2009-01-01';
DECLARE @val float;
WHILE @now < '2009-01-02' BEGIN
BEGIN TRANSACTION;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Memory', N'Pages/sec', N'Default', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'PhysicalDisk', N'Avg. Disk Queue Length', N'_Total', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'_Total', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'0', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'1', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'2', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'3', @val;
/* … repeat for many sample counters … */
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'MSSQL$TEST2000:General Statistics', N'User Connections', N'DEFAULT', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'MSSQL$TEST2000:Locks', N'Average Wait Time (ms)', N'_Total', @val;
COMMIT;
SET @now = DATEADD( minute, 1, @now );
END;
GO
-- Use the first day's values and a random multiplier to mock test data for a month:
DECLARE @i int;
SET @i = 1;
WHILE @i < 28 BEGIN
INSERT INTO dbo.CounterSamples (
SampleTime,
ServerID,
PerfObject,
[Counter],
Instance,
Value
) SELECT
DATEADD( DAY, @i, SampleTime ) as SampleTime,
ServerID,
PerfObject,
[Counter],
Instance,
CAST ( Value as float ) * ( RAND() * 1.5 ) as Value
FROM dbo.CounterSamples
WHERE '2009-01-01' <= SampleTime and SampleTime < '2009-01-02';
SET @i += 1;
END;
GO
-- Use the whole month's data and a random multiplier to generate mock data for more servers
DECLARE @i int;
SET @i = 2;
WHILE @i <= 5 BEGIN
INSERT INTO dbo.CounterSamples (
SampleTime,
ServerID,
PerfObject,
[Counter],
Instance,
Value
) SELECT
SampleTime,
@i as ServerID,
PerfObject,
[Counter],
Instance,
CAST ( Value as float ) * ( RAND() * 1.5 ) as Value
FROM dbo.CounterSamples
WHERE ServerID = 1;
SET @i += 1;
END;
GO
Index rebuild at the end, as the table is almost certainly highly fragged.
This design does function, but there are some challenges:
- The CounterSamples table is obviously huge, and highly repetitive because of the row labels.
- A query is most likely to “want” the data from one counter or another, as separate sets, but practically never from a mixture of different counters. Directly aggregating the “total % processor used” values with, for example, the “percent log used” values makes no sense, because the counters measure two unrelated metrics. So it seems like we have rows mixed together that most queries will want to separate.
Counterpoint: Dynamic Table Solution
Another possible schema is driven by the notion that at query time this data “wants” to be partitioned by counter, so all the values for each specific counter are in one place. Since we don’t know the counter list until run time, it means we have to code around the fact that tables need to be added dynamically to the schema. This does add to the complexity of the code, while possibly paying dividends in storage requirement and performance.
First, we can use the same Servers table as in the first example, so assume we have that one. Next, we will need to be able to dynamically generate a table to store counter samples for any specific performance counter. A template for such a table might look like this:
CREATE TABLE dbo.[LogicalDisk_% Disk Read Time__Total](
SampleTime datetime NOT NULL,
ServerID int NOT NULL,
Value float NOT NULL,
CONSTRAINT [PK_LogicalDisk_% Disk Read Time__Total] PRIMARY KEY CLUSTERED
(
SampleTime ASC,
ServerID ASC
)
);
GO
ALTER TABLE [dbo].[LogicalDisk_% Disk Read Time__Total]
WITH CHECK ADD CONSTRAINT [FK_LogicalDisk_% Disk Read Time__Total_Servers] FOREIGN KEY([ServerID])
REFERENCES dbo.[Servers] ([ServerID])
GO
ALTER TABLE dbo.[LogicalDisk_% Disk Read Time__Total] CHECK CONSTRAINT [FK_LogicalDisk_% Disk Read Time__Total_Servers]
GO
But, code has to be written to take a performance counter and generate a table like this at run time. In a “real” scenario, it probably makes sense to do that client side, but to mock this up I am going to make a fake, dynamic SQL driven stored procedure that will generate these tables:
CREATE PROCEDURE addCounterTable (
@PerfObject nvarchar(30),
@Counter nvarchar(50),
@Instance nvarchar(30)
) AS
BEGIN
DECLARE @tableName varchar(128);
SET @tableName = @PerfObject + '_' + @Counter + '_' + @Instance;
IF NOT EXISTS (
select * from INFORMATION_SCHEMA.TABLES
where TABLE_SCHEMA = 'dbo'
and TABLE_NAME = @tableName
) BEGIN
-- Requested table is missing, so create it
DECLARE @qtablename nvarchar(128)
DECLARE @qpkname nvarchar(128)
DECLARE @qfkname nvarchar(128)
SET @qtablename = QUOTENAME( @tableName );
SET @qpkname = QUOTENAME( 'PK_' + @tableName );
SET @qfkname = QUOTENAME( 'FK_' + @tablename + '_Servers' );
declare @createStmts nvarchar(max);
set @createStmts =
'CREATE TABLE dbo.' + @qtablename + '( ' + CHAR(10)
+ ' SampleTime datetime NOT NULL, ' + CHAR(10)
+ ' ServerID int NOT NULL, ' + CHAR(10)
+ ' Value float NOT NULL, ' + CHAR(10)
+ 'CONSTRAINT ' + @qpkname + ' PRIMARY KEY CLUSTERED ' + CHAR(10)
+ ' ( ' + CHAR(10)
+ ' [SampleTime] ASC, ' + CHAR(10)
+ ' [ServerID] ASC ' + CHAR(10)
+ ' ) ' + CHAR(10)
+'); ' + CHAR(10)
+ 'ALTER TABLE dbo.' + @qtablename + ' ' + CHAR(10)
+ 'WITH CHECK ADD CONSTRAINT ' + @qfkname + ' ' + CHAR(10)
+ 'FOREIGN KEY( ServerID ) ' + CHAR(10)
+ 'REFERENCES dbo.Servers ( ServerID ); ' + CHAR(10)
+ 'ALTER TABLE dbo.' + @qtablename + ' ' + CHAR(10)
+ 'CHECK CONSTRAINT ' + @qfkname + '; ' + CHAR(10)
-- PRINT @createStmts
EXEC( @createStmts )
END
END
GO
Similarly, client-side code would have to be written to insert counter samples into the database by intelligently using the correct table for each counter. Technically this isn’t very difficult, as one could put a check for a matching table into the constructor of a client-side object, which would validate that an appropriate table exists (or create one), and then use that table from that point on. I can also mock this in T-SQL with a dynamic SQL stored procedure, though I would not implement it this way on a real system:
CREATE PROCEDURE saveCounterSample (
@SampleTime datetime,
@ServerID int,
@PerfObject nvarchar(30),
@Counter nvarchar(50),
@Instance nvarchar(30),
@Value float
) AS
BEGIN
SET NOCOUNT ON;
DECLARE @tableName varchar(128);
SET @tableName = @PerfObject + '_' + @Counter + '_' + @Instance;
DECLARE @checkedTableName nvarchar(128);
SELECT @checkedTableName = TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'dbo'
and TABLE_NAME = @tableName;
IF( @checkedTableName is not null )
BEGIN
DECLARE @insertcmd nvarchar(max);
SET @insertcmd =
N'INSERT INTO dbo.' + QUOTENAME( @checkedTableName ) + N' ( ' + char(10)
+ N' SampleTime, ServerID, Value ' + char(10)
+ N') VALUES (' + char(10)
+ N' @SampleTime, @ServerID, @Value ' + char(10)
+ N');';
DECLARE @paramlist nvarchar(1000);
SET @paramlist = N'@SampleTime datetime, @ServerID int, @value float';
--PRINT @insertcmd;
EXECUTE sp_executesql
@insertcmd,
@paramlist,
@sampleTime = @sampleTime,
@serverID = @ServerID,
@Value = @Value;
END
ELSE
BEGIN
RAISERROR('The required table does not exist.', 10, 1);
END
END
GO
Dynamic Tables Sample Data
So, in a real app the tables would be created on demand. In order to load sample data, though, it’s simpler just to initialize the database with the whole sample set of tables:
-- Setup for Sample Data
EXEC addCounterTable N'Memory', N'Pages/sec', N'Default'
EXEC addCounterTable N'PhysicalDisk', N'Avg. Disk Queue Length', N'_Total'
EXEC addCounterTable N'Processor', N'% Processor Time', N'_Total'
EXEC addCounterTable N'Processor', N'% Processor Time', N'0'
EXEC addCounterTable N'Processor', N'% Processor Time', N'1'
EXEC addCounterTable N'Processor', N'% Processor Time', N'2'
EXEC addCounterTable N'Processor', N'% Processor Time', N'3'
EXEC addCounterTable N'Processor', N'% Processor Time', N'4'
EXEC addCounterTable N'Processor', N'% Processor Time', N'5'
EXEC addCounterTable N'Processor', N'% Processor Time', N'6'
EXEC addCounterTable N'Processor', N'% Processor Time', N'7'
EXEC addCounterTable N'Process', N'% Processor Time', N'sqlservr'
EXEC addCounterTable N'LogicalDisk', N'% Disk Read Time', N'_Total'
EXEC addCounterTable N'LogicalDisk', N'% Disk Time', N'_Total'
/* ... More executions to make req’d tables ... */
EXEC addCounterTable N'MSSQL$TEST2000:Locks', N'Average Wait Time (ms)', N'_Total'
Then generate the same quantity of sample data as in the first example:
-- Populate with Sample Data
-- Create 24 hours of samples
DECLARE @now datetime;
SET @now = '2009-01-01';
DECLARE @val float;
WHILE @now < '2009-01-02'
BEGIN
BEGIN TRANSACTION
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Memory', N'Pages/sec', N'Default', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'PhysicalDisk', N'Avg. Disk Queue Length', N'_Total', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'_Total', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'0', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'1', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'2', @val;
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'Processor', N'% Processor Time', N'3', @val;
/* ... More executions for all counters ... */
SET @val = RAND() * 10;
EXEC saveCounterSample @now, 1, N'MSSQL$TEST2000:Locks', N'Average Wait Time (ms)', N'_Total', @val;
COMMIT
SET @now = DATEADD( minute, 1, @now );
END
-- Use the first day's values and a random multiplier to mock test data for a month:
DECLARE @i int;
SET @i = 1;
WHILE @i < 28 BEGIN
DECLARE @insertstmt nvarchar(max);
SET @insertstmt = '';
SELECT @insertstmt = @insertstmt +
'insert into dbo.' + QUOTENAME( table_name ) + '( '
+ 'SampleTime, '
+ 'ServerID, '
+ 'Value '
+ ') select dateadd( day, ' + cast(@i as varchar(5) ) + ', SampleTime) as SampleTime, '
+ 'ServerID, '
+ 'Value * ( RAND() * 1.5 ) as Value '
+'from dbo.' + QUOTENAME( table_name ) + ' '
+ 'where ''2009-01-01'' <= SampleTime and SampleTime < ''2009-01-02'';' + CHAR(10)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME != 'Servers';
-- print @insertstmt;
EXEC( @insertstmt );
SET @i += 1;
END
GO
-- Use the month's data and a random multiplier to generate mock data for more servers
DECLARE @i INT;
SET @i = 2;
WHILE @i <= 5 BEGIN
DECLARE @insertstmt NVARCHAR(max);
SET @insertstmt = '';
SELECT @insertstmt = @insertstmt +
'INSERT INTO dbo.' + QUOTENAME( table_name ) + '( '
+ ' SampleTime, '
+ ' ServerID, '
+ ' Value '
+ ') SELECT SampleTime, '
+ cast( @i as varchar(5) ) + ' as ServerID, '
+ ' Value * ( RAND() * 1.5 ) as Value '
+ 'FROM dbo.' + QUOTENAME( table_name ) + ' '
+ 'WHERE ServerID = 1;' + CHAR(10)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME != 'Servers';
-- PRINT @insertstmt;
EXEC( @insertstmt );
SET @i += 1;
END;
First Impressions
The first thing I note when looking at the sample data is the size discrepancy: the first model requires 5.5 GB to store the counter data, where the second fits in 1.2 GB. Most of that difference has to do with the fact that the EAV “labels” for this case are quite wide, so there’s a lot of repeated text in the EAV model.
The second thing of note is the interleave of different types of perf counters in the one large table of samples, which is what we’re going to examine for performance. This depends some on the order of the cluster key, but essentially it looks like this:
|
2009-01-01 00:00:00.000 |
1 |
LogicalDisk |
% Disk Read Time |
_Total |
1.8793138675398 |
|
2009-01-01 00:00:00.000 |
2 |
LogicalDisk |
% Disk Read Time |
_Total |
2.52991540434083 |
|
2009-01-01 00:00:00.000 |
3 |
LogicalDisk |
% Disk Read Time |
_Total |
1.22021347428242 |
|
2009-01-01 00:00:00.000 |
4 |
LogicalDisk |
% Disk Read Time |
_Total |
0.696864840326046 |
|
2009-01-01 00:00:00.000 |
5 |
LogicalDisk |
% Disk Read Time |
_Total |
0.645544958044297 |
|
2009-01-01 00:00:00.000 |
1 |
LogicalDisk |
% Disk Read Time |
C: |
2.26820505352887 |
|
2009-01-01 00:00:00.000 |
2 |
LogicalDisk |
% Disk Read Time |
C: |
3.05343721676384 |
|
2009-01-01 00:00:00.000 |
3 |
LogicalDisk |
% Disk Read Time |
C: |
1.47271534391143 |
|
2009-01-01 00:00:00.000 |
4 |
LogicalDisk |
% Disk Read Time |
C: |
0.841068849517578 |
|
2009-01-01 00:00:00.000 |
5 |
LogicalDisk |
% Disk Read Time |
C: |
0.779129213808746 |
|
< … 100's of rows … > |
|
|
|
|
|
|
2009-01-01 00:01:00.000 |
1 |
LogicalDisk |
% Disk Read Time |
_Total |
7.72396922675191 |
|
2009-01-01 00:01:00.000 |
2 |
LogicalDisk |
% Disk Read Time |
_Total |
10.3979378149299 |
|
2009-01-01 00:01:00.000 |
3 |
LogicalDisk |
% Disk Read Time |
_Total |
5.01507038723846 |
|
2009-01-01 00:01:00.000 |
4 |
LogicalDisk |
% Disk Read Time |
_Total |
2.86411050056799 |
|
2009-01-01 00:01:00.000 |
5 |
LogicalDisk |
% Disk Read Time |
_Total |
2.65318607846297 |
|
2009-01-01 00:01:00.000 |
1 |
LogicalDisk |
% Disk Read Time |
C: |
9.49033453385117 |
|
< … &c > |
|
|
|
|
|
Rows that I imagine one query “cares” about (green) are dispersed through the table, with a lot of other rows in between (gray).
So, tune in for the next post where I’ll have report query performance information, and a look at the buffer cache contents when reading from these two designs.