I recently received a question about the memory required to run a Process
Defrag on a Tabular model in Analysis Services. The Process Defrag is useful
when you run incremental processing of a table frequently, or when some of the
values in the dictionary are no longer used in the table, for example if you
process the same partition multiple times in a table and/or remove partitions
from a table. Cathy Dumas wrote an interesting blog
post about the savings you can obtain by running process defrag.
I made some investigation and I’ve been assured that data is not completely
uncompressed in this process, even if some coding/encoding happens. In a rough
estimate, you need a free space equivalent to the size of the table you are
going to defrag (which is already in memory) plus buffer for transient data
structure, but data are managed in a compressed form, without requiring larger
memory buffers for uncompressed raw data. So, if all partitions of your table
and the column dictionaries require 100MB in RAM, then you need another 100MB of free RAM in order to
execute a Process Defrag.
Here is a more detailed description I received from Akshai Mirchandani:
In addition to the master copies, it requires enough memory for a new dictionary, and the final compressed data + some small temporary buffer space for the transient data structures (no big buffers like normal processing).
It is essentially going to read each column value, insert it into a new dictionary, get a new DataID back from the dictionary insert, and append that DataID to the current segment. It doesn’t need to do VertiPaq (compression) again, and it doesn’t keep the uncompressed data in buffers like the data processing algorithm does.
The important point here is that the analysis of the segment data to come up
with the best compression strategy no longer needs to be performed – and that’s
typically the most expensive step of the compression (the VERTIPAQ_STATE in
DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS shows whether this was done).
This is a good news if you are concerned with memory required to perform this operation.