
SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Memory Considerations about PowerPivot for Excel

PowerPivot for Excel is an add-in that uses a local version of Analysis Services (SSAS) to process data and make calculations in response to user queries made in Excel 2010. The SSAS engine is loaded in-memory within the Excel process. Especially with the 32-bit version of Excel, there are particular considerations about memory usage.

When an Excel file containing PowerPivot data is opened, memory consumption is not affected by the volume of PowerPivot data. That data is loaded only when the PowerPivot window is opened or when PivotTables and/or PivotCharts based on PowerPivot data are updated.

Loading PowerPivot data into Excel requires two steps:

  1. The SSAS backup contained in the Excel file is restored into a temporary folder (a folder named with an IMBI_ prefix is created in the C:\Users\<<USERNAME>>\AppData\Local\Temp directory). This requires enough disk space to restore these files, which are usually 2 to 3 times the size of the Excel file.
  2. The SSAS database, once decompressed, is also loaded into virtual memory. Thus, as a rule of thumb, loading and browsing the data of an Excel file containing PowerPivot data requires as much memory as 2 to 3 times the size of the Excel file.

The memory available for PowerPivot in a 32-bit version of Excel is just above 1 GB (it is lower than the virtual addressable space). For this reason, it is not so hard to get this error while processing a large set of data:

Memory error: Allocation failure : Not enough storage is available to process this command. .

The operation has been cancelled.

The best way to avoid this error is to use the 64-bit version of Excel. However, there are many reasons for not having the 64-bit version of Excel even if the operating system is a 64-bit one. The main reason, in my personal case, is the use of Outlook add-ins that are not available for the 64-bit version, and it is not possible to mix 32-bit and 64-bit Office products on the same machine. Therefore, it is a good idea to try to optimize the PowerPivot data model in a way that makes better use of the available memory.

Reduce the number of rows

This is probably a useless suggestion. Reducing the number of rows also reduces the size of the database. If you are extracting data from a database and you don't need the full detail during the analysis, you might import data at the cardinality really required by your analysis. For example: if a table contains all the sales transactions of a retail store, but you only need to analyze data at the month level, you might group data by month, reducing the number of rows to be loaded. Most of the time this optimization is not possible, because the user doesn't want to lose the granularity of analysis.
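
When grouping is possible, a rough sketch of this approach is an import query that groups transactions by month (the table and column names here, such as Sales, SaleDate, StoreID and Amount, are hypothetical):

    -- Import one row per store and month instead of one row per transaction.
    SELECT
        StoreID,
        YEAR(SaleDate)  AS SaleYear,
        MONTH(SaleDate) AS SaleMonth,
        SUM(Amount)     AS SalesAmount,
        COUNT(*)        AS TransactionCount
    FROM Sales
    GROUP BY StoreID, YEAR(SaleDate), MONTH(SaleDate);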

Reduce the number of columns

This is by far the most important suggestion. If you don’t need a column, don’t import it!

PowerPivot uses a column-oriented database technology. Each column has its own storage and indexing structures. Each imported column increases the processing time and the memory required for both processing and querying.

An important concept is that every imported column matters. Even though PowerPivot internally uses an Analysis Services engine, it is a particular (in-memory) type that doesn't distinguish between quantitative measures and qualitative attributes. Thus, a SalesQuantity column is considered both a dimension attribute and a measure.
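
As a minimal example of the "don't import it" rule (the SalesTransactions table and its columns are hypothetical names), the import query should list only the columns really needed for the analysis, instead of selecting everything:

    -- Avoid SELECT *: every imported column gets its own storage
    -- and increases both processing time and memory consumption.
    SELECT
        SaleDate,
        ProductID,
        StoreID,
        SalesQuantity,
        SalesAmount
    FROM SalesTransactions;
    -- Notes, audit timestamps and other columns that nobody will
    -- analyze are simply left out of the query.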

Optimize column data types

Not every column has the same weight in terms of memory and processing time. A column with few distinct values is lighter than a column with a high number of distinct values. As we said, this is important also for measures, which are also treated as attributes. If the measure you are storing is a floating-point number and is the result of a calculation, you might consider reducing the number of digits to be imported. This reduces the size of the dictionary, and possibly also the number of distinct values.
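
A sketch of this idea applied to the import query (again assuming a hypothetical SalesTransactions table, where LineAmount is the result of a calculation): rounding the value before the import reduces the number of distinct values that end up in the dictionary.

    -- Rounding a calculated float to two decimal places reduces
    -- the number of distinct values stored for the column.
    SELECT
        SaleDate,
        ProductID,
        ROUND(LineAmount, 2) AS LineAmount
    FROM SalesTransactions;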

Columns with string data types

PowerPivot stores a dictionary of all the unique strings in each attribute. Thus, having a long string is not a problem by itself, nor are a few very long strings if the number of distinct values is low. The point is that the average length of the strings in an attribute, multiplied by the number of unique values of the attribute, is a measure of the size required to store the dictionary. For this reason, importing a long description that is different for every transaction is not a good idea.
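
To get a rough idea of this cost before importing a string column, a query like the following can estimate the dictionary size as average length times number of distinct values (the SalesTransactions table and the Notes column are hypothetical names):

    -- Rough estimate of the dictionary size of a string column:
    -- average length (in characters) multiplied by the number of distinct values.
    SELECT
        AVG(CAST(LEN(Notes) AS float))  AS AvgLength,
        COUNT(DISTINCT Notes)           AS DistinctValues,
        AVG(CAST(LEN(Notes) AS float))
            * COUNT(DISTINCT Notes)     AS EstimatedDictionaryChars
    FROM SalesTransactions;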

Avoid high-cardinality columns

Columns that contain a very high number of distinct values are very expensive for PowerPivot. For example, the Invoice ID in a Sales Transactions table is very useful for the end user when it is necessary to drill down to the transaction level. However, you have to be aware that this single column might be the most expensive one in your PowerPivot dataset, especially during the processing phase.
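
Before deciding whether to import such a column, it can be useful to compare its cardinality with the total number of rows; a sketch, assuming a hypothetical SalesTransactions table:

    -- Compare the cardinality of candidate columns with the number of rows.
    SELECT
        COUNT(*)                  AS TotalRows,
        COUNT(DISTINCT InvoiceID) AS DistinctInvoices,
        COUNT(DISTINCT ProductID) AS DistinctProducts,
        COUNT(DISTINCT StoreID)   AS DistinctStores
    FROM SalesTransactions;
    -- A column whose distinct count is close to the total number of rows
    -- (typically InvoiceID) is a candidate to be excluded from the import.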

Consider Calculated Columns

Each calculated column is stored in the PowerPivot data model just as if it were an imported column. This is not true for calculated measures, which are evaluated at query time. If the same calculation can be defined as either a calculated column or a calculated measure, the latter is the better solution from a resource consumption point of view. However, with a calculated measure you lose the ability to navigate its values as an attribute, as you can with a calculated column.

If you have to store a measure as a calculated column, consider reducing the number of digits of the calculation: you can use the DAX function named ROUND just for this purpose.

Normalize data

Normalizing data doesn’t have a big effect on the size of the resulting database. However, it might have a strong impact on both processing time and memory required to process data.

The key is to find the right balance. A fully denormalized table (the case of a single SQL query that denormalizes all of the attributes, resulting in a PowerPivot dataset made of a single table) has a long processing time and requires more memory to be processed. In fact, during processing PowerPivot holds 1-2 million rows in a buffer and, if the rows are large, this restricts the memory available to store data.

At the same time, a complete normalization of the data, like that of a third normal form, might not be a good idea. The number of relationships increases considerably, and many columns are needed in the model only for technical reasons (defining relationships) while being of no use to the end user. The resulting model is much more complex and is not faster than a balance between these two extremes.

The right balance is the one offered by a typical star schema. Putting all the attributes of a single entity into one table, just like a dimension in a Kimball star schema, appears to be the best tradeoff for PowerPivot as well. The resulting model is also easier for end users to navigate.

This level of normalization usually offers the best processing times, some savings in storage size, and better use of memory during data processing (the import and refresh phases).
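
As a sketch of this balance (all table and column names here, such as SalesTransactions, Products and ProductCategories, are hypothetical), the import can be split into one query per entity: a narrow fact table with keys and measures, plus a denormalized dimension table per entity.

    -- Fact table: only keys and measures.
    SELECT
        SaleDate,
        ProductID,
        StoreID,
        SalesQuantity,
        SalesAmount
    FROM SalesTransactions;

    -- Product dimension: all the attributes of the product entity,
    -- denormalized into a single table as in a Kimball star schema.
    SELECT
        p.ProductID,
        p.ProductName,
        c.CategoryName
    FROM Products AS p
    INNER JOIN ProductCategories AS c
        ON c.CategoryID = p.CategoryID;
    -- In PowerPivot the two tables are then related on ProductID.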

Published Tuesday, January 26, 2010 3:54 AM by Marco Russo (SQLBI)


Comments

 

Dave Wickert said:

Excellent job Marco. I am going to comment about this in my blog. Very, very nice.

February 4, 2010 1:06 AM
 


Charles Tabone said:

Marco,  

I'm reading you notes and I've come to the same conclusions.

Currently I am running SQL 2008 and Office 2013 ; window enterprise 2012

by upgrading to SQL 2014; with this help which Excel retrieves and saves the sheet?

I find in my current set up, the saving and retrieving an excel powerpivot is very time consuming; I have eliminated every possible problem; only thing left is Office 2013 itself; and SQL

April 4, 2014 11:49 AM
 

Marco Russo (SQLBI) said:

SQL 2014 doesn't make any difference on Power Pivot in Excel 2013.

If you have a large workbook in Excel (with a large Power Pivot model) you have to wait for the file to be loaded/saved from/to disk. Using an SSD is a real lifesaver here.

April 4, 2014 12:02 PM


