THE SQL Server Blog Spot on the Web

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

  • From 0 to DAX at TechEd Pre-Conference Seminar #dax #msteched #tee13

    In June, Alberto and I will deliver a pre-conference seminar at both TechEd North America (New Orleans, LA) and TechEd Europe (Madrid, Spain).

    This day is a very good quick start for those of you who have not yet finished one of our books, or who missed one of our workshops about Tabular or PowerPivot. If you are planning to go to TechEd, consider attending the From 0 to DAX one-day seminar, a full day about DAX. Here are the links:

    • TechEd North America – From 0 to DAX Pre-Conference Seminar (New Orleans, LA - June 2, 2013)
    • TechEd Europe – From 0 to DAX Pre-Conference Seminar (Madrid, Spain - June 24, 2013)

    And in case you are underestimating the importance of DAX in your future BI projects... read this blog post from Dandy Weyn: his privileged point of view inside Microsoft highlights how important DAX is today and how pervasive it will be in the future!

  • Group Sales by Age of Customers #dax #powerpivot #tabular

    I published an article describing how to group sales transactions by the age of the customer at the moment of the transaction, using PowerPivot or Analysis Services. The same pattern can be used for any kind of banding operation. This specific case is also useful because you can reuse the formula that gets the exact age of the customer for each transaction.

    An interesting point is related to performance optimization. The technique is based on adding a calculated column in a table that might contain millions of transactions. This is less expensive than adding a column that contains a foreign key and then a relationship between a table containing group definitions and the transactions table. Every relationship is expensive and generates additional structures (you can see more files in the Analysis Services database, too). Adding one or two columns that have a low number of distinct values (10-15 rows) usually has a lower memory cost than creating a relationship with another table. The article also contains PowerPivot examples for both Excel 2010 and Excel 2013.
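
    The pattern can be sketched outside of DAX as follows. This is only an illustration in Python, not the formula from the article: the band boundaries, the field names and the sample rows are hypothetical. It computes the exact age at the transaction date and stores the band on each transaction row, which plays the role of the low-cardinality calculated column described above.

```python
from datetime import date

# Hypothetical bands: (label, inclusive lower bound, exclusive upper bound).
AGE_BANDS = [
    ("0-17", 0, 18),
    ("18-29", 18, 30),
    ("30-44", 30, 45),
    ("45-64", 45, 65),
    ("65+", 65, 200),
]


def exact_age(birth_date: date, on_date: date) -> int:
    """Whole years elapsed between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_band(age: int) -> str:
    """Return the label of the band that contains the given age."""
    for label, low, high in AGE_BANDS:
        if low <= age < high:
            return label
    return "Unknown"


# Hypothetical sample transactions.
transactions = [
    {"customer_birth": date(1980, 6, 15), "order_date": date(2013, 6, 14), "amount": 100.0},
    {"customer_birth": date(1980, 6, 15), "order_date": date(2013, 6, 15), "amount": 50.0},
    {"customer_birth": date(1995, 1, 1), "order_date": date(2012, 12, 31), "amount": 20.0},
]

sales_by_band = {}
for row in transactions:
    # The band is stored on the row itself, like a calculated column.
    row["age_band"] = age_band(exact_age(row["customer_birth"], row["order_date"]))
    sales_by_band[row["age_band"]] = sales_by_band.get(row["age_band"], 0.0) + row["amount"]

print(sales_by_band)
```

    Because the band is stored as a plain column with only a handful of distinct values, no lookup table or relationship is needed to group by it.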

    If only I could decouple attribute visualization from physical structure, I would put these “degenerate dimensions” in a separate folder. As it is, such attributes are shown among the attributes belonging to the fact table, which might not be so clear when presenting data. However, I understand that such a decoupling could make life very hard for DAX clients (although for MDX it would probably not be a big issue).

  • DAX Studio for Excel 2013 finally available! #dax #excel #powerpivot #ssas #tabular

    I'm so happy that DAX Studio finally supports Excel 2013! As Darren Gosbell described in his blog, this release has a few internal changes that will better support future enhancements. I will port the code to capture the query plan for a query in this new release, but unfortunately it will take some weeks because I'm traveling a lot these days.

    If you write DAX formulas and queries for PowerPivot or Analysis Services Tabular, DAX Studio is a must-have tool: do you really want to live without a DAX editor? There are a lot of possible improvements and I hope other contributors will help with this CodePlex project.

  • PowerPivot Workbook Size Optimizer #powerpivot #tabular

    Microsoft released the Workbook Size Optimizer for Excel, the first version of an add-in for Excel 2013 that inspects the data model and suggests possible optimizations. Fundamentally, it tries to apply the best practices described in a white paper I mentioned a few weeks ago, removing useless columns and changing the granularity of those columns where doing so could reduce the overall memory cost of a table.

    There are different setups available on the download page, depending on the operating system (Windows 7 or Windows 8) and on the Office version (32 or 64 bit). Once installed, you have a new tab in the Excel ribbon, called Workbook Size Optimizer, showing a single button that starts a wizard.

    I tried to run the optimizer on a workbook where I had imported several tables from the Adventure Works Data Warehouse sample database. The first page shows some information about the workbook size and the option of automatic detection or manual choice of rules. The latter is an option you can also request later, so I started with the default.

    After a short analysis, I received three smart suggestions (considering the model I have). We might wonder whether removing UnitCost is a smart thing, because it could be required for calculations, and rounding the value might not be correct for our analysis.

    Since I requested to apply some changes, I have the option of changing which rules to apply. This corresponds to the choice you have if you choose “Let me choose the rules myself” in the first screen of the wizard.

    I kept all the rules and, after clicking Next, I had to wait several seconds for the optimization process to complete. The last page shows some information about the result of the job.

    This is a good starting point. Don’t blindly trust any suggestion and try to consider carefully the rules to apply in order to avoid losing important data for your analysis. Moreover, you might have a better knowledge of your data model than a wizard and consider the deletion of many useless columns (for your analysis) that are not identified by the wizard. My article Checklist for Memory Optimizations in PowerPivot and Tabular Models contains several best practices that you can apply to your data model.

  • Advanced DAX course in May - unique date in 1H 2013 #dax #tabular #ssas #powerpivot

    One year after the release of SQL Server 2012, I see a growing demand for DAX. There are two reasons for that: a higher number of PowerPivot users have started to build more complex data models, and SSAS Tabular is starting to be adopted by a larger number of companies, with and without previous experience with former versions of Analysis Services.

    For these reasons we decided to offer a first public edition of our Advanced DAX Workshop, a training on DAX aimed at advanced PowerPivot users and Analysis Services developers who want to master the DAX language. Up to now, we offered this course only as private classes, because of the limited demand, but now there is enough interest and adoption to justify an open class.

    The goal of this DAX training is learning to write DAX expressions for measures and calculated columns, DAX queries for reporting needs, read DAX query plans and optimize DAX formulas. The course is a three-day workshop that includes many hands-on lab sessions, with exercises that will guide you in the learning process of the more advanced DAX concepts, enabling you to master the writing of DAX code.

    The course will be in London on May 13-15, 2013. There are direct flights from a huge number of countries and cities, also outside of Europe. We do not expect to deliver other editions of this course for another 5-6 months, so don’t miss the chance to attend this intensive DAX master course. I will be the teacher in this edition and Chris Webb will assist me with the organization through Technitrain. So don’t wait: the early bird discount will expire in a few days. Register now and join us in London!

  • LASTNONBLANK and FIRSTNONBLANK functions work with any column #dax #powerpivot #ssas #tabular

    During a PowerPivot Workshop course we received an interesting question from a student: “Can I use LASTNONBLANK (and FIRSTNONBLANK) with a column which is not a date column?”

    The reason is that we introduce LASTNONBLANK in the Advanced Time Intelligence module, because its typical use case is on a date column. However, you can use these functions on any column, which raises the question about what happens at that point. The sort order used is the one that depends on the data type of the column. If it is a Text column, the alphabetical sort order is the reference order. If it is a number, then the numeric order is the reference.

    What happens if a column has the “Sort By Column” property set to another column? This sort order is *not considered* by the LASTNONBLANK and FIRSTNONBLANK functions. Even if a PivotTable shows you data sorted according to the Sort By Column property, any DAX formula ignores such a sort order. Thus, be careful when writing your DAX queries if you have to make assumptions about the sort order of a column using DAX functions that rely on sort order, such as LASTNONBLANK and FIRSTNONBLANK.
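
    The behavior can be mimicked with a small Python simulation. This is not how the engine is implemented and the helper names are hypothetical; it only shows the consequence of ordering by the column's own data type. Python compares strings by code point, which is just an approximation of the text ordering DAX uses. The month names below would typically be sorted by month number in a PivotTable, but the simulated functions look only at the text values.

```python
def last_non_blank(column_values, expression):
    """Largest value, in the column's native order, whose expression is not blank (None)."""
    candidates = [v for v in set(column_values) if expression(v) is not None]
    return max(candidates) if candidates else None


def first_non_blank(column_values, expression):
    """Smallest value, in the column's native order, whose expression is not blank (None)."""
    candidates = [v for v in set(column_values) if expression(v) is not None]
    return min(candidates) if candidates else None


# Hypothetical text column with a "Sort By Column" pointing to the month number.
month_names = ["January", "February", "March", "April"]
sort_by_month_number = {"January": 1, "February": 2, "March": 3, "April": 4}

# Hypothetical measure: March has no sales, so it is blank.
sales = {"January": 10, "February": 20, "March": None, "April": 30}

# Text order is used, so the "last" month is January, not April.
print(last_non_blank(month_names, sales.get))   # January
print(first_non_blank(month_names, sales.get))  # April
```

    Note that sort_by_month_number is never consulted: that is exactly the point made above about the Sort By Column property being ignored.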

  • SQLLunch on April 23 in London and Cardiff #sqlpass #dax #sqllunch

    On April 23 I will present DAX in Action in London and Cardiff at the SQLLunch event.
    How is it possible I will be in two places at the same time?
    This will be a remote presentation delivered in two locations, where you can have lunch while watching the session.

    What is this session about? This is the session description:

    Tabular is the new SSAS modeling experience and DAX is the new language to use to create BI solution with Tabular. How does it compare with MDX and Multidimensional? In this session, which is mostly based on demos, we will build a complex BI solution from scratch, starting from simple analysis and moving through complex scenarios, showing how you will leverage the tremendous speed of DAX to create complex solution on simple data models, focusing on the differences in building the same solution in MDX or DAX.

    These free events are organized by the UK SQL Server User Group. If you are interested, you can register using the following links.
    London : http://sqlserverfaq.com/events/534/SQLLunch-All-stuff-no-fluff-Marco-Russo.aspx
    Cardiff : http://sqlserverfaq.com/events/535/SQLLunch-All-stuff-no-fluff-Marco-Russo.aspx

  • PASS BA Conference 2013 keynote coverage tomorrow #passbac #sqlpass

    The PASS Business Analytics Conference starts today in Chicago. In the next two days, there will be two keynotes. The famous Steven Levitt, author of Freakonomics, will be on stage on Friday, and tomorrow (Thursday) we will see Kamal Hathi and Amir Netz. I will deliver two sessions at the conference, which I already described here, but before them I will cover the keynotes on my blog and on my Twitter account (@marcorus).

    I do not expect detailed coverage of the technical details of the products we know and love. I look forward to seeing a conversation about the goals of data analytics: the why, the how, and what tools are available. Advances in tools and software are important, because they are enabling scenarios that were simply not possible before, but the directions for the road ahead are not written in stone. I will write my thoughts here during the conference – stay tuned!

  • Optimize memory in #powerpivot and #ssas #tabular

    Microsoft published an interesting article about how to optimize memory consumption of a PowerPivot Data Model in Excel 2013. All these suggestions are also valid for SSAS Tabular. I also wrote an article Checklist for Memory Optimizations in PowerPivot and Tabular Models with a summary of the best practices.

    The short list of things to do is very valuable:

    • Remove columns that are not necessary for analysis
      • Identity column (PK) of a fact table
      • Timestamps, GUIDs and other information useful for auditing and replication, but with no value for analysis
    • If a column has too many distinct values and cannot be removed (e.g. a transaction ID in a fact table used for drillthrough), consider splitting the column into multiple distinct parts.
      • Each one of the parts will have a small number of unique values, and the combined total will be smaller than the original unified column.
      • Always separate date and time into two columns, instead of keeping the original datetime.
      • In many cases, you also need the distinct parts to use as slicers in your reports. When appropriate, you can create hierarchies from parts like Hours, Minutes, and Seconds.
      • Keep only the granularity you really need.
    • Normalize columns, keeping only those with the lower number of distinct values
      • For example, if you have quantity, price and total line amount, import quantity and price and calculate total line amount as SUMX( Sales, Sales[quantity] * Sales[price] ) instead of importing line amount and using SUM( Sales[line amount] ).
    • Reduce the precision of numbers to reduce distinct values (e.g. round to integer if decimal values are not relevant).

    The reason is that VertiPaq compresses data at the column level, creating a dictionary for each column and storing for each row only the number of bits required to store the index to the dictionary. More details are in the article Optimizing High Cardinality Columns in VertiPaq I wrote a few months ago and in the SSAS 2012 Tabular book.
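
    To make the effect of cardinality concrete, here is a simplified Python sketch of dictionary encoding applied to a datetime column, first as a whole and then split into date and time. It is a toy model under stated assumptions: the function names and the synthetic data are hypothetical, and it ignores any other compression the engine may apply on top of the dictionary.

```python
from datetime import datetime, timedelta


def dictionary_encode(values):
    """Return (dictionary, data_ids): one entry per distinct value, one index per row."""
    dictionary = {}
    data_ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        data_ids.append(dictionary[value])
    return dictionary, data_ids


def bits_per_row(distinct_count):
    """Minimum number of bits needed to index distinct_count dictionary entries."""
    return max(1, (distinct_count - 1).bit_length())


# Synthetic data: 30 days, 1,000 timestamps per day, the same times every day.
start = datetime(2013, 1, 1)
timestamps = [
    start + timedelta(days=day, minutes=minute)
    for day in range(30)
    for minute in range(1000)
]

full_dict, _ = dictionary_encode(timestamps)
date_dict, _ = dictionary_encode(ts.date() for ts in timestamps)
time_dict, _ = dictionary_encode(ts.time() for ts in timestamps)

for name, d in (("datetime", full_dict), ("date", date_dict), ("time", time_dict)):
    print(name, "distinct values:", len(d), "bits per row:", bits_per_row(len(d)))
```

    With this synthetic data the single datetime dictionary has 30,000 entries, while the date and time dictionaries have 30 and 1,000 entries. The index bits per row happen to add up to the same number here (15 versus 5 + 10), so the saving in this toy case comes from the much smaller dictionaries; real data will behave differently.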

    A useful macro to analyze memory consumption and quickly identify the most expensive tables and columns in a PowerPivot workbook is available on Kasper De Jonge's blog, in the post What is eating up my memory the PowerPivot / Excel edition. There is also a version for a Tabular database in his What is using all that memory on my Analysis server instance post.

  • New PowerPivot 2013 book available! #excel #powerpivot

    Our new book about PowerPivot 2013 is finally available in printed edition, too!

    The title is Microsoft Excel 2013: Building Data Models with PowerPivot and it is a partial rewrite of the previous book about PowerPivot for Excel 2010. In the previous book we had a target audience that included advanced Excel users and BI developers, because at that time there was no option to get the same engine in Analysis Services. But 30 months have elapsed, a new version of Analysis Services has been released, and in this new book we focused mainly on Excel users. For this reason, we wrote a comprehensive book covering all the features of PowerPivot, but most importantly we tried to convey concepts of data modeling that might be pretty obvious for a DBA or a BI developer, but are completely new to an Excel user who never had the ability to create a data model with more than one table.

    This book is focused on Excel 2013, so we included specific features of this release related to PowerPivot, such as writing DAX queries and linked back tables, and features unique to Excel 2013, such as Power View. However, all of the PowerPivot features (about 85% of the book) also apply to PowerPivot for Excel 2010 in its latest release (SQL Server 2012 SP1 PowerPivot for Microsoft Excel 2010), so you can safely use this book for both versions of Excel.

    You can download the first chapter of the book from the book page on the SQLBI web site. And if you want to attend a training in a classroom or online, look at the complete list of available trainings on the PowerPivot Workshop web site. The next online courses are scheduled on April 22-24, 2013 and June 17-19, 2013 (subsequent online workshops run every other month).

    Here are the links to directly order the book on Amazon around the world:

    And here is the list of chapters:

    • Chapter 1 Introduction to PowerPivot
    • Chapter 2 Using the unique features of PowerPivot
    • Chapter 3 Introducing DAX
    • Chapter 4 Understanding data models
    • Chapter 5 Publishing to SharePoint
    • Chapter 6 Loading data
    • Chapter 7 Understanding evaluation contexts
    • Chapter 8 Understanding CALCULATE
    • Chapter 9 Using Hierarchies
    • Chapter 10 Using Power View
    • Chapter 11 Shaping the Reports
    • Chapter 12 Performing Date Calculations in DAX
    • Chapter 13 Using Advanced DAX
    • Chapter 14 Using DAX as a Query Language
    • Chapter 15 Automating Operations Using VBA
    • Chapter 16 Comparing Excel and SQL Server Analysis Services

    This book should help you get started with PowerPivot from the very beginning, and you will probably use only the first chapters at that point. Over time, you will use the following chapters and learn more advanced techniques. This is not a book you can digest in a couple of days (after all, it is 500 pages long!). It will be your companion for several months, until you master PowerPivot!

  • Interviewed in SQL Down Under podcast #sqlserver #ssas #powerpivot

    I’ve been interviewed by Greg Low in SQL Down Under show 58, and this is *not* an April Fools’ joke! We talked for one hour about Tabular, Multidimensional, Data Warehouse and just a little bit about music (you can discover which music genre I usually listen to…).

    You can listen to this interview from the SQL Down Under Show 58 page (it is in MP3 format) and, if you like it, there are many other past shows available. The podcast is also available on iTunes, and you can find other episodes on the SQL Down Under page on iTunes Preview.

  • Pre-Conference Seminar about DAX at TechEd 2013 #msteched #dax #ssas

    If you are using the Microsoft BI stack and you still haven’t started learning DAX, you should not wait any longer. One of the options you have is starting with one of our books, or you can attend one of our workshops about Tabular or PowerPivot. But if you are planning to go to SQLBits or TechEd, consider attending the From 0 to DAX one-day seminar, a full day about DAX. Here are the links:

    • SQLBits – From 0 to DAX Training Day (Nottingham, UK - May 2, 2013)
    • TechEd North America – From 0 to DAX Pre-Conference Seminar (New Orleans, LA - June 2, 2013)
    • TechEd Europe – From 0 to DAX Pre-Conference Seminar (Madrid, Spain - June 24, 2013)

    Take a look at the Early Bird expiration dates: you can save a good portion of your budget by registering by March 22, 2013 for either TechEd conference. For SQLBits there is a discounted price until April 7, 2013.

    Even if I’m not blogging much these weeks, I can assure you we are working on more DAX content that we’ll publish in the coming months. Stay tuned!

  • Cost of Process Defrag in Analysis Services Tabular #ssas #tabular

    I recently received a question about the memory required to run a Process Defrag on a Tabular model in Analysis Services. The Process Defrag is useful when you run incremental processing of a table frequently, or when some of the values in the dictionary are no longer used in the table, for example if you process the same partition multiple times in a table and/or remove partitions from a table. Cathy Dumas wrote an interesting blog post about the savings you can obtain by running process defrag.

     

    I did some investigation and I’ve been assured that data is not completely uncompressed in this process, even if some coding/encoding happens. As a rough estimate, you need free memory equivalent to the size of the table you are going to defrag (which is already in memory) plus a buffer for transient data structures, but data is managed in compressed form, without requiring larger memory buffers for uncompressed raw data. So, if all partitions of your table and the column dictionaries require 100MB in RAM, then you need another 100MB of free RAM in order to execute a Process Defrag.

    Here is a more detailed description I received from Akshai Mirchandani:

    In addition to the master copies, it requires enough memory for a new dictionary, and the final compressed data + some small temporary buffer space for the transient data structures (no big buffers like normal processing).
     
    It is essentially going to read each column value, insert it into a new dictionary, get a new DataID back from the dictionary insert, and append that DataID to the current segment. It doesn’t need to do VertiPaq (compression) again, and it doesn’t keep the uncompressed data in buffers like the data processing algorithm does.

    The important point here is that the analysis of the segment data to come up with the best compression strategy no longer needs to be performed – and that’s typically the most expensive step of the compression (the VERTIPAQ_STATE in DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS shows whether this was done).
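
    The loop described above can be sketched in a few lines of Python. This is only a toy model of the description, not the engine's code: the function name and the list-based structures are hypothetical. It rebuilds a column dictionary by reading each value through its old DataID, inserting it into a new dictionary, and appending the new DataID to the current segment, so values that are no longer referenced simply never reach the new dictionary.

```python
def defrag_column(old_dictionary, segments):
    """Rebuild a column dictionary so that it contains only values still in use.

    old_dictionary: list where the index is the old DataID and the item is the value.
    segments: list of segments, each a list of old DataIDs (one per row).
    Returns (new_dictionary, new_segments) using the same layout.
    """
    new_ids = {}         # value -> new DataID
    new_dictionary = []  # new DataID -> value
    new_segments = []
    for segment in segments:
        new_segment = []
        for old_id in segment:
            value = old_dictionary[old_id]      # read the column value
            if value not in new_ids:            # insert it into the new dictionary
                new_ids[value] = len(new_dictionary)
                new_dictionary.append(value)
            new_segment.append(new_ids[value])  # append the new DataID to the segment
        new_segments.append(new_segment)
    return new_dictionary, new_segments


# "green" and "yellow" are no longer referenced by any row.
old_dictionary = ["red", "green", "blue", "yellow"]
segments = [[2, 2, 0], [0, 2]]

new_dictionary, new_segments = defrag_column(old_dictionary, segments)
print(new_dictionary)  # ['blue', 'red']
print(new_segments)    # [[0, 0, 1], [1, 0]]
```

    In this sketch the old structures stay in memory while the new dictionary and segments are built, which is consistent with the rough estimate of needing about as much free memory as the table already uses.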

    This is good news if you are concerned about the memory required to perform this operation.

  • Discount for PASS Business Analytics Conference 2013 #passbac #ssas #sqlpass

    One month ago I wrote about my sessions at the PASS Business Analytics Conference 2013, in Chicago, IL on April 10-12, 2013. If you still have not registered, you can save $200 by using the code BAC228BL, and you should hurry up, because there is another discount if you register by March 15, 2013.

    If you are too lazy to click on the previous post, I will speak in two sessions:

    • Modern Data Warehousing Strategy
    • Self-Service Data Modeling

    And now that the Data Explorer Preview has been made public, I can disclose that Data Explorer will be covered in my Self-Service Data Modeling session! I thought about writing an article about Data Explorer, but there is already good coverage and I suggest you read these blogs:

  • Microsoft BI Security White Paper #ssas #powerpivot #sharepoint #ssrs

    There is a new whitepaper from Microsoft, Microsoft BI Authentication and Identity Delegation, which describes all the authentication and delegation scenarios with Microsoft BI technologies:

    • Personal BI Scenarios (Excel)
    • Team BI Scenarios (SharePoint)
    • Corporate BI Scenarios (Reporting Services, Analysis Services)
    • Federated BI Scenarios (Multi-Forest AD, Extranet, Cloud)

    This is the new reference whitepaper to correctly plan and configure the security environment of a BI solution based on the Microsoft BI stack.
