THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

John Paul Cook

BI Beginner: Avoiding Mistakes With Averages

I have seen mistakes in business reports, academic papers, and training materials when it comes to displaying and calculating averages. At first I created a sample dataset using sales data for widgets, but after seeing so many graduation posts this month, I decided to use college grades in this example. In our sample dataset, college kid (hereafter known as CK) took 5 years to complete a 4 year university degree, but mom and dad are just happy CK graduated. Dad told CK that graduating with a 3.0 grade point average (GPA) would result in a new car as a graduation gift. The input Excel file has a single worksheet named grades that is referenced when Power BI measures are created. I’m assuming that you already know how to load data into Power BI and make a simple visualization. If not, I’ve covered those steps in this blog post.


Figure 1. Raw Excel data. The name of the Excel worksheet is grades, which is necessary for you to understand to follow the examples.

Let’s look at the plot in Power BI. CK shows a very definite upward, improving trend in grades and deserves at least a pat on the back.


Figure 2. Improving grades.

Does CK deserve more than a pat on the back? Has CK earned the car? Dad really intended for CK to have a 3.0 GPA upon graduation after 4 years, but mom pointed out that dad never said anything about a 4 year time limit. Mom said just be glad it was 5 years and not 6. CK decided to use a Card visualization to show the overall GPA.


Figure 3. Card visualization in Power BI.

CK did some quick drag and drop moves and realized something was wrong because the maximum possible GPA is 4.0.


Figure 4. GPA value is impossibly high.

CK did a mouseover on the GPA field under Visualizations to see what the problem was.


Figure 5. The default operation is to sum the data.

CK knew something was wrong, but was confident that it was an easy fix. After all, CK took statistics in the first year and made a C in that and every other course that year. CK used the dropdown list and selected Average.


Figure 6. Changing the default behavior of the Card visualization to obtain an average of averages.

CK triumphantly showed dad the results and mentioned that the Tesla model 3 will be out later in the year and it would be a nice car to have and worth waiting for. CK was willing to drive the old Taurus in the meantime. Dad pointed out that it is not valid to average averages. A weighted average is called for so that proportional contributions are properly weighted. That last year of taking 10 semester hours of Klingon to finally satisfy the foreign language requirement doesn’t equal that full first year of science, philosophy, and history courses.

Dad modified the Power BI model to properly calculate the overall GPA. There are several possible solutions. He took an approach to show multiple features of Power BI Desktop. The first step was adding a new calculated column. This process begins by going to the left edge of Power BI Desktop and selecting the Data view.


Figure 8. Creating a calculated column.

The calculated column definition is the product of the GPA and the Hours columns. Notice that the column definition uses the Excel worksheet name which is grades.

Product GPA Hours = grades[GPA] * grades[Hours]

Next, measures were added.


Figure 9. Adding the first measure.


Figure 10. Adding the third measure.


Figure 11. Adding the final measure.

The measures created are:

Minimum GPA = 0.0
Maximum GPA = 4.0
Target GPA = 3.0
Overall GPA = SUM(grades[Product GPA Hours]) / sum(grades[Hours])

Dad completed the model by adding a Gauge visualization to the canvas and setting the properties.


Figure 12. Gauge visualization showing the actual overall grade point average.

CK is driving the beat up Taurus with no Tesla to replace it.

The grades, graduates, and events in this blog post are fictional. No similarity to actual graduates (living or deceased) should be inferred. No graduates were harmed in the making of this model.

Published Sunday, May 28, 2017 9:31 PM by John Paul Cook
Filed under:



No Comments
Anonymous comments are disabled

About John Paul Cook

John Paul Cook is a database and Azure specialist in Houston. He previously worked as a Data Platform Solution Architect in Microsoft's Houston office. Prior to joining Microsoft, he was a SQL Server MVP. He is experienced in SQL Server and Oracle database application design, development, and implementation. He has spoken at many conferences including Microsoft TechEd and the SQL PASS Summit. He has worked in oil and gas, financial, manufacturing, and healthcare industries. John is also a Registered Nurse currently studying to be a psychiatric nurse practitioner. Contributing author to SQL Server MVP Deep Dives and SQL Server MVP Deep Dives Volume 2. Connect on LinkedIn

This Blog



Privacy Statement