THE SQL Server Blog Spot on the Web


John Paul Cook

  • BI Beginner: Using R Visualizations

    Last week I showed a simple line plot of a hypothetical college student’s GPA. The plot could have been done using R. Before showing you a visualization that requires the power of R, I’m starting with the simple line plot recreated in R. For additional information about using R with Power BI Desktop, David Iseminger has done a great job writing Power BI documentation, which can be found at Microsoft’s Power BI site. He also wrote about configuring Power BI Desktop to use an external R IDE.

    Begin by placing an R visualization on the canvas. Resize as needed.


    Figure 1. Place an R visualization on the canvas.

    Power BI Desktop creates a data frame named dataset. You’re going to have to accept that and work with it. Power BI Desktop also eliminates duplicate rows. The input Excel file I’m using doesn’t have any duplicate rows, so it is unchanged by the call to R’s unique function. If you have duplicate rows and want to keep them, you’ll need to add a column that makes each row unique – similar to adding a column with an IDENTITY property in SQL Server. David Iseminger covered this in the first link I provided.
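The duplicate elimination, and the index-column workaround, can be sketched in a few lines of Python (the rows are invented for illustration; Power BI does this internally before the data reaches your R code):

```python
# Invented sample rows (year, GPA); the 2016 row appears twice.
rows = [(2015, 2.0), (2016, 2.5), (2016, 2.5), (2017, 3.0)]

# Deduplication collapses identical rows, which is what Power BI does
# before handing the data frame to R.
deduped = list(dict.fromkeys(rows))
print(len(deduped))    # 3 -- one of the 2016 rows was lost

# Adding an index column makes every row unique, so nothing is lost.
indexed = [(i,) + row for i, row in enumerate(rows)]
print(len(set(indexed)))    # 4 -- all rows survive
```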


    Figure 2. Drag the X-axis variable into place. Power BI creates a data frame named dataset and eliminates duplicate rows.


    Figure 3. Drag the Y-axis variable into place.

    Notice that unlike the Line chart visualization, the R visualization does not automatically create the plot. You have to click the Run button. Your R code must create visual output or you’ll get an error message.


    Figure 4. Click the Run button.


    Figure 5. Completed plot using the R visualization. The data appears as individual points by default.

    You can customize your R visualization. Read the R documentation for plot and par to find out how. I used the following R code to modify my R visualization:

    plot(dataset, main = "GPA Over Time", type = "l", col = "magenta")


    Figure 6. Using plot options and par to modify your R visualization.

    The next post in this series covers more advanced R visualization.

  • [OT] Great American Eclipse of 2017

    I’m amazed at how many people aren’t yet aware of the eclipse visible from the central United States on August 21, 2017. Hotel reservations along the route are scarce and expensive. There’s a great website where you can find out all about it. If you think the eclipse isn’t worth planning for because you aren’t in the central United States, you’re mistaken. You need to be outside even if you’re not in the path of totality.

    Universities all along the path of totality are having viewing events. Nashville is a prime viewing spot. Vanderbilt has eclipse viewing activities planned on that day, the first day of orientation for graduate students. If you’re near Portland, Oregon, you’re a reasonable drive from prime viewing spots. You might want to drive east of the Cascade mountains to safeguard against potential cloudy skies. Columbia, South Carolina is another great spot. The Great Smoky Mountains National Park is another spot near the east coast to consider.

    If you’re in San Francisco, Dallas, or New Orleans, 80% of the sun will be covered. That’s impressive! Brownsville, Texas and Bangor, Maine will have about 60% of the sun covered. Daytona Beach, Florida will have 90% of the sun covered. There’s really something for everybody in the 48 contiguous states.

    If you miss this one, the April 8, 2024 eclipse is a great one for Mexico, Texas, Arkansas, Indiana, Ohio, and Maine. Alaskans have to wait until March 30, 2033 or May 11, 2097. Montana is the place to be on August 23, 2044. That’s also a great day for our Canadian neighbors in Alberta.

    An eclipse is also known as a syzygy. A syzygy does not have to be an eclipse.

    I wonder if fireflies will come out during totality.

  • Intel Active Management Technology Vulnerability

    Certain Intel processors have a security vulnerability that you can read about at the Intel site or at the NIST Computer Security Resource Center. Make certain that you install the latest version of the Intel vulnerability analyzer, which as of today is available from Intel. The tool tells you if your computer is vulnerable to allowing a nonprivileged user to obtain privileged access to your machine. I ran the tool and received this message:


    Figure 1. Computer not vulnerable.

    If you install an old version of the tool and then install a newer version, expect to have to reboot.

  • BI Beginner: Avoiding Mistakes With Averages

    I have seen mistakes in business reports, academic papers, and training materials when it comes to displaying and calculating averages. At first I created a sample dataset using sales data for widgets, but after seeing so many graduation posts this month, I decided to use college grades in this example. In our sample dataset, a college kid (hereafter known as CK) took 5 years to complete a 4-year university degree, but mom and dad are just happy CK graduated. Dad told CK that graduating with a 3.0 grade point average (GPA) would result in a new car as a graduation gift. The input Excel file has a single worksheet named grades that is referenced when Power BI measures are created. I’m assuming that you already know how to load data into Power BI and make a simple visualization. If not, I’ve covered those steps in this blog post.


    Figure 1. Raw Excel data. The name of the Excel worksheet is grades; you’ll need to know this name to follow the examples.

    Let’s look at the plot in Power BI. CK shows a very definite upward, improving trend in grades and deserves at least a pat on the back.


    Figure 2. Improving grades.

    Does CK deserve more than a pat on the back? Has CK earned the car? Dad really intended for CK to have a 3.0 GPA upon graduation after 4 years, but mom pointed out that dad never said anything about a 4 year time limit. Mom said just be glad it was 5 years and not 6. CK decided to use a Card visualization to show the overall GPA.


    Figure 3. Card visualization in Power BI.

    CK did some quick drag and drop moves and realized something was wrong because the maximum possible GPA is 4.0.


    Figure 4. GPA value is impossibly high.

    CK did a mouseover on the GPA field under Visualizations to see what the problem was.


    Figure 5. The default operation is to sum the data.

    CK knew something was wrong, but was confident that it was an easy fix. After all, CK took statistics in the first year and made a C in that and every other course that year. CK used the dropdown list and selected Average.


    Figure 6. Changing the default behavior of the Card visualization to obtain an average of averages.

    CK triumphantly showed dad the results and mentioned that the Tesla Model 3 will be out later in the year and it would be a nice car to have and worth waiting for. CK was willing to drive the old Taurus in the meantime. Dad pointed out that it is not valid to average averages. A weighted average is called for so that each course contributes in proportion to its credit hours. That last year of taking 10 semester hours of Klingon to finally satisfy the foreign language requirement doesn’t carry the same weight as that full first year of science, philosophy, and history courses.
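The difference dad is pointing out is easy to demonstrate. Here is a minimal Python sketch using hypothetical per-year GPAs and credit hours (not the actual values from the Excel file):

```python
# Hypothetical per-year GPAs and credit hours, invented for illustration.
gpas  = [2.0, 2.5, 3.0, 3.5, 4.0]
hours = [30, 30, 30, 30, 10]

# Average of averages -- wrong: every year counts equally,
# so the 10-hour Klingon year weighs as much as a full year.
naive = sum(gpas) / len(gpas)
print(naive)                       # 3.0 -- CK appears to earn the car

# Weighted average -- right: each year counts by its credit hours.
weighted = sum(g * h for g, h in zip(gpas, hours)) / sum(hours)
print(round(weighted, 2))          # 2.85 -- no car
```

The weighted calculation is the same arithmetic the Overall GPA DAX measure performs: sum of (GPA × hours) divided by total hours.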

    Dad modified the Power BI model to properly calculate the overall GPA. There are several possible solutions. He took an approach to show multiple features of Power BI Desktop. The first step was adding a new calculated column. This process begins by going to the left edge of Power BI Desktop and selecting the Data view.


    Figure 8. Creating a calculated column.

    The calculated column definition is the product of the GPA and the Hours columns. Notice that the column definition uses the Excel worksheet name which is grades.

    Product GPA Hours = grades[GPA] * grades[Hours]

    Next, measures were added.


    Figure 9. Adding the first measure.


    Figure 10. Adding the third measure.


    Figure 11. Adding the final measure.

    The measures created are:

    Minimum GPA = 0.0
    Maximum GPA = 4.0
    Target GPA = 3.0
    Overall GPA = SUM(grades[Product GPA Hours]) / SUM(grades[Hours])

    Dad completed the model by adding a Gauge visualization to the canvas and setting the properties.


    Figure 12. Gauge visualization showing the actual overall grade point average.

    CK is driving the beat up Taurus with no Tesla to replace it.

    The grades, graduates, and events in this blog post are fictional. No similarity to actual graduates (living or deceased) should be inferred. No graduates were harmed in the making of this model.

  • In The Cloud: Azure Cosmos DB

    Azure Cosmos DB is Microsoft’s new global scale, distributed database as a service that supersedes Azure DocumentDB. Cosmos is a technology that enables you to create applications that are unimaginable with a single conventional relational database, even if that single relational database is stored in the cloud. Cosmos is something you need to know more about if you make your living working with data.

    Read more about Azure Cosmos DB here. If you want to try code against Cosmos and you don’t have an Azure account or don’t want to incur usage charges, you can download and install the Azure Cosmos DB Emulator. Some of the documentation refers to DocumentDB instead of Cosmos. Remember, Cosmos supersedes DocumentDB.


    Figure 1. Azure Cosmos DB emulator.

  • In The Cloud: Azure Portal and HTML5 Storage Capacity Exceeded

    While I was preparing blog posts last week, the following message appeared in my browser window. I was in the Azure portal when the message appeared, but it’s not an error with Azure or the browser – it could happen when using any website or browser. I tried again and everything was fine. The browser had many tabs open for days, and a lot of work had been done in those tabs.

    Because technical training is part of what I do from time to time, I saw this as a good educational opportunity. I think all technically oriented web users are aware of cookies. HTML5 has something different called localStorage and sessionStorage. If you’re interested in learning more, I highly recommend reading the W3Schools article on HTML5 Web Storage.


    Figure 1. Not enough storage capacity in the browser when using the Azure portal.

    Seeing the message is a very rare event. My point in posting this is not about the message per se, but to make you aware of the modern storage features of HTML5.

  • [OT] Never Bet Against Cloud Computing

    Today at the Preakness horse racing event, Classic Empire dominated and led the race, appearing to be the certain winner, with Always Dreaming a close second and a possible contender. At the end, Cloud Computing charged ahead, leaving Classic Empire behind. Always Dreaming faded to eighth place. The victory of Cloud Computing was called an upset, but really, people – had I been there, I would have put my money on Cloud Computing from the get-go.

  • In The Cloud: Owner Role and Security Administration

    Azure uses Role Based Access Control (RBAC), which is something people generally don’t pay a lot of attention to when initially learning how to use the Azure portal. Take a close look at the screen capture shown below. The Delete button is disabled. There are definitely times you want to protect Azure resources from accidental deletion. That’s just basic good governance.


    Figure 1. Delete button disabled for a Data Catalog resource.

    The reason that the Delete button is disabled can be understood by going to subscription management.


    Figure 2. In the Azure portal, click the arrow to show more services, then select Subscriptions.


    Figure 3. Notice that I was in the User Access Administrator role which doesn’t have delete privilege.

    The individual vertical sections in the Azure portal are called blades. Clicking the Add button causes the Add Permissions blade to appear. You need to select a role and then select the member(s) to add to the role.


    Figure 4. The Add Permissions blade before selecting member(s) to add to the role.


    Figure 5. The Add Permissions blade after selecting member(s). Click Save to make the RBAC change.


    Figure 6. Notice that the Delete button is now enabled. It doesn’t immediately perform the deletion. It prompts you to make sure.


    Figure 7. Prompt to make sure you really want to do this. Notice that the Delete button is disabled at this point.

    Typing the resource group name is a pain. I always move the mouse over the resource group name in the warning, double-click, copy, and paste.


    Figure 8. Use copy and paste to simplify the deletion process.


    Figure 9. The Delete button is enabled after entering the resource group name into the box.


    Figure 10. Deletion in progress.

  • BI Beginner: Stacked Charts in Power BI

    People new to Power BI Desktop have asked me how to create a columnar chart where each column has different colors for different values stacked on top of each other. There’s no reason to be intimidated. Creating such a chart is very simple, with only two additional requirements beyond yesterday’s post. First, your input data must be classified into groups: you’ll either need groups specified in the data or be able to classify the data into groups based on some criteria. Today the sample data has two columns of sales data, one for sales of Widget A and the other for sales of Widget B.


    Figure 1. Sales data for Widget A and Widget B instead of total overall sales.

    Second, you must choose an appropriate visualization. Not all visualizations are suitable for stacking different data values on top of each other. You’ll need a visualization that has the word stacked in its name. You want stacked data, so you need a stacked visualization. I told you this was easy!

    It is assumed that you understand the content in yesterday’s post. Those initial steps are not repeated here so that we can focus on the concept of stacked charts. What’s different from yesterday is that two columns are placed underneath Value.


    Figure 2. You must drag both A Units Sold and B Units Sold to place them underneath Value.


    Figure 3. Final stacked column chart.


    Figure 4. You can change from a stacked column chart to a stacked bar chart by simply clicking the icon for a stacked bar chart.

    With Power BI, it’s very easy to change your visualization by clicking on a different visualization. I caution you that you do need to pay attention. Not all visualizations are equal. Take a close look at the Y-axis when you take this data and select the line chart visualization.


    Figure 5. Notice that the line chart visualization isn’t stacked and has a maximum Y-axis value of 20.


    Figure 6. The stacked area chart has a maximum Y-axis value of 30.
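The two Y-axis ceilings differ because of what each chart plots: a line chart plots each series on its own, while a stacked chart plots their sum. A small Python sketch with hypothetical unit counts (not the post’s actual data) shows the arithmetic:

```python
# Hypothetical unit sales for two widgets, invented for illustration.
a_units = [10, 14, 18]   # Widget A
b_units = [6, 9, 12]     # Widget B

# A line chart scales its Y axis to the largest single value...
line_ceiling = max(a_units + b_units)
print(line_ceiling)      # 18

# ...while a stacked chart scales to the largest combined value.
stacked_ceiling = max(a + b for a, b in zip(a_units, b_units))
print(stacked_ceiling)   # 30
```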

  • BI Beginner: Simple X-Y Plot

    When I worked for Microsoft giving presentations on data platform products, it was a common occurrence to have people tell me that they didn’t know how to use Power BI. This is the first in a series of posts showing how to do simple, useful tasks in Power BI Desktop. Power BI Desktop is free. Download the Power BI Desktop from here.

    You can either create a simple Excel file such as the one shown below or download the attached file to follow this tutorial. This tutorial begins with the steps to create a simple X-Y plot in Excel. Later the Excel file is used as input to Power BI.


    Figure 1. Sales data for simple X-Y plot.

    It’s easy to create an X-Y plot in Excel as shown below.


    Figure 2. Sample data saved in Excel to be loaded into Power BI.


    Figure 3. Steps to create a simple X-Y plot in Excel.


    Figure 4. Simple X-Y plot created by Excel.

    What follows is my answer to the question of how to do the same thing in Power BI. As you will see, it’s actually pretty easy. It will take you longer to read these instructions than it will take to actually do the work.

    Begin by loading your Excel file into Power BI Desktop.


    Figure 5. Loading data into Power BI Desktop.

    You’ll be prompted to select your worksheet.


    Figure 6. Select your worksheet and click Load.

    After your data is loaded, Power BI appears in Report View. It’s not like Excel where you’d immediately see your data values in a grid. You can see your data in a grid by clicking on Data, but that’s not what we need to do now.


    Figure 7. Report View in Power BI Desktop.

    Move your mouse under Visualizations and click on the Line chart icon. This is the part where people tell me they have trouble. What to do next, what goes where?

    The Axis is the X-axis. So where do the Y-axis values belong? Under Values – where else, now that you think about it!


    Figure 8. Line chart parameters.


    Figure 9. Drag the Year to the spot just under Axis.


    Figure 10. Drag Units Sold to the spot just under Values.



    Figure 11. Finished line chart.


    Figure 12. Line chart resized.

    It’s really easy to change the visualization type. You should experiment by clicking on various visualizations to see what they look like.


    Figure 13. Click on Stacked column chart to change how it looks.

    You probably want to do some formatting. Click the paint roller icon to explore the formatting options. Your formatting changes remain intact even if you change your visualization.


    Figure 14. Click on the Format icon to expose the formatting properties.


    Figure 15. Use the Title properties to change the title.


    Figure 16. Extensive formatting just to show what is possible.


    Figure 17. Notice that the formatting changes carry across different visualizations.

    Save your Power BI model when you’re done!

    The next post in this series shows how to make stacked charts.

  • In The Cloud: The Importance of Being Organized

    People often ask me about learning how to use Azure SQL Database as well as many other Azure products. If you want to learn, you’ve got to have an Azure account. Get one for free or use your personal or corporate MSDN account.

    Where I see people struggling with Azure is in not being organized. Naming conventions are essential. Think about it. Once you go into production, you need good naming conventions and discipline enforcing them. You need to have good governance of your cloud resources starting with your early forays into learning how to work in the cloud. Regardless of who your cloud provider is, you need to have meaningful, descriptive names.

    You’re not just learning about, for example, Azure SQL Database when you create your first database in the cloud. You’re also learning and experimenting with governance. You’re figuring out what naming conventions make things better. When I started learning Azure, I tended to use suffixes to identify Azure resources. That didn’t work well over time. I switched to using prefixes, actually using what is known as Hungarian notation. What’s the best naming convention to use? The one you actually use! The point is that having some naming system is better than having none at all. So, to paraphrase Robert Duvall’s five P rule in The Killer Elite, proper planning prevents poor performance. By the way, the word affix can be used instead of saying “prefix or suffix” because an affix can be either. An affix can also appear in the body of a word, not just at either end.

    I tend to use the prefix db for Azure SQL Database, dw for Azure SQL Data Warehouse, rg for Resource Group, and srvr for server. The server name is interesting: Azure resources that are exposed as URLs must be all lowercase. Everywhere else, I recommend using mixed casing where possible. It makes your naming conventions easier to comprehend.


    Figure 1. Using meaningful names for Azure resources.

    One last detail. When you create an Azure SQL Database, your server admin login can’t be something hackers love such as sa or administrator. But you can use essay as your login. Think about it, pronounce it out loud. Seriously though, you should consider making your server admin login something less likely to be found in a dictionary attack.
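As a sketch of that last suggestion, Python’s standard secrets module can generate a login name that won’t appear in any dictionary. The leading-letter rule and the length here are my arbitrary choices for illustration, not Azure requirements:

```python
import secrets
import string

def random_admin_login(length: int = 12) -> str:
    # Begin with a letter, since many systems require logins to
    # start with one, then append random lowercase letters and digits.
    alphabet = string.ascii_lowercase + string.digits
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(secrets.choice(alphabet) for _ in range(length - 1))
    return first + rest

login = random_admin_login()
print(login)  # different every run, and not in any dictionary
```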

  • Visual Studio Code

    Visual Studio Code is a free, relatively new (late 2016), open source, cross platform tool for editing code. I like the Git integration. It doesn’t support SQL right out of the box, but it’s easy to add, as shown below.

    It’s really simple. Click the icon for Extensions, enter sql for your filter, then find the mssql extension and click Install.


    Figure 1. Installing the Visual Studio Code extension for Microsoft SQL Server.

    The extension comes with extensive documentation, links, and animated tutorials. Click mssql in the list of extensions.


    Figure 2. Click mssql to load the documentation.

    This is the direct link to the mssql extension tutorial, the same one that is the first link in the screen capture above. You should use Microsoft’s documentation. I’m just making you aware that it exists. One thing I want to point out is what happens when you hold down the Ctrl key and then press K. The K is the start of a two-key sequence called a chord.


    Figure 3. Ctrl+K is the first of a two-key sequence called a chord. Notice that it says “Waiting for second key of chord…”

    Pressing M completes the chord to display the language mode selection menu. I selected SQL. Notice SQL appears in the bottom right corner of the screen capture. It all sounds good to me, music to my ears.


    Figure 4. Language mode set to SQL.

    My only point here is to make you aware of a new tool to help you get your work done from any platform.

  • Charles Bonnet Syndrome and Implications for Digital Image Processing

    Those of you who know me or have read my biography are aware that I live in two worlds, information technology and healthcare. This post is about image processing in the brain and in a computer. In particular, this discussion is about what happens when there is insufficient data. I think there are some lessons from healthcare that can be applied to information technology.

    First, some background. Charles Bonnet was a Swiss naturalist (a scientist who studies the natural world) who in 1760 wrote about his 89-year-old grandfather’s visual hallucinations. The grandfather had cataracts which rendered him almost blind. In other words, his retinas were sending insufficient data to his brain. He reported seeing people, animals, and objects. This syndrome was named after Charles Bonnet and is also known as visual release hallucinations. Retinas and digital imaging sensors are alike in that they generate signals that must be processed to be understood. Processing is not error free. When the retina receives less input, it’s similar to trying to make sense of a very pixelated image.

    When discussing machine learning, we mention training a model. Data is fed into an algorithm and a model is trained. New data, such as a digital image, is submitted to the model. If an approximate match is found, the model classifies and categorizes the new image. The brain does something similar. Consider the case of a person with retinal damage caused by age-related macular degeneration, diabetic retinopathy, glaucoma, or some other pathology. If the person is able to see brown, white, and black next to each other, the trained model in the brain may find a match on a beagle. The brain says brown, white, and black must be a beagle, so I’ll put a beagle into the person’s field of view. Putting this into information technology terms, a beagle might be algorithmically correct based on the input data, but if there isn’t a beagle present, the outcome is a false positive.
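The false-positive scenario can be sketched with a toy nearest-match classifier. The color features and labels here are invented for illustration; real image classifiers are far more sophisticated, but the failure mode is the same:

```python
# Toy "trained model": each label maps to the set of colors that
# characterize it. Entirely invented for illustration.
trained_model = {
    "beagle":    {"brown", "white", "black"},
    "dalmatian": {"white", "black"},
    "goldfish":  {"orange"},
}

def classify(observed_colors):
    # Return the label whose color set best overlaps the input --
    # an approximate match, like the brain working from sparse data.
    return max(trained_model,
               key=lambda label: len(trained_model[label] & observed_colors))

# Sparse, degraded input: only colors survive, no shapes or textures.
print(classify({"brown", "white", "black"}))  # 'beagle' -- even if the
# scene actually holds a tortoiseshell cat: a false positive.
```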

    It’s important to understand that this discussion is about visual hallucinations in a person without any psychiatric problems. In fact, people who have this syndrome are aware that the hallucinations are not real. In other words, they have insight into what is happening. The takeaway is that no matter how good our digital image processing systems are, we still need oversight from a thinking human being who asks and answers the question: Does this make sense? The Indian ophthalmologist G. J. Menon published a framework for evaluating visual hallucinations. In his framework, the presence or absence of insight into the hallucination is very significant. I’m suggesting that our work isn’t done when a model is trained. We need to consider developing problem-specific frameworks for critically appraising the results of machine learning.

    It’s important to consider the consequences of incorrect processing. There are reports of people with Charles Bonnet Syndrome fretting about their hallucinations and fearing they may be losing their grip on their sanity. They may not bring it up for fear of being diagnosed as psychiatrically ill and no longer being allowed to live independently. It’s a great relief to these patients when they finally find out that their hallucinations are actually normal. How family members and healthcare professionals treat a patient is quite different when they think the person is psychiatrically normal instead of abnormal. Following procedures and jumping to easy conclusions can have devastating consequences for people.

    You can read more about Charles Bonnet Syndrome at the website of the Charles Bonnet Syndrome Foundation. There are similar auditory hallucinations in psychiatrically normal people. Once again, the presence of insight about the hallucinations is significant. We need critical human thinking skills providing the sanity checks for signal processing.

  • PolyBase Error After Uninstalling JRE

    The new PolyBase feature in SQL Server 2016 has a dependency on the 64-bit Java Runtime Environment (JRE), which must be installed prior to installing PolyBase. If you uninstall the JRE and install a later version of the JRE, you may experience a failure of PolyBase. I won’t speculate as to what will happen if you reinstall the same version of JRE that you uninstalled.

    Here’s the error message displayed from attempting CREATE EXTERNAL TABLE after the old JRE was uninstalled and the new JRE installed:

    Msg 105019, Level 16, State 1, Line 63
    EXTERNAL TABLE access failed due to internal error: 'An unexpected error has occurred while creating or attaching to JVM. Details: System.DllNotFoundException: Unable to load DLL 'jvm.dll': The specified module could not be found. (Exception from HRESULT: 0x8007007E)
       at JNI.JavaVM.JNI_GetCreatedJavaVMs(IntPtr& pVM, Int32 jSize1, Int32& jSize2)
       at Microsoft.SqlServer.DataWarehouse.Hadoop.HadoopBridge.JavaBridge.LoadOrAttachJVM()'

    It’s tempting to interpret the error message as a path problem and start fiddling with environment variables. If you’ve uninstalled the JRE, you should do a repair installation of SQL Server instead of trying path changes. After the repair was done, PolyBase started working again without a reboot.


    Figure 1. Select Maintenance, then Repair.


    Figure 2. Select the instance to repair.


    Figure 3. Click Repair.

  • In the Cloud: Free Azure Training

    Microsoft has many free Azure training courses. Each course is estimated to take 16–24 hours to complete.

    Oh my, I just realized Free Azure Training as a TLA is FAT. As a nurse, I’d never tell anybody to get fat. But as an Azure architect, I want you to get FAT. Check it out and upgrade your skills!
