THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

John Paul Cook

  • Cosmos DB Free for One Year and Other Azure Services Always Free

    While the Cosmos DB emulator is a great way to learn Cosmos DB, it’s better to work directly with the full Cosmos DB service in Microsoft Azure. Previously this was a barrier to learning for many people. Now you can get 12 months of free access to Cosmos DB, Azure SQL Database, Linux and Windows virtual machines, and certain types of storage. Other Azure services are now always free including IoT and AI and Cognitive Services such as speech and face APIs.

    Go here to get all of the details.

    I previously published an article about learning Cosmos DB that includes important links helpful in learning Cosmos DB.

  • Learn Cosmos DB in edX Course

    Sign up for the new edX course Developing Planet-Scale Applications in Azure Cosmos DB at https://www.edx.org/course/developing-planet-scale-applications-microsoft-dat237x.

  • Can Uber, Lyft, and Waze Save Lives in Natural Disasters?

    After working as a registered nurse and volunteer firefighter during hurricane Harvey, I came up with ideas on how Uber, Lyft, and Waze could help. See my article on LinkedIn here.

  • Houston, Hurricane Harvey, and Technology

    Many people are staying in shelters after being displaced by the flooding caused by hurricane Harvey. There were requests made for mental health professionals to go to shelters and talk to people who need help. People with professional credentials were willing to help. Floodwaters prevented some of the willing and qualified to be able to get to the shelters. This seems like a great use of telehealth technology.

  • SQL 101 Learn SQL. Not too much. Mostly online.

    I’ve had people ask me how to learn SQL without installing a database. This past Sunday afternoon I saw a packed class of current and aspiring data scientists learn SQL in a great class taught by Ted Petrou. You can indeed learn SQL for free without installing anything. Let me paraphrase Michael Pollan, well known for “Eat food. Not too much. Mostly plants.” Here’s my suggestion for anyone who works with data:

    Learn SQL. Not too much. Mostly online.

    Let me explain. If you work with data, you need to know SQL. Even if you have a bias against or don’t use relational database management systems (RDBMS) such as SQL Server, Oracle, and DB2, you still need to know SQL. SQL syntax or similar syntax is used to query all types of data sources, not just RDBMS – thus my advice to “learn SQL.”

    There’s a lot to SQL syntax. Just look at the excellent books from Joe Celko and Itzik-Ben Gan. To get started with SQL, it’s important to keep it simple and focus on just the basics. You can learn simple syntax and accomplish a lot – thus my advice for “not too much” so that you don’t get overwhelmed. Ted Petrou recommended starting with SQL Queries for Mere Mortals after learning the basics online.

    There are many great free resources for learning SQL. Here’s a site that you can browse to and start immediately. And here’s another. Neither site requires registration. If you are willing to register for free, data.world is a great site for learning SQL. It’s what Ted used in his class. The advantage of data.world is that it has many interesting datasets you can work with.

    Why start learning online? Because you can start immediately and not be troubled with installing database software, database tools, and a database itself. Just navigate to the URL and start doing SQL – thus my advice for “mostly online.”

    Once you’ve learned a bit of SQL online, you might want to consider installing your own RDBMS. You can download Oracle for free. You’ll probably want to download the free version of TOAD to access your Oracle database. Mac users may want to install MySQL or PostgreSQL. MariaDB is free and available for Mac, Windows, and Linux. For Windows users, I recommend installing SQL Server Express. It’s free and has great, free tools. That’s really important. Your day to day experiences with SQL are going be experienced through the tool you use to access the RDBMS. For SQL Server, the free SQL Server Management Studio (SSMS) is probably your best choice of tool to use. Unlike other free tools such as TOAD, SSMS is full featured. If you want to do something more advanced involving other languages in addition to SQL, you could use Microsoft’s free Visual Studio Community Edition or Visual Studio Code, both available for download from here. Keep in mind that SQL Server 2017 is available for both Windows and Linux. I think you’ll find SQL Server to be particularly user friendly and easier to work with than other products. You can install SQL Server 2017 on Docker running on Windows, Linux, or Mac. Read more about it here. You can also create a SQL Server database in Azure, which is Microsoft’s name for the cloud. When you create an Azure SQL Database, you can choose to have a sample database installed at the same time.

    When you decide to have your own RDBMS to learn with, you’ll need a sample database. The classic Northwind and pubs databases are still available, but I think you should consider something a bit newer. The AdventureWorks database is newer, but it is being phased out. As of SQL Server 2016, Microsoft’s recommended sample database is Wide World Importers which you can obtain from GitHub. Of particular interest to data scientists is SQL Server 2017’s support for R and Python. These are advanced topics. To get started learning SQL or improving your data science skills, I’ll go back to what I said before.

    Learn SQL. Not too much. Mostly online.

  • Your Personal Health in Information Technology

    How is your health? How is your job in information technology affecting your health? I want to know! Since I started blogging here almost a decade ago as a new Microsoft Most Valuable Professional, I’ve become a registered nurse and now I’m a full-time psychiatric nurse practitioner student.

    There are articles and studies purporting that information technology is a low stress career field. Other articles and studies say that information technology is a high stress career field. I found articles that predated the emergence of the cloud claiming that IT is low stress. Has the cloud changed that? Are you stressed about the rate of change in IT? Or are you not stressed specifically because you have only a few years until retirement from your slowly moving company and you can be oblivious to the rate of change outside of your company? Do you find the rate of change invigorating and the perfect antidote to the monotony of endlessly churning out mundane code?

    Remember the other definition of DBA – Doing Business All night. Do you check email after hours? Do you feel like you’re never done? When I work as a nurse, I thoroughly enjoy clocking out at the end of my shift because I’m done. That’s it. No encroachment of that job into my personal time.

    How is your weight? Are you exercising? What are you eating? How much are you sleeping? Do you worry about job security? Do you yearn for more social interaction in your job or is it what you are comfortable with? Are you frustrated by your job or work environment? How do you deal with the frustrations?

    In school I have to write papers about various topics. I’m wondering if it would be a good use of my time to investigate mental health among IT professionals. Please comment or contact me directly.

  • SQL Server Dump Analyzer

    Recently Microsoft released a preview version of the SQL Server Diagnostics extension to SSMS. You can read more about it here. I downloaded it from here and you can see the results below. If you install the extension while SSMS is up and running, you’ll have to stop and restart SSMS in order to see the new menu option.

    image

    Figure 1. SQL Server Diagnostics extension to SSMS.

    If you want to see how the tool works, you’ll need to either wait for a dump or forcibly create one. The instructions on creating a SQL Server dump from 2010 are still current and available here.

    The easiest way to force the creation of a dump file is from within Task Manager, but I wanted to direct the output to a location of my choosing, so I used sqldumper.

    image

    Figure 2. Creating a dump file from Task Manager.

    You’ll need to use an Administrative Command Prompt to run sqldumper. The exact path depends on which version of SQL Server you are running. I’m running SQL Server 2016, so the path is C:\Program Files\Microsoft SQL Server\130\Shared where 130 corresponds to compatibility level of SQL Server 2016. SQL Server 2014 is 120 and so on.

    sqldumperAnnotated

    Figure 3. Output from sqldumper /? so you can see the command line options.

    You need the pid of sqlservr.exe to run sqldumper. Go to the Details tab in Task Manager.

    sqlservr

    Figure 4. My pid was 3892. Yours will be something else.

    Running sqldumper.exe resulted in the following output:

    sqldumperRun

    Figure 5. Output from creating a dump using sqldumper.

    I selected the Analyze Dumps option. This is a cloud based service, so be prepared to wait as your large dump file is sent to Microsoft. Notice that you have to pick an Azure data center as your dump’s repository. Can you upload this outside the country where your server is? Do your corporate policies allow a dump file to be shared outside your company?

    image

    Figure 6. Uploading dump file for analysis.

    After the upload finished, the analysis took several minutes. You have to consider corporate policies and government laws on submitting corporate data to the cloud. If it is allowable to upload your dump to an external cloud server, you might find this new service useful.

  • BI Beginner: Using R Forecasting Visualization

    There are new HTML 5 enabled custom visualization for making interactive R visualizations in Power BI Desktop. Read the documentation and find the download links here. Because of the dependency on R packages that are loaded with the R library command. The end result is that your required R packages end up installed in your machine’s R library. Some R experts, the people who can help you, may have a tendency to be pedantic about exactly what is a library and what is a package. I point out the terminology in the interest of facilitating communication after reading various posts on this topic. Fortunately, Power BI executes the library command for you and simplifies the installation process. However, problems can occur. After the discussion on installation and configuration problems, this post shows you how to use the new forecasting visualization. A cautionary note: It can take several seconds before the R visualization is rendered. Patience is a virtue.

    I originally had my Power BI Desktop point to the version of R installed with SQL Server 2016.

    image

    Figure 1. Navigation to Power BI Desktop Options.

    I decided that I wanted the full version of Microsoft’s R distribution and that’s what I recommend you use. After all, I wanted to have the exact same R installation that you would have if you only have Power BI desktop and no installation of SQL Server 2016.

    image

    Figure 2. Verifying your R home directory.

    I went to the official download site for Microsoft R Open. After installing R, I changed the home directory in Power BI Desktop to point to it instead of the version that came with SQL Server 2016. It may not have been necessary, but again, I wanted to have the exact same environment as a business user would have. Most business users will not have SQL Server installed. I did have errors as shown below. I’ve compared the contents of R that comes with SQL Server 2016 to the Microsoft R Open that I installed. It looks like either one should work, but that isn’t what I experienced. I’m going to do further testing. Keep in mind that the primary purpose of this blog post is to show you the capabilities of the Forecasting Visualization and how to use it.

    image

    Figure 3. R home directory set to the Microsoft R Open download.

    Even with all of the Microsoft R Open packages installed, you will need more R packages installed in your library. As you import custom visuals, you may see notices about needing more R packages. All you need to do is click Install to have the R library command issued for you.

    image

    Figure 4. What you see when a custom visualization has dependencies on R package you don’t have.

    Once Power BI finishes installing the required packages for you, it shows you a box telling you what was installed.

    image

    Figure 5. R packages installed.

    If you are still lacking some packages, you’ll see an error message similar to the following when you attempt to use the custom visualization.

    image

    Figure 6. Missing R packages preventing a custom visualization from working.

    For the benefit of search engines. I’ve included the text of some of the error message.

    Error Message:
    R script error.
    Loading required package: XML
    Warning messages:
    1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called 'XML'
    2: In libraryRequireInstall("XML") :
      *** The package: 'XML' was not installed ***
    Loading required package: htmlwidgets
    Warning messages:
    1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called 'htmlwidgets'
    2: In libraryRequireInstall("htmlwidgets") :
      *** The package: 'htmlwidgets' was not installed ***
    Loading required package: ggplot2
    Warning messages:
    1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called 'ggplot2'
    2: In libraryRequireInstall("ggplot2") :
      *** The package: 'ggplot2' was not installed ***
    Loading required package: plotly
    Warning messages:
    1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called 'plotly'
    2: In libraryRequireInstall("plotly") :
      *** The package: 'plotly' was not installed ***
    Loading required package: scales
    Warning messages:
    1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called 'scales'
    2: In libraryRequireInstall("scales") :
      *** The package: 'scales' was not installed ***
    Loading required package: forecast
    Warning messages:
    1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called 'forecast'
    2: In libraryRequireInstall("forecast") :
      *** The package: 'forecast' was not installed ***
    Loading required package: zoo
    Warning messages:
    1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called 'zoo'
    2: In libraryRequireInstall("zoo") :
      *** The package: 'zoo' was not installed ***
    Loading required package: ggplot2
    Warning messages:
    1: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called 'ggplot2'
    2: In libraryRequireInstall("ggplot2") :
      *** The package: 'ggplot2' was not installed ***
    Error in ets(timeSeries, model = deModel, damped = damped) :
      could not find function "ets"
    Execution halted

    Once I imported all four custom interactive R visualizations, I was able to get all of them to work. I needed some time series data, so I downloaded gasoline sales data from the U.S. Census Bureau. Specifically, I downloaded this file. It wasn’t formatted and arranged like I needed, so I shaped the data and made it into a two column Excel file that I’ve shared with you as an attachment.

    To import a custom visualization for Power BI desktop, follow the steps shown below.

    image

    Figure 7. Import a custom visual by clicking the three dots as shown.

    image

    Figure 8. Make sure that the author of the custom visualization is trustworthy!

    image

    Figure 9. Select your pbviz file.

    image

    Figure 10. Visualization is installed. Click it to use it.

    image

    Figure 11. Enable your script visuals.

    image

    Figure 12. Using the Time Series Forecasting Chart custom visualization.

    image

    Figure 13. Output from the interactive visualization.

    image

    Figure 14. Move your mouse cursor to see the interactivity.

    image

    Figure 15. From that same position, the left mouse button was clicked and held down while selecting a portion of the visualization.

    image

    Figure 16. Result of zooming in to the selected area.

    You are able to configure your visualization by altering the parameters.

    image

    Figure 17. Forecasting settings are configurable.

  • Cosmos DB References

    Here is a list of links to get you started in understanding Cosmos DB, Microsoft’s new cloud based globally distributed multi-model database. Cosmos DB development started in 2010 as project Florence. When it was initially released to the public as part of Azure, it was called DocumentDB. Multi-model means it is more than just a document database as the screen capture shows:

    image

    Figure 1. Database models supported by Cosmos DB.

    When you create a Cosmos database, you must specify what type of data model you want. In addition to the original DocumentDB, graph, MongoDB, and key-value pair models are supported. As you can see in the screen capture, choosing a data model is choosing an API.

    Graph terminology in Cosmos DB refers to vertices and edges instead of how SQL Server 2017 refers to nodes and edges. The API for Cosmos DB graph is Gremlin. Gremlin is a language for traversing graphs that use a TinkerPop enabled provider such as Cosmos DB. TinkerPop is an open source graph computing framework.

    To learn about Azure Cosms DB, you might want to start with Rimma Nehme’s podcast that I previously blogged about here (direct link to the mp3 is here). At 46 minutes in length, this audio recording is good to listen to while you commute.

    Next, there are some Channel 9 videos that you might want to view or download as mp3 and listen to while you commute or exercise.

    Syam Kumar Nair Cosmos DB 25 minutes

    Rimma Nehme How to build globally-distributed, fast, billion-user applications with Azure Cosmos DB 50 minutes

    For understanding how to use the Gremlin API graph model in Cosmos DB, Chris Sims has a 9 minute YouTube video available here.

    If you aren’t familiar with Eric Brewer’s CAP theorem, take just a few minutes to read about it here.

    When reading or viewing media on Cosmos DB, pay attention to the consistency levels:

    • Strong
    • Bounded Staleness
    • Session
    • Consistent Prefix
    • Eventual

    When you are ready to read, start with the official Azure Cosmos DB Documentation. Open up the navigation menu to see what’s available.

    image

    Figure 2. Cosmos DB documentation.

    If you watch the videos, you will notice a graph explorer you’ll need and want to visualize your graphs. Download it from GitHub. Even if you aren’t ready to download and run it, you should at least browse there and take a look at what it looks like and what it can do for you.

    Finally, do not despair if you don’t have an Azure account or have exhausted all of your Azure credits. You can download and install the Cosmos DB emulator and start learning about Cosmos DB on your local machine. That’s what I recommend to keep your costs down as you learn. Be sure to watch Kirill Gavrylyuk’s video on that page.

  • Cosmos DB Podcast While You Commute

    Cosmos DB is Microsoft’s new Azure (i.e. Microsoft cloud) globally distributed multimodel database. There is a good podcast where Rimma Nehme explains it. I’ve long wanted to blog about how to configure USB flash drives to play mp3 files on automotive sound systems. You can downloads Rimma’s podcast as an mp3 available on this page.

    To play mp3 files on an automotive sound system, you must have a USB flash drive compatible with your system. You should format your USB flash drive as FAT32. On my car, the maximum size allowed is 32 GB. Larger capacity drives simply will not work.

    image

    Figure 1. Format USB flash drive as FAT32.

    What I discovered is that the mp3 files need to be placed in a top level folder. I created a folder called Data Platform. Copy your mp3 files into the folder.

    image

    Figure 2. Place your mp3 files in a top level folder on your FAT32 USB flash drive. I only have a single top level folder and have not tried having more than that.

    image

    Figure 3. Save the mp3 to a top level folder on your FAT32 USB flash drive.

    Insert your USB flash device into your car and learn something while you commute! As you can see, it’s easy to download an mp3 file and play it through your vehicle’s sound system.

    image

    Figure 4. Listen and learn as you commute.

    I’ve had mixed results in general getting the mp3 properties to appear on my car’s sound system. Sometimes the properties appear as can be seen in this example of podcasts used in my graduate studies.

    image

    Figure 5. mp3 properties displayed on car sound system.

  • SSMS Presenter Mode

    Recent builds of SQL Server Management Studio (SSMS) have a feature called PresentOn that can be accessed from Quick Launch.

    image

    Figure 1. Quick Launch location in SSMS.

    Enter Present in to Quick Launch and select it.

    image

    Figure 2. Select PresentOn.

    Using PresentOn is helpful when you give a presentation. It doesn’t increase the size of the results, which I find is a limitation when actually giving presentations. Overall it is a good feature that you might want to use.

    image

    Figure 3. Object Explorer and the Query Pane are larger, but not the Results Pane.

    To turn off PresentOn, select RestoreDefaultFonts.

    image

    Figure 4. Use RestoreDefaultFonts to turn off presenter mode.

    If you don’t need to increase the font size of the Object Explorer, which is part of your environment, you can use the zoom feature in the Query Pane. This feature is available all of the time. You don’t need to switch to presenter mode. If the results are in text format instead of grid format, the results will also be magnified.

    image

    Figure 5. Using the magnification feature.

    If you want more control over the formatting of SSMS, see my instructions found here.

  • Modeling an Adjacency List Hierarchy with SQL Server 2017 Graph Database

    The Northwind database implements the manager to employee hierarchy using an adjacency list, which is a type of graph. This post shows you how to migrate the adjacency list to a graph of one node and one edge. This is a continuation of what was started in this prior post.

    In the Employees table, the employee’s manager has an EmployeeID which is stored in the employee’s ReportsTo column. A foreign key relationship is defined where the ReportsTo value stores the primary key of the row containing the manager’s data. For the highest level manager, the ReportsTo column is NULL. The relationship can be queried using a self-join.

    image

    Figure 1. Employee to Manager hierarchy implemented in the Northwind database.

    The hierarchy can be displayed using a self-join.

    SELECT e.FirstName, e.LastName, m.FirstName AS ManagerFirstName, m.LastName AS ManagerLastName
    FROM dbo.Employees AS e
    INNER JOIN dbo.Employees AS m
    ON e.ReportsTo = m.EmployeeID;

    FirstName LastName  ManagerFirstname ManagerLastName
    --------- --------- ---------------- ---------------
    Nancy     Davolio   Andrew           Fuller
    Janet     Leverling Andrew           Fuller
    Margaret  Peacock   Andrew           Fuller
    Steven    Buchanan  Andrew           Fuller
    Michael   Suyama    Steven           Buchanan
    Robert    King      Steven           Buchanan
    Laura     Callahan  Andrew           Fuller
    Anne      Dodsworth Steven           Buchanan

    To implement the hierarchy using a node and an edge, run the following code:

    CREATE TABLE dbo.reportsTo AS EDGE;

    INSERT INTO reportsTo
    SELECT e.$node_id, m.$node_id
    FROM dbo.EmployeesNode e
    INNER JOIN dbo.EmployeesNode m
    ON e.ReportsTo = m.EmployeeID;

    To retrieve the employees and their managers, run the following query:

    SELECT e.FirstName, e.LastName, m.FirstName AS ManagerFirstName, m.LastName AS ManagerLastName
    FROM EmployeesNode AS e, reportsTo, EmployeesNode AS m
    WHERE MATCH (e-(reportsTo)->m);

    FirstName LastName  ManagerFirstname ManagerLastName
    --------- --------- ---------------- ---------------
    Nancy     Davolio   Andrew           Fuller
    Janet     Leverling Andrew           Fuller
    Margaret  Peacock   Andrew           Fuller
    Steven    Buchanan  Andrew           Fuller
    Michael   Suyama    Steven           Buchanan
    Robert    King      Steven           Buchanan
    Laura     Callahan  Andrew           Fuller
    Anne      Dodsworth Steven           Buchanan

    Astute observers will notice that something is missing from these queries. Before giving the answer away, look at the following query:

    ;WITH CTE (EmployeeID, EmployeeFirstName, EmployeeLastName, ManagerFirstName, ManagerLastName)
    AS
    (
    -- anchor query
        SELECT e.EmployeeID, e.FirstName, e.LastName, CAST(NULL        AS NVARCHAR(20)), CAST(NULL       AS NVARCHAR(10))
        FROM dbo.Employees AS e
        WHERE e.ReportsTo IS NULL
        UNION ALL
    -- recursive query
        SELECT e.EmployeeID, e.FirstName, e.LastName, CAST(m.FirstName AS NVARCHAR(20)), CAST(m.LastName AS NVARCHAR(10))
        FROM dbo.Employees AS e
        INNER JOIN dbo.Employees AS m
        ON e.ReportsTo = m.EmployeeID
        INNER JOIN CTE
        ON e.ReportsTo = CTE.EmployeeID
    )
    SELECT EmployeeFirstName, EmployeeLastName, ManagerFirstName, ManagerLastName
    FROM CTE;

    EmployeeFirstName EmployeeLastName ManagerFirstName ManagerLastName
    ----------------- ---------------- ---------------- ---------------
    Andrew            Fuller           NULL             NULL
    Nancy             Davolio          Andrew           Fuller
    Janet             Leverling        Andrew           Fuller
    Margaret          Peacock          Andrew           Fuller
    Steven            Buchanan         Andrew           Fuller
    Laura             Callahan         Andrew           Fuller
    Michael           Suyama           Steven           Buchanan
    Robert            King             Steven           Buchanan
    Anne              Dodsworth        Steven           Buchanan

    As you can see, special handling is required for handling the top of the hierarchy. A foreign key column in a relational table can allow NULL.

          NOTE: The CASTs were necessary to prevent the following error which has nothing to do with the subject of this post:

        Msg 240, Level 16, State 1, Line 1
        Types don't match between the anchor and the recursive part in column "ManagerFirstName" of recursive query "CTE".

    What about an edge? Can it handle a NULL?

    INSERT INTO reportsTo
    SELECT e.$node_id, NULL
    FROM dbo.EmployeesNode e
    WHERE e.ReportsTo IS NULL;

    Msg 515, Level 16, State 2, Line 70
    Cannot insert the value NULL into column 'to_obj_id_762C07225C8943A99CF4494DFF96B4EE', table 'Northwind.dbo.reportsTo'; column does not allow nulls. INSERT fails.

    The statement has been terminated.

    I propose an approach similar to the common table expression query shown above.

    SELECT FirstName, LastName, NULL AS ManagerFirstName, NULL AS ManagerLastName
    FROM EmployeesNode
    WHERE ReportsTo IS NULL
    UNION ALL
    SELECT e.FirstName, e.LastName, m.FirstName AS ManagerFirstName, m.LastName AS ManagerLastName
    FROM EmployeesNode AS e, reportsTo, EmployeesNode AS m
    WHERE MATCH (e-(reportsTo)->m);

    FirstName LastName  ManagerFirstName ManagerLastName
    --------- --------- ---------------- ---------------
    Andrew    Fuller    NULL             NULL
    Nancy     Davolio   Andrew           Fuller
    Janet     Leverling Andrew           Fuller
    Margaret  Peacock   Andrew           Fuller
    Steven    Buchanan  Andrew           Fuller
    Michael   Suyama    Steven           Buchanan
    Robert    King      Steven           Buchanan
    Laura     Callahan  Andrew           Fuller
    Anne      Dodsworth Steven           Buchanan

    I have another solution, but it is for another day.

  • Modeling Many-to-Many relationships in SQL Server 2017 Graph Database

    Relational databases do not natively support many to many relationships although a junction table is an easy workaround. Graph databases natively support many-to-many relationships. One of the first sample databases Microsoft provided for SQL Server was Northwind. It has three many-to-many relationships as can be seen in the entity relationship diagram shown below. I previously published a list of references for understanding graphs here.

    image

    Figure 1. Northwind database ER diagram.

    I’m going to focus on one junction table, the EmployeeTerritories table.

    image

    Figure 2. Many-to-many relationship implemented with the EmployeeTerritory junction table. The colors carry through to the code samples that follow.

    I always tell people to start with obtaining an understanding the data. Let’s run some queries to find out more about the nature of the data.

    SELECT * FROM dbo.Employees WHERE EmployeeID NOT IN (SELECT EmployeeID FROM dbo.EmployeeTerritories);
    SELECT * FROM dbo.Territories WHERE TerritoryID NOT IN (SELECT TerritoryID FROM dbo.EmployeeTerritories);

    Not all territories have employees. All employees have territories. As I pointed out in this post, directionality of an edge matters. It’s something you have to pay attention to. The following query shows how to display all of the rows in the many-to-many relationship.

    SELECT E.FirstName, E.LastName, T.TerritoryDescription
    FROM dbo.Employees AS E
    INNER JOIN dbo.EmployeeTerritories AS ET
    ON E.EmployeeID = ET.EmployeeID
    INNER JOIN dbo.Territories AS T
    ON T.TerritoryID = ET.TerritoryID;

    FirstName LastName  TerritoryDescription
    --------- --------- --------------------
    Nancy     Davolio   Wilton             
    Nancy     Davolio   Neward             
    Andrew    Fuller    Westboro           
    Andrew    Fuller    Bedford            
    Andrew    Fuller    Georgetow          
    Andrew    Fuller    Boston             
    Andrew    Fuller    Cambridge          
    Andrew    Fuller    Braintree          
    Andrew    Fuller    Louisville         
    Janet     Leverling Atlanta            
    Janet     Leverling Savannah           
    Janet     Leverling Orlando            
    Janet     Leverling Tampa              
    Margaret  Peacock   Rockville          
    Margaret  Peacock   Greensboro         
    Margaret  Peacock   Cary               
    Steven    Buchanan  Providence         
    Steven    Buchanan  Morristown         
    Steven    Buchanan  Edison             
    Steven    Buchanan  New York           
    Steven    Buchanan  New York           
    Steven    Buchanan  Mellvile           
    Steven    Buchanan  Fairport           
    Michael   Suyama    Phoenix            
    Michael   Suyama    Scottsdale         
    Michael   Suyama    Bellevue           
    Michael   Suyama    Redmond            
    Michael   Suyama    Seattle            
    Robert    King      Hoffman Estates    
    Robert    King      Chicago            
    Robert    King      Denver             
    Robert    King      Colorado Springs   
    Robert    King      Santa Monica       
    Robert    King      Menlo Park         
    Robert    King      San Francisco      
    Robert    King      Campbell           
    Robert    King      Santa Clara        
    Robert    King      Santa Cruz         
    Laura     Callahan  Philadelphia       
    Laura     Callahan  Beachwood          
    Laura     Callahan  Findlay            
    Laura     Callahan  Racine             
    Anne      Dodsworth Hollis             
    Anne      Dodsworth Portsmouth         
    Anne      Dodsworth Southfield         
    Anne      Dodsworth Troy               
    Anne      Dodsworth Bloomfield Hills   
    Anne      Dodsworth Roseville          
    Anne      Dodsworth Minneapolis
          

    Create and populate a node table of employee data.

    CREATE TABLE dbo.EmployeesNode(
        EmployeeID int IDENTITY(1,1) NOT NULL,
        LastName nvarchar(20) NOT NULL,
        FirstName nvarchar(10) NOT NULL,
        Title nvarchar(30) NULL,
        TitleOfCourtesy nvarchar(25) NULL,
        BirthDate datetime NULL,
        HireDate datetime NULL,
        Address nvarchar(60) NULL,
        City nvarchar(15) NULL,
        Region nvarchar(15) NULL,
        PostalCode nvarchar(10) NULL,
        Country nvarchar(15) NULL,
        HomePhone nvarchar(24) NULL,
        Extension nvarchar(4) NULL,
        Photo VARBINARY(MAX) NULL,  -- changed from deprecated IMAGE data type
        Notes NVARCHAR(MAX) NULL,   -- changed from deprecated TEXT data type
        ReportsTo int NULL,
        PhotoPath nvarchar(255) NULL
    ) AS NODE;
    GO

    SET IDENTITY_INSERT dbo.EmployeesNode ON

    INSERT INTO dbo.EmployeesNode
    (EmployeeID, LastName, FirstName, Title, TitleOfCourtesy, BirthDate, HireDate, Address, City, Region, PostalCode, Country, HomePhone, Extension, Photo, Notes, ReportsTo, PhotoPath)
    SELECT EmployeeID, LastName, FirstName, Title, TitleOfCourtesy, BirthDate, HireDate, Address, City, Region, PostalCode, Country, HomePhone, Extension, Photo, Notes, ReportsTo, PhotoPath
    FROM dbo.Employees;

    SET IDENTITY_INSERT dbo.EmployeesNode OFF

    Create and populate a node table of territories data.

    CREATE TABLE dbo.TerritoriesNode(
        TerritoryID nvarchar(20) NOT NULL,
        TerritoryDescription nchar(50) NOT NULL,
        RegionID int NOT NULL
    ) AS NODE;
    GO

    INSERT INTO dbo.TerritoriesNode
    (TerritoryID, TerritoryDescription, RegionID)
    SELECT TerritoryID, TerritoryDescription, RegionID
    FROM dbo.Territories;

    It’s necessary to create a query to retrieve node id pairs to populate the edge table. It’s a modification to the junction table query shown above. I suggest running just the SELECT portion of the following query so you can see what the edge table is being populated with.

    INSERT INTO belongsTo
    SELECT E.$node_id, T.$node_id
    FROM dbo.EmployeesNode E
    INNER JOIN dbo.EmployeeTerritories ET
    ON E.EmployeeID = ET.EmployeeID
    INNER JOIN dbo.TerritoriesNode T
    ON T.TerritoryID = ET.TerritoryID;

    Run a graph query to see the many-to-many relationship represented as 49 rows.

    SELECT EmployeesNode.FirstName, EmployeesNode.LastName, TerritoriesNode.TerritoryDescription
    FROM EmployeesNode, belongsTo, TerritoriesNode
    WHERE MATCH (EmployeesNode-(belongsTo)->TerritoriesNode);

    FirstName LastName TerritoryDescription
    --------- -------- --------------------
    Andrew    Fuller   Westboro           
    Andrew    Fuller   Bedford            
    Andrew    Fuller   Georgetow          
    Andrew    Fuller   Boston             
    Andrew    Fuller   Cambridge          
    Andrew    Fuller   Braintree          
    Steven    Buchanan Providence         
    Anne      Dodswort Hollis             
    Anne      Dodswort Portsmouth         
    Nancy     Davolio  Wilton             
    Steven    Buchanan Morristown         
    Steven    Buchanan Edison             
    Steven    Buchanan New York           
    Steven    Buchanan New York           
    Steven    Buchanan Mellvile           
    Steven    Buchanan Fairport           
    Laura     Callahan Philadelphia       
    Nancy     Davolio  Neward             
    Margaret  Peacock  Rockville          
    Margaret  Peacock  Greensboro         
    Margaret  Peacock  Cary               
    Janet     Leverlin Atlanta            
    Janet     Leverlin Savannah           
    Janet     Leverlin Orlando            
    Janet     Leverlin Tampa              
    Andrew    Fuller   Louisville         
    Laura     Callahan Beachwood          
    Laura     Callahan Findlay            
    Anne      Dodswort Southfield         
    Anne      Dodswort Troy               
    Anne      Dodswort Bloomfield Hills   
    Laura     Callahan Racine             
    Anne      Dodswort Roseville          
    Anne      Dodswort Minneapolis        
    Robert    King     Hoffman Estates    
    Robert    King     Chicago            
    Robert    King     Denver             
    Robert    King     Colorado Springs   
    Michael   Suyama   Phoenix            
    Michael   Suyama   Scottsdale         
    Robert    King     Santa Monica       
    Robert    King     Menlo Park         
    Robert    King     San Francisco      
    Robert    King     Campbell           
    Robert    King     Santa Clara        
    Robert    King     Santa Cruz         
    Michael   Suyama   Bellevue           
    Michael   Suyama   Redmond            
    Michael   Suyama   Seattle
                

    The results are the same, although the order is not. There is no implicit order in a relational database. I didn’t use a SORT to make sure you understand that you should not expect the same order of results from different queries.

    As you can see, the graph query has very simple code.

    I’m pretty sure somebody is going to ask what if the order of the nodes in the MATCH is changed?

    SELECT EmployeesNode.FirstName, EmployeesNode.LastName, TerritoriesNode.TerritoryDescription
    FROM EmployeesNode, belongsTo, TerritoriesNode
    WHERE MATCH (TerritoriesNode-(belongsTo)->EmployeesNode);

    No rows are returned because of the directionality of the edge. We defined our graph has having employees belonging to territories and not territories belonging to employees. It’s the direction of the arrow that matters, not the order of the nodes in the MATCH. The following query returns all 49 rows.

    SELECT EmployeesNode.FirstName, EmployeesNode.LastName, TerritoriesNode.TerritoryDescription
    FROM EmployeesNode, belongsTo, TerritoriesNode
    WHERE MATCH (TerritoriesNode<-(belongsTo)-EmployeesNode);

    Bidirectional matching is not supported. Original graph query with another arrow added.

    SELECT EmployeesNode.FirstName, EmployeesNode.LastName, TerritoriesNode.TerritoryDescription
    FROM EmployeesNode, belongsTo, TerritoriesNode
    WHERE MATCH (EmployeesNode<-(belongsTo)->TerritoriesNode);

    Msg 102, Level 15, State 1, Line 60
    Incorrect syntax near '>'.

    Graph nodes and edges appear under Graph Tables apart from regular tables.

    image

    Figure 3. Node and Edge tables appear under Graph Tables.

    My next post in this series shows how to migrate Northwind's adjacency list in the Employees table to SQL Server 2017 nodes and edges. 



  • Using Hyper-V for SQL Server CTP and RC Versions

    Yesterday at SQL Saturday I was asked how to connect to SQL Server 2017 CTP 2.1 that was running in a virtual machine by using Remote Desktop Connection. If you want to learn how to use new software before it is released, running it in a virtual machine is a really good idea. It is wise to keep prerelease software off of your primary machine. Sometimes people call this keeping your machine clean. Microsoft no longer refers to prerelease versions of SQL Server as beta versions. Instead, they call the beta versions Community Technology Preview (CTP) versions. Prior to the final production release, they may be some Release Candidate (RC) versions. I do not install beta, CTP, or RC versions of software on my primary machine. Instead, I install such versions in a virtual machine. If you override the default settings for interacting with your virtual machine, you can greatly enhance your user experience. Among other things, you will likely experience a need to easily get files into and out of your virtual machine. CAUTION: The steps shown within this post remove the protections separating the virtual machine from the host machine. If you want to investigate the effects of potential malware within the safety of a virtual machine, you should not follow any of the steps within this post.

    Windows 10 and Windows Server 2016 come with Hyper-V. After initially installing an operating system in a Hyper-V virtual machine, you will connect to it using the Virtual Machine Connection application. Instead of clicking the Connect button, you might want to familiarize yourself with the available options. Although I have primarily worked as a database consultant for the past 20 years, I’ve done quite a bit technical training and presentations as well. One of the things I stress to those I’m teaching is that you should not click on buttons without understanding. You need to know what defaults you are implicitly accepting. You should know if there are advantages to changing the default environment. Click the arrow by Show Options to see what is available.

    image

    Figure 1. Virtual Machine Connection. Click the arrow by Show Options.

    You might want to save your settings by checking the obviously named checkbox so that your customized settings will persist from session to session.

    image

    Figure 2. Click the Local Resources tab.

    Notice that here is a tab that says Local Resources.

    image

    Figure 3. Click the More… button.

    Click the More… button to see what additional options are available. Select the drives on your host machine that you want to make available to your virtual machine.

    image

    Figure 4. Select host operating system drives to make available to virtual machine.

    Once you’ve made all of your selections, connect to your virtual machine. Notice that the host’s machines files are available to the virtual machine.

    image

    Figure 5. Host operating system drives made available to virtual machine.

    Many people prefer the user experience of Remote Desktop Connection over that of the Virtual Machine Connection. I do. By default, Windows operating systems have Remote Desktop Connection turned off. You have to connect to your virtual machine to change that. You’ll do that with Virtual Machine Connection. Navigate to the System applet. It’s probably easiest to type system into the search bar and pick it from the list.

    image

    Figure 6. Navigate to the System applet.

    image

    Figure 7. Select Remote settings.

    By default, Windows operating systems are set to deny Remote Desktop Connection.

    image

    Figure 8. This is why you can’t connect to your virtual machine using Remote Desktop Connection.

    Override the default to enable Remote Desktop Connection.

    image

    Figure 9. Enable Remote Desktop Connection.

    Remote Desktop Connection presents you with the same options as Virtual Machine Connection, plus additional ones. You’ll probably want to override the defaults as show previously to enable access to drives on the host operating system.

    image

    Figure 10. Enabling access to host operating system drives from Remote Desktop Connection.

    Open Windows Explorer in the Remote Desktop Connection so that you can see both the local drives and drives on the host machine. Click the Network node in Windows Explorer. Notice the message about network discovery being turned off. This is another default setting of Windows. Click the message

    image

    Figure 11. Network devices are not visible by default inside the virtual machine.

    image

    Figure 12. Virtual machine now has access to network resources.

  • SQL Server 2017 Graph Database References

    This is a list of references I used in creating my SQL Saturday presentation on SQL Server 2017 graph databases. To get started with the graph database features of SQL Server 2017, I recommend that you start with the following official Microsoft documentation:

    https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-overview

    https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-architecture

    https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-sample it provides code for creating and query a simple graph database

    or download all three web pages as this single pdf https://docs.microsoft.com/pdfstore/en-us/SQL.sql-content/live/relational-databases/graphs.pdf

    Next I recommend reading this to compare and contrast SQL Server 2017 graph to built for purpose graph databases as well as learn about limitations of graph in SQL Server 2017: https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/20/graph-data-processing-with-sql-server-2017/ Notice how it mentions directionality of edges.

    Once you’re ready for a larger graph database, look at https://blogs.msdn.microsoft.com/sqlcat/2017/04/21/build-a-recommendation-system-with-the-support-for-graph-data-in-sql-server-2017-and-azure-sql-db/. You might want to learn more about graph databases than what you will find in the Microsoft documentation. It’s good to know about built for purpose graph databases.

    Here’s a good article to explain SQL Server 2017 graph in more detail and also discuss the broader topic of graph databases outside of just Microsoft. Once again notice that directionality is mentioned. http://www.nikoport.com/2017/06/03/sql-graph-part-i/

    If you’d like to see another simple graph database in SQL Server 2017, you might want to look at https://stephanefrechette.com/sql-graph-sql-server-2017/

    No discussion of graph databases is complete without mentioning Neo4j. You can download and install the community edition for free. Neo4j is built for purpose. The Cypher query language is used to query Neo4j. Once you familiarize yourself with Cypher you’ll see that Microsoft’s graph extensions to SQL Server are similar to Cypher syntax. Two people who work for Neo4j authored a free O’Reilly ebook on graph databases that you can obtain from http://graphdatabases.com/. The title of the book is The Definitive Book on Graph Databases and Introduction to Neo4j, so it tells you that it is biased toward Neo4j. Once again, you should probably pay attention to edge directionality. If you want to read more but want something short instead of a book, try this Wikipedia page on graph databases.

    If you’ve checked out any of these links, I’m confident you’ve seen at least one force-directed graph visualization. You’ll want a software tool to create these. There is a Force-Directed Graph visualization for Power BI. I wasn’t able to get it to work with SQL Server 2017 CTP 2.1. You can download this sample Power BI report and see what the Force-Directed Graph visualization looks like.

    image

    Figure 1. Force-Directed Graph report in Power BI.

    image

    Figure 2. Mouseover on the node named Logan.

    When you are reading about graph databases, it’s important that you don’t let the terminology confuse you. Sometimes authors use the word connections instead of relationships when referring to graph databases. Just kind in mind that a relationship in a graph database context is different from relationship in a relational database context. Graph nodes represent entities.

    Graph databases are good for applications in fraud detection, management of hierarchies such as bill of materials (BOM – don’t pronounce it at an airport) and parts explosions (opposite of a bill of materials, also not good to discuss in the airport lounge), social networks, and purchase recommendations among many other things. I liked reading the academic paper Incremental Anomaly Detection in Graphs. Graph based anomaly detection and description: a survey is another good paper but it is not free to the general public.

    If you don’t understand SQL Server columnar indexes, you really need to understand them to have a foundation for tuning your graphs in SQL Server 2017. You can find an introduction to columnar indexes here. Once you understand columnar indexes, you’ll see why they can be useful in improving graph performance.

    In the presentation at SQL Saturday today, a few people were unfamiliar with the term ASCII art. (If you like ASCII art, the entire first Star Wars movie can be found as ASCII art here.) Somewhere in the linked documentation above, the operators for the graph queries were described as ASCII art. In other words, the characters – and > are combined together to make a graphic construct. In the code below, that graphic construct is an arrow to indicate directionality of an edge as shown in the following query:

    SELECT Restaurant.name
    FROM Person, likes, Restaurant
    WHERE MATCH (Person-(likes)
    ->Restaurant)
    AND Person.name = 'John';

    One person asked about graph queries that don’t return any data. Here’s some code to use with the sample data from the third link on this page. Add a new person to the Person node. Now we have a person who doesn’t like any restaurants because no companion inserts were done for the likes edge.

    INSERT INTO Person VALUES (6,'Dakota');

    SELECT Restaurant.name
    FROM Person, likes, Restaurant
    WHERE MATCH (Person-(likes)->Restaurant)
    AND Person.name = 'Dakota';

    The result is no rows returned. But couldn’t restaurants for Dakota’s friends be returned? No. Dakota doesn’t have any friends. No inserts into the friends edge were made for Dakota. Run the query and you’ll see that no rows are returned.

    SELECT Restaurant.name
    FROM Person person1, Person person2, likes, friendOf, Restaurant
    WHERE MATCH(person1-(friendOf)->person2-(likes)->Restaurant)
    AND person1.name='Dakota';

    As I pointed out in the presentation, notice that the Person node in the previous query is aliased to person1 and person2 in a manner similar to aliasing relational tables for a self-join. I also said that that I was working on modifying the Northwind database to use SQL Server 2017 graph tables. You can see the first blog post in this series here.

    Here’s a good paper explaining how hierarchies are graphs.

More Posts Next page »

This Blog

Syndication

Archives

Privacy Statement