It’s been a while since I wrote the last blog post on data mining / machine learning algorithms, in which I described the Neural Network algorithm. It is also a good time to write another post in order to remind readers of my two upcoming seminars about the algorithms: in Oslo on Friday, September 2nd, 2016, and in Cambridge on Thursday, September 8th. I hope to see you at one of the seminars. Finally, to conclude this marketing part: if you are interested in the R language, I am preparing another seminar, “EmbRace R”, which will cover R from basics to advanced analytics. Stay tuned.
Now for the algorithm. If you remember the post, a neural network has an input layer, an output layer, and one or more hidden layers. The Neural Network algorithm uses the hyperbolic tangent activation function in the hidden layer and the sigmoid function in the output layer. The sigmoid function is also called the logistic function. Therefore, describing the Logistic Regression algorithm is simple after I described the Neural Network. If a neural network has only input neurons that are directly connected to the output neurons, it is a Logistic Regression. Or, to repeat the same thing in a different way: Logistic Regression is a Neural Network with zero hidden layers.
This was quick:-) To add more meat to the post, I am adding the formulas and the graphs for the hyperbolic tangent and sigmoid functions.
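The two functions, and the "zero hidden layers" idea, can also be written down in a few lines of Python. This is only a minimal sketch: the function names and the plain weighted-sum combination are my assumptions for illustration, not the internals of any particular product.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x: float) -> float:
    """Hyperbolic tangent: maps any real number into (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def logistic_regression(inputs, weights, bias):
    """Hypothetical zero-hidden-layer network: the weighted sum of the
    inputs goes straight into the sigmoid output unit."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)
```

Both curves are S-shaped; the hyperbolic tangent is just a rescaled sigmoid, tanh(x) = 2 * sigmoid(2x) - 1.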
I am closing my plan for the second semester of this year. Before listing the events I plan to attend, just a quick comment. I have had many conversations about some specific events and why I don’t visit them, especially events in the vicinity. My answer is pretty simple. I try to plan my events six months in advance. My schedule for the year 2016 is full. I simply can’t visit events that are announced only a couple of months in advance. I prefer long-term planning.
Anyway, here is my list, pretty long again.
- SQL Grill, Lingen, Germany, August 19th: one presentation - Statistics with T-SQL
- SQLSaturday #532 - Oslo 2016, September 2nd-3rd:
- SQLSaturday #520 - Cambridge 2016, September 8th-10th:
- SQLSaturday #555 - Munich 2016, October 8th: not confirmed yet.
- SQLSaturday #538 - Sofia 2016, October 15th: not confirmed yet.
- PASS Summit 2016, October 25th-28th, Seattle, WA:
- SQLSaturday #569 - Prague 2016, December 3rd: not confirmed yet.
- SQLSaturday #567 - Slovenia 2016, December 9th-10th: since I am one of the organizers, this one is confirmed:-)
And this should be enough for this year:-)
So we are back again
The leading event dedicated to Microsoft SQL Server in Slovenia will take place on Saturday, 10th December 2016, at the Faculty of Computer and Information Science of the University of Ljubljana, Večna pot 113, Ljubljana (http://www.fri.uni-lj.si/en/about/how_to_reach_us/).
As always, this is an English-only event. We don’t expect the speakers and the attendees to understand Slovenian. Because of this, our SQL Saturday has become quite well known, especially in the neighboring countries. Therefore, expect not only international speakers but international attendees as well. There will be 30 top sessions, two original and interesting pre-conference seminars, a small party after the conference, an organized dinner for the speakers and sponsors… But first of all, expect a lot of good vibrations, mingling with friends, smiling faces, and a great atmosphere. You might also consider visiting Ljubljana and Slovenia for a couple of additional days. Ljubljana is a very beautiful and lively city, especially in December.
In cooperation with Kompas Xnet d.o.o. we are once again organizing two pre-conference seminars by three distinguished Microsoft SQL Server experts:
The seminars will take place the day before the main event, on Friday, 9th December 2016, at Kompas Xnet d.o.o., Stegne 7, Ljubljana. The attendance fee for each seminar is 149.00 € per person; until 31st October 2016 you can register for each seminar for 119.00 € per person.
Hope we meet at the event!
This is a tip that should help with installing SQL Server 2016 Master Data Services (tested on CTP 3.3, RC2, and RC3). The documentation is pretty old and incomplete (I have already sent feedback).
The page “Web Application Requirements (Master Data Services)” (https://msdn.microsoft.com/en-us/library/ee633744.aspx) should be seriously updated.
First of all, it should also document how to use the Windows Server 2012 R2 and Windows 10 operating systems. I managed to install it on Windows Server 2012 R2. However, there is a bullet missing in the Role and Role Services part. In the Performance section, only Static Content Compression is mentioned. However, Dynamic Content Compression is needed as well.
I managed to get it up and running.
I got some questions about virtual machine / notebook setup for my Business Intelligence in SQL Server 2016 DevWeek post-conference workshop. I am writing this blog because I want to spread this information as quickly as possible.
There will be no labs during the seminar; there is no time for them. However, I will make all of the code available. Therefore, attendees who would like to test the code need to prepare their own setup. I will use the following software:
Windows Server 2012 R2
SQL Server 2016 components
- Database Engine
- R Services
- SQL Server Management Studio (this is not included in SQL Server setup anymore)
- SQL Server Data Tools
- R Tools for Visual Studio
- R Studio
Excel 2016 Professional Plus with add-ins
- MDS add-in
- Power Pivot
- Power Query
- Power Map
- Power View
- Azure ML add-in
Excel 2013 Professional Plus with add-ins
- Data Mining add-in (this add-in does not work with Excel 2016 yet; a version for Excel 2016 is announced only for later this year, after the SQL Server 2016 release)
Power BI Apps and Services
- Power BI Desktop
- Power BI Service (they need to create a free account at PowerBI.com)
- Azure ML (they need to create a free account at AzureML.com)
AdventureWorks demo databases version 2016, 2014 or 2012
I know the list is long:-) However, nobody needs to test everything. Just pick the parts you need and want to learn about.
See you soon!
Traditionally, every semester I write down a list of the presentations I am giving at different events.
This semester, I am already a bit late, and I am still missing some info. Here is the updated list of the events I am planning to attend. Of course, more updates will come when I get the relevant information.
- Bulgarian UG meeting, Sofia, January 14th: presentation Introducing R and Azure ML
- Slovenian UG meeting, Ljubljana, February 18th: presentation Introducing R and Using R in SQL Server 2016, Power BI, and Azure ML
- SQL Server Konferenz 2016, Darmstadt, February 23rd – 25th:
- pre-conference seminar Data Mining Algorithms in SSAS, Excel, R, and Azure ML
- presentation SQL Server & Power BI Geographic and Temporal Predictions
- PASS SQL Saturday #495, Pordenone, February 27th:
- presentation SQL Server 2012-2016 Columnar Storage
- presentation Enterprise Information Management with SQL Server 2016
- DevWeek 2016, London, April 22nd – 26th:
- post-conference seminar Business Intelligence in SQL Server 2016
- presentation Using R in SQL Server 2016 Database Engine and Reporting Services
- presentation SQL Server Isolation Levels and Locking
- SQL Nexus, Copenhagen, May 2nd – 4th: presentation Identity Mapping and De-Duplicating
- SQL Bits 2016, Liverpool, May 4th – 7th: presentation Using R in SQL Server 2016 Database Engine and SSRS
- SQL Day, Wroclaw, May 16th – 18th:
- pre-conference seminar Data Mining Algorithms in SSAS, Excel, R, and Azure ML
- presentation: Statistical Analysis with T-SQL
- presentation: Anomaly Detection
- PASS SQL Saturday #508, Kyiv, May 21st: information to follow.
- PASS SQL Saturday #510, Paris, June 25th: information to follow.
- PASS SQL Saturday #520, Cambridge, September 10th: information to follow. And yes, this is already quarter 3, but I am late with this list anyway.
A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. Neural networks resemble the human brain in the following two ways:
- A neural network acquires knowledge through learning
- A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights
The Neural Network algorithm is an artificial intelligence technique that explores more possible data relationships than other algorithms. Because it is such a thorough technique, the processing of it is usually slower than the processing of other classification algorithms.
A neural network consists of basic units modeled after biological neurons. Each unit has many inputs that it combines into a single output value. The units are connected together, so the outputs of some units are used as inputs into other units. The network can have one or more middle layers called hidden layers. The simplest are feed-forward networks (pictured), where there is only a one-way flow through the network from the inputs to the outputs. There are no cycles in feed-forward networks.
As mentioned, units combine inputs into a single output value. This combination is called the unit’s activation function. Consider this example: The human ear can function near a working jet engine. Yet, if it were only 10 times more sensitive, you would be able to hear a single molecule hitting the membrane in your ears! What does that mean? When you go from 0.01 to 0.02, the difference should be comparable with going from 100 to 200. In biology, there are many types of non-linear behavior.
Thus, an activation function has two parts. The first part is the combination function that merges all of the inputs into a single value (a weighted sum, for example). The second part is the transfer function, which transfers the value of the combination function to the output value of the unit. A linear transfer function would give you just a linear regression. The transfer functions used are typically S-shaped, like the sigmoid function:
Sigmoid(x) = 1 / (1 + e^(-x)).
A single hidden layer is optimal, so the Neural Network algorithm always uses a maximum of one (or zero for Logistic Regression).
The Neural Network algorithm uses the hyperbolic tangent activation function in the hidden layer and the sigmoid function in the output layer. You can see a Neural Network with a single hidden layer in the following picture.
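To make the flow through such a network more concrete, here is a minimal sketch of a single forward pass: a weighted sum in every unit, hyperbolic tangent in the hidden layer, sigmoid in the output unit. The function name, the list-based layout of the weights, and all the numbers are my assumptions for illustration only.

```python
import math

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One pass through a feed-forward network with a single hidden layer."""
    # Hidden layer: each unit combines the inputs (weighted sum) and
    # transfers the result through the hyperbolic tangent.
    hidden = [
        math.tanh(bias + sum(w * x for w, x in zip(weights, inputs)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Output unit: weighted sum of the hidden outputs, then the sigmoid.
    z = output_bias + sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for two inputs and two hidden units.
probability = forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.3], [0.8, 0.2]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.0, -1.0],
    output_bias=0.0,
)
```

Because the output unit is a sigmoid, the result always falls between 0 and 1 and can be read as the probability of the predicted state.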
Training a neural network is the process of setting the best weights on the inputs of each of the units. This backpropagation process does the following:
- Gets a training example and calculates outputs
- Calculates the error – the difference between the calculated and the expected (known) result
- Adjusts the weights to minimize the error
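The three steps above can be sketched for the simplest possible case, a single sigmoid unit, where backpropagation reduces to a plain gradient step on the squared error. This is not the training code of any product; the learning rate, the number of epochs, and the toy data (the logical OR function) are my assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    """Output of a single sigmoid unit for one case."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))

def train_unit(examples, learning_rate=0.5, epochs=5000):
    """Train one sigmoid unit; examples is a list of (inputs, expected) pairs."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, expected in examples:
            # Step 1: get a training example and calculate the output.
            output = predict(weights, bias, inputs)
            # Step 2: calculate the error (expected minus calculated).
            error = expected - output
            # Step 3: adjust the weights against the gradient of the squared
            # error; output * (1 - output) is the derivative of the sigmoid.
            delta = error * output * (1.0 - output)
            weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias += learning_rate * delta
    return weights, bias

# Hypothetical toy data: the logical OR of two inputs.
or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_unit(or_examples)
```

In a network with a hidden layer, the same error signal is propagated one layer back to adjust the hidden weights as well, hence the name backpropagation.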
As with the Decision Trees algorithm, you can use the Neural Network algorithm for classification and prediction. The interpretation of the Neural Network algorithm results is somewhat more complex than the interpretation of the Decision Trees algorithm results. Consequently, the Decision Trees algorithm is more popular.
Our third SQL Saturday in Ljubljana is over. Two weeks seems to be enough time to sleep on it and see things from a bit of a distance. Without any further delay, I can declare that the event was a pure success.
Let me start with the numbers, comparing the total number of people, including speakers, sponsors, attendees, and organizers, with the previous two SQL Saturdays in Ljubljana:
- 2013: 135 people from 12 countries
- 2014: 215 people from 16 countries
- 2015: 253 people from 20 countries
You can clearly see the growth. Even the keynote was full, as the following picture shows.
We again experienced a very small drop rate; more than 90% of registered attendees showed up. That’s very nice, and it shows respect for the speakers and sponsors. So thank you, attendees, for being fair and respectful again!
We had more sponsors than in previous years. This was extremely important, because this time we did not get the venue for free, and therefore we needed more money than for the previous two events. Thank you, sponsors, for enabling the event!
Probably the most important part of these SQL Saturday events is the speakers. We got 125 sessions submitted by 51 speakers from 20 countries! We were really surprised. We take this as a sign of our good work in the past. 30 great sessions with two state-of-the-art precon seminars is more than we expected, yet still not enough to accommodate all the speakers who sent submissions. Thank you, all speakers, those who were selected and those who were not! I hope we see you again in Slovenia next year. You can see some of the most beautiful speakers and volunteers in the following picture (decide for yourself if there is somebody spoiling the picture).
The next positive surprise was the volunteers. With this number of speakers and attendees, we would not have been able to handle the event without them. We realized that we have a great community, consisting of some really helpful people whom we can always count on. Thank you all!
I think I can say for all three organizers, Mladen Prajdić, Matija Lah, and me, that we were more tired than in any year before. However, hosting a satisfied crowd is the best payback you can imagine. And the satisfaction level was high even among the youngest visitors, as you can see from the following picture.
Of course, we also experienced some negative things. However, just a day before New Year’s Eve, I am not going to deal with them now. Let me finish this post in a positive way.
Decision Trees is a directed technique. Your target variable is the one that holds information about a particular decision, divided into a few discrete and broad categories (yes / no; liked / partially liked / disliked, etc.). You are trying to explain this decision using other gleaned information saved in other variables (demographic data, purchasing habits, etc.). With limited statistical significance, you are going to predict the target variable for a new case using its known values of the input variables based on results of your trained model.
Recursive partitioning is used to build the tree. The data is split into partitions using a certain value of one of the explaining variables. The partitions are then split again and again. Initially the data is in one big box.
The algorithm tries all possible breaks of the input (explaining) variables for the initial split. The goal is to get purer partitions with respect to the classes of the target variable. Intuitively, purity is related to the percentage of cases in each class of the target variable. There are many better, but more complicated, measures of purity, for example entropy or information gain.
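Entropy and information gain are easier to grasp with a small calculation. The sketch below measures the purity of a partition with entropy and then tries every possible break of one numeric variable, keeping the one with the highest information gain. The function names and the six toy cases are my assumptions; a real implementation would repeat the search over all input variables.

```python
import math
from collections import Counter

def entropy(labels):
    """0 for a pure partition, 1 for a 50/50 mix of two classes."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, left, right):
    """How much a split reduces entropy, weighting each part by its size."""
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
    return entropy(parent) - weighted

def best_split(values, labels):
    """Try every midpoint between consecutive distinct values of one
    variable and return (threshold, gain) of the best break found."""
    best = (None, 0.0)
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Hypothetical toy data: age of six interviewees and their answer.
ages = [20, 25, 30, 40, 50, 60]
liked = ["yes", "yes", "yes", "no", "no", "no"]
threshold, gain = best_split(ages, liked)
```

With these made-up cases, the break halfway between 30 and 40 separates the two classes completely, so it gets the highest possible gain for a two-class target.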
The tree continues to grow using the two new partitions as separate starting points and splitting them further. You have to stop the process somewhere. Otherwise, you could get a completely fitted tree that has only one case in each partition. Each partition would be, of course, absolutely pure. This would not make any sense. The results could not be used for any meaningful prediction. This phenomenon is called “over-fitting”. There are two basic approaches to solve this problem: pre-pruning (bonsai) and post-pruning techniques.
The pre-pruning (bonsai) methods prevent growth of the tree in advance by applying tests at each node to determine whether a further split is going to be useful; the tests can be simple (number of cases) or complicated (complexity penalty). The post-pruning methods allow the tree to grow and then prune off the useless branches. The post-pruning methods tend to give more accurate results, but they require more computation than pre-pruning methods.
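Here is a minimal sketch of recursive partitioning with the simplest pre-pruning test, a minimum number of cases per node. To keep it short, it uses one numeric input variable and counts the cases outside the majority class as the impurity measure instead of entropy. The names, the `min_cases` default, and the toy data are all my assumptions.

```python
from collections import Counter

def errors(labels):
    """Number of cases outside the majority class (0 means pure)."""
    return len(labels) - max(Counter(labels).values())

def grow(rows, min_cases=4):
    """rows is a list of (value, label) pairs; returns the tree as nested dicts."""
    labels = [label for _, label in rows]
    leaf = {"predict": Counter(labels).most_common(1)[0][0], "cases": len(rows)}
    # Pre-pruning tests: do not split a node that is small or already pure.
    if len(rows) < min_cases or errors(labels) == 0:
        return leaf
    values = sorted({value for value, _ in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [row for row in rows if row[0] <= threshold]
        right = [row for row in rows if row[0] > threshold]
        score = errors([lab for _, lab in left]) + errors([lab for _, lab in right])
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    # Also stop when no split reduces the number of misclassified cases.
    if best is None or best[0] >= errors(labels):
        return leaf
    _, threshold, left, right = best
    return {
        "split_at": threshold,
        "left": grow(left, min_cases),
        "right": grow(right, min_cases),
    }

# Hypothetical toy data: (age, liked the movie).
rows = [(20, "yes"), (25, "yes"), (30, "yes"), (40, "no"), (50, "no"), (60, "no")]
tree = grow(rows)
```

Raising `min_cases` gives a smaller tree; with `min_cases=10` these six cases are not split at all. A post-pruning method would instead grow the full tree first and then cut branches.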
Imagine the following example. You have the answers to a simple question: Did you like the famous Woodstock movie? You also have some demographic data: age (20 to 60) and education (ranged in 7 classes from the lowest to the highest). In all, 55% of the interviewees liked the movie and 45% did not like it.
Can you discover factors that have an influence on whether they liked the movie?
Starting point: 55% of the interviewees liked the movie and 45% did not like it.
After checking all possible splits, you find the best initial split made at the age of 35.
With further splitting, you finish with a full-grown tree. Note that not all branches lead to purer classes. Some of them are not useful at all and should be pruned.
Decision trees are used for classification and prediction. Typical usage scenarios include:
- Predicting which customers will leave
- Targeting the audience for mailings and promotional campaigns
- Explaining reasons for a decision
- Answering questions such as “What movies do young female customers buy?”
Decision Trees is the most popular data mining algorithm. This is because the results are very understandable and simple to interpret, and the quality of the predictions is usually very high.
It is alive! It was really hard to make the choices. Nevertheless, the schedule is public now. To everybody who submitted proposals – thank you very much again! We are sorry we cannot accommodate everybody. Please note that even if you were not selected, we would be happy to see you in Ljubljana.
I am continuing with my data mining and machine learning algorithms series. Naive Bayes is a nice algorithm for classification and prediction.
It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on the known input attributes. The algorithm supports only discrete or discretized attributes, and it considers all input attributes to be independent. Starting with a single dependent and a single independent variable, the algorithm is not too complex to understand (I am using an example from my old book about statistics: Thomas H. Wonnacott, Ronald J. Wonnacott: Introductory Statistics, Wiley 1990).
Let’s say I am buying a used car. In an auto magazine I find that 30% of second-hand cars are faulty. I take with me a mechanic who can make a shrewd guess on the basis of a quick drive. Of course, he isn’t always right. Of the faulty cars he examined in the past, he correctly pronounced 90% faulty and wrongly pronounced 10% OK. When judging good cars, he correctly pronounced 80% of them good and wrongly pronounced 20% faulty. In the graph, we can see that 27% (90% of 30%) of all cars are actually faulty and correctly identified as such. Another 14% (20% of 70%) are judged faulty although they are good. Altogether, 41% (27% + 14%) of cars are judged faulty. Of these cars, about 66% (27% / 41%) are actually faulty. To sum up: once the car has been pronounced faulty by the mechanic, the chance that it is actually faulty rises from the original 30% to about 66%. The following figure shows this process.
The calculations in the previous figure can be summarized in another tree, the reverse tree. You can start branching with the opinion of the mechanic (59% OK and 41% faulty). Moving to the right, the second branching shows the actual condition of the cars, and this is the valuable information for you. For example, the top branch says: once the car is judged faulty, the chance that it actually turns out to be faulty is about 66%. The third branch from the top shows clearly: once the car is judged good, the chance that it is actually faulty is just 5%. You can see the reverse tree in the following figure.
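If you want to check the numbers in both trees yourself, the arithmetic fits in a few lines of Python. This is just a sketch of Bayes' rule applied to the used car example; the function name and the dictionary keys are my assumptions.

```python
def reverse_tree(p_faulty, p_judged_faulty_if_faulty, p_judged_faulty_if_good):
    """Turn 'what the mechanic says, given the car' into
    'what the car is, given what the mechanic says'."""
    p_good = 1.0 - p_faulty
    # First tree: actual condition of the car, then the mechanic's opinion.
    faulty_and_judged_faulty = p_faulty * p_judged_faulty_if_faulty
    good_and_judged_faulty = p_good * p_judged_faulty_if_good
    faulty_and_judged_good = p_faulty * (1.0 - p_judged_faulty_if_faulty)
    # Reverse tree: the mechanic's opinion first, then the actual condition.
    p_judged_faulty = faulty_and_judged_faulty + good_and_judged_faulty
    p_judged_good = 1.0 - p_judged_faulty
    return {
        "judged faulty": p_judged_faulty,
        "faulty if judged faulty": faulty_and_judged_faulty / p_judged_faulty,
        "faulty if judged good": faulty_and_judged_good / p_judged_good,
    }

# The numbers from the example: 30% faulty cars, 90% of faulty cars and
# 20% of good cars pronounced faulty by the mechanic.
result = reverse_tree(0.30, 0.90, 0.20)
for name, probability in result.items():
    print(f"{name}: {probability:.1%}")
```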
As mentioned, Naïve Bayes treats all of the input attributes as independent of each other with respect to the target variable. While this could be a wrong assumption, it allows multiplying the probabilities to determine the likelihood of each state of the target variable based on states of input variables. For example, let’s say that you need to analyze whether there is any association between NULLs in different columns of your Products table. You realize that if Color is missing, 80% of Weight values are missing as well; and if Class is missing, 60% of Weight values are missing as well. You can multiply these probabilities. If Weight is missing, you can calculate the product:
0.8 (Color missing for Weight missing) * 0.6 (Class missing for Weight missing) = 0.48
You can also check what happens to the not missing state of the target variable, the Weight:
0.2 (Color missing for Weight not missing) * 0.4 (Class missing for Weight not missing) = 0.08
You can easily see that the likelihood that Weight is missing is much higher than the likelihood it is not missing when Color and Class are unknown. You can convert the likelihoods to probabilities by normalizing their sum to 1:
P (Weight missing if Color and Class are missing) = 0.48 / (0.48 + 0.08) = 0.857
P (Weight not missing if Color and Class are missing) = 0.08 / (0.48 + 0.08) = 0.143
Now that you know that the Color value is NULL and the Class value is NULL, you have a nearly 86% chance of getting NULL in the Weight attribute as well. This might lead you to some conclusions about where to start improving your data quality.
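The multiply-and-normalize step from this example can be sketched in Python as follows. It only reproduces the arithmetic above, taking the four percentages as given; the function name and the dictionary layout are my assumptions.

```python
from math import prod

def normalize(likelihoods):
    """Scale the likelihoods so that they sum to 1 and read as probabilities."""
    total = sum(likelihoods.values())
    return {state: value / total for state, value in likelihoods.items()}

# One factor per input attribute (Color, Class) for each state of the target.
likelihoods = {
    "Weight missing": prod([0.8, 0.6]),
    "Weight not missing": prod([0.2, 0.4]),
}
probabilities = normalize(likelihoods)
for state, probability in probabilities.items():
    print(f"{state}: {probability:.3f}")
```

Adding another input attribute just means adding one more factor to each product, which is why the independence assumption keeps the algorithm so cheap.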
In general, you use the Naive Bayes algorithm for classification. You want to extract models describing important data classes and then assign new cases to predefined classes. Some typical usage scenarios include:
- Categorizing bank loan applications (safe or risky) based on previous experience
- Determining which home telephone lines are used for Internet access
- Assigning customers to predefined segments
- Quickly obtaining a basic comprehension of the data by checking the correlation between input variables
I am finishing my list of conferences and seminars I am attending in the second half of the year 2015. Here is my list.
- Kulendayz 2015 – September 4th-5th. Although I will have huge problems getting there on time, I would never want to miss it. I have one talk there.
- SQL Saturday #413 Denmark – September 17th-19th. You can join me already on Friday for the Data Mining Algorithms seminar.
- SQL Saturday #434 Holland – September 25th-26th. If you miss the Denmark Data Mining Algorithms seminar, I am repeating it in Utrecht.
- SQL Server Days 2015 Belgium – September 28th-29th. This will be my first talk at SQL Server Days.
- SQL Saturday #454 Turin – October 10th. I have not been confirmed as a speaker yet, but I still plan to go there, to combine the SQL part with the Expo in Milan part.
- PASS Summit 2015 Seattle – October 27th-30th. I still continue to be present at every single summit:-) This year I have one presentation.
- SharePoint Days 2015 Slovenia – November 17th-18th. No, I don’t like SPS. I will just have one small BI presentation there.
- SQL Saturday #475 Belgrade – November 28th. First SQL Saturday in Serbia. I simply must be there.
- SQL Saturday #460 Slovenia – December 11th-12th. I am co-organizing this event. Everybody is welcome; this will be a fully English-speaking event. Don’t miss beautiful, relaxed and friendly Ljubljana!
That’s it. For now:-)
This is a slightly different post in the series about the data mining and machine learning algorithms. This time I am honored and humbled to announce that my fourth Pluralsight course is live. This is the Data Mining Algorithms in SSAS, Excel, and R course. Besides explaining the algorithms, I also show demos in different products. This gives you an even better understanding than just reading the blog posts.
Of course, I will continue with describing the algorithms here as well.