


I got some questions about the virtual machine / notebook setup for my Business Intelligence in SQL Server 2016 DevWeek post-conference workshop. I am writing this blog post because I want to spread this information as quickly as possible.
There will be no labs during the seminar; there is no time for them. However, I will make all of the code available. ...

A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. Neural ...

Decision Trees is a directed technique. Your target variable is the one that holds information about a particular decision, divided into a few discrete and broad categories (yes / no; liked / partially liked / disliked, etc.). You are trying to explain this decision using other gleaned information saved in other variables (demographic data, ...
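
To make the idea of explaining a discrete target with other variables concrete, here is a minimal sketch in Python. It assumes entropy-based information gain as the split criterion, which is one common choice and not necessarily the one the post describes. The variable names (`age_group`, `region`, `decision`) and the data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete target values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in target entropy after splitting the rows on one attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical cases: "decision" is the target, the rest are input variables.
rows = [
    {"age_group": "young", "region": "north", "decision": "yes"},
    {"age_group": "young", "region": "south", "decision": "yes"},
    {"age_group": "old", "region": "north", "decision": "no"},
    {"age_group": "old", "region": "south", "decision": "no"},
]

gains = {a: information_gain(rows, a, "decision") for a in ("age_group", "region")}
best = max(gains, key=gains.get)  # the variable that best explains the decision
```

In this toy data `age_group` separates the two decisions perfectly, so it would be chosen for the first split, while `region` carries no information about the target.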

I am continuing with my data mining and machine learning algorithms series. Naive Bayes is a nice algorithm for classification and prediction.
It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on ...
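
The probability calculation described above can be sketched in a few lines of Python. This is an illustrative count-based version for categorical attributes, without smoothing, and it is not the SSAS implementation. The function names, attribute names, and data are all hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count states of each input attribute per state of the predictable attribute."""
    class_counts = Counter(row[target] for row in rows)
    # cond_counts[(attribute, target_state)][input_state] -> number of cases
    cond_counts = defaultdict(Counter)
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                cond_counts[(attr, row[target])][value] += 1
    return class_counts, cond_counts

def predict(model, case):
    """Return normalized probabilities for each state of the predictable attribute."""
    class_counts, cond_counts = model
    total = sum(class_counts.values())
    scores = {}
    for state, n in class_counts.items():
        score = n / total  # prior P(target = state)
        for attr, value in case.items():
            # P(input attribute = value | target = state)
            score *= cond_counts[(attr, state)][value] / n
        scores[state] = score
    norm = sum(scores.values())
    return {s: (v / norm if norm else 0.0) for s, v in scores.items()}

# Hypothetical training cases; "buys_bike" is the predictable attribute.
rows = [
    {"commute": "short", "owns_car": "no", "buys_bike": "yes"},
    {"commute": "short", "owns_car": "yes", "buys_bike": "yes"},
    {"commute": "long", "owns_car": "yes", "buys_bike": "no"},
    {"commute": "long", "owns_car": "no", "buys_bike": "no"},
    {"commute": "short", "owns_car": "yes", "buys_bike": "no"},
]

model = train_naive_bayes(rows, "buys_bike")
probs = predict(model, {"commute": "short", "owns_car": "no"})
```

For this case the sketch gives a probability of 0.75 for `yes` and 0.25 for `no`. A production version would normally add smoothing so that a state never seen with a given target state does not force the whole product to zero.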

This post is a bit different from the others in the series about data mining and machine learning algorithms. This time I am honored and humbled to announce that my fourth Pluralsight course is live. This is the Data Mining Algorithms in SSAS, Excel, and R course. Besides explaining the algorithms, I also show demos in different products. This gives you even ...

With the K-Means algorithm, each object is assigned to exactly one cluster. It is assigned to this cluster with a probability equal to 1.0. It is assigned to all other clusters with a probability equal to 0.0. This is hard clustering.
Instead of distance, you can use a probabilistic measure to determine cluster membership. For example, you can ...
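
The hard assignment described for K-Means can be illustrated with a small Python sketch. It covers only the assignment step, assumes Euclidean distance, and uses hypothetical points and centroids.

```python
import math

def assign_hard(points, centroids):
    """Give each point membership 1.0 in its nearest centroid and 0.0 elsewhere."""
    memberships = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        nearest = distances.index(min(distances))
        memberships.append(
            [1.0 if k == nearest else 0.0 for k in range(len(centroids))]
        )
    return memberships

# Hypothetical two-dimensional cases and two cluster centers.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]

memberships = assign_hard(points, centroids)
```

Each row of `memberships` contains a single 1.0, which is what makes the clustering hard. A soft method would instead return fractional values in each row that sum to 1.0.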

Hierarchical clustering can be very useful because it is easy to see the optimal number of clusters in a dendrogram, and because the dendrogram visualizes both the clusters and the process of building those clusters. However, hierarchical methods don’t scale well. Just imagine how cluttered a dendrogram would be if 10,000 cases were shown on ...

Clustering is the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters. Dissimilarities are assessed based on the attribute values describing the objects.
There are a large number of clustering algorithms. The ...

Data mining is the most advanced part of business intelligence. With statistical and other mathematical algorithms, you can automatically discover patterns and rules in your data that are hard to notice with online analytical processing and reporting. However, you need to thoroughly understand how the data mining algorithms work in order to ...

This is the third part of the fraud detection whitepaper. You can find the first part and the second part in my previous blog posts about this topic.

Data Preparation

The problem of credit card fraud detection is not trivial. With every transaction processed, only a limited amount of data is available, making it difficult if not impossible to ...



