

Browse by Tags
All Tags » unsupervised me... » Data Mining (RSS)

R is the hottest topic in SQL Server 2016. If you want to learn how to use it for advanced analytics, join my seminar at SQL Nexus conference on my 1st in Copenhagen. Although there is still nearly a month before the seminar, there are less than half places still available. You are also very welcome to visit my session Using R in SQL Server, Power ...

This is a bit different post in the series about the data mining and machine learning algorithms. This time I am honored and humbled to announce that my fourth Pluralsight course is alive. This is the Data Mining Algorithms in SSAS, Excel, and R course. besides explaining the algorithms, I also show demos in different products. This gives you even ...

Support vector machines are both, unsupervised and supervised learning models for classification and regression analysis (supervised) and for anomaly detection (unsupervised). Given a set of training examples, each marked as belonging to one of categories, an SVM training algorithm builds a model that assigns new examples into one category. An SVM ...

Principal component analysis (PCA) is a technique used to emphasize the majority of the variation and bring out strong patterns in a dataset. It is often used to make data easy to explore and visualize. It is closely connected to eigenvectors and eigenvalues.
A short definition of the algorithm: PCA uses an orthogonal transformation to convert ...

With the KMeans algorithm, each object is assigned to exactly one cluster. It is assigned to this cluster with a probability equal to 1.0. It is assigned to all other clusters with a probability equal to 0.0. This is hard clustering.
Instead of distance, you can use a probabilistic measure to determine cluster membership. For example, you can ...

Hierarchical clustering could be very useful because it is easy to see the optimal number of clusters in a dendrogram and because the dendrogram visualizes the clusters and the process of building of that clusters. However, hierarchical methods don’t scale well. Just imagine how cluttered a dendrogram would be if 10,000 cases would be shown on ...

Clustering is the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters. Dissimilarities are assessed based on the attribute values describing the objects.
There are a large number of clustering algorithms. The ...

Data mining is the most advanced part of business intelligence. With statistical and other mathematical algorithms, you can automatically discover patterns and rules in your data that are hard to notice with online analytical processing and reporting. However, you need to thoroughly understand how the data mining algorithms work in order to ...

This is the fourth part of the fraud detection whitepaper. You can find the first part, the second part, and the third part in my previous blog posts about this topic. Data Mining Models We create multiple mining models by using different algorithms, different input data sets, and different algorithm parameters. Then we evaluate the models in ...

While working on different fraud detection projects, I developed my own approach to the solution for this problem. In my PASS Summit 2013 session I am introducing this approach. I also wrote a whitepaper on the same topic, which was generously reviewed by my friend Matija Lah. In order to spread this knowledge faster, I am starting a series of ...



