

Browse by Tags
All Tags » statistics (RSS)
Showing page 1 of 3 (25 total posts)

R is the hottest topic in SQL Server 2016. If you want to learn how to use it for advanced analytics, join my seminar at SQL Nexus conference on my 1st in Copenhagen. Although there is still nearly a month before the seminar, there are less than half places still available. You are also very welcome to visit my session Using R in SQL Server, Power ...

It’s been awhile since I wrote the last blog on the data mining / machine learning algorithms. I described the Neural Network algorithm. In addition, it is a good time to write another post in order to remind the readers of the two upcoming seminars about the algorithms I have in Oslo, Friday, September 2nd, 2016, and in Cambridge, Thursday, ...

A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. Neural ...

I am continuing with my data mining and machine learning algorithms series. Naive Bayes is a nice algorithm for classification and prediction.
It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on ...

Support vector machines are both, unsupervised and supervised learning models for classification and regression analysis (supervised) and for anomaly detection (unsupervised). Given a set of training examples, each marked as belonging to one of categories, an SVM training algorithm builds a model that assigns new examples into one category. An SVM ...

Hierarchical clustering could be very useful because it is easy to see the optimal number of clusters in a dendrogram and because the dendrogram visualizes the clusters and the process of building of that clusters. However, hierarchical methods don’t scale well. Just imagine how cluttered a dendrogram would be if 10,000 cases would be shown on ...

Clustering is the process of grouping the data into classes or clusters so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters. Dissimilarities are assessed based on the attribute values describing the objects.
There are a large number of clustering algorithms. The ...

We are close to the publishing day of the TSQL Querying book. Of course, like always in this series, the main author of the book is Itzik BenGan. This time, besides me, Adam Machanic and Kevin Farlee are the coauthors. The information I want to share now is that you can get a substantial discount if you preorder the book today, Monday, February ...

I was just starting to work on a post on column statistics, using one of my favorite metadata functions: sys.dm_db_stats_properties(), when I realized something was missing.
The function requires a stats_id value, which is available from sys.stats. However, sys.stats does not show the column names that each statistics object is attached ...

The session Skewed Data, Poor Cardinality Estimates, and Plans Gone Bad by Kimberly Tripp (@KimberlyLTripp) has been published on channel SQLPASS TV. Abstract When data distribution is heavily skewed, cardinality estimation (how many rows the query optimizer expects each operator to process) can be wildly incorrect, resulting in ...
1



