Dejan Sarka : association rules
http://sqlblog.com/blogs/dejan_sarka/archive/tags/association+rules/default.aspx
Tags: association rules

Embrace R @ SQL Nexus 2017 & SQL Saturday #626
http://sqlblog.com/blogs/dejan_sarka/archive/2017/04/02/embrace-r-sql-nexus-2017-sql-saturday-626.aspx
Sun, 02 Apr 2017 10:47:48 GMT
Dejan Sarka
<p>R is the hottest topic in SQL Server 2016. If you want to learn how to use it for advanced analytics, join my <a href="http://www.sqlnexus.com/pre-conference/">seminar</a> at the SQL Nexus conference on May 1st in Copenhagen. Although there is still nearly a month before the seminar, fewer than half of the places are still available. You are also very welcome to visit my session <a href="http://www.sqlnexus.com/agenda/">Using R in SQL Server, Power BI, and Azure ML</a> during the main conference.</p> <p>For beginners, I have another session in the same week, this time in Budapest. You can join me at the <a href="http://www.sqlsaturday.com/626/Sessions/Schedule.aspx">Introducing R</a> session on May 6th at SQL Saturday #626 Budapest.</p> <p>Here is the description of the seminar.</p> <p>As an open source development, R is the most popular analytical engine and programming language for data scientists worldwide. The number of libraries with new analytical functions is enormous and continuously growing. However, there are also some drawbacks. R is a programming language, so you have to learn it to use it. Open source development also means less control over code. Finally, the free R engine is not scalable.</p> <p>Microsoft added support for R code in SQL Server 2016, in Azure Machine Learning (Azure ML), and in Power BI. A parallelized, highly scalable execution engine is used to execute the R scripts. 
In addition, not every library is allowed in these two environments.</p> <p>Attendees of this seminar learn to program with R from scratch. Basic R code is introduced using the free R engine and the RStudio IDE. Then the seminar shows some more advanced data manipulations, matrix calculations, and statistical analysis together with graphing options. The mathematics behind them is briefly explained as well. Then the seminar switches to more advanced data mining and machine learning analyses. Attendees also learn how to use R code in SQL Server and Azure ML, and how to create SQL Server Reporting Services (SSRS) reports that use R.</p> <p>The seminar consists of the following modules:</p> <ul> <li>Introduction to R </li> <li>Data overview and manipulation </li> <li>Basic and advanced visualizations </li> <li>Data mining and machine learning methods </li> <li>Scalable R in SQL Server </li> <li>Using R in SSRS, Power BI, and Azure ML</li> </ul> <p>Hope to see you there!</p>
Tags: Data Mining, statistics, data analysis, sql server, supervised methods, unsupervised method, patterns, fraud detection, O'Reilly, data overview, data preparation, Community Event, Conference, constraints, machine learning, R, market basket analysis, association rules, clustering, SVM, Naive Bayes, Decision Trees, Neural Network, Logistic Regression, SQL Nexus

Data Mining Algorithms – Pluralsight Course
http://sqlblog.com/blogs/dejan_sarka/archive/2015/07/30/data-mining-algorithms-pluralsight-course.aspx
Thu, 30 Jul 2015 10:00:17 GMT
Dejan Sarka
<p>This post is a bit different from the others in the series about data mining and machine learning algorithms. This time I am honored and humbled to announce that my fourth Pluralsight course is live. 
This is the <a href="http://www.pluralsight.com/courses/data-mining-algorithms-ssas-excel-r">Data Mining Algorithms in SSAS, Excel, and R</a> course. Besides explaining the algorithms, I also show demos in different products. This gives you an even better understanding than just reading the blog posts.</p> <p>Of course, I will continue describing the algorithms here as well.</p>
Tags: Data Mining, data analysis, course, sql server, supervised methods, unsupervised method, fraud detection, Pluralsight, machine learning, R, market basket analysis, association rules, clustering, SVM, anomaly detection

Data Mining Algorithms – Association Rules
http://sqlblog.com/blogs/dejan_sarka/archive/2015/03/10/data-mining-algorithms-association-rules.aspx
Tue, 10 Mar 2015 16:19:52 GMT
Dejan Sarka
<p>The Association Rules algorithm is specifically designed for use in market basket analyses. Knowing which products customers tend to buy together can also help in identifying cross-selling opportunities and in arranging attractive packages of products. This is the most popular algorithm used in web sales. You can even include additional discrete input variables and predict purchases over classes of input variables.</p> <h3>Association Rules Basics</h3> <p>The algorithm considers each attribute/value pair (such as product/bicycle) as an item. An <b>itemset</b> is a combination of items in a single transaction. The algorithm scans through the dataset trying to find itemsets that tend to appear in many transactions. Then it expresses the combinations of the items as <b>rules</b> (such as “if customers purchase potato chips, they will purchase cola as well”).</p> <p>Association models often work against datasets containing nested tables, such as a customer list followed by a nested purchases table. 
If a nested table exists in the dataset, each nested key (such as a product in the purchases table) is considered an item.</p> <h3>Understanding Measures</h3> <p>Besides the itemsets and the rules, the algorithm also returns some measures for the itemsets and the rules. Imagine the following transactions:</p> <ol> <li>Transaction 1: Frozen pizza, cola, milk </li> <li>Transaction 2: Milk, potato chips </li> <li>Transaction 3: Cola, frozen pizza </li> <li>Transaction 4: Milk, pretzels </li> <li>Transaction 5: Cola, pretzels </li> </ol> <p>The Association Rules measures include:</p> <ul> <li><b>Support</b>, or frequency, means the number of cases that contain the targeted item or combination of items, often expressed as a share of all cases. Therefore, support is a measure for the itemsets.</li> <li><b>Probability</b>, also known as <b>confidence</b>, is a measure for the rules. The probability of an association rule is the support for the combination divided by the support for the condition. For example, the rule "If a customer purchases cola, then they will purchase pretzels" has a probability of 33%. The support for the combination (pretzels + cola) is 20%, occurring in one out of every five transactions (transaction 5). However, the support for the condition (cola) is 60%, occurring in three out of every five transactions. This gives a confidence of 0.2 / 0.6 = 0.33 or 33%. </li> <li><b>Importance</b> is a measure for both itemsets and rules. When importance is calculated for an itemset, a value of one means that the items in the itemset are independent. If importance is greater than one, then the items are positively correlated. If importance is lower than one, then the items are negatively correlated. When importance is calculated for a rule “If {A} then {B}”, the value zero means there is no association between the items. 
Positive importance means that the probability for the item {B} goes up when the item {A} is in the basket, and negative importance of the rule means that the probability for the item {B} goes down when the item {A} is in the basket.</li> </ul> <h3>Common Business Use Cases</h3> <p>You use the Association Rules algorithm for market basket analyses. You can identify cross-selling opportunities or arrange attractive packages. This is the most popular algorithm used in web sales.</p> <p>You can even include additional input variables and predict purchases over classes of input variables.</p>
Tags: business intelligence, data analysis, continuous learning, Data Modeling, machine learning, market basket analysis, association rules
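<p>To make the measures from the Understanding Measures section more concrete, here is a minimal Python sketch that computes support and confidence for the five sample transactions, using the rule "cola, then pretzels" as the example. The function names <code>support</code>, <code>confidence</code>, and <code>lift</code> are my own illustrative choices and not part of any product. The <code>lift</code> helper is an assumption: it uses the common ratio P(A, B) / (P(A) * P(B)), which agrees with the description of itemset importance above (one means independent), but the exact formula a given product uses may differ.</p>

```python
from fractions import Fraction

# The five sample transactions from the "Understanding Measures" section.
TRANSACTIONS = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Share of transactions that contain every item in the itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(condition, consequence, transactions=TRANSACTIONS):
    """Support of the combination divided by support of the condition."""
    condition_support = support(condition, transactions)
    if condition_support == 0:
        raise ValueError("the condition never occurs in the transactions")
    combined = set(condition) | set(consequence)
    return support(combined, transactions) / condition_support


def lift(a, b, transactions=TRANSACTIONS):
    """Assumed itemset importance: P(A, B) / (P(A) * P(B)).

    A value of 1 means independent, above 1 positively correlated,
    below 1 negatively correlated. This formula is an assumption.
    """
    denominator = support(a, transactions) * support(b, transactions)
    if denominator == 0:
        raise ValueError("one of the itemsets never occurs")
    return support(set(a) | set(b), transactions) / denominator


if __name__ == "__main__":
    print(float(support({"cola"})))                            # 0.6
    print(float(support({"cola", "pretzels"})))                # 0.2
    print(round(float(confidence({"cola"}, {"pretzels"})), 2)) # 0.33
    print(lift({"cola"}, {"frozen pizza"}))                    # 5/3
```

<p>Exact fractions are used instead of floats so that the ratios stay readable. Cola and frozen pizza get a value above one in this sketch because they appear together in two of the three cola transactions.</p>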