
Jorg Klein

Microsoft Business Intelligence consultant from the Netherlands

  • SharePoint Server 2016 IT Preview from a BI point of view

SharePoint Online and Power BI have received a lot of attention and focus lately, but what about the on-premises version of SharePoint that’s used in almost all serious BI solutions today? Well, SharePoint Server 2016 IT Preview has just been released by Microsoft, and many say it will be the last on-premises version of SharePoint ever. In this blog post I will focus on the BI features of SharePoint Server 2016.

So what kind of BI features are improved and what’s new? It’s better to start with the question of which BI features will remain available in SharePoint 2016, because a lot of BI features will be deprecated or removed.

    An overview of the main SharePoint BI capabilities and their availability in SharePoint 2016:

    · Excel Services

Excel Services will no longer be available; its functionality moves completely to Excel Online in Office Online.

    · PowerPivot for SharePoint and Power View add-ins

    These BI features are not available in the current release of the SharePoint 2016 IT Preview. Microsoft states they will implement them later this year, but when and in what form is uncertain.

    · Reporting Services Integrated Mode

    There is no news about Reporting Services Integrated Mode, but as I wrote in my last blog post, Reporting Services gets a big update in SQL Server 2016. One of the new possibilities will be the integration of Reporting Services and Power BI. So why still integrate with SharePoint?

With the release of the new Power BI Desktop (formerly Power BI Designer) a lot of Power Pivot functionality moves to Power BI. For example, you can now import Excel Power BI artifacts (Data Model, Queries, Power View) into a Power BI Desktop file. This new functionality, together with the Reporting Services/Power BI integration that’s coming and the news that Excel Services will no longer be available in SharePoint 2016, raises the question of whether BI solutions will continue to need SharePoint in the future. All functionality seems to be moving to Power BI!

On the other hand, Microsoft recently announced a collaboration with Pyramid Analytics which offers the possibility to deploy a Power BI Desktop file to an on-premises Pyramid Analytics server, bringing Power BI on-premises. Pyramid Analytics offered SharePoint integration in the past, so maybe they will integrate with SharePoint 2016 again, which would effectively give us a Power BI and SharePoint 2016 integration and make SharePoint more important than ever for on-premises solutions.

    It’s clear there is a lot of uncertainty about the future of SharePoint as part of the Microsoft BI platform! To be continued…

  • BI on your terms with SQL Server 2016

For the last few years Microsoft's strategy was all about cloud first (or cloud only?), releasing new BI products and updates to existing products in the cloud at a high pace, with almost no investment in on-premises BI. In 2015 Microsoft seems to be changing course: it now aims more at enabling hybrid scenarios, investing a lot in both the cloud (Power BI/Azure) and on-premises with SQL Server 2016.
    Microsoft’s message regarding BI for 2015/2016 is:  “BI on your terms”.

    BI on your terms means leveraging up-to-date possibilities for one or a combination (hybrid) of the following architectures:

    • Cloud with Azure and Power BI

    • On-Premises with SQL Server 2016

    • Server driven or Self-Service

To be able to offer quality hybrid architectures, Microsoft is investing a lot in the on-premises BI suite with SQL Server 2016 and has announced it will keep investing in it in the coming years. So no longer only cloud first like we have seen in previous years, but more focus on hybrid possibilities and, if you prefer, on-premises only.

    For the first time in many years an exciting version of SQL Server is coming in terms of BI. The main topics are:

    • Hybrid BI (Cloud/On-Premises)

    • Modern Reports

    • Enhanced Analysis

    • Mobile BI


    Below is an overview of the new BI related features per SQL Server 2016 service or product. As the length of this list shows, SQL Server 2016 will be a massive BI version!!

    Analysis Services Tabular

    • Enhanced modeling capabilities in the semantic layer

      • Many-to-many relationships

• Bi-directional cross filtering. This means you can filter not only on the 1 side of a 1-to-many relationship in your tabular model, but also on the many side. For example, two connected tables, Sales → Product:

        • Product: product, product category

        • Sales: sales date, connection to product table

Now you can select products sold while filtering on sales date (many side) and also filtering on product category (1 side). This is not possible in today’s version of SSAS Tabular.

    • Time intelligence

• Date/time columns are automatically converted to rich date/time tables, ranging from the column’s MIN date to the MAX date found

    • New DAX functions

• A lot of new functions for things that currently require quite complex formulas, like current time, date difference, percentile, product, geometric mean, median, etc.

    • Performance improvements

      • For end users

        • Query engine optimized

      • For developers

        • Metadata operations; modeling related operations are much faster

      • For data processing

        • Parallel partition processing

• Expose on-premises tabular models in the cloud (hybrid) → a Power BI feature, already possible today with SQL Server 2012.


    Analysis Services Dimensional

    • Netezza as a Data Source (Netezza Data Warehouse | IBM - NDM Technologies)

    • Performance improvements

      • Unnatural hierarchies

      • Distinct counts

      • Other performance improvements in areas where multidimensional is not performant at the moment

• DBCC (Database Consistency Checker) support. Checks the logical and physical integrity of objects in the specified database.

    • Expose on-premises multidimensional cubes in the cloud with Power BI (hybrid)


    SQL Server Database Engine

    • Integration of R analytical engine, predictive analytic capabilities via T-SQL queries

• PolyBase available without the need for PDW; it makes it possible to query both structured relational SQL data and unstructured Hadoop data through T-SQL statements

    • Data encryption for stored data and data in motion

• Row-level security (see the T-SQL sketch after this list)

    • Updates to the in-memory OLTP engine, for example updateable in-memory nonclustered columnstore indexes

    • Parsing and storing native JSON data

    • XEvents-based monitoring in Management Studio
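
To make the row-level security item above a bit more concrete, here is a minimal T-SQL sketch of how the feature is shaped in the current CTP bits; the table, function and policy names are invented for the example and the exact syntax may still change before RTM:

-- Predicate function: a row is visible only when its SalesRep column matches the current database user.
CREATE FUNCTION dbo.fn_SalesPredicate (@SalesRep AS sysname)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS fn_result
       WHERE @SalesRep = USER_NAME();
GO

-- Security policy that applies the predicate as a filter on the (hypothetical) dbo.Sales table.
CREATE SECURITY POLICY dbo.SalesFilter
ADD FILTER PREDICATE dbo.fn_SalesPredicate(SalesRep) ON dbo.Sales
WITH (STATE = ON);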


    Reporting Services

    • New look and feel and possibility to apply themes and branding using CSS

    • New visualizations, chart types like tree maps and sun bursts

    • Improved flexible parameter panel with support for:

      • Autocomplete

      • Search

      • Hierarchical tree display

• Runs in all modern browsers, on both desktops and tablets (any device)

    • Integration of R analytical engine

    • Power Query as a data source

    • Pin on-premises SSRS reports to Power BI Dashboards (hybrid)


    Integration Services

    • High Availability support

    • Power Query integration

    • Azure Data Factory integration (hybrid)

      • Execute on-premises SSIS packages from Azure Data Factory

      • Azure Data Factory data flow task

      • Azure storage connector

      • Azure commandlets

    • OData 4.0 support

    • Hadoop File System (HDFS) support

    • JSON support

    • New Oracle/Teradata connector (4.0)

    • Incremental deployment options

    • Custom logging levels

    • SSIS package templates to reuse ETL code


    Mobile BI

    • In the cloud with Power BI

      • Power BI App for Windows Phone (coming soon) and iOS

    • On-premises with Datazen Server

      • Now available for free for SQL Enterprise Edition customers (2008 or later)

      • All major platforms: Windows Phone, Android, iOS

      • Beautiful interface and data visualizations

      • Optimizable for Phone, Tablet and Laptop


    SharePoint vNext integration

    • Edit Mode of PowerPivot Excel workbooks in browser

    • Support for Excel vNext (Office 2016) DAX functions


    Master Data Services

    • Improved performance for large models

    • Row-level compression per entity

    • Improved user interface

    • Configurable retention settings

    • Enhanced security possibilities for read, write, delete and create operations and support for multiple system administrators with specific permissions

• The Excel Add-in is 15 times faster and is updated to support bulk entity-based staging operations


    Visual Studio

• Database and BI project types merged into one version of Visual Studio

• New scripting language for tabular models. Currently tabular models are wrapped into multidimensional constructs, and when you deploy, the deployment is reverse engineered back to the tabular model. The new native language for tabular will be easy to understand, modify and deploy.

    • SSIS designer supports previous versions of SQL Server


Of course there is also still a lot of exciting news coming from the cloud side of Microsoft BI; for example, Azure Data Lake has been announced, following the principles of my blog post about the relational data lake. You can expect a post about Azure Data Lake on this blog soon!


    P.S. Don’t forget to suggest and vote for feature requests for SQL Server yourself at:  http://aka.ms/SqlBiUserVoice

     

  • Power BI Analysis Services Connector Security

    The Power BI Analysis Services Connector can be used to connect from the Power BI service to your on premises tabular Analysis Services models.

In this blog post I will look into the security and authentication possibilities that the SSAS Connector offers. Dynamic row-level security based on the Power BI user name against an on-premises SSAS Tabular model is possible; I will show how you can set this up and how it works in the background.

    If you want to know how to install and configure the SSAS Connector follow this excellent support guide.

The SSAS Connector has some similarities with the BI Semantic Model Connection that’s available for SharePoint. That connection type can authenticate either with Kerberos or by using the EffectiveUserName property, and the SSAS Connector follows the same pattern:
- A connection to the tabular model is made with stored credentials that require administrator permissions on the SSAS server.
- The EffectiveUserName parameter is sent to SSAS to impersonate the current user. Only SSAS administrators have permission to connect using EffectiveUserName (see the example connection string below).
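
For reference, EffectiveUserName is simply a property on the Analysis Services connection string. A purely illustrative example (server and database names are made up; the account is the one used in this post):

Provider=MSOLAP;Data Source=SSASSRV01;Initial Catalog=SalesTabular;EffectiveUserName=MACAW\jorgk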

    To investigate exactly how the connector works I have created a tabular model with a very simple underlying database model. The Sales table will be secured based on dynamic row-level security on the content of the EffectiveUserName column in the Department table.

    DataModelSQL


The tabular model contains the dataset shown in the Excel pivot below. In an on-premises situation my EffectiveUserName would be MACAW\jorgk, but because I log in to Power BI with jorgk[at]macaw.nl, I have created entries for both. Let’s see which one will be used later on.

    DataModel 


    I have created a SSAS Security Role with read permissions on my Active Directory account:

    image

    Role1

    Row-level security has been set on the EffectiveUserName column of the Department table:
    Role2


The SSAS Connector was installed on the server by my colleague Dave, who has administrator permissions on the SSAS Server (inherited from the local Administrators group). For the purpose of the demo we made sure my domain account was not in the local admin group and not in the SSAS Server admin group. Dave’s credentials will be the stored credentials that are used to connect to the tabular model from the SSAS Connector, passing the EffectiveUserName property just like a BISM connection does, as I explained before.

    Now I logged in to Power BI, created a report and the security is working, my report is filtered and shows only my data:

    14_securedPowerBi


    During the creation of my report Dave ran a Profiler trace to see what’s happening in the background. As we expected my connection came in under the account of Dave (blue) but with my account as EffectiveUserName (red):

    Profiler

It’s interesting to see that my EffectiveUserName is JorgK[at]macaw.nl, yet the Power BI report I created shows the data of MACAW\jorgk! Why didn’t it show the data of the row that exactly matches my EffectiveUserName JorgK[at]macaw.nl? That’s because SSAS does not authorize the user based on the textual value of the EffectiveUserName property. Instead, it calls Active Directory to check whether the Power BI user name (based on the e-mail address) exists in AD; if it does, the AD account is returned and the user is authorized with it.

    To wrap up this entire process I’ve created the following diagram:

    SSAS Connector Overview 

    1. From the Power BI tenant I connect to the SSAS Connector using my Power BI Account.
    2. The SSAS Connector connects to the tabular model using the stored credentials, in this case of my colleague Dave who has administrator permissions on the model and therefore the permissions to pass on my Power BI account as EffectiveUserName.
    3. The EffectiveUserName is checked in Active Directory. This will only be possible if DirSync has been set up, otherwise my Power BI account that comes from Windows Azure Active Directory will not be mapped to the on premises Active Directory.
    4. After Active Directory authorizes me, my on premises AD Account is sent to the tabular model.
    5. Row-level security is applied as configured in the SSAS Role.

     

The next thing to do was of course to share the Power BI report in a dashboard with another colleague. As a test I shared it with my colleague Martijn (blue), who did not have any permissions on the tabular model at all. Unfortunately it seems the credentials of the user who shares the dashboard are stored in that dashboard/report and are used to make the connection to the underlying SSAS data source. Martijn’s report was identical to mine, and the SQL Profiler trace showed us that the connection was indeed again made with the admin account of Dave, with my account passed in as the EffectiveUserName (red).

    MartijnAsJorgAfterSharing


    Conclusion

It’s great to see we can finally connect to on-premises models from Power BI using row-level security, which was not possible with the Data Management Gateway. This makes a hybrid cloud/on-premises architecture a feasible option to implement. Unfortunately we are not there yet, because it isn’t possible to impersonate a user after sharing a dashboard, which is clearly not the desired functionality. Let’s hope Microsoft will fix this in the coming months! Until then, be sure to advise your customers about this sharing security issue.

  • Relational Data Lake

    What is a Data Lake?
    Pentaho CTO James Dixon is credited with coining the term "Data Lake". As he describes it in his blog entry, "If you think of a Data Mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the Data Lake is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."

    These days, demands for BI data stores are changing. BI data consumers not only require cleansed and nicely modeled data, updated on a daily basis, but also raw, uncleansed and unmodeled data which is available near real-time. With new and much more powerful tooling like Power BI, users can shape and cleanse data in a way that fits their personal needs without the help of the IT department. This calls for a different approach when it comes to offering data to these users.

    BI data consumers also demand a very short time-to-market of new data, they don’t want to wait for a few months until data is made available by a BI team, they want it today. The raw uncleansed form of data in a Data Lake can be loaded very quickly because it’s suitable for generated data loading technologies and replication, which makes this short time-to-market possible. Once users have discovered the data and have acquired enough insights that they want to share with the entire organization in a conformed way, the data can be brought to traditional Data Warehouses and cubes in a predictable manner.

Furthermore, there is a rise in unstructured and semi-structured data and in the need to have “big data” available for ad-hoc analyses. New technologies and data structures are required to store and analyze these forms of data.

When a Data Lake is put in place, a lot of data streams from the sources into the “lake” without knowing up front whether it is eligible for answering business questions. The data can’t be modeled yet, because it’s not clear how it will be used later on. Data consumers get the possibility to discover data and find answers before the questions are even defined. This differs fundamentally from the concept of a Data Warehouse, in which the data is delivered through predefined data structures, based on relevant business cases and questions.

    Technology
From a technology view, a Data Lake is a repository which offers storage for large quantities and varieties of unstructured, semi-structured and structured data derived from all possible sources. It can be formed by multiple underlying databases which store these differently structured forms of data in both SQL and NoSQL technologies.
    20141217_JK_Technologies

    For the semi-structured/unstructured side of data which is used for big data analytics, Data Lakes based on Hadoop and other NoSQL technologies are common. For the semi-structured/structured data, SQL technologies are the way to go.

    In this blog post I will describe the semi-structured/structured, relational appearance of the Data Lake in the form of a SQL Server database: The Relational Data Lake.
    RelationalDataLake2

    Extract Load (Transform)
Data in a Data Lake is in raw form. Transformations are not performed during loading, and relationships and constraints between tables are not created (which is the default for transactional replication); this keeps the loading process as lean and fast as possible. Because of the lack of transformations, movement of the data follows the Extract-Load-(Transform) (EL(T)) pattern instead of the traditional E-T-L. This pattern makes loading data into the Data Lake easier, faster and much more suitable for replication technologies or generated SSIS processes, for example with BIML. This creates a very attractive time-to-market for data which is added to the Data Lake. Latency of data is as low as possible; preferably data is loaded in near real-time: data should stream into the lake continuously.

    Transformations take place after the data is loaded into the Data Lake, where applicable. Cosmetic transformations like translations from technical object and column names to meaningful descriptions which end users understand or other lightweight transformations can be performed in new structures (like SQL views) that are created inside the Data Lake.

Unlike Data Marts and Data Warehouses, which are optimized for data analysis by storing only the required attributes and sometimes dropping data below the required level of aggregation, a Data Lake always retains all attributes and (if possible) all records. This way it is future-proof for solutions that will require this data at a later moment in time and for users who will discover the data.

    Accessing data
    Data is made accessible through structures which can either be accessed directly, or indirectly through the exposure as OData Feeds. These structures are secured and are the only objects end users or other processes have access to. The feeds can be accessed with any tool or technology that is best suited to the task at any moment in time, for example using Power BI tooling like Excel PowerPivot/PowerQuery.

We normally create SQL views in which security rules and the required transformations are applied.
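
As an illustration, such a view could look like the minimal sketch below; all object and column names are invented for the example. It renames technical columns to meaningful ones, applies a lightweight security/filter rule, and end users are only granted access to the view, not to the raw table:

CREATE VIEW datalake.SalesOrders
AS
SELECT  so.SOID     AS SalesOrderID,
        so.CUSTNR   AS CustomerNumber,
        so.ORDDT    AS OrderDate,
        so.AMNT_EUR AS AmountEUR
FROM    raw.SRC_SALESORDERS AS so
WHERE   so.COMPANYCODE = 'NL01';  -- lightweight security rule applied inside the view
GO

GRANT SELECT ON datalake.SalesOrders TO [DataLakeReaders];  -- only the view is exposed to end users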

    The Data Lake also acts as a hub for other repositories and solutions like Data Warehouses and Operational Cubes.

    Master Data
Success of the Data Lake depends on good master data. When end users discover new raw data from the Data Lake they need to be able to combine it with high-quality master data to get proper insights. Therefore a master data hub is a must-have when a Data Lake is created. This hub should just be a database with master data structures in it; master data management on this data is preferred but not required. The master data hub should be a standalone solution, independent from the other BI solutions, as master data isn’t part of these solutions but is only used as a data source. It should be sourced independently too, preferably using master data tooling or tools like SSIS. Just like data from the Data Lake, master data should also only be accessed through structures which can be exposed as OData Feeds.

Besides being combined with data from the Data Lake, the master data can be used as a source for other BI solutions like Data Warehouses. There, the master data structures are often used as Data Warehouse Dimensions. To prevent unnecessarily duplicating master data in the Data Warehouse that already exists in the master data hub, it can be a good choice to leave the master data out of the Data Warehouse Dimensions. Only the business keys are stored, and they can be used to retrieve the data from the master data hub when required, as the sketch below shows. This way the Data Warehouse remains slim and fast to load, and master data is stored in a single centralized data store.
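
A small sketch of that idea, again with invented names: the Data Warehouse dimension stores only the business key, and the descriptive master data attributes are picked up from the hub at query time:

SELECT  f.SalesAmount,
        d.CustomerBK,      -- business key stored in the slim DW dimension
        m.CustomerName,    -- descriptive attributes live only in the master data hub
        m.CustomerSegment
FROM    dw.FactSales   AS f
JOIN    dw.DimCustomer AS d ON d.CustomerKey = f.CustomerKey
JOIN    mdh.Customer   AS m ON m.CustomerBK  = d.CustomerBK;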

    Architecture
The entire Data Lake architecture with all the described components fits in the model below. From bottom to top the highlights are:

    • Extract/Load data from the sources to the Data Lake, preferably in near real-time.
    • The Data Lake can consist of multiple SQL (and NoSQL) databases.
    • Transformations and authorizations are handled in views.
    • The Data Lake acts as hub for other BI solutions like Data Warehouses and Cubes.
    • The master data hub is in the center of the model and in the center of the entire architecture. It’s loaded as a standalone solution and isn’t part of any of the other BI solutions.
    • Traditional BI will continue to exist and continue to be just as important as it has always been. It will be sourced from the Data Warehouses and cubes (and master data hub).
    • The Discovery Platform with its new Power BI tooling is the place where “various users of the lake can come to examine, dive in, or take samples.” These samples can be combined with the data from the master data hub.

    20141211JK_Data Lake BI Architecture

    Data Lake Challenges
Setting up a Data Lake comes with many challenges, especially regarding data governance. For example, it’s easy to create any view in the Data Lake and lose control over who gets access to which data. From a business perspective it can be very difficult to deliver the master data structures that are so important for the success of the Data Lake. And from a user perspective, wrong conclusions can be drawn by users who get insights from the raw data; therefore the Data Warehouse should still be offered as a clean, trusted data structure for decision makers and as a data source for conformed reports and dashboards.

    Summary
The Data Lake can be a very valuable data store that complements the traditional Data Warehouses and Cubes, which will stay as important as they are now for many years to come. But considering the increasing amount and variety of data, the more powerful self-service ETL and data modeling tooling that keeps appearing, and the shortened time-to-market required for near real-time data from source to user, the Data Lake offers a future-proof data store and hub that enables the answering of yet undefined questions and gives users personal data discovery and shaping possibilities.

    Thanks go to my Macaw colleague Martijn Muilwijk for brainstorming on this subject and reviewing this blog post.

  • Implement SSAS MD cell security using dimension security with blazing performance

    SQL Server Analysis Services (SSAS) Multidimensional (MD) is a great product, and in my opinion it’s still the only real option to go for when building complex enterprise BI solutions. It’s still very fast when implemented correctly and it’s mature and therefore very stable.

     

The only real downside is cell security, which, in my opinion, is useless. It makes performance drop dramatically because it evaluates security cell by cell. I have seen reports that run in a few seconds without cell security take 20 minutes with cell security implemented! Try to explain that to your customer… It’s obvious that you can’t.

     

Quite a few workarounds have existed for quite a while:

     

• Make your measures invisible and create MDX calculations that either show or hide the measure value based on a dummy dimension. Drawbacks are that the measure is only hidden and not really secured, and that you need to create and maintain dummy dimensions/attributes. http://www.sqljason.com/2010/04/implementing-measure-security-in-ssas.html
• SSAS offers us the “Measures Dimension”, which gives you the possibility to secure measures like dimension members. Great, this is just what we need, but implementing it and creating MDX calculations based on the secured measures will give you errors for SSAS roles that do not have access to these measures. This is caused by the fact that the MDX script is executed after the security has been applied. So if a user that doesn’t have access to a measure (set by dimension security) tries to connect to the cube while the MDX script contains a reference to this secured measure, an error is raised.
    • Create a hidden dimension on which you apply dimension security like described here: http://cwebbbi.wordpress.com/2011/12/22/replacing-cell-security-with-dimension-security/ Unfortunately this doesn’t work for measures.

     

For a few years I’ve always implemented a workaround that uses the measures dimension. To prevent errors in the MDX script, as described in the second workaround above, I’ve added IsError() checks around all the parts of my calculations that could raise an error. For example, a simple calculation like Quantity * Price, where the Price measure could be secured, looks like this:

     

CREATE MEMBER CURRENTCUBE.[Measures].[Amount] AS
  IIf (
    IsError ( [Measures].[Price] ),
    NULL,
    [Measures].[Quantity] * [Measures].[Price]
  ),
  NON_EMPTY_BEHAVIOR =
  {
    IIf (
      IsError ( [Measures].[Price] ),
      NULL,
      [Measures].[Price]
    )
  } ;

     

This calculation returns NULL instead of raising an error if a user doesn’t have permission on the Price measure. Quite straightforward; the only trick here is the IsError() check in the NON_EMPTY_BEHAVIOR. Of course you have to do this in the script view of your SSAS calculations tab. In the form view this will look a bit strange, but it doesn’t cause any errors:

     

    clip_image001 

     

Using this approach you are able to implement measure security using dimension security while you are still able to create MDX calculations with NON_EMPTY_BEHAVIOR based on these secured measures. This made the report I talked about before run in 5 seconds instead of 20 minutes. I’ve used this approach for quite some years now, and it has always been sufficient.

     

At the moment I’m working on a very big enterprise cube that contains almost 100 dimensions, 25 measure groups and millions of records. I’ve also implemented measure security like this, and after a while I noticed some really heavy queries (reports) took quite some time, for example 30 seconds or 1 minute. Reason enough to re-think this approach. When I was at the SQL Server Days in Belgium last year, I discussed this approach with Chris Webb. Of course he could understand what I was doing here and we agreed to email about this approach later on. Chris emailed me about a blog post of his from some time ago: http://cwebbbi.wordpress.com/2008/01/22/dimension-security-tips-tricks-and-problems/. In this post Chris explains how to create secured calculations while using dimension security by using named sets and scope statements:

     

CREATE MEMBER CURRENTCUBE.Measures.Test AS
  NULL ;

CREATE SET myset1 AS
  IIf (
    IsError ( StrToMember ( "Measures.[Internet Sales Amount]" ) ),
    { },
    { Measures.Test }
  ) ;

SCOPE ( myset1 ) ;
  This = Measures.[Internet Sales Amount] ;
END SCOPE ;

CREATE SET myset2 AS
  IIf (
    IsError ( StrToMember ( "Measures.[Internet Sales Amount]" ) ),
    { Measures.[Internet Tax Amount] },
    { Measures.[Internet Sales Amount], Measures.[Internet Tax Amount] }
  ) ;

SCOPE ( myset2 ) ;
  This = Measures.CurrentMember * 2 ;
END SCOPE ;

     

    I did not know about this approach, so I tried it out. To be able to get good test results I used a very heavy query that hopefully nobody would ever try: a calculation over all sales data (millions of records) against all customers, all products and all time.  Unfortunately, the results were not very pleasing yet:

     

    1. Calculation with IIF/IsError checks in both the calculation part and the non empty part: ran in 50 seconds.

     

    2. Calculation with named set/scope approach: ran longer than 5 minutes, after that I stopped the query.

     

So Chris and I emailed again and I wondered if the lack of NON_EMPTY_BEHAVIOR (NEB) could be the cause of the slow performance of the named set/scope approach. Chris said that since SSAS 2008 the usage of NEB is not necessary anymore, something I had heard before. I always had the idea that adding NEB did make a difference in some cases, so I never stopped using it. So I sort of merged Chris’s approach and mine, adding the NEB using an IsError() function, and I replaced the StrToMember check with a direct reference to the measure in the named sets because in my opinion that part was not necessary:

IsError ( StrToMember ( "Measures.[Internet Sales Amount]" ) ) became: IsError ( Measures.[Internet Sales Amount] ).

     

The result was almost unbelievable: the query now took not 5 minutes, not 50 seconds, but only 2 seconds! So adding NEB can still make a huge difference sometimes! After finding out about this, I contacted Chris again and he was also very surprised by the result.

     

I’ve created an MDX calculation template with some explanation added as comments. The big performance gain is achieved because the IsError() check doesn’t need to be executed every time the calculation is evaluated: it is already executed when the static named set is created (a single time), and afterwards that result just seems to be reused. I think the same applies to the IsError() check in the NEB; it seems it is only executed once and then reused. Anyway, the result is that your MDX calculations are just as fast with security applied as without. Something that was not possible for a long time for me, and I guess for everybody else too. Just use the template below as a standard for creating your calculations and benefit from this solution:

     

--Calculation Template:
----------------------------------------

----------1: CHECKED CALCULATION: The secured "end product" used in other calculations----------
CREATE MEMBER CURRENTCUBE.[Measures].[Quantity_CHECKED] AS
  NULL
  , VISIBLE = 0 ;

----------2: CHECK NAMED SET: Actual permission check performed here----------
CREATE SET [Quantity_CHECK] AS
  IIf (
    IsError ( [Measures].[Quantity] )
    , { }
    , { [Measures].[Quantity_CHECKED] }
  ) ;

----------3: SCOPE: Assignment of either the measure or "nothing" to the CHECKED calculation (1)----------
SCOPE ( [Quantity_CHECK] ) ;
  This = [Measures].[Quantity] ;
END SCOPE ;

----------4: Second secured calculation created here----------
CREATE MEMBER CURRENTCUBE.[Measures].[Price_CHECKED] AS
  NULL
  , VISIBLE = 0 ;

CREATE SET [Price_CHECK] AS
  IIf (
    IsError ( [Measures].[Price] )
    , { }
    , { [Measures].[Price_CHECKED] }
  ) ;

SCOPE ( [Price_CHECK] ) ;
  This = [Measures].[Price] ;
END SCOPE ;

----------5: Calculation based on the secured calculations.----------
----------Non_Empty_Behavior set with IIf/IsError on the measure. Using calculations inside NEB is not possible----------
CREATE MEMBER CURRENTCUBE.[Measures].[Sales Amount] AS
  [Measures].[Quantity_CHECKED] * [Measures].[Price_CHECKED]
  , NON_EMPTY_BEHAVIOR =
  {
    IIf (
      IsError ( [Measures].[Price] )
      , NULL
      , { [Measures].[Price] }
    )
  }
  , VISIBLE = 1 ;
-------------------------------------------

     

    Your calculations tab in SSAS will look like:
    clip_image003

     

     

While I was investigating the inner workings of the first part of the script, I added some comments to make it easy for myself to remember. I guess they are also useful for anyone who wants to know what’s really happening here:

-- The calculation is initially NULL and will be filled based on user rights later on.
-- It is invisible because it is a purely technical calculation that should not be used by end users;
-- end users can use the measure this calculation is based on, if they have permission (set by dimension security on the Measures dimension).
CREATE MEMBER CURRENTCUBE.[Measures].[Quantity_CHECKED]
AS NULL,
VISIBLE = 0 ;

-- The named set performs the actual check for user rights. As this is a static named set, the check is executed
-- only once, at initialization, and the result is reused.
-- The IsError() on the measure raises an error if a particular user has no rights on it, because in that case the measure simply does not exist for that user.
-- If an error was raised, the set becomes nothing: {}. If no error was raised, the user has rights to access the measure
-- and the set contains the CHECKED calculation created in the previous step.
CREATE SET [Quantity_CHECK] AS
IIf ( IsError ( [Measures].[Quantity] )
, {}
, { [Measures].[Quantity_CHECKED] } ) ;

-- In the scope of the CHECK set, which is either {} (nothing) or the CHECKED calculation depending on user rights,
-- assign the measure to This. If the user has rights, this passes the measure through to the CHECKED calculation;
-- if not, the measure is assigned to nothing (the empty set {}) instead of to the CHECKED calculation.
SCOPE ( [Quantity_CHECK] ) ;
This = [Measures].[Quantity] ;
END SCOPE ;

  • SSIS Denali CTP3 – What’s new?

    Last week Microsoft released CTP3 of SQL Server Integration Services (SSIS), code name: Denali. In this blog post I will look into the new key features and some of the minor improvements in SSIS Denali.

    1. Development

    Shared Connection Managers
    Connection Managers can now be shared on SSIS project level.
    You can create them in the solution explorer, in the folder Connection Managers:
    clip_image001

    Once created, they will appear automatically in all your SSIS packages. The names are in bold so you can recognize them between your package-level connection managers:
    clip_image002

    You can also create Shared Cache Connection Managers for your cached lookups. This will be very useful for lookups that are performed multiple times, for example when you look up dimension tables multiple times from your fact table ETL. You can just convert the local cache connection manager by right clicking it:
    clip_image003

    What happened to the Shared Data Sources we knew from prior SSIS versions? Well they are no more. Shared Data Sources only lived at design time and not at runtime so they were not really useful anyway. Shared Connection Managers do live at runtime and offer even more possibilities as I will show later on, so they replace the old Shared Data Sources.

    Data Flow - Column mappings
SSIS has always mapped columns from sources to transformations or destinations with the help of lineage ids. Every column had a unique metadata ID that was known by all components in the data flow. If something changed in the source, this would break the lineage ids and raise error messages like: The external metadata column collection is out of synchronization with the data source columns.
To fix this error you would re-map all broken lineage ids with the “Restore Invalid Column References Editor”.
In Denali, lineage ids are no longer used. Mappings are done on column names, which is great because you can now auto-map on column names and even copy/paste pieces of another data flow and connect them by mapping the corresponding column names.

    Data Flow - Flexible order of authoring
    This improvement helps you edit data flow components even when they don’t have an input attached. Theoretically you can build your data flow backwards; start with the destination and track back to the source.

    Data Flow - Groupings
    You can now group data flow components. Select the components you wish to group, right click and select Group:
    clip_image004 


    The result is some sort of a data flow sequence container:
    clip_image005

    By clicking the arrow it will collapse:
    clip_image006

    Data flow groups are 100% eye candy; you can’t set any properties on them.

    Data Flow - Data Quality Services Cleansing transformation
    With this transformation you can apply data quality rules in the data flow. This is done by using a Knowledge Base which can be created by yourself or downloaded from the Windows Azure Marketplace. For example you could apply a rule that checks if a given postal code column is valid for a particular town column in your record.

    Data Flow - Data Tap
    In Denali, we have the possibility to attach a “tap” at a data flow path (arrow). This tap captures all data coming through and dumps it in CSV files.
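
Adding such a tap is done per execution through a stored procedure on the new SSIS catalog. A rough sketch of how that looks, under the assumption that the procedure and parameter names in the CTP match the ones below; the folder, project, package and data flow path names are made up:

DECLARE @execution_id bigint;

-- Create an execution for the package first.
EXEC [SSISDB].[catalog].[create_execution]
     @folder_name = N'Test', @project_name = N'Test', @package_name = N'Package.dtsx',
     @use32bitruntime = 0, @reference_id = NULL, @execution_id = @execution_id OUTPUT;

-- Attach a data tap to a specific data flow path; all rows passing that path are dumped to a CSV file.
EXEC [SSISDB].[catalog].[add_data_tap]
     @execution_id            = @execution_id,
     @task_package_path       = N'\Package\Data Flow Task',
     @dataflow_path_id_string = N'Paths[OLE DB Source.OLE DB Source Output]',
     @data_filename           = N'DataTapOutput.csv';

EXEC [SSISDB].[catalog].[start_execution] @execution_id;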

    2. Configuration

SSIS Denali is fundamentally different from its predecessors when it comes to the concept and usage of configurations. SSIS package configurations are obsolete (they will still be available if you really want to use them ;-)) and parameters and environments are the new kids in town.

    Parameters
    SSIS Parameters look a lot like SSIS variables but there are some differences. There are two types of these parameters:
    1. Package Parameters:
Look at Package Parameters like C# parameters, which are passed as input to a C# function (= your package). You can set them when executing (calling) a package, and the lifecycle of the parameters is limited to the SSIS package itself.

    The difference with SSIS variables? You can set the parameters while executing a package with SSMS or an Execute Package Task.

    Define your Package Parameters at the Package Parameters tab:
    clip_image007

    2. Project Parameters:
    Identical to Package Parameters, except for the scope, these Parameters exist on project level and can be referenced throughout all SSIS packages in your project. Package Parameters only live inside a single SSIS package.

    You can define Project Parameters in the solution explorer within your SSIS project:
    clip_image008

    Both Package and Project Parameters can be referenced from your SSIS packages, you recognize them by their prefix, $Package or $Project:
    clip_image009

Setting parameters in an Execute Package Task is done on the new Parameter bindings tab:
    clip_image010

It’s also possible to parameterize SSIS tasks on the Control Flow by right-clicking them and choosing Parameterize:
    clip_image011

    Loads of capabilities here! I now realize that I have created a workaround for Package Parameters with my SSIS Package design pattern for loading a data warehouse where I (mis)used Package Configurations as Package Parameters. Creating a new package design pattern for Denali definitely goes on my TODO list!

    Environments
Environments are a collection of SSIS package settings that can be defined on the SSIS Server. At runtime, the environment overrides these settings in the SSIS packages. You can create multiple environments, and when you run your SSIS packages you can select which environment should be used. It’s also possible to let multiple SSIS projects run under one environment, so there is flexibility all around.

To make the principle of Environments clear right away, I have created a simple example that you will commonly use: two Environments, one with development settings and one with production settings.

    I have deployed a Test SSIS project to SSMS which contains one SSIS package with one Shared Connection Manager. Notice the new Integration Services folder structure:
    clip_image012

    Next right click Environments and choose Create Environment:
    clip_image013

    Type Development as name for the Environment and click OK:
    clip_image014

    Now double click the Development Environment:
    clip_image015

Click on the Variables tab and create an Environment Variable which will override the connection string of the Shared Connection Manager. Type in the connection string for the development server as the Value and click OK:
    clip_image017

    Next create another Environment with the name Production and also create an Environment Variable with the name SCM_EnvironmentVar. The only difference between these two variables should be the value of the variable; it contains the production server connection string instead of the development value.
    You now have two Environments with one Environment Variable for the Shared Connection Manager each:
    clip_image018

    We now need to reference the Development and Production Environments in the Test project so they can be used. Right click the Test project and choose Configure:
    clip_image019

    Go to the references page and click Add..
    clip_image021

    Add both Environments:
    clip_image023

Both environments are now referenced from the Test SSIS project and are available. Let’s configure the Shared Connection Manager so it will use the value specified in one of the environments. Click on the Parameters page, then the Connection Managers tab, and then the … button next to the ConnectionString property of the Shared Connection Manager:
    clip_image025

    Now select Use Environment Variable and select SCM_EnvironmentVar. Click OK:
    clip_image026

    The name of our Environment Variable is shown in the Value box of the ConnectionString property:
    clip_image028

We have now successfully attached the Environment Variable to override the ConnectionString property with a value from either the Development or the Production Environment. Executing the package under one of both Environments is very easy. To do so, right click the package and choose Run:
    clip_image029

    The Run Package dialog box appears and you instantly see a message that reminds you to assign a value to the ConnectionString property of connection manager SharedConnectionManager:
    clip_image030

    At the bottom of the dialog box you can select the Environment your package should use. Select the .\Development Environment and click OK:
    clip_image031

In the overview report you see that the package has successfully run under the Development Environment:
    clip_image032

    If you change the package to use the Production Environment and you run the package again, you’ll get the following execution information:
    clip_image033

    Pretty straightforward and easy to use. The big difference? All the configuration work has been done in SQL Server Management Studio! Exactly the place where it should be done. In the current SSIS versions most of this work was done within Visual Studio and the configuration was done by editing tables. So unfriendly! Big improvement, great work MS!
    In this example I have set a property of a Shared Connection Manager but of course it is possible to set Package or Project Parameters with Environment Variables.
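
Everything the SSMS dialogs do here is backed by stored procedures and views in the SSISDB catalog, so running a package under a specific Environment can presumably also be scripted in T-SQL. A sketch, using the folder, project and environment names from the example above and a simplified lookup of the environment reference:

DECLARE @execution_id bigint,
        @reference_id bigint;

-- Look up the reference that links the Test project to the Development environment.
SELECT @reference_id = er.reference_id
FROM   [SSISDB].[catalog].[environment_references] AS er
JOIN   [SSISDB].[catalog].[projects] AS p ON p.project_id = er.project_id
WHERE  p.name = N'Test' AND er.environment_name = N'Development';

-- Create an execution of the package that uses that environment reference, then start it.
EXEC [SSISDB].[catalog].[create_execution]
     @folder_name = N'Test', @project_name = N'Test', @package_name = N'Package.dtsx',
     @use32bitruntime = 0, @reference_id = @reference_id, @execution_id = @execution_id OUTPUT;

EXEC [SSISDB].[catalog].[start_execution] @execution_id;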

    3. Deployment

    We have had the option to deploy SSIS packages from Visual Studio for a long time thanks to BIDS Helper. Microsoft finally added this functionality out of the box in Denali. They even did more: you can now also build your projects from within Visual Studio!

    There are two ways to deploy (push) your projects to the new SSIS server, directly from Visual Studio or by using an Integration Services Project Deployment File. They both use the new Integration Services Deployment Wizard.
    Deployments directly from Visual Studio can be done by right clicking your project and then choose Deploy. Now, the Deployment Wizard will pop straight up:
    clip_image034

The other way is building your project first. This was already possible in SSIS versions prior to Denali, but it did not really have a useful purpose.

    In Denali this is different. Building your project will generate an .ispac file which contains your entire project and this can be used by the Deployment Wizard. Again a very nice new feature.

Handing over a new release of an SSIS solution to a customer’s IT department for deployment can now be done in a very nice, professional manner. Ispac files can also be opened by Visual Studio to import your SSIS project.
    Building your project can be done by right clicking your project and choosing Build:
    clip_image035

    The output window displays the following:
    clip_image036

When looking at the bin\Development folder we see the new Test.ispac file:
    clip_image037

    clip_image038

    Double clicking (or choosing Deploy in Visual Studio) this file will start the new Integration Services Deployment Wizard:

You can now select the project that you wish to deploy. You can either select the .ispac Project deployment file or choose to deploy an existing project located in any Integration Services catalog. The last option is useful when you wish to deploy, for example, a TEST SSIS solution that is already located in a catalog to an ACCEPTANCE environment in another catalog.
    clip_image039

    The project is loaded and validated:
    clip_image040

    Next, you now need to select the destination, which has to be a SSIS Server:
    clip_image041

    Review your selections and click on Deploy to start the actual deployment:
    clip_image042

    The results show a successful deployment. Notice the protection level has been changed. What happened here?
    The wizard clears the protection level, as we know from Visual Studio, to prevent “cannot decrypt password” errors.

    Instead of using protection levels, passwords will be stored in clear text. The entire package (with the passwords in it) will be stored encrypted in the SSIS Server tables you are deploying to.
    clip_image043

    The project has now been deployed to the server:
    clip_image044

When you right click the Test project and choose Versions, you are able to see the current version of your project:
    clip_image046

If you deploy a newer version later on, the existing version(s) stay on the server and you can easily roll back to a previous version if you’ve made a bad deployment.

    The diagram below shows the entire deployment life cycle (source: TechNet):
    clip_image048
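
The Deployment Wizard itself also boils down to a call into the catalog, so an unattended (scripted) deployment should be possible as well. A sketch, assuming the .ispac file is reachable from the SQL Server; the path and names are made up:

DECLARE @project_binary varbinary(max),
        @operation_id   bigint;

-- Read the built .ispac file as a single binary blob.
SELECT @project_binary = BulkColumn
FROM   OPENROWSET(BULK N'C:\Builds\Test\bin\Development\Test.ispac', SINGLE_BLOB) AS ispac;

-- Deploy (or upgrade) the project into the Test folder of the SSISDB catalog.
EXEC [SSISDB].[catalog].[deploy_project]
     @folder_name    = N'Test',
     @project_name   = N'Test',
     @project_stream = @project_binary,
     @operation_id   = @operation_id OUTPUT;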

    4. Management

    The new SSIS Server is the central storage and administration point of your SSIS solutions. No longer is this a standalone server that you’ll need to manage. Basically it is a database with Stored Procedures.

    You now have an Integration Services node available in SSMS when you connect to the database engine:
    clip_image049

Under the Integration Services node you will find your SSISDB catalog, which holds all your SSIS solutions with their packages, environments, etc. The physical storage of these objects is handled by a SQL Server database with the same name as the catalog:
    clip_image050

    This database also contains all the stored procedures containing all the programming code for the SSIS Server:
    clip_image051

    With the help of these stored procedures you can manage your SSIS Server: e.g. your parameter values, connection managers, and override properties by using environments.
    Next to the configuration functionalities I have discussed earlier you can also implement security to control access to the catalog, both on folder level and package level. When you right click a folder/package in the catalog and choose the properties you’ll get the following window where you can manage security access:
    clip_image052

Finally you get an out-of-the-box reporting dashboard, built on the logging functionality that SSIS now offers out of the box. On this dashboard you get information about the execution history of your package and its sub-packages, see which parameters were used, view specific messages and get a performance report over time.
    All the information is logged automatically if any package runs on the server. The information is very detailed; you can even get the row counts between SSIS data flow transformations!

    When running a package you are able to select a logging level on the Advanced tab:
    clip_image053
You can choose from:
- None: turns logging off for performance reasons
    - Basic: error and warning logging
    - Performance: detailed trace information
    - Verbose: diagnostics and fault debugging

    When I run my Test package I’ll get the following execution dashboard. There are some hyperlinks that navigate to more detailed reports.
    clip_image055

So there is no need to create your own logging framework anymore; it’s all out of the box!
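
Because all of this logging lands in the SSISDB database, you can also query it directly next to (or instead of) the built-in reports. A sketch of such a query against the catalog views, assuming the view and column names stay as they are in the current bits:

-- Last 10 package executions with their status and the messages they produced.
SELECT TOP (10)
       e.execution_id,
       e.folder_name,
       e.project_name,
       e.package_name,
       e.status,        -- e.g. 2 = running, 4 = failed, 7 = succeeded
       e.start_time,
       e.end_time,
       m.message_time,
       m.message
FROM   [SSISDB].[catalog].[executions] AS e
LEFT JOIN [SSISDB].[catalog].[event_messages] AS m
       ON m.operation_id = e.execution_id
ORDER BY e.start_time DESC, m.message_time;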

  • SSIS - Connect to Oracle on a 64-bit machine (Updated for SSIS 2008 R2)

We recently had a few customers where a connection to Oracle on a 64-bit machine was necessary. A quick search on the internet showed that this could be a big problem. I found all kinds of blog and forum posts from developers complaining about this. A lot of developers will recognize the following error message:

    Test connection failed because of an error in initializing provider. Oracle client and networking components were not found. These components are supplied by Oracle Corporation and are part of the Oracle Version 7.3.3 or later client software installation.
    Provider is unable to function until these components are installed.


    After a lot of searching, trying and debugging I think I found the right way to do it!

    Problems

Because BIDS is a 32-bit application, on both 32-bit and 64-bit machines, it cannot see the 64-bit driver for Oracle. Because of this, connecting to Oracle from BIDS on a 64-bit machine will never work when you only install the 64-bit Oracle client.

Another problem is the “Microsoft Provider for Oracle”: this driver only exists in a 32-bit version, and Microsoft has no plans to create a 64-bit one in the near future.

The last problem I know of is in the Oracle client itself: it seems that a connection will never work with the instant client, so always use the full client.
There are also a lot of problems with the 10G client; one of them is the fact that this driver can’t handle the “(x86)” in the path of SQL Server. So using the 10G client is not an option!

    Solution

    • Download the Oracle 11G full client.
• Install the 32-bit AND the 64-bit version of the 11G full client (Installation Type: Administrator) and reboot the server afterwards. The 32-bit version is needed for development from BIDS, which is 32-bit; the 64-bit version is needed for production with the SQL Agent, which is 64-bit.
    • Configure the Oracle clients (both 32 and 64 bits) by editing  the files tnsnames.ora and sqlnet.ora. Try to do this with an Oracle DBA or, even better, let him/her do this.
• Use the “Oracle provider for OLE DB” from SSIS; don’t use the “Microsoft Provider for Oracle” because a 64-bit version of it does not exist.
    • Schedule your packages with the SQLAgent.

    Background information

• Visual Studio (BI Dev Studio) is a 32-bit application.
• SQL Server Management Studio is a 32-bit application.
• dtexecui.exe is a 32-bit application.
• dtexec.exe has both 32-bit and 64-bit versions.
• There are x64 and x86 versions of the Oracle provider available.
• SQLAgent is a 64-bit process.

My advice to BI consultants is to get an Oracle DBA or professional for the installation and configuration of the two full clients (32-bit and 64-bit). Tell the DBA to download the biggest client available; this way you are sure that they pick the right one ;-)

    Testing if the clients have been installed and configured in the right way can be done with Windows ODBC Data Source Administrator:
    Start...
    Programs...
    Administrative tools...
    Data Sources (ODBC)


ADDITIONAL STEPS FOR SSIS 2008 R2

    It seems that, unfortunately, some additional steps are necessary for SQL Server 2008 R2 installations:

1. Open REGEDIT (Start… Run… REGEDIT) on the server and search for the following entry (for the 32-bit driver): HKEY_LOCAL_MACHINE\Software\Microsoft\MSDTC\MTxOCI
    Make sure the following values are entered:

    image

2. Next, search for (for the 64-bit driver): HKEY_LOCAL_MACHINE\Software\Wow6432Node\Microsoft\MSDTC\MTxOCI
    Make sure the same values as above are entered.

    3. Reboot your server.

  • Replication Services as ETL extraction tool

    In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool:
    “Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards.
    In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!”

Well, I have tried it out and I must say it worked well. I was able to let Replication Services do the work in a fraction of the time it would have cost me to do the same in SSIS.
    What I did was the following:

    1. Configure snapshot replication for some Adventure Works tables, this was quite simple and straightforward.
    2. Create an SSIS package that executes the snapshot replication on demand and waits for its completion.
This is something that you can’t do with out-of-the-box functionality. While configuring the snapshot replication, two SQL Agent jobs are created: one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are asynchronous, which means that when you execute them they immediately report back whether the job started successfully or not; they do not wait for completion and report the result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues.

    Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it.

    Configure snapshot replication

    The first step is to create a publication on the database you want to replicate.
    Connect to SQL Server Management Studio, right-click Replication and choose New… Publication…

    The New Publication Wizard appears; click Next.

    Choose your “source” database and click Next

    Choose Snapshot publication and click Next

    You can now select tables and other objects that you want to publish

    Expand Tables and select the tables that are needed in your ETL process

    In the next screen you can add filters on the selected tables, which can be very useful. Think about selecting only the last x days of data, for example.

    It’s possible to filter out rows and/or columns. In this example I did not apply any filters.
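
    If you do want a row filter, the filter in this screen is plain T-SQL. As a minimal sketch of my own (not applied in this walkthrough), replicating only the last 60 days of AdventureWorks order headers would look something like this:

    SELECT <published_columns> FROM [Sales].[SalesOrderHeader] WHERE [OrderDate] >= DATEADD(DAY, -60, GETDATE())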

    Schedule the Snapshot Agent to run at a desired time. By doing this a SQL Agent job is created, which we need to execute from an SSIS package later on.

    Next you need to set the Security Settings for the Snapshot Agent. Click on the Security Settings button.

    In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines!

    On the next screen choose to create the publication at the end of the wizard

    Give the publication a name (SnapshotTest) and complete the wizard

    The publication is created and the articles (tables in this case) are added


    Now that the publication has been created successfully, it’s time to create a new subscription for this publication.

    Expand the Replication folder in SSMS, right-click Local Subscriptions and choose New Subscriptions…

    The New Subscription Wizard appears

    Select the publisher on which you just created your publication and select the database and publication (SnapshotTest)

    You can now choose where the Distribution Agent should run. If it runs at the Distributor (push subscriptions) it causes extra processing overhead there. If you use a separate server for your ETL process and databases, choose to run each agent at its Subscriber (pull subscriptions) to reduce the processing overhead at the Distributor.

    Of course we need a database for the subscription, and fortunately the wizard can create it for you. Choose New database…

    Give the database the desired name, set the desired options and click OK

    You can now add multiple SQL Server Subscribers, which is not necessary in this case but can be very useful.

    You now need to set the security settings for the Distribution Agent. Click on the … button.

    Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here

    Click Next

    Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on.

    Initialize the subscription at first synchronization

    Select the first box to create the subscription when finishing this wizard

    Complete the wizard by clicking Finish

    The subscription will be created

    In SSMS you can see that a new database has been created: the subscriber. There are no tables or other objects in it yet, because the replication jobs have not run yet.

    Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot:

    Rename this job to “CreateSnapshot”

    Now search for the job that distributes the snapshot:

    Rename this job to “DistributeSnapshot”
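
    If you prefer to script the renaming instead of using the SSMS GUI, a minimal T-SQL sketch would be the following (the generated job names are placeholders; look them up on your own server):

    -- Rename the generated replication jobs so the SSIS package can reference stable names
    EXEC msdb.dbo.sp_update_job @job_name = N'<generated snapshot agent job name>', @new_name = N'CreateSnapshot';
    EXEC msdb.dbo.sp_update_job @job_name = N'<generated distribution agent job name>', @new_name = N'DistributeSnapshot';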

    Create an SSIS package that executes the snapshot replication

    We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs, and after the DistributeSnapshot job has started the package needs to wait until it is finished before the package execution completes.
    The Execute SQL Server Agent Job Task is designed to execute SQL Agent jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back whether the job started successfully or not; it does not report whether the job actually completed with success or failure. This is because these jobs are asynchronous.

    The SSIS package I’ve created does the following (a T-SQL sketch of the polling logic follows the list):

    1. It runs the CreateSnapshot job.
    2. It checks every 5 seconds, with a For Loop, whether the job has completed.
    3. When the CreateSnapshot job has completed it starts the DistributeSnapshot job.
    4. Again, it waits until the snapshot is delivered before the package finishes successfully.
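
    To illustrate the waiting logic, here is a minimal T-SQL sketch of what such a polling step could run against msdb. It is my own sketch of the idea, not the exact implementation inside the downloadable package; the job name comes from the renaming step above:

    DECLARE @job sysname = N'CreateSnapshot';

    -- Start the SQL Agent job; sp_start_job returns as soon as the job has been started
    EXEC msdb.dbo.sp_start_job @job_name = @job;

    -- Give the agent a moment to register the job activity, then poll every 5 seconds
    WAITFOR DELAY '00:00:05';
    WHILE EXISTS (SELECT 1
                  FROM msdb.dbo.sysjobactivity AS ja
                  JOIN msdb.dbo.sysjobs AS j ON j.job_id = ja.job_id
                  WHERE j.name = @job
                    AND ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
                    AND ja.start_execution_date IS NOT NULL
                    AND ja.stop_execution_date IS NULL)
    BEGIN
        WAITFOR DELAY '00:00:05';
    END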


    Quite simple, and the package is ready to use as a standalone extract mechanism. After executing the package, the replicated tables are added to the subscriber database and filled with data.
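
    A quick way to verify the result is to run a row count per table against the subscriber database (a simple check of my own, not part of the package):

    -- Row counts of all tables delivered by the snapshot; run this in the subscriber database
    SELECT t.name AS table_name, SUM(p.rows) AS row_count
    FROM sys.tables AS t
    JOIN sys.partitions AS p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
    GROUP BY t.name
    ORDER BY t.name;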


    Download the SSIS package here (SSIS 2008)


    Conclusion

    In this example I only replicated 5 tables; I could create an SSIS package that does the same in approximately the same amount of time. But if I replicated all 70+ AdventureWorks tables, I would save a lot of time and boring work! With Replication Services you also benefit from the fact that schema changes are applied automatically, which means your entire extract phase won’t break. Because a snapshot is created using the bcp (bulk copy) utility it’s also quite fast, so the performance will be quite good.

    A disadvantage of using snapshot replication as an extraction tool is the limitation on source systems: only SQL Server or Oracle databases can act as a Publisher.

    So if you plan to build an extract phase for your ETL process that involves a lot of tables, think about Replication Services; it can save you a lot of time, and thanks to the extract SSIS package I’ve created you can fit it perfectly into your usual SSIS ETL process.

  • Replication Services in a BI environment

    In this blog post I will explain the principles of SQL Server Replication Services without going into too much detail, and I will take a look at the BI capabilities that Replication Services could offer in my opinion.

    SQL Server Replication Services provides tools to copy and distribute database objects from one database system to another and maintain consistency afterwards. These tools basically copy or synchronize data with little or no transformation; they do not offer capabilities to transform data or apply business rules, like ETL tools do.
    The only “transformation” Replication Services offers is filtering records or columns out of your data set. You can achieve this by selecting the desired columns of a table and/or by using a WHERE clause like this:
    SELECT <published_columns> FROM [Table] WHERE [DateTime] >= getdate() - 60


    There are three types of replication:

    Transactional Replication

    Transactional replication components and data flow

    This type replicates data on a transactional level. The Log Reader Agent reads directly from the transaction log of the source database (Publisher) and clones the transactions to the Distribution Database (Distributor); this database acts as a queue for the destination database (Subscriber). Next, the Distribution Agent moves the cloned transactions stored in the Distribution Database to the Subscriber.
    The Distribution Agent can run either at scheduled intervals or continuously, which offers near real-time replication of data!

    So, for example, when a user executes an UPDATE statement on one or more records in the publisher database, this transaction (not the data itself) is copied to the distribution database and then also executed on the subscriber. When the Distribution Agent is set to run continuously, this process runs all the time and transactions on the publisher are replicated in small batches (near real-time); when it runs at scheduled intervals, it executes larger batches of transactions, but the idea is the same.


    Snapshot Replication

    Snapshot replication components and data flow
    This type of replication makes an initial copy of the database objects that need to be replicated; this includes the schemas and the data itself. All types of replication must start with a snapshot of the database objects from the Publisher to initialize the Subscriber. Transactional replication needs an initial snapshot of the replicated publisher tables/objects to run its cloned transactions on and maintain consistency.

    The Snapshot Agent copies the schemas of the tables that will be replicated to files that will be stored in the Snapshot Folder which is a normal folder on the file system. When all the schemas are ready, the data itself will be copied from the Publisher to the snapshot folder. The snapshot is generated as a set of bulk copy program (BCP) files. Next, the Distribution Agent moves the snapshot to the Subscriber, if necessary it applies schema changes first and copies the data itself afterwards. The application of schema changes to the Subscriber is a nice feature, when you change the schema of the Publisher with, for example, an ALTER TABLE statement, that change is propagated by default to the Subscriber(s).
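
    To make that last point concrete, a schema change like the one below on the Publisher is, with the default publication settings (DDL replication enabled), propagated to the Subscriber(s) as well. The table and column are just an example of mine:

    -- Executed on the Publisher; with DDL replication enabled the same change reaches the Subscriber(s)
    ALTER TABLE Sales.SalesOrderHeader ADD ExtraComment NVARCHAR(100) NULL;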


    Merge Replication
    Merge replication is typically used in server-to-client environments, for example when subscribers need to receive data, make changes offline, and later synchronize those changes with the Publisher and other Subscribers, like mobile devices that need to synchronize once in a while. Because I don’t really see BI capabilities here, I will not explain this type of replication any further.


    Replication Services in a BI environment
    Transactional Replication can be very useful in BI environments. In my opinion you never want to see users run custom (SSRS) reports or PowerPivot solutions directly against your production database; it can slow down the system and cause deadlocks in the database, which can cause errors. Transactional Replication can offer a read-only, near real-time database for reporting purposes with minimal overhead on the source system.

    Snapshot Replication can also be useful in BI environments. If you don’t need a near real-time copy of the database, you can choose this form of replication. Besides being an alternative to Transactional Replication, it can be used to stage data so it can be transformed and moved into the data warehouse environment afterwards.
    In many solutions I have seen developers create multiple SSIS packages that simply copy data from one or more source systems to a staging database that serves as the source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically, so I think it offers enough features here. I don’t know how the performance will be and whether it really works as well for this purpose as I expect, but I want to try this out soon!

    Update:
    I got a question regarding the supported Replication Services features in the different editions of SQL Server (Standard, Enterprise, etc.). There is a nice table on MSDN that shows this!

  • SSIS Denali as part of “Enterprise Information Management”

    While watching the SQL PASS session “What’s Coming Next in SSIS?” by Steve Swartz, the Group Program Manager for the SSIS team, an interesting question came up:

    Why is SSIS thought of as BI, when we use it so frequently for other sorts of data problems?

    Steve’s answer was that he breaks the world of data work into three parts:

    • Processing of inputs

    • BI
       
    • Enterprise Information Management
      All the work you have to do when you have a lot of data to make it useful and clean and get it to the right place. This covers master data management, data quality work, data integration and lineage analysis to keep track of where the data came from. All of these are part of Enterprise Information Management.


    Next, Steve said that Microsoft is developing SSIS as part of a large push in all of these areas in the next release of SQL Server. So, besides being a BI tool, SSIS will be part of Enterprise Information Management in the next release of SQL Server.

    I'm interested in the different ways people use SSIS; I've basically used it for ETL, data migrations and processing of inputs. In which ways did you use SSIS?

  • Analysis Services Roadmap for SQL Server “Denali” and Beyond

    Last week Microsoft announced the “BI Semantic Model” (BISM). I wrote a blog post about this and now the Analysis Services team wrote an article named: Analysis Services – Roadmap for SQL Server “Denali” and Beyond.

    [diagram: BI Semantic Model]

  • Will SSAS, Cubes and MDX be abandoned because of the BI Semantic Model?

    At the PASS Summit that is happening in Seattle at the moment Microsoft announced the “BI Semantic Model” (BISM).

    It looks like BISM is something like the UDM that we know from SSAS today. While the UDM was the bridge from relational data to multidimensional data, BISM is the bridge from relational data to the column-based VertiPaq engine. Some compare BISM to Business Objects universes.

    The next version of SSAS will be able to run either in the old “UDM” mode or in “BISM” mode; a combination is not possible. Of course this will have some radical consequences, because there are a few major differences between the two modes:

    • The switch from multidimensional cubes to the in-memory Vertipaq engine
    • The switch from MDX to DAX

    So will multidimensional cubes and MDX be deprecated? No, not really: SSAS as we know it now will still be a product in the future and will remain supported. But it looks like Microsoft will concentrate on BISM, mainly because multidimensional cubes and MDX are very difficult to learn. Microsoft wants to make BI more approachable and less difficult, just like with Self Service BI.
    I would say it’s really time to start learning PowerPivot and DAX now, if you have not already. If Microsoft focuses on the new BISM/VertiPaq technology, that will be the future if you ask me.

    Chris Webb wrote an interesting article about BISM and it looks like he is not very enthusiastic about the strategy Microsoft takes here because this could be the end of SSAS cubes within a few years: “while it’s not true to say that Analysis Services cubes as we know them today and MDX are dead, they have a terminal illness. I’d give them two, maybe three more releases before they’re properly dead, based on the roadmap that was announced yesterday.”

    What’s also very interesting is the comprehensive comment on this article from Amir Netz. He explains BISM and UDM will live together in Analysis Services in the future and MOLAP is here to stay: “Make no mistake about it – MOLAP is still the bread and butter basis of SSAS, now and for a very long time. MDX is mature, functional and will stay with us forever.”

    Read the article from Chris Webb here and make sure you don’t miss the comment from Amir!

  • SQL Server code-named 'Denali' - Community Technology Preview 1 (CTP1)

    SQL Server Denali (SQL Server 2011) CTP1 has been released!

    Download it here

    SQL Server 2011 is expected to be ready in the third quarter of 2011! I’ve already blogged about a few new SSIS features here.

    I will keep you posted!

  • SQL Azure Reporting is announced!


    With SQL Azure Reporting Services you can use SSRS as a service on the Azure platform, with all the benefits of Azure and most of the features and capabilities of the on-premises version. It’s also possible to embed your reports in your Windows or Azure applications.

    Benefits of the Azure platform for Azure Reporting Services are:

    • Highly available, the cloud services platform has built-in high availability and fault tolerance
    • Scalable, the cloud services platform automatically scales up and down
    • Secure, your reports and SQL Azure databases are in a safe place in the cloud
    • Cost effective, you don’t have to set up servers and you don’t have to invest in managing servers
    • Use the same tools you use today to develop your solutions. Just develop your reports in BIDS or Report Builder and deploy to Azure

    Disadvantages are:

    • SQL Azure databases are the only supported data sources in the first version, more data sources are expected to come
    • No developer extensibility in the first version, so no custom data sources, assemblies, report items or authentication
    • No subscriptions or scheduled delivery
    • No Windows Authentication, only SQL Azure username/password is supported in the first version, similar to SQL Azure database. When SQL Azure database gets Windows Authentication, Azure Reporting will follow

    Despite the disadvantages of the first version I think SQL Azure Reporting Services offers great capabilities and can be extremely useful for a lot of organizations.
    I’m really curious about the CTP, which will be available before the end of this year. You can sign up for the SQL Azure Reporting CTP here

    Read more about SQL Azure Reporting here

  • MCITP – I passed the 70-455 “Upgrade: Transition Your MCITP SQL Server 2005 BI Developer to MCITP SQL Server 2008 BI Developer” exam!

    Recently I passed the 70-455 exam. This exam upgrades your SQL 2005 MCTS and MCITP certifications to SQL 2008.


    The exam contains 2 sections (basically separate exams), each with 25 questions:
    - A part which covers exam 70-448: TS: Microsoft SQL Server 2008, Business Intelligence Development and Maintenance
    - A part which covers exam 70-452: PRO: Designing a Business Intelligence Infrastructure Using Microsoft SQL Server 2008

    You need to pass both sections with a score of at least 700. If you fail one section, you fail the entire exam.

     

    How did I study

    I searched the internet and the conclusion was that there is no preparation material available for the 70-452 exam, but fortunately there was a self-paced training kit for the 70-448 exam, which also covers this exam. So I bought the book, scanned it for subjects that needed attention, and fortunately that was enough for me to pass.

    For the entire list of preparation materials for the 70-448 and 70-452 exams follow the links below:

    70-448 preparation materials

    70-452 preparation materials 

     

    My Current Transcript

    [screenshot: certification transcript]
