THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Davide Mauri

A place for my thoughts and experiences on SQL Server, Business Intelligence and .NET

  • PASS Summit 2014 – Agile BI & Agile Data Warehousing

    I'm really happy to be able to announce that I'll be delivering one full-day workshop and a regular session at PASS Summit 2014. Here's the titles and the short summary of my workshop and session:

    Agile Data Warehousing: from Start to Finish
    Full-Day Workshop
    In this workshop you'll learn how to create a Data Warehouse from the beginning to end: from the architectural definition to its building using an engineered and proven process, you'll see how effective and efficient an Agile approach can be.

    Agile BI: Unit Testing and Continuos Integration
    Regular Session
    The Agile principles can be applied to BI projects too, even the most extreme ones like Continuos Integration. In this session, you'll see how you can do that using several tools and processes all together to achieve a common goal: be Agile!

    These topics are among the one I love at most and I'm really happy (at, proud J) that they've been selected. I believe that the Agile approach IS the correct approach to make BI project a success, simply because it's the only one that can quickly return value to the business that, in turn, will start to appreciate the BI solution giving feedback to improve it and thus helping the IT department to improve it more and more every day. It's all about creating a virtuous circle that will make sure people will love your BI solution.

    Both Gartner and Forrester agrees that the Agile approach is the way to go. The first one, already back in 2011 noted that "50% of requirements change in the first year. […] Adopting Agile delivery allows you to fail fast and fail early, reducing the cost of failure and turning it into something to embrace and learn from", and the second one, in the latest research (Q1 2014) say that "business agility often equals the ability to adopt, react, and succeed in the midst of an unending fountain of customer driven requirements" and thus an Agile BI is needed to support this approach.

    Now, this is all nice, but what about costs? Agility can be too costly. In order to be effective all the low-add-value work MUST be automated, since you're doing thing in cycles, and thus each minute spent doing non-valuable work will be wasted over and over again. In the end, to be really agile, there must be a smooth and highly automated "construction yard" where quality is not directly dependent from (the quality and experience of your) people, and where results are testable and measureable.

    The workshop will be all about these topics. Agility, engineering, best practices, automation, change management, design patterns, testability.

    On the other side, the regular session will focus 100% (and even more) on the testability topics. I'll show that testing is one of the pillars that allow the Agile approach to exists and I'll also show how we can apply it to BI. This is really a frontier topic, and as such there is dedicated tool for that. We'll have to borrow them from Application Development and forge them again to be useful in the BI field. We'll push this concept to the limit, reaching the point where we'll also see how also Continuos Integration can be used in BI development to keep under strict control each change on improvement we do to our BI solution.

    Because BI is all about trust, delivered at right time.

    See you in Seattle!

  • GUID, fragmentation and performance: the eternal flame.

    In the last months a lot of discussion has been done around the topic of GUIDs and their usage as Clustered Keys. We had a very “hot” discussion on that several years in Italy, and looking on the web you can see that this kind of question surface gain periodically. Recently I discussed it also with my friend Thomas. Discussion that gave me some food for thought, as usual. So with this post I’d like to share my findings, hoping to write a blog post that can help to better understand this phenomenon that is always an hot topic, in any DBA and DEV discussion.

    Actually, the question is only a part of the more broader topic of “fragmentation and performance”, that is – IMHO – much  more interesting. It is well known that if you try to insert rows in a table as fast as possible, and you’re using an IDENTITY as your clustered index, this, instead of helping, will pose a barrier to the performance you can reach. A nice whitepaper explain the problem in detail:

    Diagnosing and Resolving Latch Contention on SQL Server

    Not only that, ORM like Hibernate or the Entity Framework loves to work with GUID since in that case you can created values without having to do a round-trip to SQL Server, allowing developers to be more free and independent from SQL Server, build a cleaner and easier to maintain code. (At least this is the common idea, of course I don’t agree with it at all, since it cover only one side of the problem: the developer point of view. But this is another story.)

    Also Microsoft uses it *a lot*. I you ever worked with Dynamics CRM you know that GUID simply pours out of the server. They are everywhere.

    So, are they a good thing or a bad one?

    As usual, forget to have simple, catch-all, answer. Computer science is so wide that it’s almost impossible to have a black or white approach. But luckily we work in a field where numbers and measurement still have a lot of importance, so I did some test to really understand how much fragmentation can impact the performance of a query.

    Here’s the test I did and the results they show.

    Test Setup

    First of all I created two tables


    and loaded them with ostress:

    ostress -Slocalhost -E -dIndexTest -Q"insert into dbo.IndexTest3F default values" -oC:\FT\xxx\TMP\ -q -n70 -r50000

    After that I reindex the table IndexTest3N so that I can do some test with a badly fragmented (IndexTest3F) and a total unfragmented (IndexTest3N) table using the following command:

    alter index pk__3N on dbo.IndexTest3N rebuild with (fillfactor = 50)

    I used fillfactor 50 to try to keep the page count for both table at the same level. As you can notice, and as quite obvious from the row size, this is not possible, since exactly two rows can fit in any page. For some reason, without reindexing the table, the fill factor is lower than expected.


    Unfragmented Table Tests

    To make sure that Buffer Cache was completely empty, I ran the following commands:


    And than I searched for a value I know it wasn’t in the table, using a non-indexed column, so that a table scan would be the only possible plan. Here’s the results, with I/O and TIME statistics set to on:


    This is a Fast Track machine (32 Cores, no HT, 256 GB Ram, 4 P2000 SANs) where I expect 6GB/Sec of transfer rate. Doing the math, this is the exact throughput I got


    Monitoring the Avg.Disk Bytes/Reads counter, I noticed that SQL Server was able to do I/O up to 512 Kb, as I would expect in this situation: SQL Server is doing an Index-Order Scan, and pages are all in the correct physical order.

    Forcing an Allocation-Order Scan, using a TABLOCK hint, shows the very same results.

    If table is all in memory, so we’re not doing any physical I/O, performance are phenomenal, as one would expect. Scanning of all 13GBs of data, happens to take little more than 100msec, with a CPU a little over 3000 msec.


    So far so good. Now let’s see what happens if we’re using an heavily fragmented table

    Fragmented Table Tests

    As soon as we run the same reference query again, we have some very different results:


    The query took more than 10 seconds instead of the original 2! This means being 5 time slower, with a throughput of less than 2GB/Sec. Due to such high fragmentation, and since it’s using the Index-Order scan strategy, SQL Server was never able to do I/O bigger than 64Kb, bringing the performance bar down a lot.

    Helping SQL Server to do an Allocation-Order scan with a TABLOCK hint helps a bit:


    but still performance a much lower than the one obtained with a non-fragmented table.

    Now, let’s see if something changes also with warm cache. We all know that memory is so fast, when compared to spindles, that fragmentation should not give any problem. Well, the tests shows something very different:


    Of course performance is great here if compared with cold-cache results, after all we’re still getting an answer only after 500msec. But look at CPU time. It’s 16 SECONDS! 16 seconds instead of 3, again, 5 time (and more) more expensive than the non-fragmented table, even when in memory.

    I must admit that I wasn’t expecting such difference. Differences that disappear when going for the Allocation-Order scan, where fragmentation is not key factor anymore, since SQL Server doesn’t follow the double-linked list found at the leaf leave of index pages:


    Other Tests

    I ran the very same test also on a much cheaper machine (the Fast Track machine used here has a cost, in hardware only, of more the 100.000$), not optimized for DWH workload and using SSD as a storage, and without any SAN, but using DAS.

    The test machine was a VM on my notebook, with 2 vCore, 8GB of RAM and Samsung 840 SSD for the storage.

    Results showed the very same behavior.


    Well, it’s pretty clear that fragmentation has a non-trivial impact on table scan performances. So is a non-sequential GUID such a bad thing? Here’s my opinion, give the test I ran:

    Do It When
    if you need a table where you only need to persist data and you need to have a very high insert rate (persist = insert row and then access only by PK, and only for singleton lookups), then GUID or any other better way to avoid contention in the last page is good. In my test on the Fast Track I could only reach 15.000 Batch/Sec with the usual IDENTITY on the ID column, while I went up to 50.000 Batch/sec with the GUID. In both cases I had 140 concurrent connections (using ostress.exe) doing a very simple INSERT…DEFAULT VALUES and CPU usage were around 20%. With GUID bottleneck was then the PFS page. I only have one filegroup (in addition to the PRIMARY), since this is a DWH machine and was configured to optimize that specific workload, so I’m sure that on a proper OLTP configured machine, the saturation point is much after that value.

    Don’t Do It When
    if you are doing anything else where you cannot foretold and be 100% sure that you’re never going to do a table scan, I wouldn’t recommend the usage of a value (GUID or not) that brings your table to a 99% fragmentation guaranteed. For example if you’re developing and ERP or accounting system, or a website or anything else where the optimization of insert-rate is not your primary goal…well, think twice before adopting GUID blindly.

    I’m sure this won’t be the last post on that subject, but at least there are now some numbers that we can use to discuss on it. And I would really love to see the SQL family act as scientists would do (aren’t we Data Scientists? Winking smile): do that same tests on your machines, post the results and let’s discuss them!

  • Data Warehouse Advanced Modeling Techniques

    I’m yet again pleased to announce that on the next month (19th February 2014) I’ll deliver the Advanced version of our Data Warehouse Workshop with my friend Thomas Kejser. The venue will be Stockholm again (really a lovely city) but this time we’ll be at the

    World Trade Center
    Klarabergsviadukten 70
    107 24 Stockholm, Sweden

    This is the agenda:

    • Logical data models: What are they good for?
      • Comparing Inmon and Kimball
    • Demystifying meta data: Meta Models and auto generated ETL
      • Metadata Driven ETL
    • Understanding modern hardware: Co-Located data in SMP and MPP systems
      • How to create a balanced system
    • Why normalization is bad for data warehouses, and what you can do about it
    • Best Practices, Naming Conventions, Lesson Learned
    • Choosing between column and row stores
    • Keys, master data and mapping layers
    • Tracking history in the warehouse
    • Handling changing user requirements
    • High speed data loading – and reloading
    • Unit Testing

    I can tell you that this is really something unique, just like the one we already did one year ago and the one we delivered at SQL Rally, the blends and summarize years and years of experience, that, if you’re into BI, you should not miss at all.

    See you there!

  • PowerQuery, ExpandTableColumn and the parent data

    I’ve been playing with PowerQuery in the last days and I’ve come across the need of accessing “parent” object data when using the ExpandTableColumn function. The solution is more tricky then one could imagine and I have to say a BIG “THANK YOU” to Miguel Llopis for showing it me.

    Now, let’s say you have a bunch of files in a directory. All files has the same structure (csv, json…anything you want) and you need to import all their content in one PowerPivot table.


    Of course the PowerQuery “From Folder” option is exactly what you need


    And there’s the result


    If you then click on the “Content” header  you will have the content of the files merged in just one resulting table.


    Cool, isn’t it? Now, let’s say they you also want to have the information about the source of your data. Like, for example, the name of the source file, so that you can know from which file the data comes from. Now things become much more complex since there is no object that allows you to access the “parent object” information. In order to solve this, you have to start to use PowerQuery functions ( page 11).

    First of all you need to create a function that will do the same operation you just did for all files in a folder, but limiting it only to a specific file, and turn that file in the parameter of such function. Here’s an example to do that:

    (filepath) =>
        Source = Table.FromColumns({Lines.FromBinary(File.Contents(filepath))}),
        SplitColumnDelimiter = Table.SplitColumn(Source,"Column1",Splitter.SplitTextByDelimiter(","),2)

    the function will take a file, whose path will be passed in the filepath parameter, turn the file into a table and split the values using comma as a delimiter. Since it is a function, you don’t need to have it visible in an Excel sheet, so you can deselect the “Load To Worksheet” option. Let’s give a name to our function and call it DoStuff.

    Now that our function is in place we just have to

    1. Load the folder content again
    2. Create a new column with the file path that will be passed to the freshly created function
    3. Remove all unneeded columns
    4. Create an additional column where we’ll call the created function for each file in the folder
    5. Expand the loaded content into columns

    Translated into PowerQuery language it becomes:

        Source = Folder.Files("D:\Temp\PowerQuery"),
        InsertedCustom = Table.AddColumn(Source, "File Path", each [Folder Path] & [Name]),
        RemovedOtherColumns = Table.SelectColumns(InsertedCustom,{"File Path"}),
        InsertedCustom1 = Table.AddColumn(RemovedOtherColumns, "Custom", each DoStuff([File Path])),
        #"Expand Custom" = Table.ExpandTableColumn(InsertedCustom1, "Custom", {"Column1.1", "Column1.2"}, {"Custom.Column1.1", "Custom.Column1.2"})
        #"Expand Custom"

    and here’s the result:



  • SQL Rally Nordic & Amsterdam slides & demos

    Last week I had the pleasure to speak at two GREAT conferences (as you can see from the wordcloud I’ve posted, here for Stockholm and here for Amsterdam. I used two different filtering techniques to produce the wordcloud, that’s why they look different. I’m playing a lot with R in these days…so I like to experiment different stuff).

    The workshop with my friend Thomas Kejser on “Data Warehouse Modeling – Making the Right Choices” and my sessions on “Automating DWH Patterns through Metadata” has been really appreciated by attendees, give the amount of good feedback I had on twitter and on some blog posts (Here and here).

    Of course many asked for slides & demos to download, so here you are!

    Automating DWH Patterns through Metadata – Stockholm

    Automating DWH Patterns through Metadata – Amsterdam

    I’m still trying to understand if and how I can publicly post slides & demos of the workshop, so for that you have to wait a couple of days.

    I will post about it as soon as possible. Anyway, if you were in the workshop and would like to get the slide & demos ASAP, just send me an email, I’ll happily sent the protected link to my skydrive folder to you.


  • SQL Rally Pre-Con: Data Warehouse Modeling – Making the Right Choices

    As you may have already learned from my old post or Adam’s or Kalen’s posts, there will be two SQL Rally in North Europe.

    In the Stockholm SQL Rally, with my friend Thomas Kejser, I’ll be delivering a pre-con on Data Warehouse Modeling:

    Data warehouses play a central role in any BI solution. It's the back end upon which everything in years to come will be created. For this reason, it must be rock solid and yet flexible at the same time. To develop such a data warehouse, you must have a clear idea of its architecture, a thorough understanding of the concepts of Measures and Dimensions, and a proven engineered way to build it so that quality and stability can go hand-in-hand with cost reduction and scalability. In this workshop, Thomas Kejser and Davide Mauri will share all the information they learned since they started working with data warehouses, giving you the guidance and tips you need to start your BI project in the best way possible―avoiding errors, making implementation effective and efficient, paving the way for a winning Agile approach, and helping you define how your team should work so that your BI solution will stand the test of time.

    You'll learn:

    • Data warehouse architecture and justification
    • Agile methodology
    • Dimensional modeling, including Kimball vs. Inmon, SCD1/SCD2/SCD3, Junk and Degenerate Dimensions, and Huge Dimensions
    • Best practices, naming conventions, and lessons learned
    • Loading the data warehouse, including loading Dimensions, loading Facts (Full Load, Incremental Load, Partitioned Load)
    • Data warehouses and Big Data (Hadoop)
    • Unit testing
    • Tracking historical changes and managing large sizes

    With all the Self-Service BI hype, Data Warehouse is become more and more central every day, since if everyone will be able to analyze data using self-service tools, it’s better for him/her to rely on correct, uniform and coherent data. Already 50 people registered from the workshop and seats are limited so don’t miss this unique opportunity to attend to this workshop that is really a unique combination of years and years of experience!

    See you there!

  • When To Break Down Complex Queries


    Some days ago my SolidQ colleague Luca  pointed me to the followin SQLCAT article

    When To Break Down Complex Queries

    that, as part of the reorganization of SQL CAt website, has emerged again from the depth of Internet.

    The article is very interesting and also allows me to underline again the importance of having a way to tell the optimizer when it should *not* expand subqueries into the main query. In this way the usage of #temp tables could be avoided in many cases, allowing for the usage of views or inline UDF instead of stored procedures.

    I opened the connect item back in 2010, so it’s time to bring more attention to it in order to have it in the product ASAP, and this article give me the option to do that.

    Please vote for it:

    Now that data volumes are increasing every day, more and more I find myself fighting with bad query plans due to bad cardinality estimation. When you are joining several big tables, you can be sure you’ll have to break your query in several pieces, to be sure to have good performance. Last time I had to do this was….the day before yesterday!

    I think it’s really time for an extended NOEXPAND hint. If you’re with me, vote vote and vote!

  • Conferences in the next months

    Next months look full of interesting conferences and I’ll be speaking in several of those. I’ll be speaking at the following ones:

    17 – 18 September 2013, Germany

    SQL Saturday 260 Milano
    8 October 2013, Italy

    SQL PASS Nordic
    4 – 6 November 2013, Sweden

    SQL PASS Amsterdam
    6 – 8 November 2013, Nederland

    SQL Saturday 257 Verona
    9 November 2013, Italy

    Advanced Data Warehouse Workshop
    20 November 2013, Stockholm

    SQL Saturday 264 Ancona
    13 December 2013, Italy

    I’ll deliver session on Data Quality & Master Data Management, Techniques for the Agile Business Intelligence, Automating ETL using Metadata (and BIML) and, at SQL PASS Nordic I’ll also deliver, again with my friend Thomas Kejser, a very nice workshop:

    Data Warehouse Modeling – Making the Right Choices

    It’s very likely that I’ll be missing the PASS Summit this year, but at some point I also have to do some work here and now…:)

    If you’re going to be to one of those conferences, please come and say “Hi”: it’s always great to meet people all around the world!

  • SQL Saturday 260 in Milan with Fabian Pascal

    I’m really happy to be able to announce that we’re going to have a SQL Saturday in Milano! We’re going to have a great list of speaker and exceptional content, as you can see from the list of proposed sessions:

    we had proposals from Klaus Aschenbrenner, Silvano Coriani and many other high-level speakers. But the most special session we’re going to have is the one delivered by Fabian Pascal one of the most knowledgeable and respected people on the Database Modeling topic!

    He’s going to deliver a very interesting session:

    Denormalization for performance: a costly illusion

    which is something is really not easy to find elsewhere. This kind of modeling session can really change the way you see the world and, most of all, are rarely free, especially if done by such kind of person.

    This is surely a SQL Sat you don’t want to miss! Plus, you also have the opportunity to make your wife/girlfriend happy, bringing her in one of the most (if not THE most) well known Fashion Cities! If you’re staying for more than one night you can stay just near one of the well known street, Corso Como, where you can just walk down right to the Dome, surrounded by nightlife and cool shops.

    We’re going to organize something EPIC! Don’t miss it!

  • Testing Hekaton: some results.

    In the last months I’ve been playing with Hekaton a while, since a customer of ours is evaluating its usage for one of its flagship products.

    Here’s I’d like to share with you what I’ve found doing this test. I think it is really interesting for two main aspect

    1. they show that Hekaton is not magic and still good best practices and good hardware must be in place
    2. Hekaton can be really fast, competing – in performances – with some NoSQL databases (but I’ll discuss about this in another post)

    For the test I’ve used a VM running on VMWare Workstation, with the guest having 2 vCPU and 4GB of Ram. Disk is pre-allocated vDisks stored on a Samsung 840 Pro SSD.

    Surely it’s not a server, but we wanted to try Hekaton  on (somehow) cheap machines in order to understand which kind of performance we can have also on (somehow) cheap hardware.

    The test was a simple insert of 10,000 entities with the following structure

    customerId: Int
    attribute1: varchar(100)
    attribute2: varchar(100)
    attribute3: varchar(100)

    Here’s the results:


    SO = Schema_Only
    SAD = Schema_And_Data
    NC = Native Compilation

    We didn’t test the usage of Bulk Insert ‘cause we were interested in understanding how fast we can be on singleton inserts. Insert was done calling a stored procedure made of only one INSERT command.

    As visible the usage of Hekaton at its full potential (SO+NC) improved the performance of near three times. Not really bad! (This kind of performance are fast as some NoSQL DB in my tests!)

    Dropping the Native Compilation increased the elapsed time a little bit, but that’s the price you have to pay if you want, for example, to use SEQUENCES or other features not actually supported by Native Compiled procedures. (Using SQL Server 2014 CTP1 right now)

    The real surprise was the fact that using the Schema_And_Data option for an Hekaton table basically removed all performance gains, making the difference between Hekaton tables and “classic” tables almost invisible.

    I really wanted to understand why since I was not really expecting this.

    Analyzing wait stats turned out that we where hitting a bottleneck at the transaction log level:

    Here’s the wait stats using SO:


    and here’s the SAD (by all means!) wait stats


    Now, everything it’s much more clear. Putting Log and Data on the same disk is STILL not a good idea. So what are the thing we’ve learned in these tests?

    Well, first of all, Hekaton can really change the game as it’s performance are really impressive and can really compete with some of the fastest NoSQL databases (CouchBase for example).As said before I’ll do a post on that in future, but before I want to do additional tests and wait for Hekaton to RTM status.

    As a second point, it shows that there is no magic going around here and one can’t simple hope to solve it’s performance problems simply enabling Hekaton. A good system architecture is still needed and even good understanding of the engine is a must (as always) in order to find the bottleneck and fix the problem.

    Unfortunately I wasn’t able to do additional test to show that the bottleneck can be removed due to time and hardware restrictions but I hope to be able to them in the near future so that I can update this post. In the meantime if someone wants to try to execute the same test, just drop me an email and I’ll share the very simple tool I created with you.

  • Azure HDInsight Preview–Be Warned or you CC will suffer…

    As soon as I had the possibility to test HDInsight on Azure I promptly started using it. Nothing really exciting, just creating the cluster and do some tests following the official tutorial you can find here:

    I’m using my MSDN MVP subscription for which I have 1500 hours of “Compute Services”. Well, be careful to keep your HDInsight cluster turned on. Even if you don’t use it it will consume resources. The the resource created behind the scene is a “LargeSKU” VM, as the detailed usage report that you can download says.


    And, wow, look! In just less then a week it has consumed ALL my available hours, and thus my CC started to be drained. Non that much, luckily, just a hundred of bucks (which is also not so few in this times of crisis) and even more luckily I’m very careful to monitor Azure expenses often since I still not trust the pay-per-use system so much to just leave it alone and without constant supervision. And as soon as I discovered that there was something strange going on I shout the HDInsight cluster down immediately.

    Unfortunately the expenses are reported under the generic “Compute Hours – Cloud Services” summary so without downloading the detailed billing report and analyzing it with Excel was impossible to me to understand that the resources consumption came from the HDInsight cluster.

    That’s why took me two days to understand the problem (at the beginning I thought the problem was my website) but on May 24th I finally understood what was happening.

    To be honest this is the only case where I had such bad surprise (I’m also using VMs, Web Roles and SQL azure and I never consumed more then what I expected so far so I’m not blaming MS at all here), but this is a lesson learned that I want to share with the community, hoping to help someone to avoid even worse surprises.

    The feedback to MS I’d like to give is that it would be very good if, at least for the “preview” features, there could be a specific settings to decide how much resources they could use before they are automatically shut down.

    And the conclusion, on my side, is that, at least for now, playing with Hadoop on this area of the Big Data universe is better done on-premise on the spare machine I have in the office.

    Of course what happened could also totally be my fault, even if I just did everything following the tutorial, but if someone with more experience on Azure HDInsight would like to leave a feedback I’ll be more than happy the ear it.

    So…keep your eyes open and your money safe


    Another feedback to MS: it would be very very very nice if the billing details can be accessed via OData. It would be a perfect match with PowerPivot capabilities!

  • SQL Server 2008R2 / 2012 Standard & Backup Compression

    This is something that is totally overlooked, in my experience, with SQL Server 2008 R2 and SQL Server 2012. On the contrary to what happens with SQL Server 2008, the Standard version of SQL Server 2008R2 & SQL Server 2012 supports backup compression:

    SQL Server 2008 R2 BOL

    SQL Server 2012 BOL

    Unfortunately a bug in the documentation said the opposite in the past, but it has been fixed quite a long ago now:

    Just keep it mind Smile

  • Extended Events did it again: monitoring Stored Procedures performance

    In the last days I  had to work quite a lot with extended events in order to deeply monitor SQL Server performance. One interesting request that came out while implementing the monitoring infrastructure, was the possibility to monitor the performance of a set of stored procedures, vital for the correct handling of an online booking procedure.

    The challenge was to give a sort of real-time monitor of procedure performances so that one can then create alert and/or do some stream-analysis to keep response time always under the desired amount of time.

    Here’s how you can do it using Extended Events, monitoring, for example, the execution of procedure uspGetManagerEmployees in AdventureWorks2012:

    CREATE EVENT SESSION [monitor_procedure_performance] ON SERVER
    ADD EVENT sqlserver.rpc_completed
            ACTION    (package0.collect_system_time,package0.process_id,sqlserver.client_app_name,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.database_id,sqlserver.database_name,sqlserver.nt_username,sqlserver.request_id,sqlserver.session_id,sqlserver.session_nt_username,sqlserver.sql_text,sqlserver.transaction_id,sqlserver.transaction_sequence,sqlserver.username)
            WHERE    ([object_name]=N'uspGetManagerEmployees')
    ADD EVENT sqlserver.module_end
            ACTION    (package0.collect_system_time,package0.process_id,sqlserver.client_app_name,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.database_id,sqlserver.database_name,sqlserver.nt_username,sqlserver.request_id,sqlserver.session_id,sqlserver.session_nt_username,sqlserver.sql_text,sqlserver.transaction_id,sqlserver.transaction_sequence,sqlserver.username)
            WHERE    ([object_name]=N'uspGetManagerEmployees')
    ADD TARGET package0.ring_buffer

    Once done, it’s all about decoding the XML that Extended Events returns:


        Analyze XEvent data

    IF (OBJECT_ID('tempdb..#t') IS NOT NULL) DROP TABLE #t
    IF (OBJECT_ID('tempdb..#r') IS NOT NULL) DROP TABLE #r

        cast(target_data as xml) xdoc
        sys.dm_xe_sessions s
    inner join
        sys.dm_xe_session_targets t on t.event_session_address = s.address
    where = 'monitor_procedure_performance'

    with cte as
            event_number = ROW_NUMBER() over (order by T.x),
            event_timestamp = T.x.value('@timestamp', 'datetimeoffset'),
            T.x.query('.') as event_data
        cross apply
            xdoc.nodes('/RingBufferTarget/event') T(x)
    cte2 as (
            --data_field = T2.x.value('local-name(.)', 'varchar(100)'),
            data_name = T2.x.value('@name', 'varchar(100)'),
            data_value = T2.x.value('value[1]', 'varchar(100)'),
            data_text = T2.x.value('text[1]', 'varchar(max)')
            cte c
        cross apply
            c.event_data.nodes('event/*') T2(x)
    cte3 as (
            data_name in ('collect_system_time', 'object_name', 'cpu_time', 'duration', 'logical_reads', 'row_count', 'database_name', 'database_id')
        (max(data_value) for data_name in (database_id, database_name, object_name, cpu_time, duration, logical_reads, row_count)) T

    --SELECT * FROM #t
    SELECT * FROM #r

        execution_date = cast(event_timestamp as date),
        execution_hour = datepart(hour, event_timestamp),
        execution_minute = datepart(minute, event_timestamp),
        duration_msec = avg(cast(duration as int)) / 1000.
    group by
        cast(event_timestamp as date), datepart(hour, event_timestamp), datepart(minute, event_timestamp), [object_name]

    That’s it. You can now keep monitored execution times of your procedure.

  • Trapping SQL Server Errors with Extended Events

    One very useful usage of Extended Events is the ability to trap SQL Server error without the need to have a server trace running (which, btw, is deprecated), with the additional feature of being able to query the data as soon as it comes in. This means that we a solution to monitor and trap errors as soon as they happen can be easily created, in order to help developers to fix problems as soon as they are detected. This is really, really, really helpful especially in very big applications, where the code base is quite old and there is no-one really knowing everything of the solution.

    To start a Extended Events sessions in order to trap SQL Server errors with severity greater than 10, just run the following script:

    ADD EVENT sqlserver.error_reported
            ACTION    (package0.collect_system_time,package0.last_error,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_id,sqlserver.database_name,sqlserver.nt_username,sqlserver.plan_handle,sqlserver.query_hash,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_frame,sqlserver.tsql_stack,sqlserver.username)
            WHERE    ([severity]>10)
    ADD TARGET package0.event_file
            SET filename=N'D:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\XEvents\error_trap.xel'


    The problem with Exended Events is that they only talks XML which is surely flexible and extensible, but not at all confortable to be queried. That’s why I prefer to turn it into something relational. Using the xml nodes function and pivoting data can make the trick:

    IF (OBJECT_ID('tempdb..#e') IS NOT NULL) DROP TABLE #e

    WITH cte AS
            CAST(event_data AS XML) AS event_data
            sys.fn_xe_file_target_read_file('D:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\XEvents\error_trap*.xel', NULL, NULL, NULL)
    cte2 AS
            event_number = ROW_NUMBER() OVER (ORDER BY T.x)
        ,    event_name = T.x.value('@name', 'varchar(100)')
        ,    event_timestamp = T.x.value('@timestamp', 'datetimeoffset')
        ,    event_data
            event_data.nodes('/event') T(x)
    SELECT * INTO #e FROM cte2

    WITH cte3 AS
            --data_field = T2.x.value('local-name(.)', 'varchar(100)'),
            data_name = T2.x.value('@name', 'varchar(100)'),
            data_value = T2.x.value('value[1]', 'varchar(max)'),
            data_text = T2.x.value('text[1]', 'varchar(max)')
            #e c
            c.event_data.nodes('event/*') T2(x)
    cte4 AS
            data_name IN ('error_number', 'severity', 'message', 'database_name', 'database_id', 'client_hostname', 'client_app_name', 'collect_system_time', 'username')
        (MAX(data_value) FOR data_name IN ([error_number], [severity], [message], database_name, database_id, username, client_hostname, client_app_name, collect_system_time)) T
        [severity] > 10
        event_timestamp DESC

    and voilà, now all errors can be easily identified:


    It’s now very easy to create reports and dashboards to monitor the system in (near) real time. And this is vital when you run a 24x7 online business.

  • Forcing “Custom Properties” of a Data Flow Transformation to support DTS Expression

    Today I was using a 3rd Party Data Flow component that has several Custom Properties for which I need to change some of their values at runtime using a DTS Expression.

    To mimic the situation let’s use the “Percentage Sampling” that has two Custom Properties:


    Of the two Custom Properties only some (one in this case) are available also outside that dataflow, so that they can be targeted by a DTS Expression. Such properties are listed under the “Misc.” section of Data Flow properties


    and also in the Property Expression Editor window:


    Now, what if you need to make also the “hidden” custom properties available for DTS Expression usage? As you may have noticed, the SamplingSeed is not exposed outside the Data Flow. I tried to search the web, but after several minutes I gave up since I wasn’t able to find anything that could help. I then started to look into the SSIS object model and I found a nice property named expressionType in the IDTSCustomProperty interface that tells to the engine if the property value can be specified using DTS Expression or not:

    If the value is set to Notify than the usage of DTS Expression is possible otherwise, if the property is set to None, as the name implies, DTS Expression cannot be used.

    So all you need to do is to open the .dtsx file, look for the component you want to touch and its properties


    and add (if not exists) or change the expressionType attribute to Notify:


    Save the file and voilà, the property is now available for DTS Expression usage!


    Now, just be aware that is a sort of an hack, so double check it works for you. On the 3rd party components we’re using it works like a charm, and it just saved that day since without the ability to change some properties at run time, the deployment of our package in production could have been a nightmare.

    I tested this approach both on SQL Server 2012 and SQL Server 2008 and in both cases I hadn’t had any problems.

    Hope this helps someone Smile, enjoy!

This Blog


Privacy Statement