THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

SSIS Junkie

  • Visio stencils for SSIS

    My colleague Peter Avenant pointed out to me today that there are some Visio stencils available for SSIS.

    ssis visio stencil

    According to a blog post entitled Useful Visio stencil set for SQL Server 2005 by Josh Robinson they can be downloaded in a zip file and indeed, at the time of writing, they can. However, the supposed accompanying web page no longer exists and hence it doesn’t give me any confidence that the zip file will exist forever either therefore I’ve also uploaded the zip file up to my SkyDrive at  MS Visio SQL 2005 Stencils.zip.

    @Jamiet


  • Microsoft, OData and RDF

    Earlier today Chris Webb posted a blog entry entitled OData and Microsoft BI where he remarked upon the increasing importance of Microsoft’s burgeoning OData protocol for accessing data over HTTP. Chris also linked to Douglas Purdy’s blog post OData: The Movie which listed the following Microsoft and non-Microsoft technologies that support (or soon will support) OData either as a data producer or a data consumer:

    • Excel
    • .Net client
    • AJAX Client
    • PHP client
    • Java client
    • Visual Studio
    • Sharepoint
    • SQL Server (via Reporting Services)
    • Codename "Dallas"
    • Websphere

    To that list I would add Windows Azure which exposes data from its storage engine using OData and also the forthcoming Live Framework technology that is expected to surface user-centric data from Microsoft’s various Live Services properties (e.g. SkyDrive, Messenger, Calendar).

     

    Chris suggested that a pervasive format for data on the web would be a big step forward because any consumer that understood that format could talk to any producer that chose to expose it; win-win-win indeed. Unfortunately life is of course never that simple and as is so often the case there are many competing technologies in this space including (but, I suspect, not limited to) Microsoft’s OData, Google’s GData and the World Wide Web consortium’s Resource Description Framework (RDF).

     

    This is nothing new of course; we can draw parallels with Atom and RSS which were recent competing technologies used to expose blog syndication feeds. Interestingly both of these technologies generally sit side-by-side quite happily these days (read: Will the Online Identity War turn out like the XML Syndication War?) and it will be interesting to see whether the same will happen in the OData/GData/RDF space; will organisations opt to expose multiple service heads for their various databases? I await the answer to that one with interest.

     

    RDF is of particular interest chiefly because one of its main proponents is Sir Tim Berners-Lee, the inventor of the world wide web. RDF has been knocking around for about 10 years now and Berners-Lee sees it as the technology that will underpin the growth of the fabled Semantic Web; indeed he has been heavily involved with the recently announced service http://data.gov.uk which exposes data held by the UK government and it is no surprise that that service exposes its data using RDF.

     

    I don’t know too much about GData but I do have a working knowledge of both OData & RDF so perhaps an overview and comparison of the two might be useful. To this observer it appears as though OData is closely tied to the tables-rows-and-columns paradigm that anybody familiar with traditional relational databases will understand. An OData service exposes multiple entities (roughly analogous to tables) and the relationships between them.


    RDF however is built around the concept of triples which I assert is more analogous to a conceptual Entity-Attribute-Value model (that assertion is something I hope to explore in a future blog post). The key difference between RDF and OData though is that an RDF document can contain links to RDF documents that are hosted on other services and attributes of entities are defined in terms of those other services. Put more simply, RDF data is one massively distributed, non-centralised, interlinked web of data and if that sounds an awful lot like the mass of HTML documents which form the world wide web then you shouldn’t be too surprised given that Berners-Lee is so heavily involved. The notion of linking data together in this way has also given rise to the very descriptive moniker linked data which you may well hear mentioned in the same sentence as “semantic web” from time to time.

     

    This has been a rather rambling blog post and really its just me dumping some thoughts out onto your computer screen as you can tell from the rather unimaginative title. Nonetheless there is some interesting stuff going on here that I hope to explore in the future so if you're interested please return some time. If you have any thoughts I'd love to read them in the comments section below.

     

    @Jamiet


  • Tweetpoll and RESTful Northwind go bye-bye

    On 31st January 2010 Windows Azure and SQL Azure will transition to becoming services that you have to pay for which means that my three small demos that are hosted up there are going to disappear hence I thought now would be a good time to review before they are digitally ground into dust.


    Tweetpoll

    Tweetpoll was a demo that I wrote back in April 2009 and is hosted on Windows Azure at http://tweetpoll.cloudapp.net/. Here’s what it does:

    • periodically polls Twitter’s public timeline and gets returned a list of the latest tweets
    • pushes each returned tweet onto a Windows Azure queue as a message
    • picks each message off the queue, measures the length of the tweet, and increments a counter (in Windows Azure table store) for each length
    • Displays the distribution of tweet lengths on a graph at http://tweetpoll.cloudapp.net/ (using some questionable HTML and Javascript :)
    • Counts the number of hits on http://tweetpoll.cloudapp.net/ and maintains that count in Windows Azure table storage too
    • Displays the current contents of the Windows Azure queue

    Here’s a diagram that shows that a little better and demonstrates which parts use Azure web and worker roles. It was built at a time when one could only have one worker role per application therefore two threads are used to simulate the pushing onto and picking off the queue:

    tweetpoll architecture

    Its a very noddy application and the use of a queue is gratuitous but nevertheless its a useful demo of what can be achieved and enabled me to get some experience of building an app on Azure. The source code is available at http://tweetpoll.codeplex.com.

    I wrote a blog entry back in May 2009 called Tweetpoll – My first Windows Azure application is live that introduced Tweetpoll. In that blog entry I provided a screenshot of the graph that gets displayed at http://tweetpoll.cloudapp.net and here it is again:

    tweet length distribution May 2009

    Here’s what it looked like in July 2009:

    tweet length distribution July 2009

    And finally today in January 2010:

    tweet length distribution January 2010

    I said at the time that it would “be interesting to see if the graph smoothes out over time” and sure enough that is what has happened. We still get a massive spike around 140 characters of course but its clear to see that the curve has smoothed out before that spike. I also find it interesting that there are more tweets in the 30-50 range than 90-120; something for academics to ponder over perhaps. Finally, if you are wondering why some tweets appear to be over 140 characters in length then take a read of my blog entry The Longest Tweet where I explain why that is.

    Incidentally, by adding up the counter for each tweet length I can determine how many messages Tweetpoll has gotten off of the public timeline since April 2009 and I display that number at http://tweetpoll.cloudapp.net too:

    tweetpoll total tweets polled

    Last but not least here’s how many hits the site has had:

    tweetpoll page hits

    Again, if you’re interested in looking under the bonnet of Tweetpoll then the source code is available at http://tweetpoll.codeplex.com.


    RESTful Northwind

    When SQL Azure entered public beta I wanted to build an app that would demonstrate its capabilities so I built http://northwindazure.cloudapp.net/Northwind.svc/. Its is, quite simply, a copy of the venerable Northwind database hosted on SQL Azure with a WCF Data Services (née Astoria) head on top of it which is hosted on Windows Azure. If you hit the WCF Data Services head then this is what you see (make sure you have feed reading turned on in your browser):

    restful northwind head

    Some sample queries that you can issue:

    All the code is available on Codeplex at http://nwindazure.codeplex.com/.

    If you want to read more about RESTful Northwind then go read my blog entry RESTful Northwind on SQL Azure where I also talk a little about how I attempted to use datadude (aka Visual Studio Team System for DB Pros aka whatever-its-called-today) against SQL Azure.


    TwitterCache

    This one is a lot simpler than Tweetpoll or RESTful Northwind, it is simply a SQL database hosted up on SQL Azure. In December 2009 Brent Ozar blogged about how he had been archiving tweets from various people that he follows into a SQL Server database and he made a backup of that database available for anyone that wanted to use it. I thought it would be cool to host that database up on SQL Azure both because it would demonstrate SQL Azure’s abilities and because it would save anyone else having to go through the rigmarole of restoring it themselves so I copied the schema up there and pumped the data in using SSIS.

    Read more (including instructions on how to access it before it disappears) at TwitterCache now hosted on SQL Azure.


    These apps/services will be disappearing in about one week from now so if you’re interested in using them go and do it now. Remember, the code for Tweetpoll and RESTful Northwind is up on Codeplex at:

    Hope these have been useful to someone, they certainly have been for me!

    @Jamiet


  • “Excel found unreadable content” when exporting a Reporting Services report

    [This is just a quickie, I’m putting it up here for anyone that received the messages above and searches for it online!]

    Earlier today my team experienced some issues with a Reporting Services report that when attempting to export to Excel would cause Excel to throw the error “Excel found unreadable content” (I really should have got a screenshot but I didn’t think to do that. Sorry.)

    Of course we went down the usual routes. Searching online threw up a Microsoft KB article Error message when you try to open a workbook in Excel 2007: "Excel found unreadable content in Book_Name" which describes a situation related to Analysis Services.

    Turns out it was nothing at all to do with Analysis Services. The solution was actually a lot simpler – our Reporting Services server had run out of disk space and clearing down some space on that disk immediately solved the problem.

     

    So, if you encounter the above error – check your disk usage!

    Hope that helps!

     

    @Jamiet


  • The SQL developer gap (Warning: rant coming up)

    I recently submitted a bug to Microsoft Connect concerning the behaviour of the expression language in SSRS, if you’re interested you can read about it here: DateAdd() doesn't throw an error when it has invalid parameters.

    I got a swift and detailed reply (for which I am grateful) from a fellow on the SSRS team however the content of it wasn’t exactly music to my ears:

    While I do agree it would be desirable to have an error here, the SSRS engine makes use of the VB code engine for functions such as this. So, the call to DateAdd is just a passthrough as far as SSRS is concerned. The majority of the functions available in SSRS are actually just VB ones.

    In other words the inequities of SSRS’s expression language are down to the frailties of a 14 year old scripting language that should have been thrown out long ago on the coattails of Crystal Reports. Am I the only SQL Server BI developer that feels a little … oh I don’t know … unloved sometimes by the various BI teams in SQL Server land. Let’s have a look at some of the things that we have to put up with:

    • two different expression languages (one for SSIS, one for SSRS), neither of which support comments as far as I am aware, neither of which have any debugging features and neither of which are extensible
    • an expression language in SSRS that is based on a dead scripting language and rarely produces meaningful errors
    • an expression editor in SSIS for which there isn’t even context-sensitive help let alone intellisense
    • an MDX query editor in SSMS that doesn’t have intellisense
    • a script task editor in SSIS that shows read-only system variables in the ReadWriteVariables dropdown
    • I could go on…

    None of these issues are particularly prohibitive on their own but when you add them all up (along with many others) I’ve no doubt that we could all be a heck of a lot more productive were they all solved.

    I look around at our cousins in .Net land and I see their fancy WPF code editors, extensible languages, Javascript Intellisense, code navigation shortcuts, mockable code, code style checkers, MEF-compliant IDEs, type-inferencing, 3rd-party code editors, keyboard shortcuts, modern type-safe compilers, code generation templates, meaningful error messages, proper debuggers, runtime watches, collapsable code regions, IDEs with built-in refactoring support, built-in code formatting, code documentation tools, whitespace highlighting, code performance analysis tools, LINQ, fluent this-that-and-the-other etc…  and I can’t help but feeling a little hard done by. (Anyone that has used Resharper for .Net will know exactly what I mean.) Am I the only SQL guy that feels like a second class citizen in the Microsoft developer ecosystem?

    I must be fair and admit that we have it a lot better than other database platform developers (if you have ever used any of the plethora of Oracle developer tools out there you will know what I mean) but nonetheless it would be nice to see a little bit of love go the way of us lowly SQL bods don’t you think?

    OK, rant over. My apologies if you came here hoping for some words of wisdom but I’ve been wanting to get this off my chest for ages; normal service will be resumed forthwith!

    @Jamiet

    P.S. A valid response to this from the SQL Server team may be “submit your ideas to Connect”. I’d like it to be known that the majority of things I have listed above have already been submitted to Connect in some way shape or form and I would be happy to supply a list if one were required!

    I should also apologise to the SSRS team for making an example of them – the submission highlighted above is hardly the most annoying of problems listed here but it just happened to be the one that pushed me over the tipping point to writing this!


  • MERGE and OUTPUT – the swiss army knife of T-SQL

    The new T-SQL MERGE statement in SQL Server 2008 seems to be finding many uses over and above its de facto UPSERT usage scenario, probably the most popular of which is its ability to update one table from another in an ANSI-compliant manner as detailed expertly by Hugo Kornelius in his blog post Let's deprecate UPDATE FROM!

    Today I stumbled upon a different use for it, returning values using an OUTPUT clause from a table used as the source of data for an insertion. In other words, if I’m inserting from [tableA] into [tableB] then I may want some values from [tableA] after the fact, particularly if [tableB] has an identity. Observe my first attempt using a straight insertion where I am trying to get a field from @source.[id] that is not used in the insertion:

    DECLARE @source TABLE (
    
      
    [id] INT PRIMARY KEY
    [name] VARCHAR(10)
    );
    INSERT @source VALUES(1000,'Harold'),(2000,'Madge');
    DECLARE @destination TABLE (
      
    [id] INT PRIMARY KEY IDENTITY(1,1)
    NAME VARCHAR(10)
    );

    INSERT @destination ([name])
    OUTPUT INSERTED.[id] AS NEWID,[source].[id] AS OldId
    SELECT [name]
    FROM   @source;

    This failed with the error:

    image

    The reason this fails is that the INSERT statement has no FROM clause thus the OUTPUT clause can never know about where the data is sourced from; the FROM clause in the SELECT that is used to source the data for the insertion is not actually part of the INSERT statement. Let’s try this with a MERGE instead:

    DECLARE @source TABLE (
    
      
    [id] INT PRIMARY KEY
    [name] VARCHAR(10)
    );
    INSERT @source VALUES(1000,'Harold'),(2000,'Madge');
    DECLARE @destination TABLE (
      
    [id] INT PRIMARY KEY IDENTITY(1,1)
    NAME VARCHAR(10)
    );

    MERGE  @destination
    USING  (SELECT [id], [name] FROM @source) AS [source]
    ON     (1=0) --arbitrary join condition
    WHEN   NOT MATCHED THEN
           INSERT
    (name)
          
    VALUES  (source.Name)
          
    OUTPUT  INSERTED.id AS NEWID,[source].[id] AS OldId,INSERTED.name;

    This worked!

    image

    Notice how we are able to reference the source of our data in the OUTPUT clause of of our MERGE statement which we couldn’t do when using INSERT. Pretty cool huh!

    I remembered while writing this that Adam Machanic has already documented this capability in his blog post Dr. OUTPUT or: How I Learned to Stop Worrying and Love the MERGE and he has covered it in a lot more detail than I have here but still, I see no harm in reiterating and the code above is an easily executable demo of this cool little feature.

    @Jamiet 

    N.B. Thanks to Aaron Bertrand for his instructions here on producing well-formatted T-SQL code for blogs! I’m putting this comment here so I know where to find it next time!


  • Dimension cubes

    This is a quickfire blog post about a feature in Analysis Services that I have learnt about recently – Dimension Cubes. As of Analysis Services 2005 every single dimension in an Analysis Server database implicitly has a dimension cube created for it. Dimension cubes enable you to get around the limitation that every MDX query must have a FROM clause that references a cube and thus enable you to query over a single dimension without worrying what cube it is in. The name of a dimension cube is the same as that of the dimension itself but with a dollar sign pre-pended to it.

    Did that make any sense? Perhaps rather than try and explain what a dimension cube is I could probably just show you instead, I’ll use the [Adventure Works DW Standard Edition] sample database in SQL Server Analysis Services 2008 as an example. The following query returns all members of the [Product] level in the [Product Categories] hierarchy in the [Product] dimension:

    select  [Product].[Product Categories].[Product].members on columns
    from    [$Product]

    And here’s the result:

    image

    Notice how we are not querying over a regular cube but we are still able to interrogate the members of a dimension; I find this to be a really useful feature, especially when building reports. Still, I  would rather the members were arranged in a list vertically rather than horizontally but that doesn’t seem possible because a dimension cube by definition only has one dimension and the first axis referenced in an MDX query MUST be ‘columns’. Well, whoever came up with the notion of dimension cubes obviously thought of that because every dimension cube also contains a single measure that has the same name as the dimension itself. Hence we can write this:

    select   [Product].[Product Categories].[Product].members on rows
    ,        [Measures].[Product] on columns
    from    [$Product]

    and see our dimension members in a much more readable format:

    image

    It transpires that we can also use an empty set on the ‘columns’ axis in order to achieve the same thing:

    select   [Product].[Product Categories].[Product].members on rows
    ,        {} on columns
    from    [$Product]

    I’m sure many people reading this already knew about dimension cubes but until recently I did not and hence I assume others who might find them useful might not know about them either – hence this blog post. Did I miss anything?

    @Jamiet


  • Standard Point-in-time and time-interval representations

    One requirement in any database implementation that I have ever worked on is that the notion of a point-in-time has to be represented in some way. The tool that I use on a regular basis, SQL Server, provides numerous datatypes that aid in the representation of a point-in-time and I’m sure that most people reading this will be au fait with them. They are:

    Read more at Date and Time Data Types and Functions (Transact-SQL).

    date and time data types transact-sql

    I suspect (though stand to be corrected) that the accuracy with which people wish to record a point-in-time is usually to the nearest day or the nearest second however until very recently neither of these accuracies was specifically catered for with a dedicated datatype. Up until the introduction of the date datatype in SQL Server 2008 datetime was the datatype that most would use to record a point-in-time however the accuracy of datetime is neither day nor second, it is 0.00333 seconds; smalldatetime could be used but that also does not have accuracy to the day or second, it is to the minute.

    The point I’m trying to make is not that people are using the wrong datatypes to store their data, nor that everyone should just use the most accurate point-in-time datatype (that being datetime2 which is accurate to 100 nanoseconds). No, my point is that there are many different situations which have a need to store a point-in-time and SQL Server does not provide datatypes to cater for all of them. That is what prompted me to raise a Connect submission entitled [T-SQL] Month & Quarter datatype where I asked for datatypes that represent a Month or a Quarter. Simon Sabin commented that he also wanted datatypes to represent a time interval (and Aaron Bertrand pointed out that a Connect submission already exists for that) whilst Rob Farley ran with my idea and suggested that we should have week, hour, semester datatypes as well. There are some good justifications on the submission for why such datatypes are required so if I have failed to convince you here feel free to click through and take a read.

     

    The point of this blog post though is not to drum up support for my Connect submission (though that would be nice), instead it is to make reference to the reply I received from a gentleman on the SQL Server team going by the name ‘Jim’. Jim pointed me to an article on Wikipedia which talks about the ISO 8601 standard that provides an agreed standard for the representation and exchange of date and time related data.

    I previously assumed that the only standard date/time formats were something like YYYYMMDD &  YYYYMMDDTHH24:MI:SS but not so, ISO 8601 provides representations for centuries, years, months, weeks, ordinal dates, hours, minutes, seconds, fractions of time units, timezones and time intervals (but not quarter, unfortunately). I won’t go into much detail about the standard as you can go and read about it for yourself (and I recommend the wikipedia article as essential reading for any data professional, particularly the General Principles) but for the benefit of both myself and others I have provided below a summary of all the point-in-time representations that the wikipedia article mentions; and fascinating it is too (to me anyway). Take the time to have a read, you may be enlightened:

    Valid unambiguous representation Description
    21 21st Century
    1981 Year 1981
    2004-05 (but not 200405) 5th month of the year 2004
    20090106
    2009-01-06
    6th day of the 1st month of the year 2009
    2006W34
    2006-W34
    34th week of the year 2006
    2006W343
    2006-W34-3
    3rd day of the 34th week of the year 2006
    1981095
    1981-095
    95th day of the year 1981 (better known as 1981-04-05)
    21 21st hour of a day
    2135
    21:35
    35th minute of the 21st hour of a day
    213507
    21:35:07
    7th second of the 35th minute of the 21st hour of a day
    0000
    00:00
    2400
    24:00
    Midnight
    1430.5
    14:30.5
    14 hours, 30 minutes and one half minutes
    143050.25
    14:30:50.25
    14 hours, 30 minutes, 50 seconds and one quarter second
    0930Z
    09:30Z
    9 hours and 30 minutes coordinated universal time (aka UTC time aka Zulu time)
    +0100
    +01:00
    1 hour ahead of UTC
    1830Z
    22:30+04
    1130-0700
    15:00-03:30
    18 hours and 30 minutes UTC

    (the point being that these all represent the same point-in-time)
    2007-04-05T14:30 30th minute of the 14th hour of the 5th day of the 4th month of the year 2007
    (“T” is the standard separator between date and time)
    2007-04-05T24:00
    2007-04-06T00:00
    Midnight inbetween the 5th & 6th days of the 4th month of the year 2007

    Some things to note:

    • Some representations are overloaded (e.g. 21 can mean the 21st century or 21st hour of the day) so context can be a factor.
    • The week notation is still prone to ambiguities; when does week 1 start? More pertinently, when does week 2 start? Apparently the ISO defines a standard for defining this but it isn’t adhered to as stringently as ISO 8601.
    • The week notation can also be quite confusing, if the ISO directives are followed to the letter then “2009-W53-7” represents the 3rd day of the first month of 2010 (read the wikipedia article to understand why).

    The article also provides some standard representations of durations:

    Valid unambiguous representation Description
    P3Y A period of 3 years
    P6M A period of 6 months
    P4D A period of 4 days
    P12H A period of 12 hours
    PT6M A period of 6 minutes (note the T designation)
    P5S A period of 5 seconds
    P3Y6M4DT12H30M5S A period of three years, six months, four days, twelve hours, thirty minutes, and five seconds
    P23DT23H A period of 23 days and 23 hours

    and time intervals:

    Valid unambiguous representation Description
    2007-03-01T00:00:00Z/2008-05-11T15:30:00Z The interval between:
    • midnight (UTC) at the start of the first day of the 3rd month of the year 2007
    • the 30th minute of the 15th hour (UTC) of the 11th day of the 5th month of the year 2008
    2007-03-01T13:00:00Z/P1Y2M10DT2H30M The interval between:
    • the 13th hour (UTC) of the first day of the 3rd month of the year 2007
    • the point-in-time exactly 1 year, 2 months, 10 days, 2 hours and 30 minutes later

    One last interesting point of trivia, the second edition of ISO 8601 (published in 2000) allowed for 2-digit dates whereas the third (and at the time of writing, current) version (published in 2004) does not. Hmmm… did someone say Y2k?


    OK, time to wrap up; why am I writing this? Well, unambiguous representations of data are of paramount importance especially in the distributed heterogeneous XML/JSONified world of data exchange that we live in today so a good understanding of the standards that define those representations should be a pre-requisite to working in that arena. I hope this blog post has highlighted the fact that these standards exist and that you feel compelled to adhere to them in your future work. If you want to understand how damaging it can be to veer from the unambiguous formats then take a read of my recent blog post Unambiguous date formats.

    Oh, and if you want to vote for a month datatype in SQL Server, be my guest :)

    @Jamiet


  • Backing up Azure Table Storage

    I read with interest today a blog post from Yaron Goland entitled Do I need to backup/journal my Windows Azure Table Store?
    in which he spoke about the need to do exactly that i.e. Backup any data that you put into Windows Azure Table storage.

    Yaron's assertion was not that we need to protect ourselves from Azure failures, instead he asserted that we have a need to protect ourselves from ourselves. He outlines three classes of mistakes that developers of apps that utilise Azure Table storage may  (nay, will) make:

    Yaron makes an important point; data backups don't just protect against hardware failure, they protect against application failure too. This has been true for years and there is nothing special about this new class of distributed, geo-redundant, super-resilient data storage services that make our tried-and-tested procedures any the less important hence the message is simple:

    make sure your data is recoverable no matter what storage mechanism you are using

     

    Having said all that cloud storage services such as Windows Azure Table storage do present somewhat unique opportunities for protecting us against failures. Yaron proposed what he termed a "journal" system which captures all PUT, POST & DELETE operations against a service:

    I would like to have a journal that records every command issued against the system ... then when I find out about a data logic corruption bug caused by my front end I could at least try to figure out which of my users was likely to be affected by reviewing the journal

    The very nature of Windows Azure Table storage (i.e. CRUD operations against a simple partition key/row key scheme) means that such a journaling system could (and argubly should) be provided by Windows Azure itself rather than being something that the application developer has to build themselves. The service could also provide the ability to "replay" all operations against a given partion key/row key since a specified point in time.

    I have no idea whether the Windows Azure team or indeed any other cloud storage provider has contemplated offering such a service but I for one believe that it would be very valuable. What say you?

    @Jamiet


  • OData gunning for ubiquity across Microsoft products

    There was lots of news that came out of Microsoft’s recent Professional Developer Conference (PDC) event however one item that may have slipped under the radar of many is that the data format specification for WCF Data Services (previously known as ADO.Net Data Services aka Astoria) has been proposed as an open standard for data exchange across the web. Its name? The Open Data Protocol, or OData for short. From the OData website:

    The Open Data Protocol (OData) is a web protocol for querying and updating data. OData applies web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores.

    You can read more at Open Data Protocol on MSDN.

    [It should be noted that WCF Data Services and OData are not the same thing. My interpretation is that WCF Data Services is a technology that allows data to be surfaced as an OData-compliant service.]


    Microsoft’s desire to get OData accepted as a universal data access protocol should not come as a surprise, I view OData as the next evolution in a journey that can count ODBC and OLE DB as significant milestones:

    1. According to wikipedia Microsoft created ODBC back in 1992 as a standard for accessing relational database management systems.
    2. OLE DB was the next evolution which provided a standard for accessing both relational and non-relational data sources in a uniform manner.
    3. Now we have OData which is an attempt to provide a uniform representation for any data that can be surfaced over that which is becoming the lowest common denominator for data exchange - HTTP.

    I’ve been following the evolution of Astoria/ADO.Net Data Services/WCF Data Services for quite some time now, even back as far as when an influential team within Microsoft were probably going to reject AtomPub (according to Dare Obasanjo) and instead use their own proprietary alternative (according to Yaron Goland). Back in March 2008 I noted with interest an announcement by George Moore on the Live Services development blog that ADO.Net Data Services (as it was then known) would form an integral part of Microsoft’s stack of data access protocols and technologies across all data storage technologies:

    The complete storage and developer tools stack revealed at MIX08 looks like this (described top to bottom). You are free to utilize this stack at any level of abstraction – there are no requirements to use all layers, and you are free to substitute your own developer tools against any layer

    Area Product, Library or Protocol
    4: Synchronization infrastructure: "Astoria Offline"
    Microsoft Sync Framework
    Feedsync AtomPub extensions
    3: Developer tools: ADO.NET Data Services
    .NET WCF Syndication libraries
    AtomPub URI namespace conventions
    2: Protocols: AtomPub
    Atom
    1: Underlying Products and Services: On premises: SQL Server
    Structured Cloud Storage: SQL Server Data Services
    Live services: Spaces Photos and Application Data Storage

    A Unified Standards-Based Protocols and Tooling Platform for Storage from Microsoft

    21 months later and we are really starting to see the fruits of those labours come to bear. Sharepoint 2010 Lists, SQL Server Reporting Services 2008 R2, Excel 2010 (through Powerpivot), Project “Dallas” & Windows Azure are all either an OData producer or consumer and we are promised that there are more to come. Microsoft have received deserved criticism down the years for different teams building competing technologies yet here we witness a quiet unification around a common data access stack and that can only be a good thing. Do I sense the impact of Ray Ozzie here?


    This has been a short commentary on the evolution of the OData protocol, any comments would be much appreciated.

    @Jamiet

    Further reading (some of which are linked to above):


  • Fun and games with Reporting Services expressions

    I have today been messing about with Reporting Services' expression language and as such have learnt a few things that I figured might be worth sharing.

    The report that I have been attempting to build has two parameters that define the effective reporting period for the report, effectively they are arguments in a WHERE clause:

    20091210params

    The business rule that I had to implement for these parameters was:

    The effective reporting period is the most recently completed month

    In other words, I had to determine the first day and last day of the previous month which for me (as I was trying to do this on 8th December 2009) was 1st November 2009 and 30th November 2009.

    How would you have gone about this? Here was my first attempt at determining "From Effective Date" (i.e. 1st November 2009) based on today's date:

    =CDate(
        CStr(
            Year(DateAdd("mm", -1, Now())) * 10000 +
            Month(DateAdd("mm", -1, Now())) * 100 +
            1
        )
    )

    You can probably decipher my logic here. Subtract one month from today's date and use the first day of the month of the result. Sound logic but unfortunately it failed with a totally unhelpful error message :

    "An error occurred during local report processing. <parameter name>"

    After investigation it seems as though the CDate() function does not like strings in the format YYYYMMDD so here was my next attempt:

    =CDate(
        CStr(Year(DateAdd(DateInterval.Month,-1,Now()))) + "-" +
        RIGHT("0" + CStr(Month(DateAdd(DateInterval.Month,-1,Now()))), 2) +
        "-01"
    )

    This one worked fine and I was going to stick with it until Summitcloud suggested using the DateSerial(...) function instead like so:

    =DateSerial(Year(DateAdd(DateInterval.Month,-1, Now())), Month(DateAdd(DateInterval.Month,-1, Now())), 1)

    That's a much nicer method in my opinion, no hacky string manipulation going on here, only three numeric arguments (which I have highlighted in different colours) to define year/month/day thus I changed to use DateSerial(…). Happy with that I turned to working out my "To Effective Date" using the same DateSerial() function. This time my logic was to get the first day of the current month and then subtract one day from it:

    =DateAdd(DateInterval.Day, -1, DateSerial(Year(Now()), Month(Now()), 1))

    Worked a treat!

    So I guess the lesson here is to make sure you know all the tools in your toolbox and what they all do. Don't use pliers (string manipulation) to tighten a nut when you already have a spanner(DateSerial)! (Does that analogy work? Maybe not!! :)

    @Jamiet


  • Unambiguous date formats : T-SQL Tuesday #001

    One of the most commonly used data types in SQL Server is [datetime] which unfortunately has some vagaries around how values get casted. A typical method for defining a [datetime] literal is to write it as a character string and then cast it appropriately. The cast syntax looks something like this:

    DECLARE @dt NVARCHAR(19) = '2009-12-08 18:00:00';
    
    SELECT CAST(@dt AS datetime);

    Unfortunately in SQL Server 2005 the result of the cast operation may be dependent on your current language setting. You can discover your current language setting by executing:

    SELECT @@LANGUAGE

    To demonstrate how your language setting can influence the results of a cast take a look at the following code:

    ALTER  DATABASE tempdb
    
    SET        COMPATIBILITY_LEVEL = 90 ; --Behave like SQL Server 2005
    USE tempdb
    GO
    DECLARE @t TABLE (
         
    dateString  NVARCHAR(19)
    );
    INSERT @t (dateString)
    VALUES ('2009-12-08 18:00:00') --'yyyy-MM-dd hh24:mi:ss'
    ,      ('2009-12-08T18:00:00') --'yyyy-MM-ddThh24:mi:ss'
    ,      ('20091208 18:00:00')   --'yyyyMMdd hh24:mi:ss'

    SET        LANGUAGE french;
    SELECT 'french' AS lang
    ,      DATENAME(MONTH,q.[dt]) AS mnth
    ,      q.[dt]
    FROM   (
          
    SELECT  CAST(dateString AS DATETIME) AS dt
          
    FROM    @t
          
    )q;
    SET        LANGUAGE us_english;
    SELECT 'us_english' AS lang
    ,      DATENAME(MONTH,q.[dt]) AS mnth
    ,      q.[dt]
    FROM   (
          
    SELECT  CAST(dateString AS DATETIME) AS dt
          
    FROM    @t
          
    )q;

    We are taking the value which can be described in words as “6pm on 8th December 2009”, defining it in three different ways, then seeing how the @@LANGUAGE setting can affect the results. Here are those results:

    french language datetime

    Notice how the interpretation of the month can change depending on @@LANGUAGE. If @@LANGUAGE=’french’ then the string '2009-12-08 18:00:00' is interpreted as 12th August 2009 (‘août’ is French for August for those that don’t know) whereas if @@LANGUAGE=’us_english’ it is interpreted as 8th December 2009.

    Clearly this is a problem because the results of our queries have a dependency on a server-level or connection-level setting and that is NOT a good thing. Hence I recommend that you only define [datetime] literals in one of the two unambiguous date formats:

    • yyyy-MM-ddTHH24:mi:ss
    • yyyyMMdd HH24:mi:ss

    That was going to be the end of this blog post but then I found out that this behaviour changed slightly in SQL Server 2008. Take the following code (see if you can figure out what the results will be before I tell you):

    ALTER  DATABASE tempdb
    
    SET        COMPATIBILITY_LEVEL = 100 ; --Behave like SQL Server 2008
    GO
    USE tempdb
    GO
    SET        LANGUAGE french;
    DECLARE    @dt NCHAR(10) = '2009-12-08 18:00:00'; --Ambiguous date format
    SELECT CAST(@dt AS datetime) AS [ExplicitCast]
    ,      DATENAME(MONTH,@dt) AS [MonthFromImplicitCast]
    ,      DATENAME(MONTH,CAST(@dt AS datetime)) AS [MonthFromExplicitCast];

    Here we are doing three different things with our nchar literal:

    • explicitly cast it as a [datetime]
    • extract the month name from the char literal using the DATENAME function (which results in an under-the-covers implicit cast)
    • extract the month name from the char literal using the DATENAME function after it has been explicitly casted as a [datetime]

    Note that the compatibility level is set to SQL Server 2008 and @@LANGUAGE=’french’. Here are the results:

    image

    (Were you correct?)

    Let’s take a look at what is happening here. The behaviour when we are explicitly casting as [datetime] hasn’t changed, our nchar literal is still getting interpreted as 12th August rather than 8th December when @@LANGUAGE=’french’. The [MonthFromExplicitCast] field is interesting though, it seems as though the implicit cast has resulted in the desired value of 8th December. Why is that?

    To get the answer we can turn to BOL’s description of the DATENAME function syntax:

    image

    The implicit cast is not casting to [datetime] at all, it is actually casting to [date] which is a new datatype in SQL Server 2008. The new date-related datatypes in SQL Server 2008 (i.e. [date], [datetime2], [time], [datetimeoffset]) disregard @@LANGUAGE and hence we get behaviour that is more predictable and, frankly, better.

    These new behaviours for SQL Server 2008 were unknown to me when I began this blog post so I have learnt something in the course of authoring it, I hope it has helped you too. No doubt someone somewhere is going to get nastily burnt by this at some point, make sure that it isn’t you by always using unambiguous date formats:

    • yyyy-MM-ddTHH24:mi:ss
    • yyyyMMdd HH24:mi:ss

    regardless of which version you are on!

    Don’t forget to check out other T-SQL Tuesday blog posts, see Adam Machanic’s blog post Invitation to Participate in T-SQL Tuesday #001: Date/Time Tricks for details!

    @Jamiet


  • Project Houston

    Of late I’ve started to wonder about the direction that Microsoft may take their various development tools in the future. There has been an obvious move toward embracing the open source development community, witness the presence of open source advocate Matt Mullenweg on stage at the recent PDC09 keynote as proof of that. Also observe the obvious move to embracing the cloud as evidenced through the introduction of Azure and SQL Azure.

    With all this going on though one thing struck me, Microsoft still have a need to keep the lights on and to that end one big way that they make money is by selling licenses for their development tools such as Visual Studio and SQL Server Management Studio and moreover one requires a license for Windows in order to run those tools. So no matter how much they say SQL Azure is open to none-Microsoft development shops (which indeed it is) you’re still pretty much reliant on some Microsoft software running on your laptop in order to make best use of it.

    I started to wonder when this situation would change; more specifically I started to wonder when the development tools that we use would also become cloud-based. After all, if we’re using cloud-based services does it not make sense to have cloud-based tools that work with them? I think it does. I noted with interest then that the next version of Visual Studio (aka Visual Studio 2010) is built using Windows Presentation Foundation (WPF). WPF is very closely related to Silverlight and hence I wonder if and when Microsoft will supply a version of their development tools that run purely in Silverlight thus opening up their development experience to a lot more of the open-source community that invariably choose not to develop on Windows.

     

    I’ve had these thoughts rolling around my head for a while but I haven’t wasted too many brain cells on discussing them or writing them down until today when I caught a glimpse of something that sparked my interest and then prompted this blog post. I was watching a session video from PDC entitled The Future of database development with SQL Azure by SQL Azure Program Manager David Robinson when he made mention of a pre-alpha project that they are working on called Project Houston. In a nutshell Project Houston is an attempt to build a Silverlight based development tool for SQL Azure. Here is a screenshot (apologies for the quality, this is a screenshot taken from a video):

    image

    Compared to SQL Server Management Studio (SSMS) there isn’t too much going on here although we can see the beginnings of an Object Explorer plus later on in the video David demonstrates a T-SQL query window that behaves pretty much like the query window in SSMS today. Davis did stress that the tool isn’t even at the alpha stage as yet let alone beta but nevertheless we should expect to see a v1 release sometime in 2010. I also found it interesting that they seem to have adopted the “backstage” UI from Microsoft Office 2010 (see the red “File” box in the upper left hand corner as evidence of that).

    There’s not much else to say about this right now. I’m writing this blog post because Project Houston was something that caught my attention and I thought others might also find it interesting. It is also the first incarnation of a Microsoft development tool that does not have a reliance on Windows and that in itself could be the start of a seismic shift in the way we develop for the Microsoft ecosystem.

    If you want to learn more about Project Houston then fast forward to 34m30s in David’s video, it lasts for about 4m30s.

    I’d love to read other people’s first impressions about this. Please feel free to write your thoughts in the comments section below.

    @Jamiet


  • TwitterCache now hosted on SQL Azure

    Earlier today Brent Ozar blogged about how he had been archiving tweets from various people that he follows into a SQL Server database.

    image

    He made a backup of that database available for download on his blog so that others could download it and have a play of their own. I thought that rather than have all those people that wanted to party on the data download and restore the database for themselves it might be fun (and prudent) to stick it somewhere where anyone could get it so (with Brent’s permission) I’ve hosted the database up on SQL Azure.

    Here are the credentials that you’ll need if you want to connect:

    server name lx49ykb7y5.database.windows.net
    username ro
    password r3@d0nly
    database name brentotweets

    You can connect using SQL Server Management Studio although if you’re not using the November 2009 community technology preview (CTP) of SQL Server 2008 R2 there are some hoops you need to jump through in order to do that. First thing to know is that you can’t connect the Object Explorer, only a query window. So, after hitting the “New Query” button here’s what you need to do:

    Firstly, enter the details as shown in this screenshot:

    image

    and then hit “Options”. On the options screen you need to enter the name of the database that you need to connect to which, as I mentioned above, is [brentotweets]:

    image

    Then hit “Connect”. You’ll probably get the following error about not being able to get to sys.configurations. Don’t worry about it, you can safely click OK and ignore it:

    image

    Once you’re in you can hit the database just like you would any other SQL Server database:

    image

    SQL Azure requires that you specify a range of IPs that can connect to the server and I’ve made it as wide as possible so I’m assuming it should be OK:

    image

    Have fun! Note that this won’t be available forever because at some point I’m going to have to start paying for it, you’ll be good for the next few weeks though.

    @Jamiet

     

     

    P.S. For those that are interested I pumped all the data up there using SSIS.

    image

    It took 41m 41s over my home broadband line from a Virtual PC guest. Bear in mind that I’m in London and the server is I believe somewhere in Texas, US. According to speedtest.net I’m getting 4.67Mbps up and 0.57Mbps down:

    image

     

     

     

     

     

    (Note to self: Get new ISP. Or move house!]


  • Why you shouldn't open .csv files in Excel

    1. Open up Notepad
    2. Type the following:
    3. a,"This is some text"
      b,This is some more text

    4. Save the file as "test.csv" 
    5. Open test.csv in Excel
    6. Without changing anything hit CTRL-S
    7. When prompted to keep the workbook in csv format say 'yes'
    8. Close Excel
    9. Open test.csv in Notepad
    Are the contents of the file the same as what you originally wrote? No? That's why you shouldn't open .csv files in Excel!

More Posts Next page »
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement