<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tag 'London Datastore'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=London+Datastore&amp;orTags=0</link><description>Search results matching tag 'London Datastore'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Exploring earnings data for the UK [Open Data]</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2013/02/11/exploring-earnings-data-for-the-uk-open-data.aspx</link><pubDate>Mon, 11 Feb 2013 13:34:02 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47637</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;I have a burgeoning interest in the world of Open Data which wikipedia describes as:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;&lt;b&gt;Open data&lt;/b&gt; is the idea that certain &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Data"&gt;&lt;em&gt;data&lt;/em&gt;&lt;/a&gt;&lt;em&gt; should be freely available to everyone to use and republish as they wish, without restrictions from &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Copyright"&gt;&lt;em&gt;copyright&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Patent"&gt;&lt;em&gt;patents&lt;/em&gt;&lt;/a&gt;&lt;em&gt; or other mechanisms of control. The goals of the open data movement are similar to those of other &amp;quot;Open&amp;quot; movements such as &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_source"&gt;&lt;em&gt;open source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_hardware"&gt;&lt;em&gt;open hardware&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_content"&gt;&lt;em&gt;open content&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, and &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_access_(publishing)"&gt;&lt;em&gt;open access&lt;/em&gt;&lt;/a&gt;&lt;em&gt;. The philosophy behind open data has been long established&lt;/em&gt;&lt;em&gt;, but the term &amp;quot;open data&amp;quot; itself is recent, gaining popularity with the rise of the &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Internet"&gt;&lt;em&gt;Internet&lt;/em&gt;&lt;/a&gt;&lt;em&gt; and &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/World_Wide_Web"&gt;&lt;em&gt;World Wide Web&lt;/em&gt;&lt;/a&gt;&lt;em&gt; and, especially, with the launch of open-data government initiatives such as &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Data.gov"&gt;&lt;em&gt;Data.gov&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.        &lt;br /&gt;&lt;/em&gt;&lt;a title="http://en.wikipedia.org/wiki/Open_data" href="http://en.wikipedia.org/wiki/Open_data"&gt;&lt;em&gt;http://en.wikipedia.org/wiki/Open_data&lt;/em&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;To that end I follow &lt;a href="https://twitter.com/londondatastore" target="_blank"&gt;@LondonDatastore&lt;/a&gt; on Twitter as they are actively publishing Open Data pertaining to the city in which I live, London. Four days ago they &lt;a href="https://twitter.com/londondatastore/status/299543409553711104" target="_blank"&gt;announced&lt;/a&gt; that a new dataset had been released that provided earnings information, not just for London but for the whole country and going back many years too. The provided link, &lt;a title="http://data.london.gov.uk/datastore/package/earnings-workplace-borough" href="http://data.london.gov.uk/datastore/package/earnings-workplace-borough"&gt;http://data.london.gov.uk/datastore/package/earnings-workplace-borough&lt;/a&gt;, brings up a page from where one can download an Excel workbook containing some data. Unfortunately the data in that workbook is not, in my opinion, provided in such a manner that makes it easily explorable (&lt;a href="http://sqlblog.com/blogs/jamie_thomson/archive/2012/08/13/using-pivot-tables-in-the-office-excel-web-app.aspx" target="_blank"&gt;which I have complained about before on this blog&lt;/a&gt;); the data is spread over multiple worksheets:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_3BDA8261.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_5B1D2934.png" width="383" height="37" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Worse, the data is already aggregated and pivoted:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_330A7A15.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_66663D71.png" width="577" height="138" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;In other words the workbook does not contain the raw row-level data from which this pivoted data is produced. Thankfully a link (&lt;a title="https://www.nomisweb.co.uk/query/construct/summary.asp?mode=construct&amp;amp;version=0&amp;amp;dataset=99" href="https://www.nomisweb.co.uk/query/construct/summary.asp?mode=construct&amp;amp;version=0&amp;amp;dataset=99"&gt;https://www.nomisweb.co.uk/query/construct/summary.asp?mode=construct&amp;amp;version=0&amp;amp;dataset=99&lt;/a&gt;) is provided from where the raw data can be downloaded. I found the interface there to be slightly clunky but that’s a minor quibble – that raw data is available should be considered a major boon. I downloaded earnings data for:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Local Authority &lt;/li&gt;    &lt;li&gt;Gender &lt;/li&gt;    &lt;li&gt;Full Time or Part Time &lt;/li&gt;    &lt;li&gt;Year (1999-2012) &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;and have made that raw data available in a publicly accessible Excel workbook. You can view that workbook online (only a browser required) at &lt;a href="http://sdrv.ms/VPvjcD" target="_blank"&gt;AnnualPay_by_YearGenderLocalAuthorityPartorfulltime.xlsx&lt;/a&gt; (from there you can also download for your own analysis).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_6521A492.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_6E9A6308.png" width="634" height="372" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The workbook also provides summaries over the raw data by way of pivot tables and charts. Arguably its clear to see, for example, that a gender imbalance exists although perhaps the gap may be lessening somewhat&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_088E232B.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_112E7BB7.png" width="904" height="470" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Its interesting to note that the average salary for Males dropped off in 2009/2010. Perhaps the economic events of 2008 are the cause for that, checking out data for only City of London (the traditional financial hub of the UK) suggests that may well be the case:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_5666D2E0.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_662667E4.png" width="901" height="441" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;There will be many more nuggets of information available in this data, all we need to do is set the data free so that people can find it for themselves. That is the aim of this blog post so hit the link: &lt;a title="http://sdrv.ms/VPvjcD" href="http://sdrv.ms/VPvjcD" target="_blank"&gt;AnnualPay_by_YearGenderLocalAuthorityPartorfulltime.xlsx&lt;/a&gt; and see what other nuggets you can find! If &lt;a href="https://twitter.com/search?q=dataviz" target="_blank"&gt;dataviz&lt;/a&gt; is an area that interests you then this is a cracking dataset to explore!&lt;/p&gt;  &lt;hr /&gt;  &lt;p&gt;One other point I want to make is that the raw data is provided as a mean average which means that each aggregated figure is a mean average of some mean averages. This isn’t good as it distorts the data as I demonstrate with this simple example:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_0C1E8DDA.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_0B4627F0.png" width="384" height="82" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;We have an average salary for both male and female (5000 &amp;amp; 9000 respectively) and the average of those two averages is 7000. However if we take the total “total salary” / total “tally of people asked” (127000 / 23) then the average is quite different – 5521.73.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;That’s a large discrepancy even for only two rows of data and it highlights the problem of providing averages rather than the figures that created those averages. To me this is a data quality issue – the raw source data does not provide the requisite level of detail to enable accurate analysis. Quality of data is paramount.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/jamiet" target="_blank"&gt;@Jamiet&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Using Pivot tables in the Office Excel Web App</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2012/08/13/using-pivot-tables-in-the-office-excel-web-app.aspx</link><pubDate>Mon, 13 Aug 2012 13:13:12 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:44680</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;You may have noticed from my &lt;a href="http://sqlblog.com/blogs/jamie_thomson/archive/tags/excel/default.aspx" target="_blank"&gt;recent posts regarding Excel&lt;/a&gt; that I have a keen interest in the upcoming release of Office and in particular I’m interested in the capabilities of the Excel web app. One new feature that piqued my interest was that pivot tables and slicers are now displayed in the Excel Web App on SkyDrive and, even better, they are interactive (i.e. even if you don’t have permission to edit the Excel workbook you can still slice’n’dice the pivot table). I wanted to throw together a workbook that enabled me to check out these new features so I headed to my favourite free data resource, the &lt;a href="http://data.london.gov.uk/" target="_blank"&gt;London Datastore&lt;/a&gt;, and browsed through &lt;a href="http://data.london.gov.uk/census/" target="_blank"&gt;the data that they hold on the 2011 UK census&lt;/a&gt;, much of which is made available in Excel workbooks.&lt;/p&gt;  &lt;p&gt;Its great that London Datastore makes this data available however I dislike the fact that, in many cases, that data is not presented in its rawest form. For example here is a screenshot of a worksheet that shows &lt;strong&gt;Number of non-UK short-term residents by sex, local authorities in England and Wales&lt;/strong&gt; where the data has already been pivoted on Gender and displayed hierarchically by region yet it is not actually presented in an Excel pivot table:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_7DB205DF.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_77F6EC39.png" width="538" height="329" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Moreover, the raw unpivoted data is not available. We have effectively been barred from viewing the data in its rawest form. This effectively means I cannot easily interact with the data, we can’t even use that most basic of Excel features – filtering!&lt;/p&gt;  &lt;p&gt;To make matters worse, some of the sub-totals are wrong. The sum for &lt;strong&gt;Females&lt;/strong&gt; in &lt;strong&gt;Tyne and Wear (Met County)&lt;/strong&gt; (i.e. Gatehead, Newcastle upon Type, North Tyneside, South Tyneside &amp;amp; Sunderland) is 2000 yet the subtotal given on the spreadsheet is 2100:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_76462065.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_0E698AC1.png" width="460" height="124" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;So, not only can the data not be analysed it cannot be trusted either!&lt;/p&gt;  &lt;p&gt;I figured this could be a lot better so I pasted the leaf-level data into a new worksheet and manually manipulated it so that so that I had the raw data in a table, including Gender as a column:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_0D24F1E2.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_6B591950.png" width="713" height="152" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;From here its much easier to build a pivot table and, even better, that pivot table can be interacted with on the web using the Excel Web App. Why is that better? Simple, its because I can give you this URL: &lt;a title="http://sdrv.ms/Nb43St" href="http://sdrv.ms/Nb43St" target="_blank"&gt;http://sdrv.ms/Nb43St&lt;/a&gt; and you can go and interact with it yourself; you don’t even need to use Excel, all you need is a web browser.&lt;/p&gt;  &lt;p&gt;If you visit that URL you’ll see four worksheets:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_09C35A3A.png"&gt;&lt;img style="background-image:none;border-right-width:0px;margin:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_2906010D.png" width="244" height="45" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;“Information” &amp;amp; “Table M02” are the worksheets contained in the original workbook that I downloaded from London Datastore. “Raw data” is the table that I created manually and “Pivot” is the pivot table with slicers. Here’s a screenshot of that pivot table on SkyDrive:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sdrv.ms/Nb43St" target="_blank"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_6219AF09.png" width="546" height="575" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;From there you can filter, slice, sort, drill-down and do most of the other stuff that you’re used to doing using pivot tables – if you want more functionality (such as drillthrough, drag-and-drop dimensions between axes) simply download your own copy of the workbook and play around at will in Excel. Notice also that the sub-total for &lt;strong&gt;Females&lt;/strong&gt; in &lt;strong&gt;Tyne and Wear (Met County)&lt;/strong&gt; is, of course, correctly given as 2000, not 2100.&lt;/p&gt;  &lt;p&gt;Pretty cool, no? Finally one can interact with Excel pivot tables on the web and, best of all, those workbooks can be shared with the entire world – I have been waiting years for this. Sure, web based spreadsheets have existed for a long time but this is the first time (that I know of) that Excel pivot tables work seamlessly in a web browser and given the amount of Excel workbooks out there, that is a big deal. Click through, play around with the pivot table and slicers to your heart’s content and let me know your thoughts in the comments below.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/jamiet"&gt;@Jamiet&lt;/a&gt;&lt;/p&gt;</description></item></channel></rss>