<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tag 'Open Data'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=Open+Data&amp;orTags=0</link><description>Search results matching tag 'Open Data'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Weekend reading – Data Explorer, Quandl, Flatmerge and a SQL Saturday app</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2013/03/09/weekend-reading-data-explorer-quandl-flatmerge-and-a-sql-saturday-app.aspx</link><pubDate>Sat, 09 Mar 2013 18:30:04 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48152</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;Here are some things that have piqued my interest on the interwebs over the last few days.&lt;/p&gt;  &lt;h2&gt;Parameterized queries and Security in Data Explorer&lt;/h2&gt;  &lt;p&gt;&lt;a href="https://twitter.com/technitrain" target="_blank"&gt;Chris Webb&lt;/a&gt; put a &lt;a href="http://social.msdn.microsoft.com/Forums/en-US/dataexplorer/thread/69153a6d-2205-4456-bcac-3a4689c787cf" target="_blank"&gt;post&lt;/a&gt; up on the &lt;a href="http://social.msdn.microsoft.com/Forums/en-US/dataexplorer/threads" target="_blank"&gt;Data Explorer forum&lt;/a&gt; asking about parameterizing queries in Data Explorer and Miguel Llopis from the Data Explorer product team replied with some useful information:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;…there is some risk for users to leak information to external sources when doing this, and so we try to prevent this from being done &amp;quot;by default&amp;quot;. You can disable this level of protection by clicking the &amp;quot;Fast Combine&amp;quot; button in the Data Explorer ribbon tab. More information about Fast Combine and Privacy Levels can be found in our Help contents: &lt;/em&gt;&lt;a href="http://office.microsoft.com/en-us/excel-help/privacy-levels-HA104009800.aspx"&gt;&lt;em&gt;http://office.microsoft.com/en-us/excel-help/privacy-levels-HA104009800.aspx&lt;/em&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Following Miguel’s link shows this information:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://office.microsoft.com/en-us/excel-help/privacy-levels-HA104009800.aspx" target="_blank"&gt;&lt;img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_147A0709.png" width="565" height="345" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;So, default behaviour in Data Explorer is that the user is protected from inadvertently leaking information to 3rd parties.&amp;#160; Its good to know that security has been prevalent thinking within the Data Explorer team however users do need to be aware that this behaviour exists, hence my mentioning it here.&lt;/p&gt;  &lt;h2&gt;Quandl – a search engine for datasets&lt;/h2&gt;  &lt;p&gt;I have stumbled across a site called &lt;a href="http://www.quandl.com/" target="_blank"&gt;Quandl&lt;/a&gt; that looks interesting, it bills itself as “Intelligent search for numerical data”. Essentially this is a search engine for finding datasets on the web which should be a useful resource in the emerging world of self service BI.&lt;/p&gt;  &lt;p&gt;I’m writing this on a train so as an example I used Quandl to search for data on UK train journeys and first result was &lt;a href="http://www.quandl.com/EUROSTAT-EuroStat/RAIL_TF_TRAINMV_105-Train-movements-1000-Train-kilometre-Passenger-trains-United-Kingdom" target="_blank"&gt;Train movements : 1000 Train-kilometre : Passenger trains : United Kingdom&lt;/a&gt;:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_06979B41.png"&gt;&lt;img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_770463A4.png" width="571" height="593" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Here we have some raw data pertaining to train movements in the UK from 2004 to 2011. Quandl provides a chart of the data, a link to the source and an indication of the age of the data. It also enables us to download the data and provides Excel, CSV, JSON &amp;amp; XML as choices of data format.&lt;/p&gt;  &lt;p&gt;An interesting idea indeed, Quandl is in its infancy though I shall be keeping a watching brief to see if it turns out to be a success or not.&lt;/p&gt;  &lt;h2&gt;Publish your own datasets with Flatmerge&lt;/h2&gt;  &lt;p&gt;The aforementioned Chris Webb put me onto this one. &lt;a href="http://flatmerge.com" target="_blank"&gt;Flatmerge&lt;/a&gt; is a startup from Michigan, US that enables one to publish their data for public consumption:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_0325DDCC.png"&gt;&lt;img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_0F4757F3.png" width="758" height="211" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;With the FlatMerge data storage platform it's easy to share data in the cloud and use it in other applications. Just upload data and let FlatMerge discover it's &lt;b&gt;actual&lt;/b&gt; data types and make the data and metadata available in JSON or XML format through (&lt;/em&gt;&lt;a href="http://www.odata.org/documentation/uri-conventions#QueryStringOptions"&gt;&lt;em&gt;OData&lt;/em&gt;&lt;/a&gt;&lt;em&gt;) URL queries.&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Flatmerge are using &lt;a href="http://www.odata.org" target="_blank"&gt;OData&lt;/a&gt;-compliant URI query formats and &lt;a href="http://twitter.com/Flatmerge/status/308267611731394560" target="_blank"&gt;they tell me&lt;/a&gt; that OData output is coming soon:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;&lt;a href="http://twitter.com/Flatmerge/status/308267611731394560" target="_blank"&gt;We currently support some OData queries. Data/Meta is returned in plain JSON or XML. OData output is coming soon!&lt;/a&gt;&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;That Flatmerge chose to use OData to publish their data is interesting – I’ve long suspected that greater OData adoption wouldn’t be far away once Excel natively supported it as an external data source and Flatmerge have realised the value in doing this. Flatmerge enables one to publish data to the web, Quandl helps people find data on the web – perhaps these two should go out for coffee sometime &lt;img class="wlEmoticon wlEmoticon-smile" style="border-top-style:none;border-left-style:none;border-bottom-style:none;border-right-style:none;" alt="Smile" src="http://sqlblog.com/blogs/jamie_thomson/wlEmoticon-smile_06E3829C.png" /&gt;&lt;/p&gt;  &lt;h2&gt;SQL Saturday app for Windows Phone&lt;/h2&gt;  &lt;p&gt;&lt;a href="https://twitter.com/sqltechmike" target="_blank"&gt;Michael Wells&lt;/a&gt; has built a Windows Phone app for SQL Saturday (particularly pertinent for me at the moment as I am on my way home form &lt;a href="http://www.sqlsaturday.com/194/" target="_blank"&gt;SQL Saturday 194&lt;/a&gt;).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_2626296F.png"&gt;&lt;img title="image" style="border-top:0px;border-right:0px;background-image:none;border-bottom:0px;padding-top:0px;padding-left:0px;border-left:0px;display:inline;padding-right:0px;" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_434BD179.png" width="547" height="450" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;It provides data about each event, including the all important schedule information.&lt;/p&gt;  &lt;p&gt;I have slightly mixed feelings about this. On the one hand its fantastic to see a community member voluntarily build a great FREE resource for the SQL community – massive credit to Michael for doing this. On the other hand it highlights one of my pet peeves about the current app culture that is prevalent on smartphones – this is an app that you can only use if you have a certain type of phone. The information presented here is valuable and given away for free, why is it hidden behind a gated app store? Should there not be a SQL Saturday website that is optimised for and viewable on any mobile web browser? Better still, its the schedule data here that is most valuable so why not publish that data in a format that allows one to view that schedule in one’s phone/PC/tablet calendar regardless of the type of device they are using? That data format, by the way, is &lt;a href="http://sqlblog.com/blogs/jamie_thomson/archive/tags/iCalendar/default.aspx" target="_blank"&gt;iCalendar&lt;/a&gt; which is something that regular readers are probably fed up of me &lt;a href="http://sqlblog.com/blogs/jamie_thomson/archive/2010/06/03/thinking-differently-about-bi-delivery.aspx" target="_blank"&gt;banging on about&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;I hope that doesn’t detract from Michael’s great efforts here; his app is fulfilling an important need, I just happen to think its a shame that that need even exists when there are mechanisms already in place for delivering this data to us in a more efficient matter. On the other hand its hard to argue with the ease at which apps deliver information to us so perhaps I should just quietly climb down off of my soapbox! Comments are welcome!&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/jamiet" target="_blank"&gt;@Jamiet&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Exploring earnings data for the UK [Open Data]</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2013/02/11/exploring-earnings-data-for-the-uk-open-data.aspx</link><pubDate>Mon, 11 Feb 2013 13:34:02 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47637</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;I have a burgeoning interest in the world of Open Data which wikipedia describes as:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;&lt;b&gt;Open data&lt;/b&gt; is the idea that certain &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Data"&gt;&lt;em&gt;data&lt;/em&gt;&lt;/a&gt;&lt;em&gt; should be freely available to everyone to use and republish as they wish, without restrictions from &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Copyright"&gt;&lt;em&gt;copyright&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Patent"&gt;&lt;em&gt;patents&lt;/em&gt;&lt;/a&gt;&lt;em&gt; or other mechanisms of control. The goals of the open data movement are similar to those of other &amp;quot;Open&amp;quot; movements such as &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_source"&gt;&lt;em&gt;open source&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_hardware"&gt;&lt;em&gt;open hardware&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_content"&gt;&lt;em&gt;open content&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, and &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_access_(publishing)"&gt;&lt;em&gt;open access&lt;/em&gt;&lt;/a&gt;&lt;em&gt;. The philosophy behind open data has been long established&lt;/em&gt;&lt;em&gt;, but the term &amp;quot;open data&amp;quot; itself is recent, gaining popularity with the rise of the &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Internet"&gt;&lt;em&gt;Internet&lt;/em&gt;&lt;/a&gt;&lt;em&gt; and &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/World_Wide_Web"&gt;&lt;em&gt;World Wide Web&lt;/em&gt;&lt;/a&gt;&lt;em&gt; and, especially, with the launch of open-data government initiatives such as &lt;/em&gt;&lt;a href="http://en.wikipedia.org/wiki/Data.gov"&gt;&lt;em&gt;Data.gov&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.        &lt;br /&gt;&lt;/em&gt;&lt;a title="http://en.wikipedia.org/wiki/Open_data" href="http://en.wikipedia.org/wiki/Open_data"&gt;&lt;em&gt;http://en.wikipedia.org/wiki/Open_data&lt;/em&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;To that end I follow &lt;a href="https://twitter.com/londondatastore" target="_blank"&gt;@LondonDatastore&lt;/a&gt; on Twitter as they are actively publishing Open Data pertaining to the city in which I live, London. Four days ago they &lt;a href="https://twitter.com/londondatastore/status/299543409553711104" target="_blank"&gt;announced&lt;/a&gt; that a new dataset had been released that provided earnings information, not just for London but for the whole country and going back many years too. The provided link, &lt;a title="http://data.london.gov.uk/datastore/package/earnings-workplace-borough" href="http://data.london.gov.uk/datastore/package/earnings-workplace-borough"&gt;http://data.london.gov.uk/datastore/package/earnings-workplace-borough&lt;/a&gt;, brings up a page from where one can download an Excel workbook containing some data. Unfortunately the data in that workbook is not, in my opinion, provided in such a manner that makes it easily explorable (&lt;a href="http://sqlblog.com/blogs/jamie_thomson/archive/2012/08/13/using-pivot-tables-in-the-office-excel-web-app.aspx" target="_blank"&gt;which I have complained about before on this blog&lt;/a&gt;); the data is spread over multiple worksheets:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_3BDA8261.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_5B1D2934.png" width="383" height="37" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Worse, the data is already aggregated and pivoted:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_330A7A15.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_66663D71.png" width="577" height="138" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;In other words the workbook does not contain the raw row-level data from which this pivoted data is produced. Thankfully a link (&lt;a title="https://www.nomisweb.co.uk/query/construct/summary.asp?mode=construct&amp;amp;version=0&amp;amp;dataset=99" href="https://www.nomisweb.co.uk/query/construct/summary.asp?mode=construct&amp;amp;version=0&amp;amp;dataset=99"&gt;https://www.nomisweb.co.uk/query/construct/summary.asp?mode=construct&amp;amp;version=0&amp;amp;dataset=99&lt;/a&gt;) is provided from where the raw data can be downloaded. I found the interface there to be slightly clunky but that’s a minor quibble – that raw data is available should be considered a major boon. I downloaded earnings data for:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Local Authority &lt;/li&gt;    &lt;li&gt;Gender &lt;/li&gt;    &lt;li&gt;Full Time or Part Time &lt;/li&gt;    &lt;li&gt;Year (1999-2012) &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;and have made that raw data available in a publicly accessible Excel workbook. You can view that workbook online (only a browser required) at &lt;a href="http://sdrv.ms/VPvjcD" target="_blank"&gt;AnnualPay_by_YearGenderLocalAuthorityPartorfulltime.xlsx&lt;/a&gt; (from there you can also download for your own analysis).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_6521A492.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_6E9A6308.png" width="634" height="372" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The workbook also provides summaries over the raw data by way of pivot tables and charts. Arguably its clear to see, for example, that a gender imbalance exists although perhaps the gap may be lessening somewhat&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_088E232B.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_112E7BB7.png" width="904" height="470" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Its interesting to note that the average salary for Males dropped off in 2009/2010. Perhaps the economic events of 2008 are the cause for that, checking out data for only City of London (the traditional financial hub of the UK) suggests that may well be the case:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_5666D2E0.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_662667E4.png" width="901" height="441" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;There will be many more nuggets of information available in this data, all we need to do is set the data free so that people can find it for themselves. That is the aim of this blog post so hit the link: &lt;a title="http://sdrv.ms/VPvjcD" href="http://sdrv.ms/VPvjcD" target="_blank"&gt;AnnualPay_by_YearGenderLocalAuthorityPartorfulltime.xlsx&lt;/a&gt; and see what other nuggets you can find! If &lt;a href="https://twitter.com/search?q=dataviz" target="_blank"&gt;dataviz&lt;/a&gt; is an area that interests you then this is a cracking dataset to explore!&lt;/p&gt;  &lt;hr /&gt;  &lt;p&gt;One other point I want to make is that the raw data is provided as a mean average which means that each aggregated figure is a mean average of some mean averages. This isn’t good as it distorts the data as I demonstrate with this simple example:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_0C1E8DDA.png"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="image" border="0" alt="image" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_0B4627F0.png" width="384" height="82" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;We have an average salary for both male and female (5000 &amp;amp; 9000 respectively) and the average of those two averages is 7000. However if we take the total “total salary” / total “tally of people asked” (127000 / 23) then the average is quite different – 5521.73.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;That’s a large discrepancy even for only two rows of data and it highlights the problem of providing averages rather than the figures that created those averages. To me this is a data quality issue – the raw source data does not provide the requisite level of detail to enable accurate analysis. Quality of data is paramount.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/jamiet" target="_blank"&gt;@Jamiet&lt;/a&gt;&lt;/p&gt;</description></item></channel></rss>