A few weeks ago I wrote a blog entry entitled Microsoft Semantic Engine, in which I spoke about a technology called (unsurprisingly) the “Microsoft Semantic Engine”, which looks set to ship in a future version of SQL Server. In that blog post I said:
I have a casual interest in semantic technologies (I’ve talked about microformats, semantic web, RDF before) so anything that might meld these technologies with SQL Server really captures my attention. I shall be keeping a watching brief.
Well, with that in mind, today I sat down to watch a video of a session from last week’s Professional Developers Conference (PDC) called “Microsoft Semantic Engine” by Naveen Garg and Duncan Davenport, which you can find at http://microsoftpdc.com/Sessions/SVR32.
After watching the video my interest has been heightened even more, so I’m writing this blog post to tell you a little bit more about it.
Firstly, if I had to describe this Semantic Engine to my mum it would go something like this:
“Here’s a way of finding out what people out on the world wide web are saying about stuff”
If I had to describe it to an information worker, the likes of whom are the end users of the systems I build in my day-to-day job, it would go something like this:
“Here’s a technology that enables us to discover and store ad-hoc, pertinent information that previously would not have been readily accessible to us”
If I had to describe it to a fellow techie then the description might go something like this:
“The Semantic Engine is a technology that enables us to mine unstructured data and query it using tools that are already familiar to SQL Server professionals. It extracts and indexes pertinent information from data sources that are presented to it through data adapters. Adapters may exist for documents, audio, video, web pages and a plethora of other unstructured or partially-structured data sources”
Hopefully one of those descriptions helps to convey to you what this thing is all about. I find this particularly fascinating because in my day job as a SQL Server BI developer I spend a lot of time building systems that expose structured data to people to enable them to make better decisions. The world of unstructured data is, I believe, largely untapped, and with the rise of stuff like Twitter and Flickr there is an enormous amount of unstructured data out there that people would, I’m sure, be interested in harnessing for their decision-making activities.
If you would like to find out more then I’d encourage you to go and watch the video, which is about 45 minutes long. Below are some notes that I took whilst watching the video; they mostly consist of paraphrased quotes from Naveen and some screenshots that may help you to conceptualize what this thing actually is, and I have highlighted the ones that are of most interest to me. I await your comments with interest!
“Microsoft Semantic Engine” is just a placeholder name; it will change.
“Unifying search, discovery as well as input into your content” (web feeds, corporate documents, emails)
“Information is rapidly changing” – Naveen gives Twitter as an example
“Lots of decisions are being made in emails”
“This technology is about making sure we can bring this together into a single platform where the traditional tools (T-SQL, Reporting) can get to this data through the existing interfaces or much more easily to allow you to make more informed decisions”
“Some studies say that more than 80% of data lie outside structured stores”
“There is a unique opportunity here both for Microsoft and developers to bring together all this information in a search concept that users are used to, bringing it into the structured relational world where you can do analytics on top of it.”
“Uses SQL Server as its store for all the information and indices that we have built and we have a web front-end that allows you to use RESTful APIs”
Naveen showed a demo where he was searching through images. A query of “descriptor:outdoor>0.85” returned a collection of photos that the engine thinks have a greater than 85% probability of having been taken outdoors.
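To make that query a bit more concrete, here is a rough Python sketch of how I imagine a filter like “descriptor:outdoor>0.85” might behave. To be clear, this is purely my own illustration and not how the Semantic Engine is implemented: the only thing taken from the demo is the query string itself, while the `Photo` class, the `parse_query` and `search` functions, the extra comparison operators and the sample scores are all made up for the example.

```python
import operator
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed grammar: <field>:<name><operator><threshold>, e.g. descriptor:outdoor>0.85
# Only ">" appeared in the demo; the other operators are a guess.
QUERY_RE = re.compile(
    r"^(?P<field>\w+):(?P<name>\w+)(?P<op>>=|<=|>|<)(?P<threshold>\d*\.?\d+)$"
)

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class Photo:
    """Hypothetical record: a file plus the scores an engine assigned to it."""
    filename: str
    descriptors: Dict[str, float] = field(default_factory=dict)


def parse_query(query: str) -> Tuple[str, str, str, float]:
    """Split a query string into (field, name, operator, threshold)."""
    match = QUERY_RE.match(query.strip())
    if match is None:
        raise ValueError(f"Unrecognised query: {query!r}")
    return (
        match["field"],
        match["name"],
        match["op"],
        float(match["threshold"]),
    )


def search(photos: List[Photo], query: str) -> List[Photo]:
    """Return the photos whose named descriptor score satisfies the query."""
    field_name, name, op, threshold = parse_query(query)
    if field_name != "descriptor":
        raise ValueError(f"This sketch only handles 'descriptor', not {field_name!r}")
    compare = OPS[op]
    return [
        photo
        for photo in photos
        if name in photo.descriptors and compare(photo.descriptors[name], threshold)
    ]


# Made-up sample data: the scores stand in for the engine's probabilities.
sample_photos = [
    Photo("beach.jpg", {"outdoor": 0.97}),
    Photo("office.jpg", {"outdoor": 0.12}),
    Photo("garden.jpg", {"outdoor": 0.85}),
]

matches = search(sample_photos, "descriptor:outdoor>0.85")
print([photo.filename for photo in matches])  # prints ['beach.jpg']
```

Note that garden.jpg is excluded because its score is exactly 0.85 and the query asks for strictly greater than 0.85. The interesting work, of course, is in how the engine arrives at those scores in the first place, which this sketch simply takes as given.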
“We parse the content to look for meaningful information, some of them are called concepts. Time, place, people, documents, events are very important concepts and we expose them as what we call ‘kinds’”