THE SQL Server Blog Spot on the Web

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Twitter Search Term Extraction

During my presentation "A whistlestop tour of SSIS addins" yesterday at SQLBits, I demonstrated how Jessica Moss & Andy Leonard’s Twitter Task could be used in conjunction with SSIS’s Term Extraction component to discover what people might be saying in their replies to a Twitter user.

The Term Extraction component is a little-known and rarely used component in SSIS’s bag of tricks, but it’s rather interesting and can be quite useful too. It uses data mining algorithms to extract nouns and/or phrases from the text that is passed through it and then gives each term or phrase a score based on the frequency of its occurrence. Here’s another description from the documentation:

The Term Extraction transformation extracts terms from text in a transformation input column, and then writes the terms to a transformation output column. The transformation works only with English text and it uses its own English dictionary and linguistic information about English.

You can use the Term Extraction transformation to discover the content of a data set. For example, text that contains e-mail messages may provide useful feedback about products, so that you could use the Term Extraction transformation to extract the topics of discussion in the messages, as a way of analyzing the feedback.

http://msdn.microsoft.com/en-us/library/ms141809(SQL.90).aspx
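
To make the idea of scoring terms by frequency concrete, here is a rough C# sketch. It is not how the Term Extraction component works internally: the real component uses its own English dictionary and linguistic information to pick out nouns and noun phrases, whereas this sketch simply splits text into words, drops a small made-up stop-word list, and counts what is left. The class name, the stop-word list and the minFrequency parameter are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TermFrequencySketch
{
    // Hypothetical stop-word list; the real component relies on its own English dictionary.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to" },
        StringComparer.OrdinalIgnoreCase);

    // Counts how often each remaining word occurs across all tweets and keeps
    // the words seen at least minFrequency times, highest count first.
    public static List<KeyValuePair<string, int>> Score(IEnumerable<string> tweets, int minFrequency)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (var tweet in tweets)
        {
            foreach (Match m in Regex.Matches(tweet, "[A-Za-z]+"))
            {
                var word = m.Value.ToLowerInvariant();
                if (StopWords.Contains(word)) continue;

                int current;
                counts.TryGetValue(word, out current); // current stays 0 for a new word
                counts[word] = current + 1;
            }
        }

        return counts
            .Where(kv => kv.Value >= minFrequency)
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        var tweets = new[]
        {
            "Great session at SQLBits on SSIS",
            "SQLBits in November was great",
            "Looking forward to the next SQLBits"
        };

        // Only terms that occur at least twice are reported.
        foreach (var term in Score(tweets, 2))
        {
            Console.WriteLine("{0}\t{1}", term.Key, term.Value);
        }
    }
}
```

With the three sample tweets above, only "sqlbits" (3 occurrences) and "great" (2 occurrences) meet the threshold of 2; every other word appears once and is dropped.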

There appeared to be quite a bit of interest in what I demo’d although one anticipated question that came up from the audience was “Can the Twitter Task perform searches on Twitter?” and the answer, currently, is no.

Hence, I’ve put together a package that uses a script component to do exactly that – queries Twitter for tweets containing a given search term. To use it you simply enter whatever you want to search for in the SearchTerm variable:

[image: the SearchTerm variable]

and execute the package. Here are the results from the Term Extraction component after doing a search on Twitter for “sqlbits” and “november”:

[images: Term Extraction results for “sqlbits” and for “november”]

Download the package from here: http://cid-550f681dad532637.skydrive.live.com/browse.aspx/Public/BlogShare/20091122

@Jamiet

Published Sunday, November 22, 2009 7:52 PM by jamiet

Comments

 

Ben Rees said:

Thanks for this - and for the great presentation at SQLBits. And apologies for asking the question about searching using this package - hope it wasn't too much work!

November 23, 2009 5:14 PM
 

Luke Hayler said:

Jamie,

This is awesome. Glad to see you figured it out, and that I have finally come top at something. Ha!

I wonder how long until it is incorporated into the TwitterTaskSuite?

November 27, 2009 11:42 AM
 

Martin said:

Hi,

Thanks for posting this, I found it very interesting. I had trouble getting it working owing to my company's web proxy. May I suggest the following amendment to the CreateNewOutputRows method, which uses an HttpWebRequest as the input to the XmlReader to allow for the provision of web proxy credentials.

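   // Needs: using System; using System.Net; using System.Web; using System.Xml;
   //        using System.ServiceModel.Syndication;
   public override void CreateNewOutputRows()
   {
       try
       {
           var input = this.Variables.SearchTerm;
           // URL-encode the search term so spaces, '#' and '@' survive in the query string.
           var scrubbed = HttpUtility.UrlEncode(input);
           var wrq = WebRequest.Create(string.Format("http://search.twitter.com/search.atom?rpp=100&lang=en&q={0}", scrubbed));
           // Send the request through the system proxy using the current user's credentials.
           // WebProxy.GetDefaultProxy() is marked obsolete, so GetSystemWebProxy() is used instead.
           wrq.Proxy = WebRequest.GetSystemWebProxy();
           wrq.Proxy.Credentials = CredentialCache.DefaultCredentials;
           using (var response = wrq.GetResponse())
           using (var reader = XmlReader.Create(response.GetResponseStream()))
           {
               var feed = SyndicationFeed.Load(reader);
               // One output row per tweet in the Atom feed.
               foreach (SyndicationItem item in feed.Items)
               {
                   Output0Buffer.AddRow();
                   Output0Buffer.tweet = item.Title.Text;
                   Output0Buffer.tweeter = item.Authors[0].Name;
                   Console.WriteLine("\t{0} - {1}", item.Authors[0].Name, item.Title.Text);
               }
           }
       }
       catch (Exception ex)
       {
           // The posted snippet is cut off after the try block; this handler is an assumed completion.
           bool cancel;
           ComponentMetaData.FireError(0, "Twitter search", ex.Message, string.Empty, 0, out cancel);
       }
   }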

January 15, 2010 8:31 AM
 

jamiet said:

Martin,

Really appreciate you posting this, thank you very much.

-Jamie

January 15, 2010 8:49 AM
 

Jan said:

This is awesome, you enabled me to get up and running archiving and analyzing my company's tweets in less than 2h. Thanks a ton!

Jan

July 16, 2010 12:39 PM
 

AlessioSteli said:

This was great! Super easy to implement. Within one day I was able to add Twitter data to our ODS. Thanks! Does anyone know if it is possible, or what the best way is, to retrieve status posts from Facebook and MySpace?

August 4, 2011 4:44 PM
