THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

John Paul Cook

Processing nonstructured data using FILESTREAM and FileTable

SQL Server 2012 simplifies the processing of unstructured data found in files. The FileTable enhancements to FILESTREAM allow documents to be known to SQL Server full-text search by simply copying them to a FileTable network share. This is much more convenient than it was in SQL Server 2008 R2 where the files had to be explicitly loaded into SQL Server as blobs before they could be processed by full-text search.

Although I’m a full-time IT consultant, I spend most of my spare time pursuing graduate studies in nursing. I’m always looking for ways to use technology to improve my efficiency as a student and also improve my effectiveness as a consultant. Some of the technical tips and tricks I’ve figured out for school have been useful to my clients in corporate environments. In this blog post, I’ve taken PowerPoint slides from my pharmacology course to evaluate the usefulness of using FileTable to process unstructured data found in the same kinds of files used in corporate environments.

image

Step 1. Access your SQL Server instance’s properties to enable FILESTREAM.

The first checkbox is to enable FILESTREAM in SQL Server. The second checkbox is to allow Windows to read and write FILESTREAM data – in other words, make your files accessible to Windows. The third checkbox is to allow remote users to access your FILESTREAM data. This is the option for real world usage. Typically a SQL Server is on a remote machine, not locally installed.

image

Figure 2. Check everything if you want maximum functionality.

Changing FILESTREAM settings requires a restart of the SQL Server service.

You must create a FILESTREAM enabled database or add FILESTREAM to an existing database. For this blog post, I chose to create a new database.

CREATE DATABASE School
ON
PRIMARY (NAME = School, FILENAME = 'd:\school\dbFiles\school.mdf')
,FILEGROUP FileStreamGroup
 CONTAINS FILESTREAM(NAME = schoolFilestream, FILENAME = 'd:\school\schoolFilestream')
 LOG ON (NAME = SchoolLog, FILENAME = 'd:\school\dbFiles\school.ldf');
GO

The folder for the data and log files must exist. The subfolder for the FILESTREAM must not exist and will be created when the database is created.

image

Figure 3. Folder structure showing that D:\school\dbFiles exists and D:\school\schoolFilestream does not exist.

After creating the database with the above T-SQL statement, the D:\school\schoolFilestream folder is created as shown below. As the dialog box shows, it is considered a system folder by Windows.

image

Figure 4. Click Continue to access the new folder and its contents.

image

Figure 5. The new D:\school\schoolFilestream folder and its contents.

 

image

Figure 6. FILESTREAM Directory Name (which must NOT be a full path) and Non-Transacted Access options.

The options specified created a folder fileContainer under a new mssqlserver share as shown below.

image

Figure 7. Network showing the fileContainer FILESTREAM directory name under the default share name of mssqlserver.

At this point the fileContainer folder is empty. Time to create a FileTable. Notice that the CREATE TABLE statement does not specify any columns. The default is to create a folder with the same name as the FileTable, but the example shows how to use FILETABLE_DIRECTORY to override the default and use pharmFiles for the folder name. Notice that the WITH clause is also used to create a unique constraint.

CREATE TABLE pharm AS FileTable
WITH (FILETABLE_DIRECTORY = N'pharmFiles'
         ,FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME = ui_file_stream);

 

image

Figure 8. The CREATE TABLE statement created a new folder pharmFiles.

Within SSMS, you can see that a FileTable has predefined columns.

image

Figure 9. FileTable columns are predefined.

Now it is time to copy all of the pharmacology PowerPoint files to the pharmFiles folder on the network share.

image

Figure 10. PowerPoint files added to the previously empty pharmFiles folder.

Let’s examine the contents of the pharm FileTable after the file copy.

image

Figure 11. Notice that SQL Server recognizes all of the files that were copied to the pharmFiles folder on the mssqlserver share.

Notice that the files are of type pptx. SQL Server 2012 by default does not index pptx files. Use the following query to find out what file types are indexed.

SELECT * FROM sys.fulltext_document_types
ORDER BY document_type;

It is necessary to add an IFilter, which you will probably find is more than a one step process. As of the date of publication, the latest and greatest IFilter pack is the Microsoft Office 2010 Filter Pack (a.k.a. Filter Pack 2.0). It’s possible that when you read this, a newer version may be available, so do your due diligence and check. After installing the Filter Pack, you may need to install a service pack. There’s more than one service pack that has been released for Filter Pack 2.0. I will not provide any links because I don’t want to provide you with outdated information. Microsoft recommends that you obtain service packs by running Windows Update, which is what I did. As a best practice, after installing something from Windows Update, you should check for additional updates that could be required after what you just finished installing. I had three consecutive updates to do. You could have more or less than that.

The Office Filter Pack does not have a dependency on Microsoft Office. You don’t need to have Office installed on your SQL Server machine for the Filter Pack to work. It’s not a good idea to have Office installed on a server.

After installing and updating the Filter Pack, run the following command to load the new filters.

exec sp_fulltext_service 'load_os_resources', 1;

Update the system metadata about what filters are installed.

exec sp_fulltext_service 'update_languages';

Restart the filter daemon.

exec sp_fulltext_service 'restart_all_fdhosts’;

Confirm that the new filters were installed.

SELECT * FROM sys.fulltext_document_types
ORDER BY document_type;

Create a fulltext catalog and a fulltext index.

CREATE FULLTEXT CATALOG pharmCat AS DEFAULT;

CREATE FULLTEXT INDEX ON pharm (file_stream TYPE COLUMN file_type) KEY INDEX ui_file_stream;

Here’s a query to find out what files in the pharm folder contain the drug name Vancomycin.

SELECT *
FROM dbo.pharm
WHERE CONTAINS (file_stream, 'Vancomycin');

A single row is returned because only one file, Anti-infective.pptx mentions that drug. A future post will address using FileTable to make sense of unstructured data.

Published Saturday, September 07, 2013 10:40 PM by John Paul Cook

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Marc Online said:

Hi John,

Great article! It opened my eyes on how SQL Server 2012 works with Filestream and FileTable. It looks like,..to me anyway.., as a great way to store my, and other people's, unstructured data.

Thanks for sharing.

Sincerely,

Marc Online.

September 8, 2013 10:43 AM

Leave a Comment

(required) 
(required) 
Submit

About John Paul Cook

John Paul Cook is both a Registered Nurse and a Microsoft SQL Server MVP experienced in Microsoft SQL Server and Oracle database application design, development, and implementation. He has spoken at many conferences including Microsoft TechEd and the SQL PASS Summit. He has worked in oil and gas, financial, manufacturing, and healthcare industries. Experienced in systems integration and workflow analysis, John is passionate about combining his IT experience with his nursing background to solve difficult problems in healthcare. He sees opportunities in using business intelligence and Big Data to satisfy healthcare meaningful use requirements and improve patient outcomes. John graduated from Vanderbilt University with a Master of Science in Nursing Informatics and is an active member of the Sigma Theta Tau nursing honor society. Contributing author to SQL Server MVP Deep Dives and SQL Server MVP Deep Dives Volume 2.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement