THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jorg Klein

Jorg Klein, Microsoft-only BI consultant from the Netherlands

SSIS – Delete all files except for the most recent one

Quite often one or more sources for a data warehouse consist of flat files. Most of the times these files are delivered as a zip file with a date in the file name, for example FinanceDataExport_20100528.zip

Currently I work at a project that does a full load into the data warehouse every night. A zip file with some flat files in it is dropped in a directory on a daily basis. Sometimes there are multiple zip files in the directory, this can happen because the ETL failed or somebody puts a new zip file in the directory manually. Because the ETL isn’t incremental only the most recent file needs to be loaded. To implement this I used the simple code below; it checks which file is the most recent and deletes all other files.

Usage is quite simple, just copy/paste the code in your script task and create two SSIS variables:

  • SourceFolder (type String): The folder that contains the (zip) files
  • DateInFilename (type Boolean): A flag, set it to True if your filename ends with the date YYYYMMDD, set it to false if creation date of the files should be used

Note: In a previous blog post I wrote about unzipping zip files within SSIS, you might also find this useful: SSIS – Unpack a ZIP file with the Script Task

Public Sub Main()

    'Use this piece of code to loop through a set of files in a directory
    'and delete all files except for the most recent one based on a date in the filename.

    'File name example:
    'DataExport_20100413.zip

    Dim rootDirectory As New DirectoryInfo(Dts.Variables("SourceFolder").Value.ToString) 'Set the directory in SSIS variable SourceFolder. For example: D:\Export\
    Dim mostRecentFile As String = ""
    Dim currentFileDate As Integer
    Dim mostRecentFileDate As Integer
    Dim currentFileCreationDate As Date
    Dim mostRecentFileCreationDate As Date

    Dim dateInFilename As Boolean = Dts.Variables("DateInFilename").Value 'If your filename ends with the date YYYYMMDD set SSIS variable DateInFilename to True. If not set to False.


    If dateInFilename Then

        'Check which file is the most recent
        For Each fi As FileInfo In rootDirectory.GetFiles("*.zip")

            currentFileDate = CInt(Left(Right(fi.Name, 12), 8)) 'Get date from current filename (based on a file that ends with: YYYYMMDD.zip)

            If currentFileDate > mostRecentFileDate Then

                mostRecentFileDate = currentFileDate
                mostRecentFile = fi.Name

            End If

        Next

    Else 'Date is not in filename, use creation date

        'Check which file is the most recent
        For Each fi As FileInfo In rootDirectory.GetFiles("*.zip")

            currentFileCreationDate = fi.CreationTime 'Get creation date of current file

            If currentFileCreationDate > mostRecentFileCreationDate Then

                mostRecentFileCreationDate = currentFileCreationDate
                mostRecentFile = fi.Name

            End If

        Next

    End If


    'Delete all files except the most recent one
    For Each fi As FileInfo In rootDirectory.GetFiles("*.zip")

        If fi.Name <> mostRecentFile Then

            File.Delete(rootDirectory.ToString + "\" + fi.Name)

        End If

    Next

    Dts.TaskResult = ScriptResults.Success
End Sub
Published Friday, May 28, 2010 2:53 PM by jorg

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Todd McDermid said:

Great post, Jorg - I can't tell you how often this question is asked on the forums.  Now I'll just have to remember where to find this post so I can refer people to it!

Can you edit that sample above to use FileInfo to get the actual modification date for your comparison?  Even though most people inquire about doing it exactly the way you have (the date is encoded in the file name) - they also tend to ask the same question when it's not.

Thanks!

May 28, 2010 11:49 AM
 

Mike C said:

Todd - I created a similar package a while back -- FileInfo was very slow over the network for some reason.  I found the Directory class methods worked much faster than DirectoryInfo and FileInfo.  If you want to quickly modify Jorg's example, just modify it to use the FileInfo .CreationTime property (it's a System.DateTime property): http://msdn.microsoft.com/en-us/library/system.io.filesysteminfo.creationtime(v=VS.100).aspx

May 29, 2010 10:17 PM
 

jorg said:

@Todd: Thanks, I assume this post can help quite some people and that why I wrote it :-) I have changed the code so you can choose between using a date in the filename or the creation date. Thanks for the suggestion.

@Mike C: I haven't experienced any problems with FileInfo over the network here so I haven't changed the code. Thanks for pointing me to FileInfo .CreationTime!

June 1, 2010 8:34 AM
 

Paul said:

I would suggest you use ModificationTime on the file, as creation time is when the file arrives on your filesystem, not when it was last written.

If you copy the file from a server to your machine, the Creation Time is when you copied the file, Not the CreationDate you actually expect.

Modificationtime remains consistent in both scenarios.

August 18, 2010 9:08 PM
 

e said:

d

February 14, 2011 11:10 AM
 

Dave Ruijter said:

Hi Jorg,

You might note that it is a Microsoft Visual Basic (2008) script and that the System.IO namespace is required, e.g. "Imports System.IO"

Great post!

October 18, 2011 3:34 PM
 

Mark said:

Very useful thanks.  Needed some code to find the latest csv file in a folder a log it's name, modification\created date etc.  This helped me on my way.  Excellent post.

February 17, 2014 10:27 AM
 

Gusmundo said:

Very good. Just what I needed. :-)

As per a previous post, the only thing missing was.

Imports System.IO

Also very easy to modify to delete all .JPG in directory. which is another common issue if retrieving attachments from emails,

Thanks again.

April 23, 2014 7:44 PM

Leave a Comment

(required) 
(required) 
Submit

About jorg

Jorg Klein, Microsoft-only BI consultant from the Netherlands, blogging about BI on SQL Server with a focus on SSIS.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement