
Merrill Aldrich

PowerShell to Validate SAN File Migration

This post is a bit off topic for SQL Server proper, but I thought someone out 'oogling teh inter-tubes might find it useful. Our group recently had to perform a large SAN migration both to new equipment and to a new geographic location. We started with a replication method provided by the SAN vendor. I won't mention who, but from where I sit I can't prevent you from guessing :-). We had to move a lot of files; the great majority made the move intact but, perhaps because we are picky, we needed every single file to be copied successfully. The files are just files, not SQL Server data, but they are important. It came to light that a few got ... what's the technical term ... "whacked" in transit.

File Copy Validation

After the initial copy, we needed a method to validate the files. Since the files are in two physical locations, directly comparing them is impractical, as it would take all the same bandwidth that it took to transmit the files in the first place. So, we cooked up a PowerShell script based on some examples that find duplicate files using a simple MD5 hash, and used the script to output hash values and full paths for each system into two sets of CSV files, each in one of the two locations.

(If you haven't tried PowerShell yet, I have to say I am finding that in the grand scheme of things, the order of good stuff goes "Sliced Bread," then "PowerShell," a very close second.)

From the resulting CSV files, a quick SSIS load into SQL Server tables made it possible to have SQL do the heavy lifting: finding any files where the full path is the same but the computed hashes don't match. From there, thirty seconds of additional effort generated XCOPY commands to re-copy the corrupt files from the source system to the destination. We were searching for relatively few files out of millions, so this was a huge time saver.
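(If you'd rather keep the comparison itself in PowerShell, here's a rough, untested sketch that matches the CSV layout the script below produces. The file names and the \\sourceserver share are made-up placeholders, not our real systems; we did the real comparison in SSIS and SQL as described. The hashing script itself follows.)

# Hypothetical sketch: compare the two hash CSVs in PowerShell and emit
# XCOPY commands for any path whose hashes differ. All names are placeholders.
$sourceCsv = "C:\hashes_source.csv"
$destCsv   = "C:\hashes_destination.csv"
$srcRoot   = '\\sourceserver\e$'   # admin share on the source system

# index the source hashes by full path
$sourceHash = @{}
Import-Csv $sourceCsv | ForEach-Object { $sourceHash[$_.File] = $_.Hash }

# find destination files whose hash differs and emit XCOPY repair commands
Import-Csv $destCsv |
    Where-Object { $sourceHash.ContainsKey($_.File) -and $sourceHash[$_.File] -ne $_.Hash } |
    ForEach-Object {
        # the full path is identical on both systems, so only the root changes
        "xcopy ""$srcRoot$($_.File.Substring(2))"" ""$($_.File)"" /Y"
    } |
    Out-File "C:\fix_corrupt_files.cmd" -Encoding ASCII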

This is just a quick and dirty administrative script, so I'm afraid it's not polished code, lacking niceties like error handling or parameters. YMMV, &c.:


#-------------------------------------------------#
# This script will read all files in a directory and subdirectories
# and compute a hash value for the content of each file, storing the
# hash in a text file report. Use to binary-compare sets of files in
# two locations.
#
# Created 8/27/2009 by Merrill Aldrich using logic from this
# duplicate file scanning method: 
# http://blogs.msdn.com/powershell/archive/2006/04/25/583225.aspx
#-------------------------------------------------#

# Set directory to scan and output file to store hash values

$searchFolder = "E:\SomeDirectory"
$outFile = "C:\File_Name_Of_your_choice.csv"

$cryptoServiceProvider = [System.Security.Cryptography.MD5CryptoServiceProvider]
$hashAlgorithm = new-object $cryptoServiceProvider
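# (MD5 is plenty for detecting accidental corruption in transit;
# this is not meant as a tamper-proof or security check)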

# Header for CSV format output file:
"File,CreateDate,ModDate,Hash" | Out-File $outFile 

# Scan each target file, recursively walk subfolders
"Scanning"
[int]$i = 0 # counter, just for user feedback

Get-Childitem -recurse $searchFolder `
	| Where-Object { $_.PSIsContainer -eq $false } `
	| foreach {   
	
		# Show progress every 'x' files
		$i += 1
		if ( $i%100 -eq 0 ) { $i.toString() + " Files Processed" | out-host }

		# compute a hash value for file content
		$stream = $null
		$hashByteArray = $null
		
		$stream = $_.OpenRead();
		$hashByteArray = $hashAlgorithm.ComputeHash($stream);
		$stream.Close();
	    
		# close the file stream in case of error
		trap {
			$_.Exception.Message | out-host
			if ($stream -ne $null) { $stream.Close(); }
			break;
		}

		$currentHash = [string]$hashByteArray
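		# (the [string] cast above yields the hash as space-separated decimal
		# byte values; ugly, but consistent for comparing the two sides)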

		# append file name, dates, hash to the output text file
		"""" + $_.fullname + """," `
			+ $_.creationtime.ToString( "s" ) + "," `
			+ $_.lastwritetime.ToString( "s" ) + "," `
			+ $currentHash `
			| Out-File -append $outFile
		
	}
$i.toString() + " Files Processed"
"Scan Complete."

This process has worked well for us. The only speed bump: the standard shortcut for writing date stamps in sortable format, $_.creationtime.ToString( "s" ), makes strings like 2004-05-03T14:07:32, which are perfectly good ISO dates as far as I know, but SSIS's standard import-wizard-generated packages seem to choke on the "T." To work around that I had to add a little Derived Column / Replace expression (something like REPLACE(CreateDate,"T"," ")) into the import packages to remove the "T" character from the date stamps. It's so like SSIS, which I generally like very much, to go 95% of the way but make you work for that last 5%...
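(As Stephen Mills points out in the comments, the "T" can also be avoided at the source by swapping the "s" shortcut for an explicit format string, so the wizard-generated package parses the dates as-is:)

# Explicit format string: same sortable layout, with a space instead of "T"
(Get-Date).ToString('yyyy-MM-dd HH:mm:ss')   # e.g. 2009-08-31 13:03:00
# The same change would apply to the creationtime/lastwritetime calls above.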

Published Monday, August 31, 2009 1:03 PM by merrillaldrich


Comments

Ben Thul said:

I make my money as a SQL admin, but can't help but shake my head at the hoops that people have to jump through because MS doesn't provide good tools out of the box.  Case in point: had this been in any sort of *nix environment, you could have leaned on rsync and/or md5sum instead of rolling your own.  People who have "grown up" in a Windows environment don't know how bad they have it.  Not to take away from your accomplishment... it's good and got the job done.  It's just unfortunate that you had to do it at all.

August 31, 2009 4:47 PM

merrillaldrich said:

Ben - I agree in principle (to me it often feels like Windows 7 is Windows plus 20% more Mac OS and 20% more *nix, and I'll be thankful for all 40% when I can really upgrade :-). BUT, to be fair, I think PowerShell took many good lessons from *nix and brought them to Windows, which is all good. Plus I didn't have to roll much beyond the logic to walk the files and write out the results, as the actual MD5 code is in the .NET Framework, ready to use.

August 31, 2009 5:47 PM

Stephen Mills said:

For the date portion, you can use ( Get-Date ).ToString('yyyy-MM-dd HH:mm:ss').  There isn't much in the way of built-in documentation for DateTime types and formatting, but it would eliminate a step in your package.

You might also want to look at the PowerShell Community Extensions, which have a Get-Hash cmdlet.  Here's an example using it, ToString, and Export-Csv that does a lot of what your script does.  If you are using PowerShell V2, then you have even more options, e.g. ConvertTo-Csv.

dir -recurse | ? { -not $_.PSIsContainer } |
    select fullname,
        @{n='Hash';e={ (Get-Hash -Algorithm MD5 -Path $_.FullName).HashString }},
        @{n='LastWriteTime';e={ $_.LastWriteTime.ToString('yyyy-MM-dd HH:mm:ss') }},
        @{n='CreationTime';e={ $_.CreationTime.ToString('yyyy-MM-dd HH:mm:ss') }} |
    % { $i++; if ($i%10 -eq 0) { Write-Progress 'Export MD5' "$i exported" -CurrentOperation $_.FullName }; $_ } |
    Export-Csv -NoTypeInformation -Path c:\ExportedHash.csv

August 31, 2009 11:03 PM

merrillaldrich said:

Nice - I like how the @{} expressions line the columns up for a cleaner CSV export. Thanks.

September 1, 2009 12:00 AM

