Recently, I got sucked into importing Excel sheets into a database. I had thought, well that's easy with a tool such as SQL Server 2000 DTS Import/Export Wizard or SQL Server 2005 Import/Export Wizard. Needless to say, the Excel import business ran into a snag. With the error messages from the wizard not being really helpful, I ditched the wizard and wrote the following little ADO.NET program to read the Excel cells and dump them to a text file for a quick bcp import:
using System;
using System.IO;
using System.Data;
using System.Data.OleDb;
class Junk
{
static void Main(string[] args)
{
String sConnectionString =
"Provider=Microsoft.Jet.OLEDB.4.0;" +
"Data Source=" + "Junk.xls" + ";" +
"Extended Properties=Excel 8.0;";
OleDbConnection conn = new OleDbConnection(sConnectionString);
conn.Open();
OleDbCommand cmd =
new OleDbCommand("SELECT * FROM [Vendors$]", conn);
OleDbDataReader reader =
cmd.ExecuteReader(CommandBehavior.CloseConnection);
while (reader.Read())
{
for(int i=0; i< reader.FieldCount-1; i++)
{
Console.Write(reader.GetValue(i).ToString() + "|");
}
Console.Write(reader.GetValue(reader.FieldCount-1).ToString() + "}");
}
conn.Close();
}
}
But as soon as I thought I was done, I noticed that some cells were truncated in the output text file. I'm sure there is a switch somewhere that I could have used to tell the OLE DB reader not to truncate anything. But I was pressed for time and I had to cut the loss. Plus, I really started to worry about the data quality of the Excel sheet and wanted to be able to check/verify each cell. Instead of figuring out how to get the C# code to work, I ditched it and put together the following little Perl script:
use strict;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft.Excel';
use Win32::OLE::Variant;
use Win32::OLE::NLS qw(:LOCALE :DATE);
my $ex;
eval {$ex = Win32::OLE->GetActiveObject('Excel.Application')};
die "Excel not installed" if $@;
unless (defined $ex)
{
$ex = Win32::OLE->new('Excel.Application', sub {$_[0]->Quit;})
or die "Oops, cannot start Excel";
}open(LOG, ">readExcel.log") or
die "can't open readExcel.log for write.\n";my $book = $ex->Workbooks->Open('c:\Junk.xls') or
die Win32::OLE->LastError();
my $sheet = $book->Worksheets(1);
my $array = $sheet->Range("A2:Z1878")->{Value}; # specify the rangemy $row_cnt = 0;
for (@$array) {
foreach my $i (0..(scalar(@$_) - 2))
{
if (ref($_->[$i]))
{
print $_->[$i]->Date("MM-dd-yyyy") . "\t";
}
else
{
my $b = $_->[$i];
$b =~ s/\s/ /g; # convert whitespaces to blanks
print "$b" . "\t";
}
}
my $b = $_->[scalar(@$_) - 1];
$b =~ s/\s/ /g;
print "$b" . "\n";
$row_cnt++;
print LOG "($row_cnt," . scalar(@$_) . ")\n";
}
close(LOG);
I used several variations of this script to help check the Excel sheet. For instance, I changed the script to count the number of characters in each cell and figured out the max number of characters in each column in the sheet. This helped in determining the width for each column in the destination table. The table automatically generated by the import wizard just got all the column widths wrong.
Now, I'm not suggesting that this particular approach was the best or the script was such a terrific piece of code--I couldn't care less about its style or whether it's most efficient, I just wanted to get the data out of the cells and get the work done. Sure, there are many other ways to accomplish this, and your favorite method depends on your experience and the tool you are familiar with.
The point is that when a pre-built tool doesn't quite work out, it's good to know a little bit scripting that can help fill the gap.