Allen White

PowerShell, SQL Server and Large Resultsets

Last week I posted about using PowerShell and SQL Server together, and I used a SqlDataAdapter to populate a DataTable with the results. That approach is fine when the query returns a reasonably small resultset, but what if you're returning millions of rows? In that case a DataTable isn't practical, and a forward-only DataReader is the way to go.
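For comparison, the DataAdapter/DataTable pattern looks roughly like this (a minimal sketch with placeholder server and column names, not the exact code from that earlier post):

$cn = new-object System.Data.SqlClient.SqlConnection("Data Source=MyServer\MyInstance;Integrated Security=SSPI;Initial Catalog=Subscribers")
$da = new-object "System.Data.SqlClient.SqlDataAdapter" ("SELECT [List], [EmailAddr] FROM [dbo].[SubList]", $cn)
$dt = new-object System.Data.DataTable

# Fill opens and closes the connection itself and pulls the entire
# resultset into memory at once - fine for small queries, not for millions of rows.
$da.Fill($dt) | out-null
"Rows held in memory: " + $dt.Rows.Count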

In my company we have some legacy applications that need data from a large relational database. The preferred format is a comma-separated data file with the columns quoted (to handle whatever embedded data might be found). I needed to supply a separate text file for each subscriber list, with the list name as the file name and the column names of the data as the first row.
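To make that format concrete, here's what the start of one of those files might look like (the list name and subscriber values here are made up for illustration):

NewsletterA.txt:
"List","EmailAddr","FullName","UserID","CompanyName","Address1","CityName","StateProvince","PostalCode","Country","Telephone"
"NewsletterA","jane@example.com","Jane Doe","1001","Contoso","1 Main St","Cleveland","OH","44101","USA","216-555-0100"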

The first thing I need to do is open a connection to the database.

#extract_subs.ps1
#This script will extract information for subscribers
#and write the results into text files named with the list name.

$cn = new-object System.Data.SqlClient.SqlConnection("Data Source=MyServer\MyInstance;Integrated Security=SSPI;Initial Catalog=Subscribers")
$cn.Open()

Once the connection is open I build a SqlCommand object with the query to be processed. (To prevent collisions with the application using the database, and because dirty reads are acceptable for this process, I use the NOLOCK hint.) The results are ordered by list name and email address - the list name order is important because I'm creating separate files based on that list name. I also need to set the CommandTimeout property of the SqlCommand object to 0 so the command doesn't time out waiting for results - as I mentioned, there are millions of rows coming back.

$q = "SELECT	List,"
$q = $q + " EmailAddr,"
$q = $q + " FullName,"
$q = $q + " UserID,"
$q = $q + " CompanyName,"
$q = $q + " Address1,"
$q = $q + " CityName,"
$q = $q + " StateProvince,"
$q = $q + " PostalCode,"
$q = $q + " Country,"
$q = $q + " Telephone"
$q = $q + " FROM [dbo].[SubList] WITH (NOLOCK)"
$q = $q + " ORDER BY [List], [EmailAddr]"
$cmd = new-object "System.Data.SqlClient.SqlCommand" ($q, $cn)
$cmd.CommandTimeout = 0

Now I can use the ExecuteReader method of the SqlCommand object to return a SqlDataReader object. I'll also initialize a variable I'll use to detect when the List name changes, so I can write a header row to each output file. I then use a while loop to iterate through the results. The DataReader's Read() method returns true while there is data and false when it's done, so using it as the while condition advances to the next row and tests for the end of the data at the same time.

$dr = $cmd.ExecuteReader()
$l = ""

while ($dr.Read()) {

There are a number of methods to return column data from a DataReader, but GetValue returns a datatype appropriate to the data returned, so I use that method pretty consistently. I assign the values to variables for clarity.

    $List = $dr.GetValue(0)
    $EmailAddr = $dr.GetValue(1)
    $FullName = $dr.GetValue(2)
    $UserID = $dr.GetValue(3)
    $CompanyName = $dr.GetValue(4)
    $Address1 = $dr.GetValue(5)
    $CityName = $dr.GetValue(6)
    $StateProvince = $dr.GetValue(7)
    $PostalCode = $dr.GetValue(8)
    $Country = $dr.GetValue(9)
    $Telephone = $dr.GetValue(10)
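
As an aside, if you don't want to depend on hard-coded column positions, the DataReader can also look them up by name. This is just a variation, not part of the script above:

# Resolve the ordinal by column name (ideally once, before the while loop).
$listOrdinal = $dr.GetOrdinal("List")
$List = $dr.GetValue($listOrdinal)

# Or index the reader by column name directly.
$EmailAddr = $dr["EmailAddr"]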

Now that I have the row data in variables I test to see if the List name changed. If it did I build a header row. I then use the new List name and concatenate the ".txt" extension onto it for the output file name, then pipe the header row to the out-file cmdlet. I specify -append in case the file from the previous run still exists (it shouldn't), and I specify -encoding ASCII because by default PowerShell will create a Unicode format file. I also set my list change variable to the name of the list.

    if ($List -ne $l) {
        $r = """List"",""EmailAddr"",""FullName"",""UserID"
        $r = $r + """,""CompanyName"",""Address1"
        $r = $r + """,""CityName"",""StateProvince"",""PostalCode"
        $r = $r + """,""Country"",""Telephone"""

        $f = $List + ".txt"
        $r | out-file $f -append -encoding ASCII

        $l = $List
    }

Now I build the row data into its comma-separated string and write out the data using the out-file cmdlet.

    $r = """" + $List + """,""" + $EmailAddr + ""","""
    $r = $r + $FullName + """,""" + $UserID + ""","""
    $r = $r + $CompanyName + """,""" + $Address1 + ""","""
    $r = $r + $CityName + """,""" + $StateProvince + """,""" + $PostalCode + ""","""
    $r = $r + $Country + """,""" + $Telephone + """"

    $f = $List + ".txt"
    $r | out-file $f -append -encoding ASCII

}

The last thing we need to do is to close the DataReader and the Connection.

$dr.Close()
$cn.Close()
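
One defensive variation (this one needs PowerShell v2 or later, which adds try/finally) is to make sure the reader and connection get closed even if something fails partway through the loop:

try {
    $dr = $cmd.ExecuteReader()
    while ($dr.Read()) {
        # ... same row processing as above ...
    }
}
finally {
    # Runs whether the loop finished or threw, so nothing is left holding locks.
    if ($dr -ne $null) { $dr.Close() }
    $cn.Close()
}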

There are many different ways to accomplish a task like this. PowerShell gives us one more tool in the toolbox to get tasks done.
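
For example, one variation I've considered (not what the script above does) is to write each file through a System.IO.StreamWriter instead of piping every row to out-file -append, since out-file has to reopen the file for every row and that overhead adds up over millions of rows. A rough sketch, with the file name as a placeholder:

# Open the writer once per list instead of once per row ($false = don't append).
$sw = new-object "System.IO.StreamWriter" ("NewsletterA.txt", $false, [System.Text.Encoding]::ASCII)
$sw.WriteLine($r)     # write each already-built header or data row ($r from the loop above)
$sw.Close()           # close the writer when the List value changes or the loop ends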

Allen

Published Friday, February 01, 2008 8:59 AM by AllenMWhite

Comments

Forrest said:

DataReader objects should be the only allowed method of consuming SQL data. Well, that's only true 90% of the time ... but DataTable and DataSet objects maybe ought to produce compiler warnings.

Most people use the DataAdapter.Fill() method to populate a DT or DS. Behind the scenes, this iterates the SQL result set row by row to build an in-memory representation of the data. People tend to consume these row by row, populating business objects from the DataRows in the DataTables. For small result sets, the difference is hardly noticeable, but if you query 25 million rows and your middle-tier code has to iterate each of them twice, that's a substantial performance hit ... with no benefit in return.

March 13, 2008 1:55 PM

SQL_Guru said:

Forrest:

Unfortunately that is only true if you have a small database or a very small number of clients. While the DataReader is being used, it is holding resources in SQL Server. So if they do anything that takes more than a few milliseconds to run per row, you can easily run out of memory in SQL 2000 or SQL 2005. I've seen deadlocks/blocking caused by this in 2005, and I've seen it hang the server in 2000.

The best thing to do is to get your data out of the database server as quickly as possible and close your connection. Allen is doing this. He is grabbing his data, writing it to a file and closing the connection. If he adds anything in the middle, it could cause problems.

The problem is that once you are getting the data back row by row, you want to process that data before you continue. Let's say you want to generate and send an email. That may take a while to do, and suddenly you notice that nothing is running on the server. You kill a few tasks and things start working again. I actually saw this, and switching to DataTables gave a 10x improvement in scalability. I don't want developers to use DataReaders at all unless they understand the side effects. Since they usually don't at first, they shouldn't use them until checking with a DBA or a very experienced SQL coder.

April 9, 2008 8:17 PM

About AllenMWhite

Allen White is a consultant and mentor for Upsearch Technology Services in Northeast Ohio. He has worked as a Database Administrator, Architect and Developer for over 30 years, supporting both the Sybase and Microsoft SQL Server platforms over that period.
