THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Joe Chang

So when is Parallel SQL arriving? Denali?

For some time, Intel C/C++ compilers addons had the ability to automaticly generate parallel code (as in the coder does not have to do it). I also recall hearing some thing about Parallel LINQ, but thats outside the scope of my narrow view of the world. I was just browsing a Sybase script (for the TPC-H benchmark) and notice the following, which is our version of BULK INSERT.

LOAD TABLE PARTSUPP (PS_PARTKEY '|',PS_SUPPKEY '|',PS_AVAILQTY '|',PS_SUPPLYCOST '|',PS_COMMENT '|')
FROM '/rawdata/partsupp.tbl.1',
'/rawdata/partsupp.tbl.2',
'/rawdata/partsupp.tbl.3',
'/rawdata/partsupp.tbl.4',
'/rawdata/partsupp.tbl.5',
'/rawdata/partsupp.tbl.6',
etc

Now I have not inclination to read Sybase documentation. I am inclined to interpret that from a single client session, Sybase will fire off multiple threads, one per file to do the BULK INSERT into the PARTSUPP table. (Paul White NZ looked this up the Sybase doc, and it is in fact a not a parallel command). No matter, even if does not do parallelism, it should and the syntax just cries: do me in parallel!

SQL CAT has put out a great white paper showing that SQL Server can be tuned for truely amazing parallel bulk load performance. But from what I recall, some degree of spoon bending was required. I think they used SSIS. Sure we could open multiple SSMS windows, with each one loading a different file, But, I capable of redefining what it means to be a lazy @$$.

SSMS 2008 already allows Multiple Server Query Execution, but what I want is to execute against one server BULK INSERT, each thread assigned a different file. The simplicity of the Sybase statement syntax is really nice so we don't have to write some parameter substitution code.

Potentially, this could also be able to parallel INSERT/UPDATE  on a partitioned table, with each thread handling a single partition. It would be nice do parallel INSERT/UPDATE on a non-partitioned table, but if there are technical reasons this can't be done,  I would happy with just parallel write ops to a partitioned table.

So will this be in Denali? if not, perhaps a soon to follow hot-fix (lets set aside the feature-fix delineation for this). Long ago, the MS x86 OS people was wanted really anxious to facilitate handling larger than 32-bit virtual addresses (until full 64-bit was available) and advised don't wait for Willamette, put it in Katmai. When that did not produce a favorable response, a team from another company did. 

 

Per request from Joe B
 1) only the Select portion of a query can have a parallel execution plan,
the Insert, Update and Delete portions are serial.
Adam suggests updating the indexes in parallel for IUD

2) Create Index can be fully parallel.

3) IUD cannot be done in parallel from a single query,
but you could try running multiple queries concurrently,
or use Adams query parallelization CLR.

4) DBCC can exploit parallelism to some degree

If I recall, parallel execution initiated in SQL Server 7.0

5) SSIS, which has some other tricks for parallel bulk load like the ability
to specify that the data is ordered
(important if you want to do a parallel load into an indexed table).


Third Party & Open Source
sqlbulktool?  http://www.sqlbi.com/Default.aspx?tabid=78

http://www.SQLsharp.com/

Published Monday, October 04, 2010 4:38 PM by jchang
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Hi Joe,

On the topic of "spoon bending," I've just released a parallel SQLCLR library that supports parallel processing as well as bulk load:

http://sqlblog.com/files/folders/beta/entry29021.aspx

Webcast on it here:

https://www.livemeeting.com/cc/usergroups/view?id=QTNS3M

--Adam

October 4, 2010 4:39 PM
 

jchang said:

thanks adam, I will look at it as soon as i can

October 4, 2010 10:59 PM
 

TiborKaraszi said:

Hi Joe,

Such an ... obvious addition! And methinks shouldn't be too difficult to implement in the product (basically it is only at the interface leyer). Did you file a connect entry which we can vote for?

One would consider how to handle if the load *cannot* be done in parallel. Do such cases exist? I don't know, but possibly. Also, of parallel handling isn't beneficial (perhaps htings like index layout, number  of CPUs etc). I.e., a few things to work out, but not rocket science.

October 5, 2010 1:58 AM
 

DM Unseen said:

October 5, 2010 7:54 AM
 

Adam Machanic said:

DM Unseen: Shoot me an e-mail when you get a chance (adam [at] sqlblog). Would love to chat about your use cases for the tool. It is, BTW, quite stable in its current form but I want to add a few more features before I call it finished at v1.

October 5, 2010 8:30 AM
 

DM Unseen said:

Adam,

I'll wait for your additions. I would love some options to execute a set of arbitrary SQL statements in paralel, but the issues I had with my solutions (job agenst, service broker) all had issues with ways to implement parameter(ized) queries.

October 5, 2010 8:48 AM
 

jchang said:

Tibor: Like I said about bringing new meaning to lazy, please file on Connect if you are inclined, post the link and I will vote for it too.

DM Unseen: Outstanding, a reference to CJ himself, plus research paper citations. We need more professionals in this field.

Adam: Too bad I did not know about your tool a few months ago, still I will try to get it in the next version.

ps - Itzik may have copyright on the term "spoon-bending",

October 5, 2010 8:52 AM
 

joe blow said:

If someone could put up a table showing which parts of the product are parallel aware...would be nice.

I doubt SQLCMD is yet...

but maybe DTEXEC is,

Doubt SSRS is.

SQL aint a small product anymore.

queries since what, 7.0?

October 5, 2010 2:21 PM
 

jchang said:

I will take an initial stab at this, but others should continue it, 1) only the Select portion of a query can have a parallel execution plan, the Insert, Update and Delete portions are serial. 2) Create Index can be fully parallel. 3) IUD cannot be done in parallel from a single query, but you could try running multiple queries concurrently, or use Adams query parallelization CLR.

If I recall, parallel execution initiated in SQL Server 7.0?? Right now I cannot picture a parallel execution plan in 6.5, but I drank too much back in those days.

October 5, 2010 2:32 PM
 

Adam Machanic said:

Joe:

1) Correct. And this is a shame because if a query has 10 nonclustered indexes and you do an update, the rows in each of those indexes will be updated one at a time. We can now create 1000 indexes on a table. The lack of parallelism at that phase is painful. Connect item:

https://connect.microsoft.com/SQLServer/feedback/details/581069/modifications-to-nonclustered-indexes-should-be-done-in-parallel-when-appropriate

2) Correct

3) Correct, but it's also important to note SSIS, which has some other tricks for parallel bulk load like the ability to specify that the data is ordered (important if you want to do a parallel load into an indexed table).

And you're right, parallel query processing was added in 7.0, kind of. I seem to recall that it was quite difficult to see a parallel plan. In 2000 I saw a lot more of them and we would set serverwide MAXDOP to 1 to turn it off, because in most cases it would make the query run 3x longer :-) ... Took until 2005 for the team to really get the hang of it.

October 5, 2010 5:10 PM
 

Adam Machanic said:

Another thing that can exploit parallelism to some degree is DBCC.

October 5, 2010 5:10 PM
 

Paul White said:

Joe,

I looked up Sybase's LOAD TABLE command, and the manual says:

"The files are read one at a time, and they are processed in a left-to-right order as specified in the FROM clause."

So it's not even 'parallel' bcp, let alone as awesome as Adam's QP.

Paul

October 6, 2010 3:17 AM
 

DM Unseen said:

For completeness sake, maybe we should also mention the sqlbulktool?

http://www.sqlbi.com/Default.aspx?tabid=78

October 6, 2010 4:05 AM
 

Solomon Rutzky said:

Hello.  I have not had a chance to check out Adam's project yet, but I also have a parallel processing feature built into my SQLCLR library: SQL#.  Most of the package is free but the parallel stuff is in the part that is not free.  If you are interested, check out the OS commands.  They can be used to parallel execute any DOS / Windows command: SQLCMD, BCP, DTEXEC, or anything else.  You get the Process IDs back and can monitor them and even send kill signals.

http://www.SQLsharp.com/

There is a PDF manual on the Download page that explains each feature and gives basic examples.

October 6, 2010 6:56 AM
 

jchang said:

Adam: I some slide decks on parallelism in SQL Server 2000 and 2005, http://www.qdpma.com/ppt/SQL%20Server%202005%20Performance%20Enhancements%20for%20Large%20Queries.ppt
there were definitely anomalies with parallel plans, but if you had a map of parallel execution behavior, then you can make good use of parallelism, probably default off, and explicitly enable for specific queries. But those were the good old days?

Paul: thanks much for looking up the Sybase doc. Like I said, even it is not, the syntax just screams parallel. Sybase is very rich in tuning capability.

DM: I will look at this, thanks

Solomon: I am swamped right now, but if any has a heavy machine and heavy storage, we need a comparison. High-band width IO requires using specialized IO APIs and the descriptions are not entirely intuititive,

October 6, 2010 9:58 AM

Leave a Comment

(required) 
(required) 
Submit

About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement